Linux 编程（bash 编程）

推荐帖子：【操作系统】Linux

1. 语法

2. 生物信息学示例

1.语法

⑴ 变量

① 概述：对于表示长字符串或循环之类的 bash 编程很有用

② 变量赋值：使用$符号

受保护_0

③ 为了减少歧义，可以使用花括号

受保护_1

⑵ for 循环

受保护_2

2.生物信息学示例

⑴ 受保护_4

① 解释：统计行数

② 在 rm、cp 和 mv 中使用通配符 (wc) 时要小心

⑵ 受保护_5

①^（插入符号）的作用：^在正则表达式中表示行的开头。 ^# 表示“以 # 开头的行”

② -v 的作用：-v 表示“反向匹配”。如果没有 -v，则仅打印以 # 开头的行

③ 的作用（竖线）：将上一个命令的输出作为下一个命令的输入传递（wc -l）

④ grep -v ^# gencode.v19.long_noncoding_RNAs.gtf：搜索除#开头的所有行

⑤ wc -l：输出上一个命令传递过来的行数

⑥ 解释：输出除去#开头的行数

⑶受保护_8

① 解释：输出包含独立单词“gene”的行数

② 如果没有 -w，grepgene 会搜索所有包含“gene”作为另一个单词的一部分的行：gene、Genetics、progene 等。

⑶ 对于 seq 1 3 中的 i；做 mkdir 样本${i};完成

① 反引号的作用 (`): Executes the command inside the backticks and returns the output

⑷ grep "^chr6" gencode.v19.long_noncoding_RNAs.gtf | perl -lane 'if ($F[3]>=28000000 & ($F[4]<=34000000)){print $_;}' | grep -w "gene"

① -l: Removes the newline character at the end of each line

② -a: Autosplit mode – splits each input line by whitespace and stores it in the @F array

○ The first, second, … elements can be accessed as $F[0], $F[1], …

○ If the file is tab-separated, the command can be written as perl -F'\t' -lane 'print $F[1]' input.csv

③ -n: Represents each line as $_

④ -e: Allows the perl program to be written on the command line

⑤ grep "^chr6" gencode.v19.long_noncoding_RNAs.gtf: Search for all lines starting with “chr6”

⑥ perl -lane 'if ($F[3]>=28000000 & ($F[4]<=34000000)){print $_;}': Implement Perl programming on the command line and search for lines where the third value (e.g., start) is ≥ 28M and the fourth value (e.g., end) is ≤ 34M

⑦ grep -w "gene": Search for “gene” in the lines passed from the previous command

⑧ Interpretation: Search for genes between 28M and 34M on chr6

⑸ grep -w "gene" gencode.v19.long_noncoding_RNAs.gtf | perl -lane 'm/gene_name \"(.*?)\"/; print $1' | less -S

① m/ gene_name "(.*)"/: Pattern matching

② m/gene_name *“(.*)*“ /: Handles quotation marks

③ m/gene_name "(.* ) "/: Captured group can be referenced as $1

○ .: Matches any character except a newline

○ *: Matches the preceding character or pattern zero or more times

○ a*: Matches empty string (“”), a, aa, aaa, etc.

○ .*: Matches any string (zero or more characters)

○ “(.*?)”: Non-greedy match – captures the shortest possible match inside the quotes; without ?, it would match until the end of the line

⑹ j='echo $i | sed 's/.bam//g"

① sed: Represents regular expressions within ‘ ‘

② s/.bam//: Substitution – s/pattern/replacement/

③ s/.bam //: Since the replacement string is empty, .bam is deleted

④ g: If .bam appears multiple times in a line, g removes all occurrences. Without g, only the first occurrence is removed

⑺ samtools view -h $i | perl -lane 'if (($F[0]=~/^@/ | ($F[6] eq"=")) {print $_;}' | samtools view -S -b - > $(j}.new.bam

① samtools

○ -h: Include SAM header

○ -S: Indicates that input is in SAM format (not required in newer samtools versions)

○ -b: Output in BAM format

② $F[0]=~ /^@/: If $F[0] starts with @ (header line)

③ $F[6] is “=”: If aligned to itself

④ Interpretation: Remove reads mapped to different chromosomes in paired mapping

⑻ plink --bfile input_prefix --geno X --mind Y --hwe p --make-bed --out output_prefix

① bfile: Use the prefix of a binary PLINK format file as input

② geno: Filter markers with a missing rate exceeding X%

③ mind: Filter individuals with a missing rate exceeding Y%

④ hwe: Filter markers that fail the Hardy-Weinberg equilibrium

⑤ make-bed: Output in binary PLINK format

⑼ plink --bfile input_prefix --record vcf-iid bgz --threads 3 --out output_prefix

① Convert to VCF format: Combine the .bed file and .bim file.

② Existing .bim files follow the order of minor/major alleles, but VCF files follow the order of major/minor alleles.

⑽ plink2 --vcf input.vcf.gz 剂量=DS --maf 0.05 --extract-if-info "R2>=0.70" --export vcf-4.2 vcf-剂量=DS-force --out 输出.r7.剂量

① 用于提取正确估算标记的 PLINK 命令

② 用量：默认不导入用量信息。我们可以指定要导入的字段作为剂量。

③ vcf-dosage：导出输出文件中的剂量。

④ extract-if-info：删除 INFO 键上未通过此比较的所有标记。

输入：2024.01.16 00:28

7059

Linux 编程（bash 编程）

1.语法

2.生物信息学示例

results matching ""

No results matching ""