Korean, Edit

Linux 编程(bash 编程)

推荐帖子:【操作系统】Linux


1. 语法

2. 生物信息学示例



1.语法

⑴ 变量

① 概述:对于表示长字符串或循环之类的 bash 编程很有用

② 变量赋值:使用$符号


受保护_0


③ 为了减少歧义,可以使用花括号


受保护_1


⑵ for 循环


受保护_2



2.生物信息学示例

受保护_4

解释:统计行数

② 在 rm、cp 和 mv 中使用通配符 (wc) 时要小心

受保护_5

①^(插入符号)的作用:^在正则表达式中表示行的开头。 ^# 表示“以 # 开头的行”

② -v 的作用:-v 表示“反向匹配”。如果没有 -v,则仅打印以 # 开头的行

的作用(竖线):将上一个命令的输出作为下一个命令的输入传递(wc -l)

grep -v ^# gencode.v19.long_noncoding_RNAs.gtf:搜索除#开头的所有行

wc -l:输出上一个命令传递过来的行数

解释:输出除去#开头的行数

受保护_8

解释:输出包含独立单词“gene”的行数

② 如果没有 -w,grepgene 会搜索所有包含“gene”作为另一个单词的一部分的行:gene、Genetics、progene 等。

⑶ 对于 seq 1 3 中的 i;做 mkdir 样本${i};完成

① 反引号的作用 (`): Executes the command inside the backticks and returns the output

grep "^chr6" gencode.v19.long_noncoding_RNAs.gtf | perl -lane 'if ($F[3]>=28000000 & ($F[4]<=34000000)){print $_;}' | grep -w "gene"

① -l: Removes the newline character at the end of each line

② -a: Autosplit mode – splits each input line by whitespace and stores it in the @F array

○ The first, second, … elements can be accessed as $F[0], $F[1], …

○ If the file is tab-separated, the command can be written as perl -F'\t' -lane 'print $F[1]' input.csv

③ -n: Represents each line as $_

④ -e: Allows the perl program to be written on the command line

grep "^chr6" gencode.v19.long_noncoding_RNAs.gtf: Search for all lines starting with “chr6”

perl -lane 'if ($F[3]>=28000000 & ($F[4]<=34000000)){print $_;}': Implement Perl programming on the command line and search for lines where the third value (e.g., start) is ≥ 28M and the fourth value (e.g., end) is ≤ 34M

grep -w "gene": Search for “gene” in the lines passed from the previous command

Interpretation: Search for genes between 28M and 34M on chr6

grep -w "gene" gencode.v19.long_noncoding_RNAs.gtf | perl -lane 'm/gene_name \"(.*?)\"/; print $1' | less -S

m/ gene_name "(.*)"/: Pattern matching

② m/gene_name *(.*)* /: Handles quotation marks

③ m/gene_name "(.* ) "/: Captured group can be referenced as $1

○ .: Matches any character except a newline

○ *: Matches the preceding character or pattern zero or more times

○ a*: Matches empty string (“”), a, aa, aaa, etc.

○ .*: Matches any string (zero or more characters)

○ “(.*?)”: Non-greedy match – captures the shortest possible match inside the quotes; without ?, it would match until the end of the line

j='echo $i | sed 's/.bam//g"

① sed: Represents regular expressions within ‘ ‘

s/.bam//: Substitution – s/pattern/replacement/

③ s/.bam //: Since the replacement string is empty, .bam is deleted

④ g: If .bam appears multiple times in a line, g removes all occurrences. Without g, only the first occurrence is removed

samtools view -h $i | perl -lane 'if (($F[0]=~/^@/ | ($F[6] eq"=")) {print $_;}' | samtools view -S -b - > $(j}.new.bam

① samtools

○ -h: Include SAM header

○ -S: Indicates that input is in SAM format (not required in newer samtools versions)

○ -b: Output in BAM format

② $F[0]=~ /^@/: If $F[0] starts with @ (header line)

③ $F[6] is “=”: If aligned to itself

Interpretation: Remove reads mapped to different chromosomes in paired mapping

plink --bfile input_prefix --geno X --mind Y --hwe p --make-bed --out output_prefix

① bfile: Use the prefix of a binary PLINK format file as input

② geno: Filter markers with a missing rate exceeding X%

③ mind: Filter individuals with a missing rate exceeding Y%

④ hwe: Filter markers that fail the Hardy-Weinberg equilibrium

⑤ make-bed: Output in binary PLINK format

plink --bfile input_prefix --record vcf-iid bgz --threads 3 --out output_prefix

① Convert to VCF format: Combine the .bed file and .bim file.

② Existing .bim files follow the order of minor/major alleles, but VCF files follow the order of major/minor alleles.

plink2 --vcf input.vcf.gz 剂量=DS --maf 0.05 --extract-if-info "R2>=0.70" --export vcf-4.2 vcf-剂量=DS-force --out 输出.r7.剂量

① 用于提取正确估算标记的 PLINK 命令

② 用量:默认不导入用量信息。我们可以指定要导入的字段作为剂量。

③ vcf-dosage:导出输出文件中的剂量。

④ extract-if-info:删除 INFO 键上未通过此比较的所有标记。


输入:2024.01.16 00:28

results matching ""

    No results matching ""