
Chapter 11. Bioinformatics

Recommended Article: 【Biology】 Biology Table of Contents


1. Overview

2. Comparative Genomics

3. Functional Genomics

4. Epigenetics

5. Metagenomics

6. Transcriptomics

7. Proteomics

8. Metabolomics

9. Pharmacogenomics

10. Phenomics

11. Radiomics


a. Bioinformatics Analysis Table of Contents

b. Transcriptome Analysis Pipeline

c. Cell Type Marker Genes

d. Determining Cell Types with Seurat



1. Overview

⑴ Cancer types > 10²

⑵ Cancer patients per year ~ 2 × 10⁶

⑶ Transcription factors ~ 1,600

⑷ Driver mutations ~ 10⁵

⑸ Variant combinations ~ ₁₀₀₀₀₀C₆

⑹ Cell types and states ~ 10⁴

⑺ Gene combinations ~ 10¹³

⑻ Antibody sequences ~ 20³²

⑼ Small molecules ~ 10⁶⁰
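The orders of magnitude in this list can be checked with a few lines of arithmetic. The sketch below assumes that the variant-combination entry denotes the binomial coefficient C(100000, 6) and that the antibody entry counts 32 variable positions with 20 possible amino acids each; both readings are assumptions about the notation, not statements from the text.

```python
import math

# Assumed reading: choose 6 variants out of 100,000 candidates.
variant_combinations = math.comb(100_000, 6)

# Assumed reading: 20 amino acids at each of 32 positions.
antibody_sequences = 20 ** 32

def order_of_magnitude(n: int) -> int:
    """Return floor(log10(n)) for a positive integer."""
    return len(str(n)) - 1

print(order_of_magnitude(variant_combinations))  # exponent of the variant space
print(order_of_magnitude(antibody_sequences))    # exponent of the antibody space
```

Exact integer arithmetic is used so that no floating-point rounding affects the exponents.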



2. Comparative Genomics

⑴ Features of the Human Genome

① The human genome is composed of 3.1 billion base pairs.

② Less than 1/3 is transcribed into RNA, and only about 5% encode proteins.

③ Genes encoding proteins amount to around 20,000 to 25,000: Similar to other mammals.

④ Genes are on average about 3,000 bases long.

⑤ All humans are at least 99.9% identical.

⑥ The human genome contains a significant amount of repetitive sequences.

⑦ Less than 7% of protein-coding genes are specific to vertebrates.

⑵ Prokaryotic Genes vs Eukaryotic Genes

① Polycistronic mRNA vs Monocistronic mRNA (number of proteins encoded by one mRNA)

② Intron (×) vs Intron (O)

③ Simultaneity of transcription and translation (O) vs Simultaneity of transcription and translation (×)

④ mRNA processing (×) vs mRNA processing (O)

⑶ Comparison of Genome Size and Gene Number in Various Organisms


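| Type of Organism | Subcategory | Species | Genome Size (Mb; 1 Mb = 10⁶ bp) | Number of Protein-Coding Genes | Protein-Coding Sequence (%) |
|---|---|---|---|---|---|
| Prokaryotes | | Mycoplasma | 0.58 | 470 | 88 |
| | | E. coli | 4.64 | 4,300 | 88 |
| | | Bacillus subtilis | 4.20 | | |
| Eukaryotes | Fungi | Yeast | 12.6 | 6,200 | 70 |
| | | Aspergillus | 25.4 | | |
| | Protozoa | Tetrahymena | 190 | | |
| | Invertebrates | C. elegans | 100 | 21,000 | 25 |
| | | Drosophila | 180 | 15,000 | 13 |
| | | Silkworm | 490 | | |
| | | Sea Urchin | 845 | | |
| | Vertebrates | Pufferfish | 400 | | |
| | | Human | 3,000 | ~23,500 | 1.5 |
| | | Mouse | 3,300 | | |
| | Plants | Arabidopsis | 125 | 26,000 | 25 |
| | | Rice | 440 | 35,000 ~ 50,000 | 10 |
| | | Pea | 4,800 | | |
| | | Corn | 5,000 | | |
| | | Wheat | 17,000 | | |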

Table 1. Genome Sizes of Various Organisms


① Minimum gene count for life maintenance: Among 470 genes in M. genitalium, 337 are essential.

② Weak correlation between genome size and organism complexity.

③ Plant genomes are large due to frequent polyploidization.
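The weak relationship between genome size and gene number can be made concrete by computing gene density from Table 1. The following is a minimal sketch that uses only the rows with a gene count; the rice entry takes the lower bound of the stated range, and the helper name `genes_per_mb` is introduced here for illustration.

```python
# (species, genome size in Mb, protein-coding genes) copied from Table 1.
# Rows without a gene count are omitted; rice uses the lower bound (35,000).
TABLE_1 = [
    ("Mycoplasma", 0.58, 470),
    ("E. coli", 4.64, 4_300),
    ("Yeast", 12.6, 6_200),
    ("C. elegans", 100, 21_000),
    ("Drosophila", 180, 15_000),
    ("Arabidopsis", 125, 26_000),
    ("Rice", 440, 35_000),
    ("Human", 3_000, 23_500),
]

def genes_per_mb(genome_mb: float, genes: int) -> float:
    """Gene density: protein-coding genes per megabase of genome."""
    return genes / genome_mb

density = {name: genes_per_mb(mb, n) for name, mb, n in TABLE_1}

# Print species from densest to sparsest genome.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {value:8.1f} genes/Mb")
```

With these figures the human genome is several thousand times larger than that of Mycoplasma but carries only about fifty times as many protein-coding genes, so gene density falls sharply as genome size grows.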

⑷ Comparison of Single-Celled Prokaryotic and Eukaryotic Genomes


| | E. coli | Yeast |
|---|---|---|
| Genome Size (Base Pairs) | 4,640,000 | 12,068,000 |
| Number of Protein-Coding Genes | 4,300 | 6,200 |
| Metabolism | 650 | 650 |
| Energy Production and Storage | 240 | 175 |
| Membrane Transporters | 280 | 250 |
| DNA Replication, Repair, and Recombination | 120 | 175 |
| Transcription | 230 | 400 |
| Translation | 180 | 350 |
| Protein Delivery and Secretion | 35 | 430 |
| Cell Structure | 180 | 250 |

Table 2. Comparison between E. coli and Yeast


⑸ Essential genes required for multicellular organism characteristics (e.g., Caenorhabditis elegans).


| Function | Protein Domain | Number of Genes |
|---|---|---|
| Transcription Regulation | Zinc Finger; Homeobox | 540 |
| RNA Processing | RNA Binding Domain | 100 |
| Action Potential Transmission | Gated Ion Channel | 80 |
| Tissue Formation | Collagen | 170 |
| Cell Interaction | Extracellular Domain; Glycosyltransferase | 330 |
| Cell-Cell Signaling | G Protein-Coupled Receptor; Protein Kinase; Protein Phosphatase | 1,290 |

Table 3. Caenorhabditis elegans (C. elegans)


⑹ Comparison of human and mouse genomes

① Humans and mice have approximately a 50% difference in their nucleotide sequences and diverged around 75 million years ago.

② There is no significant difference in the genome size or the number of genes they possess; only the distribution of transposons, a type of repetitive sequence element, differs.

③ Genome composition: Approximately 180 fragmentation and recombination events have occurred, with over 90% of the genome moving as blocks (conserved synteny).

⑺ Comparison between human and chimpanzee

① The nucleotide sequences of humans and chimpanzees differ by only about 1.23% (single-nucleotide substitutions).

⑻ Comparison of mitochondria and chloroplasts

① Mitochondrial genomics: 16,569 bp. 37 genes.

○ Many mitochondrial proteins are derived from the nucleus.

○ Example: β-oxidation and TCA cycle enzymes are transported from the cytoplasm.

○ Some proteins are transcribed and translated from mitochondrial DNA.

○ Example: Electron transport chain proteins and ATP synthase are synthesized independently.

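○ Termination codons: In addition to UAA and UAG, AGA and AGG are read as stop codons in human mitochondria, whereas UGA encodes tryptophan.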

② Chloroplast genomics

○ Enzymes for Calvin cycle are synthesized independently.

○ The large subunit of Rubisco is produced in the chloroplast, while the small subunit is produced in the cytoplasm.

○ Not only β-oxidation and TCA cycle enzymes but also electron transport chain proteins and ATP synthase are transported from the cytoplasm.

③ Chloroplast genome is much larger than mitochondrial genome.

○ Mitochondria: Repetitive sequences, no introns.

○ Chloroplast: Repetitive sequences, many introns.

○ Most mitochondrial genes have moved to the nucleus.
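Because some proteins are translated inside the mitochondrion itself, the same DNA can give different products under the nuclear and mitochondrial genetic codes. The sketch below assumes the vertebrate mitochondrial code as listed in NCBI translation table 2 (AGA and AGG as stops, ATA as Met, TGA as Trp); the reading frame `toy` is invented for illustration and is not a real gene.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Assumed differences for the vertebrate mitochondrial code (NCBI table 2).
VERTEBRATE_MITO = dict(STANDARD, AGA="*", AGG="*", ATA="M", TGA="W")

def translate(dna, table):
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

toy = "ATGATATGAAGATTT"  # invented reading frame: ATG ATA TGA AGA TTT
print(translate(toy, STANDARD))         # nuclear reading
print(translate(toy, VERTEBRATE_MITO))  # mitochondrial reading
```

Under the standard code the frame terminates at TGA, while under the mitochondrial code TGA is read through as tryptophan and translation stops at AGA instead.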



3. Functional Genomics

⑴ Overview

① Definition: The study of all functions of DNA, including introns and regulatory elements.

② Utilizes technologies such as WGS, WES, GWAS, and ChIP-seq.

⑵ Movement of Genetic Material

① Virus

② Bacterial recombination

③ Mobile DNA: Transposons, Retrotransposons, LINE, SINE

⑶ Intermediate-frequency Repeat Sequences

① VNTR (Variable Number Tandem Repeats, relatively long), STR (Short Tandem Repeats, relatively short), Telomeres.

② Genetic anticipation: Repeat sequences expand with each successive generation, so the disease becomes more likely to occur and tends to appear earlier (e.g., Huntington’s disease).
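Tandem repeat length can be measured directly from sequence. The sketch below counts the longest uninterrupted run of a repeat unit, using CAG (the unit expanded in Huntington’s disease) as the example; the function name `longest_tandem_run` and the toy sequence are assumptions made for illustration, not a real allele or a clinical method.

```python
import re

def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [len(m.group(0)) // len(unit)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)

# Toy sequence invented for illustration: one run of 5 and one run of 2.
toy = "ATGGCG" + "CAG" * 5 + "CCGCCA" + "CAG" * 2 + "TAA"
print(longest_tandem_run(toy, "CAG"))
```

Comparing this count between a parent and a child is one simple way to picture the expansion that underlies anticipation.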

⑷ High-frequency Repeat Sequences

① Highly condensed.

② Centromere, satellite.

⑸ Satellite DNA

① A-T rich repetitive DNA.

② Low buoyant density.
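The link between A-T richness and low buoyant density can be illustrated numerically. The sketch below assumes the commonly cited empirical approximation for DNA in a CsCl gradient, ρ ≈ 1.660 + 0.098 × (GC fraction) g/cm³; that formula and both example sequences are assumptions brought in for illustration and do not come from the text.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def cscl_buoyant_density(gc: float) -> float:
    """Assumed empirical approximation: rho = 1.660 + 0.098 * GC, in g/cm^3."""
    return 1.660 + 0.098 * gc

at_rich = "AATATAATTA" * 3   # invented A-T rich repeat (0% GC)
balanced = "ATGCGCATGC" * 3  # invented comparison sequence (60% GC)

for name, seq in (("A-T rich", at_rich), ("comparison", balanced)):
    gc = gc_fraction(seq)
    print(f"{name:10s} GC={gc:.2f} density={cscl_buoyant_density(gc):.4f}")
```

Under this approximation the A-T rich repeat has the lower density, which is why such DNA separates from the bulk genome as a distinct "satellite" band.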

⑹ Multigene Families

① Homologous gene families (e.g., rRNA)

② Paralogous gene families (e.g., hemoglobin)

⑺ Single Nucleotide Polymorphism (SNP)

⑻ Copy Number Variation (CNV)

⑼ Loss of Heterozygosity (LOH)

⑽ Genomic Rearrangement

⑾ Rare Variant



4. Epigenetics

⑴ Overview

① Loop formation: Can occur when inverted repeat sequences are present on coding DNA.

② Intrinsic transcription terminators, tRNA, telomeric G-quadruplexes (G-tetrads), etc. contribute to loop formation.
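An inverted repeat can fold into a stem-loop because the downstream copy is the reverse complement of the upstream copy. The following is a minimal sketch that scans a sequence for such pairs; the function name, the stem length, and the loop-size limits are arbitrary choices made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, stem=4, min_loop=3, max_loop=10):
    """Return (left_start, right_start, left_stem) for candidate hairpin stems."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the downstream arm
            if seq[j:j + stem] == target:
                hits.append((i, j, left))
    return hits

# Invented example: GCCG ... CGGC separated by a 4-base loop.
toy = "ATGCCGTTTTCGGCAT"
for hit in find_inverted_repeats(toy):
    print(hit)
```

Overlapping hits are reported when the complementary region is longer than `stem`, so adjacent hits can be merged to estimate the full stem length.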

⑵ Subfields

① BS-seq (bisulfite sequencing)

② ChIP-seq (chromatin immunoprecipitation sequencing)

③ Hi-C sequencing (high throughput chromatin conformation capture sequencing)

④ ATAC-seq (bulk & single cell)

⑤ NOMe-seq
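The principle behind bisulfite sequencing can be sketched in code: bisulfite treatment converts unmethylated cytosine to uracil, which is read as T after amplification, while methylated cytosine is protected and still reads as C. The example below is a simplified single-strand simulation; the function names and the toy sequence are assumptions made for illustration.

```python
from typing import Iterable, List

def bisulfite_convert(seq: str, methylated: Iterable[int]) -> str:
    """Simulate the read after bisulfite treatment and amplification.

    `methylated` holds 0-based positions of methylated cytosines.
    Unmethylated C is read as T; methylated C stays C.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )

def call_methylation(reference: str, converted: str) -> List[int]:
    """Positions where a reference C is still read as C (inferred methylated)."""
    return [i for i, (r, c) in enumerate(zip(reference.upper(), converted))
            if r == "C" and c == "C"]

reference = "ACGTCCGA"  # toy sequence, not real data
read = bisulfite_convert(reference, methylated={1})
print(read)
print(call_methylation(reference, read))
```

Comparing the converted read with the reference recovers the methylated positions, which is the core idea that alignment-based methylation callers build on.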



5. Metagenomics

⑴ Definition: Collection of all microbial genomes present in a given environment.

⑵ Also referred to as metagenome, community genomics, and pangenomics.



6. Transcriptomics

⑴ Definition

① Study of the functions of transcribed RNA.

② RNA-based measurements are considerably more sensitive than protein-based measurements.

⑵ Subfields

① Bulk transcriptomics (bulk RNA-seq)

② Single-cell transcriptomics (single cell RNA-seq): Method of the year in 2013.

③ Spatial transcriptomics (spatial RNA-seq): Method of the year in 2020.

④ Structural transcriptomics: Related to epigenetics.

⑤ Alternative splicing and isoform analysis: Supported by long-read sequencing, the Method of the Year in 2022.

⑥ RNA interference: miRNA, siRNA, etc.

⑦ Long non-coding RNA

⑧ Small RNA

⑨ Pseudogene: Transcribed but untranslated gene.

○ Type 1: Cases where the gene was copied through retrotransposition, but introns and promoters were lost.

○ Type 2: Cases where genes were disabled due to accumulated mutations.



7. Proteomics

⑴ Overview

① Definition: The study of the expression patterns of translated proteins.

② Targets over a million proteins.

③ Transcriptomics explains only about 40% of actual proteomics.



Figure 1. mRNA abundance vs. protein abundance in NIH3T3 cells


④ Advantages: Detects biomarkers closely related to physiological phenomena.

⑤ Disadvantages: Lower sensitivity than DNA- and RNA-based methods.

⑵ Subfields

① Protein expression: Cytokine array, etc.

② PTM (post-translational modification)

③ Structural proteomics

○ Protein’s quaternary structure (i.e., multiple polypeptides composing a protein).

○ Amino acids that are far apart in primary structure may be close in reality.

○ Example: In trypsinogen, His, Asp, and Ser, which form the catalytic triad, are distant in the primary structure but come together to form a single active site.

○ Generally, to analyze protein sequences, peptidases (proteases) are used to break them into fragments of a certain length or shorter.

④ Phospho-proteomics

⑤ Glycomics
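The protease digestion step mentioned under structural proteomics can be sketched as an in silico digest. The example below assumes trypsin as the protease and uses the simplified textbook rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); the choice of trypsin, the function name, and the toy peptide are assumptions made for illustration.

```python
import re

def trypsin_digest(protein: str) -> list:
    """Cleave C-terminal to K or R, except when the next residue is P."""
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if p]

toy = "MKWVTFRPLLKAAR"  # invented sequence, not a real protein
for peptide in trypsin_digest(toy):
    print(peptide, len(peptide))
```

The R at position 7 is not cleaved because it is followed by P, so the middle fragment stays intact. Real workflows also allow for missed cleavages, which this sketch ignores.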



8. Metabolomics

⑴ Metabolite profiling: Carried out in serum, plasma, urine, CSF, etc.

⑵ Tandem mass spectrometry (MS/MS)



9. Pharmacogenomics

⑴ Overview: Utilizes high-throughput screening technology.

⑵ Affymetrix GeneChip: HG-U133 Plus 2.0 Array, etc.

⑶ Luminex bead arrays (L1000)

⑷ Illumina Human HT-12 v4 Expression BeadChip Array

⑸ mRNA-seq (Illumina HiSeq)

⑹ GCP: Histone profiling

⑺ P100: Phosphoproteomics

⑻ KINOMEscan

⑼ KiNativ

⑽ MEMA

⑾ ELISA

⑿ RPPA

⒀ ATAC-seq

⒁ Cellarium

⒂ SWATH-MS



10. Phenomics

⑴ Cancer

⑵ Metabolic syndrome

⑶ Psychiatric disease



11. Radiomics

⑴ Definition: Fusion of nuclear medicine imaging and genomic information.



Input: 2021.06.12 13:56

Modified: 2022.03.17 13:44
