Chapter 11. Bioinformatics

Recommended Article: 【Biology】 Biology Table of Contents


1. Overview

2. Comparative Genomics

3. Functional Genomics

4. Epigenetics

5. Metagenomics

6. Transcriptomics

7. Proteomics

8. Metabolomics

9. Pharmacogenomics

10. Phenomics

11. Radiomics


a. Bioinformatics Analysis Table of Contents

b. Transcriptome Analysis Pipeline

c. Cell Type Marker Genes

d. Determining Cell Types with Seurat



1. Overview

⑴ cancer types > 10^2

⑵ cancer patients / year ~ 2 × 10^6

⑶ driver mutations ~ 10^5

⑷ variant combinations ~ C(100000, 6)

⑸ cell types and states ~ 10^4

⑹ gene combinations ~ 10^13

⑺ antibody sequences ~ 20^32

⑻ small molecules ~ 10^60
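
To get a feel for these magnitudes, the two combinatorial entries can be evaluated directly. The sketch below assumes that the variant-combination entry means choosing 6 variants out of 10^5 candidates, and that the antibody entry counts sequences of 32 positions with 20 possible amino acids at each position. Both readings are interpretations of the list, not something the notes state.

```python
import math

# Assumption: the variant-combination entry means "choose 6 out of 10^5".
variant_combinations = math.comb(100_000, 6)

# Assumption: the antibody entry means 20 amino acids at each of 32 positions.
antibody_sequences = 20 ** 32


def order_of_magnitude(n: int) -> int:
    """Return the exponent k such that 10^k <= n < 10^(k+1) for a positive integer."""
    return len(str(n)) - 1


print("variant combinations ~ 10^%d" % order_of_magnitude(variant_combinations))
print("antibody sequences   ~ 10^%d" % order_of_magnitude(antibody_sequences))
```

Python integers have arbitrary precision, so both values are computed exactly and no floating-point approximation is involved.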



2. Comparative Genomics

⑴ Features of the Human Genome

① The human genome is composed of 3.1 billion base pairs.

② Less than 1/3 is transcribed into RNA, and only about 5% encode proteins.

③ Genes encoding proteins amount to around 20,000 to 25,000: similar to other mammals.

④ Genes are on average about 3,000 bases long.

⑤ All humans are at least 99.9% identical.

⑥ The human genome contains a significant amount of repetitive sequences.

⑦ Less than 7% of protein-coding genes are specific to vertebrates.

⑵ Prokaryotic vs Eukaryotic Genes

Polycistronic mRNA vs Monocistronic mRNA (one mRNA encodes several proteins vs a single protein)

Eukaryotic ( × ) vs Eukaryotic ( O )

Simultaneity of transcription and translation ( O ) vs Simultaneity of transcription and translation ( × )

mRNA processing ( × ) vs mRNA processing ( O )

⑶ Comparison of Genome Size and Gene Number in Various Organisms


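| Type of Organism | Subcategory | Species | Genome Size (Mb; 1 Mb = 10^6 bp) | Number of Protein-Coding Genes | Protein-Coding Sequence (%) |
| --- | --- | --- | --- | --- | --- |
| Prokaryotes | | Mycoplasma | 0.58 | 470 | 88 |
| | | E. coli | 4.64 | 4,300 | 88 |
| | | Archaea | 4.20 | | |
| Eukaryotes | Fungi | Yeast | 12.6 | 6,200 | 70 |
| | | Aspergillus | 25.4 | | |
| | Protozoa | Tetrahymena | 190 | | |
| | Invertebrates | C. elegans | 100 | 21,000 | 25 |
| | | Drosophila | 180 | 15,000 | 13 |
| | | Silkworm | 490 | | |
| | | Sea Urchin | 845 | | |
| | Vertebrates | Pufferfish | 400 | | |
| | | Human | 3,000 | ~23,500 | 1.5 |
| | | Mouse | 3,300 | | |
| | Plants | Arabidopsis | 125 | 26,000 | 25 |
| | | Rice | 440 | 35,000 ~ 50,000 | 10 |
| | | Pea | 4,800 | | |
| | | Corn | 5,000 | | |
| | | Wheat | 17,000 | | |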

Table 1. Genome Sizes of Various Organisms


① Minimum gene count for life maintenance: Among 470 genes in M. genitalium, 337 are essential.

② Weak correlation between genome size and organism complexity.

③ Plant genomes are large due to frequent polyploidization.
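
The weak link between genome size and gene number is easiest to see as gene density. The sketch below uses only the rows of Table 1 that list both a genome size and a single gene count. The dictionary layout and the helper name `genes_per_mb` are illustrative choices, not part of the original notes.

```python
# Rows from Table 1 that give both genome size (Mb) and a single gene count.
# Rice is left out because the table gives a range rather than one value.
genomes = {
    "Mycoplasma": (0.58, 470),
    "E. coli": (4.64, 4_300),
    "Yeast": (12.6, 6_200),
    "C. elegans": (100, 21_000),
    "Drosophila": (180, 15_000),
    "Human": (3_000, 23_500),
    "Arabidopsis": (125, 26_000),
}


def genes_per_mb(size_mb: float, gene_count: int) -> float:
    """Gene density: protein-coding genes per megabase of genome."""
    return gene_count / size_mb


density = {name: genes_per_mb(*row) for name, row in genomes.items()}

# Print from the most gene-dense genome to the least gene-dense one.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {value:8.1f} genes/Mb")
```

With these figures the human genome is roughly 5,000 times larger than that of Mycoplasma but carries only about 50 times as many protein-coding genes.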

⑷ Comparison of Genomes in Single-Celled Eukaryotes and Prokaryotes


| | E. coli | Yeast |
| --- | --- | --- |
| Genome Size (Base Pairs) | 4,640,000 | 12,068,000 |
| Number of Protein-Coding Genes | 4,300 | 6,200 |
| Metabolism | 650 | 650 |
| Energy Production and Storage | 240 | 175 |
| Membrane Transporters | 280 | 250 |
| DNA Replication, Repair, and Recombination | 120 | 175 |
| Transcription | 230 | 400 |
| Translation | 180 | 350 |
| Protein Delivery and Secretion | 35 | 430 |
| Cell Structure | 180 | 250 |

Table 2. Comparison between E. coli and Yeast
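
Because the two organisms have different total gene counts, the categories in Table 2 are easier to compare as a share of each genome. A minimal sketch, using the numbers from the table as given (the names `CATEGORIES` and `share` are illustrative):

```python
TOTAL_GENES = {"E. coli": 4_300, "Yeast": 6_200}

# (E. coli, yeast) gene counts per functional category, copied from Table 2.
CATEGORIES = {
    "Metabolism": (650, 650),
    "Energy Production and Storage": (240, 175),
    "Membrane Transporters": (280, 250),
    "DNA Replication, Repair, and Recombination": (120, 175),
    "Transcription": (230, 400),
    "Translation": (180, 350),
    "Protein Delivery and Secretion": (35, 430),
    "Cell Structure": (180, 250),
}


def share(count: int, total: int) -> float:
    """Percentage of all protein-coding genes that fall in one category."""
    return 100 * count / total


shares = {
    name: (share(ecoli, TOTAL_GENES["E. coli"]), share(yeast, TOTAL_GENES["Yeast"]))
    for name, (ecoli, yeast) in CATEGORIES.items()
}

for name, (ecoli_pct, yeast_pct) in shares.items():
    print(f"{name:45s} E. coli {ecoli_pct:5.1f}%   Yeast {yeast_pct:5.1f}%")
```

The largest relative gap is in protein delivery and secretion, which fits the presence of an endomembrane system in the eukaryotic cell.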


⑸ Essential Genes for Multicellular Organism Traits (e.g., C. elegans)


| Function | Protein Domain | Genes |
| --- | --- | --- |
| Transcription Regulation | Zinc Finger; Homeobox | 540 |
| RNA Processing | RNA Binding Domain | 100 |
| Action Potential Transmission | Gated Ion Channel | 80 |
| Tissue Formation | Collagen | 170 |
| Cell Interaction | Extracellular Domain; Glycosyltransferase | 330 |
| Cell-Cell Signaling | G Protein-Coupled Receptor; Protein Kinase; Protein Phosphatase | 1,290 |

Table 3. C. elegans (Caenorhabditis elegans)


⑹ Comparison of Human (A) and Mouse (B) Genomes

① Human and mouse genomes differ by around 50% in DNA sequence and diverged about 75 million years ago.

② They have similar genome sizes and gene counts, with differences primarily in the distribution of transposons.

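③ Genome composition: Around 180 segmental duplication events; over 90% of the genome has moved as blocks in which gene order is preserved (conserved synteny).

④ Conserved synteny: Blocks of genes that remain in the same order on the chromosomes of both species.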

⑺ Comparison between Human and Chimpanzee

① The single-nucleotide sequence difference between the human and chimpanzee genomes is only about 1.23%.
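
A figure like the 1.23% above is, in essence, the number of mismatched positions divided by the number of aligned positions. The sketch below shows that arithmetic on two already-aligned sequences. The function name and the toy fragments are made up for illustration and are not real genome data.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ.

    Positions where either sequence has a gap ('-') are skipped, so only
    single-nucleotide substitutions are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100 * mismatches / compared


# Toy aligned fragments (hypothetical): one substitution in ten positions.
print(percent_difference("ACGTACGTAC", "ACGTACGTAT"))
```

Insertions and deletions are ignored here on purpose. Counting them would give a larger difference than the substitution-only figure.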

⑻ Mitochondrial and Chloroplast Comparison

① Mitochondrial Genomics: 16,569 bp. 37 genes.

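○ Many mitochondrial proteins are encoded by nuclear genes.

○ E.g., β-oxidation and TCA cycle enzymes are imported from the cytoplasm.

○ Some proteins are translated from mitochondrial DNA.

○ E.g., some electron transport chain proteins and ATP synthase subunits are self-encoded.

○ Termination codons: AGA and AGG act as stop codons in the vertebrate mitochondrial code (in addition to UAA and UAG), while UGA encodes tryptophan.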

② Chloroplast Genomics

○ Some Calvin cycle enzymes are self-encoded: the Rubisco large subunit is encoded in the chloroplast genome, while the small subunit is encoded in the nucleus and made in the cytoplasm.

○ Cytoplasmic β-oxidation, TCA enzymes, electron transport chain proteins, ATP synthesis enzymes move from cytoplasm.

③ Chloroplast genome is much larger than mitochondrial genome.

○ Mitochondria: Repetitive sequences, no introns.

○ Chloroplast: Repetitive sequences, many introns.

○ Most mitochondrial genes have moved to the nucleus.
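
Mitochondrial genes cannot be translated with the standard codon table. The sketch below builds the standard code and the vertebrate mitochondrial code (assumed here to be NCBI translation table 2) from their compact 64-letter forms and lists the codons that differ. The variable and function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Codons in the conventional TCAG order: TTT, TTC, TTA, TTG, TCT, ...
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Amino acids in the same codon order; '*' marks a stop codon.
STANDARD = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Assumption: vertebrate mitochondrial code (NCBI translation table 2).
VERT_MITO = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)

STANDARD_TABLE = dict(zip(CODONS, STANDARD))
MITO_TABLE = dict(zip(CODONS, VERT_MITO))


def translate(dna: str, table: dict) -> str:
    """Translate a DNA coding sequence codon by codon, keeping '*' for stops."""
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Codons whose meaning differs: codon -> (standard, mitochondrial).
differences = {
    codon: (STANDARD_TABLE[codon], MITO_TABLE[codon])
    for codon in CODONS
    if STANDARD_TABLE[codon] != MITO_TABLE[codon]
}
print(differences)
print(translate("ATGTGAAGA", STANDARD_TABLE), translate("ATGTGAAGA", MITO_TABLE))
```

Only four codons change meaning under this table: TGA (stop to Trp), ATA (Ile to Met), and AGA and AGG (Arg to stop).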



3. Functional Genomics

⑴ Overview

① Definition: The study of all functions of DNA, including introns and regulatory elements.

② Utilizes approaches such as WGS, WES, GWAS, and ChIP-seq.

⑵ Movement of Genetic Material

① Virus

② Bacterial Recombination

③ Mobile DNA: Transposons, Retrotransposons, LINE, SINE

⑶ Intermediate Frequency Repeat Sequences

① VNTR (Variable Number Tandem Repeats, relatively long), STR (Short Tandem Repeats, relatively short), telomeres.

② Genetic anticipation: Generation ↑ → Repeat number ↑ → Earlier onset and greater severity (e.g., Huntington’s disease)
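
Repeat expansion can be tracked by counting the back-to-back copies of a repeat unit. A minimal sketch, assuming a trinucleotide unit such as CAG. The function name and both sequences are hypothetical and built only for illustration.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical alleles: the repeat tract is longer in the later generation.
parent_allele = "ATG" + "CAG" * 20 + "TAA"
child_allele = "ATG" + "CAG" * 45 + "TAA"

print(longest_repeat_run(parent_allele, "CAG"))
print(longest_repeat_run(child_allele, "CAG"))
```

A regular expression is enough for exact repeats. Real repeat tracts often contain interruptions, which this sketch does not handle.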

⑷ High-Frequency Repeat Sequences

① Highly condensed, heterochromatin, centromeres

⑸ Satellite DNA

① A-T rich repetitive DNA

② Low buoyant density
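
The link between A-T richness and low buoyant density can be made concrete with GC content. The sketch below uses the commonly quoted empirical relation for double-stranded DNA in a CsCl gradient, ρ ≈ 1.660 + 0.098 × (GC fraction) g/cm^3. That relation and its coefficients are brought in from outside these notes and should be treated as approximate.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def buoyant_density(gc: float) -> float:
    """Approximate buoyant density in CsCl (g/cm^3).

    Assumption: empirical linear relation rho = 1.660 + 0.098 * GC fraction.
    """
    return 1.660 + 0.098 * gc


# Toy sequences (hypothetical): an A-T rich repeat versus a balanced sequence.
at_rich = "AATAT" * 10
balanced = "ACGT" * 10

print(round(buoyant_density(gc_fraction(at_rich)), 3))
print(round(buoyant_density(gc_fraction(balanced)), 3))
```

Under this relation an A-T rich repeat bands at a lower density than bulk DNA, which is why it separates as a distinct satellite band.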

⑹ Multigene Families

① Identical gene families (e.g., rRNA genes)

② Non-identical (paralogous) gene families (e.g., the globin genes of hemoglobin)

⑺ Single Nucleotide Polymorphism (SNP)

⑻ Copy Number Variation (CNV)

⑼ Loss of Heterozygosity (LOH)

⑽ Genomic Rearrangement

⑾ Rare Variant



4. Epigenetics

⑴ Overview

① Loop formation: Can occur when inverted repeat sequences exist in the coding DNA.

② Intrinsic transcription terminators, tRNA, telomeric G-quadruplexes (G-tetrads), etc. contribute to loop formation.
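
An inverted repeat is a stretch followed, after a short spacer, by its own reverse complement, so one strand can fold back on itself into a stem-loop. The sketch below is a brute-force scan for such candidates. The function name, the fixed stem length, and the loop-size limits are illustrative assumptions, and no thermodynamic stability is evaluated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (left_start, right_start) pairs of arms that could pair into a stem.

    The right arm must equal the reverse complement of the left arm and be
    separated from it by a loop of min_loop to max_loop bases.
    """
    seq = seq.upper()
    hits = []
    for left in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[left:left + stem])
        for loop in range(min_loop, max_loop + 1):
            right = left + stem + loop
            if right + stem > len(seq):
                break
            if seq[right:right + stem] == target:
                hits.append((left, right))
    return hits


# Toy sequence (hypothetical): arm GACGTA, loop TTTT, arm TACGTC.
example = "AAGACGTATTTTTACGTCAA"
print(find_inverted_repeats(example))
```

The scan takes time proportional to sequence length times the loop range, which is fine for short regions such as a terminator but not for whole genomes.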

⑵ Subfields

① BS-seq (bisulfite sequencing)

② ChIP-seq (chromatin immunoprecipitation sequencing)

③ Hi-C sequencing (high throughput chromatin conformation capture sequencing)

④ ATAC-seq (bulk & single cell)

⑤ NOMe-seq
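
The principle behind BS-seq is that bisulfite treatment converts unmethylated cytosine to uracil, which is read as T after amplification, while methylated cytosine stays C. The sketch below simulates this on a single strand and then recovers the methylated positions by comparison with the reference. The function names and the toy sequence are illustrative.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Simulate bisulfite treatment of one DNA strand.

    Unmethylated C is read as T after conversion and amplification;
    C at a position listed in `methylated` (0-based) stays C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, converted: str) -> set:
    """Positions where a reference C is still read as C after conversion."""
    return {
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), converted.upper()))
        if ref == "C" and obs == "C"
    }


reference = "ACGTCC"  # toy sequence with cytosines at positions 1, 4, and 5
converted = bisulfite_convert(reference, methylated={1})
print(converted)
print(call_methylation(reference, converted))
```

Real analyses also handle the opposite strand, incomplete conversion, and sequencing errors, none of which are modeled here.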



5. Metagenomics

⑴ Definition: Study of the metagenome, the collection of all microbial genomes present in a given environment.

⑵ Also known as community genomics or pan-genomics.



6. Transcriptomics

⑴ Definition

① Study of the functions of transcribed RNA.

② RNA measurements are significantly more sensitive than protein measurements.

⑵ Subfields

① Bulk transcriptomics (bulk RNA-seq)

② Single-cell transcriptomics (single cell RNA-seq): technology of the year in 2013

③ Spatial transcriptomics (spatial RNA-seq): technology of the year in 2020

④ Structural transcriptomics: Related to epigenetics

⑤ Alternative splicing and isoform analysis: technology of the year in 2022

⑥ RNA interference: miRNA, siRNA, etc.

⑦ Long non-coding RNA

⑧ Small RNA

⑨ Pseudogene: Transcribed but untranslated gene



7. Proteomics

⑴ Overview

① Definition: Study of protein expression patterns

② Targets over a million proteins

③ mRNA levels explain only about 40% of the variation in actual protein abundance


Figure 1. mRNA abundance vs. protein abundance in NIH3T3 cells


④ Advantages: Detects biomarkers closely related to physiological phenomena

⑤ Disadvantages: Lower sensitivity than DNA- and RNA-based assays

⑵ Subfields

① Protein expression: Cytokine array, etc.

② PTM (post-translational modification)

③ Structural proteomics

○ Protein’s quaternary structure (i.e., multiple polypeptides composing a protein)

○ Amino acids that are far apart in primary structure may be close in reality

○ Example: His and Ser in the catalytic triad of trypsin are distant in primary structure but form one active site

○ Peptidase (protease) breaks down protein sequences into fragments of a certain length for analysis

④ Phospho-proteomics

⑤ Glycomics
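
The protease digestion step mentioned under structural proteomics can be sketched in silico. The example assumes the commonly used simplified rule for trypsin: cleave on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). The function name and the peptide are hypothetical.

```python
def trypsin_digest(protein: str) -> list:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleavage after K or R unless the following residue is P.
    Missed cleavages and other exceptions are not modeled.
    """
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i + 1 == len(protein)
        if residue in "KR" and (is_last or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Toy sequence (hypothetical): the R at position 3 is followed by P, so it is not cut.
print(trypsin_digest("MKRPLLKAG"))
```

Joining the returned peptides in order always reproduces the input, which is a convenient sanity check when changing the cleavage rule.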



8. Metabolomics

⑴ Metabolite profiling: Carried out in serum, plasma, urine, CSF, etc.

⑵ Tandem mass spec



9. Pharmacogenomics

⑴ Overview: Utilizes high-throughput screening technology

⑵ Affymetrix GeneChip: HG-U133 Plus 2.0 Array, etc.

⑶ Luminex bead arrays (L1000)

⑷ Illumina Human HT-12 v4 Expression BeadChip Array

⑸ mRNA-seq (Illumina HiSeq)

⑹ GCP: Histone profiling

⑺ P100: Phosphoproteomics

⑻ KINOMEscan

⑼ KiNativ

⑽ MEMA

⑾ ELISA

⑿ RPPA

⒀ ATAC-seq

⒁ Cellarium

⒂ SWATH-MS



10. Phenomics

⑴ Cancer

⑵ Metabolic syndrome

⑶ Psychiatric disease



11. Radiomics

⑴ Definition: Fusion of nuclear medicine imaging and genomic information.



Input: 2021.06.12 13:56

Modified: 2022.03.17 13:44
