
Chapter 10. Genome Project and Sequencing Technology

Recommended Article: 【Biology】 Biology Table of Contents


1. Genome Project

2. Sequencing Technology


a. Pyrosequencing

b. Whole Genome Sequencing

c. Bioinformatics Analysis Table of Contents

d. Transcriptome Analysis Pipeline



1. Genome Project

⑴ Overview

① Started in 1990 under the direction of James Watson: Initiated as a 15-year project by a coalition of 6 countries

② Collaborative research involving over 350 research institutions

○ Draft with 84.5% completion announced on June 26, 2000

○ Final version with 99.99% accuracy released on April 14, 2003

○ Involvement of over 2,800 researchers over 13 years, costing 2.7 trillion won

③ Spin-off effects of human genome research

○ Birth of bioinformatics

○ Promotion of the development of the human protein production process

○ Insulin: The first protein with a determined sequence

○ Promotion of the development of automated DNA sequencing devices

○ Promotion of the genome analysis of other applicable organisms

⑵ Methodology 1: Stepwise Sequencing Method (Scientists’ Approach)

Step 1: Determining Restriction Enzyme Recognition Sites

○ Cutting DNA with restriction enzymes and electrophoresis reveals the sizes of each fragment

○ Digesting with two restriction enzymes, separately and together, reveals the relative distances between their recognition sites

Step 2: Constructing a Gene Map

○ Determining the relative distances of genes on the chromosome

○ Inferring the distance between genes through recombination rates
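The inference from recombination rates can be illustrated with a small calculation. This is a minimal sketch, assuming the textbook convention that 1% recombination corresponds to 1 map unit (centimorgan), which only holds reasonably for closely linked genes. The function name and the offspring counts are hypothetical.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Recombination frequency in percent, read as map units (centimorgans)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical cross: 120 recombinant offspring among 1,000 scored
distance = map_distance_cm(120, 1000)
print(f"{distance:.1f} cM")
```

For distant genes, double crossovers make the observed frequency underestimate the true distance, so this linear reading should be treated as an approximation.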

Step 3: Physical Map (DNA Map) Construction

○ Meaning of determining restriction enzyme recognition sites: Using the information of fragments with known sequences based on each restriction enzyme recognition site to cumulatively construct a physical map

○ Meaning of constructing a gene map: Once a physical map is constructed, it can be compared with a gene map. Noncoding (intergenic) DNA lies between genes, while introns lie within them

○ Approach using a single library

⑶ Methodology 2: Shotgun Sequencing (Celera, J. Craig Venter) (Entrepreneurial Approach)

① Cutting one DNA in multiple ways

② Determining the sequence of fragments cut by one method

○ The length of the analysis sample is limited, so the DNA sequence cannot be determined all at once

③ Aligning the overlapping fragment sequences from each method until a single consistent sequence is obtained

④ Method based on computer science
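The computational step of shotgun sequencing can be sketched as a greedy overlap assembly. This is a toy illustration under strong assumptions (error-free fragments, no repeats, a single strand); it is not the assembler Celera used, and the function names and fragment sequences are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; remaining contigs stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return max(frags, key=len)  # longest contig


# Hypothetical fragments obtained by cutting the same DNA in different ways
reads = ["GTACGTTA", "GTTAGC", "ATGCGTAC"]
print(greedy_assemble(reads))
```

Real genomes contain repeats longer than a read, which is why greedy merging alone is not sufficient in practice.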

⑷ Scientists’ Approach vs. Entrepreneurial Approach

① Scientists were dissatisfied with entrepreneurs’ appropriation of their contributions and investments

② Entrepreneurs were dissatisfied with scientists’ failure to disclose information, which led them to develop new methodologies

③ The final completion of the genome map is a joint effort of both.


image

Figure 1. Stepwise Sequencing Method and Shotgun Sequencing Method



2. Sequencing Technology

⑴ Overview

① DNA Sequencing: Applying the principle of DNA replication

○ Template: Each strand of DNA

○ Substrate: dNTP (dATP, dCTP, dGTP, dTTP)

○ NTP has an -OH group at the 2’ carbon and is a material for RNA synthesis

○ DNA polymerase: Phosphate of the next nucleotide binds to the 3’-OH of deoxyribose

○ Synthesis direction: 5’ → 3’, forming complementary base pairs with the template

○ ddNTP: DNA polymerization is terminated because the 3’ carbon lacks an OH group

② RNA Sequencing: Applying the principle of RNA transcription

⑵ In vitro cloning: The very first sequencing method

⑶ Dideoxy Chain Termination Method (= Sanger sequencing): Reported in 1977, Sanger’s second Nobel Prize

① Substrate: dNTP + ddNTP (in small quantities) + buffer (pH stabilized)

○ ddNTP lacks a 3’-OH group, so it terminates the polymerization reaction

○ If ddNTP is added in large quantities, all template DNAs are quickly terminated

② Primer

○ Example: ³²P-labeled primer (CTAG)

③ 1st. Addition of template DNA and polymerase

④ 2nd. Heating to separate the complementary strand after polymerization

⑤ 3rd. Electrophoresis followed by reading the sequence on X-ray film or fluorescence examination


image

Figure 2. Process of Dideoxy Chain Termination Method


⑥ Advantages: Can read relatively long strands (up to roughly 1 kb per read), still used in laboratories

⑦ Disadvantages: Requires a large amount of the same DNA strand
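The logic of reading a sequence from chain-terminated fragments can be sketched in a few lines. This is a simplified model, assuming every position yields one terminated product and ignoring the primer; the template sequence and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3_to_5: str) -> str:
    """New strand (5'->3') complementary to a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Every possible ddNTP-terminated product as (length, terminal base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Shorter fragments migrate farther, so reading by increasing length gives 5'->3'."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGATC"  # hypothetical template, written 3'->5'
frags = terminated_fragments(synthesized_strand(template))
print(read_gel(list(reversed(frags))))  # order in the tube does not matter
```

Electrophoresis performs the sorting step physically; the terminal ddNTP of each band identifies one base of the newly synthesized strand.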

⑷ Dye-dideoxy chain termination method: Using laser

① Add a small amount of ddNTPs, each labeled with one of four fluorescent dyes, to the dNTPs.

② Automatic DNA sequencing is possible.


image

Figure 3. Process of the Dye-dideoxy chain termination method


⑸ Pyrosequencing

① Definition: A DNA sequencing method that relies on light emission proportional to the amount of pyrophosphate released during DNA synthesis.

② Diagram


image

Figure 4. Pyrosequencing diagram


③ Process


image

image

image

image

image

image

Figure 5. Pyrosequencing process
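Because the light signal is proportional to the pyrophosphate released, each nucleotide dispensation reports how many identical bases were incorporated in a row. The following is a minimal sketch of that idea, assuming an idealized, noise-free signal; the flow order "TACG" and the example strand are hypothetical choices.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", max_cycles: int = 50):
    """Return (nucleotide, signal) per dispensation; signal = number of bases incorporated."""
    signals, pos = [], 0
    for _ in range(max_cycles):
        for nt in flow_order:
            run = 0
            # One pyrophosphate is released per incorporated base,
            # so a homopolymer run gives a proportionally stronger signal.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(new_strand):
                return signals
    return signals


def call_bases(signals) -> str:
    """Rebuild the synthesized sequence from the flowgram."""
    return "".join(nt * n for nt, n in signals)


signals = flowgram("TTAGGGC")
print(signals)
print(call_bases(signals))
```

In practice, long homopolymer runs are hard to quantify because the signal increments become difficult to distinguish, which this idealized model does not capture.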


⑹ Illumina solid-phase amplification (ref)


image

Figure 6. Illumina solid-phase amplification


image

Figure 7. Fluorescent color distribution photo


① 1st. Fragmentation: Randomly cut the given DNA sample.

② 2nd. Gel-based size selection: Size of each DNA fragment can be limited if necessary.

③ 3rd. Adapter ligation: Attach adapters to both ends of all DNA sample fragments.

④ 4th. Amplification

○ 4th - 1st. Denature DNA into single-strands.

○ 4th - 2nd. Attach single-stranded DNA to the Illumina flow cell.

○ 4th - 3rd. Add enzymes to allow single-stranded DNA to form bridges on a solid-phase substrate.

○ 4th - 4th. Add primers, which bind to the single-stranded DNA bridges.

○ 4th - 5th. Add unlabeled nucleotides and induce DNA synthesis: Forms double-stranded DNA bridges.

○ 4th - 6th. Denature to turn double-stranded DNA bridges into anchored single-stranded DNA.

○ 4th - 7th. Repeat the above six steps to create anchored single-stranded clusters with the same base sequence.

○ Feature: Millions of anchored single-stranded clusters are formed.

⑤ 5th. Sequencing by synthesis (SBS)

○ 5th - 1st. Add 4 types of labeled reversible terminators (one label per base type), primers, and DNA polymerase.

○ 5th - 2nd. When a labeled reversible terminator forms a phosphodiester bond, fluorescence is emitted.

○ 5th - 3rd. Obtain a fluorescent color distribution image of each cluster.

○ 5th - 4th. Washing

○ 5th - 5th. Repeat the above four steps to determine the entire base sequence.

Type 1. Single-end sequencing (SES): Sequencing with only one adapter.

Type 2. Paired-end sequencing (PES): Sequencing with both adapters.

○ Initially, sequence with one adapter (Read1 acquisition), then sequence with the opposite adapter (Read2 acquisition).

○ Read1 and Read2 from the same DNA fragment can be easily matched since they come from the same cluster.

○ Advantages: Higher accuracy (due to Read1 and Read2 comparison), easy detection of DNA variations, easy analysis of repetitive sequences, and easy mapping between different species.

○ Disadvantages: Higher cost and more steps required than SES.
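The relationship between Read1 and Read2 can be sketched as follows. This is a simplified model, assuming error-free reads and a fragment written 5'->3'; the fragment sequence, read length, and function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Read1 starts at one end; Read2 is read from the opposite end on the other strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2


fragment = "ATGCGTACGTTAGC"  # hypothetical insert between the two adapters
read1, read2 = paired_end_reads(fragment, 5)
print(read1, read2)
# Read2 maps back to the far end of the fragment once reverse-complemented
print(reverse_complement(read2) == fragment[-5:])
```

Knowing that the two reads face each other at a roughly known distance is what helps with repetitive sequences and structural variation.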

⑺ WGS (Whole Genome Sequencing)

① SNV, insertion, deletion, structural variant, CNV

② Sequencing depth > 30X

⑻ WES (Whole Exome Sequencing)

① Only SNV, insertion, deletion, SNP in protein-coding genes

② Sequencing depth > 50X ~ 100X

③ Cost-effective

⑼ RNA-seq

① 1st. Microdissection: Separating specific tissues for RNA extraction.

○ LCM (Laser Capture Microdissection): Cutting specific tissues with a laser beam. Robust but labor-intensive.

○ TOMO-seq: Using cryosection and computer-based 3D sectioning. Not suitable for clinical purposes.

○ Transcriptome in vivo analysis

○ ProximID

○ STRP-seq

② 2nd. Capture mRNA with poly-T (oligo-dT), which recognizes the poly-A tail of RNA.

③ 3rd. Fragment RNA.

④ 4th. Attach primers to RNA.

⑤ 5th. First cDNA synthesis.

⑥ 6th. Second cDNA synthesis.

⑦ 7th. Process the 3’ and 5’ ends of RNA.

⑧ 8th. Ligate DNA sequencing adapters.

⑨ 9th. Amplify ligated fragments with PCR.

Application 1. dUTP method: A representative method for strand-specific sequencing.

○ Background: Used for studying biological functions based on RNA orientation (e.g., regulation of antisense miRNA).

Step 1. DNA-RNA hybrid: Synthesize cDNA (first or anti-sense strand) using dT primers and reverse transcriptase, targeting mRNA poly-A tails.

5’-//-U-//-AAAAAA-3’

3’-//-A-//-TTTTTT-5’

Step 2. ds cDNA: Using the cDNA (first strand) as the template, synthesize the cDNA (second or sense strand) with dUTP instead of dTTP.

3’-//-A-//-TTTTTT-5’

5’-//-U-//-AAAAAA-3’

Step 3. Ligated ds cDNA: Connect Y-adaptors to both ends of ds cDNA.

Step 4. Treatment with UDG (uracil-DNA glycosylase) breaks down the second strand, which contains uracil.

Step 5. Amplify the remaining antisense strand (first strand) to create the library.

○ In the library raw data, “_1.fastq” represents the first strand, while “_2.fastq” represents the second strand.

○ Thus, _2.fastq represents the original RNA profile.
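The strand logic of the dUTP method can be traced with a toy model. This is a sketch under simplifying assumptions (no adapters, no fragmentation, and the second strand written with "U" wherever dUTP was incorporated); the RNA sequence and function names are hypothetical.

```python
def first_strand_cdna(rna: str) -> str:
    """Antisense DNA copy of the RNA (reverse complement, written 5'->3')."""
    comp = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(rna))


def second_strand_with_dutp(first_strand: str) -> str:
    """Sense strand synthesized with dUTP in place of dTTP."""
    comp = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(first_strand))


def udg_treatment(strands: list[str]) -> list[str]:
    """UDG breaks down any strand that contains uracil."""
    return [s for s in strands if "U" not in s]


rna = "AUGGCCAAAA"  # hypothetical transcript with a short poly-A tail
first = first_strand_cdna(rna)
second = second_strand_with_dutp(first)
survivors = udg_treatment([first, second])
print(first, second, survivors)
```

Only the antisense first strand survives to be amplified, which is why the orientation of the original RNA can be recovered from the library.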

⑽ Single-cell sequencing

① Types: scDNA-seq, scRNA-seq (2013 Technology of the Year), single-cell epigenetics sequencing

Step 1. Isolation of single cells

Method 1. Simple isolation: Very early method.

Method 2. Based on FACS or LCM (laser capture microdissection)

Method 3. Acoustic separation

○ Separates single cells hydrodynamically, causing minimal impact on cells.

○ CyTOF (cytometry by time of flight) is a representative method.

Method 4. Immuno-magnetic separation

○ Attach magnetic beads to cells.

○ Can obtain a large number of cells.

○ Divided into cases with and without centrifugation requirements.

○ Droplet-based and plate-based platforms have different library sizes.

Step 2. Reverse transcription

Step 3. cDNA amplification

Step 4. Library construction: e.g., Drop-seq

⑥ Single-cell genomics (scDNA-seq) + Single-cell transcriptomics (scRNA-seq)

○ Allows understanding the relationship between genomic mutation patterns and gene expression in transcriptomes.

○ Technologies for separating DNA and RNA: G&T seq, SIDR-seq, DNTR-seq

⑾ Single Nucleus RNA Sequencing (snRNA-seq)

Purpose 1. Muscle fibers are multinucleated cells that scRNA-seq does not capture, so they need to be analyzed at the nuclear level.

Purpose 2. snRNA-seq captures a wider variety of RNA than scRNA-seq, including introns, pre-mRNA, and non-coding RNA.

Purpose 3. In snRNA-seq, nuclear RNA is primarily captured, while cytoplasmic RNA is also captured (although in small amounts).

⑿ Spatial Sequencing (Supplement)


image

Figure 8. Overview of spatial sequencing


image

Table 1. Comparison of different spatial transcriptomic technologies


Type 1. Spatial genomics

Example 1. Tumor research: Tumors are heterogeneous.

Example 2. Spleen research: Mature immune cells have diverse genetic compositions.

Type 2. Spatial transcriptomics: 2020 Technology of the Year

2-1. Spot-based spatial transcriptomics: Many genes + few spots


image

Figure 9. Benchmark study of spot-based spatial transcriptomics


ST (Spatial Transcriptomics)

○ Barcoded oligos are randomly arranged on a functionalized surface, capturing mRNA released from the mounted tissues and/or cells.

10X Visium

○ Principle: Attach spot-specific oligonucleotides to each spot to hybridize with tissue-derived RNA, obtaining spotwise transcriptomes.

○ Surface area: 6.5 mm × 6.5 mm

○ Thickness: 10 ~ 20 μm

○ Number of spots: Up to 4992 (for the version preceding Visium HD)

○ Distance between spots: 100 μm

○ Diameter of spots: 55 μm

○ Sensitivity: 10,000 transcripts per spot
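The figures above imply that the spots cover only part of the capture area. This is a rough worked calculation, assuming circular spots and taking the 6.5 mm × 6.5 mm square as the full capture area; the function name is hypothetical.

```python
import math


def spot_area_fraction(n_spots: int, spot_diameter_um: float, side_um: float) -> float:
    """Fraction of a square capture area that is covered by circular spots."""
    spot_area = math.pi * (spot_diameter_um / 2) ** 2
    return n_spots * spot_area / side_um ** 2


# 4992 spots, 55 um diameter, 6.5 mm = 6500 um per side
fraction = spot_area_fraction(4992, 55, 6500)
print(f"{fraction:.1%} of the capture area lies under a spot")
```

Under these assumptions, well under half of the tissue area sits directly on a spot, which is one motivation for higher-density designs.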

Type 1. Direct Visium (oligo-dT based method): Captures mRNAs with poly dT. Only applicable to FF (fresh-frozen) samples.


image

Figure 10. Principle of Visium FF


Type 2. probe-based Visium

○ It can be applied to both FF (fresh-frozen) and FFPE (formalin-fixed paraffin-embedded) samples. FFPE samples in particular cannot undergo direct Visium because RNA degradation fragments the mRNA molecules into many pieces.

○ To identify the target mRNA, the LHS and RHS probes of all three pairs must be ligated together; each probe is 25 bases long. RTL (probe-based RNA-templated ligation chemistry) is utilized for this purpose.


image

Figure 11. Principle of probe-based Visium


○ Advantage: Superior data quality compared to direct Visium.

○ Disadvantage: Limited freedom in analysis compared to Visium FF, as only genes specified by the probe are detected.

○ Starting from June 2024, 10x will discontinue the Visium FFPE service that does not use CytAssist.

○ The CytAssist images represent the distribution of gene expression and are used for image alignment.

○ 10X Visium HD

○ The basic data consists of spots with a diameter of 2 μm, and additional data binned at 8 μm and 16 μm are also provided.
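Binning 2 μm squares into 8 μm or 16 μm bins amounts to summing counts over 4 × 4 or 8 × 8 blocks. This is a minimal sketch of that aggregation on a plain nested list; it is not the vendor's pipeline, and the function name and toy grid are hypothetical.

```python
def bin_counts(grid: list[list[int]], factor: int) -> list[list[int]]:
    """Sum the counts of a 2D grid in factor x factor blocks (leftover edges are dropped)."""
    n_rows, n_cols = len(grid), len(grid[0])
    binned = [[0] * (n_cols // factor) for _ in range(n_rows // factor)]
    for r in range(n_rows - n_rows % factor):
        for c in range(n_cols - n_cols % factor):
            binned[r // factor][c // factor] += grid[r][c]
    return binned


# Toy 8 x 8 grid of 2 um squares, one count each (a 16 um x 16 um patch)
grid_2um = [[1] * 8 for _ in range(8)]
print(bin_counts(grid_2um, 4))  # 8 um bins: 4 x 4 squares each
print(bin_counts(grid_2um, 8))  # 16 um bin: 8 x 8 squares
```

Larger bins trade spatial resolution for more counts per bin, which makes downstream statistics more stable.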

Slide-seq and Slide-seq V2

○ Employs random spatial bead spreading and in situ sequencing decoding.

○ 97% of spots consist of one or two cell types.

HDST

○ Deposits beads with combinatorial barcodes on patterned wafers which are then decoded with serial hybridization.

○ NanoString GeoMx

○ Nanostring lost a patent dispute with 10x Genomics as of Nov ‘23 (ref) → The bankruptcy of Nanostring (ref)

○ Stereo-seq: Higher spatial resolution than Visium

○ Utilizes Illumina or MGI sequencing for oligo patterning on flow cells, and barcode calling is performed directly on the sequencer.

○ Diameter: 220 nm

○ Distance between spots: 500 or 715 nm

○ Seq-Scope: Higher spatial resolution than Visium

○ Utilizes Illumina or MGI sequencing for oligo patterning on flow cells, and barcode calling is performed directly on the sequencer.

○ PIXEL-seq

XYZeq

○ Tissue is placed on a spatially barcoded microwell array for an initial round of reverse transcription, after which whole cells are removed and undergo single-cell sequencing.

sci-Space

○ Tissue is placed on a glass slide bearing spatially gridded hashing oligos; tissue is then permeabilized to enable oligo transfer and then imaged; nuclei are then extracted, fixed and sequenced.

○ sci-RNA-seq

○ TIVA-seq

○ NICHE-seq

ZipSeq

○ It uses patterned illumination and photocaged oligonucleotides to serially print ‘zipcodes’ onto live cells in intact tissues in real time.

DBiT-seq

○ Delivers barcoded oligos directly to tissue through orthogonal microfluidics in a predetermined spatial distribution.

○ CITE-seq (ref1, ref2): Enables parallel comparison of spatial transcriptomics and antibody distribution


image

Figure 12. Diagram of CITE-seq


○ Connect the 5’ end of oligonucleotide to an antibody using streptavidin-biotin.

○ The oligonucleotide can hybridize complementarily with the oligo-dT primer.

○ Streptavidin-biotin bond can dissociate under reducing conditions.

○ Recently, perturb-CITE-seq was also developed.

○ SPOTS

○ Spatial PrOtein and Transcriptome Sequencing

○ Indirectly assess protein level on Visium using polyadenylated DNA-barcoded antibody

○ Open-ST

○ MAGIC-seq

2-2. FISH based spatial transcriptomics: Few genes + many spots

○ ISS (in situ sequencing): Technique to sequence RNA at its original location in tissue. Sequencing by ligation

Type 1. The first ISS

Type 2. ISS with Padlock probe

○ Reverse transcriptase creates cDNA of the RNA target

○ Padlock probe can hybridize to two regions of the cDNA

○ Target sequence amplification occurs through RCA (rolling-circle amplification)

○ RCA product is sequenced in situ by ligation

Type 3. ISS using fluorescent probes and cross-linking

Type 4. barcode based methods

Type 5. gap-filled ISS

smFISH (single molecule FISH) (2008)

seqFISH (sequential FISH) (2014): DNase I-based digestion and sequential staining and imaging rounds to decode transcripts

seqFISH+: Genome-scale transcriptome investigation separating individual transcripts into fluorescence spectra, employing 20 probes per encoding round.

Vizgen - MERSCOPE (Technology name: MERFISH (multiplexed error-robust FISH))

○ Direct probe hybridization without separate amplification mechanism.

○ Each FISH probe corresponds 1:1 with each gene (though this assumption may not always hold).

○ Employing error correction in barcode assignment for robust barcode calling in noisy FISH-based images.

Step 1. Photograph multiple times with fluorescence varying over time for each FISH probe

Step 2. Identify each gene by decoding the binary code read from each RNA


image

Figure 13. Principle of MERFISH
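Error-robust decoding can be sketched as a nearest-codeword lookup. This is a minimal sketch, assuming a made-up 8-bit codebook whose codewords differ from each other in at least 4 bits, so that a single misread bit can still be corrected; the gene names, codes, and function names are hypothetical and not an actual MERFISH codebook.

```python
# Hypothetical codebook: each bit is "spot seen (1) / not seen (0)" in one imaging round
CODEBOOK = {
    "GENE_A": "11110000",
    "GENE_B": "00001111",
    "GENE_C": "11001100",
    "GENE_D": "10101010",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(observed: str, codebook: dict = CODEBOOK, max_errors: int = 1):
    """Return the gene whose code is closest to the observed bits, or None if too far."""
    distance, gene = min((hamming(observed, code), g) for g, code in codebook.items())
    return gene if distance <= max_errors else None


print(decode("11110000"))  # exact match
print(decode("11100000"))  # one dropped bit is still assigned
print(decode("11000000"))  # two errors: ambiguous, rejected
```

Because all codewords are at least 4 bits apart, any barcode with one error has a unique nearest codeword, while barcodes with two errors are discarded rather than guessed.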


10x - Xenium

○ Small amount of padlock probe + rolling circle amplification

Step 1. Padlock probe binds complementary RNA transcript in a pincer shape, forming a loop

Step 2. RCA (rolling circle amplification): The circularized probe is amplified after loop formation

Step 3. Hybridize each RNA transcript with a fluorescent probe, then perform fluorescent imaging → washing

Step 4. Repeat Step 3 and decode labels for each gene from the generated images


image

Figure 14. Principle of Xenium


Nanostring - CosMx

○ Small amount of probe + branched-chain hybridization

○ Nanostring won in the U.S. against 10x for violating antitrust laws in July ‘23 (ref) → The bankruptcy of Nanostring (ref)


image

Figure 15. Principle of CosMx


○ FISSEQ and oligoFISSEQ

○ Veranome

○ Rebus

○ BOLORAMIS

○ STARmap: Sequencing by ligation

○ SEDAL sequencing

○ ExSeq

○ BaristaSeq: Sequencing by synthesis

○ BARSeq and BARSeq2

○ HybISS

○ SABER

○ clampFISH

○ split-FISH

○ SCRINSHOT

○ PLISH

○ osmFISH

○ ExFISH

○ par-seqFISH

○ EASI-FISH

○ SGA

○ corrFISH

Type 3. Spatial proteomics: Broadly classified into mass spectrometry-based and imaging-based methods

○ SWITCH

○ MxIF

○ t-CyCIF

○ IBEX

○ DEI

○ CODEX

○ immuno-SABER

○ TSA

○ Opal IHC

○ MIBI

○ IMC

○ HD-MIBI

○ GeoMx Digital Spatial Profiler (DSP): 100 mm scale

○ GeoMx DSP stains tissues with suites of antibodies or gene probes fused to UV-cleavable DNA barcodes.

4i multiplexed imaging

⒀ Other sequencing technologies

① TCR-seq (T cell receptor sequencing): Sequencing used to track T cell subtypes and clones.

② Invade-seq: A sequencing technique for analyzing the host-microbiome.

③ Long-read sequencing: 2022 Technology of the Year (Reference)

○ Fewer sequencing gaps than short-read sequencing


image

Figure 16. long-read sequencing and short-read sequencing


Advantage 1. AS analysis (alternative splicing analysis): Can identify alternative splicing events, isoforms, etc.

Advantage 2. Easier integration of epigenetics and transcriptomics

Example 1. Pacific Biosciences SMRT (single molecule real-time) sequencing: Average read length is ~20 kb

Example 2. Oxford Nanopore Sequencing: Average read length is ~100 kb

④ non-invasive sequencing

○ A technology that allows sequencing without breaking cells

⑤ Halo-seq: A technique for obtaining the transcriptome of RNAs adjacent to a specific protein.

Step 1. Attach a HaloTag domain to a specific target.

Step 2. This HaloTag generates an alkyne handle radical by ejecting a hydrogen radical H· from a radical-producing Halo ligand injected with an alkyne handle.

○ R-H → R· + H·

Step 3. Similarly, the HaloTag generates an RNA radical by ejecting a hydrogen radical H· from RNA.

○ RNA-H → RNA· + H·

Step 4. The alkyne handle radical combines with the RNA radical.

Step 5. React alkyne-RNA with biotin azide to produce biotinylated RNA.

Step 6. Separate only the biotinylated RNA using affinity chromatography with streptavidin.

Step 7. RNA-seq allows for the detection of RNAs close to the specific target.

○ Reason: Radicals are unstable and cannot travel long distances.


image

Figure 17. Principle of Halo-seq


⑥ multi-NTT seq (nanobody tethered transposition followed by sequencing)

⑦ Epigenomics Sequencing

⑧ 3D Sequencing

⑨ Temporal Sequencing

Record-seq

Live-seq

TMI

molecular recording

⑩ Spatiotemporal Omics

ORBIT (single-molecule DNA origami rotation measurement)

○ 4D spatiotemporal MRI or hyperpolarized MR

in vivo 4D omics with transparent mice

⒁ NGS (next-generation sequencing) Summary

① Cost of genome analysis

○ 2001: Human Genome Project benchmark $100 million / person

○ 2007: 100 billion won / 4 years

○ 2008: 454 Life Sciences standard $1,000,000 / person. 1.5 billion won / 4.5 months

○ 2009: Helicos BioSciences standard $48,000 / person

○ Predicted to be sufficient with one million won by 2014 (Nature 456, 23-25, 2008)

② Scale of genome analysis


image

Figure 18. Trend of genome analysis scale


③ Relationship between depth and coverage

○ sequencing depth (read depth): Indicates how many times a specific nucleotide is read on average


image

Figure 19. Definition of depth


○ “10x” means it was read 10 times

○ Can be defined for each nucleotide

○ coverage (c)

○ c = LN / G

○ L: read length

○ N: number of reads

○ G: haploid genome length
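The coverage formula can be applied directly. This is a minimal worked example, assuming 150 bp reads and a haploid genome length of 3.1 Gb as illustrative round numbers; the function names are hypothetical.

```python
import math


def coverage(read_length: int, n_reads: int, genome_length: int) -> float:
    """c = L * N / G"""
    return read_length * n_reads / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest N such that L * N / G reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


G = 3_100_000_000  # assumed haploid genome length (bp)
L = 150            # assumed read length (bp)

print(coverage(L, 620_000_000, G))  # coverage achieved by 620 million reads
print(reads_needed(30, L, G))       # reads required for 30X
```

This is an average over the genome; individual positions can be read more or fewer times than c.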

Comparison of depth and coverage

○ Sequencing depth represents total read number

○ Coverage represents the relationship between sequence reads and reference (e.g. whole genome, a locus)

○ Otherwise, depth and coverage are very similar concepts

④ Relationship between bulk and read

○ bulk: total RNA production

○ At equal depth, the read count per RNA decreases in inverse proportion as bulk increases, which distorts comparisons between samples

○ Example: In spatial transcriptomics, bulk is typically large and depth is low, resulting in low RNA read count

○ Normalization: Various methods have been introduced to correct this distortion
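One simple and widely used normalization is counts per million (CPM), which rescales each sample by its total read count. This is a minimal sketch of that single approach, not necessarily the methods referred to above; the gene names and counts are hypothetical.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Two hypothetical samples with the same composition but 10x different depth
shallow = {"gene1": 100, "gene2": 300, "gene3": 600}
deep = {"gene1": 1000, "gene2": 3000, "gene3": 6000}

print(counts_per_million(shallow))
print(counts_per_million(deep))
```

After scaling, the two samples give the same values, so differences in total reads no longer masquerade as differences in expression. CPM does not correct for gene length or for composition effects.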

⑤ Relationship between read length and number of reads

○ If read length is less than 250 bp, it is impossible to detect sequence error

○ Relationship between read length and number of reads per run: there is a trade-off


image

Figure 20. Relationship between read length and number of reads per run


⑥ Relationship between transcriptome read count and gene expression

○ read count: Number of sequencing reads assigned to each transcript, a proxy for the actual number of transcripts

○ gene expression: Value corrected from read count through normalization process



Entered: 2015.07.02 23:31

Updated: 2022.03.13 13:11
