Chapter 10. Genome Project and Sequencing Technology
1. Genome Project
⑴ Overview
① The Human Genome Project was launched in 1990 as a 15-year project by a consortium of six countries, under the leadership of James Watson and, later, Francis Collins.
② Collaborative research involving over 350 research institutions
○ On June 26, 2000, with 84.5% completed, a draft was announced.
○ On April 15, 2003, the final version with 99.99% accuracy was released.
○ More than 2,800 researchers participated over 13 years, with a total cost of $2.7 billion.
③ Spin-off effects of human genome research
○ Birth of bioinformatics.
○ Facilitating the development of human protein production processes
○ Insulin: The first protein with a determined sequence.
○ Facilitating the development of automated DNA sequencing instruments
○ Promoting the genomic analysis of other biologically applicable organisms
⑵ Methodology 1. Stepwise sequencing method: Scientists’ approach
① Step 1: Determining restriction enzyme recognition sites
○ After cutting DNA with restriction enzymes and performing electrophoresis, the size of each fragment can be determined.
○ Digesting the DNA with two restriction enzymes, separately and in combination, reveals the relative distances between their recognition sites.
② Step 2: Constructing a gene map
○ Determining the relative distances of genes on the chromosome.
○ Inferring the distance between genes through recombination rates.
③ Step 3: Physical map (DNA map) construction
○ Significance of determining restriction enzyme recognition sites: By using each restriction enzyme recognition site as a reference, a physical map is progressively constructed based on the sequence information of the obtained fragments.
○ Significance of gene mapping: Once a physical map is created, it can be compared with the gene map. Introns exist between genes.
○ Approach: Using a single library.
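Step 2 above infers gene-to-gene distances from recombination rates. A minimal sketch of that calculation follows, assuming the common convention that 1% recombination corresponds to roughly 1 centimorgan (cM). This approximation only holds for closely linked genes, and the function name and offspring counts are hypothetical.

```python
def map_distance_cM(recombinants: int, total: int) -> float:
    """Approximate genetic map distance in centimorgans.

    Assumes 1% recombinant offspring ~ 1 cM, which is only a reasonable
    approximation for short distances (double crossovers are ignored).
    """
    if total <= 0:
        raise ValueError("total offspring must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    return 100.0 * recombinants / total


# Hypothetical cross: 17 recombinant offspring out of 200 scored
distance = map_distance_cM(17, 200)
print(f"{distance:.1f} cM")
```

Repeating this for several gene pairs gives the relative order and spacing of genes, which is what the gene map in Step 2 records.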
⑶ Methodology 2. Shotgun Sequencing (Celera, J. Craig Venter): Entrepreneurial approach
① Cutting one DNA in multiple ways.
② Determining the entire nucleotide sequence of fragments cut by a single method.
○ Since the length of the analyzed sample is limited, the complete DNA sequence cannot be determined at once.
③ For each cutting method, the fragment sequences are aligned by their overlaps, and the process is repeated until consistent results are obtained.
④ A method based on computer science.
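The computational core of shotgun sequencing is stitching overlapping fragments back together. The sketch below uses a simple greedy overlap merge to illustrate the idea. It is a teaching example only, not the assembler Celera used. It assumes error-free reads from one strand, and the read strings are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical fragments of one short DNA molecule
fragments = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(fragments))
```

Greedy merging can misassemble repetitive regions, which is one reason real genomes need deeper sampling and more careful algorithms than this sketch.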
⑷ Scientists’ approach vs. Entrepreneurial approach
① Scientific Community: The Human Genome Project (HGP), led by Watson and Francis Collins. The final cost was $2.7 billion.
② Entrepreneurial Sector: Celera Genomics (launched in 1998), led by Craig Venter. The final cost was $300 million.
③ Both groups released a draft of the human genome in 2001.
④ The scientific community was dissatisfied, believing that the entrepreneurial sector was taking credit for their contributions and investments.
⑤ The entrepreneurial sector was frustrated with the scientific community for not sharing data, leading to the development of new methodologies.
⑥ The final completion of the genome map was acknowledged as a joint achievement of both groups.
Figure 1. Stepwise Sequencing Method and Shotgun Sequencing Method
2. Sequencing Technology
⑴ Overview
① DNA Sequencing: Applying the principle of DNA replication.
○ Template: Each strand of DNA
○ Substrate: dNTP (dATP, dCTP, dGTP, dTTP)
○ NTPs have a hydroxyl (-OH) group attached to the 2’ carbon and serve as the building blocks for RNA polymerization.
○ DNA polymerase: Phosphate of the next nucleotide binds to the 3’-OH of deoxyribose.
○ Synthesis direction: 5’ → 3’, forming complementary base pairs with the template.
○ ddNTP: DNA polymerization is terminated because the 3’ carbon lacks an OH group.
② RNA Sequencing: Applying the principle of RNA transcription.
③ Sequencing by Synthesis
○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a polymerase incorporates nucleotides into the primer sequence.
○ Type 1. CRT (Cyclic Reversible Termination): A cyclic process consisting of incorporation, imaging, and deprotection.
○ Type 2. SNA (Single Nucleotide Addition): Adds only one nucleotide at a time.
○ Examples: Ion semiconductor sequencing (PGM), Pyrosequencing (Roche 454)
④ Sequencing by Ligation
○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a ligase connects a hybridized sequence to the primer sequence.
○ One or more nucleotides can be incorporated at a time (e.g., dibase).
○ Examples: Nanoball sequencing, Thermo Fisher SOLiD
⑵ In vitro cloning: The very first sequencing method
⑶ Dideoxy Chain Termination Method (= Sanger sequencing): Reported in 1977. Sanger’s second Nobel Prize.
① Substrate: dNTP + ddNTP (in small quantities) + buffer (pH stabilization)
○ ddNTP lacks a 3’-OH group, so it terminates the polymerization reaction.
○ If ddNTP is added in large quantities, all template DNAs are quickly terminated.
② Primer
○ Example: ³²P-labeled primer (CTAG)
③ 1st. Addition of template DNA and polymerase.
④ 2nd. After the polymerization reaction, heat is applied to separate the replicated strands.
⑤ 3rd. After electrophoresis, the sequence is read from an X-ray film or detected using fluorescence.
Figure 2. Process of Dideoxy Chain Termination Method
⑥ Advantages: It produces relatively long, accurate reads (up to roughly 1 kb), so it is still used in laboratories.
⑦ Disadvantages: Requires a large amount of the same DNA strand.
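The readout logic of chain termination can be sketched in a few lines. The example below is an idealized model: it assumes every possible termination product is present, ignores the primer length, and uses a made-up template. Each ddNTP lane holds the fragment lengths that end in that base, and reading the bands from shortest to longest recovers the newly synthesized strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5: str) -> str:
    """New strand (written 5'->3') for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_lengths(template_3to5: str) -> dict:
    """For each ddNTP lane, the lengths of fragments terminated by that ddNTP."""
    new_strand = synthesized_strand(template_3to5)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read bands from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"  # hypothetical template, written 3'->5'
lanes = termination_lengths(template)
print(lanes)
print(read_gel(lanes))  # sequence of the newly synthesized strand, 5'->3'
```

The sequence read from the gel is complementary to the template, so the template sequence is obtained by taking its complement.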
⑷ Dye-dideoxy chain termination method: Using laser.
① A small amount of ddNTPs is added to dNTPs, each labeled with one of four fluorescent dyes.
② Enables automated DNA sequencing.
Figure 3. Process of the Dye-dideoxy chain termination method
⑸ Ion Semiconductor Sequencing (PGM)
① Sequencing by Synthesis.
② Prone to insertion/deletion errors.
⑹ Pyrosequencing (Roche 454)
① Definition: A DNA sequencing method that relies on the proportional luminescence produced based on the amount of pyrophosphate generated during DNA synthesis. ~400 bp per read.
② Sequencing by Synthesis: Light is emitted when a nucleotide is incorporated, appearing as peaks in a pyrogram.
③ Prone to insertion/deletion errors.
④ No longer in use.
Figure 4. Pyrosequencing diagram
Figure 5. Pyrosequencing process
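Because the light signal is proportional to the number of nucleotides incorporated, a pyrogram can be modeled as a list of peak heights, one per nucleotide dispensation. The sketch below is an idealized, noise-free model. The dispensation order "TACG" and the example sequence are assumptions for illustration only.

```python
def pyrogram(new_strand: str, dispensation: str = "TACG", max_cycles: int = 100):
    """Return (nucleotide, peak_height) for each dispensation.

    The peak height is the number of identical bases incorporated in a row,
    so a homopolymer such as "GG" gives a single peak of height 2.
    """
    peaks = []
    pos = 0
    for _ in range(max_cycles):
        for nt in dispensation:
            run = 0
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            peaks.append((nt, run))
            if pos == len(new_strand):
                return peaks
    return peaks  # reached only if the strand has bases outside `dispensation`


peaks = pyrogram("GGATTC")
print(peaks)
# Rebuild the sequence from the peak heights
print("".join(nt * height for nt, height in peaks))
```

In real data the peak heights are noisy, so the length of a long homopolymer is hard to call exactly. This is consistent with the insertion/deletion errors noted above.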
⑺ Nanoball sequencing
① Sequencing by Ligation
⑻ Thermo Fisher SOLiD
① Sequencing by Ligation
② No longer in use.
⑼ Illumina solid-phase amplification (ref, YouTube clip)
Figure 6. Illumina solid-phase amplification
Figure 7. Fluorescent color distribution photo
① 1st. Fragmentation: Randomly cut the given DNA sample.
② 2nd. Gel-based size selection: If necessary, the size of each DNA fragment can be restricted.
③ 3rd. Adapter binding: Adapters are attached to both ends of all DNA sample fragments using a transposome.
④ 4th. Amplification
○ 4th - 1st. Denature DNA into single-strands.
○ 4th - 2nd. Attach single-stranded DNA to the Illumina flow cell.
○ 4th - 3rd. Add enzymes to allow single-stranded DNA to form bridges on a solid-phase substrate.
○ 4th - 4th. After adding primers to the single-stranded DNA bridges, primers can bind to the bridges.
○ 4th - 5th. Add unlabeled single-stranded DNA and induce DNA synthesis: Forms double-stranded DNA bridges.
○ 4th - 6th. Through denaturation, the double-stranded DNA bridge is converted into anchored single-stranded DNA.
○ 4th - 7th. Repeat the above six steps to create anchored single-stranded clusters with the same base sequence.
○ Feature: Anchored single-stranded clusters form millions of clusters.
⑤ 5th. Only the forward strand is retained to prevent unwanted priming.
⑥ 6th. Sequencing by synthesis (SBS)
○ 6th - 1st. Four types of labeled reversible terminators (based on the nucleotide type), primers, and DNA polymerase are added.
○ 6th - 2nd. When labeled reversible terminator nucleotides form phosphodiester bonds, fluorescence is emitted.
○ 6th - 3rd. Obtain a fluorescent color distribution image of each cluster.
○ 6th - 4th. Washing
○ 6th - 5th. Repeat the above four steps to determine the entire nucleotide sequence.
○ Type 1. Single-end sequencing (SES): Sequencing with only one adapter.
○ Type 2. Paired-end sequencing (PES): Sequencing with both adapters.
○ Initially, sequence with one adapter (Read1 acquisition), then sequence with the opposite adapter (Read2 acquisition).
○ Read1 and Read2 from the same DNA fragment can be easily matched since they come from the same cluster.
○ Advantages: Higher accuracy (due to Read1 and Read2 comparison), easy detection of DNA variations, easy analysis of repetitive sequences, and easy mapping between different species.
○ Disadvantages: Higher cost and more steps required than SES.
⑦ ~100 bp / read: Recently, up to 250 bp.
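The relationship between Read1 and Read2 in paired-end sequencing can be illustrated with a small sketch. It assumes an error-free fragment and reads of equal length, and the fragment sequence is made up. Read1 starts from one end of the fragment, and Read2 is read from the opposite strand, so it is the reverse complement of the other end.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int):
    """Return (read1, read2) for one fragment, both written 5'->3'."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2


fragment = "ATGCCGTAAGGA"  # hypothetical DNA fragment
read1, read2 = paired_end_reads(fragment, 4)
print(read1, read2)
# Read2 maps back to the far end of the fragment after reverse complementing
print(fragment.endswith(reverse_complement(read2)))
```

Because both reads come from the same fragment, their expected orientation and spacing constrain where the pair can map. This helps with repetitive sequences and with detecting structural variation.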
⑽ WGS (Whole Genome Sequencing)
① SNV, insertion, deletion, structural variant, CNV
② Sequencing depth > 30X
⑾ WES (Whole Exome Sequencing)
① SNVs, insertions, deletions, and SNPs only in protein-coding genes.
② Sequencing depth > 50X ~ 100X
③ Cost-effective.
⑿ RNA-seq
① Overview
○ Microarray data has raw data in the form of continuous values, whereas RNA-seq has raw data in the form of count data.
○ Compared to microarray data, RNA-seq can robustly capture signals even for genes with very low or very high expression levels.
② 1st. Microdissection: Separating specific tissues for RNA extraction.
○ LCM (Laser Capture Microdissection): Cutting specific tissues with a laser beam. Robust but labor-intensive.
○ TOMO-seq: Using cryosection and computer-based 3D sectioning. Not suitable for clinical purposes.
○ ProximID
○ STRP-seq
③ 2nd. Poly T is attached to recognize the poly A tail of RNA.
④ 3rd. Fragment RNA: 200-400 nt.
⑤ 4th. Attach primers to RNA.
⑥ 5th. First cDNA synthesis.
⑦ 6th. Second cDNA synthesis.
⑧ 7th. Process the 3’ and 5’ ends of RNA.
⑨ 8th. Ligate DNA sequencing adapters.
⑩ 9th. Amplify ligated fragments with PCR.
⑪ Application 1. dUTP method: A representative method for strand-specific sequencing.
○ Background: Used for studying biological functions based on RNA orientation (e.g., regulation of antisense miRNA).
○ Step 1. DNA-RNA hybrid: Synthesize the first (antisense) strand of cDNA using dT primers (targeting mRNA poly-A tails) and reverse transcriptase.
5’-//-U-//-AAAAAA-3’
3’-//-A-//-TTTTTT-5’
○ Step 2. ds cDNA: Using dUTP instead of dTTP, the second (or sense) strand of cDNA is synthesized using the first strand of cDNA as a template.
3’-//-A-//-TTTTTT-5’
5’-//-U-//-AAAAAA-3’
○ Step 3. Ligated ds cDNA: Connect Y-adaptors to both ends of ds cDNA.
○ Step 4. When treated with UDG (uracil-DNA glycosylase), the second strand of DNA containing uracil is degraded.
○ Step 5. Amplify the remaining antisense strand (first strand) to create the library.
○ In the library raw data, _1.fastq represents the first strand, while _2.fastq represents the second strand.
○ Thus, _2.fastq represents the original RNA profile.
⒀ Single-cell sequencing
① Types
○ scDNA-seq
○ scRNA-seq (Method of the Year in 2013): Chromium, Smart-seq, etc.
○ single-cell epigenetics sequencing
② Step 1. Isolation of single cells
○ Method 1. Simple isolation: Very early method.
○ Method 2. Based on FACS or LCM (laser capture microdissection)
○ Method 3. Acoustic separation
○ Separates single cells hydrodynamically, causing minimal impact on cells.
○ CyTOF (cytometry by time of flight) is a representative method.
○ Method 4. Immuno-magnetic separation
○ Attach magnets to cells.
○ Can obtain a large number of cells.
○ Classified into cases where centrifugation is required and where it is not.
○ It can be largely divided into droplet-based platforms and plate-based platforms.
③ Step 2. Reverse transcription
④ Step 3. cDNA amplification
⑤ Step 4. Library construction: Drop-seq, etc.
⑥ Single-cell genomics (scDNA-seq) + Single-cell transcriptomics (scRNA-seq)
○ It allows understanding of how mutation patterns in the genome are related to gene expression in the transcriptome.
○ Technologies for separating DNA and RNA: G&T seq, SIDR-seq, DNTR-seq
⒁ Single Nucleus RNA Sequencing (snRNA-seq)
① Purpose 1. Muscle fibers are multinucleated cells that are not captured well by scRNA-seq, so they need to be analyzed at the nuclear level.
② Purpose 2. Compared to scRNA-seq, snRNA-seq captures a wider variety of RNA, including introns, pre-mRNA, and non-coding RNA.
③ Purpose 3. In snRNA-seq, nuclear RNA is predominantly captured: But a small amount of cytoplasmic RNA is also detected.
⒂ Spatial Omics (▶ Supplement)
① Overview
Figure 8. Overview of spatial omics
Table 1. Comparison of different spatial omics technologies
Year | Method | Sample | Target | Resolution | Single-cell? | Probe | Area | # of cells | # of genes |
---|---|---|---|---|---|---|---|---|---|
Spot-based Spatial Transcriptomics | |||||||||
2014 | Tomo-seq | Fresh-frozen | Transcriptome-wide | - | No | - | - | - | 12,000 genes per section |
2016 | Visium | Fresh-frozen and FFPE | A-tailed RNA transcripts and targeted genes | 100 μm | No | - | 6.5 mm × 6.5 mm | 20,000 ~ 40,000 cells | ~1,700 genes per spot |
2017 | Geo-seq | Fresh-frozen | Transcriptome-wide | - | Yes | - | - | 5-40 cells per section | ~8,000 genes |
2019 | Slide-seq/V2 | Fresh-frozen | A-tailed RNA transcripts | 10 μm | No | Beads | Φ3.0 mm | - | 550 UMI per read |
2019 | HDST | Fresh-frozen | A-tailed RNA transcripts | 2 μm | No | Beads | 5.7 mm × 2.4 mm | ~60,000 cells per individual hexagonal | 44 UMI per 5× beads |
2020 | DBiT-seq | FFPE | A-tailed RNA transcripts | 25 μm | No | Free probe | 2.5 mm × 2.5 mm | ~50,000 cells | 1,000 genes per pixel |
2021 | Seq-Scope | Fresh-frozen | A-tailed RNA transcripts | ~0.6 μm | Yes | In situ synthesize | 0.8 mm × 1 mm | - | ~1,617 genes per cell |
2021 | PIC | Fresh-frozen | Transcriptome-wide | 75-5,000 μm | Yes | Photo-caged oligonucleotides | Scales with acquisition time | 10 or more cells in tissue sections | ~8,000 genes |
2021 | sci-Space | Fresh-frozen | A-tailed RNA transcripts | 222 μm | Yes | Beads | 18 mm × 18 mm | 121,909 cells | 1,231 genes per cell |
2022 | Stereo-seq | Fresh-frozen | A-tailed RNA transcripts | 0.5 μm | Yes | In situ synthesize | Up to 132 mm × 132 mm | 100,000-16,900,000 cells | 1,910 UMI and 792 genes per cell |
2022 | Pixel-seq | Fresh-frozen | A-tailed RNA transcripts | ~1 μm | Yes | In situ synthesize | 75 mm × 25 mm | 15,000 cells per section | ~500 genes per cell |
2023 | Slide-tags | Fresh-frozen | A-tailed RNA transcripts | 10 μm | Yes | Beads | Φ3.0 mm | 81,000 cells | 2,377 UMI per cell |
2024 | Visium HD | Fresh-frozen, FFPE | Targeted transcriptomics | 2 μm | Yes | - | 6.5 mm × 6.5 mm | 20,000-40,000 cells | 7,605 probes |
2024 | Open-ST | Fresh-frozen | A-tailed RNA | 0.6 μm | Yes | In situ synthesize | 6.3 mm × 89 mm | - | ~1,000 UMI per cell |
Image-based Spatial Transcriptomics | |||||||||
2013 | ISS | Cells, tissue sections | Targeted RNA | - | Yes | Padlock probe | 222 × 166 μm² | 450 cells covering an area of 0.16 mm² | 39 genes |
2014 | FISSEQ | Cells, tissue sections | Untargeted RNA | - | Yes | Fluorescent probe | 10 mm × 10 mm | - | 4,171 genes with size >5 pixels |
2015 | MERFISH | Cells, tissue sections | Targeted RNA | - | Yes | Oligonucleotide probes | Scales with acquisition time | ~100, up to 40,000 cells | 130 genes, up to 10,050 genes |
2018 | STARmap | Tissue sections | Targeted RNA | - | Yes | Padlock probe | 1.7 mm × 1.4 mm × 0.1 mm | > 30,000 cells | 160 to 1,020 genes per sample |
2019 | seqFISH+ | Cells, tissue sections | Targeted RNA | - | Yes | Oligonucleotide probes | Scales with acquisition time | 2,963 cells | 10,000 genes |
2021 | ExSeq | Cells, tissue sections | Untargeted / targeted RNA | - | Yes | Padlock probe | 0.933 mm × 1.140 mm × 0.02 mm | ~2,000 cells | - |
2021 | EASI-FISH | Tissue sections | Targeted RNA | 0.23 × 0.23 × 0.42 μm | Yes | - | - | 80,000 cells | - |
2022 | EEL FISH | Tissue sections | Targeted RNA | - | Yes | Primary probe | Scales with acquisition time | 128,000 cells | 883 genes |
2023 | STARmap PLUS | Tissue sections | Targeted RNA and antibody-based protein | - | Yes | Padlock probe | 194 mm × 194 mm × 345 mm | - | 1,022 genes |
Multi-omics-based | |||||||||
2018 | CODEX | Cells, tissue sections | Antibody-based protein | - | Yes | - | - | - | - |
2020 | OligoFISSEQ | Cells, tissue sections | Genomics | - | No | Barcoded oligopaint probes | - | - | - |
2020 | DBiT-seq | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 10 / 25 / 50 μm | No | Free probe | 2.5 mm × 2.5 mm | ~50,000 cells | ~4,000 genes per 50 μm pixel |
2021 | DNA seqFISH+ | Cells | Genomics | N/A (5,616.5 ± 1,551.4 dots per cell) | Yes | Primary probe | Scales with acquisition time | 446 cells | 3,660 chromosomal loci per cell |
2021 | IGS | Cells, tissue sections | Genomics | 400-500 nm | Yes | Hairpin DNA | - | 113 cells | - |
2021 | slide-DNA-seq | Fresh-frozen | Genomics | 10 μm | Yes | Beads | Φ3 mm | 2,274 cells | - |
2022 | CAD-HCR | Cells | Targeted RNA and antibody-based protein | - | Yes | Padlock probe | - | - | - |
2022 | MOSAICA | Cells, FFPE | Targeted RNA and antibody-based protein | - | Yes | Primary code | - | - | - |
2022 | SM-Omics | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 55 μm | No | - | 6.5 mm × 6.5 mm | - | - |
2022 | Spatial-CUT&Tag | Fresh-frozen | Epigenomics | 20 μm | No | - | - | - | H3K27me3: 9,735; H3K4me3: 3,686 per 20 μm pixel size |
2022 | Spatial-TREX | Fresh-frozen | A-tailed RNA transcripts and CloneIDs | Single cell/55 μm | Yes | - | - | 65,160 cells | 5,000-10,000 genes |
2022 | Perturb-map | Fresh-frozen | A-tailed RNA transcripts and CRISPR-targeted genes | 55 μm | Yes | - | - | - | - |
2022 | slide-TCR-seq | Fresh-frozen | A-tailed RNA transcripts and targeted T cell receptor genes | 10 μm | No | Beads | Φ3.0 mm | - | - |
2022 | SmT | Fresh-frozen | Host transcriptome- and microbiome-wide | 55 μm | No | - | 6.5 mm × 6.5 mm | - | - |
2022 | RIBOmap | Cells, tissue sections | Ribosome-bound mRNAs | - | Yes | Tri-probes (split DNA probe, padlock probes, primer probes) | - | 60,481 cells | 5,413 genes |
2022 | epigenomic MERFISH | Cells, tissue sections | Epigenomics | - | Yes | Oligonucleotide probes | Scales with acquisition time | ~5,400 cells | 3 histone modifications, H3K4me3: 127 genes/loci; H3K27ac: 142 target genomic loci |
2023 | spatial ATAC | Fresh-frozen | Epigenomics | 55 μm | No | Splint oligonucleotide, spatially barcoded surface oligonucleotides | - | - | - |
2023 | spatial-CITE-seq | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 50 μm | No | Free | 2.5 mm × 2.5 mm | 37,500-50,000 cells | 411 genes and 153 proteins per pixel |
2023 | Stereo-CITE-seq | Fresh-frozen | Targeted RNA and antibody-based protein | 500 nm | Yes | In situ synthesize | Up to 132 mm × 132 mm | 100,000-16,900,000 cells | 4.05K UMI per bin50 |
2023 | Spatial-ATAC-RNA-seq | Fresh-frozen | A-tailed RNA transcripts and transposase chromatin | 20 / 50 μm | No | Free | 2.5 mm × 2.5 mm | 37,500-50,000 cells | - |
2023 | Ex-ST | Fresh-frozen | A-tailed RNA transcripts | 20 μm | No | - | 6.5 mm × 6.5 mm | 8,000-16,000 cells | ~500 genes per spot |
2023 | immuno-SABER | Cells | Antibody-based protein | - | Yes | - | - | - | - |
2023 | scDVP | Fresh-frozen | Proteins | - | No | - | - | - | 1,700 proteins |
2023 | scSpaMet | Fresh-frozen, FFPE | Metabolomics and targeted multiplexed protein | - | Yes | - | - | - | - |
Table 2. Comparison of different spatial omics technologies
Figure 9. Comparison of different spot-based spatial omics technologies
② Type 1. Spatial genomics
○ Example 1. Tumor research: Tumors are heterogeneous. Also, copy number variation can be analyzed.
○ Example 2. Spleen research: Mature immune cells have diverse genetic compositions.
③ Type 2. Spatial transcriptomics: Method of the Year in 2020
○ Example 1. Brain cortex: White matter, Gray matter, L6, L5, L4, L3, L2, L1
○ Example 2. Spleen: Red pulp, White pulp, Intermediate zone
○ Example 3. Skin: Epidermis, Dermis, Smooth muscle
○ Example 4. Kidney: Cortex, Henle loop, Collecting duct
○ Example 5. Liver: Periportal, Pericentral, Intermediate
○ Example 6. Testis: I-III, IV-VI, VII-VIII, IX-XII
○ Example 7. Leaf: Upper epidermal wall, Lower epidermal wall
④ 2-1. Spot-based spatial transcriptomics: Many genes + few spots
○ ST (Spatial Transcriptomics)
○ A method where barcoded oligos are randomly dispersed onto the tissue, followed by capturing mRNA from each tissue region.
○ 10X Visium
○ Principle: Attaching spot-specific oligonucleotides to each spot to hybridize with tissue-derived RNA → obtaining spotwise transcriptomes.
○ Surface area: 6.5 mm × 6.5 mm
○ Thickness: 10 ~ 20 μm
○ Number of spots: Up to 4,992 (for the Visium version preceding Visium HD)
○ Distance between spots: 100 μm
○ Diameter of spots: 55 μm
○ Sensitivity: 10,000 transcripts per spot
○ Type 1. Direct Visium (oligo-dT based method)
○ Capturing mRNA using poly dT.
○ Applicable only to FF (fresh-frozen) samples: Reagents used for FFPE are not compatible with direct Visium.
Figure 10. Principle of Visium FF
○ Type 2. Probe-based Visium
○ Applicable to both FF (fresh-frozen) and FFPE (formalin-fixed paraffin-embedded) samples. FFPE samples in particular cannot undergo direct Visium because RNA degradation fragments the mRNA molecules into various pieces.
○ To identify the target mRNA, all three pairs of LHS and RHS must be ligated together: each probe’s length is 25 base pairs. RTL (probe-based RNA-templated ligation chemistry) is utilized for this purpose.
Figure 11. Principle of probe-based Visium
○ Advantage: Superior data quality compared to direct Visium.
○ Disadvantage: Limited freedom in analysis compared to Visium FF, as only genes specified by the probe are detected.
○ Starting from June 2024, 10x will discontinue the Visium FFPE service, except for CytAssist.
○ The CytAssist images represent the distribution of gene expression and are used for image alignment.
○ 10X Visium HD
○ The basic data consists of spots with a diameter of 2 μm, and additional data binned at 8 μm and 16 μm are also provided.
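The 8 μm and 16 μm data can be thought of as sums over blocks of neighboring 2 μm squares. The sketch below shows that aggregation for one gene on a small grid. It is only an illustration of the binning idea, not the 10x pipeline, and the grid values and function name are hypothetical.

```python
def bin_counts(grid, factor: int):
    """Sum counts over non-overlapping factor x factor blocks.

    `grid` is a list of rows of counts at the finest resolution.
    With 2 um squares, factor=4 gives 8 um bins and factor=8 gives 16 um bins.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows, cols = len(grid), len(grid[0])
    out_rows = (rows + factor - 1) // factor
    out_cols = (cols + factor - 1) // factor
    binned = [[0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            binned[r // factor][c // factor] += grid[r][c]
    return binned


# Hypothetical 8 x 8 patch of 2 um squares, one count per square
patch = [[1] * 8 for _ in range(8)]
print(bin_counts(patch, 4))  # 8 um bins
print(bin_counts(patch, 8))  # 16 um bins
```

Larger bins trade spatial resolution for more counts per bin, which usually makes downstream clustering more stable.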
○ Slide-seq and Slide-seq V2 (Diameter of 10 μm)
○ A method where spatial beads are randomly dispersed, followed by in-situ sequencing.
○ 97% of spots consist of one or two cell types.
○ HDST
○ Placing barcoded beads on a patterned wafer followed by serial hybridization.
○ NanoString GeoMx
○ The patent dispute between Nanostring and 10x Genomics in 2023 (ref1, ref2) → The bankruptcy of Nanostring (ref) → M&A of Nanostring (ref)
○ Stereo-seq
○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.
○ Diameter: 220 nm
○ Distance between spots: 500 or 715 nm
○ Seq-Scope
○ Higher spatial resolution than Visium.
○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.
○ PIC (photo-isolation chemistry)
○ PIXEL-seq
○ XYZeq (Diameter of 500 μm)
○ Placing tissue on a spatially barcoded microwell → single reverse transcription → cell removal → single-cell sequencing.
○ sci-Space (sc-space; Diameter of 222 μm)
○ Placing tissue on a slide with spatially gridded hashing oligos → tissue permeabilization (oligo transfer) → imaging → nuclear removal → cell fixation → sequencing.
○ sci-RNA-seq
○ TIVA-seq
○ NICHE-seq
○ ZipSeq
○ An imaging technology that processes cells with photocaged oligonucleotides to observe real-time patterned illumination (i.e., zipcode) of RNAs.
○ DBiT-seq
○ Placing barcoded oligos at fixed positions on orthogonal microfluidics to obtain location-specific transcriptomes of the tissue.
○ Slide-Tags (Diameter of 10 μm)
○ CITE-seq (ref1, ref2): Enables parallel comparison of spatial transcriptomics and antibody distribution
Figure 12. Diagram of CITE-seq
○ Connect the 5’ end of oligonucleotide to an antibody using streptavidin-biotin.
○ The oligonucleotide can hybridize complementarily with the oligo-dT primer.
○ Streptavidin-biotin bond can dissociate under reducing conditions.
○ Recently, perturb-CITE-seq was also developed.
○ SPOTS
○ Spatial PrOtein and Transcriptome Sequencing
○ Indirectly assess protein level on Visium using polyadenylated DNA-barcoded antibody.
○ Open-ST
○ MAGIC-seq
④ 2-2. Image-based spatial transcriptomics: Few genes + many spots
○ ISS (In situ sequencing): Technique to sequence RNA at its original location in tissue. Sequencing by ligation
○ Type 1. The first ISS
○ Type 2. ISS with Padlock probe
○ Reverse transcriptase creates cDNA of the RNA target.
○ Padlock probe can hybridize to two regions of the cDNA.
○ Target sequence amplification occurs through Rolling-circle amplification (RCA).
○ RCA product is sequenced in situ by ligation.
○ Type 4. Barcode-based methods
○ Type 5. Gap-filled ISS
○ Fluorescence in situ hybridization (FISH)
○ smFISH (Single molecule FISH) (2008)
○ seqFISH (Sequential FISH) (2014): RNA transcript signals are obtained through DNase I treatment followed by sequential staining and imaging.
○ seqFISH+: A technology that uses 20 probes per encoding round and efficiently divides fluorescence channels to obtain genome-scale transcriptomes.
○ Vizgen - MERSCOPE (Technology name: MERFISH (multiplexed error-robust FISH)): ISH-based.
○ Direct probe hybridization without separate amplification mechanism.
○ Each FISH probe corresponds 1:1 with each gene (though this assumption may not always hold).
○ An error correction method is used for barcode assignment (i.e., barcode calling).
○ Step 1. Fluorescence status varies over time for each FISH probe, capturing multiple images.
○ Step 2. Inferring the corresponding gene by decoding the binary code read from each RNA.
Figure 13. Principle of MERFISH
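The barcode-calling step of MERFISH can be sketched as nearest-codeword decoding. The example below assumes a toy codebook in which any two barcodes differ in at least four bits, so a single-bit read error can be corrected and a double error is rejected. The gene names and barcodes are hypothetical and much shorter than real codebooks.

```python
# Hypothetical codebook: every pair of barcodes differs in at least 4 bits
CODEBOOK = {
    "geneA": "11110000",
    "geneB": "00001111",
    "geneC": "11001100",
    "geneD": "00110011",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(observed: str, codebook=CODEBOOK, max_errors: int = 1):
    """Return the gene whose barcode is within `max_errors` bits, else None."""
    matches = [
        gene for gene, code in codebook.items()
        if hamming(observed, code) <= max_errors
    ]
    return matches[0] if len(matches) == 1 else None


print(decode("11110000"))  # exact match
print(decode("11110100"))  # one bit misread, still assigned
print(decode("11111100"))  # two bits misread, rejected as ambiguous
```

Each bit corresponds to one imaging round in which the spot is either fluorescent (1) or dark (0), matching Steps 1 and 2 above.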
○ 10x - Xenium: ISS-based.
○ Small amount of padlock probe + Rolling circle amplification (RCA)
○ Step 1. Padlock probe binds complementary RNA transcript in a pincer shape, forming a loop.
○ Step 2. RCA (Rolling circle amplification): After loop formation, the corresponding RNA transcript is amplified.
○ Step 3. Hybridize each RNA transcript with a fluorescent probe, then perform fluorescent imaging → Washing.
○ Step 4. Repeat Step 3 and decode labels for each gene from the generated images.
Figure 14. Principle of Xenium
○ Nanostring - CosMx: ISH-based.
○ Small amount of probe + Branch chain hybridization
○ The patent dispute between Nanostring and 10x Genomics (2023) (ref1, ref2) → The bankruptcy of Nanostring (ref) → M&A of Nanostring (ref)
Figure 15. Principle of CosMx
○ FISSEQ and oligoFISSEQ
○ Veranome
○ Rebus
○ BOLORAMIS
○ STARmap and STARmap PLUS: Sequencing by ligation
○ SEDAL sequencing
○ ExSeq
○ BaristaSeq: Sequencing by synthesis
○ BARSeq and BARSeq2
○ HybISS
○ SABER
○ clampFISH
○ split-FISH
○ SCRINSHOT
○ PLISH
○ osmFISH
○ ExFISH
○ par-seqFISH
○ EASI-FISH
○ SGA
○ corrFISH
○ EEL FISH
⑤ Type 3. Spatial proteomics: Broadly classified into mass spectrometry-based and imaging-based methods.
○ SWITCH
○ MxIF
○ t-CyCIF
○ IBEX
○ DEI
○ CODEX
○ immuno-SABER
○ TSA
○ Opal IHC
○ MIBI
○ IMC
○ HD-MIBI
○ GeoMx Digital Spatial Profiler (DSP): 100 mm scale
○ Using antibodies or gene probes conjugated with UV-cleavable DNA barcodes.
⒃ Other sequencing technologies
① TCR-seq (T cell receptor sequencing): Sequencing used to track T cell subtypes and clones.
② Invade-seq: A sequencing technique for analyzing the host-microbiome.
③ Long-read sequencing: Method of the Year in 2022 (ref)
Feature | Short-read seq | Long-read seq |
---|---|---|
Release Year | Early 2000s | Mid-2010s |
Average Read Length | 150-300 bp | 5,000-10,000 bp |
Accuracy | 99.9% | 95-99% |
Table 3. Difference between short-read seq and long-read seq
Figure 16. Difference between short-read seq and long-read seq
○ Without sequencing gaps, the following analyses can be performed.
Figure 16. long-read sequencing and short-read sequencing
○ Advantage 1. AS analysis (alternative splicing analysis): Can identify alternative splicing events, isoforms, etc.
○ Advantage 2. CNV analysis (copy number variation analysis): For example, the number of repeat sequences is crucial in Huntington’s disease.
○ Advantage 3. Easier integration of epigenetics and transcriptomics
○ Example 1. Pacific Biosciences SMRT (single molecule real-time) sequencing: Average read length is ~20 kb
○ Example 2. Oxford Nanopore Sequencing
○ Average read length: ~100 kb
○ Maximum read length: >2 Mb
○ High error rate: 5-10%. For reference, the error rate of Illumina sequencing is around 0.1-1%.
④ Non-invasive sequencing
○ A technology that allows sequencing without breaking cells.
⑤ Halo-seq: A technique for obtaining the transcriptome of RNAs adjacent to a specific protein.
○ Step 1. Attach a HaloTag domain to a specific target.
○ Step 2. The HaloTag functions as a radical-producing Halo ligand, releasing a hydrogen radical (H∙) from the injected alkyne handle to generate an alkyne handle radical.
○ R-H → R· + H·
○ Step 3. Similarly, the HaloTag generates an RNA radical by releasing a hydrogen radical H· from RNA.
○ RNA-H → RNA· + H·
○ Step 4. The alkyne handle radical combines with the RNA radical.
○ Step 5. React alkyne-RNA with biotin azide to produce biotinylated RNA.
○ Step 6. Separate only the biotinylated RNA using affinity chromatography with streptavidin.
○ Step 7. RNA-seq allows for the detection of RNAs close to the specific target.
○ Reason: Radicals are unstable and cannot travel long distances.
Figure 17. Principle of Halo-seq
⑥ multi-NTT seq (Nanobody tethered transposition followed by sequencing)
⑦ Epigenomics sequencing
⑨ Temporal Sequencing
○ Live-seq
○ TMI
⑩ Spatiotemporal Omics
○ ORBIT (single-molecule DNA origami rotation measurement)
○ 4D spatiotemporal MRI or hyperpolarized MR
○ In vivo 4D omics with transparent mice
⒄ Summary of NGS (next-generation sequencing)
① Cost of genome analysis
○ 2001: $100 million per person based on the Human Genome Project.
○ 2007: $0.1 billion over 4 years.
○ 2008: $1,000,000 / person based on 454 Life Sciences.
○ 2009: $48,000 / person based on Helicos BioSciences.
○ Predicted to be sufficient with $1,000 by 2014 (Nature 456, 23-25, 2008).
② Scale of genome analysis
Figure 18. Trend of genome analysis scale
③ Relationship between depth and coverage
○ Sequencing depth (read depth): Indicates how many times, on average, a specific nucleotide is read.
Figure 19. Definition of depth
○ “10x” means it was read 10 times.
○ Can be defined for each nucleotide.
○ Coverage (c)
○ c := LN / G
○ L: Read length
○ N: Number of reads
○ G: Haploid genome length
○ Total read number is expressed as sequencing depth.
○ The relationship between sequence reads and the reference (e.g., whole genome, a locus) is expressed as coverage.
○ Apart from this distinction, depth and coverage are very similar concepts.
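The formula c = LN / G and the per-nucleotide definition of depth can be checked with a short sketch. The human example assumes a haploid genome length of about 3 Gb and 150 bp reads. The toy read positions are made up.

```python
def coverage(read_length: int, n_reads: int, genome_length: int) -> float:
    """Average coverage c = L * N / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_length * n_reads / genome_length


def per_base_depth(read_starts, read_length: int, genome_length: int):
    """Depth at each position, given 0-based start positions of the reads."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


# Assumed example: 600 million reads of 150 bp on a ~3 Gb haploid genome
print(coverage(150, 600_000_000, 3_000_000_000))

# Toy genome of 8 bases covered by three 4-base reads
depth = per_base_depth([0, 2, 4], 4, 8)
print(depth)
print(sum(depth) / len(depth))  # mean depth equals coverage(4, 3, 8)
```

The average of the per-base depths equals c when every read lies fully inside the reference, which is why the two terms are often used interchangeably.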
④ Relationship between bulk and read
○ Bulk: Total amount of RNA produced.
○ When the depth is the same, an inconsistency arises: the read count for a given RNA decreases as bulk size increases. (ref)
○ Example: In spatial transcriptomics, bulk is typically large and depth is low, resulting in low RNA read count.
○ Normalization: Various methods have been introduced to resolve this inconsistency.
⑤ Relationship between read length and number of reads
○ If read length is less than 250 bp, it is impossible to detect sequence error.
○ Relationship between read length and number of reads per run: There is a trade-off.
Figure 20. Relationship between read length and number of reads per run
⑥ Relationship between transcriptome read count and gene expression
○ Read count: Number of reads actually observed for each transcript.
○ Gene expression: Value corrected from read count through normalization process.
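One simple normalization that turns read counts into comparable expression values is counts per million (CPM). The sketch below is just one of many possible methods and is not necessarily the one intended above. The gene names and counts are hypothetical.

```python
def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Two hypothetical samples with the same composition but 10x different depth
shallow = {"geneA": 10, "geneB": 30, "geneC": 60}
deep = {"geneA": 100, "geneB": 300, "geneC": 600}

print(counts_per_million(shallow))
print(counts_per_million(deep))
```

The raw counts differ tenfold between the two samples, but the normalized values agree. This is the sense in which gene expression is a corrected version of read count.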
Entered: 2015.07.02 23:31
Updated: 2022.03.13 13:11