
Chapter 10. Genome Project and Sequencing Technology

Recommended Article: 【Biology】 Biology Table of Contents


1. Genome Project

2. Sequencing Technology


a. Pyrosequencing

b. Epigenome Sequencing

c. Bioinformatics Analysis Table of Contents

d. Transcriptomics Pipeline



1. Genome Project

⑴ Overview

① In 1990, the Human Genome Project was launched as a 15-year project by a consortium of six countries, led by Watson and Francis Collins.

② Collaborative research involving over 350 research institutions

○ On June 26, 2000, with 84.5% completed, a draft was announced.

○ On April 15, 2003, the final version with 99.99% accuracy was released.

○ More than 2,800 researchers participated over 13 years, with a total cost of $2.7 billion.

③ Spillover effects of human genome research

○ Birth of bioinformatics.

○ Facilitating the development of human protein production processes

○ Insulin: The first protein with a determined sequence.

○ Facilitating the development of automated DNA sequencing instruments

○ Promoting the genomic analysis of other biologically applicable organisms

⑵ Methodology 1. Stepwise sequencing method: Scientists’ approach

Step 1: Determining restriction enzyme recognition sites

○ After cutting DNA with restriction enzymes and performing electrophoresis, the size of each fragment can be determined.

○ Digesting the DNA with two restriction enzymes, separately and in combination, reveals the relative distances between their recognition sites.
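The single- and double-digest reasoning in Step 1 can be sketched as follows. This is a minimal illustration, not a mapping tool: the 10 kb linear molecule, the cut positions, and the helper name `fragment_sizes` are all made-up assumptions.

```python
def fragment_sizes(length, cut_sites):
    """Return the sorted fragment sizes (bp) after cutting a linear DNA of `length` bp."""
    positions = [0] + sorted(cut_sites) + [length]
    return sorted(end - start for start, end in zip(positions, positions[1:]))


# Hypothetical recognition sites (bp from the left end) on a 10 kb linear DNA.
ENZYME_A_SITES = [2000]
ENZYME_B_SITES = [7000]

single_a = fragment_sizes(10000, ENZYME_A_SITES)                  # enzyme A alone
single_b = fragment_sizes(10000, ENZYME_B_SITES)                  # enzyme B alone
double = fragment_sizes(10000, ENZYME_A_SITES + ENZYME_B_SITES)   # A and B together

print(single_a, single_b, double)
```

In this toy case the 5,000 bp fragment appears only in the double digest, which is what places the two sites 5 kb apart.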

Step 2: Constructing a gene map

○ Determining the relative distances of genes on the chromosome.

○ Inferring the distance between genes through recombination rates.
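As a rough sketch of how recombination rates translate into relative gene distances, the snippet below converts recombinant offspring counts into map units (centimorgans). The cross data and gene names A, B, C are hypothetical, and the simple proportionality is only a reasonable approximation for closely linked genes.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Approximate genetic distance in centimorgans: 1 cM = 1% recombinant offspring."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100 * recombinant_offspring / total_offspring


# Hypothetical two-point crosses, each scored on 1,000 offspring.
distance_ab = map_distance_cm(120, 1000)
distance_bc = map_distance_cm(50, 1000)
distance_ac = map_distance_cm(170, 1000)

# A-C is the largest distance and equals A-B + B-C, suggesting the order A, B, C.
print(distance_ab, distance_bc, distance_ac)
```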

Step 3: Physical map (DNA map) construction

○ Significance of determining restriction enzyme recognition sites: By using each restriction enzyme recognition site as a reference, a physical map is progressively constructed based on the sequence information of the obtained fragments.

○ Significance of gene mapping: Once a physical map is created, it can be compared with the gene map. Introns exist between genes.

○ Approach: Uses a single library.

⑶ Methodology 2. Shotgun Sequencing (Celera, J. Craig Venter): Entrepreneurial approach

① Cutting one DNA in multiple ways.

② Determining the entire nucleotide sequence of fragments cut by a single method.

○ Since the length of the analyzed sample is limited, the complete DNA sequence cannot be determined at once.

③ The fragment sequences obtained from the different cutting methods are aligned by their overlaps, and this is repeated until a consistent result is obtained.

④ A method based on computer science.
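The computational side of shotgun sequencing can be illustrated with a toy greedy overlap assembler. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers use far more elaborate graph-based methods, and the function names and reads here are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap until none remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlapping pair left; return the remaining contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping reads from a made-up 14-base sequence.
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])
print(contigs)
```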

⑷ Scientists’ approach vs. Entrepreneurial approach

① Scientific Community: The Human Genome Project (HGP), led by Watson and Francis Collins. The final cost was $2.7 billion.

② Entrepreneurial Sector: Celera Genomics (launched in 1998), led by Craig Venter. The final cost was $300 million.

③ Both groups released a draft of the human genome in 2001.

④ The scientific community was dissatisfied, believing that the entrepreneurial sector was taking credit for their contributions and investments.

⑤ The entrepreneurial sector was frustrated with the scientific community for not sharing data, leading to the development of new methodologies.

⑥ The final completion of the genome map was acknowledged as a joint achievement of both groups.


image

Figure 1. Stepwise Sequencing Method and Shotgun Sequencing Method



2. Sequencing Technology

⑴ Overview

① DNA Sequencing: Applying the principle of DNA replication.

○ Template: Each strand of DNA

○ Substrate: dNTP (dATP, dCTP, dGTP, dTTP)

○ NTPs have a hydroxyl (-OH) group attached to the 2’ carbon and serve as the building blocks for RNA polymerization.

○ DNA polymerase: Joins the 5’ phosphate of the incoming nucleotide to the 3’-OH of the deoxyribose at the end of the growing strand.

○ Synthesis direction: 5’ → 3’, forming complementary base pairs with the template.

○ ddNTP: DNA polymerization is terminated because the 3’ carbon lacks an OH group.

② RNA Sequencing: Applying the principle of RNA transcription.

③ Sequencing by Synthesis

○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a polymerase incorporates nucleotides into the primer sequence.

Type 1. CRT (Cyclic Reversible Termination): A cyclic process consisting of incorporation, imaging, and deprotection.

Type 2. SNA (Single Nucleotide Addition): Adds only one nucleotide at a time.

○ Examples: Ion semiconductor sequencing (PGM), Pyrosequencing (Roche 454)

④ Sequencing by Ligation

○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a ligase connects a hybridized sequence to the primer sequence.

○ One or more nucleotides can be read at a time (e.g., dibase encoding).

○ Examples: Nanoball sequencing, Thermo Fisher SOLiD

⑵ In vitro cloning: The very first sequencing method

⑶ Dideoxy Chain Termination Method (= Sanger sequencing): Reported in 1977. It earned Sanger his second Nobel Prize.

① Substrate: dNTP + ddNTP (in small quantities) + buffer (pH stabilization)

○ ddNTP lacks a 3’-OH group, so it terminates the polymerization reaction.

○ If ddNTP is added in large quantities, all template DNAs are quickly terminated.

② Primer

○ Example: p32-primer (CTAG)

③ 1st. Addition of template DNA and polymerase.

④ 2nd. After the polymerization reaction, heat is applied to separate the replicated strands.

⑤ 3rd. After electrophoresis, the sequence is read from an X-ray film or detected using fluorescence.


image

Figure 2. Process of Dideoxy Chain Termination Method


⑥ Advantages: It can read relatively long stretches per reaction with high accuracy, so it is still used in laboratories.

⑦ Disadvantages: Requires a large amount of the same DNA strand.
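The logic of reading a sequence from chain-terminated fragments can be sketched as below. This is a simplified model, not a simulation of the chemistry: it assumes every possible termination product is present, measures fragment length from the end of the primer, and uses a made-up six-base template. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """Newly synthesized strand (written 5'->3') for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_lengths(new_strand):
    """For each ddNTP, list the fragment lengths at which that ddNTP can terminate the chain."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest, as on a gel."""
    base_by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_by_length[length] for length in sorted(base_by_length))


new_strand = synthesized_strand("TACGGA")   # hypothetical template, 3'->5'
lanes = termination_lengths(new_strand)     # one "lane" per ddNTP
print(lanes, read_gel(lanes))
```

Because the fragments are ordered purely by length, the terminal base of each successive fragment spells out the synthesized strand in the 5’ → 3’ direction.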

⑷ Dye-dideoxy chain termination method: Uses laser-based fluorescence detection.

① A small amount of ddNTPs, each type labeled with one of four fluorescent dyes, is added to the dNTPs.

② Enables automated DNA sequencing.


image

Figure 3. Process of the Dye-dideoxy chain termination method


⑸ Ion Semiconductor Sequencing (PGM)

① Sequencing by Synthesis.

② Prone to insertion/deletion errors.

⑹ Pyrosequencing (Roche 454)

① Definition: A DNA sequencing method that measures the light produced in proportion to the amount of pyrophosphate released during DNA synthesis. ~400 bp per read.

② Sequencing by Synthesis: Light is emitted when a nucleotide is incorporated, appearing as peaks in a pyrogram.

③ Prone to insertion/deletion errors, particularly in homopolymer runs.

④ No longer in use.
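A pyrogram can be modeled as a list of (dispensed nucleotide, signal) pairs, where the signal is proportional to the number of bases incorporated in that flow. The sketch below assumes an idealized, noise-free signal, a cyclic dispensing order of T, A, C, G chosen purely for illustration, and a made-up sequence; the function names are hypothetical.

```python
def pyrogram(new_strand, flow_order="TACG"):
    """Return (nucleotide, signal) for each flow; signal = number of bases incorporated."""
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = []
    position = 0
    flow_index = 0
    while position < len(new_strand):
        base = flow_order[flow_index % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in one flow and gives a proportionally taller peak.
        while position < len(new_strand) and new_strand[position] == base:
            run += 1
            position += 1
        flows.append((base, run))
        flow_index += 1
    return flows


def call_bases(flows):
    """Convert peak heights back into a base sequence."""
    return "".join(base * signal for base, signal in flows)


flows = pyrogram("TTAGGGC")
print(flows, call_bases(flows))
```

Since a run of identical bases is reported as a single peak whose height must be quantified, misjudging the height of a peak directly produces an insertion or deletion.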


image

Figure 4. Pyrosequencing diagram


image

Figure 5. Pyrosequencing process


⑺ Nanoball sequencing

① Sequencing by Ligation

⑻ Thermo Fisher SOLiD

① Sequencing by Ligation

② No longer in use.

⑼ Illumina solid-phase amplification (ref, YouTube clip)


image

Figure 6. Illumina solid-phase amplification


image

Figure 7. Fluorescent color distribution photo


① 1st. Fragmentation: Randomly cut the given DNA sample.

② 2nd. Gel-based size selection: If necessary, the size of each DNA fragment can be restricted.

③ 3rd. Adapter binding: Adapters are attached to both ends of all DNA sample fragments using a transposome.

④ 4th. Amplification

○ 4th - 1st. Denature DNA into single-strands.

○ 4th - 2nd. Attach single-stranded DNA to the Illumina flow cell.

○ 4th - 3rd. Add enzymes to allow single-stranded DNA to form bridges on a solid-phase substrate.

○ 4th - 4th. Add primers, which bind to the single-stranded DNA bridges.

○ 4th - 5th. Add unlabeled nucleotides and induce DNA synthesis: Forms double-stranded DNA bridges.

○ 4th - 6th. Through denaturation, the double-stranded DNA bridge is converted into anchored single-stranded DNA.

○ 4th - 7th. Repeat the above six steps to create clusters of anchored single strands with the same base sequence.

○ Feature: Millions of such clusters form on the flow cell.

⑤ 5th. Only the forward strand is retained to prevent unwanted priming.

⑥ 6th. Sequencing by synthesis (SBS)

○ 6th - 1st. Four types of labeled reversible terminators (based on the nucleotide type), primers, and DNA polymerase are added.

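○ 6th - 2nd. Each incorporated reversible terminator nucleotide carries a base-specific fluorescent label, so the newly added base emits a characteristic fluorescence.

○ 6th - 3rd. Obtain a fluorescent color distribution image of each cluster.

○ 6th - 4th. Washing: The fluorescent label and the 3’ blocking group are removed so that the next nucleotide can be incorporated.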

○ 6th - 5th. Repeat the above four steps to determine the entire nucleotide sequence.

Type 1. Single-end sequencing (SES): Sequencing with only one adapter.

Type 2. Paired-end sequencing (PES): Sequencing with both adapters.

○ Initially, sequence with one adapter (Read1 acquisition), then sequence with the opposite adapter (Read2 acquisition).

○ Read1 and Read2 from the same DNA fragment can be easily matched since they come from the same cluster.

○ Advantages: Higher accuracy (due to Read1 and Read2 comparison), easy detection of DNA variations, easy analysis of repetitive sequences, and easy mapping between different species.

○ Disadvantages: Higher cost and more steps required than SES.

⑦ Read length: ~100 bp per read; recently, up to 250 bp.
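The relationship between Read1 and Read2 in paired-end sequencing can be sketched as follows. This is a simplified model that ignores adapters and sequencing errors; the fragment and the function names are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_end_reads(fragment, read_length):
    """Read1 starts at one end of the fragment; Read2 starts at the other end, on the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGCCGTAAGGA"  # hypothetical 12-base fragment
read1, read2 = paired_end_reads(fragment, 4)

# Read2, once reverse-complemented, matches the far end of the original fragment.
print(read1, read2, reverse_complement(read2))
```

Because the two reads come from opposite ends of the same fragment, the approximate distance between them is known, which is what helps with repetitive sequences and structural variation.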

⑽ WGS (Whole Genome Sequencing)

① SNV, insertion, deletion, structural variant, CNV

② Sequencing depth > 30X

⑾ WES (Whole Exome Sequencing)

① SNVs, insertions, deletions, and SNPs only in protein-coding regions (exons).

② Sequencing depth > 50X ~ 100X

③ Cost-effective.

⑿ RNA-seq

① Overview

○ Microarray data has raw data in the form of continuous values, whereas RNA-seq has raw data in the form of count data.

○ Compared to microarray data, RNA-seq can robustly capture signals even for genes with very low or very high expression levels.

② 1st. Microdissection: Separating specific tissues for RNA extraction.

○ LCM (Laser Capture Microdissection): Cutting specific tissues with a laser beam. Robust but labor-intensive.

○ TOMO-seq: Using cryosection and computer-based 3D sectioning. Not suitable for clinical purposes.

○ Transcriptome in vivo analysis

○ ProximID

○ STRP-seq

③ 2nd. Oligo(dT) is used to capture RNA by its poly-A tail.

④ 3rd. Fragment RNA: 200-400 nt.

⑤ 4th. Attach primers to RNA.

⑥ 5th. First cDNA synthesis.

⑦ 6th. Second cDNA synthesis.

⑧ 7th. Process (repair) the 3’ and 5’ ends of the double-stranded cDNA.

⑨ 8th. Ligate DNA sequencing adapters.

⑩ 9th. Amplify ligated fragments with PCR.

Application 1. dUTP method: A representative method for strand-specific sequencing.

○ Background: Used for studying biological functions based on RNA orientation (e.g., regulation of antisense miRNA).

Step 1. DNA-RNA hybrid: Synthesize cDNA (first or antisense strand) using dT primers (targeting mRNA poly-A tails) and reverse transcriptase.

5’-//-U-//-AAAAAA-3’

3’-//-A-//-TTTTTT-5’

Step 2. ds cDNA: Using dUTP instead of dTTP, the second (or sense) strand of cDNA is synthesized using the first strand of cDNA as a template.

3’-//-A-//-TTTTTT-5’

5’-//-U-//-AAAAAA-3’

Step 3. Ligated ds cDNA: Connect Y-adaptors to both ends of ds cDNA.

Step 4. When treated with UDG (uracil-DNA glycosylase), the second strand of DNA containing uracil is degraded.

Step 5. Amplify the remaining antisense strand (first strand) to create the library.

○ In the library raw data, _1.fastq represents the first strand, while _2.fastq represents the second strand.

○ Thus, _2.fastq represents the original RNA profile.

⒀ Single-cell sequencing

① Types

○ scDNA-seq

○ scRNA-seq (Method of the Year in 2013): Chromium, Smart-seq, etc.

○ single-cell epigenetics sequencing

② Step 1. Isolation of single cells

Method 1. Simple isolation: Very early method.

Method 2. Based on FACS or LCM (laser capture microdissection)

Method 3. Acoustic separation

○ Separates single cells hydrodynamically, causing minimal impact on cells.

○ CyTOF (cytometry by time of flight) is a representative method.

Method 4. Immuno-magnetic separation

○ Attach magnets to cells.

○ Can obtain a large number of cells.

○ Classified into cases where centrifugation is required and where it is not.

○ It can be largely divided into droplet-based platforms and plate-based platforms.

③ Step 2. Reverse transcription

④ Step 3. cDNA amplification

⑤ Step 4. Library construction: Drop-seq, etc.

⑥ Single-cell genomics (scDNA-seq) + Single-cell transcriptomics (scRNA-seq)

○ It allows understanding of how mutation patterns in the genome are related to gene expression in the transcriptome.

○ Technologies for separating DNA and RNA: G&T seq, SIDR-seq, DNTR-seq

⒁ Single Nucleus RNA Sequencing (snRNA-seq)

Purpose 1. Muscle fibers are multinucleated and are not captured well by scRNA-seq, so they need to be analyzed at the level of individual nuclei.

Purpose 2. Compared to scRNA-seq, snRNA-seq captures a wider variety of RNA, including intronic sequences, pre-mRNA, and non-coding RNA.

Purpose 3. In snRNA-seq, nuclear RNA is predominantly captured, but a small amount of cytoplasmic RNA is also detected.

⒂ Spatial Omics (Supplement)

① Overview


image

Figure 8. Overview of spatial omics


image

Table 1. Comparison of different spatial omics technologies


| Year | Method | Sample | Target | Resolution | Single-cell? | Probe | Area | # of cells | # of genes |
| Spot-based Spatial Transcriptomics |
| 2014 | Tomo-seq | Fresh-frozen | Transcriptome-wide | - | No | - | - | - | 12,000 genes per section |
| 2016 | Visium | Fresh-frozen and FFPE | A-tailed RNA transcripts and targeted genes | 100 μm | No | Print | 6.5 mm × 6.5 mm | 20,000 ~ 40,000 cells | ~1,700 genes per spot |
| 2017 | Geo-seq | Fresh-frozen | Transcriptome-wide | - | Yes | - | - | 5-40 cells per section | ~8,000 genes |
| 2019 | Slide-seq/V2 | Fresh-frozen | A-tailed RNA transcripts | 10 μm | No | Beads | Φ3.0 mm | - | 550 UMI per read |
| 2019 | HDST | Fresh-frozen | A-tailed RNA transcripts | 2 μm | No | Beads | 5.7 mm × 2.4 mm | ~60,000 cells per individual hexagonal | 44 UMI per 5× beads |
| 2020 | DBiT-seq | FFPE | A-tailed RNA transcripts | 25 μm | No | Free probe | 2.5 mm × 2.5 mm | ~50,000 cells | 1,000 genes per pixel |
| 2021 | Seq-Scope | Fresh-frozen | A-tailed RNA transcripts | ~0.6 μm | Yes | In situ synthesize | 0.8 mm × 1 mm | - | ~1,617 genes per cell |
| 2021 | PIC | Fresh-frozen | Transcriptome-wide | 75-5,000 μm | Yes | Photo-caged oligonucleotides | Scales with acquisition time | 10 or more cells in tissue sections | ~8,000 genes |
| 2021 | sci-Space | Fresh-frozen | A-tailed RNA transcripts | 222 μm | Yes | Beads | 18 mm × 18 mm | 121,909 cells | 1,231 genes per cell |
| 2022 | Stereo-seq | Fresh-frozen | A-tailed RNA transcripts | 0.5 μm | Yes | In situ synthesize | Up to 132 mm × 132 mm | 100,000-16,900,000 cells | 1,910 UMI and 792 genes per cell |
| 2022 | Pixel-seq | Fresh-frozen | A-tailed RNA transcripts | ~1 μm | Yes | In situ synthesize | 75 mm × 25 mm | 15,000 cells per section | ~500 genes per cell |
| 2023 | Slide-tags | Fresh-frozen | A-tailed RNA transcripts | 10 μm | Yes | Beads | Φ3.0 mm | 81,000 cells | 2,377 UMI per cell |
| 2024 | Visium HD | Fresh-frozen, FFPE | Targeted transcriptomics | 2 μm | Yes | Print | 6.5 mm × 6.5 mm | 20,000-40,000 cells | 7,605 probes |
| 2024 | Open-ST | Fresh-frozen | A-tailed RNA | 0.6 μm | Yes | In situ synthesize | 6.3 mm × 89 mm | - | ~1,000 UMI per cell |
| Image-based Spatial Transcriptomics |
| 2013 | ISS | Cells, tissue sections | Targeted RNA | - | Yes | Padlock probe | 222 × 166 μm² | 450 cells covering an area of 0.16 mm² | 39 genes |
| 2014 | FISSEQ | Cells, tissue sections | Untargeted RNA | - | Yes | Fluorescent probe | 10 mm × 10 mm | - | 4,171 genes with size >5 pixels |
| 2015 | MERFISH | Cells, tissue sections | Targeted RNA | - | Yes | Oligonucleotide probes | Scales with acquisition time | ~100, up to 40,000 cells | 130 genes, up to 10,050 genes |
| 2018 | STARmap | Tissue sections | Targeted RNA | - | Yes | Padlock probe | 1.7 mm × 1.4 mm × 0.1 mm | > 30,000 cells | 160 to 1,020 genes per sample |
| 2019 | seqFISH+ | Cells, tissue sections | Targeted RNA | - | Yes | Oligonucleotide probes | Scales with acquisition time | 2,963 cells | 10,000 genes |
| 2021 | ExSeq | Cells, tissue sections | Untargeted / targeted RNA | - | Yes | Padlock probe | 0.933 mm × 1.140 mm × 0.02 mm | ~2,000 cells | - |
| 2021 | EASI-FISH | Tissue sections | Targeted RNA | 0.23 × 0.23 × 0.42 μm | Yes | - | - | 80,000 cells | - |
| 2022 | EEL FISH | Tissue sections | Targeted RNA | - | Yes | Primary probe | Scales with acquisition time | 128,000 cells | 883 genes |
| 2023 | STARmap PLUS | Tissue sections | Targeted RNA and antibody-based protein | - | Yes | Padlock probe | 194 mm × 194 mm × 345 mm | - | 1,022 genes |
| Multi-omics-based |
| 2018 | CODEX | Cells, tissue sections | Antibody-based protein | - | Yes | - | - | - | - |
| 2020 | OligoFISSEQ | Cells, tissue sections | Genomics | - | No | Barcoded oligopaint probes | - | - | - |
| 2020 | DBiT-seq | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 10 / 25 / 50 μm | No | Free probe | 2.5 mm × 2.5 mm | ~50,000 cells | ~4,000 genes per 50 μm pixel |
| 2021 | DNA seqFISH+ | Cells | Genomics | N/A (5,616.5 ± 1,551.4 dots per cell) | Yes | Primary probe | Scales with acquisition time | 446 cells | 3,660 chromosomal loci per cell |
| 2021 | IGS | Cells, tissue sections | Genomics | 400-500 nm | Yes | Hairpin DNA | - | 113 cells | - |
| 2021 | slide-DNA-seq | Fresh-frozen | Genomics | 10 μm | Yes | Beads | Φ3 mm | 2,274 cells | - |
| 2022 | CAD-HCR | Cells | Targeted RNA and antibody-based protein | - | Yes | Padlock probe | - | - | - |
| 2022 | MOSAICA | Cells, FFPE | Targeted RNA and antibody-based protein | - | Yes | Primary code | - | - | - |
| 2022 | SM-Omics | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 55 μm | No | Print | 6.5 mm × 6.5 mm | - | - |
| 2022 | Spatial-CUT&Tag | Fresh-frozen | Epigenomics | 20 μm | No | - | - | - | H3K27me3: 9,735; H3K4me3: 3,686 per 20 μm pixel size |
| 2022 | Spatial-TREX | Fresh-frozen | A-tailed RNA transcripts and CloneIDs | Single cell/55 μm | Yes | - | - | 65,160 cells | 5,000-10,000 genes |
| 2022 | Perturb-map | Fresh-frozen | A-tailed RNA transcripts and CRISPR-targeted genes | 55 μm | Yes | - | - | - | - |
| 2022 | slide-TCR-seq | Fresh-frozen | A-tailed RNA transcripts and targeted T cell receptor genes | 10 μm | No | Beads | Φ3.0 mm | - | - |
| 2022 | SmT | Fresh-frozen | Host transcriptome- and microbiome-wide | 55 μm | No | Print | 6.5 mm × 6.5 mm | - | - |
| 2022 | RIBOmap | Cells, tissue sections | Ribosome-bound mRNAs | - | Yes | Tri-probes (split DNA probe, padlock probes, primer probes) | - | 60,481 cells | 5,413 genes |
| 2022 | epigenomic MERFISH | Cells, tissue sections | Epigenomics | - | Yes | Oligonucleotide probes | Scales with acquisition time | ~5,400 cells | 3 histone modifications, H3K4me3: 127 genes/loci; H3K27ac: 142 target genomic loci |
| 2023 | spatial ATAC | Fresh-frozen | Epigenomics | 55 μm | No | Splint oligonucleotide, spatially barcoded surface oligonucleotides | - | - | - |
| 2023 | spatial-CITE-seq | Fresh-frozen | A-tailed RNA transcripts and antibody-based protein | 50 μm | No | Free | 2.5 mm × 2.5 mm | 37,500-50,000 cells | 411 genes and 153 proteins per pixel |
| 2023 | Stereo-CITE-seq | Fresh-frozen | Targeted RNA and antibody-based protein | 500 nm | Yes | In situ synthesize | Up to 132 mm × 132 mm | 100,000-16,900,000 cells | 4.05K UMI per bin50 |
| 2023 | Spatial-ATAC-RNA-seq | Fresh-frozen | A-tailed RNA transcripts and transposase chromatin | 20 / 50 μm | No | Free | 2.5 mm × 2.5 mm | 37,500-50,000 cells | - |
| 2023 | Ex-ST | Fresh-frozen | A-tailed RNA transcripts | 20 μm | No | Print | 6.5 mm × 6.5 mm | 8,000-16,000 cells | ~500 genes per spot |
| 2023 | immuno-SABER | Cells | Antibody-based protein | - | Yes | - | - | - | - |
| 2023 | scDVP | Fresh-frozen | Proteins | - | No | - | - | - | 1,700 proteins |
| 2023 | scSpaMet | Fresh-frozen, FFPE | Metabolomics and targeted multiplexed protein | - | Yes | - | - | - | - |

Table 2. Comparison of different spatial omics technologies


image

Figure 9. Comparison of different spot-based spatial omics technologies


Type 1. Spatial genomics

Example 1. Tumor research: Tumors are heterogeneous. Also, copy number variation can be analyzed.

Example 2. Spleen research: Mature immune cells have diverse genetic compositions.

Type 2. Spatial transcriptomics: Method of the Year in 2020

Example 1. Brain cortex: White matter, Gray matter, L6, L5, L4, L3, L2, L1

Example 2. Spleen: Red pulp, White pulp, Intermediate zone

Example 3. Skin: Epidermis, Dermis, Smooth muscle

Example 4. Kidney: Cortex, Loop of Henle, Collecting duct

Example 5. Liver: Periportal, Pericentral, Intermediate

Example 6. Testis: I-III, IV-VI, VII-VIII, IX-XII

Example 7. Leaf: Upper epidermal wall, Lower epidermal wall

2-1. Spot-based spatial transcriptomics: Many genes + few spots

ST (Spatial Transcriptomics)

○ A method where barcoded oligos are randomly dispersed onto the tissue, followed by capturing mRNA from each tissue region.

10X Visium

○ Principle: Attaching spot-specific oligonucleotides to each spot to hybridize with tissue-derived RNA → obtaining spotwise transcriptomes.

○ Surface area: 6.5 mm × 6.5 mm

○ Thickness: 10 ~ 20 μm

○ Number of spots: Up to 4,992 (for the Visium version that preceded Visium HD)

○ Distance between spots: 100 μm

○ Diameter of spots: 55 μm

○ Sensitivity: 10,000 transcripts per spot

Type 1. Direct Visium (oligo-dT based method)

○ Capturing mRNA using poly dT.

○ Applicable only to FF (fresh-frozen) samples: Reagents used for FFPE are not compatible with direct Visium.


image

Figure 10. Principle of Visium FF


Type 2. Probe-based Visium

○ It can be applied to both FF (fresh-frozen) and FFPE (formalin-fixed paraffin-embedded) samples. In particular, FFPE samples cannot undergo direct Visium because their RNA is degraded: the mRNA molecules are fragmented into many pieces.

○ To identify the target mRNA, all three pairs of LHS and RHS must be ligated together: each probe’s length is 25 base pairs. RTL (probe-based RNA-templated ligation chemistry) is utilized for this purpose.


image

Figure 11. Principle of probe-based Visium


○ Advantage: Superior data quality compared to direct Visium.

○ Disadvantage: Limited freedom in analysis compared to Visium FF, as only genes specified by the probe are detected.

○ Starting from June 2024, 10x will discontinue the Visium FFPE service, except for CytAssist.

○ The CytAssist images represent the distribution of gene expression and are used for image alignment.

○ 10X Visium HD

○ The basic data consists of spots with a diameter of 2 μm, and additional data binned at 8 μm and 16 μm are also provided.
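Binning fine-grained squares into coarser ones amounts to summing counts over non-overlapping blocks. The sketch below shows that idea on a plain nested list for a single gene; it is not the 10x software, and the grid values and the function name `bin_counts` are invented for illustration. With 2 μm squares, a factor of 4 corresponds to 8 μm bins and a factor of 8 to 16 μm bins.

```python
def bin_counts(grid, factor):
    """Sum a 2D grid of counts over non-overlapping factor x factor blocks.

    Rows or columns that do not fill a complete block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    binned = [[0] * (cols // factor) for _ in range(rows // factor)]
    for r in range(rows - rows % factor):
        for c in range(cols - cols % factor):
            binned[r // factor][c // factor] += grid[r][c]
    return binned


# Hypothetical 8 x 8 patch of 2 um squares, one count per square.
patch = [[1] * 8 for _ in range(8)]

bins_8um = bin_counts(patch, 4)    # 2 x 2 grid of 8 um bins
bins_16um = bin_counts(patch, 8)   # 1 x 1 grid of 16 um bins
print(bins_8um, bins_16um)
```

Coarser bins trade spatial resolution for more counts per bin, which is why several bin sizes are useful side by side.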

Slide-seq and Slide-seq V2 (Diameter of 10 μm)

○ A method where spatial beads are randomly dispersed, followed by in-situ sequencing.

○ 97% of spots consist of one or two cell types.

HDST

○ Placing barcoded beads on a patterned wafer followed by serial hybridization.

○ NanoString GeoMx

○ The patent dispute between Nanostring and 10x Genomics in 2023 (ref1, ref2) → The bankruptcy of Nanostring (ref) → M&A of Nanostring (ref)

○ Stereo-seq

○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.

○ Diameter: 220 nm

○ Distance between spots: 500 or 715 nm

○ Seq-Scope

○ Higher spatial resolution than Visium.

○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.

PIC (photo-isolation chemistry)

○ PIXEL-seq

XYZeq (Diameter of 500 μm)

○ Placing tissue on a spatially barcoded microwell → single reverse transcription → cell removal → single-cell sequencing.

sci-Space (sc-space; Diameter of 222 μm)

○ Placing tissue on a slide with spatially gridded hashing oligos → tissue permeabilization (oligo transfer) → imaging → nuclear removal → cell fixation → sequencing.

○ sci-RNA-seq

○ TIVA-seq

○ NICHE-seq

ZipSeq

○ An imaging technology that processes cells with photocaged oligonucleotides to observe real-time patterned illumination (i.e., zipcode) of RNAs.

DBiT-seq

○ Placing barcoded oligos at fixed positions on orthogonal microfluidics to obtain location-specific transcriptomes of the tissue.

○ Slide-Tags (Diameter of 10 μm)

○ CITE-seq (ref1, ref2): Enables parallel comparison of spatial transcriptomics and antibody distribution


image

Figure 12. Diagram of CITE-seq


○ Connect the 5’ end of oligonucleotide to an antibody using streptavidin-biotin.

○ The oligonucleotide can hybridize complementarily with the oligo-dT primer.

○ Streptavidin-biotin bond can dissociate under reducing conditions.

○ Recently, perturb-CITE-seq was also developed.

○ SPOTS

○ Spatial PrOtein and Transcriptome Sequencing

○ Indirectly assess protein level on Visium using polyadenylated DNA-barcoded antibody.

○ Open-ST

○ MAGIC-seq

2-2. Image-based spatial transcriptomics: Few genes + many spots

○ ISS (In situ sequencing): Technique to sequence RNA at its original location in tissue. Sequencing by ligation

Type 1. The first ISS

Type 2. ISS with Padlock probe

○ Reverse transcriptase creates cDNA of the RNA target.

○ Padlock probe can hybridize to two regions of the cDNA.

○ Target sequence amplification occurs through Rolling-circle amplification (RCA).

○ RCA product is sequenced in situ by ligation.

Type 3. ISS using fluorescent probes and cross-linking

Type 4. Barcode-based methods

Type 5. Gap-filled ISS

Fluorescence in situ hybridization (FISH)

smFISH (Single molecule FISH) (2008)

seqFISH (Sequential FISH) (2014): RNA transcript signals are obtained through DNase I treatment followed by sequential staining and imaging.

seqFISH+: A technology that uses 20 probes per encoding round and efficiently divides fluorescence channels to obtain genome-scale transcriptomes.

Vizgen - MERSCOPE (Technology name: MERFISH (multiplexed error-robust FISH)): ISH-based.

○ Direct probe hybridization without separate amplification mechanism.

○ Each FISH probe corresponds 1:1 with each gene (though this assumption may not always hold).

○ An error correction method is used for barcode assignment (i.e., barcode calling).

Step 1. Fluorescence status varies over time for each FISH probe, capturing multiple images.

Step 2. Inferring the corresponding gene by decoding the binary code read from each RNA.
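The decoding in Step 2, together with the error correction mentioned above, can be sketched as a nearest-barcode lookup. This is a toy illustration under stated assumptions: the three gene names and 8-bit barcodes are hypothetical (chosen so that any two differ in 4 bits), and a read is accepted only if it is within one bit of exactly one barcode. It is not the actual MERFISH analysis pipeline.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(observed, codebook, max_distance=1):
    """Return the gene whose barcode is uniquely closest to `observed`, or None if ambiguous or too far."""
    scored = sorted((hamming(observed, code), gene) for gene, code in codebook.items())
    best_distance, best_gene = scored[0]
    if best_distance > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None
    return best_gene


# Hypothetical codebook: each bit is "fluorescent (1) or not (0)" in one imaging round.
CODEBOOK = {
    "GENE_A": "11110000",
    "GENE_B": "00001111",
    "GENE_C": "11001100",
}

print(decode("11110000", CODEBOOK))  # exact match
print(decode("11100000", CODEBOOK))  # one missed bit, still assigned to the nearest barcode
print(decode("11000000", CODEBOOK))  # two bits away from every barcode, rejected
```

Keeping the barcodes several bits apart is what allows a single misread round to be corrected rather than assigned to the wrong gene.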


image

Figure 13. Principle of MERFISH


10x - Xenium: ISS-based.

○ Small amount of padlock probe + Rolling circle amplification (RCA)

Step 1. Padlock probe binds complementary RNA transcript in a pincer shape, forming a loop.

Step 2. RCA (Rolling circle amplification): After loop formation, the circularized padlock probe is amplified, producing many copies of the probe sequence at the site of the transcript.

Step 3. Hybridize each RNA transcript with a fluorescent probe, then perform fluorescent imaging → Washing.

Step 4. Repeat Step 3 and decode labels for each gene from the generated images.


image

Figure 14. Principle of Xenium


Nanostring - CosMx: ISH-based.

○ Small amount of probe + Branch chain hybridization

○ The patent dispute between Nanostring and 10x Genomics (2023) (ref1, ref2) → The bankruptcy of Nanostring (ref) → M&A of Nanostring (ref)


image

Figure 15. Principle of CosMx


○ FISSEQ and oligoFISSEQ

○ Veranome

○ Rebus

○ BOLORAMIS

○ STARmap and STARmap PLUS: Sequencing by ligation

○ SEDAL sequencing

○ ExSeq

○ BaristaSeq: Sequencing by synthesis

○ BARSeq and BARSeq2

○ HybISS

○ SABER

○ clampFISH

○ split-FISH

○ SCRINSHOT

○ PLISH

○ osmFISH

○ ExFISH

○ par-seqFISH

○ EASI-FISH

○ SGA

○ corrFISH

○ EEL FISH

Type 3. Spatial proteomics: Broadly classified into mass spectrometry-based and imaging-based methods.

○ SWITCH

○ MxIF

○ t-CyCIF

○ IBEX

○ DEI

○ CODEX

○ immuno-SABER

○ TSA

○ Opal IHC

○ MIBI

○ IMC

○ HD-MIBI

○ GeoMx Digital Spatial Profiler (DSP): 100 mm scale

○ Using antibodies or gene probes conjugated with UV-cleavable DNA barcodes.

4i multiplexed imaging

⒃ Other sequencing technologies

① TCR-seq (T cell receptor sequencing): Sequencing used to track T cell subtypes and clones.

② Invade-seq: A sequencing technique for analyzing the host-microbiome.

③ Long-read sequencing: Method of the Year in 2022 (ref)


| | Short-read seq | Long-read seq |
| Release Year | Early 2000s | Mid-2010s |
| Average Read Length | 150-300 bp | 5,000-10,000 bp |
| Accuracy | 99.9% | 95-99% |

Table 3. Difference between short-read seq and long-read seq


image

Figure 16. Difference between short-read seq and long-read seq


○ Without sequencing gaps, the following analyses can be performed.


image

Figure 16. long-read sequencing and short-read sequencing


Advantage 1. AS analysis (alternative splicing analysis): Can identify alternative splicing events, isoforms, etc.

Advantage 2. CNV analysis (copy number variation analysis): For example, the number of repeat sequences is crucial in Huntington’s disease.

Advantage 3. Easier integration of epigenetics and transcriptomics

Example 1. Pacific Biosciences SMRT (single molecule real-time) sequencing: Average read length is ~20 kb

Example 2. Oxford Nanopore Sequencing

○ Average read length: ~100 kb

○ Maximum read length: >2 Mb

○ High error rate: 5-10%. For reference, the error rate of Illumina sequencing is around 0.1-1%.

④ Non-invasive sequencing

○ A technology that allows sequencing without breaking cells.

⑤ Halo-seq: A technique for obtaining the transcriptome of RNAs adjacent to a specific protein.

Step 1. Attach a HaloTag domain to a specific target.

Step 2. The HaloTag functions as a radical-producing Halo ligand, releasing a hydrogen radical (H∙) from the injected alkyne handle to generate an alkyne handle radical.

○ R-H → R· + H·

Step 3. Similarly, the HaloTag generates an RNA radical by releasing a hydrogen radical H· from RNA.

○ RNA-H → RNA· + H·

Step 4. The alkyne handle radical combines with the RNA radical.

Step 5. React alkyne-RNA with biotin azide to produce biotinylated RNA.

Step 6. Separate only the biotinylated RNA using affinity chromatography with streptavidin.

Step 7. RNA-seq allows for the detection of RNAs close to the specific target.

○ Reason: Radicals are unstable and cannot travel long distances.


image

Figure 17. Principle of Halo-seq


⑥ multi-NTT seq (Nanobody tethered transposition followed by sequencing)

⑦ Epigenomics Sequencing

⑧ 3D Sequencing

⑨ Temporal Sequencing

Record-seq

Live-seq

TMI

Molecular recording

⑩ Spatiotemporal Omics

ORBIT (single-molecule DNA origami rotation measurement)

○ 4D spatiotemporal MRI or hyperpolarized MR

○ In vivo 4D omics with transparent mice

⒄ Summary of NGS (next-generation sequencing)

① Cost of genome analysis

○ 2001: $100 million per person based on the Human Genome Project.

○ 2007: $0.1 billion over 4 years.

○ 2008: $1,000,000 / person based on 454 Life Sciences.

○ 2009: $48,000 / person based on Helicos BioSciences.

○ Predicted to fall to about $1,000 per person by 2014 (Nature 456, 23-25, 2008).

② Scale of genome analysis


image

Figure 18. Trend of genome analysis scale


③ Relationship between depth and coverage

○ Sequencing depth (read depth): Indicates how many times, on average, a given nucleotide position is read.


image

Figure 19. Definition of depth


○ “10x” means it was read 10 times.

○ Can be defined for each nucleotide.

○ Coverage (c)

○ c := LN / G

○ L: Read length

○ N: Number of reads

○ G: Haploid genome length

Comparison of depth and coverage

○ Total read number is expressed as sequencing depth.

○ The relationship between sequence reads and the reference (e.g., whole genome, a locus) is expressed as coverage.

○ Apart from this distinction, depth and coverage are very similar concepts.
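The formula c = LN / G can be applied directly. The sketch below is a minimal worked example; the haploid genome length of 3 × 10⁹ bp is a round approximation for the human genome, the read length of 150 bp is just an example value, and the function names are hypothetical.

```python
import math


def coverage(read_length, num_reads, genome_length):
    """c = L * N / G."""
    return read_length * num_reads / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest N such that L * N / G reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


GENOME_LENGTH = 3_000_000_000  # approximate haploid human genome length (bp), an assumption

print(coverage(150, 600_000_000, GENOME_LENGTH))   # 30.0
print(reads_needed(30, 150, GENOME_LENGTH))        # 600000000
```

Under these assumptions, reaching the >30X depth quoted for WGS takes roughly 600 million reads of 150 bp.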

④ Relationship between bulk and read

○ Bulk: total RNA production

○ When the depth is the same, a counterintuitive effect arises: the read count per transcript decreases as the bulk size increases. (ref)

○ Example: In spatial transcriptomics, bulk is typically large and depth is low, resulting in low RNA read count.

○ Normalization: Various methods have been introduced to correct for this effect.

⑤ Relationship between read length and number of reads

○ If read length is less than 250 bp, it is impossible to detect sequence error.

○ Relationship between read length and number of reads per run: There is a trade-off.


image

Figure 20. Relationship between read length and number of reads per run


⑥ Relationship between transcriptome read count and gene expression

○ Read count: The raw number of reads assigned to each transcript.

○ Gene expression: Value corrected from read count through normalization process.
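One simple way to turn read counts into comparable expression values is counts per million (CPM), which rescales each sample by its total read count. This is only one of many normalization schemes and is shown here as an illustrative sketch; the gene names and counts are made up.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Two hypothetical samples with the same composition but a 10-fold difference in depth.
shallow = {"geneA": 10, "geneB": 30}
deep = {"geneA": 100, "geneB": 300}

print(counts_per_million(shallow))
print(counts_per_million(deep))
```

The raw counts differ tenfold between the two samples, yet the normalized values agree, which is the point of correcting read counts before comparing expression.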



Entered: 2015.07.02 23:31

Updated: 2022.03.13 13:11
