Chapter 10. Genome Project and Sequencing Technology


1. Genome Project

⑴ Overview

① The Human Genome Project was launched in 1990 as a 15-year project by a consortium of six countries, under the leadership of James Watson and Francis Collins.

② Collaborative research involving over 350 research institutions

○ On June 26, 2000, a draft was announced with 84.5% of the genome completed.

○ On April 15, 2003, the final version with 99.99% accuracy was released.

○ More than 2,800 researchers participated over 13 years, with a total cost of $2.7 billion.

③ Spin-off effects of human genome research

○ Birth of bioinformatics.

○ Facilitating the development of human protein production processes

○ Insulin: The first protein with a determined sequence.

○ Facilitating the development of automated DNA sequencing instruments

○ Promoting the genomic analysis of other biologically applicable organisms

⑵ Methodology 1. Stepwise sequencing method: Scientists’ approach

① Step 1: Determining restriction enzyme recognition sites

○ After cutting DNA with restriction enzymes and performing electrophoresis, the size of each fragment can be determined.

○ Digesting the DNA with two restriction enzymes, separately and in combination, reveals the relative distances between their recognition sites.

② Step 2: Constructing a gene map

○ Determining the relative distances of genes on the chromosome.

○ Inferring the distance between genes through recombination rates.

③ Step 3: Physical map (DNA map) construction

○ Significance of determining restriction enzyme recognition sites: By using each restriction enzyme recognition site as a reference, a physical map is progressively constructed based on the sequence information of the obtained fragments.

○ Significance of gene mapping: Once a physical map is created, it can be compared with the gene map. Introns exist between genes.

○ Approach: Uses a single library.
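Step 2 above infers gene-to-gene distances from recombination rates. A minimal sketch of that calculation follows, assuming the usual convention that 1% recombination corresponds to 1 map unit (centimorgan). The function name and the offspring counts are hypothetical and only illustrate the arithmetic.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Return the genetic map distance in centimorgans (cM).

    Assumes 1 cM = 1% recombination frequency. The estimate is only
    reliable for closely linked genes, because double crossovers go
    undetected at larger distances.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie in [0, total_offspring]")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical test-cross counts, chosen only for illustration.
distance_ab = map_distance_cm(recombinant_offspring=170, total_offspring=1000)
print(f"A-B distance: {distance_ab:.1f} cM")
```

Distances obtained this way give the relative order of genes on the gene map, which can then be compared with the physical map built in Step 3.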

⑶ Methodology 2. Shotgun Sequencing (Celera, J. Craig Venter): Entrepreneurial approach

① Cutting one DNA in multiple ways.

② Determining the entire nucleotide sequence of fragments cut by a single method.

○ Since the length of the analyzed sample is limited, the complete DNA sequence cannot be determined at once.

③ The fragment sequences obtained from each cutting method are aligned by their overlapping regions, and this is repeated until consistent results are obtained.

④ A method based on computer science.
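The computational core of shotgun sequencing is joining fragments by their overlaps. The sketch below uses a greedy merge of the best-overlapping pair, which is a simplification chosen for illustration and not the assembler actually used by Celera. The fragment sequences, the function names, and the `min_overlap` parameter are all hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)  # (overlap size, index of left, index of right)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:  # no remaining pair overlaps: assembly is finished
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical fragments of one short DNA molecule.
reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(reads))
```

Greedy merging can fail on repetitive sequences, which is one reason real assemblers rely on more elaborate graph-based methods and on high redundancy of reads.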

⑷ Scientists’ approach vs. Entrepreneurial approach

① Scientific Community: The Human Genome Project (HGP), led by Watson and Francis Collins. The final cost was $2.7 billion.

② Entrepreneurial Sector: Celera Genomics (launched in 1998), led by Craig Venter. The final cost was $300 million.

③ Both groups released a draft of the human genome in 2001.

④ The scientific community was dissatisfied, believing that the entrepreneurial sector was taking credit for their contributions and investments.

⑤ The entrepreneurial sector was frustrated with the scientific community for not sharing data, leading to the development of new methodologies.

⑥ The final completion of the genome map was acknowledged as a joint achievement of both groups.

Figure 1. Stepwise Sequencing Method and Shotgun Sequencing Method

⑸ Bioinformatics Consortium After Genome Project

① ENCODE Project

② 1000 Genomes Project

③ GTEx consortium

④ 4D Nucleome consortium

⑤ Pan-genome consortium

⑥ Cellxgene census

⑦ Billion Cells Project

⑧ GREGoR consortium

⑨ Atlas Project

⑩ Zoonomia Consortium: 241 genomes + additional primate genomes

⑪ Autism Sequencing Consortium

⑫ SCHEMA consortium

2. Sequencing Technology

⑴ Overview

① DNA Sequencing: Applying the principle of DNA replication.

○ Template: Each strand of DNA

○ Substrate: dNTP (dATP, dCTP, dGTP, dTTP)

○ NTPs have a hydroxyl (-OH) group attached to the 2’ carbon and serve as the building blocks for RNA polymerization.

○ DNA polymerase: Joins the 5’-phosphate of the incoming nucleotide to the 3’-OH of the deoxyribose at the end of the growing strand.

○ Synthesis direction: 5’ → 3’, forming complementary base pairs with the template.

○ ddNTP: DNA polymerization is terminated because the 3’ carbon lacks an OH group.

② RNA Sequencing: Applying the principle of RNA transcription.

③ Sequencing by Synthesis

○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a polymerase incorporates nucleotides into the primer sequence.

○ Type 1. CRT (Cyclic Reversible Termination): A cyclic process consisting of incorporation, imaging, and deprotection.

○ Type 2. SNA (Single Nucleotide Addition): Adds only one nucleotide at a time.

○ Examples: Ion semiconductor sequencing (PGM), Pyrosequencing (Roche 454)

④ Sequencing by Ligation

○ Definition: A method of determining nucleotide sequences by detecting signals emitted when a ligase connects a hybridized sequence to the primer sequence.

○ One or more nucleotides can be incorporated at a time (e.g., dibase).

○ Examples: Nanoball sequencing, Thermo Fisher SOLiD

⑵ In vitro cloning: The very first sequencing method

⑶ Dideoxy Chain Termination Method (= Sanger sequencing): Reported in 1977. Sanger’s second Nobel Prize.

① Substrate: dNTP + ddNTP (in small quantities) + buffer (pH stabilization)

○ ddNTP lacks a 3’-OH group, so it terminates the polymerization reaction.

○ If ddNTP is added in large quantities, all template DNAs are quickly terminated.

② Primer

○ Example: 32P-labeled primer (CTAG)

③ 1^st. Addition of template DNA and polymerase.

④ 2^nd. After the polymerization reaction, heat is applied to separate the replicated strands.

⑤ 3^rd. After electrophoresis, the sequence is read from an X-ray film or detected using fluorescence.

Figure 2. Process of Dideoxy Chain Termination Method

⑥ Advantages: Reads are relatively long and highly accurate, so it is still used in laboratories.

⑦ Disadvantages: Requires a large amount of the same DNA strand.
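The logic of reading a sequence from chain-terminated fragments can be sketched as follows. This is an idealized model that assumes every possible termination product is present and that fragment lengths are counted from the end of the primer. The template sequence and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3_to_5: str) -> str:
    """New strand (written 5'->3') complementary to a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the lengths of fragments ending in that base."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest fragment to the longest one."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


template = "TACGGTCA"  # hypothetical template, written 3'->5'
lanes = termination_fragments(synthesized_strand(template))
print(lanes)
print(read_gel(lanes))
```

Because the shortest fragment migrates farthest during electrophoresis, reading the bands from the far end of the gel toward the wells gives the synthesized strand in the 5’ → 3’ direction.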

⑷ Dye-dideoxy chain termination method: Using laser.

① A small amount of ddNTPs is added to dNTPs, each labeled with one of four fluorescent dyes.

② Enables automated DNA sequencing.

Figure 3. Process of the Dye-dideoxy chain termination method

⑸ Ion Semiconductor Sequencing (PGM)

① Sequencing by Synthesis.

② Prone to insertion/deletion errors.

⑹ Pyrosequencing (Roche 454)

① Definition: A DNA sequencing method in which the light emitted is proportional to the amount of pyrophosphate released during DNA synthesis. ~400 bp per read.

② Sequencing by Synthesis: Light is emitted when a nucleotide is incorporated, appearing as peaks in a pyrogram.

③ Prone to insertion/deletion errors.

④ No longer in use.

Figure 4. Pyrosequencing diagram

Figure 5. Pyrosequencing process
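The relationship between nucleotide flows and pyrogram peaks can be sketched with an idealized, noise-free model in which the peak height equals the number of bases incorporated in that flow. The flow order `TACG`, the example strand, and the function name are assumptions made for illustration.

```python
def pyrogram(new_strand: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Return (dispensed nucleotide, signal) for each flow.

    The signal is the number of bases incorporated during the flow, which in
    this idealized model is proportional to the pyrophosphate released.
    """
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    flows = []
    position = 0
    flow_index = 0
    while position < len(new_strand):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run = 0
        # A homopolymer stretch is incorporated entirely within a single flow.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            run += 1
            position += 1
        flows.append((nucleotide, run))
        flow_index += 1
    return flows


peaks = pyrogram("TTAGGGC")
print(peaks)
# Decoding repeats each dispensed nucleotide according to its peak height.
print("".join(nucleotide * signal for nucleotide, signal in peaks))
```

In practice the peak height for a long homopolymer (for example, GGG versus GGGG) is hard to resolve, which is consistent with the insertion/deletion errors noted above.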

⑺ Nanoball sequencing

① Sequencing by Ligation

⑻ Thermo Fisher SOLiD

① Sequencing by Ligation

② No longer in use.

⑼ Illumina solid-phase amplification (ref, YouTube clip)

Figure 6. Illumina solid-phase amplification

Figure 7. Fluorescent color distribution photo

① 1^st. Fragmentation: Randomly cut the given DNA sample.

② 2^nd. Gel-based size selection: If necessary, the size of each DNA fragment can be restricted.

③ 3^rd. Adapter binding: Adapters are attached to both ends of all DNA sample fragments using a transposome.

④ 4^th. Amplification

○ 4^th - 1^st. Denature DNA into single-strands.

○ 4^th - 2^nd. Attach single-stranded DNA to the Illumina flow cell.

○ 4^th - 3^rd. Add enzymes to allow single-stranded DNA to form bridges on a solid-phase substrate.

○ 4^th - 4^th. Add primers, which anneal to the single-stranded DNA bridges.

○ 4^th - 5^th. Add unlabeled nucleotides and DNA polymerase to induce DNA synthesis: Forms double-stranded DNA bridges.

○ 4^th - 6^th. Through denaturation, the double-stranded DNA bridge is converted into anchored single-stranded DNA.

○ 4^th - 7^th. Repeat the above six steps to create anchored single-stranded clusters with the same base sequence.

○ Feature: Anchored single-stranded clusters form millions of clusters.

⑤ 5^th. Only the forward strand is retained to prevent unwanted priming.

⑥ 6^th. Sequencing by synthesis (SBS)

○ 6^th - 1^st. Four types of labeled reversible terminators (based on the nucleotide type), primers, and DNA polymerase are added.

○ 6^th - 2^nd. After a labeled reversible terminator nucleotide is incorporated through a phosphodiester bond, the cluster is excited and emits fluorescence specific to that nucleotide type.

○ 6^th - 3^rd. Obtain a fluorescent color distribution image of each cluster.

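○ 6^th - 4^th. Washing and deprotection: The fluorescent label and the terminating group are removed so that the next nucleotide can be incorporated.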

○ 6^th - 5^th. Repeat the above four steps to determine the entire nucleotide sequence.

○ Type 1. Single-end sequencing (SES): Sequencing with only one adapter.

○ Type 2. Paired-end sequencing (PES): Sequencing with both adapters.

○ Initially, sequence with one adapter (Read1 acquisition), then sequence with the opposite adapter (Read2 acquisition).

○ Read1 and Read2 from the same DNA fragment can be easily matched since they come from the same cluster.

○ Advantages: Higher accuracy (due to Read1 and Read2 comparison), easy detection of DNA variations, easy analysis of repetitive sequences, and easy mapping between different species.

○ Disadvantages: Higher cost and more steps required than SES.

⑦ Read length: ~100 bp per read; more recently, up to 250 bp.
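The imaging step of sequencing by synthesis can be sketched as picking, for each cycle, the fluorescence channel with the strongest signal. This assumes a simple four-channel readout with one channel per base. Real instruments apply further corrections that are not modeled here, and the intensity values and names below are hypothetical.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels


def call_bases(cycle_intensities: list[list[float]]) -> str:
    """Call one base per cycle as the channel with the highest intensity."""
    read = []
    for intensities in cycle_intensities:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel (A, C, G, T)")
        brightest = max(range(len(CHANNELS)), key=lambda idx: intensities[idx])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities of a single cluster over four cycles.
cluster_signal = [
    [0.9, 0.1, 0.0, 0.1],  # cycle 1: channel A is brightest
    [0.1, 0.0, 0.8, 0.2],  # cycle 2: channel G is brightest
    [0.0, 0.1, 0.1, 0.7],  # cycle 3: channel T is brightest
    [0.2, 0.9, 0.1, 0.0],  # cycle 4: channel C is brightest
]
print(call_bases(cluster_signal))  # AGTC
```

Because every cluster is imaged in every cycle, the same procedure runs in parallel over millions of clusters, and the number of cycles sets the read length.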

⑽ WGS (Whole Genome Sequencing)

① SNV, insertion, deletion, structural variant, CNV

② Sequencing depth > 30X

⑾ WES (Whole Exome Sequencing)

① SNVs, insertions, deletions, and SNPs only in protein-coding genes.

② Sequencing depth > 50X ~ 100X

③ Cost-effective.

⑿ RNA-seq

① Overview

○ Microarray data has raw data in the form of continuous values, whereas RNA-seq has raw data in the form of count data.

○ Compared to microarray data, RNA-seq can robustly capture signals even for genes with very low or very high expression levels.

② 1^st. Microdissection: Separating specific tissues for RNA extraction.

○ LCM (Laser Capture Microdissection): Cutting specific tissues with a laser beam. Robust but labor-intensive.

○ TOMO-seq: Using cryosection and computer-based 3D sectioning. Not suitable for clinical purposes.

○ Transcriptome in vivo analysis

○ ProximID

○ STRP-seq

③ 2^nd. Oligo(dT) is used to capture mRNA through its poly(A) tail.

④ 3^rd. Fragment RNA: 200-400 nt.

⑤ 4^th. Attach primers to RNA.

⑥ 5^th. First cDNA synthesis.

⑦ 6^th. Second cDNA synthesis.

⑧ 7^th. Process the 3’ and 5’ ends of the double-stranded cDNA.

⑨ 8^th. Ligate DNA sequencing adapters.

⑩ 9^th. Amplify ligated fragments with PCR.

⑪ Application 1. dUTP method: A representative method for strand-specific sequencing.

○ Background: Used for studying biological functions based on RNA orientation (e.g., regulation of antisense miRNA).

○ Step 1. DNA-RNA hybrid: Synthesize cDNA (first or anti-sense strand) using dT primers (targeting mRNA poly-A tails) and reverse transcriptase.

5’-//-U-//-AAAAAA-3’

3’-//-A-//-TTTTTT-5’

○ Step 2. ds cDNA: Using dUTP instead of dTTP, the second (or sense) strand of cDNA is synthesized using the first strand of cDNA as a template.

3’-//-A-//-TTTTTT-5’

5’-//-U-//-AAAAAA-3’

○ Step 3. Ligated ds cDNA: Connect Y-adaptors to both ends of ds cDNA.

○ Step 4. When treated with UDG (uracil-DNA glycosylase), the second strand of DNA containing uracil is degraded.

○ Step 5. Amplify the remaining antisense strand (first strand) to create the library.

○ In the library raw data, _1.fastq represents the first strand, while _2.fastq represents the second strand.

○ Thus, _2.fastq represents the original RNA profile.
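The strand relationship in the dUTP method can be checked with a short sketch: if Read 1 has the sequence of the first (antisense) cDNA strand, its reverse complement recovers the original RNA. The RNA sequence and function names are hypothetical, and the sketch ignores adapters and sequencing errors.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def first_strand_cdna(rna: str) -> str:
    """First-strand (antisense) cDNA, written 5'->3', for an RNA written 5'->3'."""
    return reverse_complement(rna.replace("U", "T"))


def rna_sense_from_read1(read1: str) -> str:
    """Recover the sense RNA sequence from an antisense Read 1."""
    return reverse_complement(read1).replace("T", "U")


rna = "AUGGCCUUA"                # hypothetical transcript fragment
read1 = first_strand_cdna(rna)   # what _1.fastq would contain in this model
print(read1)
print(rna_sense_from_read1(read1))
```

This is why strand-aware tools need to be told which read corresponds to the sense strand: with a dUTP library, Read 1 maps to the strand opposite to the gene.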

⒀ Single-cell sequencing

① Types

○ scDNA-seq

○ scRNA-seq (Method of the Year in 2013): Chromium, Smart-seq, etc.

○ single-cell epigenetics sequencing

② Step 1. Isolation of single cells

○ Method 1. Simple isolation: Very early method.

○ Method 2. Based on FACS or LCM (laser capture microdissection)

○ Method 3. Acoustic separation

○ Separates single cells hydrodynamically, causing minimal impact on cells.

○ CyTOF (cytometry by time of flight) is a representative method.

○ Method 4. Immuno-magnetic separation

○ Attach antibody-conjugated magnetic beads to cells.

○ Can obtain a large number of cells.

○ Classified into cases where centrifugation is required and where it is not.

③ Step 2. Reverse transcription

④ Step 3. cDNA amplification

⑤ Step 4. Library construction: Drop-seq, etc.

⑥ Single-cell genomics (scDNA-seq) + Single-cell transcriptomics (scRNA-seq)

○ It allows understanding of how mutation patterns in the genome are related to gene expression in the transcriptome.

○ Technologies for separating DNA and RNA: G&T seq, SIDR-seq, DNTR-seq

⒁ Single Nucleus RNA Sequencing (snRNA-seq)

① Purpose 1. Muscle fibers are multinucleated cells that scRNA-seq does not capture well, so they need to be analyzed at the level of individual nuclei.

② Purpose 2. Compared to scRNA-seq, snRNA-seq captures a wider variety of RNA, including introns, pre-mRNA, and non-coding RNA.

③ Purpose 3. In snRNA-seq, nuclear RNA is predominantly captured, but a small amount of cytoplasmic RNA is also detected.

④ Purpose 4. Since snRNA-seq captures mostly nascent RNA, it reflects the state of the cell better than scRNA-seq.

⒂ Spatial Omics (▶ Supplement)

① Overview

Figure 8. Overview of spatial omics

Year	Method	Sample	Target	Resolution	Single-cell?	Probe	Area	# of cells	# of genes
Spot-based Spatial Transcriptomics
2014	Tomo-seq	Fresh-frozen	Transcriptome-wide	-	No	-	-	-	12,000 genes per section
2016	Visium	Fresh-frozen and FFPE	A-tailed RNA transcripts and targeted genes	100 μm	No	Print	6.5 mm × 6.5 mm	20,000 ~ 40,000 cells	~1,700 genes per spot
2017	Geo-seq	Fresh-frozen	Transcriptome-wide	-	Yes	-	-	5-40 cells per section	~8,000 genes
2019	Slide-seq/V2	Fresh-frozen	A-tailed RNA transcripts	10 μm	No	Beads	Φ3.0 mm	-	550 UMI per read
2019	HDST	Fresh-frozen	A-tailed RNA transcripts	2 μm	No	Beads	5.7 mm × 2.4 mm	~60,000 cells per individual hexagonal	44 UMI per 5× beads
2020	DBiT-seq	FFPE	A-tailed RNA transcripts	25 μm	No	Free probe	2.5 mm × 2.5 mm	~50,000 cells	1,000 genes per pixel
2021	Seq-Scope	Fresh-frozen	A-tailed RNA transcripts	~0.6 μm	Yes	In situ synthesize	0.8 mm × 1 mm	-	~1,617 genes per cell
2021	PIC	Fresh-frozen	Transcriptome-wide	75-5,000 μm	Yes	Photo-caged oligonucleotides	Scales with acquisition time	10 or more cells in tissue sections	~8,000 genes
2021	sci-Space	Fresh-frozen	A-tailed RNA transcripts	222 μm	Yes	Beads	18 mm × 18 mm	121,909 cells	1,231 genes per cell
2022	Stereo-seq	Fresh-frozen	A-tailed RNA transcripts	0.5 μm	Yes	In situ synthesize	Up to 132 mm × 132 mm	100,000-16,900,000 cells	1,910 UMI and 792 genes per cell
2022	Pixel-seq	Fresh-frozen	A-tailed RNA transcripts	~1 μm	Yes	In situ synthesize	75 mm × 25 mm	15,000 cells per section	~500 genes per cell
2023	Slide-tags	Fresh-frozen	A-tailed RNA transcripts	10 μm	Yes	Beads	Φ3.0 mm	81,000 cells	2,377 UMI per cell
2024	Visium HD	Fresh-frozen, FFPE	Targeted transcriptomics	2 μm	Yes	Print	6.5 mm × 6.5 mm	20,000-40,000 cells	7,605 probes
2024	Open-ST	Fresh-frozen	A-tailed RNA	0.6 μm	Yes	In situ synthesize	6.3 mm × 89 mm	-	~1,000 UMI per cell
Image-based Spatial Transcriptomics
2013	ISS	Cells, tissue sections	Targeted RNA	-	Yes	Padlock probe	222 × 166 μm²	450 cells covering an area of 0.16 mm²	39 genes
2014	FISSEQ	Cells, tissue sections	Untargeted RNA	-	Yes	Fluorescent probe	10 mm × 10 mm	-	4,171 genes with size >5 pixels
2015	MERFISH	Cells, tissue sections	Targeted RNA	-	Yes	Oligonucleotide probes	Scales with acquisition time	~100, up to 40,000 cells	130 genes, up to 10,050 genes
2018	STARmap	Tissue sections	Targeted RNA	-	Yes	Padlock probe	1.7 mm × 1.4 mm × 0.1 mm	> 30,000 cells	160 to 1,020 genes per sample
2019	seqFISH+	Cells, tissue sections	Targeted RNA	-	Yes	Oligonucleotide probes	Scales with acquisition time	2,963 cells	10,000 genes
2021	ExSeq	Cells, tissue sections	Untargeted / targeted RNA	-	Yes	Padlock probe	0.933 mm × 1.140 mm × 0.02 mm	~2,000 cells	-
2021	EASI-FISH	Tissue sections	Targeted RNA	0.23 × 0.23 × 0.42 μm	Yes	-	-	80,000 cells	-
2022	EEL FISH	Tissue sections	Targeted RNA	-	Yes	Primary probe	Scales with acquisition time	128,000 cells	883 genes
2023	STARmap PLUS	Tissue sections	Targeted RNA and antibody-based protein	-	Yes	Padlock probe	194 mm × 194 mm × 345 mm	-	1,022 genes
Multi-omics-based
2018	CODEX	Cells, tissue sections	Antibody-based protein	-	Yes	-	-	-	-
2020	OligoFISSEQ	Cells, tissue sections	Genomics	-	No	Barcoded oligopaint probes	-	-	-
2020	DBiT-seq	Fresh-frozen	A-tailed RNA transcripts and antibody-based protein	10 / 25 / 50 μm	No	Free probe	2.5 mm × 2.5 mm	~50,000 cells	~4,000 genes per 50 μm pixel
2021	DNA seqFISH+	Cells	Genomics	N/A (5,616.5 ± 1,551.4 dots per cell)	Yes	Primary probe	Scales with acquisition time	446 cells	3,660 chromosomal loci per cell
2021	IGS	Cells, tissue sections	Genomics	400-500 nm	Yes	Hairpin DNA	-	113 cells	-
2021	slide-DNA-seq	Fresh-frozen	Genomics	10 μm	Yes	Beads	Φ3 mm	2,274 cells	-
2022	CAD-HCR	Cells	Targeted RNA and antibody-based protein	-	Yes	Padlock probe	-	-	-
2022	MOSAICA	Cells, FFPE	Targeted RNA and antibody-based protein	-	Yes	Primary code	-	-	-
2022	SM-Omics	Fresh-frozen	A-tailed RNA transcripts and antibody-based protein	55 μm	No	Print	6.5 mm × 6.5 mm	-	-
2022	Spatial-CUT&Tag	Fresh-frozen	Epigenomics	20 μm	No	-	-	-	H3K27me3: 9,735; H3K4me3: 3,686 per 20 μm pixel size
2022	Spatial-TREX	Fresh-frozen	A-tailed RNA transcripts and CloneIDs	Single cell/55 μm	Yes	-	-	65,160 cells	5,000-10,000 genes
2022	Perturb-map	Fresh-frozen	A-tailed RNA transcripts and CRISPR-targeted genes	55 μm	Yes	-	-	-	-
2022	slide-TCR-seq	Fresh-frozen	A-tailed RNA transcripts and targeted T cell receptor genes	10 μm	No	Beads	Φ3.0 mm	-	-
2022	SmT	Fresh-frozen	Host transcriptome- and microbiome-wide	55 μm	No	Print	6.5 mm × 6.5 mm	-	-
2022	RIBOmap	Cells, tissue sections	Ribosome-bound mRNAs	-	Yes	Tri-probes (split DNA probe, padlock probes, primer probes)	-	60,481 cells	5,413 genes
2022	epigenomic MERFISH	Cells, tissue sections	Epigenomics	-	Yes	Oligonucleotide probes	Scales with acquisition time	~5,400 cells	3 histone modifications, H3K4me3: 127 genes/loci; H3K27ac: 142 target genomic loci
2023	spatial ATAC	Fresh-frozen	Epigenomics	55 μm	No	Splint oligonucleotide, spatially barcoded surface oligonucleotides	-	-	-
2023	spatial-CITE-seq	Fresh-frozen	A-tailed RNA transcripts and antibody-based protein	50 μm	No	Free	2.5 mm × 2.5 mm	37,500-50,000 cells	411 genes and 153 proteins per pixel
2023	Stereo-CITE-seq	Fresh-frozen	Targeted RNA and antibody-based protein	500 nm	Yes	In situ synthesize	Up to 132 mm × 132 mm	100,000-16,900,000 cells	4.05K UMI per bin50
2023	Spatial-ATAC-RNA-seq	Fresh-frozen	A-tailed RNA transcripts and transposase chromatin	20 / 50 μm	No	Free	2.5 mm × 2.5 mm	37,500-50,000 cells	-
2023	Ex-ST	Fresh-frozen	A-tailed RNA transcripts	20 μm	No	Print	6.5 mm × 6.5 mm	8,000-16,000 cells	~500 genes per spot
2023	immuno-SABER	Cells	Antibody-based protein	-	Yes	-	-	-	-
2023	scDVP	Fresh-frozen	Proteins	-	No	-	-	-	1,700 proteins
2023	scSpaMet	Fresh-frozen, FFPE	Metabolomics and targeted multiplexed protein	-	Yes	-	-	-	-

Table 1. Comparison of different spatial omics technologies

Figure 9. Comparison of different spot-based spatial omics technologies

② Type 1. Spatial genomics

○ Example 1. Tumor research: Tumors are heterogeneous. Also, copy number variation can be analyzed.

○ Example 2. Spleen research: Mature immune cells have diverse genetic compositions.

③ Type 2. Spatial transcriptomics: Method of the Year in 2020

○ Example 1. Brain cortex: White matter, Gray matter, L6, L5, L4, L3, L2, L1

○ Example 2. Spleen: Red pulp, White pulp, Intermediate zone

○ Example 3. Skin: Epidermis, Dermis, Smooth muscle

○ Example 4. Kidney: Cortex, Henle loop, Collecting duct

○ Example 5. Liver: Periportal, Pericentral, Intermediate

○ Example 6. Testis: I-III, IV-VI, VII-VIII, IX-XII

○ Example 7. Leaf: Upper epidermal wall, Lower epidermal wall

④ 2-1. Spot-based spatial transcriptomics: Many genes + few spots

○ ST (Spatial Transcriptomics)

○ A method where barcoded oligos are randomly dispersed onto the tissue, followed by capturing mRNA from each tissue region.

○ 10X Visium

○ Principle: Attaching spot-specific oligonucleotides to each spot to hybridize with tissue-derived RNA → obtaining spotwise transcriptomes.

○ Surface area: 6.5 mm × 6.5 mm

○ Thickness: 10 ~ 20 μm

○ Number of spots: Up to 4,992 (for the Visium version preceding Visium HD)

○ Distance between spots: 100 μm

○ Diameter of spots: 55 μm

○ Sensitivity: 10,000 transcripts per spot

○ Type 1. Direct Visium (oligo-dT based method)

○ Capturing mRNA using poly dT.

○ Applicable only to FF (fresh-frozen) samples: Reagents used for FFPE are not compatible with direct Visium.

Figure 10. Principle of Visium FF

○ Type 2. Probe-based Visium

○ It can be applied to both FF (fresh-frozen) and FFPE (formalin-fixed paraffin-embedded) samples. In particular, FFPE samples cannot undergo direct Visium because their RNA is degraded, with mRNA molecules fragmented into many pieces.

○ To identify the target mRNA, all three pairs of LHS and RHS must be ligated together: each probe’s length is 25 base pairs. RTL (probe-based RNA-templated ligation chemistry) is utilized for this purpose.

Figure 11. Principle of probe-based Visium

○ Advantage: Superior data quality compared to direct Visium.

○ Disadvantage: Limited freedom in analysis compared to Visium FF, as only genes specified by the probe are detected.

○ Starting from June 2024, 10x will discontinue the Visium FFPE service, except for CytAssist.

○ The CytAssist images represent the distribution of gene expression and are used for image alignment.

○ 10x Visium HD

○ The basic data consists of spots with a diameter of 2 μm, and additional data binned at 8 μm and 16 μm are also provided.

○ Slide-seq and Slide-seq V2 (Diameter of 10 μm)

○ A method where spatial beads are randomly dispersed, followed by in-situ sequencing.

○ 97% of spots consist of one or two cell types.

○ Since it labels the cell nucleus itself, nuclear segmentation is not required.

○ Curio Seeker: The commercialized version of Slide-seq V2.

○ HDST

○ Placing barcoded beads on a patterned wafer followed by serial hybridization.

○ NanoString GeoMx

○ The patent dispute between NanoString and 10x Genomics in 2023 (ref1, ref2) → The bankruptcy of NanoString (ref) → M&A of NanoString (ref)

○ Stereo-seq

○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.

○ Diameter: 220 nm

○ Distance between spots: 500 or 715 nm

○ Seq-Scope

○ Higher spatial resolution than Visium.

○ Oligo patterning engraved on a flow cell is read using Illumina or MGI sequencing, followed by barcode calling.

○ PIC (photo-isolation chemistry)

○ PIXEL-seq

○ XYZeq (Diameter of 500 μm)

○ Placing tissue on a spatially barcoded microwell → single reverse transcription → cell removal → single-cell sequencing.

○ sci-Space (sc-space; Diameter of 222 μm)

○ Placing tissue on a slide with spatially gridded hashing oligos → tissue permeabilization (oligo transfer) → imaging → nuclear removal → cell fixation → sequencing.

○ sci-RNA-seq

○ TIVA-seq

○ NICHE-seq

○ ZipSeq

○ A technology that uses patterned illumination and photocaged oligonucleotides to print barcodes (i.e., zipcodes) onto cells in real time.

○ DBiT-seq

○ Placing barcoded oligos at fixed positions on orthogonal microfluidics to obtain location-specific transcriptomes of the tissue.

○ Slide-Tags (Diameter of 10 μm)

○ CITE-seq (ref1, ref2): Enables parallel comparison of spatial transcriptomics and antibody distribution

Figure 12. Diagram of CITE-seq

○ Connect the 5’ end of oligonucleotide to an antibody using streptavidin-biotin.

○ The oligonucleotide can hybridize complementarily with the oligo-dT primer.

○ Streptavidin-biotin bond can dissociate under reducing conditions.

○ Recently, perturb-CITE-seq was also developed.

○ SPOTS

○ Spatial PrOtein and Transcriptome Sequencing

○ Indirectly assess protein level on Visium using polyadenylated DNA-barcoded antibody.

○ Open-ST

○ Nova-ST

○ MAGIC-seq

○ spatial Geo-seq

④ 2-2. Image-based spatial transcriptomics: Few genes + many spots

○ ISS (In situ sequencing): Technique to sequence RNA at its original location in tissue. Sequencing by ligation

○ Type 1. The first ISS

○ Type 2. ISS with Padlock probe

○ Reverse transcriptase creates cDNA of the RNA target.

○ Padlock probe can hybridize to two regions of the cDNA.

○ Target sequence amplification occurs through Rolling-circle amplification (RCA).

○ RCA product is sequenced in situ by ligation.

○ Type 3. ISS using fluorescent probes and cross-linking

○ Type 4. Barcode-based methods

○ Type 5. Gap-filled ISS

○ Fluorescence in situ hybridization (FISH)

○ smFISH (Single molecule FISH) (2008)

○ seqFISH (Sequential FISH) (2014): RNA transcript signals are obtained through DNase I treatment followed by sequential staining and imaging. 12 pseudocolors, each requiring 4 rounds. Each barcode contains 5 pseudocolors.

○ seqFISH+: A technology that uses 20 probes per encoding round and efficiently divides fluorescence channels to obtain genome-scale transcriptomes.

○ Vizgen - MERSCOPE (Technology name: MERFISH (multiplexed error-robust FISH)): ISH-based.

○ Direct probe hybridization without separate amplification mechanism.

○ Each FISH probe corresponds 1:1 with each gene (though this assumption may not always hold).

○ An error correction method is used for barcode assignment (i.e., barcode calling): Hamming code

○ Step 1. Fluorescence status varies over time for each FISH probe, capturing multiple images.

○ Step 2. Inferring the corresponding gene by decoding the binary code read from each RNA.

Figure 13. Principle of MERFISH
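The error-robust barcode calling described for MERFISH can be sketched as nearest-codeword decoding by Hamming distance. The codebook below is hypothetical and is built so that any two codewords differ in at least 4 bits, which allows a single bit error to be corrected. The gene names, bit patterns, and function names are illustrative assumptions, not Vizgen's actual codebook or software.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(observed: str, codebook: dict[str, str], max_distance: int = 1):
    """Return the gene whose codeword is uniquely closest, or None if unassignable."""
    distances = {gene: hamming_distance(observed, code) for gene, code in codebook.items()}
    best = min(distances.values())
    if best > max_distance:
        return None  # too many bit errors to assign confidently
    winners = [gene for gene, dist in distances.items() if dist == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical 8-bit codebook; each bit is one imaging round (1 = fluorescent).
CODEBOOK = {
    "GENE_A": "11110000",
    "GENE_B": "00001111",
    "GENE_C": "11001100",
}

print(decode_barcode("11110000", CODEBOOK))  # exact match
print(decode_barcode("11110100", CODEBOOK))  # one bit error, still assigned
print(decode_barcode("11000000", CODEBOOK))  # ambiguous, left unassigned
```

Spacing the codewords apart costs imaging rounds, but it means that a spot misread in one round is corrected or discarded instead of being assigned to the wrong gene.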

○ 10x - Xenium: ISS-based.

○ Small amount of padlock probe + Rolling circle amplification (RCA)

○ Step 1. Padlock probe binds complementary RNA transcript in a pincer shape, forming a loop.

○ Step 2. RCA (Rolling circle amplification): After loop formation, the corresponding RNA transcript is amplified.

○ Step 3. Hybridize each RNA transcript with a fluorescent probe, then perform fluorescent imaging → Washing.

○ Step 4. Repeat Step 3 and decode labels for each gene from the generated images.

Figure 14. Principle of Xenium

○ Nanostring - CosMx: ISH-based.

○ Small amount of probe + Branch chain hybridization

○ The patent dispute between NanoString and 10x Genomics (2023) (ref1, ref2) → The bankruptcy of NanoString (ref) → M&A of NanoString (ref)

Figure 15. Principle of CosMx

○ FISSEQ and oligoFISSEQ

○ Veranome

○ Rebus

○ BOLORAMIS

○ STARmap and STARmap PLUS: Sequencing by ligation

○ SEDAL sequencing

○ ExSeq

○ BaristaSeq: Sequencing by synthesis

○ BARSeq and BARSeq2

○ HybISS

○ SABER

○ clampFISH

○ split-FISH

○ SCRINSHOT

○ PLISH

○ osmFISH

○ ExFISH

○ par-seqFISH

○ EASI-FISH (expansion-assisted iterative fluorescence FISH)

○ SGA

○ corrFISH

○ EEL FISH (enhanced electric FISH)

○ VeraFISH

○ DARTFISH

○ RAEFISH

⑤ Type 3. Spatial proteomics: Broadly classified into mass spectrometry-based and imaging-based methods.

○ SWITCH

○ MxIF

○ t-CyCIF

○ IBEX

○ DEI

○ CODEX

○ immuno-SABER

○ TSA

○ Opal IHC

○ MIBI

○ IMC

○ HD-MIBI

○ GeoMx Digital Spatial Profiler (DSP): 100 mm scale

○ Using antibodies or gene probes conjugated with UV-cleavable DNA barcodes.

○ 4i multiplexed imaging

⒃ Other sequencing technologies

① TCR-seq (T cell receptor sequencing): Sequencing used to track T cell subtypes and clones.

② Invade-seq: A sequencing technique for analyzing the host-microbiome.

③ Long-read sequencing: Method of the Year in 2022 (ref)

	Short-read seq	Long-read seq
Release Year	Early 2000s	Mid-2010s
Average Read Length	150-300 bp	5,000-10,000 bp
Accuracy	99.9%	95-99%

Table 2. Difference between short-read seq and long-read seq

Figure 16. Difference between short-read seq and long-read seq

○ Without sequencing gaps, the following analyses can be performed.

Figure 16. long-read sequencing and short-read sequencing

○ Advantage 1. AS analysis (alternative splicing analysis): Can identify alternative splicing events, isoforms, etc.

○ Advantage 2. CNV analysis (copy number variation analysis): For example, the number of repeat sequences is crucial in Huntington’s disease.

○ Advantage 3. Easier integration of epigenetics and transcriptomics

○ Example 1. Pacific Biosciences SMRT (single molecule real-time) sequencing: Average read length is ~20 kb

○ Example 2. Oxford Nanopore sequencing

○ Average read length: ~100 kb

○ Maximum read length: >2 Mb

○ High error rate: 5-10%. For reference, the error rate of Illumina sequencing is around 0.1-1%.
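The repeat-counting advantage noted for Huntington’s disease can be sketched as follows: when a single read spans the entire repeat region together with its flanks, the number of repeat units can be counted directly. The flanking sequences, the repeat count of 42, and the function name are hypothetical, and sequencing errors are not modeled.

```python
import re


def longest_repeat_run(read: str, unit: str = "CAG") -> int:
    """Return the largest number of consecutive copies of `unit` found in `read`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(read)]
    return max(runs, default=0)


# Hypothetical long read: left flank + 42 CAG units + right flank.
long_read = "GCCTTC" + "CAG" * 42 + "CAACAGCCGCCA"
print(longest_repeat_run(long_read))
```

A short read that ends inside the repeat only gives a lower bound on the repeat count, which is why reads that cover the whole region without gaps are preferred for this kind of analysis.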

④ Non-invasive sequencing

○ A technology that allows sequencing without breaking cells.

⑤ Halo-seq: A technique for obtaining the transcriptome of RNAs adjacent to a specific protein.

○ Step 1. Attach a HaloTag domain to a specific target.

○ Step 2. The HaloTag functions as a radical-producing Halo ligand, releasing a hydrogen radical (H∙) from the injected alkyne handle to generate an alkyne handle radical.

○ R-H → R· + H·

○ Step 3. Similarly, the HaloTag generates an RNA radical by releasing a hydrogen radical H· from RNA.

○ RNA-H → RNA· + H·

○ Step 4. The alkyne handle radical combines with the RNA radical.

○ Step 5. React alkyne-RNA with biotin azide to produce biotinylated RNA.

○ Step 6. Separate only the biotinylated RNA using affinity chromatography with streptavidin.

○ Step 7. RNA-seq allows for the detection of RNAs close to the specific target.

○ Reason: Radicals are unstable and cannot travel long distances.

Figure 17. Principle of Halo-seq

⑥ multi-NTT seq (Nanobody tethered transposition followed by sequencing)

⑦ Epigenomics sequencing

⑧ 3D Sequencing

⑨ Temporal Sequencing

○ Record-seq

○ Live-seq

○ TMI

○ Molecular recording

⑩ Spatiotemporal Omics

○ ORBIT (single-molecule DNA origami rotation measurement)

○ 4D spatiotemporal MRI or hyperpolarized MR

○ In vivo 4D omics with transparent mice

○ CytoTape

⑪ Spatial multi-omics

○ Lunaphore: Spatial transcriptomics + Spatial proteomics

○ DBiT ARP-seq (ATAC + RNA + Protein)

○ DBiT CTRP-seq (CUT&Tag (H3K27me3) + RNA + Protein)

○ Patho-DBiT: Spatial multi-omics on RNA biology (e.g., lncRNA, microRNA, miscRNA, rRNA, snoRNA, snRNA, tRNA, Y RNA)

⒄ Summary of NGS (next-generation sequencing)

① Cost of genome analysis

○ 2001: $100 million per person based on the Human Genome Project.

○ 2007: $0.1 billion over 4 years.

○ 2008: $1,000,000 / person based on 454 Life Sciences.

○ 2009: $48,000 / person based on Helicos BioSciences.

○ Predicted to be sufficient with $1,000 by 2014 (Nature 456, 23-25, 2008).

② Scale of genome analysis

Figure 18. Trend of genome analysis scale

③ Relationship between depth and coverage

○ Sequencing depth (read depth): Indicates how many times a specific nucleotide position is read on average.

Figure 19. Definition of depth

○ “10x” means it was read 10 times.

○ Can be defined for each nucleotide.

○ Coverage (c)

○ c := LN / G

○ L: Read length

○ N: Number of reads

○ G: Haploid genome length

○ Comparison of depth and coverage

○ Total read number is expressed as sequencing depth.

○ The relationship between sequence reads and the reference (e.g., whole genome, a locus) is expressed as coverage.

○ Apart from this distinction, depth and coverage are very similar concepts.
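The coverage formula c = LN / G above can be applied directly. The sketch below also estimates the fraction of the genome covered at least once as 1 - e^(-c), which holds only under the idealized assumption that reads land uniformly at random. The haploid genome length of 3.1 Gb is an approximate value assumed for illustration, and the function names are hypothetical.

```python
import math


def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """c = L * N / G."""
    return read_length * num_reads / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest N such that L * N / G reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


def expected_fraction_covered(c: float) -> float:
    """Fraction of bases read at least once, assuming uniformly random reads."""
    return 1.0 - math.exp(-c)


GENOME_LENGTH = 3_100_000_000  # assumed approximate haploid human genome length (bp)
READ_LENGTH = 150              # assumed short-read length (bp)

n_reads = reads_needed(30, READ_LENGTH, GENOME_LENGTH)
print(n_reads)
print(coverage(READ_LENGTH, n_reads, GENOME_LENGTH))
print(expected_fraction_covered(1.0))
```

Under these assumptions, the > 30X target quoted for WGS corresponds to several hundred million 150 bp reads, while c = 1 would still leave roughly a third of the bases unread.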

④ Relationship between bulk and read

○ Bulk: total RNA production

○ When the depth is the same, a distortion arises: the read count per transcript decreases in inverse proportion to bulk size. (ref)

○ Example: In spatial transcriptomics, bulk is typically large and depth is low, resulting in low RNA read count.

○ Normalization: Various methods have been introduced to correct this distortion.

⑤ Relationship between read length and number of reads

○ If read length is less than 250 bp, it is impossible to detect sequence error.

○ Relationship between read length and number of reads per run: There is a trade-off.

Figure 20. Relationship between read length and number of reads per run

⑥ Relationship between transcriptome read count and gene expression

○ Read count: The raw number of reads assigned to each transcript.

○ Gene expression: Value corrected from read count through normalization process.
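One simple example of turning read counts into comparable expression values is counts per million (CPM), which rescales each sample by its total read count. CPM is only one of many normalization methods and is used here as an illustration. The gene names and counts are hypothetical.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical samples: identical composition, but sample_b is sequenced twice as deep.
sample_a = {"gene1": 100, "gene2": 300, "gene3": 600}
sample_b = {"gene1": 200, "gene2": 600, "gene3": 1200}

print(counts_per_million(sample_a))
print(counts_per_million(sample_b))
```

The raw counts of the two samples differ by a factor of two, yet their CPM values agree, so the difference is attributed to depth and not to gene expression.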

Entered: 2015.07.02 23:31

Updated: 2022.03.13 13:11
