
Proteomics Analysis Pipeline

Recommended post: 【Bioinformatics】 Bioinformatics Analysis Table of Contents


1. QC

2. Motif Analysis

3. Protein-Protein Interaction

4. Prediction of Protein Variant Function

5. Metabolomics Analysis

6. Database Utilization


a. Transcriptomics Analysis Pipeline

b. Collection of Python Functions for Organic Chemistry



1. QC

⑴ Data Import: ThermoRawFileParser, msconvert

⑵ Mass spectrometry QC: PTXQC, rawtools

⑶ Peptide and protein ID: MaxQuant, MSFragger, Comet

⑷ Label-free or isobaric quantification: DIA-NN, Skyline, FlashLFQ



2. Motif Analysis

⑴ Overview: Includes epitope and pocket analysis.

⑵ Sequence Logo

① A graphical representation of an amino acid or nucleotide multiple sequence alignment

② Developed by Tom Schneider and Mike Stephens

③ The y-axis represents information content as defined in information theory: information content = maximum entropy - actual entropy, in bits (for nucleotides, maximum entropy = log2(4) = 2)

Example 1. When all four nucleotides (A, T, G, C) occur at the same frequency : Maximum entropy = 2, Actual entropy = 2, Information content = 0

Example 2. When only one nucleotide appears : Maximum entropy = 2, Actual entropy = 0, Information content = 2

Example 3. When two nucleotides appear at the same frequency : Maximum entropy = 2, Actual entropy = 1, Information content = 1
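
The three examples above can be reproduced with a few lines of Python. This is a minimal sketch, not the code of any particular logo tool: the function name `information_content` is made up for illustration, each alignment column is passed as a plain string, and the small-sample correction that some logo generators apply is left out.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """Information content (bits) of one alignment column.

    Computed as maximum entropy minus observed Shannon entropy.
    alphabet_size is 4 for nucleotides and would be 20 for amino acids.
    """
    counts = Counter(column)
    total = len(column)
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return math.log2(alphabet_size) - entropy


# Columns corresponding to Examples 1, 2, and 3
print(information_content("ATGC"))  # all four bases equally frequent
print(information_content("AAAA"))  # a single base
print(information_content("AATT"))  # two bases equally frequent
```

In a sequence logo, the height of each letter stack at a position is this value, and each letter within the stack is scaled by its frequency.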

⑶ PROSITE

① A database of protein patterns

② Patterns are defined using regular expressions as follows:

○ A one-letter amino acid code is used where the residue is known

○ Positions are separated by ‘-’

○ ‘x’ is a wildcard character

○ ‘[]’ represents ambiguity, i.e., [one of]

○ ‘{}’ represents negation, i.e., {not one of}

○ ‘()’ denotes repetition, i.e., (n) for an exact count or (min, max) for a range

○ ‘<’ or ‘>’ indicates the N-terminus or C-terminus of a protein, respectively

③ Examples

○ [AC]-x-V-x(4)-{ED} : [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

○ <A-x-[ST](2)-x(0,1)-V : N-terminal Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
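
Because the PROSITE syntax maps almost one-to-one onto ordinary regular expressions, a pattern can be converted and then searched with Python's `re` module. The sketch below is a simplified translator written for this post (`prosite_to_regex` is not part of any library), it covers only the rules listed above, and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Simplified: does not handle '<' or '>' placed inside square brackets.
    """
    regex = pattern.rstrip(".")                           # optional trailing period
    regex = regex.replace("-", "")                        # position separators
    regex = regex.replace("{", "[^").replace("}", "]")    # negation
    regex = regex.replace("(", "{").replace(")", "}")     # repetition
    regex = regex.replace("x", ".")                       # wildcard
    regex = regex.replace("<", "^").replace(">", "$")     # N- / C-terminus
    return regex


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))

# Hypothetical sequence, used only to show the search step
hit = re.search(prosite_to_regex("[AC]-x-V-x(4)-{ED}"), "MACGVLLLLK")
print(hit.start(), hit.group())
```

The braces are translated before the parentheses on purpose: doing it the other way round would turn a repetition count into a negated character class.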

⑷ Membrane topology analysis (e.g., transmembrane regions)

① Membrane protein and hydropathy index

② Tools: TMHMM, TopGraph, Phobius
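
A hydropathy index is typically used as a sliding-window average along the sequence, with strongly hydrophobic windows flagged as transmembrane candidates. The sketch below uses the Kyte-Doolittle scale. The 19-residue window and the 1.6 cutoff are the commonly quoted heuristic and are treated here as assumptions, the function names are made up, and the sequence is synthetic. The tools listed above use more elaborate models than this.

```python
# Kyte-Doolittle hydropathy values per residue
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, h in enumerate(profile) if h > threshold]


# Synthetic sequence: charged flanks around a hydrophobic core
seq = "DEKR" * 5 + "LIV" * 7 + "DEKR" * 5
print(candidate_tm_windows(seq))
```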

⑸ PTM

① Post-translational modification (PTM)

② Tools: MusiteDeep, PTMProphet, MaxQuant

○ MusiteDeep (detecting 13 different PTM patterns): Hydroxylysine, Hydroxyproline, Methylarginine, Methyllysine, N-linked_glycosylation, N6-acetyllysine, O-linked_glycosylation, Phosphoserine_Phosphothreonine, Phosphotyrosine, Pyrrolidone_carboxylic_acid, S-palmitoyl_cysteine, SUMOylation, Ubiquitination

⑹ Differential abundance analysis

① MSstats

② limma

⑺ Protein pathway enrichment

① Perseus

② clusterProfiler

⑻ Pocket and epitope discovery

① fpocket

② PUResNet

③ Kalasanty

④ RAPID-Net

⑤ Site4Drug



3. Protein-Protein Interaction (PPI; Molecular Docking)

⑴ Key Points

① Binding affinity (BA) is generally quantified by the dissociation constant (Kd) or inhibition constant (Ki)

② General considerations in PPI

○ General characteristics (e.g., atom types)

○ Physicochemical properties (e.g., excluded volume, partial charge, heavy atom neighbors, heteroatom neighbors, hybridization)

○ Pharmacological properties (e.g., hydrophobicity, aromaticity, acid/base, ring formation)

③ Datasets

○ PDBbind database, version 2016

Subset 1. General set : Includes all data, i.e., 13,285 protein-ligand complexes

Subset 2. Refined set : A subset of the general set, containing 4,057 high-quality complexes

Subset 3. Core 2016 set : 290 complexes extracted from the refined set, frequently used as benchmarking data

○ CASF-2013

○ CSAR-HiQ

CSAR-HiQ_51 : A subset extracted from an original set of 176 protein-ligand complexes

CSAR-HiQ_36 : A subset extracted from an original set of 167 protein-ligand complexes

○ Biolip

○ InterPepScore

④ While there are several models for protein-ligand interactions, models for protein-protein interactions remain relatively scarce
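
For point ① above, a dissociation constant can be put on the two scales usually seen in affinity work: the negative log scale (pKd) and the standard binding free energy, ΔG° = RT ln(Kd), with Kd in mol/L relative to a 1 M standard state. The sketch below is illustrative only. The function names and the 298.15 K default are assumptions, not something prescribed by the datasets listed here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pkd(kd_molar):
    """Negative base-10 logarithm of Kd (Kd in mol/L)."""
    return -math.log10(kd_molar)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol: dG = R * T * ln(Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)


# A 1 nM binder
print(pkd(1e-9))
print(binding_free_energy(1e-9))
```

A tighter binder has a smaller Kd, so a larger pKd and a more negative ΔG°. The same conversion is often applied to Ki when Kd is not available.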

⑵ Models

① Overview

○ Divided into binding site prediction models and binding affinity prediction models, though the distinction is not strict

○ Generally, a binding distance of 3 Å or less between a ligand and receptor is considered strong binding

Type 1. AlphaFold2 multimer, AFM-LIS, AlphaFold3

Type 2. DeepDTA

Type 3. DeepDTAF

Type 4. DeepFusionDTA

Type 5. GraphDTA

Type 6. CAPLA

Type 7. GNINA

○ Uses CNN for both binding site prediction and affinity evaluation

Type 8. SMINA

○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation

Type 9. GLIDE

○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation

Type 10. EquiBind

○ GNN with SE(3) equivariance

Type 11. TANKBind

○ Uses the attention mechanism of Transformers

Type 12. DiffDock

○ Utilizes a diffusion model

Type 13. membranefold

○ Imposes membrane-attachment conditions on AlphaFold

Type 14. Boltz-2, BoltzGen, BoltzLab

○ Relatively free from the time–accuracy trade-off

○ Recently, BoltzGen, which is designed to create binders based on Boltz-2, was announced



Figure 1. Boltz-2 Benchmarking Study


○ BoltzLab



Figure 2. Boltz-2 user review


Type 15. DrugCLIP: Since screening after structure prediction takes a long time, it co-embeds the drug and the pocket first, then performs screening in a search-engine-like manner.

Type 16. BindCLIP

Type 17. Chai

Type 18. xQuest



4. Prediction of Protein Variant Function

⑴ PolyPhen-2 (Adzhubei et al., 2013)

⑵ SIFT (Kumar et al., 2009)

⑶ MutationTaster (Schwarz et al., 2014)

⑷ MutationAssessor (Reva et al., 2011)

⑸ LR and LRT (Chun & Fay, 2009)



5. Metabolomics Analysis

⑴ Peak detection: scipy.signal.find_peaks, scipy.signal.peak_widths

⑵ LC-MS/GC-MS peak picking, alignment, feature grouping: XCMS centWave

⑶ Normalization, scaling: Median, Quantile (Bolstad 2003), TIC, PQN (Dieterle 2006), Log2

⑷ Metabolite annotation: HMDB m/z matching, [M+H]⁺/[M-H]⁻/[M+Na]⁺ adducts
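
The m/z matching step in ⑷ amounts to adding an adduct-specific mass shift to each candidate's neutral monoisotopic mass and comparing the result with the observed m/z within a ppm tolerance. The sketch below is a toy version. The three-compound table stands in for an HMDB export and is not an HMDB query, the 10 ppm tolerance is an assumption, the function names are made up, and only singly charged adducts are handled.

```python
PROTON = 1.007276  # mass of a proton in Da

# Mass shift of each singly charged adduct relative to the neutral molecule
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M-H]-": -PROTON,
    "[M+Na]+": 22.989218,  # sodium atom minus one electron
}

# Illustrative candidates: name -> neutral monoisotopic mass (Da)
CANDIDATES = {
    "glucose": 180.06339,
    "alanine": 89.04768,
    "citric acid": 192.02700,
}


def adduct_mz(neutral_mass, adduct):
    """Theoretical m/z of a singly charged adduct."""
    return neutral_mass + ADDUCT_SHIFTS[adduct]


def annotate(observed_mz, candidates, adducts, ppm=10.0):
    """Return (name, adduct, error in ppm) for every match within tolerance."""
    hits = []
    for name, mass in candidates.items():
        for adduct in adducts:
            theoretical = adduct_mz(mass, adduct)
            error_ppm = (observed_mz - theoretical) / theoretical * 1e6
            if abs(error_ppm) <= ppm:
                hits.append((name, adduct, round(error_ppm, 2)))
    return hits


print(annotate(181.0707, CANDIDATES, ["[M+H]+", "[M+Na]+"]))
```

Positive-mode data would be searched with the [M+H]⁺ and [M+Na]⁺ shifts and negative-mode data with [M-H]⁻. An m/z match alone is a putative annotation, since isomers share the same mass.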

⑸ Feature quantification, imputation, normalization: Min/2, median, KNN imputation (sklearn); TIC/median/log norm

⑹ Univariate statistical testing with FDR correction: Welch’s t-test, Wilcoxon, ANOVA, Kruskal-Wallis + BH FDR

⑺ Differential metabolite analysis with PCA: Welch’s t-test + BH FDR, PCA visualization

⑻ Pathway enrichment: hypergeometric test (ORA), KEGG pathways, BH FDR
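
Steps ⑹ to ⑻ share the Benjamini-Hochberg (BH) correction, and ⑻ adds a hypergeometric over-representation test. The sketch below implements both with the standard library only. The pathway names and metabolite IDs are placeholders rather than real KEGG entries, the function names are made up, and every hit is assumed to belong to the background set.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k): at least k pathway members among N hits, drawn from a
    background of M metabolites of which n belong to the pathway."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(hits, background_size, pathways):
    """Return {pathway: (overlap, p-value, BH q-value)}."""
    names = list(pathways)
    overlaps = [len(hits & pathways[name]) for name in names]
    pvals = [
        hypergeom_sf(k, background_size, len(pathways[name]), len(hits))
        for k, name in zip(overlaps, names)
    ]
    qvals = benjamini_hochberg(pvals)
    return {name: (k, p, q) for name, k, p, q in zip(names, overlaps, pvals, qvals)}


# Placeholder pathway sets and differential metabolites
pathways = {"pathway_A": {"m1", "m2", "m3", "m4"}, "pathway_B": {"m5", "m6"}}
hits = {"m1", "m2", "m7"}
print(enrich(hits, background_size=10, pathways=pathways))
```

The choice of background matters: it should be the set of metabolites that could have been detected and annotated in the experiment, not every compound in the database.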



6. Database Utilization

⑴ Small Molecule Database

① Integrated Small Molecule Database: Database providing data on the physiological activity of about 800,000 small molecules in vector format

② AlphaFold2 Database: Database with structural data of 200 million proteins

③ Ensembl: Transcriptome database

④ UniProt: Protein database

⑤ The Human Protein Atlas: Public access resource aiming to map all human proteins in cells, tissues, and organs

⑥ SGC (Chemical Probes): Provides a unique probe collection along with related data, control compounds, and usage recommendations

⑵ Antigen-antibody Database

① IEDB (Immune Epitope Database)

② VDJdb

③ BciPep

④ SAbDab

⑤ IMGT/3Dstructure-DB

⑥ AACDB (Antigen-Antibody Complex DB)

⑦ Thera-SAbDab

⑧ CoV-AbDab

⑨ Abcam: Commercial

⑩ CST: Commercial

⑪ CiteAb: Commercial

⑫ Antibodypedia: Commercial

⑬ ABCD database

⑶ Pharmacogenomics Database

① NCBI dbSNP

② gnomAD

③ PharmVar

④ PharmGKB

⑤ NCBI PubChem

⑥ Broad Institute CMAP

⑦ CTD

⑧ CompTox

⑨ DrugBank

⑩ STITCH (Search Tool for Interactions of Chemicals)

⑪ ToppFun

⑫ DepMap: Provides expression data and lineage information for each cell line.

⑬ L1000CDS2

⑭ L1000FWD

⑮ GDSC (Genomics of Drug Sensitivity in Cancer)

⑯ CCLE

⑰ ClinicalTrials.gov: Provides information on the clinical trial progress of each drug.

⑱ Cortellis: Provides information on the clinical trial progress of each drug.

⑲ The Antibody Society: Provides information on the clinical trial progress of antibodies.

⑳ PRISM: Provides large-scale drug response data across hundreds of cancer cell lines.



Input: 2024.03.31 01:08

Modified: 2024.09.29 15:40
