Proteomics Analysis Pipeline
1. Motif Analysis
2. Protein-Protein Interaction
3. Prediction of Protein Variant Function
1. Motif Analysis
⑴ Sequence Logo
① A graphical representation of an amino acid or nucleotide multiple sequence alignment
② Developed by Tom Schneider and Mike Stephens
③ The y-axis represents information content (in bits) as defined in information theory : Information content = Maximum entropy - Actual entropy
④ Example 1. When all four nucleotides (A, T, G, C) occur at the same frequency : Maximum entropy = 2 bits, Actual entropy = 2 bits, Information content = 0 bits
⑤ Example 2. When only one nucleotide appears : Maximum entropy = 2 bits, Actual entropy = 0 bits, Information content = 2 bits
⑥ Example 3. When two nucleotides appear at the same frequency : Maximum entropy = 2 bits, Actual entropy = 1 bit, Information content = 1 bit
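The three examples above can be reproduced with a short calculation. The following is a minimal sketch, assuming each alignment column is given as a plain string of nucleotides and that no small-sample correction is applied; the function name `information_content` is hypothetical and not part of any sequence logo tool.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=4):
    """Information content (bits) of one alignment column.

    Computed as the maximum entropy, log2(alphabet_size), minus the
    observed Shannon entropy of the column. Assumption: no small-sample
    correction term is included.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

# Columns mirroring Examples 1 to 3
uniform = information_content("ATGC")  # all four nucleotides, equal frequency
single = information_content("AAAA")   # only one nucleotide
two = information_content("AATT")      # two nucleotides, equal frequency
```

For a protein alignment the same function could be called with `alphabet_size=20`, which raises the maximum entropy to log2(20), roughly 4.32 bits.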
⑵ PROSITE
① A database of protein patterns
② Patterns are defined using regular expressions as follows:
○ A one-letter amino acid code is used when the residue at a position is known
○ Positions are separated by ‘-’
○ ‘x’ is a wildcard that matches any amino acid
○ ‘[]’ represents ambiguity, i.e., [one of]
○ ‘{}’ represents negation, i.e., {not one of}
○ ‘()’ denotes repetition, either a fixed count or a range, i.e., (n) or (min,max)
○ ‘<’ or ‘>’ anchors the pattern to the N-terminus or C-terminus of a protein, respectively
③ Examples
○ [AC]-x-V-x(4)-{ED} : [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
○ <A-x-[ST](2)-x(0,1)-V : N-terminal Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
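Because PROSITE patterns are close to ordinary regular expressions, the rules above can be translated mechanically. The following is a minimal sketch, assuming only the syntax elements listed in this section; the helper name `prosite_to_regex` is hypothetical, and rarer forms (such as anchors placed inside square brackets) are not handled. It is exercised on `[AC]-x-V-x(4)-{ED}` and on an N-terminally anchored pattern, `<A-x-[ST](2)-x(0,1)-V`.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    pattern = pattern.rstrip(".")  # PROSITE entries may end with a period
    regex = ""
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition suffix such as (4) or (0,1).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # wildcard: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # negation: not one of
        # One-letter codes and [..] classes are already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        regex += ("^" if anchored_start else "") + core + ("$" if anchored_end else "")
    return regex

first = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
second = prosite_to_regex("<A-x-[ST](2)-x(0,1)-V")

# Scanning a made-up sequence for the first pattern
hit = re.search(first, "GGAKVLLLLQ")
```

The resulting strings can be passed to `re.search` or `re.finditer` to scan protein sequences written in one-letter code.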
2. Protein-Protein Interaction (PPI; Molecular Docking)
⑴ Key Points
① Binding affinity (BA) is generally quantified by the dissociation constant (Kd) or inhibition constant (Ki)
② General considerations in PPI
○ General characteristics (e.g., atom types)
○ Physicochemical properties (e.g., excluded volume, partial charge, heavy atom neighbors, heteroatom neighbors, hybridization)
○ Pharmacological properties (e.g., hydrophobicity, aromaticity, acid/base, ring formation)
③ Datasets
○ PDBbind database (version 2016), organized into three subsets
○ Subset 1. General set : Includes all data, i.e., 13,285 protein-ligand complexes
○ Subset 2. Refined set : A subset of the general set, containing 4,057 high-quality complexes
○ Subset 3. Core 2016 set : 290 complexes extracted from the refined set, frequently used as benchmarking data
○ CSAR-HiQ
○ CSAR-HiQ_51 : A subset extracted from an original set of 176 protein-ligand complexes
○ CSAR-HiQ_36 : A subset extracted from an original set of 167 protein-ligand complexes
○ Biolip
○ InterPepScore
④ While there are several models for protein-ligand interactions, models for protein-protein interactions remain relatively scarce
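Since binding affinity is quantified by Kd or Ki, affinity labels are commonly handled on a negative log scale (pKd or pKi), and Kd can be related to a standard binding free energy through ΔG = RT ln(Kd). The following is a minimal sketch of both conversions, assuming constants are given in mol/L and a temperature of 298.15 K; the function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal / (mol * K)

def p_affinity(constant_molar):
    """Return -log10 of a dissociation or inhibition constant given in mol/L."""
    return -math.log10(constant_molar)

def binding_free_energy(kd_molar, temperature_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from Kd, via dG = RT ln(Kd)."""
    return GAS_CONSTANT_KCAL * temperature_kelvin * math.log(kd_molar)

# A 1 nM binder, used here purely as an illustrative value
pkd = p_affinity(1e-9)
dg = binding_free_energy(1e-9)
```

A smaller Kd gives a larger pKd and a more negative ΔG, so both scales rank tighter binders higher in magnitude.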
⑵ Models
① Overview
○ Divided into binding site prediction models and binding affinity prediction models, though the distinction is not strict
○ Generally, a binding distance of 3 Å or less between a ligand and receptor is considered strong binding
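The 3 Å criterion above can be checked directly from atomic coordinates. The following is a minimal sketch, assuming ligand and receptor atoms are given as (x, y, z) tuples in Å; the function name `contacts_within` and the coordinates are made up for illustration, and real structures would first be parsed from a PDB or mmCIF file.

```python
import math

def contacts_within(ligand_atoms, receptor_atoms, cutoff=3.0):
    """Return (ligand_index, receptor_index, distance) for atom pairs within cutoff Å."""
    pairs = []
    for i, lig in enumerate(ligand_atoms):
        for j, rec in enumerate(receptor_atoms):
            d = math.dist(lig, rec)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                pairs.append((i, j, d))
    return pairs

# Toy coordinates: only the first ligand atom lies within 3 Å of a receptor atom
ligand = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
receptor = [(0.0, 2.5, 0.0), (9.0, 9.0, 9.0)]
close = contacts_within(ligand, receptor)
```

The nested loop is quadratic in the number of atoms, which is fine for a single pocket; for whole complexes a spatial index such as a k-d tree would be the usual choice.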
② Type 1. AlphaFold2 multimer
③ Type 2. DeepDTA
④ Type 3. DeepDTAF
⑤ Type 4. DeepFusionDTA
⑥ Type 5. GraphDTA
⑦ Type 6. CAPLA
⑧ Type 7. GNINA
○ Uses a CNN for both binding site prediction and affinity evaluation
⑨ Type 8. SMINA
○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation
⑩ Type 9. GLIDE
○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation
⑪ Type 10. EquiBind
○ GNN with SE(3) equivariance
⑫ Type 11. TANKBind
○ Uses the attention mechanism of Transformers
⑬ Type 12. DiffDock
○ Uses a diffusion model
3. Prediction of Protein Variant Function
⑴ PolyPhen-2 (Adzhubei et al., 2013)
⑵ SIFT (Kumar et al., 2009)
⑶ Mutation Taster (Schwarz et al., 2014)
⑷ Mutation Assessor (Reva et al., 2011)
⑸ LR and LRT (Chun & Fay, 2009)
Input: 2024.03.31 01:08
Modified: 2024.09.29 15:40