Proteomics Analysis Pipeline

Recommended post: 【Bioinformatics】 Bioinformatics Analysis Table of Contents

1. Motif Analysis

2. Protein-Protein Interaction

3. Prediction of Protein Variant Function

a. Transcriptomics Analysis Pipeline

b. Collection of Python Functions for Organic Chemistry

1. Motif Analysis

⑴ Overview: Includes epitope and pocket analysis.

⑵ Sequence Logo

① A graphical representation of a multiple sequence alignment of amino acids or nucleotides

② Developed by Tom Schneider and Mike Stephens

③ The y-axis represents information content as defined in information theory

④ Example 1. When all four nucleotides (A, T, G, C) occur at the same frequency : Maximum entropy = 2, Actual entropy = 2, Information content = 0

⑤ Example 2. When only one nucleotide appears : Maximum entropy = 2, Actual entropy = 0, Information content = 2

⑥ Example 3. When two nucleotides appear at the same frequency : Maximum entropy = 2, Actual entropy = 1, Information content = 1
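
The three examples above can be reproduced with a short calculation. The following is a minimal sketch, assuming information content is simply maximum entropy minus observed Shannon entropy (in bits) for a single alignment column; the function name `information_content` is hypothetical, and the small-sample correction used in published sequence logos is deliberately left out.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=4):
    """Return (max_entropy, entropy, information) in bits for one alignment column.

    Assumption: no small-sample correction is applied.
    """
    total = len(column)
    freqs = [n / total for n in Counter(column).values()]
    entropy = sum(-p * math.log2(p) for p in freqs)
    max_entropy = math.log2(alphabet_size)
    return max_entropy, entropy, max_entropy - entropy

print(information_content("ATGC"))  # Example 1 -> (2.0, 2.0, 0.0)
print(information_content("AAAA"))  # Example 2 -> (2.0, 0.0, 2.0)
print(information_content("AATT"))  # Example 3 -> (2.0, 1.0, 1.0)
```

For a protein alignment, `alphabet_size=20` would give a maximum entropy of log2(20), about 4.32 bits.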

⑶ PROSITE

① A database of protein patterns

② Patterns are defined using regular expressions as follows:

○ A one-letter amino acid code is used where the residue is known

○ Positions are separated by ‘-’

○ ‘x’ is a wildcard character

○ ‘[]’ represents ambiguity, i.e., [one of]

○ ‘{}’ represents negation, i.e., {not one of}

○ ‘()’ denotes repetition of the preceding element, either an exact count (n) or a range (min,max)

○ ‘<’ or ‘>’ indicates the N-terminus or C-terminus of a protein, respectively

③ Examples

○ [AC]-x-V-x(4)-{ED} : [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

○ <A-x-[ST](2)-x(0,1)-V : N-terminal Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
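
The pattern rules listed above map almost directly onto ordinary regular expressions. The following is a minimal sketch of such a translation, not PROSITE's own scanning tool; the function name `prosite_to_regex` is hypothetical, and only the syntax elements listed in this section are handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (4) or (0,1).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                      # wildcard
            core = "."
        elif core.startswith("{"):           # negation: {ED} -> [^ED]
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))     # [AC].V.{4}[^ED]
# N-terminal Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))  # ^A.[ST]{2}.{0,1}V
```

The resulting string can be passed to `re.search` to scan a protein sequence given in one-letter code.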

⑷ Membrane topology analysis (e.g., transmembrane regions)

① Membrane protein and hydropathy index

② Tools: TMHMM, TopGraph, Phobius
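
As a rough illustration of how a hydropathy index can flag candidate transmembrane regions, the sketch below averages Kyte-Doolittle values over a sliding window. The window size of 19 and the threshold of 1.6 are assumptions taken from common practice with this scale, the function names are hypothetical, and the tools listed above use more elaborate statistical models rather than this simple scan.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices (0-based) of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]

# Toy sequence (made up): a 19-residue hydrophobic stretch flanked by charged residues.
toy = "MDKRE" + "LIVALLFVAGLLIVAVLAL" + "RKDEN"
print(candidate_tm_windows(toy))
```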

⑸ PTM

① Post-translational modification (PTM)

② Tools: MusiteDeep (detecting 13 different PTM patterns)

○ Hydroxylysine

○ Hydroxyproline

○ Methylarginine

○ Methyllysine

○ N-linked_glycosylation

○ N6-acetyllysine

○ O-linked_glycosylation

○ Phosphoserine_Phosphothreonine

○ Phosphotyrosine

○ Pyrrolidone_carboxylic_acid

○ S-palmitoyl_cysteine

○ SUMOylation

○ Ubiquitination

2. Protein-Protein Interaction (PPI; Molecular Docking)

⑴ Key Points

① Binding affinity (BA) is generally quantified by the dissociation constant (Kd) or inhibition constant (Ki)

② General considerations in PPI

○ General characteristics (e.g., atom types)

○ Physicochemical properties (e.g., excluded volume, partial charge, heavy atom neighbors, heteroatom neighbors, hybridization)

○ Pharmacological properties (e.g., hydrophobicity, aromaticity, acid/base, ring formation)

③ Datasets

○ PDBbind database, version 2016

○ Subset 1. General set : Includes all data, i.e., 13,285 protein-ligand complexes

○ Subset 2. Refined set : A subset of the general set, containing 4,057 high-quality complexes

○ Subset 3. Core 2016 set : 290 complexes extracted from the refined set, frequently used as benchmarking data

○ CASF-2013

○ CSAR-HiQ

○ CSAR-HiQ_51 : A subset extracted from an original set of 176 protein-ligand complexes

○ CSAR-HiQ_36 : A subset extracted from an original set of 167 protein-ligand complexes

○ Biolip

○ InterPepScore

④ While there are several models for protein-ligand interactions, models for protein-protein interactions remain relatively scarce
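
Binding affinities expressed as Kd or Ki, as noted under Key Points, are often converted to a negative log scale or to a binding free energy before being used as regression targets. The following is a minimal sketch, assuming the standard relation ΔG = RT ln(Kd) with Kd in mol/L relative to a 1 M standard state; the function names and the default temperature of 298.15 K are illustrative choices.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)

def pkd(kd_molar):
    """Negative base-10 logarithm of a dissociation constant given in mol/L."""
    return -math.log10(kd_molar)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from Kd (mol/L), assuming a 1 M standard state."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)

kd = 1e-9  # a 1 nM binder
print(pkd(kd))
print(binding_free_energy(kd))  # roughly -12.3 kcal/mol
```

A tighter binder has a smaller Kd, a larger pKd, and a more negative free energy.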

⑵ Models

① Overview

○ Divided into binding site prediction models and binding affinity prediction models, though the distinction is not strict

○ Generally, a binding distance of 3 Å or less between a ligand and receptor is considered strong binding
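
The 3 Å criterion above can be checked directly from atomic coordinates. The following is a minimal sketch that lists ligand-receptor atom pairs within a distance cutoff; the function name `contacts_within` is hypothetical and the coordinates are made-up toy values rather than data from a real structure.

```python
import math

def contacts_within(ligand_atoms, receptor_atoms, cutoff=3.0):
    """Return (ligand_index, receptor_index, distance) for atom pairs within cutoff (Å)."""
    pairs = []
    for i, lig in enumerate(ligand_atoms):
        for j, rec in enumerate(receptor_atoms):
            distance = math.dist(lig, rec)  # Euclidean distance, Python 3.8+
            if distance <= cutoff:
                pairs.append((i, j, distance))
    return pairs

# Toy coordinates in Å (illustrative only).
ligand = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
receptor = [(0.0, 0.0, 2.8), (10.0, 10.0, 10.0), (5.0, 5.0, 8.5)]

for lig_idx, rec_idx, dist in contacts_within(ligand, receptor):
    print(lig_idx, rec_idx, round(dist, 2))
```

For real complexes the nested loop becomes slow, so a spatial index or a vectorized distance matrix would be the usual replacement.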

② Type 1. AlphaFold2 multimer, AFM-LIS, AlphaFold3

③ Type 2. DeepDTA

④ Type 3. DeepDTAF

⑤ Type 4. DeepFusionDTA

⑥ Type 5. GraphDTA

⑦ Type 6. CAPLA

⑧ Type 7. GNINA

○ Uses a CNN for both binding site prediction and affinity evaluation

⑨ Type 8. SMINA

○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation

⑩ Type 9. GLIDE

○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation

⑪ Type 10. EquiBind

○ GNN with SE(3) equivariance

⑫ Type 11. TANKBind

○ Uses the attention mechanism of Transformers

⑬ Type 12. DiffDock

○ Utilizes a diffusion model

⑭ Type 13. membranefold

○ Imposes membrane-attachment conditions on AlphaFold

⑮ Type 14. Boltz-2, BoltzGen, Boltz Lab

○ Relatively free from the time–accuracy trade-off

○ Recently, BoltzGen, which is designed to create binders based on Boltz-2, was announced

Figure 1. Boltz-2 Benchmarking Study

⑯ Type 15. DrugCLIP: Because screening after structure prediction is slow, it first co-embeds the drug and the pocket in a shared space, then screens in a search-engine-like manner.

⑰ Type 16. BindCLIP

⑱ Type 17. Chai
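
The search-engine-like screening described for DrugCLIP (Type 15) can be pictured as nearest-neighbor retrieval in a shared embedding space. The sketch below is not DrugCLIP's code: the embeddings are made-up toy vectors standing in for the output of learned pocket and molecule encoders, cosine similarity is an assumed scoring choice, and all names are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def rank_by_similarity(pocket_embedding, molecule_embeddings, top_k=2):
    """Rank molecules by cosine similarity to a pocket embedding."""
    query = normalize(pocket_embedding)
    scored = []
    for name, embedding in molecule_embeddings.items():
        score = sum(q * m for q, m in zip(query, normalize(embedding)))
        scored.append((name, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (illustrative only; real ones would come from trained encoders).
pocket = [1.0, 0.0, 0.5]
library = {
    "mol_a": [0.9, 0.1, 0.4],
    "mol_b": [-1.0, 0.2, 0.0],
    "mol_c": [0.2, 1.0, 0.1],
}

for name, score in rank_by_similarity(pocket, library):
    print(name, round(score, 3))
```

Because molecule embeddings can be computed once and stored, each new pocket only requires one encoding step plus a similarity search, which is what makes this style of screening fast.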

3. Prediction of Protein Variant Function

⑴ PolyPhen-2 (Adzhubei et al., 2013)

⑵ SIFT (Kumar et al., 2009)

⑶ Mutation Taster (Schwarz et al., 2014)

⑷ Mutation Assessor (Reva et al., 2011)

⑸ LR and LRT (Chun & Fay, 2009)

Input: 2024.03.31 01:08

Modified: 2024.09.29 15:40
