
Proteomics Analysis Pipeline



1. Motif Analysis

2. Protein-Protein Interaction

3. Prediction of Protein Variant Function



1. Motif Analysis

⑴ Sequence Logo

① A graphical representation of a multiple sequence alignment of amino acid or nucleotide sequences

② Developed by Tom Schneider and Mike Stephens

③ The y-axis represents information content (in bits) as defined in information theory: the maximum possible entropy at a position minus the observed entropy. For nucleotides, the maximum entropy is log2 4 = 2 bits

Example 1. When all four nucleotides (A, T, G, C) occur at the same frequency : Maximum entropy = 2, Actual entropy = 2, Information content = 0

Example 2. When only one nucleotide appears : Maximum entropy = 2, Actual entropy = 0, Information content = 2

Example 3. When two nucleotides appear at the same frequency and the other two are absent : Maximum entropy = 2, Actual entropy = 1, Information content = 1
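All three examples follow from one formula: information content = log2(N) - H, where N is the alphabet size (4 for nucleotides, 20 for amino acids) and H is the Shannon entropy of the column. Below is a minimal Python sketch; the function name `column_information` is hypothetical, and it assumes a gap-free column and omits the small-sample correction that sequence-logo tools often apply.

```python
import math
from collections import Counter

def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content (bits) of one alignment column: log2(N) - H.

    Sketch only: assumes no gaps and applies no small-sample correction.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy (bits) of the observed symbol frequencies
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_entropy = math.log2(alphabet_size)  # 2 bits for DNA, ~4.32 bits for protein
    return max_entropy - entropy

print(column_information("ATGC"))  # Example 1: 0.0
print(column_information("AAAA"))  # Example 2: 2.0
print(column_information("AATT"))  # Example 3: 1.0
```

For a protein logo, pass `alphabet_size=20`; a fully conserved residue then carries log2 20 ≈ 4.32 bits.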

⑵ PROSITE

① A database of protein patterns (and profiles) describing protein families, domains, and functional sites

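② Patterns are written in a regular-expression-like syntax with the following rules:

○ Each amino acid is written with its standard IUPAC one-letter code when the residue at a position is known

○ Pattern elements (positions) are separated by ‘-’

○ ‘x’ is a wildcard matching any amino acid

○ ‘[]’ represents ambiguity, i.e., [one of]

○ ‘{}’ represents negation, i.e., {not one of}

○ ‘()’ specifies repetition: (n) repeats an element exactly n times, and (min,max) repeats it between min and max times, e.g., x(2,4)

○ ‘<’ or ‘>’ indicates that the pattern is anchored at the N-terminus or C-terminus of the protein, respectively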

③ Examples

○ [AC]-x-V-x(4)-{ED} : [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

○ <A-x-[ST](2)-x(0,1)-V : N-terminal Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
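The pattern rules above map almost one-to-one onto ordinary regular expressions. The sketch below uses a hypothetical helper, `prosite_to_regex`, to translate a simple PROSITE pattern into a Python `re` pattern. It handles only the syntax listed above and raises an error for anything else (for example, ‘>’ inside brackets). It is an illustration, not the official ScanProsite matcher.

```python
import re

# One PROSITE element: a residue, 'x', [..] or {..}, optionally followed by (n) or (n,m)
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE entries may end with a period
    prefix, suffix = "", ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    tokens = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."  # wildcard: any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"  # negation: none of these residues
        else:
            token = residue  # a single letter or a [..] class is already valid regex
        if low is not None:  # repetition: (n) -> {n}, (n,m) -> {n,m}
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return prefix + "".join(tokens) + suffix

print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))     # [AC].V.{4}[^ED]
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))  # ^A.[ST]{2}.{0,1}V
print(re.search(prosite_to_regex("[AC]-x-V-x(4)-{ED}"), "GGCKVAAAAKGG").group())  # CKVAAAAK
```

Translating to a standard regex engine keeps the sketch short; the trade-off is that PROSITE's less common extensions are rejected rather than silently mis-translated.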



2. Protein-Protein Interaction (PPI; Molecular Docking)

⑴ Key Points

① Binding affinity (BA) is generally quantified by the dissociation constant (Kd) or inhibition constant (Ki)

② Features commonly considered in PPI prediction

○ General characteristics (e.g., atom types)

○ Physicochemical properties (e.g., excluded volume, partial charge, heavy atom neighbors, heteroatom neighbors, hybridization)

○ Pharmacophoric properties (e.g., hydrophobicity, aromaticity, acid/base character, ring membership)

③ Datasets

○ PDBbind database, version 2016

Subset 1. General set : Includes all data, i.e., 13,285 protein-ligand complexes

Subset 2. Refined set : A subset of the general set, containing 4,057 high-quality complexes

Subset 3. Core 2016 set : 290 complexes extracted from the refined set, frequently used as benchmarking data

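○ CASF-2013 : A benchmark for the comparative assessment of scoring functions, built on the PDBbind 2013 core set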

○ CSAR-HiQ

CSAR-HiQ_51 : A subset of 51 complexes extracted from an original set of 176 protein-ligand complexes

CSAR-HiQ_36 : A subset of 36 complexes extracted from an original set of 167 protein-ligand complexes

○ BioLiP

○ InterPepScore

④ While there are several models for protein-ligand interactions, models for protein-protein interactions remain relatively scarce
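To make point ① concrete: Kd and Ki values span many orders of magnitude, so they are usually reported on a log scale (pKd or pKi = -log10 K, with K in mol/L), and pKd/pKi is a common regression target for affinity-prediction models. Kd can also be converted to a standard binding free energy, ΔG = RT ln(Kd / 1 M). The Python sketch below shows both conversions; the function names and the 10 nM example value are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)

def p_affinity(k_molar: float) -> float:
    """pKd or pKi = -log10(K / 1 M); larger values mean tighter binding."""
    return -math.log10(k_molar)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy dG = RT ln(Kd / 1 M), in kJ/mol (negative = favorable)."""
    return R_KJ * temperature_k * math.log(kd_molar)

kd = 1e-8  # hypothetical 10 nM binder
print(round(p_affinity(kd), 2))           # 8.0
print(round(binding_free_energy(kd), 1))  # -45.7
```

Each 10-fold drop in Kd raises pKd by 1 and lowers ΔG by about 5.7 kJ/mol at 298 K.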

⑵ Models

① Overview

○ Divided into binding site prediction models and binding affinity prediction models, though the distinction is not strict

○ Generally, a ligand-receptor interatomic distance of 3 Å or less is considered to indicate strong binding

Type 1. AlphaFold-Multimer

Type 2. DeepDTA

Type 3. DeepDTAF

Type 4. DeepFusionDTA

Type 5. GraphDTA

Type 6. CAPLA

Type 7. GNINA

○ Uses a convolutional neural network (CNN) for both binding site prediction and affinity evaluation

Type 8. SMINA

○ A fork of AutoDock Vina; uses empirical, physics-inspired scoring functions for both binding site prediction and affinity evaluation

Type 9. GLIDE

○ Uses physics-based scoring functions for both binding site prediction and affinity evaluation

Type 10. EquiBind

○ Uses an SE(3)-equivariant graph neural network (GNN)

Type 11. TANKBind

○ Uses the attention mechanism of Transformers

Type 12. DiffDock

○ Uses a diffusion generative model
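The 3 Å criterion in the overview above is stated loosely. One way to apply it, assumed here, is as a cutoff on interatomic distances between ligand and receptor atoms. The brute-force Python sketch below (hypothetical function `close_contacts`, made-up toy coordinates) lists every ligand-receptor atom pair within the cutoff; a real pipeline would read coordinates from a PDB file and use a spatial index instead of a double loop.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in angstroms

def close_contacts(
    ligand: Sequence[Coord],
    receptor: Sequence[Coord],
    cutoff: float = 3.0,
) -> List[Tuple[int, int, float]]:
    """Return (ligand_idx, receptor_idx, distance) for atom pairs within `cutoff` angstroms.

    Brute-force O(n*m) scan: fine for a sketch, too slow for whole proteins.
    """
    contacts = []
    for i, a in enumerate(ligand):
        for j, b in enumerate(receptor):
            d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                contacts.append((i, j, d))
    return contacts

# Toy coordinates, made up for illustration (not from a real structure)
ligand_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
receptor_atoms = [(0.0, 2.8, 0.0), (0.0, 0.0, 4.0)]
for i, j, d in close_contacts(ligand_atoms, receptor_atoms):
    print(f"ligand atom {i} - receptor atom {j}: {d:.2f} A")
```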



3. Prediction of Protein Variant Function

⑴ PolyPhen-2 (Adzhubei et al., 2013)

⑵ SIFT (Kumar et al., 2009)

⑶ MutationTaster (Schwarz et al., 2014)

⑷ Mutation Assessor (Reva et al., 2011)

⑸ LR and LRT (Chun & Fay, 2009)



Posted: 2024.03.31 01:08

Modified: 2024.09.29 15:40
