
Types of Small Molecule Drug Databases

Recommended Post : 【Bioinformatics】 Table of Contents for Bioinformatics Analysis


1. Protein Representation

2. Molecular Representation

3. Small Molecule Databases

4. Small Molecule Simulation

5. Drug Genomics Databases


a. Pharmacology



1. Protein Representation

⑴ AAC (Amino Acid Composition) - 20D

⑵ Dipeptide Composition Descriptor - 400D

⑶ Tripeptide Composition Descriptor - 8000D

⑷ Composition, Transition and Distribution (CTD) - 147D

⑸ ProtVec (Asgari et al., PLoS ONE 10(11): e0141287, 2015)
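
The dimensions of the composition descriptors above follow from counting all possible k-residue peptides over the 20 standard amino acids (20, 20² = 400, 20³ = 8000). The following is a minimal sketch, not a reference implementation: the function name `kmer_composition` and the toy sequence are assumptions made for illustration, and windows containing non-standard residues are simply skipped.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Return the frequency of every possible k-residue peptide (20**k values)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with non-standard residues
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


seq = "MKTAYIAKQR"  # hypothetical toy sequence, for illustration only
aac = kmer_composition(seq, 1)  # amino acid composition
dpc = kmer_composition(seq, 2)  # dipeptide composition
tpc = kmer_composition(seq, 3)  # tripeptide composition
print(len(aac), len(dpc), len(tpc))  # 20 400 8000
```

Each vector sums to 1 whenever the sequence contains at least one valid window, so sequences of different lengths map to comparable fixed-length inputs.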



2. Molecular Representation

⑴ Structure-Based

① Pattern-based FP

○ MACCS, PubChem, FP3, FP4

⑵ Topological

① Path-based FP

○ Daylight, FP2

② Circular FP

○ ECFP2, ECFP4, ECFP6

③ Pharmacophore FP

○ 2D pharmacophore

⑶ Neural Network-Based

① Graph-based representation

○ GNN (Graph Convolutional Network (GCN), Graph Attention Network (GAT), Gated Graph Neural Network (GGNN), …)

○ GNNs generalize CNNs to graph-structured data: a CNN can be viewed as a GNN operating on a regular grid graph, so GNNs broadly include CNNs as a special case

② Molecular Embedding

○ Seq2seq, Mol2Vec
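
To make the graph-based representation more concrete, here is a minimal sketch of one round of neighborhood aggregation on a toy molecular graph. It is a simplification under stated assumptions: it uses plain mean aggregation with self-loops and no learned weights, whereas an actual GCN or GAT layer adds trainable parameters and its own normalization or attention scheme. The function name `aggregate`, the ethanol example, and the one-hot features are all chosen for illustration.

```python
def aggregate(features, edges):
    """One round of mean aggregation over each node and its bonded neighbors."""
    n = len(features)
    # A self-loop keeps each atom's own features in the average.
    neighbors = {i: {i} for i in range(n)}
    for a, b in edges:  # bonds are undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(features[0])
    updated = []
    for i in range(n):
        group = neighbors[i]
        updated.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return updated


# Heavy atoms of ethanol: C0 - C1 - O2, features = [is_carbon, is_oxygen]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
hidden = aggregate(features, edges)
print(hidden[2])  # [0.5, 0.5]: the oxygen now carries information from its carbon neighbor
```

On a regular grid, where every node has the same neighborhood shape, this kind of aggregation reduces to a fixed averaging filter, which is one way to see the relationship between CNNs and GNNs.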



3. Small Molecule Databases

⑴ Integrated Small Molecule Database

① Provides data in vector form for the biological activity of approximately 800,000 small molecules

② Data directly usable for building machine learning models

③ Categories: Chemical properties, target information, biological networks, cell experiments, clinical data

⑵ AlphaFold2 Database

① Contains about 200 million predicted protein structures

② About 1,000 times the roughly 200,000 protein structures determined experimentally over the past 50 years

③ Reached this size within about one year of the start of database creation

Figure 1. Size of the AlphaFold2 Database

⑶ Ensembl : Genome and transcript annotation database

⑷ UniProt : Protein database

⑸ The Human Protein Atlas : Public-access resource that aims to map all human proteins in cells, tissues, and organs

⑹ SGC (Chemical Probes) : Provides unique probe collections along with related data, reference compounds, and usage recommendations

⑺ ZINC Clean Leads Collection (1,936,962 molecules)

① Molecular weight between 250 and 350 Da, no more than 7 rotatable bonds, and XlogP ≤ 3.5

② Molecules containing charged atoms, atoms other than C, N, S, O, F, Cl, Br, H, or rings larger than 8 atoms were removed

③ The remaining molecules were filtered with medicinal chemistry filters (MCFs) and PAINS filters

https://github.com/molecularsets/moses
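
The descriptor-based criteria for the Clean Leads collection can be sketched as a simple predicate. This is an illustration only, not the MOSES code: it assumes each molecule is already described by a precomputed record, the field names (`mol_weight`, `rotatable_bonds`, `xlogp`, `has_charged_atom`, `atoms`, `largest_ring_size`) and the example records are hypothetical, and the MCF and PAINS filters are not covered.

```python
ALLOWED_ATOMS = {"C", "N", "S", "O", "F", "Cl", "Br", "H"}


def passes_clean_leads_filter(mol):
    """Check the descriptor-based criteria on a precomputed record (hypothetical fields)."""
    return (
        250 <= mol["mol_weight"] <= 350        # molecular weight in Daltons
        and mol["rotatable_bonds"] <= 7
        and mol["xlogp"] <= 3.5
        and not mol["has_charged_atom"]
        and set(mol["atoms"]) <= ALLOWED_ATOMS  # only the permitted elements
        and mol["largest_ring_size"] <= 8       # no cycles longer than 8 atoms
    )


# Hypothetical records, for illustration only
candidate = {
    "mol_weight": 312.4, "rotatable_bonds": 5, "xlogp": 2.8,
    "has_charged_atom": False, "atoms": ["C", "N", "O", "H", "F"],
    "largest_ring_size": 6,
}
too_heavy = dict(candidate, mol_weight=410.0)
has_phosphorus = dict(candidate, atoms=["C", "P", "O", "H"])

print(passes_clean_leads_filter(candidate))       # True
print(passes_clean_leads_filter(too_heavy))       # False
print(passes_clean_leads_filter(has_phosphorus))  # False
```

In practice the descriptors would come from a cheminformatics toolkit; keeping the predicate separate from descriptor calculation makes the thresholds easy to audit against the list above.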

⑻ Known DDR1 Kinase Inhibitors

⑼ Common Kinase Inhibitors (positive)

⑽ Molecules that Act on Non-Kinase Targets (negative)

① Items ⑻ ~ ⑽ : from the ChEMBL dataset

⑾ Patent Data for Claimed Molecules

www.globaldata.com : Contains information on approximately 17,000 drugs registered until 2017

⑿ 3D Structure of DDR1 Inhibitors



4. Small Molecule Simulation

⑴ Docking Simulation

① Using the Maestro suite (https://www.schrodinger.com)

② PDB structure 3ZOS was preprocessed and energy minimized using the Prep module

⑵ Toxicity-related Public Databases

① ToxCast : Approximately 8,500 compounds. Covers about 700 types of in vitro high-throughput screening assays spanning various cell lines and biological activities

② Tox21 : Approximately 8,000 compounds. Provides qualitative measurements of compound responses on 12 major cellular toxicity targets, obtained with methods such as luciferase assays

③ DSSTox : Approximately 740,000 compounds. Integrates the physical and chemical properties of chemicals with biological experimental data from Tox21 and ToxCast

④ ClinTox : Approximately 1,500 compounds. Compares FDA-approved drugs with drugs that failed clinical trials because of toxicity

⑤ SIDER : Approximately 1,500 compounds. Integrates side-effect information for marketed drugs and classifies side effects by frequency and severity, based on published papers and experimental data

⑥ ECOTOX : Approximately 12,000 compounds. Provides integrated toxicity test data for chemicals across more than 13,000 species. Toxicity is evaluated with criteria such as EC50, IC50, and NOEL, and links to the relevant papers are provided



5. Drug Genomics Databases

⑴ NCBI dbSNP

⑵ gnomAD

⑶ PharmVar

⑷ PharmGKB

⑸ NCBI PubChem

⑹ Broad Institute CMAP

⑺ CTD

⑻ DrugBank

⑼ STITCH (Search Tool for Interactions of Chemicals)

⑽ ToppFun

⑾ DepMap

⑿ L1000CDS2

⒀ L1000FWD

⒁ GDSC (Genomics of Drug Sensitivity in Cancer)

⒂ CCLE

⒃ ClinicalTrials.gov : Provides information on clinical trial progress for each drug



Input: 2022.04.27 01:25
