Please cite as:
Gilson, Michael K.; Liu, Tiqing; Hwang, Linda (2024). BindingDB Dataset January 1, 2024. In BindingDB: Measured Binding Data for Protein-Ligand and Other Molecular Systems. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0BP02ZT

Corresponding author: Michael K. Gilson, mgilson@ucsd.edu

Primary associated publications:
- Gilson, M.K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L. and Chong, J. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research 44:D1045-D1053 (2015). 10.1093/nar/gkv1072
- Liu, T., Lin, Y., Wen, X., Jorissen, R.N. and Gilson, M.K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research 35:D198-D201 (2007). 10.1093/nar/gkl999
- Chen, X., Liu, M. and Gilson, M.K. Binding DB: A web-accessible molecular recognition database. J. Combi. Chem. High-Throughput Screen 4:719-725 (2001). 10.2174/1386207013330670

Description of contents:
This BindingDB release contains about 2.8 million experimental binding measurements. The downloads comprise six (6) files, as follows:
- BDB-Oracle_All_202401_dmp.zip: Oracle database dump of all of BindingDB
- BDB-mySQL_All_202401_dmp.zip: MySQL database dump of all of BindingDB
- BindingDB_All_202401_tsv.zip: zip-compressed, tab-separated-value file of all protein-ligand data in BindingDB. (See Data Dictionary below.)
- BindingDB_BindingDB_Articles_202401_tsv.zip: Data curated by BindingDB from scientific articles
- BindingDB_BindingDB_Patents_202401_tsv.zip: Data curated by BindingDB from US or WO patents
- BindingDB_Data_Dictionary.pdf: Description of the columns in a BindingDB tsv file

Methods:
Data curated by BindingDB were collected from articles and patents using a semi-automated procedure. We first work from the source document to automatically preload a representation of the data into a web-based curation interface.
Human curators then examine the preloaded data, compare them with the source document, and make corrections as needed. A second curator then inspects the corrected curation, and any proposed corrections are discussed and made if needed. Occasionally, a user will contact BindingDB with a proposed correction. Each such proposal is reviewed individually and made if warranted.

Data dictionary:

Overview:
BindingDB provides data files in a spreadsheet-compatible, Tab-separated value (TSV) format. Each row contains information for one binding measurement, that is, for the interaction of one small molecule ligand (also termed a compound or monomer) with one protein target. Each row includes a SMILES string for the ligand, the identity of the target, the measured affinity (usually Ki, IC50, or Kd), the source of the data, and links to related information in other databases. When a piece of information is unavailable, the data cell is left blank.

BindingDB switched from the widely used comma-separated value (CSV) format to TSV because some data fields occasionally contain legitimate commas. Such commas are misinterpreted as value separators unless additional characters are added to distinguish them from the commas intended as value separators. We found that errors often occur when these additional characters are added and later removed. Because no data fields include legitimate Tab characters, the Tab-separated value format avoids these problems.
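As an illustration of the TSV layout described above, the following is a minimal Python sketch, not an official BindingDB tool. It reads tab-separated rows with the standard csv module and shows that a comma inside a field survives intact and that an unavailable value arrives as an empty string. The sample rows, the three column names, and the helper read_rows are assumptions made for this example; the actual column names should be taken from BindingDB_Data_Dictionary.pdf.

```python
import csv
import io

# Hypothetical three-column excerpt. Real BindingDB files have many more
# columns, and these header names are illustrative assumptions only.
SAMPLE_TSV = (
    "Ligand SMILES\tTarget Name\tKi (nM)\n"
    "CCO\tExample kinase, isoform 2\t12.5\n"
    "c1ccccc1\tExample protease\t\n"
)


def read_rows(handle):
    """Yield each data row as a dict keyed by the header line."""
    # QUOTE_NONE treats quote characters as ordinary data. This assumes
    # that fields are not wrapped in extra quoting characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    for fields in reader:
        yield dict(zip(header, fields))


rows = list(read_rows(io.StringIO(SAMPLE_TSV)))
for row in rows:
    # A blank cell means the value is unavailable; map it to None.
    ki = row["Ki (nM)"] or None
    print(row["Target Name"], ki)
```

For a real download, the same function could be given a file opened with open(path, newline="", encoding="utf-8"); the encoding is an assumption and should be checked against the file in hand.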
BindingDB TSVs reference the following databases:
- ChEBI: www.ebi.ac.uk/chebi
- ChEMBL: www.ebi.ac.uk/chembl
- CSAR: www.csardock.org
- D3R: https://drugdesigndata.org
- DrugBank: www.drugbank.ca
- Guide to Pharmacology: https://www.guidetopharmacology.org/
- KEGG: www.genome.jp/kegg
- PDB: www.pdb.org
- PDSP Ki: pdsp.med.unc.edu/databases/kidb.php
- PubChem: pubchem.ncbi.nlm.nih.gov
- PubMed: www.ncbi.nlm.nih.gov/pubmed
- UniProtKB: www.uniprot.org
- ZINC: zinc.docking.org

Columns in a BindingDB TSV file:
The file BindingDB_Data_Dictionary.pdf defines the content of each column, in the order presented. Note that the total number of columns can vary from row to row, because each row ends with a set of columns that repeats for each protein chain (i.e., BindingDB Polymer) of the target. For example, if a protein is a trimer, and if this information was fully captured by the curator, then the last set of columns will occur three times, once for each protein chain.

Technical details:
The Oracle data dump was exported from Oracle 12c Enterprise Edition release 12.2.0.1.0 - 64-bit production. The MySQL dump was exported from MySQL version 8.0.26.

License:
Data marked as curated by BindingDB are released under the Creative Commons Attribution 3.0 License. Data marked as curated by ChEMBL are shared under the Creative Commons Attribution-Share Alike 3.0 Unported License selected by ChEMBL.
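
Note on variable-length rows:
Because the per-chain columns described under "Columns in a BindingDB TSV file" repeat once per protein chain, rows can differ in length, and tools that expect a strictly rectangular table may need extra handling. The Python sketch below shows one possible way to separate the fixed leading columns from the repeating per-chain blocks. The function split_chain_blocks and its parameters n_fixed and block_size are hypothetical, as are the sample values; the real counts should be taken from BindingDB_Data_Dictionary.pdf for the release in use.

```python
def split_chain_blocks(fields, n_fixed, block_size):
    """Split one row into its fixed columns and per-chain column blocks.

    n_fixed and block_size are hypothetical parameters, not values
    published by BindingDB; look them up in the data dictionary.
    """
    fixed = fields[:n_fixed]
    tail = fields[n_fixed:]
    # Cut the trailing columns into consecutive blocks, one per chain.
    blocks = [tail[i:i + block_size] for i in range(0, len(tail), block_size)]
    return fixed, blocks


# Made-up rows: 2 fixed columns, then a 2-column block per protein chain.
monomer = ["CCO", "12.5", "Chain A", "ID_A"]
trimer = ["CCO", "3.1", "Chain A", "ID_A", "Chain B", "ID_B", "Chain C", "ID_C"]

fixed_m, chains_m = split_chain_blocks(monomer, n_fixed=2, block_size=2)
fixed_t, chains_t = split_chain_blocks(trimer, n_fixed=2, block_size=2)
print(len(chains_m), len(chains_t))  # prints: 1 3
```

Keeping the chain blocks as a list per row, rather than padding every row to the widest one, preserves the one-block-per-chain meaning stated in the data dictionary.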