Coder Social home page Coder Social logo

tox21_dataset's Introduction

Tox21 Data Challenge 2014 Datasets

Introduction

Toxicity evaluation of newly synthesized or used compounds is one of the main challenges during product development in many areas of industry. For example, toxicity is the second reason—after lack of efficacy—for failure in preclinical and clinical studies of drug candidates. To avoid attrition at the late stage of the drug development process, the toxicity analyses are employed at the early stages of a discovery pipeline, along with activity and selectivity enhancing. Although many assays for screening in vitro toxicity are available, their massive application is not always time and cost effective. Thus, the need for fast and reliable in silico tools, which can be used not only for toxicity prediction of existing compounds, but also for prioritization of compounds planned for synthesis or acquisition. Here I present the benchmark results of the combination of various attribute selection methods and machine learning algorithms and their application to the data sets of the Tox21 Data Challenge. The best performing method: Best First for attribute selection with the Rotation Forest/ADTree classifier offers good accuracy for most tested cases. For 11 out of 12 targets, the AUROC value for the final evaluation set was = 0.72, while for three targets the AUROC value was = 0.80, with the average AUROC being 0.784 ± 0.069. The use of two-dimensional descriptors sets enables fast screening and compound prioritization even for a very large database. Open source tools used in this project make the presented approach widely available and encourage the community to further improve the presented scheme. F. Stefaniak, Front. Environ. Sci., 01 December 2015

Datasets

Dataset used for modeling in the Tox21 Data Challenge 2014. Preprocessing steps, descriptors and modeling approach is decribed in F. Stefaniak, Front. Environ. Sci., 01 December 2015.

  • compounds - list of compounds' names, activity and SMILES codes for compounds used in the training dataset
  • datasets - tables with descriptors and activity values (column: Activity) in csv and arff/weka formats.

datasets

Results

results

Global comparison:

Source: Efficient toxicity prediction via simple features using shallow neural networks and decision trees, A Karim, A Mishra, MAH Newton, A Sattar - ACS Omega, 2019 - ACS Publications.

Source: Toxicity prediction using deep learning, T Unterthiner, A Mayr, G Klambauer… - arXiv preprint arXiv …, 2015 - arxiv.org

tox21_dataset's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

xly1524 nscr45

tox21_dataset's Issues

is there any docs

is there any documentations for how to use this datasets.
can you help me?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.