eliottbo / mlstclassifier_cd

multi-locus sequence type clade classifier for Clostridioides difficile

License: Apache License 2.0

Python 100.00%
bioinformatics clade classifier machine-learning microbiology mlst multi-locus-sequence-typing tool

MLSTclassifier_cd

Overview

MLSTclassifier_cd is a machine learning tool that predicts the clade of C. difficile strains, including cryptic variants, from Multi-Locus Sequence Type (MLST) data using the K-Nearest Neighbors (KNN) algorithm. It is designed to streamline and speed up clade prediction, and achieves a prediction accuracy of approximately 92%.

The model was built following the StatQuest methodology (https://www.youtube.com/watch?v=q90UDEgYqeI&t=3327s) and relies on the Scikit-learn library. MLSTclassifier_cd is a good way to get a first classification of your C. difficile strains, including cryptic ones.
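To illustrate the KNN idea mentioned above, here is a minimal, dependency-free sketch. It is not the model shipped with MLSTclassifier_cd: the real tool uses Scikit-learn, and its features, distance metric and value of k are not described here. The allele profiles, the clade labels, the Hamming distance and k=3 are all assumptions made for the example.

```python
from collections import Counter

def hamming(a, b):
    """Number of loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Majority vote among the k training profiles closest to the query."""
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one allele number per locus, with invented clade labels.
TRAIN = [
    ((1, 1, 1, 1, 1, 1, 1), "clade_A"),
    ((1, 1, 2, 1, 1, 1, 1), "clade_A"),
    ((1, 2, 1, 1, 1, 1, 1), "clade_A"),
    ((5, 6, 7, 8, 9, 5, 6), "clade_B"),
    ((5, 6, 7, 8, 9, 5, 7), "clade_B"),
    ((5, 6, 7, 8, 2, 5, 6), "clade_B"),
]

if __name__ == "__main__":
    # A profile that differs from the clade_A examples at only one or two loci.
    print(knn_predict(TRAIN, (1, 1, 1, 1, 1, 3, 1)))
```

A profile is assigned the label held by most of its nearest neighbours, so a previously unseen sequence type can still be placed in a clade as long as it resembles known profiles.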

The model was trained using data from PubMLST (May 2023): https://pubmlst.org/bigsdb?db=pubmlst_cdifficile_seqdef&page=downloadProfiles&scheme_id=1. Cryptic strains in the training set were assessed manually, using phylogenetic tree construction together with fastbaps and PopPUNK to refine the clustering.

GitHub repo: https://github.com/eliottBo/MLSTclassifier_cd

Installation:

It is recommended to use a virtual environment.

Install PyPI package: pip install mlstclassifier-cd

https://pypi.org/project/mlstclassifier-cd/

Usage:

The first argument is the path to a directory containing ".mlst" files (such as those obtained from PubMLST) or ".fastmlst" files from FastMLST. The second argument is the path to the output directory where the output files will be written.

Basic Command:

MLSTclassifier_cd [input directory path] [output directory path]

Example: MLSTclassifier_cd /Desktop/input_directory_name /Desktop/output_directory_name/
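If you call the tool from a script, a small pre-flight check can catch an empty or mistyped input directory before the command runs. The sketch below is only an illustration: `find_inputs` and `build_command` are hypothetical helper names, not part of the package, and the only things taken from the usage above are the command name, the two positional arguments and the two file extensions.

```python
import tempfile
from pathlib import Path

ACCEPTED_SUFFIXES = {".mlst", ".fastmlst"}

def find_inputs(input_dir):
    """Return the sorted files in input_dir ending in .mlst or .fastmlst."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix in ACCEPTED_SUFFIXES
    )

def build_command(input_dir, output_dir):
    """Build the argument list for the documented two-argument command."""
    if not find_inputs(input_dir):
        raise ValueError(f"no .mlst or .fastmlst files in {input_dir}")
    return ["MLSTclassifier_cd", str(input_dir), str(output_dir)]

if __name__ == "__main__":
    # Demo on a throwaway directory holding one empty placeholder file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sample1.mlst").write_text("")
        print(build_command(tmp, Path(tmp) / "output"))
```

The returned list can be passed to `subprocess.run(..., check=True)` once the package is installed and the command is on your PATH.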

Output:

After running MLSTclassifier_cd, the result file contains a column named "predicted_clade". The following files are also created:

  • "pie_chart.html": a plot showing the proportions of the different clades found.
  • "count.csv": a CSV file containing the raw counts of your predicted clades, so you can generate your own graphs.
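As a starting point for your own graphs, the counts can be turned into proportions with the standard library alone. The exact layout of "count.csv" is not documented here, so this sketch assumes a header row followed by two columns (clade label, then count). Check your own file and adjust the column indices if it differs. `clade_proportions` is a hypothetical helper name.

```python
import csv

def clade_proportions(lines):
    """Turn (clade, count) CSV rows into {clade: fraction of the total}.

    `lines` is any iterable of CSV text lines, for example an open file.
    Assumes one header row, then the clade label in column 0 and the count in column 1.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    counts = {row[0]: int(row[1]) for row in reader if row}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clade: n / total for clade, n in counts.items()}

if __name__ == "__main__":
    # Invented example rows standing in for a real count.csv.
    example = ["predicted_clade,count", "1,6", "2,2"]
    print(clade_proportions(example))
```

With a real file, pass the open handle instead of a list: `with open("count.csv", newline="") as fh: clade_proportions(fh)`.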
