CfdnaPattern

Pattern Recognition for Cell-free DNA

Predict whether a FASTQ file is cfDNA or not

# predict a single file
python predict.py <single_fastq_file>

# predict multiple files
python predict.py <fastq_file1> <fastq_file2> ...

# predict files with wildcard
python predict.py *.fq

Warning: this tool does not work on trimmed FASTQ files.

Prediction output

For each file given on the command line, the tool prints one line of the form <prediction>: <filename>, for example:

cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
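If the output needs to be processed by another script, the `<prediction>: <filename>` lines can be split back into their two parts. The following is a minimal sketch, not part of CfdnaPattern; `Prediction` and `parse_prediction_line` are hypothetical helper names, and it assumes the only labels are `cfdna` and `not-cfdna`, as in the example output above.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    label: str     # "cfdna" or "not-cfdna"
    filename: str  # path exactly as printed by predict.py


def parse_prediction_line(line: str) -> Prediction:
    """Split one '<prediction>: <filename>' line into label and filename."""
    # Partition on the first ': ' only, so any later colon in the path is kept.
    label, sep, filename = line.strip().partition(": ")
    if not sep or label not in ("cfdna", "not-cfdna"):
        raise ValueError(f"unexpected prediction line: {line!r}")
    return Prediction(label, filename)


# Two lines copied from the example output above.
example_output = """\
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
"""

predictions = [
    parse_prediction_line(line)
    for line in example_output.splitlines()
    if line.strip()
]

for p in predictions:
    print(p.label, p.filename)
```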

Add -q or --quite to enable quiet output mode, which prints only the files whose prediction conflicts with their filename:

  • files whose name contains cfdna but whose prediction is not-cfdna
  • files whose name does not contain cfdna but whose prediction is cfdna
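The two cases above amount to a single rule: report a file when "name contains cfdna" and "prediction is cfdna" disagree. A minimal sketch of that rule follows. It is an illustration, not the tool's own code; `conflicts_with_name` is a hypothetical helper, and matching case-insensitively on the base filename only (not the directory) is an assumption.

```python
import os


def conflicts_with_name(label: str, filename: str, flag: str = "cfdna") -> bool:
    """Return True when the prediction disagrees with the filename.

    Assumption: the flag is matched case-insensitively against the base
    filename, so directory names do not influence the result.
    """
    named_cfdna = flag.lower() in os.path.basename(filename).lower()
    predicted_cfdna = label == "cfdna"
    # A conflict is exactly the case where the two booleans differ.
    return named_cfdna != predicted_cfdna


results = [
    ("cfdna", "/fq/20160220-cfdna-001_S1_R1_001.fastq.gz"),      # agrees
    ("not-cfdna", "/fq/20160220-cfdna-003_S3_R1_001.fastq.gz"),  # conflict
    ("cfdna", "/fq/20160220-gdna-002_S2_R1_001.fastq.gz"),       # conflict
    ("not-cfdna", "/fq/20160220-gdna-004_S4_R1_001.fastq.gz"),   # agrees
]

# Keep only the lines that quiet mode, as described above, would report.
reported = [(label, name) for label, name in results if conflicts_with_name(label, name)]
for label, name in reported:
    print(f"{label}: {name}")
```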

Train a model

This tool ships with a pre-trained model (cfdna.model) that can be used for prediction, but you can also train a model yourself:

  • put or link all your FASTQ files in one folder
  • for files from cfDNA, include cfdna (case-insensitive) in the filename, like 20160220-cfdna-015_S15_R1_001.fq
  • for files from genomic DNA, include gdna (case-insensitive) in the filename, like 20160220-gdna-002_S2_R1_001.fq
  • for files from FFPE DNA, include ffpe (case-insensitive) in the filename, like 20160123-ffpe-040_S0_R1_001.fq
  • run:
python training.py /fastq_folder/*.fq
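Before training, it can help to check how each file would be labeled under the naming convention above. The sketch below is an assumption-laden illustration, not the tool's implementation: `label_from_filename` is a hypothetical helper, the flag strings mirror the documented defaults (`cfdna` and `gdna;ffpe`, separated by semicolons), and giving the cfdna flag precedence when both kinds match is an assumption.

```python
import os
from typing import Optional


def label_from_filename(
    path: str,
    cfdna_flags: str = "cfdna",
    other_flags: str = "gdna;ffpe",
) -> Optional[str]:
    """Return 'cfdna', 'other', or None if no flag occurs in the filename."""
    name = os.path.basename(path).lower()

    def matches(flags: str) -> bool:
        # Flags are semicolon-separated; skip empty entries such as 'a;;b'.
        return any(f in name for f in flags.lower().split(";") if f)

    if matches(cfdna_flags):  # assumption: the cfdna flag is checked first
        return "cfdna"
    if matches(other_flags):
        return "other"
    return None


files = [
    "20160220-cfdna-015_S15_R1_001.fq",
    "20160220-gdna-002_S2_R1_001.fq",
    "20160123-ffpe-040_S0_R1_001.fq",
    "20160220-sample-007_S7_R1_001.fq",  # hypothetical file with no flag
]

labels = {f: label_from_filename(f) for f in files}
for f, label in labels.items():
    print(f"{label}: {f}")
```

Files that come back as `None` carry neither flag in their name and are worth renaming before they are passed to the training script.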

Citation

If you use CfdnaPattern in your publication, please cite: https://doi.org/10.1109/TCBB.2017.2723388

Full options:

python training.py <fastq_files> [options] 

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODEL_FILE, --model=MODEL_FILE
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classfication,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files
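When training is driven from another script, the options above can be assembled into an argument list. The following is a sketch under stated assumptions: `build_training_command` is a hypothetical helper, `my.model` is a placeholder filename, and only the flags listed in the help text are used. The command is built and printed but not executed.

```python
import shlex
from typing import List, Optional, Sequence

# Algorithm names as listed in the help text above.
ALGORITHMS = ("svm", "knn", "rbf", "rf", "gnb", "benchmark")


def build_training_command(
    fastq_files: Sequence[str],
    model_file: Optional[str] = None,
    algorithm: Optional[str] = None,
    passes: Optional[int] = None,
    no_cache_check: bool = False,
) -> List[str]:
    """Build an argument list for training.py from the documented options."""
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    cmd = ["python", "training.py", *fastq_files]
    if model_file is not None:
        cmd += ["-m", model_file]
    if algorithm is not None:
        if algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {algorithm}")
        cmd += ["-a", algorithm]
    if passes is not None:
        cmd += ["-p", str(passes)]
    if no_cache_check:
        cmd.append("-n")
    return cmd


cmd = build_training_command(
    ["20160220-cfdna-015_S15_R1_001.fq", "20160220-gdna-002_S2_R1_001.fq"],
    model_file="my.model",  # placeholder output name
    algorithm="rf",
    passes=5,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains training.py.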
