Coder Social home page Coder Social logo

cfdnapattern's Introduction

CfdnaPattern

Pattern Recognition for Cell-free DNA

Predict a fastq is cfdna or not

# predict a single file
python predict.py <single_fastq_file>

# predict files
python predict.py <fastq_file1> <fastq_file2> ... 

# predict files with wildcard
python predict.py *.fq

warning: this tool doesn't work for trimmed fastq

prediction output

For each file given in the command line, this tool will output a line <prediction>: <filename>, like

cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz

Add -q or --quite to enable quite output mode, in which it will only output:

  • a file with name of cfdna, but prediction is not-cfdna
  • a file without name of cfdna, but prediction is cfdna

Train a model

This tool has a pre-trained model (cfdna.model), which can be used for prediction. But you still can train a model by yourself.

  • prepare/link all your fastq files in some folder
  • for files from cfdna, include cfdna (case-insensitive) in the filename, like 20160220-cfdna-015_S15_R1_001.fq
  • for files from genomic DNA, include gdna (case-insensitive) in the filename, like 20160220-gdna-002_S2_R1_001.fq
  • for files from FFPE DNA, include ffpe (case-insensitive) in the filename, like 20160123-ffpe-040_S0_R1_001.fq
  • run:
python train.py /fastq_folder/*.fq

Citation

If you used CfdnaPattern for your publication, please cite: https://doi.org/10.1109/TCBB.2017.2723388

Full options:

python training.py <fastq_files> [options] 

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODEL_FILE, --model=MODEL_FILE
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classfication,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files

cfdnapattern's People

Contributors

sfchen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cfdnapattern's Issues

Does this tool involve seeding?

Does this tool involve some kind of random seeding or do you expect it to produce identical results each time it is run with the same input data? If it involves seeding, is it possible to fix the seeds for repeatability?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.