tf-idf's Introduction

TF-IDF method for LGT detection.

This is the source code of the TF-IDF program, which is designed specifically for LGT detection.

NOTE: This version of the TF-IDF program has been modified to fix a memory leak that occurred during the final stage, when the results were being written to file. The way the program is called on the command line has also changed, and the log messages have been made more informative. The results produced by this version should be identical to those of the original version, as the core algorithm is unchanged.

Input Files

The input should be the sequence part of a FASTA file. For example, if there are 5 sequences for LGT detection, the input file should look like this (minus the '#' comments, which are added for this example):

ACCCGGGGTTTTCAAA # seq1
ACCCTTGGGGCCCAAT # seq2
ACCCGGGTTCCAAAAA # seq3
ACGGTTGGGGCCCAAT # seq4
ACCCTTGGGGCCCCCT # seq5

The header of each sequence is not required.
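If your data is still in regular FASTA format, the headers need to be stripped first. The sketch below shows one way to do that. It is not part of tf-idf.cpp: the function name fastaToSequences is hypothetical, and it assumes the program expects one complete sequence per line, as the example above suggests.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of tf-idf.cpp): collects the sequence part
// of FASTA-formatted text, dropping '>' header lines and joining sequences
// that are wrapped across several lines. It expects input that has headers;
// headerless lines would all be joined into a single sequence.
std::vector<std::string> fastaToSequences(std::istream& in) {
    std::vector<std::string> sequences;
    std::string line;
    bool startNew = true;  // true until the first sequence line of a record is seen
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') {  // header: the next sequence line starts a new record
            startNew = true;
            continue;
        }
        if (startNew) {
            sequences.push_back(line);
            startNew = false;
        } else {
            assert(!sequences.empty());
            sequences.back() += line;  // continuation of a wrapped sequence
        }
    }
    return sequences;
}
```

Writing each returned string on its own line should give a file in the layout shown above.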

The group information should be provided in a separate file. For example, if a dataset has 5 sequences, with the first 2 sequences in one group and the remaining 3 in another, the group information should be:

2 # amount of groups
1 2 # sequence IDs in group 1
3 4 5 # sequence IDs in group 2
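To check a group file before running the program, a small reader along the following lines may help. This is only a sketch of the layout described above, not the parser used in tf-idf.cpp: the name readGroups is hypothetical, it assumes one group per line after the group count, and it strips '#' comments purely so that the annotated example can be used as test input.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader (not part of tf-idf.cpp). The first data line holds
// the number of groups; each following data line lists the sequence IDs of
// one group. Anything after '#' on a line is ignored.
std::vector<std::vector<int>> readGroups(std::istream& in) {
    std::vector<std::vector<int>> groups;
    std::string line;
    long expected = -1;  // group count announced on the first data line
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // drop a trailing comment, if any
        std::istringstream fields(line);
        std::vector<int> ids;
        int id;
        while (fields >> id) ids.push_back(id);
        if (ids.empty()) continue;  // blank or comment-only line
        if (expected < 0) {
            if (ids.size() != 1 || ids[0] <= 0)
                throw std::runtime_error("first line must be the number of groups");
            expected = ids[0];
        } else {
            groups.push_back(ids);
        }
    }
    if (expected < 0 || groups.size() != static_cast<std::size_t>(expected))
        throw std::runtime_error("group count does not match the number of group lines");
    return groups;
}
```

In the example, the IDs appear to be the 1-based positions of the sequences in the sequence file, so a further check could confirm that every ID from 1 to the number of sequences occurs exactly once.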

To Compile

The program is written in C++. It can be compiled with GCC 4.4.7 (it has also been tested with GCC 9.4.0). If you have any questions, please contact me by email: [email protected].

g++ -O3 -o tf-idf tf-idf.cpp

To Run

An example of how to run this program from the command line:

tf-idf seqfile.txt 40 0.05 < speciesInfo.txt

Where:

  • seqfile.txt is the file with your sequences (in the format shown above)
  • 40 is the k-mer size to use (40 is the default in the original version of the program; we suggest you keep this set at 40)
  • 0.05 is the significance level used for the significance test
  • speciesInfo.txt is the group information file (in the format shown above)
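To illustrate what the k-mer size parameter controls, the sketch below lists every overlapping substring of length k in a sequence. It is an illustration only and makes no claim about how tf-idf.cpp extracts or scores k-mers; the function name kmers is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustration only (not taken from tf-idf.cpp): returns all overlapping
// k-mers of a sequence, in order of their start position.
std::vector<std::string> kmers(const std::string& seq, std::size_t k) {
    assert(k > 0);
    std::vector<std::string> out;
    if (seq.size() < k) return out;  // sequence shorter than k: no k-mers
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        out.push_back(seq.substr(i, k));
    }
    return out;
}
```

A sequence of length n contains n - k + 1 overlapping k-mers when n is at least k, and none otherwise. With k set to 40, the 16-base toy sequences used in the input example would contain no k-mers at all under this definition, so they are best read as a format illustration.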

tf-idf's People

Contributors

congyingnan, timothystephens
