rdpstaff / classifier

RDP extensible sequence classifier for fungal lsu, bacterial and archaeal 16s

License: GNU General Public License v2.0

Java 100.00%

classifier's Introduction

INTRO

The RDP Classifier is a naive Bayesian classifier developed to provide rapid taxonomic placement based on rRNA sequence data. It can rapidly and accurately classify bacterial and archaeal 16S rRNA sequences and fungal LSU sequences, providing taxonomic assignments from domain to genus with a confidence estimate for each assignment. The RDP Classifier can likely be adapted to additional phylogenetically coherent bacterial taxonomies. The online version of the RDP Classifier can be found at http://rdp.cme.msu.edu/classifier/classifier.jsp.

How to cite the Classifier: Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007.
Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7.

Gene Copy Number Adjustment
The Classifier provides gene copy number adjustment for 16S gene sequences (see http://rdp.cme.msu.edu/classifier/class_help.jsp#copynumber).
The precompiled Classifier was trained with the 16S gene copy number data provided by the rrnDB website. The Classifier can also be trained with user-provided gene copy number data; see How to Train the Classifier below. The Classifier outputs both copy-number-adjusted and unadjusted assignment counts in the hierarchical output files.
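
As a rough illustration of what adjusted and unadjusted counts look like side by side, the sketch below assumes that the adjustment divides each taxon's assignment count by its mean gene copy number. That formula, the function name, the fallback of leaving taxa without copy number data unadjusted, and the taxon names are all assumptions made for illustration, not the Classifier's code.

```python
def adjust_counts(counts, mean_copy_number):
    """Return copy-number-adjusted counts next to the raw counts.

    counts: taxon -> number of sequences assigned (unadjusted).
    mean_copy_number: taxon -> mean 16S gene copy number (assumed input).
    Taxa without copy number data are left unadjusted (an assumption).
    """
    adjusted = {}
    for taxon, n in counts.items():
        copies = mean_copy_number.get(taxon, 1.0)
        adjusted[taxon] = n / copies
    return adjusted

# Hypothetical taxa and values, for illustration only.
raw = {"GenusA": 12, "GenusB": 5}
print(adjust_counts(raw, {"GenusA": 4.0}))
```

Under this assumption, a taxon with many gene copies per genome contributes proportionally fewer organisms per sequence than its raw count suggests.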

BIOM Format

The Classifier can take a minimal (or rich) dense BIOM file as input, with an optional metadata file, and produces a rich dense BIOM file. If a cluster BIOM file (version 1.0) is provided along with the representative sequences (output from the Clustering rep-seqs subcommand, run with the "-c" option so that the representative sequence IDs match the cluster IDs in the BIOM file; see http://rdp.cme.msu.edu/tutorials/stats/RDPtutorial_statistics.html), the classification result of each sequence replaces the taxonomy of the corresponding cluster. If a metadata file is provided, its information replaces the metadata of the corresponding sample. The resulting rich dense BIOM file can be used by third-party tools such as phyloseq or QIIME.

To use a BIOM file as input, specify the format on the command line with "-f biom" and give the BIOM file with the option "-m /path/to/biom_file.biom". A metadata file is optional and can be supplied with the option "-d /path/to/metadata.txt".

QUICKSTART

ant jar
Some commands in this tutorial depend on RDP Clustering. See RDPTools (https://github.com/rdpstaff/RDPTools) to install.

USAGE

There are many subcommands offered by the Classifier package. The default subcommand is classify.

java -Xmx1g -jar /path/to/classifier.jar
USAGE: ClassifierMain <subcommand> <subcommand args ...>
classify - classify one or multiple samples
crossvalidate - cross validate accuracy testing
libcompare - compare two samples
loot - leave one (sequence or taxon) out accuracy testing
merge-detail - merge classification detail result files to create a taxon assignment counts file
merge-count - merge multiple taxon assignment count files to into one count file
random-sample - random select a subset or subregion of sequences
rm-dupseq - remove identical or any sequence contained by another sequence
rm-partialseq - remove partial sequences
taxa-sim - calculate and plot the similarities within taxa
train - retrain classifier

1. Classify one or more samples
usage: [options] <samplefile>[,idmappingfile] ...
 -c,--conf <arg>             assignment confidence cutoff used to determine the
                             assignment count for each taxon. Range [0-1],
                             Default is 0.8.
 -f,--format <arg>           tab-delimited output format:
                             [allrank|fixrank|biom|filterbyconf|db]. Default is
                             allRank.
                             allrank: outputs the results for all ranks applied
                             for each sequence: seqname, orientation, taxon
                             name, rank, conf, ...
                             fixrank: only outputs the results for fixed ranks
                             in order: domain, phylum, class, order, family,
                             genus
                             biom: outputs rich dense biom format if OTU or
                             metadata provided
                             filterbyconf: only outputs the results for major
                             ranks as in fixrank, results below the confidence
                             cutoff were bin to a higher rank unclassified_node
                             db: outputs the seqname, trainset_no, tax_id, conf.
 -g,--gene <arg>             16srrna, fungallsu, fungalits_warcup,
                             fungalits_unite. Default is 16srrna. This option
                             can be overwritten by -t option
 -h,--hier_outfile <arg>     tab-delimited output file containing the assignment
                             count for each taxon in the hierarchical format.
                             Default is null.
 -o,--outputFile <arg>       tab-delimited text output file for classification
                             assignment.
 -q,--queryFile              legacy option, no longer needed
 -t,--train_propfile <arg>   property file containing the mapping of the
                             training files if not using the default. Note: the
                             training files and the property file should be in
                             the same directory.
 -w,--minWords <arg>         minimum number of words for each bootstrap trial.
                             Default(maximum) is 1/8 of the words of each
                             sequence. Minimum is 5

[Example command to classify sequences ]:
java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -o usga_classified.txt -h soil_hier.txt samplefiles/USGA_2_4_B_trimmed.fasta

To speed up classification when the inputs contain a large number of duplicate sequences, you can dereplicate the input files first and use both the unique-sequence fasta file and the idmapping file as input. The classification assignment output contains only the results for the unique sequences, while the assignment counts in the hier_outfile are expanded to reflect the full sets. The hier_outfile can be imported into Excel to make plots, or loaded into R as a data matrix.

[Example command to classify sequences with idmapping]:
java -jar /path/to/Clustering.jar derep -u -o Native_1_4_A_derep.fasta Native_1_4_A.ids Native_1_4_A.sample samplefiles/Native_1_4_A_trimmed.fasta
java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -o native_classified.txt -h soil_hier.txt Native_1_4_A_derep.fasta,Native_1_4_A.ids
java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -f fixrank -o soil_classified.txt -h soil_hier.txt Native_1_4_A_derep.fasta,Native_1_4_A.ids samplefiles/USGA_2_4_B_trimmed.fasta
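
The expansion of counts from unique sequences back to the full set can be pictured with the short sketch below. It is only an illustration: the in-memory dictionaries stand in for the classification output and the idmapping file (whose on-disk format is not assumed here), and the function name and all IDs are hypothetical.

```python
from collections import Counter

def expand_counts(assignments, id_mapping):
    """Count assignments for the full read set, not just unique sequences.

    assignments: unique sequence ID -> assigned taxon.
    id_mapping: unique sequence ID -> list of original read IDs it represents.
    A unique sequence missing from id_mapping counts as a single read.
    """
    counts = Counter()
    for unique_id, taxon in assignments.items():
        counts[taxon] += len(id_mapping.get(unique_id, [unique_id]))
    return dict(counts)

# Hypothetical IDs and taxa.
assignments = {"u1": "GenusA", "u2": "GenusB", "u3": "GenusA"}
id_mapping = {"u1": ["r1", "r2", "r3"], "u2": ["r4"]}
print(expand_counts(assignments, id_mapping))
```

Each unique sequence is classified once, and its assignment is then weighted by the number of reads it represents.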

Notes:
The bootstrap assignment strategy has been changed to avoid an over-prediction problem when multiple genera tie for the highest score during bootstrap trials. This happens when every sequence in multiple genera (say N) contains the same partial sequence. In each bootstrap trial, one genus is chosen at random from the N genera tied for the highest score. If the tie occurs during the deterministic genus assignment step, the first genus is chosen. In this way, the genus assignment remains deterministic but the bootstrap score will be close to 1/N.
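
The effect of this tie-handling rule can be seen in a small simulation. This is a simplified sketch, not the Classifier's code: every trial reuses the same tied scores instead of resampling words, and the function names, score values, and genus names are assumptions made for illustration.

```python
import random

def pick_genus(scores, rng=None):
    """Return the top-scoring genus.

    With rng=None (deterministic step) the first tied genus wins;
    with an rng (bootstrap trial) one of the tied genera is drawn at random.
    """
    best = max(scores.values())
    tied = [genus for genus, score in scores.items() if score == best]
    return tied[0] if rng is None else rng.choice(tied)

def bootstrap_support(scores, trials=100, seed=0):
    """Fraction of bootstrap trials that agree with the deterministic pick."""
    rng = random.Random(seed)
    assigned = pick_genus(scores)
    hits = sum(pick_genus(scores, rng) == assigned for _ in range(trials))
    return assigned, hits / trials

# Three hypothetical genera that share the same partial sequence, so they tie.
tied_scores = {"GenusA": -12.5, "GenusB": -12.5, "GenusC": -12.5}
genus, support = bootstrap_support(tied_scores, trials=3000)
print(genus, round(support, 2))
```

With N = 3 tied genera the assigned genus is always the first one, while its support settles near 1/3, which signals that the genus-level call is not reliable.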

By default, the Classifier outputs the results for all ranks applied for each sequence. Some users find the "fixrank" format useful for loading into third-party analysis tools. When "fixrank" is specified, the Classifier outputs the results in a fixed rank order as described above. If a rank is missing from the lineage, the bootstrap value and the taxon name from the immediate lower rank are reported. This eliminates the gaps in the lineage, but also introduces non-existent taxon names and ranks. Users should interpret the "fixrank" results with caution.
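
The gap-filling behaviour of "fixrank" can be sketched as follows. This is an illustration under assumptions, not the Classifier's implementation: the tuple layout, the function name, and the taxon names are hypothetical, and ranks outside the fixed list (such as subclass) are simply ignored.

```python
FIXED_RANKS = ["domain", "phylum", "class", "order", "family", "genus"]

def to_fixrank(lineage):
    """Fill missing fixed ranks from the nearest lower rank that is present.

    lineage: list of (taxon_name, rank, bootstrap), ordered from domain down.
    Returns one (taxon_name, rank, bootstrap) entry per fixed rank. A fixed
    rank with nothing below it to copy from is left out of the result.
    """
    by_rank = {rank: (name, conf) for name, rank, conf in lineage}
    filled = {}
    lower = None  # most recent entry seen while walking genus -> domain
    for rank in reversed(FIXED_RANKS):
        if rank in by_rank:
            lower = by_rank[rank]
        filled[rank] = lower
    return [(filled[r][0], r, filled[r][1]) for r in FIXED_RANKS if filled[r]]

# Hypothetical lineage with no "class" rank.
lineage = [
    ("Bacteria", "domain", 1.0),
    ("Phylum_X", "phylum", 0.98),
    ("Order_Y", "order", 0.90),
    ("Family_Z", "family", 0.85),
    ("Genus_W", "genus", 0.80),
]
for row in to_fixrank(lineage):
    print(row)
```

In this example the "class" slot is filled with Order_Y and its bootstrap value, which shows how a taxon name can appear at a rank where it does not actually exist.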

By default, the Classifier chooses a subset of 1/8 of all the possible overlapping words from the query sequence for each bootstrap trial. The Classifier uses minWords instead if minWords is larger than 1/8 of the words. Choosing more words helps gain higher bootstrap values for short query sequences. Using a larger "minWords" will increase the run time, since the run time is proportional to the number and the length of the query sequences.
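
The rule for the number of words per bootstrap trial can be written as a one-line calculation. The sketch below assumes 8-base words (the same 8-mers mentioned for taxa-sim) and integer division; the function name and the rounding are assumptions, not taken from the Classifier's source.

```python
WORD_SIZE = 8  # assumed word length (8-mers)

def words_per_trial(seq_length, min_words=5):
    """Words sampled per bootstrap trial: the larger of 1/8 of all words and minWords."""
    total_words = max(seq_length - WORD_SIZE + 1, 0)  # overlapping words in the query
    default_subset = total_words // 8                 # 1/8 of the words, rounded down
    return max(default_subset, min_words)

for length, min_words in [(250, 5), (250, 50), (1400, 50)]:
    print(length, min_words, words_per_trial(length, min_words))
```

For a 250-base query the default subset is small, so a larger minWords takes over; for a near-full-length sequence the 1/8 rule already exceeds it and minWords has no effect.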

2. Compare two samples
This command combines classification with a statistical test to flag taxa differing significantly between libraries.

[Example command from a terminal]:
java -Xmx1g -jar /path/to/classifier.jar libcompare -q1 samplefiles/Native_1_4_A_trimmed.fasta -q2 samplefiles/USGA_2_4_B_trimmed.fasta -c 0.5 -o libcompare.txt

3. Merge classification results
If you have classified samples at different times using the same training set, you can use this command to merge the classification results and reproduce the hier_outfile with the assignment counts from all the input samples. Each input classification result is treated as one sample. Note that the taxon and rank filter options only affect the assignment output, not the hier_outfile.

[Example command to merge two classification results]:
java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt native_classified.txt,Native_1_4_A.ids usga_classified.txt

[Example command to merge two classification results, filter the classification output by confidence ]:
java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt -f filterbyconf -c 0.5 native_classified.txt,Native_1_4_A.ids usga_classified.txt

[Example command to merge two classification results, only output classification results assigned to Alphaproteobacteria and confidence at family >= 0.5 ]:
Create a file taxonFilter.txt containing Alphaproteobacteria.
java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt -n taxonFilter.txt -c 0.5 -r family native_classified.txt,Native_1_4_A.ids usga_classified.txt

4. Merge assignment count results
If you have classified samples at different times using the same training set, you can merge multiple assignment count results into one assignment count file, keeping one column for each unique sample. If the same sample occurs more than once, the taxon counts for that sample are combined.

[Example command to merge three assignment count results]:
java -Xmx1g -jar /path/to/classifier.jar merge-count merged_hier.txt sampleset1_hier.txt sampleset2_hier.txt sampleset3_hier.txt
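
The merging rule (one column per unique sample, with counts summed when a sample appears more than once) can be sketched with nested dictionaries. This illustrates the rule only: it does not read or write the hier_outfile format, and the function name, sample names, and taxa are hypothetical.

```python
from collections import defaultdict

def merge_counts(*count_tables):
    """Merge count tables shaped as sample -> {taxon: count}.

    Samples seen in more than one table have their taxon counts summed;
    every other sample is carried over unchanged.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in count_tables:
        for sample, taxa in table.items():
            for taxon, count in taxa.items():
                merged[sample][taxon] += count
    return {sample: dict(taxa) for sample, taxa in merged.items()}

# Hypothetical count tables from two classification runs.
set1 = {"sampleA": {"Bacillus": 3, "Pseudomonas": 1}}
set2 = {"sampleA": {"Bacillus": 2}, "sampleB": {"Pseudomonas": 4}}
print(merge_counts(set1, set2))
```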

5. How to Train the Classifier
a. Follow these steps when the Classifier needs to be retrained, for example because of novel lineages, newly named type organisms, taxonomic rearrangements, a better training set covering specific taxa, or an alternative taxonomy. Two files are required: a taxonomy file and a training sequence file with lineage. Prefer high-quality, full-length sequences, or at least sequences covering the entire region of the gene of interest. See samplefiles for example data files. The 16S rRNA and fungal LSU training data can be downloaded from http://sourceforge.net/projects/rdp-classifier/?source=directory.
In our experience, trimming the sequences to a specific region does not improve accuracy. The ranks are not required to be uniform either, which means you can define as many ranks as necessary. The speed of the Classifier is proportional to the number of genera, not the number of training sequences.

b. Clean up partial or duplicate training sequences to avoid inflated results in classification performance testing.
Use subcommand "rm-dupseq" to remove identical sequences or any sequence contained by another sequence. Use subcommand "rm-partialseq" to remove partial sequences based on pairwise alignment to near-full length reference sequences.

c. Plot intra-taxon similarity by fraction of matching 8-mers
Use subcommand "taxa-sim" to calculate and plot intra-taxon similarity by the fraction of matching 8-mers (see example plots using fungal ITS training sets on RDP's poster http://rdp.cme.msu.edu/download/posters/MSA2014_RDP.pdf). To run taxa-sim in headless mode without a GUI display, use the following options:
java -Djava.awt.headless=true -jar classifier.jar taxa-sim
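
To give a feel for the similarity measure, the sketch below computes the fraction of a query's distinct 8-mers that also occur in a reference sequence. The exact definition used by taxa-sim (for example, which sequence the fraction is taken over) is not stated here, so this choice, the function names, and the toy sequences are assumptions.

```python
def kmers(seq, k=8):
    """Set of distinct overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_match_fraction(query, reference, k=8):
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    query_words = kmers(query, k)
    if not query_words:
        return 0.0  # query shorter than k
    return len(query_words & kmers(reference, k)) / len(query_words)

# Toy sequences, far shorter than real rRNA genes.
print(kmer_match_fraction("ACGTACGTAC", "ACGTACGTAC"))
print(kmer_match_fraction("ACGTACGTAC", "ACGTACGTTT"))
```

Computing this value for pairs of sequences within the same taxon gives the kind of distribution that a taxa-sim plot summarizes.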

d. Estimate the accuracy of your own training data using leave-one-out testing
The program outputs a tab-delimited test result file that can be loaded into Excel to plot the accuracy rates. It also lists the misclassified sequences, grouped by taxon, with the rank at which each was misclassified. Examine the results carefully to spot errors in the taxonomy.

Leave-one-sequence-out testing: in each iteration, one sequence from the training set is chosen as the test sequence and removed from the training set. The assignment the Classifier produces for that sequence is compared to its original taxonomy label to measure the accuracy of the Classifier.
[Example command with length of 400]:
java -Xmx1g -jar /path/to/classifier.jar loot -q samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400 -o Armatimonadetes_400_loso_test.txt
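
The leave-one-sequence-out loop itself is simple, as the sketch below shows. To stay self-contained it swaps in a toy nearest-neighbour rule based on shared 4-mers in place of the naive Bayesian classifier, so it illustrates the test procedure only; all names and sequences are hypothetical.

```python
def kmers(seq, k=4):
    """Set of distinct overlapping k-mers (k=4 only because the toy data is short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def nearest_genus(query, references):
    """Toy stand-in classifier: genus of the reference sharing the most k-mers."""
    best = max(references, key=lambda rec: len(kmers(query) & kmers(rec[1])))
    return best[2]

def leave_one_sequence_out(training, classify=nearest_genus):
    """Hold out each sequence in turn, classify it, and return overall accuracy."""
    correct = 0
    for i, (_seq_id, seq, genus) in enumerate(training):
        remaining = training[:i] + training[i + 1:]  # training set minus the test sequence
        if classify(seq, remaining) == genus:
            correct += 1
    return correct / len(training)

training = [
    ("s1", "AAAAAAAAAA", "GenusA"),
    ("s2", "AAAAAAAAAT", "GenusA"),
    ("s3", "CCCCCCCCCC", "GenusB"),
    ("s4", "CCCCCCCCCG", "GenusB"),
    ("s5", "GGGGGGGGGG", "GenusC"),  # the only member of its genus
]
print(leave_one_sequence_out(training))
```

The single-member genus can never be recovered once its only sequence is held out, which is one reason singleton taxa lower leave-one-out accuracy.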

Leave-one-taxon-out testing: similar to leave-one-sequence-out testing, except that for each test sequence the lowest taxon that sequence is assigned to (either a species or a genus node) is removed from the training set. This tests how likely the Classifier is to assign the sequence to the correct genus or higher taxon when the species or genus is not present in the training set.
[Example command ]:
java -Xmx1g -jar /path/to/classifier.jar loot -h -q samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -o Armatimonadetes_loto_test.txt

e. Cross validate testing
This command performs a random sub-sampling validation. It calculates the values of (1-specificity) and sensitivity for each rank at each bootstrap cutoff.

[Example command to do cross validation testing with length of 400]:
java -Xmx1g -jar /path/to/classifier.jar crossvalidate -o crossvalidate_400.txt -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400

f. Train the classifier
If you are satisfied with the testing results, go ahead and train the classifier.

[Example command to train classifier]:
java -Xmx1g -jar /path/to/classifier.jar train -o mytrained -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -c gene_copynumber.txt
cp samplefiles/rRNAClassifier.properties mytrained/

"-c" specifies the gene copy number file. It should have at least three columns: name, rank and mean for the lowest rank taxon to be trained. See the example file samplefiles/gene_copynumber.txt

g. Classify sequences using the new model

[Example command to classify with the new model using "-t" option]:
java -Xmx1g -jar /path/to/classifier.jar classify -t mytrained/rRNAClassifier.properties -o Armatimonadetes_classified.txt samplefiles/Armatimonadetes.fasta

classifier's Issues

Online RDP classifier (http://rdp.cme.msu.edu/classifier/classifier.jsp.) cannot be opened

Hi,
Thank you for reading this! I've been using the online RDP classifier for taxonomy classification, but I can't open the website recently. The error message is:

This site can’t be reached
http://rdp.cme.msu.edu/classifier/classifier.jsp is unreachable.
ERR_ADDRESS_UNREACHABLE

Thank you in advance for the help!

Continuous errors with classifier subcommands - problem in tax file?

Hi there

We have been struggling a while with errors when running taxa-sim and loot as subcommands in classifier with our own created database and tax file (using the methods of [https://github.com/iimog/meta-barcoding-dual-indexing]).

The most recent error we are getting is:

Exception in thread "main" java.lang.IllegalArgumentException:
The taxID for ancestor 'f__undef__27' of sequence '27526738' at depth '5' with parent id '272' is not found!
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.getTaxonomy(TreeFactory.java:213)
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.addSequence(TreeFactory.java:167)
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.addSequence(TreeFactory.java:149)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.createTree(LeaveOneOutTesterMain.java:109)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.<init>(LeaveOneOutTesterMain.java:79)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.main(LeaveOneOutTesterMain.java:186)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:75)

There are no weird characters like "*" or anything in the file (saw it was a problem with someone else), everything seems fine at first glance. I am not a bioinformaticist though and our senior bioinformaticist is swamped. What could this error be indicating?

Will it be helpful to share a shortened version of the sequence file and the tax file? Or the full files?
Any help will be greatly appreciated, we are at wits' end.

Thanks a lot,
Annie

Training Data Site for build.xml is Defunct

Hello,

http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz appears to be permanently offline. Is there another location where the training data can be acquired?

How to format the taxonomy file to retrain classifier

Hi rdp staff,

I am trying to retrain RDP classifier using NCBI 16s database, however, when I looked into the example taxonomy file and the fasta file, I am a bit confused how should I even generate that file.

0*Root*-1*0*rootrank
1*Bacteria*0*1*domain
2*"Actinobacteria"*1*2*phylum
3*Actinobacteria*2*3*class
4*Acidimicrobidae*3*4*subclass
5*Acidimicrobiales*4*5*order
6*"Acidimicrobineae"*5*6*suborder
7*Acidimicrobiaceae*6*7*family
8*Acidimicrobium*7*8*genus
9*Ferrimicrobium*7*8*genus
10*Ferrithrix*7*8*genus
11*Ilumatobacter*7*8*genus
12*Iamiaceae*6*7*family
3102*Aquihabitans*12*8*genus
13*Iamia*12*8*genus

Could you please explain how each line is constructed? Allow me to take a line as an example,

6*"Acidimicrobineae"*5*6*suborder

I could guess that the first number is the taxonomy id for Acidimicrobineae, which is 6, and its parent taxonomy is 5, Acidimicrobiales. I assume that the suborder at the end of the line indicates that Acidimicrobineae is at the taxonomy rank of suborder, right? Then what is the 6 before suborder mean? when I look at 12*Iamiaceae*6*7*family, I can say Iamiaceae is a family level taxonomy, which has the parent of 6 (Acidimicrobineae) and 7 (Acidimicrobiaceae)? I am not sure I am getting what's the rule of constructing the taxonomy file here. Could you please explain how this is done?

Thanks in advance,

Eddi

Doesn't build

Hi,

I just cloned the repository and typed ant jar as suggested in the README. It didn't build. Here is the error message:

lucass@milou2 share $ git clone https://github.com/rdpstaff/classifier.git
Initialized empty Git repository in /pica/h1/lucass/share/classifier/.git/
remote: Reusing existing pack: 304, done.
remote: Total 304 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (304/304), 839.86 KiB | 629 KiB/s, done.
Resolving deltas: 100% (99/99), done.
lucass@milou2 share $ mv classifier/ rdp_classifier
lucass@milou2 share $ cd rdp_classifier/
lucass@milou2 rdp_classifier (master) $ ant jar
Buildfile: /pica/h1/lucass/share/rdp_classifier/build.xml

download-ivy:
    [mkdir] Created dir: /home/lucass/.ant/lib
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
      [get] To: /home/lucass/.ant/lib/ivy.jar

init-ivy:

resolve:
[ivy:retrieve] :: Ivy 2.1.0-rc2 - 20090704004254 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/home/lucass/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#classifier;[email protected]
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found commons-cli#commons-cli;1.2 in public
[ivy:retrieve]  found commons-io#commons-io;2.4 in public
[ivy:retrieve]  found junit#junit;4.8.2 in public
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2-sources.jar ...
[ivy:retrieve] ...... (47kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar(source) (403ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar ...
[ivy:retrieve] .... (40kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (400ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2-javadoc.jar ...
[ivy:retrieve] ............................... (209kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar(javadoc) (474ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar ...
[ivy:retrieve] .......... (180kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar (430ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4-sources.jar ...
[ivy:retrieve] ................... (240kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar(source) (455ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4-javadoc.jar ...
[ivy:retrieve] ....................................................... (707kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar(javadoc) (514ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2-javadoc.jar ...
[ivy:retrieve] ................. (387kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] junit#junit;4.8.2!junit.jar(javadoc) (429ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2-sources.jar ...
[ivy:retrieve] .... (143kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] junit#junit;4.8.2!junit.jar(source) (353ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.jar ...
[ivy:retrieve] ........ (231kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve]  [SUCCESSFUL ] junit#junit;4.8.2!junit.jar (403ms)
[ivy:retrieve] :: resolution report :: resolve 6344ms :: artifacts dl 3887ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   3   |   3   |   0   ||   9   |   9   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#classifier
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  9 artifacts copied, 0 already retrieved (2188kB/52ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:

-init-macrodef-javac:

-init-macrodef-junit:

-init-debug-args:

-init-macrodef-nbjpda:

-init-macrodef-debug:

-init-macrodef-java:

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:

deps-jar:
    [mkdir] Created dir: /pica/h1/lucass/share/rdp_classifier/build

-warn-already-built-jar:
[propertyfile] Updating property file: /pica/h1/lucass/share/rdp_classifier/build/built-jar.properties

-check-call-dep:

-maybe-call-dep:

BUILD FAILED
/pica/h1/lucass/share/rdp_classifier/nbproject/build-impl.xml:581: The following error occurred while executing this line:
/pica/h1/lucass/share/rdp_classifier/nbproject/build-impl.xml:1074: The following error occurred while executing this line:
java.io.FileNotFoundException: /pica/h1/lucass/share/ReadSeq/build.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:268)
        at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:177)
        at org.apache.tools.ant.ProjectHelper.configureProject(ProjectHelper.java:82)
        at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:393)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1397)
        at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1249)
        at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
        at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1397)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1366)
        at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1249)
        at org.apache.tools.ant.Main.runBuild(Main.java:801)
        at org.apache.tools.ant.Main.startAnt(Main.java:218)
        at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
        at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)

Total time: 12 seconds

Can't find "commons-io.jar"

OK so I figured out that it needed two dependencies ("ReadSeq" and "TaxonomyTree") but now it is complaining it can't find a file in a directory that doesn't exist. Did someone hard code a path ?

download-traindata:
      [get] Getting: http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
      [get] To: /pica/h1/lucass/share/rdp/classifier/build/classes/data.tgz
    [untar] Expanding: /pica/h1/lucass/share/rdp/classifier/build/classes/data.tgz into /pica/h1/lucass/share/rdp/classifier/build/classes
     [move] Moving 1 file to /pica/h1/lucass/share/rdp/classifier/dist

jar:

BUILD FAILED
/pica/h1/lucass/share/rdp/classifier/build.xml:131: Warning: Could not find resource file "/scratch/jars/commons-io.jar" to copy.

Total time: 44 seconds

Which is canonical, this github repo (v2.10.1) or sourceforge (v2.16)

It is not clear to me which I should be using: the sourceforge version, https://sourceforge.net/projects/rdp-classifier/ which is v2.12, Last Update: 2016-07-12, or the version here on this github repo which is v2.10.2 and last update is 2015-09-15. The one at sourceforge looks like it is the more recent one.

I would appreciate some help on this. :-)
Thanks,
Glen

custom training data error

Hi RDP Team,

I have run into an issue while training the classifier with a custom dataset. I get an error similar to the following:

Exception in thread "main" java.lang.IllegalArgumentException: Sequence GAXI01005455.1.1233 has different lowest rank: L_7 from the previous lowest rank: L_11
Any idea why this happens?

Edit: to elaborate, I'm trying to train the RDP classifier with the SILVA v128 SSU Ref99 (available here: https://www.arb-silva.de/fileadmin/silva_databases/release_128/Exports/SILVA_128_SSURef_Nr99_tax_silva.fasta.gz). I rebuilt a taxonomy file from scratch using the lineage2taxTrain.py script.

Problems with make file

I have been trying to make RDPTools, but I keep running into make issues. I am following the instructions in the README file and running make with superuser privileges (SU). Below is the output. Kindly help me figure this out.

ant -f Clustering/build.xml jar
Buildfile: /opt/RDPTools/Clustering/build.xml

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded

init-ivy:

resolve:
[ivy:retrieve] :: Ivy 2.1.0-rc2 - 20090704004254 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#clustering;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found com.sun.xml.bind#jaxb-impl;2.2.7 in public
[ivy:retrieve] found com.sun.xml.bind#jaxb-core;2.2.7 in public
[ivy:retrieve] found javax.xml.bind#jaxb-api;2.2.7 in public
[ivy:retrieve] found com.sun.istack#istack-commons-runtime;2.16 in public
[ivy:retrieve] found com.sun.xml.fastinfoset#FastInfoset;1.2.12 in public
[ivy:retrieve] found javax.xml.bind#jsr173_api;1.0 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found junit#junit;4.8.2 in public
[ivy:retrieve] found commons-codec#commons-codec;1.8 in public
[ivy:retrieve] found commons-io#commons-io;2.4 in public
[ivy:retrieve] :: resolution report :: resolve 253ms :: artifacts dl 21ms
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   10  |   0   |   0   |   0   ||   20  |   0   |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#clustering
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 20 already retrieved (0kB/10ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:

-init-macrodef-javac:

-init-macrodef-test-impl:

-init-macrodef-junit-init:

-init-macrodef-junit-single:

-init-test-properties:

-init-macrodef-junit-batch:

-init-macrodef-junit:

-init-macrodef-junit-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl

-init-macrodef-testng:

-init-macrodef-testng-impl:

-init-macrodef-test:

-init-macrodef-junit-debug:

-init-macrodef-junit-debug-batch:

-init-macrodef-junit-debug-impl:

-init-macrodef-test-debug-junit:

-init-macrodef-testng-debug:

-init-macrodef-testng-debug-impl:

-init-macrodef-test-debug-testng:

-init-macrodef-test-debug:

-init-debug-args:

-init-macrodef-nbjpda:

-init-macrodef-debug:

-init-macrodef-java:

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:
[delete] Deleting: /opt/RDPTools/Clustering/build/built-jar.properties

deps-jar:

-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties

-check-call-dep:

-maybe-call-dep:

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded

init-ivy:

resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#AlignmentTools;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] :: resolution report :: resolve 31ms :: artifacts dl 4ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 0 | 0 | 0 || 6 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#AlignmentTools
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 6 already retrieved (0kB/3ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac

-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend

-init-macrodef-test-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl

-init-macrodef-junit-init:

-init-macrodef-junit-single:

-init-test-properties:

-init-macrodef-junit-batch:

-init-macrodef-junit:

-init-macrodef-junit-impl:

-init-macrodef-testng:

-init-macrodef-testng-impl:

-init-macrodef-test:

-init-macrodef-junit-debug:

-init-macrodef-junit-debug-batch:

-init-macrodef-junit-debug-impl:

-init-macrodef-test-debug-junit:

-init-macrodef-testng-debug:

-init-macrodef-testng-debug-impl:

-init-macrodef-test-debug-testng:

-init-macrodef-test-debug:

-init-debug-args:

-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart

-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug

-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:

deps-jar:

-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties

-check-call-dep:

-maybe-call-dep:

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded

init-ivy:

resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#ReadSeq;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found commons-io#commons-io;2.4 in public
[ivy:retrieve] :: resolution report :: resolve 39ms :: artifacts dl 6ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#ReadSeq
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 9 already retrieved (0kB/3ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac

-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend

-init-macrodef-test-impl:

-init-macrodef-junit-init:

-init-macrodef-junit-single:

-init-test-properties:

-init-macrodef-junit-batch:

-init-macrodef-junit:

-init-macrodef-junit-impl:

-init-macrodef-testng:

-init-macrodef-testng-impl:

-init-macrodef-test:

-init-macrodef-junit-debug:

-init-macrodef-junit-debug-batch:

-init-macrodef-junit-debug-impl:

-init-macrodef-test-debug-junit:

-init-macrodef-testng-debug:

-init-macrodef-testng-debug-impl:

-init-macrodef-test-debug-testng:

-init-macrodef-test-debug:

-init-debug-args:

-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart

-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug

-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:

deps-jar:

-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties

-check-automatic-build:

-clean-after-automatic-build:

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:

-post-compile:

compile:

-pre-jar:

-post-jar:

jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/ReadSeq/dist/ReadSeq.jar"

-check-automatic-build:

-clean-after-automatic-build:

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:

-post-compile:

compile:

-pre-jar:

-post-jar:

jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/AlignmentTools/dist/AlignmentTools.jar"

-check-call-dep:

-maybe-call-dep:

-check-call-dep:

-maybe-call-dep:

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded

init-ivy:

resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#SeqFilters;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found jfree#jfreechart;1.0.13 in public
[ivy:retrieve] found jfree#jcommon;1.0.16 in public
[ivy:retrieve] :: resolution report :: resolve 46ms :: artifacts dl 6ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 4 | 0 | 0 | 0 || 10 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#SeqFilters
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 10 already retrieved (0kB/3ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac

-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend

-init-macrodef-test-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl

-init-macrodef-junit-init:

-init-macrodef-junit-single:

-init-test-properties:

-init-macrodef-junit-batch:

-init-macrodef-junit:

-init-macrodef-junit-impl:

-init-macrodef-testng:

-init-macrodef-testng-impl:

-init-macrodef-test:

-init-macrodef-junit-debug:

-init-macrodef-junit-debug-batch:

-init-macrodef-junit-debug-impl:

-init-macrodef-test-debug-junit:

-init-macrodef-testng-debug:

-init-macrodef-testng-debug-impl:

-init-macrodef-test-debug-testng:

-init-macrodef-test-debug:

-init-debug-args:

-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart

-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug

-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:

deps-jar:

-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties

-check-call-dep:

-maybe-call-dep:

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded

init-ivy:

resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#ProbeMatch;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found junit#junit;4.8.2 in public
[ivy:retrieve] :: resolution report :: resolve 32ms :: artifacts dl 5ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#ProbeMatch
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 9 already retrieved (0kB/3ms)

-pre-init:

-init-private:

-init-user:

-init-project:

-init-macrodef-property:

-do-init:

-post-init:

-init-check:

-init-ap-cmdline-properties:

-init-macrodef-javac-with-processors:

-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac

-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend

-init-macrodef-test-impl:

-init-macrodef-junit-init:

-init-macrodef-junit-single:

-init-test-properties:

-init-macrodef-junit-batch:

-init-macrodef-junit:

-init-macrodef-junit-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl

-init-macrodef-testng:

-init-macrodef-testng-impl:

-init-macrodef-test:

-init-macrodef-junit-debug:

-init-macrodef-junit-debug-batch:

-init-macrodef-junit-debug-impl:

-init-macrodef-test-debug-junit:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-debug

-init-macrodef-testng-debug:

-init-macrodef-testng-debug-impl:

-init-macrodef-test-debug-testng:

-init-macrodef-test-debug:

-init-debug-args:

-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart

-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug

-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java

-init-presetdef-jar:

-init-ap-cmdline-supported:

-init-ap-cmdline:

init:

-deps-jar-init:

deps-jar:

-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties

-check-call-dep:

-maybe-call-dep:

-check-call-dep:

-maybe-call-dep:

-check-automatic-build:

-clean-after-automatic-build:

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:

-post-compile:

compile:

-pre-jar:

-post-jar:

jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/ProbeMatch/dist/ProbeMatch.jar"

-check-call-dep:

-maybe-call-dep:

-check-automatic-build:

-clean-after-automatic-build:

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:

-post-compile:

compile:

-pre-jar:

-post-jar:

jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/SeqFilters/dist/SeqFilters.jar"

-check-call-dep:

-maybe-call-dep:

-check-call-dep:

-maybe-call-dep:

-check-automatic-build:

-clean-after-automatic-build:

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:
[javac] Compiling 72 source files to /opt/RDPTools/Clustering/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/pyro/cluster/ClusterReplay.java:26: error: cannot find symbol
[javac] import edu.msu.cme.rdp.taxatree.Taxon;
[javac] ^
[javac] symbol: class Taxon
[javac] location: package edu.msu.cme.rdp.taxatree
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/pyro/cluster/ClusterReplay.java:27: error: cannot find symbol
[javac] import edu.msu.cme.rdp.taxatree.TaxonHolder;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: package edu.msu.cme.rdp.taxatree
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:22: error: package edu.msu.cme.rdp.taxatree.utils does not exist
[javac] import edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor;
[javac] ^
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:23: error: package edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor does not exist
[javac] import edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor.NewickDistanceFactory;
[javac] ^
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:50: error: cannot find symbol
[javac] TaxonHolder lastMerged = null;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:53: error: cannot find symbol
[javac] Map<Integer, TaxonHolder> taxonMap = new HashMap();
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:60: error: cannot find symbol
[javac] TaxonHolder holder;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:60: error: cannot find symbol
[javac] TaxonHolder holder;
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:64: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, seqids.get(0), ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:64: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, seqids.get(0), ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:66: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:66: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickPrintVisitor
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickPrintVisitor
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickDistanceFactory
[javac] location: class TreeBuilder
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 21 errors
[javac] 1 warning

BUILD FAILED
/opt/RDPTools/Clustering/nbproject/build-impl.xml:955: The following error occurred while executing this line:
/opt/RDPTools/Clustering/nbproject/build-impl.xml:300: Compile failed; see the compiler error output for details.

Total time: 4 seconds
make: *** [Clustering/dist/Clustering.jar] Error 1

NullPointerException with crossvalidate

I'm trying to cross-validate my training set with the latest classifier version from GitHub (2014-11-24) on 64-bit Ubuntu 14.04 with Java version "1.7.0_65" (OpenJDK). Unfortunately, I get a NullPointerException every time I try:

java -Xmx1g -jar ~/software/RDPTools/classifier.jar crossvalidate -o minimal.txt -s minimal.fa -t minimal.tax
164458368
298507601
298507603
164458369
298507604
Exception in thread "main" java.lang.NullPointerException
    at edu.msu.cme.rdp.classifier.train.validation.HierarchyTree.getWordOccurrence(HierarchyTree.java:289)
    at edu.msu.cme.rdp.classifier.train.validation.NBClassifier.calculateProb(NBClassifier.java:107)
    at edu.msu.cme.rdp.classifier.train.validation.NBClassifier.assignClass(NBClassifier.java:68)
    at edu.msu.cme.rdp.classifier.train.validation.DecisionMaker.getBestClasspath(DecisionMaker.java:42)
    at edu.msu.cme.rdp.classifier.train.validation.crossvalidate.CrossValidate.runTest(CrossValidate.java:117)
    at edu.msu.cme.rdp.classifier.train.validation.crossvalidate.CrossValidateMain.main(CrossValidateMain.java:121)
    at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)

For testing purposes, I attach my minimal.fa and minimal.tax below. They work fine with loot, taxa-sim and train.
minimal.fa: https://gist.github.com/iimog/aa2ce23ab6f4d63cfa2b
minimal.tax: https://gist.github.com/iimog/4ea791d8be7f51073c46

Any help would be greatly appreciated.

Thanks in advance,
Markus Ankenbrand

Problem with taxa-sim in headless mode

When the taxa-sim subcommand is run in an environment without an X11 server, an exception is thrown at the end of execution:

java -jar classifier.jar taxa-sim rdp.tax rdp.fa rdp.fa taxasim 8 rankFile sab
100
200
300
... [truncated]
Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using 'localhost:12.0' as the value of the DISPLAY variable.
        at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
        at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:62)
        at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:142)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)
        at sun.swing.SwingUtilities2.isLocalDisplay(SwingUtilities2.java:1406)
        at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:1563)
        at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:147)
        at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1599)
        at javax.swing.UIManager.setLookAndFeel(UIManager.java:530)
        at javax.swing.UIManager.setLookAndFeel(UIManager.java:570)
        at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1320)
        at javax.swing.UIManager.initialize(UIManager.java:1407)
        at javax.swing.UIManager.maybeInitialize(UIManager.java:1395)
        at javax.swing.UIManager.getDefaults(UIManager.java:644)
        at javax.swing.UIManager.getColor(UIManager.java:686)
        at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:261)
        at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1748)
        at edu.msu.cme.rdp.classifier.train.validation.distance.TaxaSimilarityMain.createPlot(TaxaSimilarityMain.java:324)
        at edu.msu.cme.rdp.classifier.train.validation.distance.TaxaSimilarityMain.main(TaxaSimilarityMain.java:385)
        at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:79)

The plot is not generated and the txt output is incomplete. This error can be avoided by invoking the classifier in headless mode:

java -Djava.awt.headless=true -jar classifier.jar taxa-sim rdp.tax rdp.fa rdp.fa taxasim 8 rankFile sab

This workaround should be documented; alternatively, the problem can be fixed in code by adding

System.setProperty("java.awt.headless", "true");

before any graphics code (e.g. in a static {} block)
See this post on Stack Overflow.
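The in-code fix suggested above can be sketched as follows. This is a minimal illustration, not the classifier's actual code: the class name `HeadlessExample` and its helper methods are hypothetical. It sets the property in a static initializer, so it takes effect before any AWT or Swing class is loaded.

```java
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;

// Hypothetical example class; not part of the RDP classifier.
class HeadlessExample {
    // Runs when the class is loaded, before any AWT or Swing class is touched.
    static {
        System.setProperty("java.awt.headless", "true");
    }

    // Reports whether AWT considers itself headless.
    static boolean isHeadless() {
        return GraphicsEnvironment.isHeadless();
    }

    // Off-screen images need no display, so they can still be created in headless mode.
    static BufferedImage blankImage(int width, int height) {
        return new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    }

    public static void main(String[] args) {
        System.out.println("headless = " + isHeadless());
        System.out.println("image width = " + blankImage(640, 480).getWidth());
    }
}
```

The property only has an effect if it is set before the first AWT class initializes, which is why the static block sits in the class that starts the program.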

Cheers, Markus Ankenbrand

Error while getting repo2.maven to /root/.ant/lib/ivy.jar

Hello,

I have been trying to solve this issue for a couple of hours, but nothing has worked.
I tried to build the tools with sudo make, but every time I run into the same issue (see below):

ant -f Clustering/build.xml jar
Buildfile: /home/davin/Documents/RDPTools/Clustering/build.xml

download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar to /root/.ant/lib/ivy.jar

BUILD FAILED
/home/davin/Documents/RDPTools/Clustering/build.xml:87: java.net.UnknownHostException: repo2.maven.org
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at java.net.Socket.connect(Socket.java:556)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.apache.tools.ant.taskdefs.Get$GetThread.openConnection(Get.java:766)
at org.apache.tools.ant.taskdefs.Get$GetThread.get(Get.java:676)
at org.apache.tools.ant.taskdefs.Get$GetThread.run(Get.java:666)

Total time: 0 seconds
Makefile:15: recipe for target 'Clustering/dist/Clustering.jar' failed
make: *** [Clustering/dist/Clustering.jar] Error 1

I also tried cloning the repository into my home directory and running the build again. I installed JDK version 8 and made sure my Ant version is up to date. None of the other solutions I found worked, so I opened this issue here.
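The `java.net.UnknownHostException` in the log indicates that the host name could not be resolved, so the download failed at the name lookup rather than at the transfer. A small check like the one below can confirm this from the same machine and the same JVM. It is only a diagnostic sketch; the class name `HostCheck` is hypothetical, and the default host is the one from the log.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic class; not part of RDPTools.
class HostCheck {
    // Returns true when the JVM can resolve the host name to an address.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default host taken from the build log; pass other hosts as arguments to compare.
        String[] hosts = args.length > 0 ? args : new String[] {"repo2.maven.org"};
        for (String host : hosts) {
            System.out.println(host + " resolves: " + resolves(host));
        }
    }
}
```

If the host does not resolve while others do, the URL in build.xml is the thing to look at, not the JDK or Ant version.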

Thanks in advance for any help!

merge-detail for fungal ITS

Hi, I'm getting the following error:

cmd-> java -Xmx16g -jar /bio_bin/rdp_classifier_2.10.1/dist/classifier.jar merge-detail \
> -o combined_rdp.txt -h combined_hierarchy.txt -c 0.5 --gene fungalits_unite \
> S14MNOARP5.trimmed.noplant/S14MNOARP5.trimmed.noplant_UNITE_public_30.12.2014_UniVec.ovl.dusted.valid.rdp.tmp \
> S14MNOAEP5.trimmed.noplant/S14MNOAEP5.trimmed.noplant_UNITE_public_30.12.2014_UniVec.ovl.dusted.valid.rdp.tmp
Command Error: fungalits_unite is NOT valid, only allows 16srrna, fungallsu, fungalits_warcup and fungalits_unite

Any ideas why? The classification portion worked without issue.
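The message rejects fungalits_unite while listing it as an allowed value, which suggests that in this release the check and the message are driven by different lists. That is an inference from the output, not something confirmed here. The following hypothetical sketch, which is not the classifier's actual code, shows a check where the test and the message come from one set; the class name `GeneOptionCheck` is made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical example class; not part of the RDP classifier.
class GeneOptionCheck {
    // Gene names copied from the error message in the report.
    static final Set<String> ALLOWED = new LinkedHashSet<>(
            Arrays.asList("16srrna", "fungallsu", "fungalits_warcup", "fungalits_unite"));

    // Returns the normalized gene name, or throws if it is not in ALLOWED.
    static String validate(String gene) {
        String normalized = gene.trim().toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(normalized)) {
            throw new IllegalArgumentException(
                    gene + " is NOT valid, only allows " + ALLOWED);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(validate("fungalits_unite"));
    }
}
```

Because the message is built from the same set that is tested, a value can never be rejected and listed as valid at the same time.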

Dereplication pipeline doesn't work

Hi,
I have a 16S sequencing dataset made up of 54 FASTA files, one per sample. I would like to classify all the sequences using the dereplication pipeline to speed up the process. However, when classification finishes, the hier file contains only the assignments for the dereplicated file, not the assignments for the 54 original files. Any help would be appreciated!
Thanks in advance

Giovanni

Why do s__uncultured_bacterium_qiime_unique_taxon_tag_xxxx and norank_qiime_unique_taxon_tag entries appear in the taxonomy classification result?

Hi, rdpstaff group,
When I run
java -Xmx1g -jar /path/to/classifier.jar classify -t mytrained/rRNAClassifier.properties -o result.tax.txt asv_seqs.fasta
I find many s__uncultured_bacterium_qiime_unique_taxon_tag_xxxx and o__norank_qiime_unique_taxon_tag_xxxx entries in my taxonomic classification result.

But I have not seen any unique_taxon_tag in my raw database.
Do these norank and uncultured tags come from classifier.jar?
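One way to narrow this down is to scan the files that were used for training (taxonomy file, FASTA headers) for the tag string. A minimal sketch, assuming plain-text UTF-8 input; the class name `TaxonTagScan` is hypothetical, and the file to scan is passed on the command line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class; not part of the RDP classifier.
class TaxonTagScan {
    // Counts the lines that contain the given tag substring.
    static int countLinesContaining(List<String> lines, String tag) {
        int count = 0;
        for (String line : lines) {
            if (line.contains(tag)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java TaxonTagScan <file> [tag]");
            return;
        }
        String tag = args.length > 1 ? args[1] : "qiime_unique_taxon_tag";
        // Reads the whole file into memory; fine for taxonomy files, heavy for huge FASTA files.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(countLinesContaining(lines, tag) + " of " + lines.size()
                + " lines contain " + tag);
    }
}
```

A non-zero count means the tags were already present in the training input before the classifier ran; a zero count for every training file would point elsewhere in the pipeline.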

Classifier and CPU consumption

Hi,

When I use the RDP classifier with my own databank (a very large 16S databank), its CPU usage is unacceptably high: up to 2360% (see below).
This does not happen with the default databank, and it is less pronounced with the example databank provided for training the RDP classifier.
How can I reduce the CPU consumption or the number of threads used by the RDP classifier?
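The JVM exposes a few figures that may help narrow down where the load comes from. The sketch below prints the processor count the JVM sees, the active garbage collectors, and the number of live Java threads. Whether garbage collection explains the load in this case is an assumption, not something this report confirms; if it does, the standard HotSpot options `-XX:ParallelGCThreads=<n>` or `-XX:+UseSerialGC` limit the collector's threads. The class name `JvmThreadReport` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class; not part of the RDP classifier.
class JvmThreadReport {
    // Number of processors the JVM believes it may use.
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Live Java threads; JVM-internal GC worker threads are not included in this count.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("processors: " + availableProcessors());
        System.out.println("collectors: " + collectorNames());
        System.out.println("live Java threads: " + liveThreadCount());
    }
}
```

Running it with the same JVM options as the classifier command shows which collector that configuration selects.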

Command with my databank:

java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t path/to/my_bank.properties -o result.rdp sub.fasta

Consumption:

top - 09:51:00 up 56 days, 22:36,  0 users,  load average: 15.10, 23.87, 20.76
Tasks: 840 total,  11 running, 829 sleeping,   0 stopped,   0 zombie
Cpu(s): 81.2%us,  0.2%sy,  0.0%ni, 18.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 84939736k used, 179498964k free,   174172k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64703676k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 65765 fescudie  20   0 18.4g 6.4g  10m S 2360.5  2.6   4:59.91 java                                                                                                                                         
 65850 fescudie  20   0 13684 1776  880 R  0.7  0.0   0:00.05 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.15 bash

Consumption with threads:

top - 10:33:10 up 56 days, 23:18,  0 users,  load average: 14.83, 10.51, 10.28
Tasks: 1305 total,  11 running, 1294 sleeping,   0 stopped,   0 zombie
Cpu(s): 41.4%us,  2.5%sy,  0.0%ni, 56.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 83889500k used, 180549200k free,   174876k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64773160k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 66871 fescudie  20   0 18.4g 5.3g   9m R 70.3  2.1   0:16.90 java                                                                                                                                           
 66876 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.20 java                                                                                                                                           
 66889 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.31 java                                                                                                                                           
 66891 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.22 java                                                                                                                                           
 66897 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.27 java                                                                                                                                           
 66878 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.05 java                                                                                                                                           
 66879 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.01 java                                                                                                                                           
 66881 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.07 java                                                                                                                                           
 66882 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.13 java                                                                                                                                           
 66884 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:01.99 java                                                                                                                                           
 66886 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.19 java                                                                                                                                           
 66890 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.12 java                                                                                                                                           
 66892 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.16 java                                                                                                                                           
 66893 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.29 java                                                                                                                                           
 66894 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:01.68 java                                                                                                                                           
 66895 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.04 java                                                                                                                                           
 66896 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.27 java                                                                                                                                           
 66898 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.11 java                                                                                                                                           
 66875 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.22 java                                                                                                                                           
 66877 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.26 java                                                                                                                                           
 66899 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.26 java                                                                                                                                           
 66885 fescudie  20   0 18.4g 5.3g   9m S 28.7  2.1   0:02.13 java                                                                                                                                           
 66880 fescudie  20   0 18.4g 5.3g   9m S 28.4  2.1   0:02.19 java                                                                                                                                           
 66874 fescudie  20   0 18.4g 5.3g   9m S 28.1  2.1   0:02.01 java                                                                                                                                           
 66872 fescudie  20   0 18.4g 5.3g   9m S 26.8  2.1   0:01.99 java                                                                                                                                           
 66873 fescudie  20   0 18.4g 5.3g   9m S 26.1  2.1   0:02.00 java                                                                                                                                           
 66883 fescudie  20   0 18.4g 5.3g   9m S 24.1  2.1   0:02.03 java                                                                                                                                           
 66888 fescudie  20   0 18.4g 5.3g   9m S 22.1  2.1   0:01.62 java                                                                                                                                           
 66887 fescudie  20   0 18.4g 5.3g   9m S 21.8  2.1   0:01.92 java                                                                                                                                           
 66912 fescudie  20   0 14080 2168  884 R  1.0  0.0   0:00.11 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.44 bash                                                                                                                                           
 66870 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66900 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66901 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66902 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66903 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66904 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.10 java                                                                                                                                           
 66905 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.09 java                                                                                                                                           
 66906 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66907 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java

Command with RDP default databank:

java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -o result.rdp sub.fasta

Consumption:

top - 09:53:41 up 56 days, 22:39,  0 users,  load average: 9.96, 17.82, 18.93
Tasks: 840 total,  10 running, 830 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.0%us,  0.0%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 78978564k used, 185460136k free,   174216k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64768832k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 65863 fescudie  20   0 18.4g 703m  10m S 100.1  0.3   1:18.87 java                                                                                                                                          
 65917 fescudie  20   0 13684 1784  880 R  0.3  0.0   0:00.36 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.16 bash

Command with 'Example command to train classifier':

java -Xmx1g -jar path/to/classifier.jar train -o mytrained -s path/to/RDPTools/classifier/samplefiles/new_trainset.fasta -t path/to/RDPTools/classifier/samplefiles/new_trainset_db_taxid.txt
cp path/to/RDPTools/classifier/samplefiles/rRNAClassifier.properties mytrained
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t mytrained/rRNAClassifier.properties -o result.rdp sub.fasta

Consumption:

top - 10:23:54 up 56 days, 23:09,  0 users,  load average: 9.19, 8.95, 10.32
Tasks: 840 total,  10 running, 830 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.5%us,  0.1%sy,  0.0%ni, 74.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 78953232k used, 185485468k free,   174720k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64773140k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 66617 fescudie  20   0 18.4g 590m  10m S 120.6  0.2   0:29.30 java                                                                                                                                          
 66655 fescudie  20   0 13684 1784  884 R  0.7  0.0   0:00.13 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.30 bash

Thanks in advance.

Problem installing classifier

Hi,
I hope you can help me, please.

I downloaded the classifier with these commands:
$git clone https://github.com/rdpstaff/classifier.git
$cd classifier # enter the directory
$ls # list the contents:
build build.xml ivy.xml lib LICENSE manifest.mf nbproject README samplefiles src test
and finally, to build the classifier:
$ant -f build.xml

and I got a failure in one test:

-do-test-run:
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.ClassifierTest
[junit] testClassify
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,072 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testClassify
[junit] ------------- ---------------- ---------------
[junit] testGetName
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.HierarchyTreeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,069 sec
[junit]
[junit] ------------- Standard Error -----------------
[junit] testGetName
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.ParsedSequenceTest
[junit] testGetReversedWord
[junit] testGetWordIndex
[junit] testCreateWordIndexArr
[junit] testGetReversedSeq
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,077 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testGetReversedWord
[junit] testGetWordIndex
[junit] testCreateWordIndexArr
[junit] testGetReversedSeq
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.TrainingInfoTest
[junit] testCreateTree
[junit] testCreateLogWordPriorArr
[junit] testCreateProbIndexArr
[junit] testCreateClassifier
[junit] testCreateGenusWordConditionalProbList
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,565 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testCreateTree
[junit] testCreateLogWordPriorArr
[junit] testCreateProbIndexArr
[junit] testCreateClassifier
[junit] testCreateGenusWordConditionalProbList
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.TreeFileParserTest
[junit] testParseTreeFile
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,143 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testParseTreeFile
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.GoodWordIteratorTest
[junit] testNext
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,061 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.RawHierarchyTreeTest
[junit] testInitWordOccurrence()
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,083 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testInitWordOccurrence()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.RawSequenceParserTest
[junit] testNext()
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,083 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.TreeFactoryTest
[junit] testAddSequence
[junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0,139 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testAddSequence
[junit] ------------- ---------------- ---------------
[junit] Testcase: testAddSequence(edu.msu.cme.rdp.classifier.train.TreeFactoryTest): FAILED
[junit] null expected:<G[1]> but was:<G[2]>
[junit] junit.framework.ComparisonFailure: null expected:<G[1]> but was:<G[2]>
[junit] at edu.msu.cme.rdp.classifier.train.TreeFactoryTest.testAddSequence(TreeFactoryTest.java:55)
[junit]
[junit]
[junit] Test edu.msu.cme.rdp.classifier.train.TreeFactoryTest FAILED
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.AddLogsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,07 sec
[junit]
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.DecisionMakerTest
[junit] testGetBestClasspath
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,54 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testGetBestClasspath
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.GoodWordIteratorTest
[junit] testNext
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,072 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.HierarchyTreeTest
[junit] testHideSeq
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,133 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testHideSeq
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.NBClassifierTest
[junit] testassignClass
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,107 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testassignClass
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.SequenceParserTest
[junit] testNext()
[junit] testHasNext
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,088 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testHasNext
[junit] ------------- ---------------- ---------------
[junit] ------------- Standard Error -----------------
[junit] testNext()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.TreeFactoryTest
[junit] testAddSequence
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,092 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testAddSequence
[junit] ------------- ---------------- ---------------

test-report:

-post-test-run:

BUILD FAILED
/home/orson/RDPTools/classifier/nbproject/build-impl.xml:1304: Some tests failed; see details above.

Total time: 10 seconds

Please help me. Thanks.

How to map RDP Nomenclature to NCBI Nomenclature?

I am using the standalone RDP classifier to annotate our assemblies, but I find that the RDP nomenclature differs from the NCBI nomenclature. For example, the NCBI genus Anabaena of phylum Cyanobacteria is named GpI in the RDP classification. The inconsistency between the two nomenclatural systems is confusing and makes it hard to determine the identity clearly. Is there any tool that provides a mapping between these two nomenclatures?

Thank you very much.

copy number file -c flag

Hello classifier team,
I'm interested in using the copy number utility of the RDP classifier. I have the following question:
1- The documentation says to provide the copy number file with the -c flag. However, in -help, -c is the confidence used for the classification.
Am I misunderstanding something in the help screen or in your documentation?
Kindly guide.

Training the RDP classifier -c option

RDPstaff,

I am trying to retrain the RDP classifier and have an issue with the -c option. I have already prepped my seq and tax files (end of email) and trained RDP against them.

It output 4 files (below), but none of them is the properties file.

bergeyTrainingTree.xml logWordPrior.txt
genus_wordConditionalProbList.txt wordConditionalProbIndexArr.txt

Do I need to include the -c file to get this? If so, I cannot find any information on how to generate it, so I was hoping you could help. According to the README, "It should at least three columns: name, rank and mean for the lowest rank taxon to be trained". What does "mean" refer to in the context of this file? Furthermore, how should I go about generating the whole file?

SEQ FILE

AB353770|AB353770.1.1740_U Root;Eukaryota;Alveolata;Dinoflagellata;Dinophyceae;Peridiniales;Kryptoperidiniaceae;Unruhdinium
ATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTATAAGCTTTTACATGGCGAAACTGCGAATGGCTCATTAAAACAGTTACAGTTTATTTGAAG (cont.)

TAX FILE

0*Root*-1*0*rootrank
1*Eukaryota*0*1*domain
2*Alveolata*1*2*supergroup
3*Dinoflagellata*2*3*division
4*Dinophyceae*3*4*class
5*Peridiniales*4*5*order

Thanks for the help

-Andrew Davis

addFullLineage.py and lineage2taxTrain.py

Hello,

I would like to create my own training data. Could you send me the scripts lineage2taxTrain.py and addFullLineage.py? I would really appreciate that. My email address is [email protected]

Thanks !

how to perform leave one out testing using a confidence of 0?

Hi guys, I'm trying to perform leave-one-sequence-out testing on the RDP classifier so that I can get the classifications for all 13212 sequences in the RDP dataset. The output of the leave-one-sequence-out testing only returns the misclassified sequences with their confidences, without all levels. How do I get the leave-one-sequence-out testing to give me the classifications at all levels with all confidences? So far I've tried to set the confidence like this, but it doesn't seem to work:

java -Xmx1g -jar /path/to/classifier.jar loot -q -allrank -c 0.0 samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400 -o Armatimonadetes_400_loso_test.txt

Can you guys help out?
Thanks

Is there a way to define a different k-mer? What way to chop up the reads?

Hi,

According to the RDP classifier paper, the word size is 8 (just to make sure I understand it correctly: the size here is the length of a word, such as ATCTGGTC, right?). This size is described as optimal because word sizes of 6, 7 or 9 were less accurate than size 8 in preliminary experiments.

- Is there an option for me to pick another word size when I am training my own classifier with a customized database?

Also, I want to know how you chop up the reads in the database. It says all the words should be non-overlapping, which is to satisfy the assumption for Bayes Rule that all features are independent (correct me if I am misunderstanding it). Say I have a sequence in the database:

SeqA: AAAAAAAA TTTTTTTT GGGGGGGG TTTTTTTT

If I chop up from the very first nt, then I should get the 8-size words:

AAAAAAAA x1, TTTTTTTT x2, and GGGGGGGG x1, and these will be recorded as the features for this particular genus.

But what if I have a test sequence:

SeqB: ATTTTTTT TGG. Clearly you can tell it's a subsequence of SeqA (the last A, the eight Ts that follow, and the first two Gs), but if I chop up from the very first nt, it won't give me the same feature words as I get from SeqA. I will get ATTTTTTT, and the leftover: TGG. I am curious, what do you do with the leftover nt? Just throw them away?

- I think I need a little insight into how to chop up the database into k-mers, and how you define the features.
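
To make the SeqA/SeqB example concrete, here is a small sketch that contrasts the two chopping strategies discussed in the question: non-overlapping chunks from the first base versus a sliding window that yields one word per start position. Both functions are written for this illustration only; neither is taken from the Classifier's source, and which strategy the Classifier actually uses should be checked against the paper and the code.

```python
def chunk_words(seq, k=8):
    """Non-overlapping words from the first base; a trailing piece shorter than k is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def sliding_words(seq, k=8):
    """Overlapping words: one word of length k per start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


seq_a = "AAAAAAAA" + "TTTTTTTT" + "GGGGGGGG" + "TTTTTTTT"
seq_b = "ATTTTTTTTGG"  # SeqB from the question, without the space

# Words that SeqA and SeqB have in common under each strategy.
shared_chunks = set(chunk_words(seq_a)) & set(chunk_words(seq_b))
shared_sliding = set(sliding_words(seq_a)) & set(sliding_words(seq_b))

print(sorted(shared_chunks))
print(sorted(shared_sliding))
```

With non-overlapping chunks the two sequences share no word, which is exactly the frame-shift problem described above. With a sliding window every word of SeqB also occurs in SeqA, and there is no leftover to discard.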

I am a beginner in machine learning algorithms and still trying to learn more about the RDP classifier. If my understanding is wrong, I welcome any suggestions.

Thanks a lot!

Eddi

Exception handling

Hi,

When trying to merge the existing results of the classifier I get the following error:

cmd-> java -Xmx16g -jar /bio_bin/rdp_classifier_2.8/dist/classifier.jar merge-detail \
-o merged_classified.txt \
-h merged_classified.hier.txt \
-c 0.5 \
--train_propfile /bioinformatics/bio_db/silva_SSURef_108_tax_silva_trunc/qiime/Silva_108/taxa_mapping/CombinedClassifier/rRNAClassifier.properties \
./Corn-Root-P1-MP/Corn-Root-P1-MP.16S18S.univec.rdp ./Corn-Root-P1-Mobio/Corn-Root-P1-Mobio.16S18S.univec.rdp
Exception in thread "main" java.lang.IllegalArgumentException: taxon Node environmental samples in line "M01224:135:000000000-A9TYB:1:1107:21832:7607 Root norank 1.0 Eukaryota Superkingdom 1.0 Fungi Kingdom 0.99 Dikarya Subkingdom 0.99 Basidiomycota Phylum 0.99 environmental samples Genus 0.75" is not found in the original Classifier training data.
at edu.msu.cme.rdp.classifier.rrnaclassifier.ClassificationParser.next(ClassificationParser.java:107)
at edu.msu.cme.rdp.multicompare.MultiClassifier.multiClassificationParser(MultiClassifier.java:252)
at edu.msu.cme.rdp.multicompare.Reprocess.main(Reprocess.java:184)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:69)

Is there any way the exception handling could be done more gracefully and keep going with the merging, while putting the offending reads in a separate file or just logging that some reads were problematic?
BTW, all the reads were classified with the same training set that was supplied on the command line, so this should NOT be generating an error.

LOOT (by taxon) analysis: calculation error on % of misclassified Seqs?

I ran two LOOTs. When leaving one taxon out, the % misclassified in the last table (**misclassified sequences group by taxon) is always 100%, which is not correct, according to the other tables in the output and according to the LOOT by sequence analysis.

java -Xmx46g -jar classifier.jar loot -h -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveonetaxonout_test_0.05.txt

**misclassified sequences group by taxon
Tested Seqs (non-singleton)	misclassified	pct misclassified
26881	26881	1
26881	26881	1
16	16	1
11	11	1
11	11	1
10	10	1
0	0	0
...

java -Xmx46g -jar classifier.jar loot -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveoneseqout_test_0.05.txt

**misclassified sequences group by taxon
Tested Seqs (non-singleton)	misclassified	pct misclassified
26881	2363	0.087905956
26881	2363	0.087905956
16	3	0.1875
11	0	0
11	0	0
10	0	0
...
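
As a quick cross-check of the tables above, the last column appears to be the fraction misclassified / tested rather than a percentage. The sketch below recomputes it for the leave-one-sequence-out rows; the helper name is made up for this illustration, and the rows are copied from the second table.

```python
def pct_misclassified(rows):
    """rows: (tested, misclassified) integer pairs; returns misclassified / tested per row."""
    return [mis / tested if tested else 0.0 for tested, mis in rows]


# Distinct rows copied from the leave-one-sequence-out table above.
by_sequence = [(26881, 2363), (16, 3), (11, 0), (10, 0)]

fractions = pct_misclassified(by_sequence)
for (tested, mis), frac in zip(by_sequence, fractions):
    print(tested, mis, round(frac, 9))
```

The same calculation on the leave-one-taxon-out rows, where misclassified equals tested, gives 1 for every non-empty row, which matches the reported column.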

FYI: Trimming to region of interest

Dear Team,

I noticed the following in your instructions for training RDP.

"Based on our experience, trimming the sequences to a specific region does not improve accuracy."

I wanted to inform you that I explicitly tested this some years ago, and in my experience trimming did have a significant impact on assignment accuracy.

Metabarcoding free-living marine nematodes using curated 18S and CO1 reference sequence databases for species-level taxonomic assignments, DOI: 10.1002/ece3.4814

I have only ever used trimmed training sets since then and would advise other users to do the same.

Kind regards,

Lara

Retraining the RDP - Problem with Sequence/Taxonomy File Format

Hi,

My question is about the input format the classifier requires. I created files in the exact formats described in the tutorials. These are my raw training files:

However, when I try to extract the ready files using the following commands (while changing the appropriate file names):

python lineage2taxTrain.py rawTaxonomy.txt > ready4train_taxonomy.txt
python addFullLineage.py rawTaxonomy.txt rawSeqs.fasta > ready4train_seqs.fasta

It runs, but the output is quite strange. I tried many tweaks and solutions and this is my output ready4train_taxonomy.txt file - I have no idea why there are spaces between every two characters, I added no spaces in the Python scripts:

This is my output ready4train_seqs.fasta sequence file:

The resulting taxonomy file is never something like the following, which is what I normally see should be the output:

Could you kindly guide me to the correct formatting if I am doing something wrong? There are ambiguous characters in the sequences. Is this an issue, and should they be removed?
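
One possible cause of a space between every two characters, offered as an assumption since the files themselves are not shown in this post, is that the raw input is saved as UTF-16 (for example by a spreadsheet or Windows editor) and then read as single-byte text. The sketch below checks for a byte-order mark and re-encodes to UTF-8; the function names are invented for this illustration, and UTF-32 or BOM-less UTF-16 files are not handled.

```python
import codecs


def sniff_bom(raw: bytes) -> str:
    """Guess a codec name from a leading byte-order mark; default to UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    return "utf-8"


def to_utf8(raw: bytes) -> bytes:
    """Decode with the sniffed codec and re-encode as plain UTF-8 without a BOM."""
    return raw.decode(sniff_bom(raw)).encode("utf-8")


# A taxonomy-style line as it would look on disk if saved as UTF-16.
utf16_bytes = "0*Root*-1*0*rootrank\n".encode("utf-16")
print(sniff_bom(utf16_bytes))
print(to_utf8(utf16_bytes))
```

If the raw files turn out to be plain UTF-8 or ASCII already, this is not the cause and the scripts themselves would need a closer look.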

Merge classification from different training set

Hi RDP team,

Thank you for this tool. I went through the details of merging classification files.
I'm interested in merging classification results from an in-house training set with results obtained with the default RDP training set.

Can this be added to the enhancement requests for a future release? Any suggestions on how to go ahead would also be really helpful.

Thanks.
