rdpstaff / classifier
RDP extensible sequence classifier for fungal lsu, bacterial and archaeal 16s
License: GNU General Public License v2.0
INTRO

The RDP Classifier is a naive Bayesian classifier developed to provide rapid taxonomic placement based on rRNA sequence data. It can rapidly and accurately classify bacterial and archaeal 16S rRNA sequences and fungal LSU sequences. It provides taxonomic assignments from domain to genus, with a confidence estimate for each assignment. The RDP Classifier can likely be adapted to additional phylogenetically coherent bacterial taxonomies. The online version of the RDP Classifier can be found at http://rdp.cme.msu.edu/classifier/classifier.jsp.

How to cite Classifier?

Wang, Q, G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7.

Gene Copy Number Adjustment

The Classifier provides gene copy number adjustment for 16S gene sequences (see http://rdp.cme.msu.edu/classifier/class_help.jsp#copynumber). The precompiled Classifier was trained with the 16S gene copy number data provided by the rrnDB website. The Classifier can also be trained with user-provided gene copy number data; see How to Train the Classifier below. The Classifier outputs both copy-number-adjusted and unadjusted assignment counts in the hierarchical output files.

BIOM Format

The Classifier can take a minimal (or rich) dense BIOM file as input, with an optional metadata file, and produces a rich dense BIOM file. If an input cluster BIOM file (version 1.0) is provided along with the representative sequences (output from the Clustering rep-seqs subcommand with the "-c" option, which makes sure the rep seq IDs match the cluster IDs in the BIOM file; see http://rdp.cme.msu.edu/tutorials/stats/RDPtutorial_statistics.html), the classification result of each sequence will replace the taxonomy of the corresponding cluster. If a metadata file is provided, its information will replace the metadata of the corresponding sample.
The resulting rich dense BIOM file can be used by third-party tools such as phyloseq or QIIME. To use BIOM files as input, the format must be specified on the command line with "-f biom". The BIOM file is then specified with "-m /path/to/biom_file.biom". A metadata file is optional and can be included with "-d /path/to/metadata.txt".

QUICKSTART

    ant jar

Some commands in this tutorial depend on RDP Clustering. See RDPTools (https://github.com/rdpstaff/RDPTools) to install.

USAGE

There are many subcommands offered by the Classifier package. The default subcommand is classify.

    java -Xmx1g -jar /path/to/classifier.jar

    USAGE: ClassifierMain <subcommand> <subcommand args ...>
        classify      - classify one or multiple samples
        crossvalidate - cross validate accuracy testing
        libcompare    - compare two samples
        loot          - leave one (sequence or taxon) out accuracy testing
        merge-detail  - merge classification detail result files to create a taxon assignment counts file
        merge-count   - merge multiple taxon assignment count files to into one count file
        random-sample - random select a subset or subregion of sequences
        rm-dupseq     - remove identical or any sequence contained by another sequence
        rm-partialseq - remove partial sequences
        taxa-sim      - calculate and plot the similarities within taxa
        train         - retrain classifier

1. Classify one or more samples

    usage: [options] <samplefile>[,idmappingfile] ...
    -c,--conf <arg>  assignment confidence cutoff used to determine the assignment count for each taxon. Range [0-1], Default is 0.8.
    -f,--format <arg>  tab-delimited output format: [allrank|fixrank|biom|filterbyconf|db]. Default is allRank.
        allrank: outputs the results for all ranks applied for each sequence: seqname, orientation, taxon name, rank, conf, ...
        fixrank: only outputs the results for fixed ranks in order: domain, phylum, class, order, family, genus
        biom: outputs rich dense biom format if OTU or metadata provided
        filterbyconf: only outputs the results for major ranks as in fixrank, results below the confidence cutoff were bin to a higher rank unclassified_node
        db: outputs the seqname, trainset_no, tax_id, conf.
    -g,--gene <arg>  16srrna, fungallsu, fungalits_warcup, fungalits_unite. Default is 16srrna. This option can be overwritten by -t option
    -h,--hier_outfile <arg>  tab-delimited output file containing the assignment count for each taxon in the hierarchical format. Default is null.
    -o,--outputFile <arg>  tab-delimited text output file for classification assignment.
    -q,--queryFile  legacy option, no longer needed
    -t,--train_propfile <arg>  property file containing the mapping of the training files if not using the default. Note: the training files and the property file should be in the same directory.
    -w,--minWords <arg>  minimum number of words for each bootstrap trial. Default(maximum) is 1/8 of the words of each sequence. Minimum is 5

[Example command to classify sequences]:

    java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -o usga_classified.txt -h soil_hier.txt samplefiles/USGA_2_4_B_trimmed.fasta

To speed up classification when the inputs contain a large number of duplicate sequences, you can dereplicate the input files first and use both the unique-sequence fasta and the idmapping file as input. The classification assignment output then contains only the results for the unique sequences, while the assignment counts in the hier_outfile are expanded to reflect the full sets. The hier_outfile can be imported into Excel to make plots, or loaded into R as a data matrix.

[Example command to classify sequences with idmapping]:

    java -jar /path/to/Clustering.jar derep -u -o Native_1_4_A_derep.fasta Native_1_4_A.ids Native_1_4_A.sample samplefiles/Native_1_4_A_trimmed.fasta
    java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -o native_classified.txt -h soil_hier.txt Native_1_4_A_derep.fasta,Native_1_4_A.ids
    java -Xmx1g -jar /path/to/classifier.jar classify -c 0.5 -f fixrank -o soil_classified.txt -h soil_hier.txt Native_1_4_A_derep.fasta,Native_1_4_A.ids samplefiles/USGA_2_4_B_trimmed.fasta

Notes:

The bootstrap assignment strategy has been changed to avoid an over-prediction problem when multiple genera are tied for the highest score during bootstrap trials. This happens when every sequence in multiple genera (say N) contains the same partial sequence. In a bootstrap trial, one genus is chosen at random from the N genera tied for the highest score. If the tie occurs during the deterministic genus assignment step, the first genus is chosen. In this way the genus assignment remains deterministic, but the bootstrap score will be close to 1/N.

By default, the Classifier outputs the results for all ranks applied to each sequence. Some users find the "fixrank" format useful for loading into third-party analysis tools. When "fixrank" is specified, the Classifier outputs the results in a fixed rank order as described above. If a rank is missing from the lineage, the bootstrap value and the taxon name from the immediate lower rank are reported. This eliminates the gaps in the lineage, but also introduces non-existent taxon names and ranks. Users should interpret "fixrank" results with caution.

By default, the Classifier chooses a subset of 1/8 of all the possible overlapping words from the query sequence for each bootstrap trial. The Classifier uses minWords instead if minWords is larger than 1/8 of the words. Choosing more words helps obtain higher bootstrap values for short query sequences.
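
The word-count rule in the note above can be worked through with a little arithmetic. The following is only a sketch under stated assumptions: the word length of 8 is taken from the README's mention of 8-mers, the 1/8 fraction is rounded down, and the function name `words_per_trial` is hypothetical rather than part of the Classifier.

```python
WORD_SIZE = 8  # assumed word (k-mer) length, based on the README's mention of 8-mers


def words_per_trial(seq_length, min_words=5, word_size=WORD_SIZE):
    """Estimate how many words are sampled per bootstrap trial.

    Follows the rule stated in the notes: use 1/8 of all overlapping
    words, unless min_words is larger. Rounding down is an assumption.
    """
    # A sequence of length L contains L - k + 1 overlapping words of length k.
    total_words = max(seq_length - word_size + 1, 0)
    default_subset = total_words // 8
    return max(default_subset, min_words)


if __name__ == "__main__":
    for length in (50, 250, 400):
        print(length, words_per_trial(length))
    # A larger minWords only matters when it exceeds the 1/8 default.
    print(100, words_per_trial(100, min_words=30))
```

Under these assumptions a 400-base query has 393 overlapping words, so 49 are used per trial, while a 50-base query falls back to the minimum of 5.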
Using a larger "minWords" will increase the run time, since the run time is proportional to the number and the length of the query sequences.

2. Compare two samples

This command combines classification with a statistical test to flag taxa that differ significantly between libraries.

[Example command from a terminal]:

    java -Xmx1g -jar /path/to/classifier.jar libcompare -q1 samplefiles/Native_1_4_A_trimmed.fasta -q2 samplefiles/USGA_2_4_B_trimmed.fasta -c 0.5 -o libcompare.txt

3. Merge classification results

If you have classified samples at different times using the same training set, you can use this command to merge the classification results and reproduce the hier_outfile with the assignment counts from all the input samples. Each input classification result is treated as one sample. Note that the taxon and rank filter options only affect the assignment output, not the hier_outfile.

[Example command to merge two classification results]:

    java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt native_classified.txt,Native_1_4_A.ids usga_classified.txt

[Example command to merge two classification results, filtering the classification output by confidence]:

    java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt -f filterbyconf -c 0.5 native_classified.txt,Native_1_4_A.ids usga_classified.txt

[Example command to merge two classification results, only outputting classification results assigned to Alphaproteobacteria with confidence at family >= 0.5]:

Create a file taxonFilter.txt containing Alphaproteobacteria.

    java -Xmx1g -jar /path/to/classifier.jar merge-detail -h soil_hier.txt -o merged_classified.txt -n taxonFilter.txt -c 0.5 -r family native_classified.txt,Native_1_4_A.ids usga_classified.txt

4. Merge assignment count results

If you have classified samples at different times using the same training set, you can merge multiple assignment count results into one assignment count file, keeping one column for each unique sample. If the same sample occurs more than once, the taxon counts for that sample are combined.

[Example command to merge three assignment count results]:

    java -Xmx1g -jar /path/to/classifier.jar merge-count merged_hier.txt sampleset1_hier.txt sampleset2_hier.txt sampleset3_hier.txt

5. How to Train the Classifier

a. Follow these steps when there is a need to retrain the Classifier, for example because of novel lineages, newly named type organisms, taxonomic rearrangements, a better training set covering specific taxa, or an alternative taxonomy. Two files are required: a taxonomy file and a training sequence file with lineage. Prefer high-quality, full-length sequences, or at least sequences covering the entire region of the gene of interest. See samplefiles for example data files. The 16S rRNA training data and Fungal LSU training data can be downloaded from http://sourceforge.net/projects/rdp-classifier/?source=directory. In our experience, trimming the sequences to a specific region does not improve accuracy. The ranks are not required to be uniform either, which means you can define any number of ranks as necessary. The speed of the Classifier is proportional to the number of genera, not the number of training sequences.

b. Clean up partial or duplicate training sequences to avoid inflated results in classification performance testing. Use subcommand "rm-dupseq" to remove identical sequences or any sequence contained by another sequence. Use subcommand "rm-partialseq" to remove partial sequences based on pairwise alignment to near-full-length reference sequences.

c. Plot intra-taxon similarity by fraction of matching 8-mers

Use subcommand "taxa-sim" to calculate and plot intra-taxon similarity by fraction of matching 8-mers (see example plots using fungal ITS training sets on RDP's poster http://rdp.cme.msu.edu/download/posters/MSA2014_RDP.pdf). To run taxa-sim in headless mode without a GUI display, use the following options:

    java -Djava.awt.headless=true -jar classifier.jar taxa-sim

d. Estimate the accuracy of your own training data using leave-one-out testing

The program outputs a tab-delimited test result file that can be loaded into Excel to plot the accuracy rates. It also lists the misclassified sequences and the rank at which each was misclassified, grouped by taxon. Examine the results carefully to spot errors in the taxonomy.

Leave-one-sequence-out testing: in each iteration, one sequence from the training set is chosen as the test sequence and removed from the training set. The assignment produced by the Classifier is then compared to the original taxonomy label to measure the accuracy of the Classifier.

[Example command with length of 400]:

    java -Xmx1g -jar /path/to/classifier.jar loot -q samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400 -o Armatimonadetes_400_loso_test.txt

Leave-one-taxon-out testing: similar to leave-one-sequence-out testing, except that for each test sequence, the lowest taxon that sequence is assigned to (either species or genus node) is removed from the training set. This tests how likely the Classifier is to assign the sequence to the correct genus or higher taxa when the species or genus is not present in the training set.

[Example command]:

    java -Xmx1g -jar /path/to/classifier.jar loot -h -q samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -o Armatimonadetes_loto_test.txt

e. Cross-validation testing

This command performs a random sub-sampling validation. It calculates the values of (1-specificity) and sensitivity for each rank at each bootstrap cutoff.

[Example command to do cross-validation testing with length of 400]:

    java -Xmx1g -jar /path/to/classifier.jar crossvalidate -o crossvalidate_400.txt -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400

f. Train the classifier

If you are satisfied with the testing results, go ahead and train the classifier.

[Example command to train classifier]:

    java -Xmx1g -jar /path/to/classifier.jar train -o mytrained -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -c gene_copynumber.txt
    cp samplefiles/rRNAClassifier.properties mytrained/

"-c" specifies the gene copy number file. It should contain at least three columns: name, rank and mean for the lowest rank taxon to be trained. See the example file samplefiles/gene_copynumber.txt.

g. Classify sequences using the new model

[Example command to classify with the new model using the "-t" option]:

    java -Xmx1g -jar /path/to/classifier.jar classify -t mytrained/rRNAClassifier.properties -o Armatimonadetes_classified.txt samplefiles/Armatimonadetes.fasta
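
Once a classification output file exists, the allrank layout listed under the classify options (seqname, orientation, then repeating taxon name, rank, conf) can be read with a few lines of Python. This is a minimal sketch, not part of the Classifier: it assumes the file is tab-delimited with an orientation column that may be empty, and the function names `parse_allrank_line` and `deepest_confident` are hypothetical.

```python
def parse_allrank_line(line):
    """Split one allrank line into (seqname, orientation, assignments).

    Assumes tab-delimited fields: seqname, orientation, then repeating
    (taxon name, rank, conf) triples, as described in the usage text.
    """
    fields = line.rstrip("\n").split("\t")
    seqname, orientation = fields[0], fields[1]
    rest = fields[2:]
    assignments = []
    # Walk the remaining fields three at a time, ignoring any incomplete tail.
    for i in range(0, len(rest) - 2, 3):
        name, rank, conf = rest[i:i + 3]
        assignments.append((name, rank, float(conf)))
    return seqname, orientation, assignments


def deepest_confident(assignments, cutoff=0.8):
    """Return the lowest-rank assignment whose confidence meets the cutoff."""
    best = None
    for assignment in assignments:
        if assignment[2] < cutoff:
            break  # stop at the first rank that falls below the cutoff
        best = assignment
    return best


if __name__ == "__main__":
    # Hypothetical example line, not taken from real Classifier output.
    example = "seq1\t\tRoot\trootrank\t1.0\tBacteria\tdomain\t0.99\tFirmicutes\tphylum\t0.6"
    name, orientation, ranks = parse_allrank_line(example)
    print(name, deepest_confident(ranks, cutoff=0.8))
```

The cutoff of 0.8 mirrors the documented default of the "-c" option; the stop-at-first-failure rule is a design choice of this sketch, not a statement about how the Classifier bins results.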
Hi,
Thank you for reading this! I've been using the online RDP classifier for taxonomy classification, but I can't open the website recently. The error message is:
This site can’t be reached
http://rdp.cme.msu.edu/classifier/classifier.jsp is unreachable.
ERR_ADDRESS_UNREACHABLE
Thank you in advance for the help!
Hi there
We have been struggling a while with errors when running taxa-sim and loot as subcommands in classifier with our own created database and tax file (using the methods of [https://github.com/iimog/meta-barcoding-dual-indexing]).
The most recent error we are getting is:
Exception in thread "main" java.lang.IllegalArgumentException:
The taxID for ancestor 'f__undef__27' of sequence '27526738' at depth '5' with parent id '272' is not found!
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.getTaxonomy(TreeFactory.java:213)
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.addSequence(TreeFactory.java:167)
at edu.msu.cme.rdp.classifier.train.validation.TreeFactory.addSequence(TreeFactory.java:149)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.createTree(LeaveOneOutTesterMain.java:109)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.<init>(LeaveOneOutTesterMain.java:79)
at edu.msu.cme.rdp.classifier.train.validation.leaveoneout.LeaveOneOutTesterMain.main(LeaveOneOutTesterMain.java:186)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:75)
There are no weird characters like "*" or anything in the file (saw it was a problem with someone else), everything seems fine at first glance. I am not a bioinformaticist though and our senior bioinformaticist is swamped. What could this error be indicating?
Will it be helpful to share a shortened version of the sequence file and the tax file? Or the full files?
Any help will be greatly appreciated, we are at wits' end.
Thanks a lot,
Annie
Hello,
http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz appears to be permanently offline. Is there another location from which the training data can be acquired?
Hi rdp staff,
I am trying to retrain the RDP classifier using the NCBI 16S database. However, when I looked into the example taxonomy file and the fasta file, I was a bit confused about how I should even generate that file.
0*Root*-1*0*rootrank
1*Bacteria*0*1*domain
2*"Actinobacteria"*1*2*phylum
3*Actinobacteria*2*3*class
4*Acidimicrobidae*3*4*subclass
5*Acidimicrobiales*4*5*order
6*"Acidimicrobineae"*5*6*suborder
7*Acidimicrobiaceae*6*7*family
8*Acidimicrobium*7*8*genus
9*Ferrimicrobium*7*8*genus
10*Ferrithrix*7*8*genus
11*Ilumatobacter*7*8*genus
12*Iamiaceae*6*7*family
3102*Aquihabitans*12*8*genus
13*Iamia*12*8*genus
Could you please explain how each line is constructed? Allow me to take a line as an example,
6*"Acidimicrobineae"*5*6*suborder
I could guess that the first number is the taxonomy id for Acidimicrobineae, which is 6, and its parent taxonomy is 5, Acidimicrobiales. I assume that the suborder at the end of the line indicates that Acidimicrobineae is at the taxonomy rank of suborder, right? Then what does the 6 before suborder mean? When I look at 12*Iamiaceae*6*7*family, can I say Iamiaceae is a family-level taxon which has the parents 6 (Acidimicrobineae) and 7 (Acidimicrobiaceae)? I am not sure I understand the rule for constructing the taxonomy file here. Could you please explain how this is done?
Thanks in advance,
Eddi
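
One reading of the sample lines above is that each line holds five fields separated by "*": taxid, taxon name, parent taxid, depth from the root, and rank. Under that reading the second number in 12*Iamiaceae*6*7*family would be a depth of 7, not a second parent. The sketch below parses lines on that assumption and rebuilds a lineage by following parent ids; the names `Taxon`, `parse_taxonomy` and `lineage` are hypothetical and the field interpretation is inferred from the sample, not confirmed in this thread.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    taxid: int
    name: str
    parent_id: int
    depth: int   # assumed: distance from the root node
    rank: str


def parse_taxonomy(lines):
    """Parse 'taxid*name*parentid*depth*rank' lines into a dict keyed by taxid."""
    taxa = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        taxid, name, parent_id, depth, rank = line.split("*")
        taxa[int(taxid)] = Taxon(int(taxid), name, int(parent_id), int(depth), rank)
    return taxa


def lineage(taxa, taxid):
    """Follow parent ids up to the root and return the names from root to leaf."""
    path, seen = [], set()
    while taxid in taxa and taxid not in seen:
        seen.add(taxid)  # guard against accidental cycles in a malformed file
        node = taxa[taxid]
        path.append(node.name)
        taxid = node.parent_id
    return path[::-1]


if __name__ == "__main__":
    sample = [
        "0*Root*-1*0*rootrank",
        "1*Bacteria*0*1*domain",
        '2*"Actinobacteria"*1*2*phylum',
        "3*Actinobacteria*2*3*class",
    ]
    taxa = parse_taxonomy(sample)
    print(";".join(lineage(taxa, 3)))
    # List entries whose parent id is missing from the file (the root's -1 is expected).
    print([t.name for t in taxa.values() if t.parent_id not in taxa])
```

A check like the last line may also help spot entries whose parent is missing from a hand-built taxonomy file.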
Hi,
I just cloned the repository and typed "ant jar" as suggested in the README. It didn't build. Here is the error message:
lucass@milou2 share $ git clone https://github.com/rdpstaff/classifier.git
Initialized empty Git repository in /pica/h1/lucass/share/classifier/.git/
remote: Reusing existing pack: 304, done.
remote: Total 304 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (304/304), 839.86 KiB | 629 KiB/s, done.
Resolving deltas: 100% (99/99), done.
lucass@milou2 share $ mv classifier/ rdp_classifier
lucass@milou2 share $ cd rdp_classifier/
lucass@milou2 rdp_classifier (master) $ ant jar
Buildfile: /pica/h1/lucass/share/rdp_classifier/build.xml
download-ivy:
[mkdir] Created dir: /home/lucass/.ant/lib
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /home/lucass/.ant/lib/ivy.jar
init-ivy:
resolve:
[ivy:retrieve] :: Ivy 2.1.0-rc2 - 20090704004254 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/home/lucass/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#classifier;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found commons-io#commons-io;2.4 in public
[ivy:retrieve] found junit#junit;4.8.2 in public
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2-sources.jar ...
[ivy:retrieve] ...... (47kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar(source) (403ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar ...
[ivy:retrieve] .... (40kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (400ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2-javadoc.jar ...
[ivy:retrieve] ............................... (209kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar(javadoc) (474ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar ...
[ivy:retrieve] .......... (180kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar (430ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4-sources.jar ...
[ivy:retrieve] ................... (240kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar(source) (455ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4-javadoc.jar ...
[ivy:retrieve] ....................................................... (707kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar(javadoc) (514ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2-javadoc.jar ...
[ivy:retrieve] ................. (387kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.8.2!junit.jar(javadoc) (429ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2-sources.jar ...
[ivy:retrieve] .... (143kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.8.2!junit.jar(source) (353ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.jar ...
[ivy:retrieve] ........ (231kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.8.2!junit.jar (403ms)
[ivy:retrieve] :: resolution report :: resolve 6344ms :: artifacts dl 3887ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 3 | 3 | 0 || 9 | 9 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#classifier
[ivy:retrieve] confs: [default]
[ivy:retrieve] 9 artifacts copied, 0 already retrieved (2188kB/52ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
-init-macrodef-javac:
-init-macrodef-junit:
-init-debug-args:
-init-macrodef-nbjpda:
-init-macrodef-debug:
-init-macrodef-java:
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
deps-jar:
[mkdir] Created dir: /pica/h1/lucass/share/rdp_classifier/build
-warn-already-built-jar:
[propertyfile] Updating property file: /pica/h1/lucass/share/rdp_classifier/build/built-jar.properties
-check-call-dep:
-maybe-call-dep:
BUILD FAILED
/pica/h1/lucass/share/rdp_classifier/nbproject/build-impl.xml:581: The following error occurred while executing this line:
/pica/h1/lucass/share/rdp_classifier/nbproject/build-impl.xml:1074: The following error occurred while executing this line:
java.io.FileNotFoundException: /pica/h1/lucass/share/ReadSeq/build.xml (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:268)
at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:177)
at org.apache.tools.ant.ProjectHelper.configureProject(ProjectHelper.java:82)
at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:393)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:390)
at org.apache.tools.ant.Target.performTasks(Target.java:411)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1397)
at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
at org.apache.tools.ant.Project.executeTargets(Project.java:1249)
at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:390)
at org.apache.tools.ant.Target.performTasks(Target.java:411)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1397)
at org.apache.tools.ant.Project.executeTarget(Project.java:1366)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1249)
at org.apache.tools.ant.Main.runBuild(Main.java:801)
at org.apache.tools.ant.Main.startAnt(Main.java:218)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
Total time: 12 seconds
OK so I figured out that it needed two dependencies ("ReadSeq" and "TaxonomyTree") but now it is complaining it can't find a file in a directory that doesn't exist. Did someone hard code a path ?
download-traindata:
[get] Getting: http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
[get] To: /pica/h1/lucass/share/rdp/classifier/build/classes/data.tgz
[untar] Expanding: /pica/h1/lucass/share/rdp/classifier/build/classes/data.tgz into /pica/h1/lucass/share/rdp/classifier/build/classes
[move] Moving 1 file to /pica/h1/lucass/share/rdp/classifier/dist
jar:
BUILD FAILED
/pica/h1/lucass/share/rdp/classifier/build.xml:131: Warning: Could not find resource file "/scratch/jars/commons-io.jar" to copy.
Total time: 44 seconds
It is not clear to me which I should be using: the sourceforge version, https://sourceforge.net/projects/rdp-classifier/ which is v2.12, Last Update: 2016-07-12, or the version here on this github repo which is v2.10.2 and last update is 2015-09-15. The one at sourceforge looks like it is the more recent one.
I would appreciate some help on this. :-)
Thanks,
Glen
Hi RDP Team,
I have run into an issue while training the classifier with a custom dataset. I get an error similar to the following:
Exception in thread "main" java.lang.IllegalArgumentException: Sequence GAXI01005455.1.1233 has different lowest rank: L_7 from the previous lowest rank: L_11
Any idea why this happens?
Edit: to elaborate, I'm trying to train the RDP classifier with the SILVA v128 SSU Ref99 (available here: https://www.arb-silva.de/fileadmin/silva_databases/release_128/Exports/SILVA_128_SSURef_Nr99_tax_silva.fasta.gz). I rebuilt a taxonomy file from scratch using the lineage2taxTrain.py script.
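
The error message suggests that the trainer found sequences whose lineages end at different ranks. One way to investigate is to count how many lineage levels each training sequence carries. The sketch below assumes fasta headers of the form ">ID lineage" with levels separated by ";" (the layout used in SILVA taxonomy exports); the function names are hypothetical and this is not how the Classifier itself performs the check.

```python
from collections import Counter


def lineage_depths(fasta_lines):
    """Map each sequence id to the number of levels in its header lineage.

    Assumes headers look like '>ID Level1;Level2;...' (an assumption about
    the input file, not something the Classifier requires).
    """
    depths = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        seq_id, _, lineage = header.partition(" ")
        levels = [level for level in lineage.split(";") if level.strip()]
        depths[seq_id] = len(levels)
    return depths


def depth_histogram(depths):
    """Count how many sequences share each lineage depth."""
    return Counter(depths.values())


if __name__ == "__main__":
    # Hypothetical headers for illustration only.
    example = [
        ">A.1.10 Bacteria;Proteobacteria;Gammaproteobacteria",
        "ACGT",
        ">B.1.10 Bacteria;Firmicutes",
        "ACGT",
    ]
    depths = lineage_depths(example)
    print(depth_histogram(depths))
```

If the histogram shows more than one depth, the sequences with the less common depths are candidates for padding or removal before rebuilding the taxonomy file.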
I have been trying to build RDPTools with make, but I keep running into issues. I am following the instructions in the README file and running the build with superuser privileges (su). Below is the output. Kindly help me figure this out.
ant -f Clustering/build.xml jar
Buildfile: /opt/RDPTools/Clustering/build.xml
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded
init-ivy:
resolve:
[ivy:retrieve] :: Ivy 2.1.0-rc2 - 20090704004254 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#clustering;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found com.sun.xml.bind#jaxb-impl;2.2.7 in public
[ivy:retrieve] found com.sun.xml.bind#jaxb-core;2.2.7 in public
[ivy:retrieve] found javax.xml.bind#jaxb-api;2.2.7 in public
[ivy:retrieve] found com.sun.istack#istack-commons-runtime;2.16 in public
[ivy:retrieve] found com.sun.xml.fastinfoset#FastInfoset;1.2.12 in public
[ivy:retrieve] found javax.xml.bind#jsr173_api;1.0 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found junit#junit;4.8.2 in public
[ivy:retrieve] found commons-codec#commons-codec;1.8 in public
[ivy:retrieve] found commons-io#commons-io;2.4 in public
[ivy:retrieve] :: resolution report :: resolve 253ms :: artifacts dl 21ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 10 | 0 | 0 | 0 || 20 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#clustering
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 20 already retrieved (0kB/10ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
-init-macrodef-javac:
-init-macrodef-test-impl:
-init-macrodef-junit-init:
-init-macrodef-junit-single:
-init-test-properties:
-init-macrodef-junit-batch:
-init-macrodef-junit:
-init-macrodef-junit-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl
-init-macrodef-testng:
-init-macrodef-testng-impl:
-init-macrodef-test:
-init-macrodef-junit-debug:
-init-macrodef-junit-debug-batch:
-init-macrodef-junit-debug-impl:
-init-macrodef-test-debug-junit:
-init-macrodef-testng-debug:
-init-macrodef-testng-debug-impl:
-init-macrodef-test-debug-testng:
-init-macrodef-test-debug:
-init-debug-args:
-init-macrodef-nbjpda:
-init-macrodef-debug:
-init-macrodef-java:
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
[delete] Deleting: /opt/RDPTools/Clustering/build/built-jar.properties
deps-jar:
-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties
-check-call-dep:
-maybe-call-dep:
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded
init-ivy:
resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#AlignmentTools;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] :: resolution report :: resolve 31ms :: artifacts dl 4ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 0 | 0 | 0 || 6 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#AlignmentTools
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 6 already retrieved (0kB/3ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac
-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend
-init-macrodef-test-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl
-init-macrodef-junit-init:
-init-macrodef-junit-single:
-init-test-properties:
-init-macrodef-junit-batch:
-init-macrodef-junit:
-init-macrodef-junit-impl:
-init-macrodef-testng:
-init-macrodef-testng-impl:
-init-macrodef-test:
-init-macrodef-junit-debug:
-init-macrodef-junit-debug-batch:
-init-macrodef-junit-debug-impl:
-init-macrodef-test-debug-junit:
-init-macrodef-testng-debug:
-init-macrodef-testng-debug-impl:
-init-macrodef-test-debug-testng:
-init-macrodef-test-debug:
-init-debug-args:
-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart
-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug
-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
deps-jar:
-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties
-check-call-dep:
-maybe-call-dep:
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded
init-ivy:
resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#ReadSeq;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found commons-io#commons-io;2.4 in public
[ivy:retrieve] :: resolution report :: resolve 39ms :: artifacts dl 6ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#ReadSeq
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 9 already retrieved (0kB/3ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac
-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend
-init-macrodef-test-impl:
-init-macrodef-junit-init:
-init-macrodef-junit-single:
-init-test-properties:
-init-macrodef-junit-batch:
-init-macrodef-junit:
-init-macrodef-junit-impl:
-init-macrodef-testng:
-init-macrodef-testng-impl:
-init-macrodef-test:
-init-macrodef-junit-debug:
-init-macrodef-junit-debug-batch:
-init-macrodef-junit-debug-impl:
-init-macrodef-test-debug-junit:
-init-macrodef-testng-debug:
-init-macrodef-testng-debug-impl:
-init-macrodef-test-debug-testng:
-init-macrodef-test-debug:
-init-debug-args:
-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart
-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug
-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
deps-jar:
-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties
-check-automatic-build:
-clean-after-automatic-build:
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
-post-compile:
compile:
-pre-jar:
-post-jar:
jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/ReadSeq/dist/ReadSeq.jar"
-check-automatic-build:
-clean-after-automatic-build:
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
-post-compile:
compile:
-pre-jar:
-post-jar:
jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/AlignmentTools/dist/AlignmentTools.jar"
-check-call-dep:
-maybe-call-dep:
-check-call-dep:
-maybe-call-dep:
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded
init-ivy:
resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#SeqFilters;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found jfree#jfreechart;1.0.13 in public
[ivy:retrieve] found jfree#jcommon;1.0.16 in public
[ivy:retrieve] :: resolution report :: resolve 46ms :: artifacts dl 6ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 4 | 0 | 0 | 0 || 10 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#SeqFilters
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 10 already retrieved (0kB/3ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac
-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend
-init-macrodef-test-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl
-init-macrodef-junit-init:
-init-macrodef-junit-single:
-init-test-properties:
-init-macrodef-junit-batch:
-init-macrodef-junit:
-init-macrodef-junit-impl:
-init-macrodef-testng:
-init-macrodef-testng-impl:
-init-macrodef-test:
-init-macrodef-junit-debug:
-init-macrodef-junit-debug-batch:
-init-macrodef-junit-debug-impl:
-init-macrodef-test-debug-junit:
-init-macrodef-testng-debug:
-init-macrodef-testng-debug-impl:
-init-macrodef-test-debug-testng:
-init-macrodef-test-debug:
-init-debug-args:
-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart
-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug
-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
deps-jar:
-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties
-check-call-dep:
-maybe-call-dep:
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Not modified - so not downloaded
init-ivy:
resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/root/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: edu.cme.rdp#ProbeMatch;[email protected]
[ivy:retrieve] confs: [default]
[ivy:retrieve] found commons-lang#commons-lang;2.6 in public
[ivy:retrieve] found commons-cli#commons-cli;1.2 in public
[ivy:retrieve] found junit#junit;4.8.2 in public
[ivy:retrieve] :: resolution report :: resolve 32ms :: artifacts dl 5ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: edu.cme.rdp#ProbeMatch
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 9 already retrieved (0kB/3ms)
-pre-init:
-init-private:
-init-user:
-init-project:
-init-macrodef-property:
-do-init:
-post-init:
-init-check:
-init-ap-cmdline-properties:
-init-macrodef-javac-with-processors:
-init-macrodef-javac-without-processors:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:javac
-init-macrodef-javac:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:depend
-init-macrodef-test-impl:
-init-macrodef-junit-init:
-init-macrodef-junit-single:
-init-test-properties:
-init-macrodef-junit-batch:
-init-macrodef-junit:
-init-macrodef-junit-impl:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-impl
-init-macrodef-testng:
-init-macrodef-testng-impl:
-init-macrodef-test:
-init-macrodef-junit-debug:
-init-macrodef-junit-debug-batch:
-init-macrodef-junit-debug-impl:
-init-macrodef-test-debug-junit:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:test-debug
-init-macrodef-testng-debug:
-init-macrodef-testng-debug-impl:
-init-macrodef-test-debug-testng:
-init-macrodef-test-debug:
-init-debug-args:
-init-macrodef-nbjpda:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:nbjpdastart
-init-macrodef-debug:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/3:debug
-init-macrodef-java:
Trying to override old definition of task http://www.netbeans.org/ns/j2se-project/1:java
-init-presetdef-jar:
-init-ap-cmdline-supported:
-init-ap-cmdline:
init:
-deps-jar-init:
deps-jar:
-warn-already-built-jar:
[propertyfile] Updating property file: /opt/RDPTools/Clustering/build/built-jar.properties
-check-call-dep:
-maybe-call-dep:
-check-call-dep:
-maybe-call-dep:
-check-automatic-build:
-clean-after-automatic-build:
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
-post-compile:
compile:
-pre-jar:
-post-jar:
jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/ProbeMatch/dist/ProbeMatch.jar"
-check-call-dep:
-maybe-call-dep:
-check-automatic-build:
-clean-after-automatic-build:
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
-post-compile:
compile:
-pre-jar:
-post-jar:
jar:
[echo] To run this application from the command line without Ant, try:
[echo] java -jar "/opt/RDPTools/SeqFilters/dist/SeqFilters.jar"
-check-call-dep:
-maybe-call-dep:
-check-call-dep:
-maybe-call-dep:
-check-automatic-build:
-clean-after-automatic-build:
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
[javac] Compiling 72 source files to /opt/RDPTools/Clustering/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/pyro/cluster/ClusterReplay.java:26: error: cannot find symbol
[javac] import edu.msu.cme.rdp.taxatree.Taxon;
[javac] ^
[javac] symbol: class Taxon
[javac] location: package edu.msu.cme.rdp.taxatree
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/pyro/cluster/ClusterReplay.java:27: error: cannot find symbol
[javac] import edu.msu.cme.rdp.taxatree.TaxonHolder;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: package edu.msu.cme.rdp.taxatree
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:22: error: package edu.msu.cme.rdp.taxatree.utils does not exist
[javac] import edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor;
[javac] ^
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:23: error: package edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor does not exist
[javac] import edu.msu.cme.rdp.taxatree.utils.NewickPrintVisitor.NewickDistanceFactory;
[javac] ^
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:50: error: cannot find symbol
[javac] TaxonHolder lastMerged = null;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:53: error: cannot find symbol
[javac] Map<Integer, TaxonHolder> taxonMap = new HashMap();
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:60: error: cannot find symbol
[javac] TaxonHolder holder;
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:60: error: cannot find symbol
[javac] TaxonHolder holder;
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:64: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, seqids.get(0), ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:64: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, seqids.get(0), ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:66: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:66: error: cannot find symbol
[javac] holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:70: error: cannot find symbol
[javac] TaxonHolder th = new TaxonHolder(new Taxon(id, seqid, ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class TaxonHolder
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:84: error: cannot find symbol
[javac] TaxonHolder holder = new TaxonHolder(new Taxon(taxid++, "", ""));
[javac] ^
[javac] symbol: class Taxon
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickPrintVisitor
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickPrintVisitor
[javac] location: class TreeBuilder
[javac] /opt/RDPTools/Clustering/src/edu/msu/cme/rdp/taxatree/TreeBuilder.java:106: error: cannot find symbol
[javac] NewickPrintVisitor visitor = new NewickPrintVisitor(newickTreeOut, false, new NewickDistanceFactory() {
[javac] ^
[javac] symbol: class NewickDistanceFactory
[javac] location: class TreeBuilder
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 21 errors
[javac] 1 warning
BUILD FAILED
/opt/RDPTools/Clustering/nbproject/build-impl.xml:955: The following error occurred while executing this line:
/opt/RDPTools/Clustering/nbproject/build-impl.xml:300: Compile failed; see the compiler error output for details.
Total time: 4 seconds
make: *** [Clustering/dist/Clustering.jar] Error 1
I'm trying to cross-validate my training set with the latest classifier version from GitHub (2014-11-24) on 64-bit Ubuntu 14.04 with java version "1.7.0_65" (OpenJDK). Unfortunately, I get a NullPointerException each time I try:
java -Xmx1g -jar ~/software/RDPTools/classifier.jar crossvalidate -o minimal.txt -s minimal.fa -t minimal.tax
164458368
298507601
298507603
164458369
298507604
Exception in thread "main" java.lang.NullPointerException
at edu.msu.cme.rdp.classifier.train.validation.HierarchyTree.getWordOccurrence(HierarchyTree.java:289)
at edu.msu.cme.rdp.classifier.train.validation.NBClassifier.calculateProb(NBClassifier.java:107)
at edu.msu.cme.rdp.classifier.train.validation.NBClassifier.assignClass(NBClassifier.java:68)
at edu.msu.cme.rdp.classifier.train.validation.DecisionMaker.getBestClasspath(DecisionMaker.java:42)
at edu.msu.cme.rdp.classifier.train.validation.crossvalidate.CrossValidate.runTest(CrossValidate.java:117)
at edu.msu.cme.rdp.classifier.train.validation.crossvalidate.CrossValidateMain.main(CrossValidateMain.java:121)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)
For testing purposes, I attach my minimal.fa and minimal.tax below. They work fine with loot, taxa-sim and train.
minimal.fa: https://gist.github.com/iimog/aa2ce23ab6f4d63cfa2b
minimal.tax: https://gist.github.com/iimog/4ea791d8be7f51073c46
Any help would be highly appreciated.
Thanks in advance,
Markus Ankenbrand
When running the subcommand taxa-sim in an environment without an X11 server, an exception is thrown at the end of execution:
java -jar classifier.jar taxa-sim rdp.tax rdp.fa rdp.fa taxasim 8 rankFile sab
100
200
300
... [truncated]
Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using 'localhost:12.0' as the value of the DISPLAY variable.
at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:62)
at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:142)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)
at sun.swing.SwingUtilities2.isLocalDisplay(SwingUtilities2.java:1406)
at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:1563)
at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:147)
at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1599)
at javax.swing.UIManager.setLookAndFeel(UIManager.java:530)
at javax.swing.UIManager.setLookAndFeel(UIManager.java:570)
at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1320)
at javax.swing.UIManager.initialize(UIManager.java:1407)
at javax.swing.UIManager.maybeInitialize(UIManager.java:1395)
at javax.swing.UIManager.getDefaults(UIManager.java:644)
at javax.swing.UIManager.getColor(UIManager.java:686)
at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:261)
at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1748)
at edu.msu.cme.rdp.classifier.train.validation.distance.TaxaSimilarityMain.createPlot(TaxaSimilarityMain.java:324)
at edu.msu.cme.rdp.classifier.train.validation.distance.TaxaSimilarityMain.main(TaxaSimilarityMain.java:385)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:79)
The plot is not generated and the txt output is incomplete. This error can be avoided by invoking the classifier in headless mode:
java -Djava.awt.headless=true -jar classifier.jar taxa-sim rdp.tax rdp.fa rdp.fa taxasim 8 rankFile sab
This workaround should be documented; alternatively, the problem can be fixed in code by adding
System.setProperty("java.awt.headless", "true");
before any graphics code (e.g. in a static {} block).
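The in-code fix suggested here could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the classifier's actual code: the class name HeadlessExample and the helper isHeadless are hypothetical, and the static block stands in for wherever the plotting entry point would set the property.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical class, used only to illustrate the static-initializer approach.
public class HeadlessExample {

    // Runs once when this class is initialized, i.e. before any code in this
    // class touches AWT/Swing (and therefore before JFreeChart is loaded from here).
    static {
        System.setProperty("java.awt.headless", "true");
    }

    /** Reports whether AWT considers the current JVM to be headless. */
    public static boolean isHeadless() {
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        // Chart-creating code would go here; it no longer needs an X11 display.
        System.out.println("headless = " + isHeadless());
    }
}
```

The property only takes effect if it is set before AWT first reads it, so the static block has to live in a class that is initialized ahead of any graphics call. The command-line flag -Djava.awt.headless=true avoids that ordering concern.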
See this post on Stack Overflow.
Cheers, Markus Ankenbrand
Hello,
I tried to solve this issue for a couple of hours, but nothing worked for me.
I tried to build the tools with sudo make, but every time I run into the same issue (see below):
ant -f Clustering/build.xml jar
Buildfile: /home/davin/Documents/RDPTools/Clustering/build.xml
download-ivy:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar
[get] To: /root/.ant/lib/ivy.jar
[get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc2/ivy-2.1.0-rc2.jar to /root/.ant/lib/ivy.jar
BUILD FAILED
/home/davin/Documents/RDPTools/Clustering/build.xml:87: java.net.UnknownHostException: repo2.maven.org
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at java.net.Socket.connect(Socket.java:556)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.apache.tools.ant.taskdefs.Get$GetThread.openConnection(Get.java:766)
at org.apache.tools.ant.taskdefs.Get$GetThread.get(Get.java:676)
at org.apache.tools.ant.taskdefs.Get$GetThread.run(Get.java:666)
Total time: 0 seconds
Makefile:15: recipe for target 'Clustering/dist/Clustering.jar' failed
make: *** [Clustering/dist/Clustering.jar] Error 1
I also cloned the repository into my home directory and ran the build again, installed JDK version 8, and made sure my Ant version is up to date. None of the other solutions I found worked, so I opened this issue here.
Thanks in advance for any help!
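Since the build stops with java.net.UnknownHostException, one way to check whether the JVM can resolve the download host at all, independently of Ant, is a small lookup like the sketch below. The class name DnsCheck and the helper canResolve are hypothetical names introduced for illustration; the default host is the one shown in the failing [get] step of the log.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic class, not part of RDPTools.
public class DnsCheck {

    /** Returns true if the JVM can resolve the given host name to an address. */
    public static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            // Same exception type that Ant's <get> task reported in the log.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the host from the failing download; pass another host as args[0].
        String host = args.length > 0 ? args[0] : "repo2.maven.org";
        System.out.println(host + (canResolve(host) ? " resolves" : " does NOT resolve"));
    }
}
```

If the lookup fails here as well, the problem lies in name resolution for that host (or in the host name itself) rather than in the JDK or Ant versions.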
Hi, I'm getting the following error:
cmd-> java -Xmx16g -jar /bio_bin/rdp_classifier_2.10.1/dist/classifier.jar merge-detail \
> -o combined_rdp.txt -h combined_hierarchy.txt -c 0.5 --gene fungalits_unite \
> S14MNOARP5.trimmed.noplant/S14MNOARP5.trimmed.noplant_UNITE_public_30.12.2014_UniVec.ovl.dusted.valid.rdp.tmp \
> S14MNOAEP5.trimmed.noplant/S14MNOAEP5.trimmed.noplant_UNITE_public_30.12.2014_UniVec.ovl.dusted.valid.rdp.tmp
Command Error: fungalits_unite is NOT valid, only allows 16srrna, fungallsu, fungalits_warcup and fungalits_unite
Any ideas why? The classification portion worked without issue.
Hi,
I have a 16S sequencing dataset composed of multiple FASTA files (54 files), one for each sample. I would like to classify all the sequences in the dataset using the dereplication pipeline in order to speed up the entire process. However, when the classification finishes, the hier file contains only the assignments for the dereplicated file and not the assignments for the original files (the 54 files). Any help would be appreciated!
Thanks in advance
Giovanni
Hi, rdpstaff group,
When I use
java -Xmx1g -jar /path/to/classifier.jar classify -t mytrained/rRNAClassifier.properties -o result.tax.txt asv_seqs.fasta
I find many s__uncultured_bacterium_qiime_unique_taxon_tag_xxxx and o__norank_qiime_unique_taxon_tag_xxxx entries in my taxonomic classification result.
But I have not seen any unique_taxon_tag in my raw database.
Do these norank or uncultured labels come from classifier.jar?
Hi,
When I use the RDP classifier with my own databank (a very large 16S databank), its CPU usage is unacceptably high: up to 2360% (see below).
This phenomenon doesn't appear with the default databank and is much less pronounced with the databank built from the RDP example training command.
How can I reduce the CPU consumption or the number of threads used by the RDP classifier?
Command with my databank:
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t path/to/my_bank.properties -o result.rdp sub.fasta
Consumption:
top - 09:51:00 up 56 days, 22:36, 0 users, load average: 15.10, 23.87, 20.76
Tasks: 840 total, 11 running, 829 sleeping, 0 stopped, 0 zombie
Cpu(s): 81.2%us, 0.2%sy, 0.0%ni, 18.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264438700k total, 84939736k used, 179498964k free, 174172k buffers
Swap: 16777208k total, 36100k used, 16741108k free, 64703676k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
65765 fescudie 20 0 18.4g 6.4g 10m S 2360.5 2.6 4:59.91 java
65850 fescudie 20 0 13684 1776 880 R 0.7 0.0 0:00.05 top
65432 fescudie 20 0 104m 1948 1408 S 0.0 0.0 0:00.15 bash
Consumption with threads:
top - 10:33:10 up 56 days, 23:18, 0 users, load average: 14.83, 10.51, 10.28
Tasks: 1305 total, 11 running, 1294 sleeping, 0 stopped, 0 zombie
Cpu(s): 41.4%us, 2.5%sy, 0.0%ni, 56.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264438700k total, 83889500k used, 180549200k free, 174876k buffers
Swap: 16777208k total, 36100k used, 16741108k free, 64773160k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
66871 fescudie 20 0 18.4g 5.3g 9m R 70.3 2.1 0:16.90 java
66876 fescudie 20 0 18.4g 5.3g 9m S 29.7 2.1 0:02.20 java
66889 fescudie 20 0 18.4g 5.3g 9m S 29.7 2.1 0:02.31 java
66891 fescudie 20 0 18.4g 5.3g 9m S 29.7 2.1 0:02.22 java
66897 fescudie 20 0 18.4g 5.3g 9m S 29.7 2.1 0:02.27 java
66878 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.05 java
66879 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.01 java
66881 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.07 java
66882 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.13 java
66884 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:01.99 java
66886 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.19 java
66890 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.12 java
66892 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.16 java
66893 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.29 java
66894 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:01.68 java
66895 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.04 java
66896 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.27 java
66898 fescudie 20 0 18.4g 5.3g 9m S 29.4 2.1 0:02.11 java
66875 fescudie 20 0 18.4g 5.3g 9m S 29.1 2.1 0:02.22 java
66877 fescudie 20 0 18.4g 5.3g 9m S 29.1 2.1 0:02.26 java
66899 fescudie 20 0 18.4g 5.3g 9m S 29.1 2.1 0:02.26 java
66885 fescudie 20 0 18.4g 5.3g 9m S 28.7 2.1 0:02.13 java
66880 fescudie 20 0 18.4g 5.3g 9m S 28.4 2.1 0:02.19 java
66874 fescudie 20 0 18.4g 5.3g 9m S 28.1 2.1 0:02.01 java
66872 fescudie 20 0 18.4g 5.3g 9m S 26.8 2.1 0:01.99 java
66873 fescudie 20 0 18.4g 5.3g 9m S 26.1 2.1 0:02.00 java
66883 fescudie 20 0 18.4g 5.3g 9m S 24.1 2.1 0:02.03 java
66888 fescudie 20 0 18.4g 5.3g 9m S 22.1 2.1 0:01.62 java
66887 fescudie 20 0 18.4g 5.3g 9m S 21.8 2.1 0:01.92 java
66912 fescudie 20 0 14080 2168 884 R 1.0 0.0 0:00.11 top
65432 fescudie 20 0 104m 1948 1408 S 0.0 0.0 0:00.44 bash
66870 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66900 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66901 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66902 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66903 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66904 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.10 java
66905 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.09 java
66906 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
66907 fescudie 20 0 18.4g 5.3g 9m S 0.0 2.1 0:00.00 java
Command with RDP default databank:
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -o result.rdp sub.fasta
Consumption:
top - 09:53:41 up 56 days, 22:39, 0 users, load average: 9.96, 17.82, 18.93
Tasks: 840 total, 10 running, 830 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.0%us, 0.0%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264438700k total, 78978564k used, 185460136k free, 174216k buffers
Swap: 16777208k total, 36100k used, 16741108k free, 64768832k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
65863 fescudie 20 0 18.4g 703m 10m S 100.1 0.3 1:18.87 java
65917 fescudie 20 0 13684 1784 880 R 0.3 0.0 0:00.36 top
65432 fescudie 20 0 104m 1948 1408 S 0.0 0.0 0:00.16 bash
Command with 'Example command to train classifier':
java -Xmx1g -jar path/to/classifier.jar train -o mytrained -s path/to/RDPTools/classifier/samplefiles/new_trainset.fasta -t path/to/RDPTools/classifier/samplefiles/new_trainset_db_taxid.txt
cp path/to/RDPTools/classifier/samplefiles/rRNAClassifier.properties mytrained
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t mytrained/rRNAClassifier.properties -o result.rdp sub.fasta
Consumption:
top - 10:23:54 up 56 days, 23:09, 0 users, load average: 9.19, 8.95, 10.32
Tasks: 840 total, 10 running, 830 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.5%us, 0.1%sy, 0.0%ni, 74.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264438700k total, 78953232k used, 185485468k free, 174720k buffers
Swap: 16777208k total, 36100k used, 16741108k free, 64773140k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
66617 fescudie 20 0 18.4g 590m 10m S 120.6 0.2 0:29.30 java
66655 fescudie 20 0 13684 1784 884 R 0.7 0.0 0:00.13 top
65432 fescudie 20 0 104m 1948 1408 S 0.0 0.0 0:00.30 bash
Thanks in advance.
Hi,
I hope you can help, please.
I downloaded the classifier with the command:
$git clone https://github.com/rdpstaff/classifier.git
$cd classifier #to enter the directory
$ls #to list its contents:
build build.xml ivy.xml lib LICENSE manifest.mf nbproject README samplefiles src test
and finally, to install the classifier:
$ant -f build.xml
and I got an error in one test:
-do-test-run:
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.ClassifierTest
[junit] testClassify
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,072 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testClassify
[junit] ------------- ---------------- ---------------
[junit] testGetName
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.HierarchyTreeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,069 sec
[junit]
[junit] ------------- Standard Error -----------------
[junit] testGetName
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.ParsedSequenceTest
[junit] testGetReversedWord
[junit] testGetWordIndex
[junit] testCreateWordIndexArr
[junit] testGetReversedSeq
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,077 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testGetReversedWord
[junit] testGetWordIndex
[junit] testCreateWordIndexArr
[junit] testGetReversedSeq
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.TrainingInfoTest
[junit] testCreateTree
[junit] testCreateLogWordPriorArr
[junit] testCreateProbIndexArr
[junit] testCreateClassifier
[junit] testCreateGenusWordConditionalProbList
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,565 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testCreateTree
[junit] testCreateLogWordPriorArr
[junit] testCreateProbIndexArr
[junit] testCreateClassifier
[junit] testCreateGenusWordConditionalProbList
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.rrnaclassifier.TreeFileParserTest
[junit] testParseTreeFile
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,143 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testParseTreeFile
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.GoodWordIteratorTest
[junit] testNext
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,061 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.RawHierarchyTreeTest
[junit] testInitWordOccurrence()
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,083 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testInitWordOccurrence()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.RawSequenceParserTest
[junit] testNext()
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,083 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.TreeFactoryTest
[junit] testAddSequence
[junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0,139 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testAddSequence
[junit] ------------- ---------------- ---------------
[junit] Testcase: testAddSequence(edu.msu.cme.rdp.classifier.train.TreeFactoryTest): FAILED
[junit] null expected:<G[1]> but was:<G[2]>
[junit] junit.framework.ComparisonFailure: null expected:<G[1]> but was:<G[2]>
[junit] at edu.msu.cme.rdp.classifier.train.TreeFactoryTest.testAddSequence(TreeFactoryTest.java:55)
[junit]
[junit]
[junit] Test edu.msu.cme.rdp.classifier.train.TreeFactoryTest FAILED
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.AddLogsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,07 sec
[junit]
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.DecisionMakerTest
[junit] testGetBestClasspath
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,54 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testGetBestClasspath
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.GoodWordIteratorTest
[junit] testNext
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,072 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testNext
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.HierarchyTreeTest
[junit] testHideSeq
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,133 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testHideSeq
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.NBClassifierTest
[junit] testassignClass
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,107 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testassignClass
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.SequenceParserTest
[junit] testNext()
[junit] testHasNext
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,088 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testHasNext
[junit] ------------- ---------------- ---------------
[junit] ------------- Standard Error -----------------
[junit] testNext()
[junit] ------------- ---------------- ---------------
[junit] Testsuite: edu.msu.cme.rdp.classifier.train.validation.TreeFactoryTest
[junit] testAddSequence
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0,092 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] testAddSequence
[junit] ------------- ---------------- ---------------
test-report:
-post-test-run:
BUILD FAILED
/home/orson/RDPTools/classifier/nbproject/build-impl.xml:1304: Some tests failed; see details above.
Total time: 10 seconds
Please help me. Thanks.
Hi
I am using the standalone RDP classifier to annotate our assemblies, but I find that the RDP nomenclature differs from the NCBI nomenclature. For example, the NCBI genus Anabaena of phylum Cyanobacteria is named GpI in the RDP classification. The inconsistency between the two nomenclatural systems confuses me, and I cannot clearly determine the identity. Is there any tool that maps between these two nomenclatures?
Thank you very much.
Hello classifier team,
I'm interested in using the copy number utility of the RDP classifier. My question is as follows:
1- The documentation says to provide the copy number file with the flag -c. However, in -help, -c is the confidence used for the classification.
Am I misunderstanding something in the help screen or in your documentation?
Kindly advise.
RDPstaff,
I am trying to retrain the RDP classifier and have an issue with the -c option. I have already prepped my seq and tax files (end of email) and trained RDP against them.
It output 4 files (below), but none of them is the properties file.
bergeyTrainingTree.xml logWordPrior.txt
genus_wordConditionalProbList.txt wordConditionalProbIndexArr.txt
Do I need to include the -c file to get this? If so, I cannot find any information on how to generate it, so I was hoping you could help. According to the README, "It should at least three columns: name, rank and mean for the lowest rank taxon to be trained". What do you mean by "mean" in the context of this file? Furthermore, how should I go about generating the whole file?
SEQ FILE
AB353770|AB353770.1.1740_U Root;Eukaryota;Alveolata;Dinoflagellata;Dinophyceae;Peridiniales;Kryptoperidiniaceae;Unruhdinium
ATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTATAAGCTTTTACATGGCGAAACTGCGAATGGCTCATTAAAACAGTTACAGTTTATTTGAAG (cont.)
TAX FILE
0*Root*-1*0*rootrank
1*Eukaryota*0*1*domain
2*Alveolata*1*2*supergroup
3*Dinoflagellata*2*3*division
4*Dinophyceae*3*4*class
5*Peridiniales*4*5*order
Thanks for the help
-Andrew Davis
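As an illustration of the TAX FILE lines quoted above, here is a minimal Python sketch that parses them and runs two sanity checks. The field meanings (taxid, name, parent taxid, depth, rank, separated by `*`) are an assumption inferred from the sample lines, and the function names are hypothetical; this is not part of the classifier itself.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    taxid: int
    name: str
    parent_id: int
    depth: int
    rank: str


def parse_taxonomy(lines):
    """Parse '*'-separated taxonomy lines into a dict keyed by taxid."""
    taxa = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed column order: taxid*name*parent taxid*depth*rank
        taxid, name, parent_id, depth, rank = line.split("*")
        taxa[int(taxid)] = Taxon(int(taxid), name, int(parent_id), int(depth), rank)
    return taxa


def check_taxonomy(taxa):
    """Return a list of problems with parent links and depths (empty if none)."""
    problems = []
    for t in taxa.values():
        if t.parent_id == -1:
            if t.depth != 0:
                problems.append(f"{t.name}: root should have depth 0")
            continue
        parent = taxa.get(t.parent_id)
        if parent is None:
            problems.append(f"{t.name}: parent {t.parent_id} is not defined")
        elif t.depth != parent.depth + 1:
            problems.append(f"{t.name}: depth {t.depth} does not follow parent depth {parent.depth}")
    return problems


def missing_lineage_names(header, taxa):
    """Names in a sequence header lineage that have no entry in the taxonomy."""
    lineage = header.split(None, 1)[1]  # text after the sequence ID
    known = {t.name for t in taxa.values()}
    return [name for name in lineage.split(";") if name not in known]


tax_lines = [
    "0*Root*-1*0*rootrank",
    "1*Eukaryota*0*1*domain",
    "2*Alveolata*1*2*supergroup",
    "3*Dinoflagellata*2*3*division",
    "4*Dinophyceae*3*4*class",
    "5*Peridiniales*4*5*order",
]
header = ("AB353770|AB353770.1.1740_U "
          "Root;Eukaryota;Alveolata;Dinoflagellata;Dinophyceae;"
          "Peridiniales;Kryptoperidiniaceae;Unruhdinium")

taxa = parse_taxonomy(tax_lines)
print(check_taxonomy(taxa))
print(missing_lineage_names(header, taxa))
```

On the excerpt as posted, the last two lineage names have no taxonomy line. That may simply be because the post shows only the first lines of the file.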
Hello,
I would like to create my own training data. Could you send me the scripts lineage2taxTrain.py and addFullLineage.py? I would really appreciate that. My email address is [email protected]
Thanks!
Hi guys, I'm trying to perform leave-one-sequence-out testing on the RDP classifier so that I can get the classifications for all 13212 sequences in the RDP dataset. The output of the leave-one-sequence-out testing only returns the misclassified sequences with their confidences, without all levels. How do I get the leave-one-sequence-out testing to give me all the classifications at all levels with all confidences? So far I've tried to set the confidence like this, but it doesn't seem to work:
java -Xmx1g -jar /path/to/classifier.jar loot -q -allrank -c 0.0 samplefiles/Armatimonadetes.fasta -s samplefiles/new_trainset.fasta -t samplefiles/new_trainset_db_taxid.txt -l 400 -o Armatimonadetes_400_loso_test.txt
Can you guys help out?
Thanks
Hi,
From the RDP classifier paper I read, the word size is 8 (just to make sure I understand it correctly, the size here is the length of the word, such as ATCTGGTC, right?), which is optimal because the other word sizes of 6, 7 or 9 were not as accurate as size 8 according to preliminary experiments.
- Is there an option for me to pick another word size when I am training my own classifier with a customized database?
Also, I want to know how you chop up the reads in the database. It says all the words should be non-overlapping, to satisfy the assumption for Bayes' rule that all features are independent (correct me if I am misunderstanding). Say I have a sequence in the database:
SeqA: AAAAAAAA TTTTTTTT GGGGGGGG TTTTTTTT
If I chop up from the very first nt, then I should get the 8-size words:
AAAAAAAA x1, TTTTTTTT x2, and GGGGGGGG x1, and these will be recorded as the features for this particular genus.
But what if I have a test sequence:
SeqB: ATTTTTTT TGG. Clearly you can tell it's a subsequence of SeqA (it spans the last A, all eight Ts of the second word, and the first two Gs of SeqA), but if I chop up from the very first nt, it won't give me the same feature words as I get from SeqA. I will get ATTTTTTT, and the leftover: TGG. I am curious, what do you do with the leftover nt? Just throw them away?
- I think I need a little insight into how to chop up the database into k-mers, and how you define the features.
I am a beginner in machine learning algorithms and still trying to learn more about the RDP classifier. If my understanding is wrong, I welcome any suggestions.
Thanks a lot!
Eddi
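To make the question concrete, the following minimal Python sketch contrasts the two ways of cutting a sequence into 8-base words, using SeqA and SeqB from the post. The function names are hypothetical, and the sketch only illustrates the two schemes; it makes no claim about which one the classifier's code uses.

```python
def overlapping_words(seq, k=8):
    """Every length-k substring, sliding the window one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_overlapping_words(seq, k=8):
    """Length-k chunks starting at 0, k, 2k, ...; a shorter tail is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


seq_a = "AAAAAAAA" "TTTTTTTT" "GGGGGGGG" "TTTTTTTT"
seq_b = "ATTTTTTT" "TGG"  # a subsequence of seq_a

# Words of SeqB that also occur among the words of SeqA, under each scheme.
shared_chunked = set(non_overlapping_words(seq_b)) & set(non_overlapping_words(seq_a))
shared_sliding = set(overlapping_words(seq_b)) & set(overlapping_words(seq_a))

print(len(shared_chunked), len(shared_sliding))  # prints "0 4"
```

With fixed chunks, the single word from SeqB matches nothing in SeqA because the reading frames differ. With a sliding window, all four words of SeqB are found in SeqA regardless of where the read starts.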
Hi,
When trying to merge the existing results of the classifier I get the following error:
cmd-> java -Xmx16g -jar /bio_bin/rdp_classifier_2.8/dist/classifier.jar merge-detail \
-o merged_classified.txt \
-h merged_classified.hier.txt \
-c 0.5 \
--train_propfile /bioinformatics/bio_db/silva_SSURef_108_tax_silva_trunc/qiime/Silva_108/taxa_mapping/CombinedClassifier/rRNAClassifier.properties \
./Corn-Root-P1-MP/Corn-Root-P1-MP.16S18S.univec.rdp ./Corn-Root-P1-Mobio/Corn-Root-P1-Mobio.16S18S.univec.rdp
Exception in thread "main" java.lang.IllegalArgumentException: taxon Node environmental samples in line "M01224:135:000000000-A9TYB:1:1107:21832:7607 Root norank 1.0 Eukaryota Superkingdom 1.0 Fungi Kingdom 0.99 Dikarya Subkingdom 0.99 Basidiomycota Phylum 0.99 environmental samples Genus 0.75" is not found in the original Classifier training data.
at edu.msu.cme.rdp.classifier.rrnaclassifier.ClassificationParser.next(ClassificationParser.java:107)
at edu.msu.cme.rdp.multicompare.MultiClassifier.multiClassificationParser(MultiClassifier.java:252)
at edu.msu.cme.rdp.multicompare.Reprocess.main(Reprocess.java:184)
at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:69)
Is there any way the exception handling could be done more gracefully and keep going with the merging, while putting the offending reads in a separate file or just logging that some reads were problematic?
BTW, all the reads were classified with same training set that was supplied on the command line and should NOT be generating an error.
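Until something like this exists in the tool, a pre-filter outside the classifier is one possible workaround. The Python sketch below sets aside result lines that mention a taxon name missing from a known set. It rests on several assumptions: the result lines are tab-delimited, an optional orientation column (empty or `-`) may follow the sequence ID, the remaining columns are (name, rank, confidence) triples as in the line quoted in the exception, and `known_taxa` is a hypothetical set of names the caller builds from the training taxonomy.

```python
def parse_detail_line(line):
    """Split one tab-delimited result line into (seq_id, [(name, rank, conf), ...])."""
    fields = line.rstrip("\n").split("\t")
    seq_id, rest = fields[0], fields[1:]
    # Assumption: an optional orientation column ("" or "-") may follow the ID.
    if rest and rest[0] in ("", "-"):
        rest = rest[1:]
    if len(rest) % 3 != 0:
        raise ValueError(f"unexpected column count for {seq_id}")
    triples = [(rest[i], rest[i + 1], float(rest[i + 2]))
               for i in range(0, len(rest), 3)]
    return seq_id, triples


def partition_lines(lines, known_taxa):
    """Separate lines whose taxa are all in known_taxa from the offending ones."""
    kept, offending = [], []
    for line in lines:
        if not line.strip():
            continue
        _, triples = parse_detail_line(line)
        unknown = [name for name, _, _ in triples if name not in known_taxa]
        (offending if unknown else kept).append(line)
    return kept, offending


# Hypothetical example data; real names would come from the training taxonomy.
known_taxa = {"Root", "Eukaryota", "Fungi"}
lines = [
    "read1\t\tRoot\tnorank\t1.0\tEukaryota\tSuperkingdom\t1.0\tFungi\tKingdom\t0.99\n",
    "read2\t\tRoot\tnorank\t1.0\tenvironmental samples\tGenus\t0.75\n",
]
kept, offending = partition_lines(lines, known_taxa)
print(len(kept), len(offending))
```

The kept lines could then be written to a new file for merging, and the offending ones logged for inspection.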
I ran two LOOTs. When leaving one taxon out, the % misclassified in the last table (**misclassified sequences group by taxon) is always 100%, which is not correct, according to the other tables in the output and according to the LOOT by sequence analysis.
java -Xmx46g -jar classifier.jar loot -h -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveonetaxonout_test_0.05.txt
**misclassified sequences group by taxon

| Tested Seqs (non-singleton) | misclassified | pct misclassified |
|---|---|---|
| 26881 | 26881 | 1 |
| 26881 | 26881 | 1 |
| 16 | 16 | 1 |
| 11 | 11 | 1 |
| 11 | 11 | 1 |
| 10 | 10 | 1 |
| 0 | 0 | 0 |
| ... | | |
java -Xmx46g -jar classifier.jar loot -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveoneseqout_test_0.05.txt
**misclassified sequences group by taxon

| Tested Seqs (non-singleton) | misclassified | pct misclassified |
|---|---|---|
| 26881 | 2363 | 0.087905956 |
| 26881 | 2363 | 0.087905956 |
| 16 | 3 | 0.1875 |
| 11 | 0 | 0 |
| 11 | 0 | 0 |
| 10 | 0 | 0 |
| ... | | |
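For reference, the pct misclassified column in both tables is the second count divided by the first. The short Python check below reproduces the quoted values from the counts. It only verifies the arithmetic of the rows shown in this post and says nothing about why the leave-one-taxon-out run reports every sequence as misclassified; the helper name is hypothetical.

```python
def pct_misclassified(tested, misclassified):
    """Fraction misclassified; 0 when nothing was tested (as in the 0 | 0 | 0 row)."""
    return misclassified / tested if tested else 0.0


# (tested, misclassified) pairs copied from the two tables in this post.
rows_taxon_out = [(26881, 26881), (16, 16), (11, 11), (10, 10), (0, 0)]
rows_seq_out = [(26881, 2363), (16, 3), (11, 0), (10, 0)]

for label, rows in (("leave-one-taxon-out", rows_taxon_out),
                    ("leave-one-sequence-out", rows_seq_out)):
    for tested, wrong in rows:
        print(label, tested, wrong, round(pct_misclassified(tested, wrong), 9))
```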
Dear Team,
I noticed the following in your instructions for training RDP.
"Based on our experience, trimming the sequences to a specific region does not improve accuracy."
I wanted to inform you that I explicitly tested this some years ago, and in my experience trimming did have a significant impact on assignment accuracy.
Metabarcoding free-living marine nematodes using curated 18S and CO1 reference sequence databases for species-level taxonomic assignments, DOI: 10.1002/ece3.4814
I have only ever used trimmed training sets since then and would advise other users to do the same.
Kind regards,
Lara
Hi,
My question is about the input format the classifier requires. I created files in exactly the formats described in the tutorials. These are my raw training files:
However, when I try to produce the training-ready files using the following commands (changing the file names as appropriate):
python lineage2taxTrain.py rawTaxonomy.txt > ready4train_taxonomy.txt
python addFullLineage.py rawTaxonomy.txt rawSeqs.fasta > ready4train_seqs.fasta
It runs, but the output is quite strange. I tried many tweaks and solutions, and this is my output ready4train_taxonomy.txt file. I have no idea why there are spaces between every two characters; I added no spaces in the Python scripts:
This is my output ready4train_seqs.fasta sequence file:
The resulting taxonomy file never looks like the following, which is what I understand the output should be:
Could you kindly guide me to the correct formatting if I am doing something wrong? There are ambiguous characters in the sequences. Is this an issue, and should they be removed?
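One possible cause of a gap after every character, not confirmed in this thread, is that the output file was written as UTF-16 (some Windows shells do this when redirecting with `>`), so that each ASCII character is followed by a NUL byte. The Python sketch below is a way to test that assumption and decode such a file; the function names are hypothetical.

```python
import codecs

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_interleaved(raw: bytes) -> bool:
    """Heuristic: every second byte is NUL, typical of ASCII text stored as UTF-16."""
    body = raw[2:] if raw[:2] in UTF16_BOMS else raw
    return len(body) >= 4 and (set(body[1::2]) == {0} or set(body[0::2]) == {0})


def decode_text(raw: bytes) -> str:
    """Decode as UTF-16 when a UTF-16 byte-order mark is present, else as UTF-8."""
    if raw.startswith(UTF16_BOMS):
        return raw.decode("utf-16")  # the BOM selects the byte order and is dropped
    return raw.decode("utf-8-sig")   # also tolerates a UTF-8 BOM


# Simulate a taxonomy line that was written as UTF-16 with a byte-order mark.
sample = codecs.BOM_UTF16_LE + "0*Root*-1*0*rootrank\n".encode("utf-16-le")
print(looks_interleaved(sample))
print(decode_text(sample), end="")
```

If the check comes back true for the real file, reading it in binary mode, decoding it this way, and writing it back as UTF-8 should remove the gaps.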
Hi RDP team,
Thank you for this tool. I went through the details of merging classification files.
I'm interested in merging classification results from an in-house training set with results obtained with the default RDP tool.
Could this be added to the enhancement requests for a future release? Any suggestions on how to proceed would also be really helpful.
Thanks.