hildebra / lotus2
Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.

Home Page: http://lotus2.earlham.ac.uk/

License: GNU General Public License v3.0

C++ 50.07% Makefile 0.12% Perl 44.00% R 5.81%

lotus2's People

Contributors

4less, hildebra, nsoranzo, ozkurt


lotus2's Issues

dada2 pooling options

Dear all,

We are testing Lotus2 for some analyses, and we would like to know whether it is possible to set pool=TRUE when running DADA2, or whether pooling is already enabled by default?

Best,

Ramiro

duplicated ASV sequence

I am using header names derived from hashing the ASV sequences to integrate ASV tables from different datasets. I found that there were duplicated ASV sequences. Why?

Here is my lotus2 command:
lotus2 -i $PWD -m $PWD/1_miSeqMap.sm.txt \
  -s /mnt/d/Myfile/DATA/beforework/lotus2/1sdm_miSeq.txt \
  -o lotus2_output \
  -p miSeq -amplicon_type SSU -tax_group bacteria \
  -forwardPrimer $front_f \
  -reversePrimer $front_r \
  -CL dada2 -refDB SLV -taxAligner lambda \
  -rdp_thr 0.7 -buildPhylo 0 -t 6 -sdmThreads 6

The problem still occurred after disabling the LULU option (-lulu 0).
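To confirm whether the duplicates really are identical sequences carried under different headers (rather than a hashing problem), the ASV FASTA can be checked directly. A minimal sketch; the MD5 hashing mirrors the header-naming idea but is not LotuS2's own code:

```python
import hashlib
from collections import defaultdict

def find_duplicate_seqs(fasta_records):
    """Group FASTA headers by the MD5 digest of their (uppercased) sequence.

    fasta_records: iterable of (header, sequence) tuples.
    Returns a dict mapping each duplicated digest to the headers sharing it.
    """
    by_hash = defaultdict(list)
    for header, seq in fasta_records:
        digest = hashlib.md5(seq.upper().encode()).hexdigest()
        by_hash[digest].append(header)
    # keep only digests shared by more than one header
    return {d: heads for d, heads in by_hash.items() if len(heads) > 1}

# Two ASVs carrying the same sequence are flagged:
records = [("ASV_1", "ACGT"), ("ASV_2", "ACGT"), ("ASV_3", "TTTT")]
dups = find_duplicate_seqs(records)
```

Any digest listed with more than one header points to a genuinely duplicated sequence in the input file.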

can't remove primer using mapping file

I discovered that Lotus2 cannot remove the primer when I specify the primer sequence in a mapping file, but it works correctly when I specify the primer sequence on the command line.

errors when using dada2 clustering

Hi Lotus team.
Using data that has run perfectly on Lotus2 before (with both RDP and SLV DBs), I changed the clustering option to dada2, as I wanted ASVs rather than OTUs. However, upon adding -CL dada2, I got a huge range of errors according to the progout log. Am I just missing something in my command (below), or is something wrong?

perl ./lotus2 -i Will -o Will/outputSLV_DADA2 -m WillMAP.txt -refDB SLV -taxAligner blast -CL dada2

many thanks
Will

Regarding the fate of reads passed through each step of Lotus2

Hi, thank you for this easy-to-use amplicon read analysis pipeline. I was wondering if there is a way to get the number of reads that pass each step. DADA2 provides per-sample read counts for each processing step through:

getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(mergers, getN), rowSums(seqtab), 
rowSums(seqtab.nochim))
# If processing a single sample, remove the sapply calls: e.g. replace sapply(dadaFs, getN) with getN(dadaFs)
colnames(track) <- c("input", "filtered", "denoised", "merged", "tabled", "nonchim")
rownames(track) <- sample.names
head(track)

It would be very nice if lotus2 could do that.
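Until LotuS2 reports this natively, per-step counts can be approximated by counting records in the intermediate FASTQ files each step writes (the stage names and paths below are placeholders, not LotuS2 output conventions):

```python
import gzip

def count_fastq_reads(path):
    """Count records in a plain or gzipped FASTQ file (4 lines per read)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return sum(1 for _ in fh) // 4

# Hypothetical stage -> file mapping; adjust to your run's tmpFiles/ layout:
# stages = {"input": "sample.1.fastq.gz", "filtered": "tmpFiles/demulti.1.fq"}
# track = {name: count_fastq_reads(p) for name, p in stages.items()}
```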

RDP mv file

I'm running lotus2 mostly with the default settings, but for the RDPclassifier I get an error in the following line

systemL "mv $outdir/hierachy_cnt.tax $outdir/cnadjusted_hierachy_cnt.tax $extendedLogD/;";

Lotus2 stops because cnadjusted_hierachy_cnt.tax does not exist.

When looking through the output folder the hierachy_cnt.tax file is located in a subfolder (ExtraFiles).

If I place cnadjusted_hierachy_cnt.tax in the output folder (before lotus2 executes RDP), it works.
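As a stop-gap, the expected file can be staged programmatically before the RDP step runs, mirroring the manual workaround above (the function and paths are illustrative, not part of lotus2):

```python
import shutil
from pathlib import Path

def stage_rdp_files(outdir):
    """Copy hierachy_cnt.tax to the cnadjusted_ name the later `mv` expects,
    if the latter is missing. Returns True when both files are present."""
    out = Path(outdir)
    src = out / "hierachy_cnt.tax"
    dst = out / "cnadjusted_hierachy_cnt.tax"
    if src.exists() and not dst.exists():
        shutil.copy(src, dst)
    return src.exists() and dst.exists()
```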

SDM CentOS7

SDM requires some libraries which are not in the default yum repos. Is there a workaround for the installation on CentOS?

lotus2/bin/sdm: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by lotus2/bin/sdm)
lotus2/bin/sdm: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by lotus2/bin/sdm)
lotus2/bin/sdm: /lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by lotus2/bin/sdm)
lotus2/bin/sdm: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by lotus2/bin/sdm)
lotus2/bin/sdm: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by lotus2/bin/sdm)

Thanks in advance :)
Ulrike

using unoise as clustering

Hi,

I am processing a set of samples (miseq) and I want to compare OTUs and zOTUs. I have added usearch11 to LotuS and run two commands:
perl lotus2 -i /media/fulgencio/DATOS/species_divergence -m para_LOTUS.txt -o zotus_output -s sdm_miSeqDEF.txt -threads 24 -p miSeq -clustering unoise3 -refDB SLV --taxAligner 1 -derepMin 0 -lulu 0 -buildPhylo 0 -verbosity 3
perl lotus2 -i /media/fulgencio/DATOS/species_divergence -m para_LOTUS.txt -o otus_output -s sdm_miSeqDEF.txt -threads 24 -p miSeq -refDB SLV --taxAligner 1 -derepMin 0 -lulu 0 -buildPhylo 0 -verbosity 3

When I look at LotuS_run.log I can see in both cases that OTU id=0.97, and when running unoise3 the message says "UNOISE core routine Cluster at 97%". Should I pass the -id parameter as 1?

Another question: looking at demulti.log I see that most of my reverse reads are rejected, and from that log I cannot understand why this is happening:
Reads processed: 11,632,069; 11,632,069 (pair 1;pair 2)
Rejected: 4,335,473; 10,141,773
Below I paste my smd file
Thank you very much in advance.

Manuel

#sdm options file to control sequence quality filtering, demultiplexing and preparation (can also be used without demultiplexing)
#* indicates alternative quality filtering options, saved in *.add.fna etc. files separately from initial quality filtered dataset
#sequence length refers to sequence length AFTER removal of primers, barcodes and trimming. This ensures that downstream analysis tools will have appropriate sequence information
#options with a star in front are lenient parameters for mid qual sequences (only used for estimating OTU abundance, not for OTU building itself).
minSeqLength 250
maxSeqLength 256
minAvgQuality 27
*minSeqLength 170
*minAvgQuality 20
#truncate total Sequence length to X (length after Barcode, Adapter and Primer removals, set to -1 to deactivate)
TruncateSequenceLength -1

#Ambiguous bases in Sequence
maxAmbiguousNT 0
*maxAmbiguousNT 1

#sequence is discarded if a homonucleotide run in sequence is longer
maxHomonucleotide 8

#Filter whole sequence if one window of quality scores is below average
QualWindowWidth 50
QualWindowThreshhold 25

#Trim the end of a sequence if a window falls below the quality threshold. Useful for removing low-quality trailing ends of sequences
TrimWindowWidth 20
TrimWindowThreshhold 25

#Probabilistic max number of accumulated sequencing errors. After this length, the rest of the sequence will be deleted. Complementary to TrimWindowThreshhold. (-1) deactivates this option.
maxAccumulatedError 0.75
*maxAccumulatedError -1
#Binomial error model of expected errors per sequence (see https://github.com/fpusan/moira), to deactivate, set BinErrorModelAlpha to -1
BinErrorModelMaxExpError 2.5
BinErrorModelAlpha -1

#Max Barcode Errors
maxBarcodeErrs 0
maxPrimerErrs 0

#keep Barcode / Primer Sequence in the output fasta file - in a normal 16S analysis this should be deactivated (0) for both barcode and primer
keepBarcodeSeq 0
keepPrimerSeq 0

#set fastqVersion to 1 if you use Sanger, Illumina 1.8+ or NCBI SRA files. Set fastqVersion to 2, if you use Illumina 1.3+ - 1.7+ or Solexa fastq files. "auto" will look for typical characteristics of either of these and choose the quality offset score automatically.
fastqVersion auto

#if one or more files have a technical adapter still included (e.g. TCAG 454) this can be removed by setting this option
TechnicalAdapter

#delete X NTs (e.g. if the first 5 bases are known to have strange biases)
TrimStartNTs 0

#correct PE header format (1/2); this is to accommodate the Illumina MiSeq paired-end annotations 2="@xxx 1:0:4" instead of 1="@XXX/1". Note that the format will be automatically detected
PEheaderPairFmt 1

#sets if sequences without match to reverse primer (ReversePrimer) will be accepted (T=reject ; F=accept all); default=F
RejectSeqWithoutRevPrim F
#*RejectSeqWithoutRevPrim F
#sets if sequences without a forward (LinkerPrimerSequence) primer will be accepted (T=reject ; F=accept all); default=F
RejectSeqWithoutFwdPrim F
#*RejectSeqWithoutFwdPrim F

#this option should be "T" if your amplicons are possibly shorter than a single read in a paired end sequencing run (e.g. if the 16S amplicon length is 200bp in a 250x2 miSeq run, set this to "T"). This option increases runtime by 10%, if in doubt just set to "T". Requires LinkerPrimerSequence and ReversePrimer to be defined in mapping file.
AmpliconShortPE T

#options for difficulties during sequencing library construction
#checks if pair1 and pair2 were switched (ignore if single read data)
CheckForMixedPairs F
#checks if whole amplicon was reverse-transcribed sequenced (not switched, just reverse translated)
CheckForReversedSeqs F
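One common cause of nearly all reverse reads being rejected is a reverse primer supplied in the wrong orientation in the mapping file. A quick IUPAC-aware reverse complement makes it easy to check both orientations (a standalone sketch, not sdm code):

```python
# IUPAC-aware reverse complement for double-checking primer orientation.
COMP = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(primer):
    """Return the reverse complement, honoring IUPAC ambiguity codes."""
    return primer.upper().translate(COMP)[::-1]

# Example with a common 16S 806R primer:
# revcomp("GGACTACNVGGGTWTCTAAT") -> "ATTAGAWACCCBNGTAGTCC"
```

If the rejection rate drops after swapping the ReversePrimer entry for its reverse complement, orientation was the culprit.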

pigz db build

First of all, happy new year!
I reinstalled lotus2 after trying to use the update function in lotus (to work with KSGP), which was at first unsuccessful and then deleted autoInstall.pl in the process.
However, after reinstalling, I get the following error when running lotus, which seems to be related to pigz. Any suggestions how to handle this?
pigz: abort: missing parameter after -p

sh: line 1: 108829 Aborted                lotus2/bin/lambda3 searchn -t 40 --percent-identity 75 --num-matches 200 --e-value 1e-8 -q Lotus2_KSGP_Alex120124/OTU.fa -i lotus2//DB//KSGP_v1.0.fasta.lba.gz -o Lotus2_KSGP/tmpFiles//tax.m8 --output-columns 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send qlen' >> Lotus2_KSGP/LotuSLogS/LotuS_progout.log 2>&1
 CMD failed: lotus2//bin//lambda3 searchn -t 40 --percent-identity 75 --num-matches 200 --e-value 1e-8 -q Lotus2_KSGP/OTU.fa -i lotus2//DB//KSGP_v1.0.fasta.lba.gz -o Lotus2_KSGP/tmpFiles//tax.m8 --output-columns 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send qlen' 

same for SLV:

Building LAMBDA index anew for lotus2//DB//SLV_138.1_SSU.fasta (this only happens the first time you use this ref DB, it may take several hours to build)..

 CMD failed: 
pigz -p -1  lotus2//DB//SLV_138.1_SSU.fasta.lba 
see Lotus2_SLV_Alex120124/LotuSLogS/LotuS_progout.log for error log
(base) [uloeber]$ tail Lotus2_SLV_Alex120124/LotuSLogS/LotuS_progout.log
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.010*4.92] distinct minimizers: 900 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.984
[M::worker_pipeline::0.016*4.51] mapped 526 sequences
[M::main] Version: 2.17-r941
[M::main] CMD:lotus2//bin//minimap2-2.17_x64-linux/minimap2 -x sr --sr -u both --secondary=no -N 30 -c -t 40 -o Lotus2_SLV_Alex120124/tmpFiles//otu_seeds.fna.phiX.0.cont_hit.paf lotus2//DB//phiX.fasta Lotus2_SLV_Alex120124/tmpFiles//otu_seeds.fna
[M::main] Real time: 0.017 sec; CPU: 0.074 sec; Peak RSS: 0.003 GB
Loading Subject Sequences and Ids... done.
Generating Index... done.
Writing Index to disk... done.
pigz: abort: missing parameter after -p

Thanks in advance! Cheers,
Ulrike
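For context, the failing call is `pigz -p -1 …`: pigz's `-p` flag expects a thread count, and here it is immediately followed by `-1` (a compression-level flag), which suggests the threads variable was left empty, likely by the partial reinstall. A defensive sketch of how such a command could be assembled (the fallback value is an assumption, not lotus2 code):

```python
def pigz_cmd(infile, threads=None, level=1):
    """Build a pigz command line, defaulting the thread count so that
    `-p` is never left without its numeric parameter."""
    t = int(threads) if threads else 1  # assumed fallback: 1 thread
    return f"pigz -p {t} -{int(level)} {infile}"
```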

Problem with example trial

I have tried to run the example and got errors every time. I managed to troubleshoot all of those related to DADA2 ASV clustering, but now I get an error when building the LAMBDA index:

'Building LAMBDA index anew (may take up to an hour first time)..
CMD failed: /home/assem/Downloads/lotus2/lotus2//bin//lambda/lambda_indexer -p blastn -t 1 -d /home/assem/Downloads/lotus2/lotus2//DB//SLV_138.1_SSU.fasta
see myTestRun2/LotuSLogS/LotuS_progout.log for error log'

Attached is the progout.log and that's its last part:

'Writing OTU matrix to myTestRun2/OTU.txt
Recruited 214 reads in OTU matrix
Done
Time taken: : 3ms
[M::mm_idx_gen::0.002*1.02] collected minimizers
[M::mm_idx_gen::0.003*1.02] sorted minimizers
[M::main::0.003*1.02] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*1.02] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.003*1.02] distinct minimizers: 900 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.984
[M::worker_pipeline::0.003*1.02] mapped 15 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: /home/assem/Downloads/lotus2/lotus2//bin//minimap2-2.17_x64-linux/minimap2 -x sr --sr -u both --secondary=no -N 30 -c -t 1 -o myTestRun2/tmpFiles//otu_seeds.fna.phiX.0.cont_hit.paf /home/assem/Downloads/lotus2/lotus2//DB//phiX.fasta myTestRun2/tmpFiles//otu_seeds.fna
[M::main] Real time: 0.004 sec; CPU: 0.004 sec; Peak RSS: 0.003 GB
Loading Subject Sequences and Ids... done.
Dumping Subj Ids... done.
No Seg-File specified, no masking will take place.
Dumping binary seqan mask file... done.

Dumping unreduced Subj Sequences... done.
Generating 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
(1) SuffixArray |Killed'

LotuS_progout.log

trimming process and saving trimmed reads

Dear all,

We are trying to use Lotus2, with dada2, to analyse some Illumina 16S data. We have two questions:

  • When using dada2 directly, one would typically define the read length for trimming, for example trimming 250bp reads to 230bp, which is applied to all fwd/rev reads. With Lotus2, we are setting TrimWindowWidth=20 and TrimWindowThreshold=25. My question is whether the same length is then applied to all reads, as with dada2, or whether lengths differ depending on the specific quality of each read?
  • If we want to save the quality-trimmed reads, would we do it through the "-saveDemultiplex" option? (asking because the documentation mentions quality filtering but not trimming)

Thanks for any help,

Ramiro

False positive of mapping ASV back to reads?

Lotus2 is a great tool!
I recently noticed that the table generated by Lotus2 shows more shared ASVs between samples than other tools like QIIME2. Have you noticed this phenomenon?
Are they false positives, or really shared between samples? If the former, changing minimap2 parameters may alleviate the problem. I guess the threshold needs to be tuned for different Lotus2 settings (mapping reads back to ASVs vs. 97% OTUs intuitively requires different parameters).
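For reference, back-mapping stringency can be inspected post hoc from minimap2's PAF output, where column 10 is the number of matching bases and column 11 the alignment block length. A sketch that filters alignments by that ratio (the 0.97 threshold is illustrative):

```python
def filter_paf(lines, min_identity=0.97):
    """Keep PAF alignments whose identity (residue matches, column 10,
    divided by alignment block length, column 11) meets the threshold."""
    kept = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        matches, block_len = int(f[9]), int(f[10])
        if block_len > 0 and matches / block_len >= min_identity:
            kept.append(line)
    return kept
```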

analysing only a part of the amplicons

Sorry for the redundancy; this is a shorter report related to #31.
I can analyse the V1-V9 amplicons using lotus2 on the same reads, but when I try to get lotus2 to look only at the central part (V3-V4), it fails.

I was hoping that specifying primer sequences for V3 and V4 (rev-comp) would take care of trimming the first 300bps (until forward primer in V3) and long back tail (after V4 primer) and analyze the resulting 444bps fragment from the 1500bps original amplicon.

The sequences all end up in the (Mid qual) class while for the full amplicon run they were all in (High qual)

Is it possible that trimming such long ends is not possible using lotus the way I tried it?

If not, this will also answer ticket #31, and I will need to trim my reads externally before running lotus2.

Best regards,
Stephane

$ mapping_file_V3V4.tsv
#SampleID       fastqFile       ForwardPrimer   ReversePrimer
4170_bc1005--bc1096     4170_bc1005--bc1096.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC
4356_bc1005--bc1112     4356_bc1005--bc1112.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC
4285_bc1022--bc1107     4285_bc1022--bc1107.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4296_bc1022--bc1060     4296_bc1022--bc1060.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4356_bc1012--bc1098     4356_bc1012--bc1098.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4112_bc1008--bc1075     4112_bc1008--bc1075.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4128_bc1005--bc1107     4128_bc1005--bc1107.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC 


Using Silva SSU ref seq database.
--------------------------------------------------------------------------------
 00:00:00 LotuS 2.23
          COMMAND
          perl /opt/miniconda3/envs/lotus2.23/bin/lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
          -m mapping_file_V3V4.tsv -o lotus2_pacbio_V3V4 -tmp /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
          -s sdm_PacBio_LSSU_V3V4.txt -p PacBio -t 80 -amplicon_type
          SSU -CL cdhit -refDB SLV -taxAligner lambda -useVsearch 1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Reading mapping file
          Sequence files are indicated in mapping file.
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input       /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
Output      lotus2_pacbio_V3V4
SDM options sdm_PacBio_LSSU_V3V4.txt
TempDir     /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
------------ Configuration LotuS --------------
de novo sequence clustering with CD-HIT into OTU's
Sequencing platform     pacbio
Amplicon target         bacteria, SSU
Dereplication filter    0
Clustering algorithm    CD-HIT into OTU's
Read mapping (non tax)  minimap2
OTU nt id      0.97
Precluster read merging No
Ref Chimera checking    Yes (DB=/opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//rdp_gold.fa, -chim_skew 2)
deNovo Chimera check    Yes
Tax assignment          Lambda (-LCA_frac 0.8, -LCA_cover 0.5, -LCA_idthresh 97,95,93,91,88,78,0)
ReferenceDatabase       SILVA
RefDB location          /opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//SLV_138.1_SSU.fasta
OTU phylogeny           Yes (mafft, fasttree2)
Unclassified OTU's      Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Demultiplexing, filtering, dereplicating input files, this
          might take some time..
          check progress at lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log
 00:00:12 Finished primary read processing with sdm:
          Reads processed: 255,918
          Accepted (High qual): 0 (4,953 end-trimmed)
          Accepted (Mid qual): 252,175
          Rejected: 3,743
          Dereplication block 0: 0 unique sequences (avg size -nan; 0 counts)
          For an extensive report see lotus2_pacbio_V3V4/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas

%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##
      LotuS2 encounterend an error:
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas


First check if the last error occurred  in a program called by LotuS2 
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log"
, if there is an obvious solution (e.g. external program breaking, this we can't fix). To see (and execute) the last commands by the pipeline, run 
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_cmds.log".
In case you decide to contact us on "https://github.com/hildebra/lotus2/", please try to include information from these approaches in your message, this will increase our response time. Thank you.
%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##

stops at an error -- can't open /tmpFiles//RDPotus.tax

Dr Falk Hildebrand,
Thank you for Lotus2 - as a great addition to lotus.

I have a question -- after installation (I used a "penanalytics/r-base" docker image as the Ubuntu basis for the lotus2 installation) I ran the example via

    "perl ./lotus2.pl -i Example/ -m Example/miSeqMap.sm.txt -s configs/sdm_miSeq.txt -p miSeq -o /mydir/myTestRun"

the script does not perform analysis and stops at an error --

    "Can't open /mydir/myTestRun/tmpFiles//RDPotus.tax :
    No such file or directory at ./lotus2.pl line 4078."

I tried reinstalling several times, and also used the desktop UBUNTU version image. But the error occurred every time.

Could you help with this issue?
Sincerely, Dmitry

run log:


00:00:00 LotuS 2.00
COMMAND
perl ./lotus2.pl -i Example/ -m Example/miSeqMap.sm.txt
-s configs/sdm_miSeq.txt -p miSeq -o /mydir/myTestRun


00:00:00 Reading mapping file
Sequence files are indicated in mapping file.
Switching to paired end read mode

de novo sequence clustering with UPARSE into OTU's
------------ I/O configuration --------------
Input= Example/
Output= /mydir/myTestRun
TempDir= /mydir/myTestRun/tmpFiles/
------------ Configuration LotuS --------------
Sequencing platform=miseq
AmpliconType=SSU
OTU id=0.97
min unique read abundance=-1
UCHIME_REFDB, ABSKEW=/lotus2//DB//rdp_gold.fa, 2
OTU, Chimera prefix=OTU, CHIMERA_
TaxonomicGroup=bacteria
keeping taxonomic unclassified OTU's in matrix


00:00:00 Demultiplexing, filtering, dereplicating input files, this
might take some time..
check progress at /mydir/myTestRun/LotuSLogS/LotuS_progout.log
00:00:01 Finished primary read processing with sdm:
Reads processed: 1,250; 1,250 (pair 1;pair 2)
Rejected: 932; 932
Accepted (Mid+High qual): 318; 318 (110; 110 were end-trimmed)
For an extensive report see /mydir/myTestRun/LotuSLogS//demulti.log


00:00:01 UPARSE core routine
Cluster at 97
00:00:01 Finished


00:00:01 Extending and merging pairs of OTU Seeds

sdms /mydir/myTestRun/tmpFiles//otu_seeds.1.singl.fq

00:00:01 Found 24 fasta seed sequences based on seed extension and read merging


00:00:05 Ref chimera filter using usearch uchime2_ref
Total removed OTUs: (0/24)


00:00:05 Found 0 OTU's using minimap2 (phiX0: /lotus2//DB//phiX.fasta)


00:00:05 Postfilter:
Extended logs active, contaminant and chimeric matrix will be created.
After filtering 24 OTU (289 reads) remaining in matrix.

Can't open /mydir/myTestRun/tmpFiles//RDPotus.tax :
No such file or directory at ./lotus2.pl line 4078.

Analysis for protist

Hi,
I tried to use lotus2 for my data (18S sequences targeting protists, not bacteria or fungi) and I noticed that there are two options for -tax_group.

My question is: what is the difference between "bacteria" and "fungi" for -tax_group? Does -tax_group only influence annotation, or does it also affect the upstream ASV-generating steps? The results (ASV/OTU numbers and annotation results) differ between tax_group settings.

UNITE 8 vs 9

Hi,

I see you use UNITE v8 as a database for ITS/fungi. When comparing results with the latest version of the UNITE database, v9, taxonomic identification results are very different for many OTUs, and fewer OTUs get an identification at species or genus level. Is that the reason you work with UNITE v8 in Lotus2?

Best,
Sam

Dada2 - installation

I wonder how I can install DADA2 for use with Lotus2. I tried to install it via autoInstall.pl, but it did not work. Thanks!

Error Assigning taxonomy with SILVA

I am getting the error below when using SILVA as the database. However, it worked well earlier with a custom database.


CMD failed: /home/rjain/miniconda3/envs/lotus2/bin/vsearch --usearch_global /zfs/omics/personal/rjain/AMF/output_SLV_lotus2//OTU.fna --db /home/rjain/miniconda3/envs/lotus2/share/lotus2-2.25-0//DB//SLV_138.1_SSU.fasta.vudb --id 0.75 --query_cov 0.5 -userfields query+target+id+alnlen+mism+opens+qlo+qhi+tlo+thi+ql -userout /zfs/omics/personal/rjain/AMF/output_SLV_lotus2//tmpFiles//tax.0.blast --maxaccepts 100 --maxrejects 100 -strand both --threads 24


these are the last few line of Lotus_progout.log


[M::main] CMD: /home/rjain/miniconda3/envs/lotus2/bin/minimap2 -x sr --sr -u both --secondary=no -N 30 -c -t 24 -o /zfs/omics/personal/rjain/AMF/output_SLV_lotus2//tmpFiles//otu_seeds.fna.phiX.0.cont_hit.paf /home/rjain/miniconda3/envs/lotus2/share/lotus2-2.25-0//DB//phiX.fasta /zfs/omics/personal/rjain/AMF/output_SLV_lotus2//tmpFiles//otu_seeds.fna
[M::main] Real time: 0.072 sec; CPU: 0.043 sec; Peak RSS: 0.003 GB
vsearch v2.23.0_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch

Reading UDB file /home/rjain/miniconda3/envs/lotus2/share/lotus2-2.25-0//DB//SLV_138.1_SSU.fasta.vudb

Fatal error: Unable to read from UDB file or invalid UDB file


Any suggestions are much appreciated. Thanks in advance!
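An "invalid UDB file" frequently points to a stale or truncated .vudb index, e.g. left over from an interrupted run or written by a different vsearch version. Removing it so it gets rebuilt on the next run is a common first step; a cautious sketch (the rebuild-on-missing behaviour is an assumption to verify in your install):

```python
import os

def drop_stale_index(db_fasta):
    """Remove the vsearch .vudb index next to a reference FASTA so it can
    be rebuilt; returns the removed path, or None if none existed."""
    udb = db_fasta + ".vudb"
    if os.path.exists(udb):
        os.remove(udb)
        return udb
    return None
```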

error when using DADA2

Hi, I have been using Lotus2 without issue for some time now, clustering via OTUs (standard 16S microbiome data). I have tried to change from UPARSE to DADA2. I have followed the suggestions on the lotus website, but it doesn't seem to run. Is there a clear error in my input? (code below)

./lotus2 -i Will -o output_SLV_DADA2_2024 -m WillMAP.txt -refDB SLV -CL dada2 -taxAligner lambda

all help is appreciated
-Will

Fail to install with conda

I created a fresh conda env for lotus2, but attempting installation yields the error

~# conda install -c bioconda lotus2                                                      
Collecting package metadata (current_repodata.json): done                                                               
Solving environment: failed with initial frozen solve. Retrying with flexible solve.                                    
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.             
Collecting package metadata (repodata.json): done                                                                       
Solving environment: failed with initial frozen solve. Retrying with flexible solve.                                    
Solving environment: |                                                                                                 
Found conflicts! Looking for incompatible packages.                                                                    
This can take several minutes.  Press CTRL-C to abort.                                                                failed                                                                                                                                                                                                           UnsatisfiableError:    

Installing through git and perl works ok, though

Which pipeline results should I trust ? LotuS2 - DADA2 or stand alone DADA2 output?

Hi @hildebra,

Thank you and your team for developing the LotuS2 pipeline. I recently came across this pipeline, as it has many options for clustering and denoising algorithms, and I am interested in the DADA2 denoising algorithm. I read the manuscript and thought of an interesting conceptual question about the midQual reads that, after quality filtering, are used for 'backmapping onto ASVs'. My question is: doesn't this inflate the abundances of ASVs that might be PCR or sequencing errors? Further, this would affect downstream microbial-ecology metrics. Two of our collaborators used DADA2 denoising on the same data but with different pipelines, one with standalone DADA2 and the other with LotuS2. The results differ: different numbers of sequences and ASVs, and finally different diversity estimates. Some of us are confused about which pipeline to go for, and which one is actually correct. I would appreciate your reply and suggestions. Thank you!

Best regards,
BalaVeera

Issue running Lotus on PacBio HiFi data

Dear,

I am trying to simulate what analysing a shorter amplicon (V3V4 444bps) from Sequel IIe HiFi reads would give.
All files are replicates of the Zymo mock community V1V9 full length amplicon.
For that, I produced a new file from the provided (and working) sdm_PacBio_LSSU.txt:

sdm_PacBio_LSSU_V3V4.txt

In both cases I use the V1V9 HiFi reads as input, which include the V3V4 region as well; is this a problem here?

For the complete V1V9 I used the following mapping file

#SampleID	fastqFile	ForwardPrimer	ReversePrimer
4170_bc1005--bc1096	4170_bc1005--bc1096.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4356_bc1005--bc1112	4356_bc1005--bc1112.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4285_bc1022--bc1107	4285_bc1022--bc1107.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4296_bc1022--bc1060	4296_bc1022--bc1060.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4356_bc1012--bc1098	4356_bc1012--bc1098.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4112_bc1008--bc1075	4112_bc1008--bc1075.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT
4128_bc1005--bc1107	4128_bc1005--bc1107.fastq.gz	AGRGTTYGATYMTGGCTCAG	RGYTACCTTGTTACGACTT

For the V3V4 analysis I adapted it to

#SampleID	fastqFile	ForwardPrimer	ReversePrimer
4170_bc1005--bc1096	4170_bc1005--bc1096.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC
4356_bc1005--bc1112	4356_bc1005--bc1112.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC
4285_bc1022--bc1107	4285_bc1022--bc1107.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC	
4296_bc1022--bc1060	4296_bc1022--bc1060.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC	
4356_bc1012--bc1098	4356_bc1012--bc1098.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC	
4112_bc1008--bc1075	4112_bc1008--bc1075.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC	
4128_bc1005--bc1107	4128_bc1005--bc1107.fastq.gz	CCTACGGGNGGCWGCAG	GGATTAGATACCCBDGTAGTC	

In both cases, the reverse primer is reverse-complemented so it matches the read sequence directly, since we are working with single-end reads here (correct, right?).

The V1V9 run succeeds

stdout
lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads -o runV1V9 -s sdm_PacBio_LSSU_V1V9.txt -t 40 -m mapping_file_V1V9.tsv 
RefDB SLV requested, but -taxAligner set to "0": therefore RDP classification of reads will be done
--------------------------------------------------------------------------------
00:00:00 LotuS 2.22
        COMMAND
        perl /opt/miniconda3/envs/lotus2/bin/lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
        -o runV1V9 -s sdm_PacBio_LSSU_V1V9.txt -t 40 -m mapping_file_V1V9.tsv
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Reading mapping file
        Sequence files are indicated in mapping file.
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input       /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
Output      runV1V9
SDM options sdm_PacBio_LSSU_V1V9.txt
TempDir     runV1V9/tmpFiles/
------------ Configuration LotuS --------------
de novo sequence clustering with UPARSE into OTU's
Sequencing platform     miseq
Amplicon target         bacteria, SSU
Dereplication filter    8:1,4:2,3:3
Clustering algorithm    UPARSE into OTU's
Read mapping (non tax)  minimap2
Precluster read merging No
Ref Chimera checking    Yes (DB=/opt/miniconda3/envs/lotus2/share/lotus2-2.22-0//DB//rdp_gold.fa, -chim_skew 2)
deNovo Chimera check    Yes
Tax assignment          RDPclassifier (-rdp_thr  0.8)
OTU phylogeny           Yes (mafft, fasttree2)
Unclassified OTU's      Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Demultiplexing, filtering, dereplicating input files, this
        might take some time..
        check progress at runV1V9/LotuSLogS/LotuS_progout.log
00:00:14 Finished primary read processing with sdm:
        Reads processed: 255,918
        Accepted (High qual): 191,016 (5,449 end-trimmed)
        Accepted (Mid qual): 61,667
        Rejected: 3,235
        Dereplication block 0: 2,386 unique sequences (avg size
        43; 101,408 counts)
        For an extensive report see runV1V9/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:14 UPARSE OTU clustering
        Cluster at 97
00:00:14 Finished
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:14 Starting backmapping of 
          - low-abundant dereplicated Reads
          - mid-quality reads
        to OTU's using minimap2
00:00:55 Backmapping  mid qual reads: 
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:55 Extending and merging pairs of OTU Seeds
--------------------------------------------------------------------------------
No ref based chimera detection
--------------------------------------------------------------------------------
00:00:56 Found 0 OTU's using minimap2 (phiX.0: /opt/miniconda3/envs/lotus2/share/lotus2-2.22-0//DB//phiX.fasta)
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:56 Postfilter:
        Extended logs active, contaminant and chimeric matrix will be created.
        After filtering 8 OTU's (229130 reads) remaining in matrix.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:56 Assigning taxonomy with RDP
--------------------------------------------------------------------------------
Removed 0 tax annotations ("")
--------------------------------------------------------------------------------
00:01:00 Calculating Taxonomic Abundance Tables from RDP 
        classifier assignments, Confidence 0.8 
--------------------------------------------------------------------------------

Calculating higher abundance levels
Adding 0 unclassified OTU's to output matrices
Total reads in matrix: 229130
TaxLvl  %Assigned_Reads %Assigned_OTUs
Phylum  100     100
Class   100     100
Order   100     100
Family  100     100
Genus   91      87
Species 0       0

--------------------------------------------------------------------------------
00:01:00 Building tree (fasttree) and aligning (mafft) OTUs
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:01:06 LotuS2 finished. Output in:
        runV1V9
                  Next steps:          
        - Phyloseq: load runV1V9/phyloseq.Rdata directly with the
        phyloseq package in R
        - Phylogeny: OTU phylogentic tree available in runV1V9/OTUphylo.nwk
        - .biom: runV1V9/OTU.biom contains biom formatted output
        - Alpha diversity/rarefaction curves: rtk (available as
        R package or in bin/rtk)
        - LotuSLogS/ contains run statistics (useful for describing
        data/amount of reads/quality and citations to programs used
        - Tutorial: Visit http://lotus2.earlham.ac.uk for a numerical
        ecology tutorial
--------------------------------------------------------------------------------

But the V3V4 run fails.
Could you please help me fix this?
Thanks in advance,
Stephane

lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads -o runV3V4 -s sdm_PacBio_LSSU_V3V4.txt -t 40 -m mapping_file_V3V4.tsv 
RefDB SLV requested, but -taxAligner set to "0": therefore RDP classification of reads will be done
--------------------------------------------------------------------------------
 00:00:00 LotuS 2.22
          COMMAND
          perl /opt/miniconda3/envs/lotus2/bin/lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
          -o runV3V4 -s sdm_PacBio_LSSU_V3V4.txt -t 40 -m mapping_file_V3V4.tsv
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Reading mapping file
          Sequence files are indicated in mapping file.
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input       /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
Output      runV3V4
SDM options sdm_PacBio_LSSU_V3V4.txt
TempDir     runV3V4/tmpFiles/
------------ Configuration LotuS --------------
de novo sequence clustering with UPARSE into OTU's
Sequencing platform     miseq
Amplicon target         bacteria, SSU
Dereplication filter    8:1,4:2,3:3
Clustering algorithm    UPARSE into OTU's
Read mapping (non tax)  minimap2
Precluster read merging No
Ref Chimera checking    Yes (DB=/opt/miniconda3/envs/lotus2/share/lotus2-2.22-0//DB//rdp_gold.fa, -chim_skew 2)
deNovo Chimera check    Yes
Tax assignment          RDPclassifier (-rdp_thr  0.8)
OTU phylogeny           Yes (mafft, fasttree2)
Unclassified OTU's      Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Demultiplexing, filtering, dereplicating input files, this
          might take some time..
          check progress at runV3V4/LotuSLogS/LotuS_progout.log
 00:00:12 Finished primary read processing with sdm:
          Reads processed: 255,918
          Accepted (High qual): 0 (4,953 end-trimmed)
          Accepted (Mid qual): 252,175
          Rejected: 3,743
          Dereplication block 0: 0 unique sequences (avg size -nan; 0 counts)
          For an extensive report see runV3V4/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
The sdm dereplicated output file was either empty or not existing, aborting lotus.
runV3V4/tmpFiles//derep.fas

%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##
      LotuS2 encounterend an error:
The sdm dereplicated output file was either empty or not existing, aborting lotus.
runV3V4/tmpFiles//derep.fas


First check if the last error occurred  in a program called by LotuS2 
"tail runV3V4/LotuSLogS/LotuS_progout.log"
, if there is an obvious solution (e.g. external program breaking, this we can't fix). To see (and execute) the last commands by the pipeline, run 
"tail runV3V4/LotuSLogS/LotuS_cmds.log".
In case you decide to contact us on "https://github.com/hildebra/lotus2/", please try to include information from these approaches in your message, this will increase our response time. Thank you.
%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##
tail runV3V4/LotuSLogS/LotuS_cmds.log
[cmd] rm -f -r runV3V4/tmpFiles/
[cmd] mkdir -p runV3V4/tmpFiles/
[cmd] cp sdm_PacBio_LSSU_V3V4.txt runV3V4/primary
[cmd] /opt/miniconda3/envs/lotus2/bin/sdm  -i_path /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads  -o_fna runV3V4/tmpFiles//demulti.fna  -o_fna2 runV3V4/tmpFiles//demulti.add.fna  -sample_sep ___  -log runV3V4/LotuSLogS//demulti.log -map runV3V4/primary/in.map   -options sdm_PacBio_LSSU_V3V4.txt    -o_dereplicate runV3V4/tmpFiles//derep.fas -dere_size_fmt 0 -min_derep_copies 8:1,4:2,3:3 -suppressOutput 1   -o_qual_offset 33 -paired 1  -oneLineFastaFormat 1   -threads 6 
[cmd] mkdir -p runV3V4/LotuSLogS//SDMperFile/
[cmd]  mv runV3V4/LotuSLogS//demulti.log0* runV3V4/LotuSLogS//SDMperFile/
[cmd] rm -f runV3V4/tmpFiles//finalOTU.uc runV3V4/tmpFiles//finalOTU.ADD.paf runV3V4/tmpFiles//finalOTU.ADDREF.paf runV3V4/tmpFiles//finalOTU.REST.paf runV3V4/tmpFiles//finalOTU.RESTREF.paf

cat runV3V4/LotuSLogS//demulti.log
sdm 2.05 beta
Input File:  several
Output File: runV3V4/tmpFiles//demulti.fna

Reads processed: 255,918
13 reads reverse-translated
Rejected: 3,743
Accepted (High qual): 0 (4,953 end-trimmed)
Accepted (Mid qual): 252,175
Bad Reads recovered with dereplication: 0
Short amplicon mode.
Min/Avg/Max stats Pair 1
     - sequence Length : 0/-nan/0
     - Quality :   0/-nan/0
     - Median sequence Length : 0, Quality : 0
     - Accum. Error -nan
Trimmed due to:
  > 25 avg qual_ in 20 bp windows :         0
Rejected due to:
  < min Sequence length (250)  :                0
  < avg Quality (27)  :                    3,737
  < window (50 nt) avg. Quality (25)  :    3,275
  > max Sequence length (550)  :                0
  > (16) homo-nt run  :                    6
  > (2) amb. Bases  :                      0
Specific sequence searches:
  -With fwd Primer remaining (<= 0 mismatches, required) : 0
  -With rev Primer remaining (<= 0 mismatches) :           0
  -Barcode unidentified (max 0 errors) :                   0

SampleID        Barcode Instances
4112_bc1008--bc1075             0
4128_bc1005--bc1107             0
4170_bc1005--bc1096             0
4285_bc1022--bc1107             0
4296_bc1022--bc1060             0
4356_bc1005--bc1112             0
4356_bc1012--bc1098             0

long read processing fails

Dear lotus2 team,
after you uploaded v2.07 with the long-read fix (I guess), I tried to process some PacBio data using cd-hit for clustering

perl ~/lotus2/lotus2 -i ./ -m lotus2_cleaned.map -s ~/lotus2/configs/sdm_PacBio.txt -o lotus2_SLV138/ -threads 30 -refDB SLV -CL 

and unfortunately get the following error:

   685605  finished     108831  clusters

Apprixmated maximum memory consumption: 1896M
writing new database
writing clustering information
program completed !

Total CPU time 40411.72

This is sdm (simple demultiplexer) 1.85 beta.

sdm run in No Map Mode.
Could not open uc file
lotus2_SLV138//tmpFiles//finalOTU.uc

Do you have any suggestions what might have caused this issue?
Best,
Ulrike

p.s. minor typos in the log: Apprixmated; remove space after completed :)

OTU table is not generated during the test run

I have installed lotus2 with Bioconda and cloned the GitHub repo. When I try to run the example with the following command
lotus2 -i Example/ -m Example/miSeqMap.sm.txt -o myTestRun
it doesn't generate any OTU table in the output directory; instead, I get several warnings and errors.

option for hashed OTU ids

Providing option for hashed OTU ids may help integrate ASV/OTU tables from different studies or datasets.
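For reference, a minimal sketch of what hashed IDs could look like, assuming the MD5-of-sequence convention (as used e.g. by QIIME 2); the helper name is hypothetical:

```python
import hashlib

def hashed_id(seq: str) -> str:
    """Stable ID derived from the sequence itself (hypothetical helper)."""
    # Uppercase first so case differences don't change the ID
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()
```

Because the ID depends only on the sequence, identical ASVs from different runs or datasets collapse to the same key when tables are merged.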

dada2 Error: Can't find files for block a expected files such as

I am trying LotuS2 on a set of simulated reads. For this, I have a FASTA file with my sequences (a 16S fragment, varying with the primer pair), which I use to simulate reads:

for file in *fa; do art_illumina -amp -p -l 250 -f 500 -ss MSv3 -i $file -o $(basename $file .fa). -m 300 -s 10; done && rm *aln && pigz *fq

My mapping file looks like this:

#SampleID	fastqFile	ForwardPrimer	ReversePrimer
314F-806R	314F-806R.1.fq.gz,314F-806R.2.fq.gz	CCTAYGGGRBGCASCAG	GGACTACNNGGGTATCTAAT
515F-806R	515F-806R.1.fq.gz,515F-806R.2.fq.gz	GTGCCAGCMGCCGCGGTAA	GGACTACNNGGGTATCTAAT
515F-907R	515F-907R.1.fq.gz,515F-907R.2.fq.gz	GTGCCAGCMGCCGCGGTAA	CCGTCAATTCCTTTGAGTTT
799F-1193R	799F-1193R.1.fq.gz,799F-1193R.2.fq.gz	AACMGGATTAGATACCCKG	ACGTCATCCCCACCTTCC

And I am running lotus as

./lotus2/lotus2 -i . -m map.txt -o mytest -amplicon_type SSU -CL dada2 -refDB SLV

However, I am getting the following error in the dada2 step:

SampleID	Barcode	Instances
314F-806R		2
515F-806R		17
515F-907R		21
799F-1193R		17
Time taken: : 502ms
Error: Can't find files for block a expected files such as


Aborting dada2 run
Execution halted

Running it without dada2 works fine:

./lotus2/lotus2 -i . -m map.txt -o mytest
./lotus2/lotus2 -i . -m map.txt -o mytest2 -amplicon_type SSU -refDB SLV

BOLD DB with COI references

It would be amazing if you could add a COI reference DB in a future update.
Is there a workaround for now, or more detailed information on how I can use a custom DB?
E.g. a tool to create the tax4refDB file?
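As an illustration of the kind of helper asked for here, a sketch that derives a two-column tax file from FASTA headers. It assumes (this is an assumption, not lotus2's documented spec) that tax4refDB is a TSV of seqID, TAB, then a semicolon-separated lineage, and that the source FASTA stores the lineage after the first space in each header:

```python
def fasta_headers_to_tax(fasta_lines):
    # Assumed header format: ">SeqID Kingdom;Phylum;...;Species"
    rows = []
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, lineage = line[1:].rstrip("\n").partition(" ")
            rows.append(f"{seq_id}\t{lineage}")
    return rows
```

Any real use would need the header-parsing rule adapted to the actual BOLD/AF export format.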

Concatenation error generates empty mapping files

When I run the following to create a map:

/home/miniconda3/envs/lotus2/share/lotus2-2.21-0/lotus2 -create_map 01_map

An empty mapping file is created accompanied by the following message:

Use of uninitialized value $first in concatenation (.) or string at /home/miniconda3/envs/lotus2/share/lotus2-2.21-0/lotus2 line 5667.
common prefix
Map is 01_map
Please check that all files required are present in map 01_map.
Use of uninitialized value $pathPre in concatenation (.) or string at /home/miniconda3/envs/lotus2/share/lotus2-2.21-0/lotus2 line 5756.

The corresponding lines are:

5667: print "$first common prefix\n";

5756: print "==========================\nTo start analysis:\nlotus2 -m $ofile -i $pathPre/ -o [outdir] [further parameters if desired]\n==========================\n";

I am not familiar enough with Perl to fully understand what is going on, but I am curious as to why there would be any issue with the script itself when I did not edit it in any way. Am I missing something in terms of my input? I have run this from the input directory containing 100 files (R1 and R2 of 50 samples) in FASTQ format.

TaxOnly option specified, but not an output dir.

Hi,

When I run:

lotus2 -taxOnly OTUs.fasta -o lotustax -refDB KSGP_v1.0.fasta -tax4refDB KSGP_v1.0.tax -taxAligner usearch -ITSx 0 -t 48

I get:

TaxOnly option specified, but not an output dir. Assumming:
mkdir: cannot create directory ‘/primary/’: Permission denied
mkdir: missing operand
Try 'mkdir --help' for more information.
mkdir: cannot create directory ‘/LotuSLogS/’: Permission denied
mkdir: cannot create directory ‘/ExtraFiles/’: Permission denied
Can't open Logfile /LotuSLogS/LotuS_run.log

I also tried with the absolute path to the output directory etc, but I keep getting the same error message.

Any idea what I am doing wrong?

issues with sdm and autoInstall

Dear Falk, dear Joachim,
could you check autoInstall.pl regarding the sdm path?

Do you want to
 (1) install utax taxonomic classification databases (16S, ITS)?
 (0) no utax related databases
 Answer:1
Can't open sdm file lotus2//sdm_src/IO.h

Linux xxx 5.4.0-56-generic #62-Ubuntu SMP Mon Nov 23 19:20:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Thanks in advance! I used the Linux version provided on the lotus2 webpage.
Best,
Ulrike

blast output

I would like to use the BLAST output in MEGAN, but the conversion from BLAST to rma fails.
The BLAST output should be tab-separated with 12 columns, but I get only 11 columns (without the bitscore) from Lotus2.
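As a stopgap until this is resolved, one could pad the table to the 12-column outfmt-6 layout MEGAN expects; a sketch (the appended bitscore is a placeholder, not a recomputed score):

```python
def pad_blast_row(row: str, placeholder: str = "0") -> str:
    # BLAST tabular (outfmt 6) has 12 tab-separated fields, bitscore last;
    # append a placeholder when only 11 fields are present.
    fields = row.rstrip("\n").split("\t")
    if len(fields) == 11:
        fields.append(placeholder)
    return "\t".join(fields)
```

This only satisfies the column count; any MEGAN analysis that actually weights by bitscore would of course be distorted by the placeholder.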

How to input demultiplexed, pre-cleaned data

Dear lotus2 team,

I have 16S data that was already demultiplexed & cleaned (i.e., no primer parts left).
I created a basic sample map using lotus -create_map and then ran lotus2.

Running lotus2 this way creates a warning: No forward PCR primer for amplicon found in mapping file (column header "ForwardPrimer". This might invalidate chimera checks).

However, when I add the primer sequences (i.e., using the -forwardPrimer and -reversePrimer arguments), demultiplexing fails with an empty output file.

Is it safe to just run lotus2 without specifying the original primers? If not: is there a way to allow reads to pass QC even if the primer is not present anymore?

FR: yaml config file for easy reuse and sharing parameters

Hi,

Unless I missed it, we currently have 'only' files in the config folder that set a number of default parameters for several platforms (used via '-c'; great already, but not very easy to parse).

Would it be possible to also allow YAML import of complex parameter sets, made of your >60 available flags, so that users can easily reuse complex commands and share them in papers or protocols?

Another great addition would be a file, created after submitting a manual command at the CLI, that includes all user-defined AND default parameters for that command. This file could then be used as a template for the next command, increasing reproducibility as in my previous paragraph.

If I had the choice, I would favor YAML over XML or JSON for readability, but the latter would also be OK as they can be converted with yq-like tools.
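To illustrate the second idea, a sketch of a run persisting its merged user-defined and default parameters to a file that the next run can reuse. JSON is shown only because it is stdlib-importable (yq can convert it to YAML); the flag names are illustrative, not lotus2's actual option set:

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--threads", type=int, default=1)  # illustrative flag
parser.add_argument("--refDB", default="SLV")                # illustrative flag

args = parser.parse_args(["-t", "8"])  # stand-in for a real CLI invocation

# Persist user-defined AND default parameters for the next run
with open("run_params.json", "w") as fh:
    json.dump(vars(args), fh, indent=2)

# A later run could reload them as a template
with open("run_params.json") as fh:
    reloaded = json.load(fh)
```

The key point is that `vars(args)` already contains the defaults merged with the user's overrides, so the dump is a complete, reproducible record of the run.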

Thanks
Stephane

Lotus2 not reading

Hello,

I am trying to use Lotus2 to process ITS2 reads. I have tried to set the taxAligner to blast, but it does not seem to be read. Have you encountered this issue? It seems that Lotus2 is not reading several of my options. My code is below.

Error: "RefDB UNITE requested, but -taxAligner set to "0": therefore RDP classification of reads will be done"

It also puts out this error sometimes:
zsh: command not found: -amplicon_type
zsh: command not found: -tax_group
zsh: command not found: -taxAligner
zsh: command not found: -clustering

My Code:

lotus2 -i Seqs
-o lotus2
-m TestMap.txt
-refDB UNITE \
-amplicon_type ITS2 \
-tax_group fungi \
-taxAligner blast
-clustering vsearch
-id 0.97

save demultiplex

Hi,
If I run lotus2 with -saveDemultiplex 3, it results in an error. I can't say exactly what is happening, and I can't find the log file where the error should be described.

The lotus2 version I use was installed with conda (installed today, 05.01.24)

Need non-interactive installation method

Presently, the autoInstall.pl script requires user input from the console.
To create a conda package for LotuS2, we can skip calling autoInstall.pl for the software dependencies (since these will be installed via package requirements), but the required databases need to be installed/uninstalled as part of the lotus2 package installation/uninstallation, i.e. via post-link.sh/pre-unlink.sh scripts distributed with the conda package. In these scripts we could just run autoInstall.pl with some combination of parameters, but the script must be updated to be callable non-interactively.

parsing single read

Hi Falk Hildebrand,
I wonder if I can process single-read (unpaired) sequences using lotus2? If so, how should I set up the run? Thanks,
Junhui
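If single-end input follows the same mapping-file convention as the paired examples elsewhere in these issues, the fragment below (an assumption, not verified against the docs) would simply list one fastq per sample in the fastqFile column, instead of a comma-separated R1,R2 pair:

```
#SampleID	fastqFile
SampleA	SampleA.fq.gz
SampleB	SampleB.fq.gz
```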

custom DB causes ParseError thrown: Unexpected character

Hi, I'm very new to Lotus2, so this might be trivial. I have been trying to use the AF_full_region database produced by the Anaerobic Fungi Network (https://anaerobicfungi.org/databases/), but I got the error below:

[M::main] CMD: /root/lotus2//bin//minimap2-2.17_x64-linux/minimap2 -x sr --sr -u both --secondary=no -N 30 -c -t 1 -o Will_AF/output_AFN/tmpFiles//otu_seeds.fna.phiX.0.cont_hit.paf /root/lotus2//DB//phiX.fasta Will_AF/output_AFN/tmpFiles//otu_seeds.fna
[M::main] Real time: 0.005 sec; CPU: 0.004 sec; Peak RSS: 0.004 GB
Loading Subject Sequences and Ids...
ParseError thrown: Unexpected character '-' found.
Make sure that the file is standards compliant. If you get an unexpected character warning make sure you have set the right program parameter (-p), i.e. Lambda expected nucleic acid alphabet, maybe the file was protein?

in response I substituted the '-' for '' thinking Lotus2 couldn't understand '-'. this still didn't work the same error appeared but with
ParseError thrown: Unexpected character '
' found.

is the problem the characters in my DB files? if so what character can i substitute with that Lotus2 can read.
Many thanks :)
(i cant submit the AF_full_region FASTA file to github apologies)
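The '-' characters are typical gap symbols from an aligned FASTA, and the follow-up error suggests the substitution left empty lines behind. Rather than substituting characters, degapping the reference before use may help; a sketch (assuming gaps occur only on sequence lines):

```python
def degap_fasta(lines):
    # Strip alignment gaps ('-', '.') from sequence lines, drop blank lines,
    # and pass headers ('>') through unchanged.
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(line.replace("-", "").replace(".", ""))
    return out
```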

Does lotus2 allow mismatches in barcodes?

I am testing lotus2 on my amplicon sequencing data. I used lima for that, but I don't like its demultiplexing report. I read your tutorial but didn't find anything stating whether mismatches in barcodes can be accommodated.
Would you please help with this question?
Thanks.
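For context, "allowing k mismatches" in demultiplexing usually means accepting the barcode with the smallest Hamming distance to the read prefix, provided it is within k and unambiguous. An illustrative sketch of that idea (not lotus2/sdm's actual implementation):

```python
def match_barcode(read_prefix, barcodes, max_mismatch=1):
    # Hamming distance of each candidate barcode to the read prefix
    dists = sorted(
        (sum(a != b for a, b in zip(read_prefix, bc)), bc) for bc in barcodes
    )
    d, bc = dists[0]
    # Accept only if within tolerance and no second barcode ties at that distance
    if d <= max_mismatch and (len(dists) == 1 or dists[1][0] > d):
        return bc
    return None
```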

fail to install under OSX with conda

  • OSX 10.15.7
  • conda 22.9.0 (part of my base env)
  • mamba 1.0.0 (part of my base env)

Following info on the related issue 24, I tried the following:

conda create -c conda-forge -c bioconda --strict-channel-priority -n lotus2 
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/miniconda3/envs/lotus2



Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate lotus2
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Retrieving notices: ...working... done

I tried installing with conda but it failed

$ conda activate lotus2
$ conda install -c bioconda lotus2
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: | 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                  

UnsatisfiableError: 

I then tried with mamba

$ mamba install -c conda-forge -c bioconda lotus2

                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (1.0.0) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


Looking for: ['lotus2']

pkgs/main/osx-64                                              No change
bioconda/noarch                                               No change
bioconda/osx-64                                               No change
pkgs/r/noarch                                                 No change
pkgs/main/noarch                                              No change
pkgs/r/osx-64                                                 No change
conda-forge/noarch                                  10.2MB @   3.8MB/s  3.2s
conda-forge/osx-64                                  25.2MB @   2.3MB/s 12.3s
Encountered problems while solving:
  - nothing provides lambda <2 needed by lotus2-2.01-0

The error is not very verbose; any idea how to fix this?

Thanks in advance

contamination of mitochondrion

I found that Lotus2 (v2.23), or the RDP classifier, fails to identify mitochondrial contamination: some OTUs are actually mitochondrial sequences, but their taxonomy assignment is still bacterial.
Below is the BLAST result for one such OTU:
[screenshot: BLAST result for the OTU]

The command line I used: lotus2 -t 50 -i data -m 16s_map.txt -s sdm_miSeq.txt -o uparse -CL uparse
sdm_miSeq.txt

PR2 version

Hi,

I was wondering what version of the PR2 database is used in Lotus2? I noticed that the taxonomy file of the PR2 DB shipped with the latest GitHub version of Lotus2 has 7 taxonomic levels, while I read on the PR2 website that:

Version 5.0 and above
9 levels : Domain / Supergroup / Division / Subdivision / Class / Order / Family / Genus / Species

Version 4.14.1 and below
8 levels : Kingdom / Supergroup / Division / Class / Order / Family / Genus / Species

Just wondering how results compare to taxonomic identifications I previously did of the same sequences with other tools.
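One quick way to check which release a bundled taxonomy file corresponds to is to count the ';'-separated ranks per entry; per the release notes quoted above, 9 ranks would indicate v5.0+ and 8 ranks v4.14.1 or below. A sketch (assuming a TSV whose last column holds the lineage string):

```python
def rank_counts(tax_lines):
    # Distinct numbers of ';'-separated ranks across a taxonomy TSV
    counts = set()
    for line in tax_lines:
        lineage = line.rstrip("\n").split("\t")[-1].rstrip(";")
        counts.add(len(lineage.split(";")))
    return counts
```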

Best,
Sam

only get a fraction of OTUs back when running lotus2 -taxOnly

Hi,

When I run lotus2 -taxOnly with

lotus2 -taxOnly /kyukon/scratch/gent/vo/001/gvo00123/vsc46214/CRABS/otus92.fa -o lotustax -refDB Olig01_Annelida_crabs_db.fasta -tax4refDB Olig01_Annelida_crabs_db.tax -taxAligner blast -ITSx 0 -LCA_idthresh 94,80,75,70,65,60 -lulu 0 -t 64

I only get 99 of 152 OTUs back in the resulting otus92.fa.hier file; the other ones are completely missing from the file.

Do you have any idea why this is happening?

Best,
Sam

can we multithread the tree making step?

Hi All,

It seems that the "Building tree (fasttree) and aligning (mafft) OTUs" step (run by default) has been running on a single thread for quite some time, while I have plenty of free cores available.

ps shows:

/opt/miniconda3/envs/lotus2.23/bin/FastTreeMP -nt -gtr -no2nd -spr 4 -log lotus2_pacbio_V1V9/LotuSLogS//fasttree.log -quiet -out lotus2_pacbio_V1V9/OTUphylo.nwk lotus2_pacbio_V1V9/ExtraFiles//OTU.MSA.fna

The name FastTreeMP suggests that this could be multithreaded, is it so and if yes can we speed up that step?

Thanks
Stephane

lotus2.23
FastTree 2.1.11 Double precision (No SSE3), OpenMP (88 threads)
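FastTreeMP is indeed the OpenMP build (the banner above even reports a detected thread count), and OpenMP programs generally honour the OMP_NUM_THREADS environment variable, so exporting it before the run may be enough. A sketch of setting it for a child process; whether lotus2 forwards its -t value to this step is an assumption worth checking:

```python
import os

# OpenMP reads this variable at program start; export it before FastTreeMP runs.
env = dict(os.environ, OMP_NUM_THREADS="16")
# e.g. subprocess.run(["FastTreeMP", "-nt", "-gtr", "-out", "tree.nwk", "msa.fna"], env=env)
```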
