jsh58 / ngmerge
Merging paired-end reads and removing adapters
License: MIT License
Can we have a more detailed explanation in the documentation of how to structure a custom quality profile for the -w option?
Running adapter-removal mode with two fastq.gz files, code below:
./NGmerge -1 DoxEE17_D10_S1_R1_001.fastq.gz -2 DoxEE17_D10_S1_R2_001.fastq.gz -a -o DoxEE17_D10_noA -y -n 12
How do I fix this error?
For dovetailed alignments, is it possible to retain the adapter sequences at both ends?
Hi, I am trying to use this tool, but after running the following command:
$NGmerge -1 $FILE2 -2 $FIL3 -o sample -a -n 20 -v
I got this: Error! sample: unknown command-line argument
I cannot figure out where the error comes from. I would appreciate your help.
Thanks in advance.
Hi everyone
I ran the command
./NGmerge -1 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R1.fastq.gz -2 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R2.fastq.gz -o AST5_merged.fastq.gz
and got the following error:
Error! Quality scores outside of set range
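One way to narrow this down is to look at which quality characters the input actually contains. The sketch below is a minimal helper, not part of NGmerge: it assumes plain 4-line FASTQ records and a Phred+33 offset by default, and the function names `quality_range` and `as_scores` are hypothetical. The range that NGmerge accepts is not stated in this report, so the output only tells you what the file contains.

```python
def quality_range(lines):
    """Return (lowest_char, highest_char) seen on the quality lines.

    Assumes strict 4-line FASTQ records, so every 4th line is a quality line.
    """
    lo, hi = None, None
    for i, line in enumerate(lines):
        if i % 4 != 3:
            continue
        qual = line.rstrip("\n")
        if not qual:
            continue
        cur_lo, cur_hi = min(qual), max(qual)
        lo = cur_lo if lo is None or cur_lo < lo else lo
        hi = cur_hi if hi is None or cur_hi > hi else hi
    return lo, hi


def as_scores(lo, hi, offset=33):
    """Convert the extreme characters to numeric scores for a given offset."""
    return ord(lo) - offset, ord(hi) - offset


# Tiny in-memory example; for a real file, pass an open text handle instead.
example = ["@r1\n", "ACGT\n", "+\n", "#5IK\n"]
lo, hi = quality_range(example)
print(lo, hi, as_scores(lo, hi))
```

For gzip-compressed input, the lines can come from `gzip.open(path, "rt")` instead of a list.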
Hi,
I am reprocessing some samples and want to use NGmerge to properly merge the PE reads. To do this, I convert an existing .bam file to FASTQ files and use them as input for NGmerge. I run the program like this:
NGmerge -w resources/qual_profile.txt -u 41 -n 8 -z -1 FILE_R1.fastq.gz -2 FILE_R2.fastq.gz -o FILE_merged.fastq.gz -f FILE_nonmerged -l FILE.log
For most samples, everything works like a charm, but for some I get errors like this:
Error! @HISEQ_172:2:2211:1315:83788 BC:Z:NAGCGTTANGAGTCAA: not matched in input files
Any idea what the problem might be?
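A quick check that may help locate the problem is to find the first record where the two input files disagree on the read name. The sketch below is only an illustration: it assumes 4-line FASTQ records and takes the read name to be the header text up to the first whitespace, which may not be exactly how NGmerge builds its key. The names `read_name` and `first_mismatch` are hypothetical.

```python
from itertools import zip_longest


def read_name(header):
    """Header text after '@' up to the first whitespace (an assumption)."""
    fields = header.strip()[1:].split()
    return fields[0] if fields else ""


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, name1, name2) for the first differing pair, else None.

    A name of None means that file ran out of records first.
    """
    heads1 = (line for i, line in enumerate(r1_lines) if i % 4 == 0)
    heads2 = (line for i, line in enumerate(r2_lines) if i % 4 == 0)
    for idx, (h1, h2) in enumerate(zip_longest(heads1, heads2)):
        n1 = None if h1 is None else read_name(h1)
        n2 = None if h2 is None else read_name(h2)
        if n1 != n2:
            return idx, n1, n2
    return None


r1 = ["@a BC:Z:X\n", "A\n", "+\n", "I\n", "@b\n", "C\n", "+\n", "I\n"]
r2 = ["@a BC:Z:X\n", "T\n", "+\n", "I\n", "@c\n", "G\n", "+\n", "I\n"]
print(first_mismatch(r1, r2))
```

If the mismatch index is small, the two files are probably ordered differently or one of them lacks some mates, which can happen when FASTQ files are extracted from a BAM file that is not grouped by read name.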
It would be nice if this program allowed Bash process substitution to be used for the input files. For example, one might want to run a command like the following:
NGmerge -a -1 <(zcat R1.fastq.gz | head -n4000) -2 <(zcat R2.fastq.gz | head -n4000) -o temp.fastq -i -v
Currently, the above command causes the program to fail with the following error:
Processing files: /dev/fd/63,/dev/fd/62
Error! Input file does not follow fastq format
Below is an example of modifications to the code that work on my system (Ubuntu 16.04). The modified code starts after the comment "push back chars". The solution is to use gzdopen instead of gzopen. See also the attached diff file diff.txt.
bool openRead(char* inFile, File* in) {
  // open file or stdin
  bool stdinBool = (strcmp(inFile, "-") ? false : true);
  FILE* dummy = (stdinBool ? stdin : fopen(inFile, "r"));
  if (dummy == NULL)
    exit(error(inFile, ERROPEN));

  // check for gzip compression: magic number 0x1F, 0x8B
  bool gzip = true;
  int save = 0;  // first char to pushback (for stdin)
  int i, j;
  for (i = 0; i < 2; i++) {
    j = fgetc(dummy);
    if (j == EOF)
      exit(error(inFile, ERROPEN));
    if ( (i && (unsigned char) j != 0x8B)
        || (! i && (unsigned char) j != 0x1F) ) {
      gzip = false;
      break;
    }
    if (! i)
      save = j;
  }

  // push back chars
  if (ungetc(j, dummy) == EOF)
    exit(error("", ERRUNGET));
  if (i && ungetc(save, dummy) == EOF)
    exit(error("", ERRUNGET));

  // open file
  if (! stdinBool)
    rewind(dummy);
  in->f = dummy;
  if (gzip) {
    // wrap the already-open descriptor instead of reopening by name
    in->gzf = gzdopen(fileno(in->f), "r");
    if (in->gzf == NULL)
      exit(error(inFile, ERROPEN));
  }

  return gzip;
}
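For comparison, the same idea (look at the first two bytes for the gzip magic number without consuming them, then read from the already-open stream rather than reopening it by name) can be sketched in Python. This is not a port of the NGmerge code; `open_maybe_gzip` is a hypothetical helper, and it uses `peek` on a buffered reader where the C code uses `ungetc`.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(raw):
    """Return a binary reader for `raw`, decompressing if it starts with 0x1F 0x8B.

    `raw` is any readable binary stream, including a pipe such as /dev/fd/63,
    so nothing here relies on seeking or on reopening the file by name.
    """
    buffered = raw if isinstance(raw, io.BufferedReader) else io.BufferedReader(raw)
    head = buffered.peek(2)[:2]  # look ahead without advancing the stream
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    return buffered


plain = b"@r1\nACGT\n+\nIIII\n"
print(open_maybe_gzip(io.BytesIO(plain)).read() == plain)
print(open_maybe_gzip(io.BytesIO(gzip.compress(plain))).read() == plain)
```

With a real path or process substitution, the stream would come from `open(path, "rb")`.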
Hello
I'm trying to merge my paired-end reads into single reads with NGmerge. The problem is that when I run a command like
NGmerge-master/NGmerge -1 AH1-R1.fastq -2 AH1-R2.fastq -o AH1-merged.fastq
the resultant merged file has a huge reduction in the file size and number of reads, for example from 600M to 70M, and from 15,000,000 reads to only 1,000,000 reads!
Could you please tell me what the reason might be?
Thank you
I had trouble installing this on Mac OS (10.14.6) due to Apple clang not supporting OpenMP by default, so I got the error message when I ran 'make':
clang: error: unsupported option '-fopenmp'
So a more Mac-friendly installer or a pre-compiled binary would be appreciated.
Regards, Eric
Hi, I'm a big fan of this software but was wondering if it might make sense to provide the option to threshold based on a false positive rate instead of error rate (similar to what SeqPurge does using the binomial distribution calculation), since longer overlaps should be more tolerant of higher error rates. We've found that we obtain the best performance when piping multiple instances of NGmerge to grossly simulate this effect; e.g. to simulate a 1E-6 FP threshold, we allow 8% errors for overlaps of 10-14 bp, 17% errors for overlaps of 15-19 bp, and 23% errors for overlaps of 20+ bp. But obviously this is still overly stringent for longer overlaps, not to mention time consuming.
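To make the request concrete, here is a minimal sketch of one way such a length-dependent threshold could be computed. It is an illustration under stated assumptions, not SeqPurge's or NGmerge's actual calculation: bases are treated as independent, a random pair of bases matches with probability 0.25, and the false positive probability of an overlap is taken as the binomial chance of seeing at most k mismatches by accident. The function name `max_mismatches` is hypothetical.

```python
from math import comb


def max_mismatches(overlap_len, fp_threshold, p_match=0.25):
    """Largest k such that P(at most k mismatches by chance) <= fp_threshold.

    Returns None when even a perfect match of this length is more likely
    than the threshold allows, i.e. the overlap is too short to accept.
    """
    cumulative = 0.0
    allowed = None
    for k in range(overlap_len + 1):
        cumulative += (comb(overlap_len, k)
                       * (1 - p_match) ** k
                       * p_match ** (overlap_len - k))
        if cumulative > fp_threshold:
            break
        allowed = k
    return allowed


for n in (10, 15, 20, 40):
    k = max_mismatches(n, 1e-6)
    print(n, k, None if k is None else round(k / n, 2))
```

Under these assumptions the allowed mismatch fraction grows with overlap length, which is the effect the stepwise 8% / 17% / 23% scheme above approximates.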
Dear John,
When I run the following, I get "Error! Input file does not follow fastq format", although I am convinced that my input files are in fastq format (reads.zip):
NGmerge -1 AMBV1527_forward.fastq -2 AMBV1527_reverse.fastq -o merged.fastq
Any idea what the problem might be?
Best regards,
Stijn
My fastq file is Illumina-1.8 Phred+33 format, so I need to edit qual_profile.txt to expand the score range. What numbers in the rows and columns should I add to each "match" and "mismatch" matrix in the file?
My fastq file is in Illumina-1.8 Phred+33 format. How can I solve this problem?
(I received this question by email and am including it and the response below - jmg)
Maybe I missed it, but do you have any data on the number of false merging events and false non-merging events (when in situ data predicted that the reads could have been joined, or such)?
Pretty self-explanatory. We are trying to eliminate the need to ever process data in fastq format in our pipeline. We probably wouldn't need the ability to convert fastq to ubam or vice-versa (although I wouldn't object), but having the ability to run ubam < ngmerge > ubam would be very appreciated.
Hello,
I've just tried to use NGmerge to cut the adapter from paired-end data. The FastQC report shows that the adapter is the Nextera Transposase Sequence (Fig1).
I used NGmerge to cut the adapter with the following command:
NGmerge/NGmerge -z -a -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R
But the output files still contain some adapters (Fig2).
Do you have any idea why? Did I use it properly?
Thank you very much
Hien
Hi, I went to install NGmerge through bioconda on an ubuntu terminal (operated on a Windows computer). I received the following error about solving the environment. This error is unique to NGmerge as I have been able to install several other packages through bioconda. I'm using the latest version of anaconda3 for linux x64 (https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh)
Thank you for your help!
(ngmerge) passeguelab@BB11CSCI-M003:~$ conda install -c bioconda ngmerge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError:
Is there an option to process all R1 and R2 fastq files inside a folder in a loop? I have more than 1000 fastq files, and it would be tedious to process them one by one.
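No such option is shown in the commands above, but a small wrapper script can do the looping. The sketch below assumes that mates are named with `_R1` and `_R2` tags, that `NGmerge` is on the PATH, and that only the options already seen in this list (`-1`, `-2`, `-o`, `-n`) are needed. The helper names `pair_fastqs`, `build_command`, and `run_folder`, and the `_merged.fastq` output suffix, are hypothetical.

```python
import subprocess
from pathlib import Path


def pair_fastqs(names, r1_tag="_R1", r2_tag="_R2"):
    """Return sorted (r1, r2, sample) triples for R1 files whose R2 mate exists."""
    present = set(names)
    pairs = []
    for name in sorted(names):
        if r1_tag not in name:
            continue
        mate = name.replace(r1_tag, r2_tag, 1)
        if mate in present:
            pairs.append((name, mate, name.split(r1_tag, 1)[0]))
    return pairs


def build_command(r1, r2, sample, exe="NGmerge", threads=4):
    """Build the argument list for one sample (output name is an assumption)."""
    return [exe, "-1", r1, "-2", r2, "-o", f"{sample}_merged.fastq", "-n", str(threads)]


def run_folder(folder, dry_run=True):
    """Print (or run) one NGmerge command per R1/R2 pair found in `folder`."""
    folder = Path(folder)
    names = [p.name for p in folder.glob("*.fastq*")]
    for r1, r2, sample in pair_fastqs(names):
        cmd = build_command(str(folder / r1), str(folder / r2), sample)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


print(pair_fastqs(["A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz"]))
```

Calling `run_folder("path/to/folder")` first with the default `dry_run=True` prints the commands so they can be checked before anything is executed.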
I tried NGmerge on Linux, but it does not run with my simulated datasets.
The header patterns are the following:
@gi|110798562|ref|NC_008261.1|-100.101.325660/1
AAGTTCATCATAGTTATTTTGAATAAAATTTAATCTATCAAGTATCATCTATTATCACTCCGTATACAGATTTTCATATTTTACAATTATAGCACACTAC
+
>G9GFGCFGGGG#G#8#G)E##G6GGGBGGGGCGGGGGEFGG8GG:CGGF9,G9EGGGFGGGGGG6GGGGFGGGGCGGGGFGGGGGGGGGGGGGGFGGCG
@gi|110798562|ref|NC_008261.1|-100.101.325660/2
TAGTAGTGGGCTCTCTTTGTAAAATATAAACATCCGTATACGGAGTGATAATAGATTATACTTGATAGATTAAATTTTATTGAAAATAAATATGATGAAC
+
C2C*G*5)(*@4(G##:GGGF4G3,*D*#G#G(G#G*E05GGGGGG+.E+*5DGFG*4G8G1G+G+*GG87CGGCFGEG0FGCGFGGG+GGGGGGGGGGF
My version seems to expect a " " as the delimiter to create a single key. Thus, I was getting the error: ..... ": not matched in input files"
I added a " " before the "/" and it solved the issue. I noticed afterwards that a new parameter (-t) was added to handle these situations.
Afterwards, another error appeared: "Sequence/quality scores do not match". This is thrown because of "ERRQUAL". The reads do not have any issue, and I have been able to run the datasets with many other tools (BBMerge, USEARCH, FLASH, PEAR, etc.).
I am sharing a small dataset, in case you want to investigate what the problem could be.
reads_NC_008261.1.100.101.10_R1.fq.gz
reads_NC_008261.1.100.101.10_R2.fq.gz
Thanks
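For anyone hitting the second error, a basic sanity check of the records can rule out the simplest causes. The sketch below assumes strict 4-line FASTQ records and interprets "do not match" as a difference in length between the sequence and quality lines, which is a guess about what ERRQUAL tests. The function name `check_records` is hypothetical.

```python
def check_records(lines):
    """Return a list of (record_index, problem) for malformed 4-line records."""
    lines = [line.rstrip("\n") for line in lines]
    problems = []
    n_full = len(lines) // 4
    for rec in range(n_full):
        head, seq, plus, qual = lines[4 * rec:4 * rec + 4]
        if not head.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        elif not plus.startswith("+"):
            problems.append((rec, "third line does not start with '+'"))
        elif len(seq) != len(qual):
            problems.append((rec, f"sequence length {len(seq)} != quality length {len(qual)}"))
    if len(lines) % 4:
        problems.append((n_full, "incomplete record at end of file"))
    return problems


good = ["@r1/1\n", "ACGT\n", "+\n", ">G9G\n"]
bad = ["@r2/1\n", "ACGT\n", "+\n", "GG\n"]
print(check_records(good + bad))
```

If this reports nothing for a file that NGmerge rejects, the cause is more likely in how the records are parsed than in the data itself.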
Hi,
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/ngmerge/meta.yaml uses the outdated https://github.com/harvardinformatics/NGmerge repository.
Regards,
Stephan
I'm getting a very bizarre "Error! -2 cannot open file for reading" when trying to run NGmerge in stitch mode, but only when NGmerge is run via a SLURM batch script.
If I run the exact same NGmerge command (e.g., ./NGmerge -1 r1.fq -2 r2.fq -o output.fq) on the interactive command line, it works with no problem.
If I take that same command and run it as part of a bash submission script for a SLURM job on an HPCC, it fails with "Error! -2 cannot open file for reading".
It looks like there's some issue when it attempts to stat both files into memory?
Hello, I'm working on merging HMP data, where read IDs in forward vs. reverse reads are delineated by a forward slash, "/".
For example, the first read is @HWI-EAS319_616WC:3:100:10067:14224/1 in the forward reads and @HWI-EAS319_616WC:3:100:10067:14224/2 in the reverse reads. Other mergers have been able to accommodate this, but NGmerge reports these as different reads and fails.
Is there a method to adjust for these? Is there a different forward/reverse read delineator that NGmerge expects?
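Another post in this list reports that adding a space before the "/" resolved the same mismatch, and mentions that a -t parameter was later added for these situations. As a stopgap for versions without that option, the headers can be rewritten before merging. The sketch below assumes 4-line FASTQ records and a trailing `/1` or `/2` on the header line; the names `split_mate_suffix` and `rewrite` are hypothetical.

```python
import re

# A trailing /1 or /2 that directly follows a non-space character.
_MATE_SUFFIX = re.compile(r"(?<=\S)/([12])$")


def split_mate_suffix(header):
    """'@name/1' -> '@name /1'; any other header is returned unchanged."""
    return _MATE_SUFFIX.sub(r" /\1", header)


def rewrite(lines):
    """Yield FASTQ lines with the mate suffix separated on header lines only."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield split_mate_suffix(line.rstrip("\n")) + "\n"
        else:
            yield line


record = ["@read7/1\n", "ACGT\n", "+\n", "II/1\n"]
print(list(rewrite(record)))
```

Only every fourth line is touched, so a quality string that happens to end in "/1" is left alone.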