jsh58 / ngmerge

Merging paired-end reads and removing adapters

License: MIT License

Makefile 0.11% C 65.16% Python 34.73%

ngmerge's Issues

Documentation Requested for Custom Quality Profile

Can we have a more detailed explanation in the documentation of how to structure a custom quality profile for the -w option?

Error! Cannot close file

Running adapter-removal mode with two fastq.gz files, code below:

./NGmerge -1 DoxEE17_D10_S1_R1_001.fastq.gz -2 DoxEE17_D10_S1_R2_001.fastq.gz -a -o DoxEE17_D10_noA -y -n 12

How do I fix this error?

no adapter removal with dovetailed alignments

For dovetailed alignments, is it possible to retain the adapter sequences at both ends?

Error! sample: unknown command-line argument

Hi, I am trying to use this tool, but after running the following command:

$NGmerge -1 $FILE2 -2 $FIL3 -o sample -a -n 20 -v

I got this: Error! sample: unknown command-line argument

I cannot figure out where the error comes from. I would appreciate your help.

Thanks in advance.
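
One thing that can be ruled out quickly with commands like the one above is a shell variable that is unset or misspelled, since it expands to nothing and shifts the remaining arguments. The sketch below lists the variables a command references that are missing from an environment. The environment shown is hypothetical, and whether `$FIL3` is a typo only in this report or also in the actual script is unknown.

```python
import re

# Matches $NAME and ${NAME}, the two common ways a shell command references a variable.
_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def unset_variables(command, env):
    """Return the variable names used in `command` that are missing or empty in `env`."""
    missing = []
    for braced, bare in _VAR.findall(command):
        name = braced or bare
        if not env.get(name) and name not in missing:
            missing.append(name)
    return missing


# Hypothetical environment: only FILE2 and FILE3 are defined (file names are made up).
env = {"FILE2": "sample_R1.fastq.gz", "FILE3": "sample_R2.fastq.gz"}
cmd = "NGmerge -1 $FILE2 -2 $FIL3 -o sample -a -n 20 -v"
print(unset_variables(cmd, env))  # prints ['FIL3']
```

If a variable does turn out to be empty, the option that precedes it would be followed directly by the next option, which could explain an otherwise valid argument being reported as unknown.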

Error! Quality scores outside of set range

Hi everyone,
I ran the command
./NGmerge -1 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R1.fastq.gz -2 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R2.fastq.gz -o AST5_merged.fastq.gz

I got the following error:
Error! Quality scores outside of set range
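
To see which quality scores a file actually contains, a small script can scan the quality lines and report the lowest and highest ASCII codes. This is only a diagnostic sketch: it assumes four-line FASTQ records, the function name `quality_range` is made up for illustration, and the Phred+33 interpretation in the usage comment is an assumption about the data. Another report on this page passes `-u 41` together with a custom quality profile, which suggests the upper bound can be adjusted, but check the NGmerge help text before relying on that.

```python
import gzip


def quality_range(path, max_records=100000):
    """Return (lowest, highest) ASCII codes seen on quality lines, or None if none were read."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lowest, highest = None, None
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index // 4 >= max_records:
                break  # a sample of the file is usually enough
            if index % 4 != 3:
                continue  # only the 4th line of each record holds quality scores
            codes = [ord(ch) for ch in line.rstrip("\r\n")]
            if not codes:
                continue
            lowest = min(codes) if lowest is None else min(lowest, min(codes))
            highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None
    return lowest, highest


# Example use (hypothetical file name):
#   low, high = quality_range("AST5_R1.fastq.gz")
#   print("Assuming Phred+33, scores span", low - 33, "to", high - 33)
```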

Error! not matched in input files

Hi,

I am working on reprocessing some samples and I want to use NGmerge to properly merge the PE reads. For this I convert an existing .bam file to fastQ files and use them as input for NGmerge. I execute the program like this:

NGmerge -w resources/qual_profile.txt -u 41 -n 8 -z -1 FILE_R1.fastq.gz -2 FILE_R2.fastq.gz -o FILE_merged.fastq.gz -f FILE_nonmerged -l FILE.log

For most samples, everything works like a charm, but for some I get errors like this:
Error! @HISEQ_172:2:2211:1315:83788 BC:Z:NAGCGTTANGAGTCAA: not matched in input files

Any idea what the problem might be?
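
One way to narrow this down is to walk both files in parallel and report the first record whose read names differ. The sketch below assumes four-line FASTQ records and compares the header text up to the first whitespace. That keying rule is an assumption made for this check, not a statement of how NGmerge pairs reads, and the function names are hypothetical. Reads converted from a BAM file can end up in a different order in each file, or with mates missing, but whether that is the cause here is unverified.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name (header up to the first whitespace, without '@') of each record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                fields = line[1:].split()
                yield fields[0] if fields else ""


def first_mismatch(r1_path, r2_path):
    """Return (record_number, r1_name, r2_name) for the first unpaired record, else None.

    A name of None means that file ran out of records before the other one.
    """
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for number, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return number, name1, name2
    return None


# Example use (file names taken from the command above):
#   print(first_mismatch("FILE_R1.fastq.gz", "FILE_R2.fastq.gz"))
```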

support for Bash process substitution

It would be nice if this program allowed Bash process substitution to be used for the input files. For example, one might want to run a command like the following:

NGmerge -a -1 <(zcat R1.fastq.gz | head -n4000) -2 <(zcat R2.fastq.gz | head -n4000) -o temp.fastq -i -v

Currently, the above command causes the program to fail with the following error:

Processing files: /dev/fd/63,/dev/fd/62
Error! Input file does not follow fastq format

Below is an example of modifications to the code that work on my system (Ubuntu 16.04). The modified code starts after the comment "push back chars". The solution is to use gzdopen instead of gzopen. See also the attached diff file diff.txt.

bool openRead(char* inFile, File* in) {

  // open file or stdin
  bool stdinBool = (strcmp(inFile, "-") ? false : true);
  FILE* dummy = (stdinBool ? stdin : fopen(inFile, "r"));
  if (dummy == NULL)
    exit(error(inFile, ERROPEN));

  // check for gzip compression: magic number 0x1F, 0x8B
  bool gzip = true;
  int save = 0;  // first char to pushback (for stdin)
  int i, j;
  for (i = 0; i < 2; i++) {
    j = fgetc(dummy);
    if (j == EOF)
      exit(error(inFile, ERROPEN));
    if ( (i && (unsigned char) j != 0x8B)
        || (! i && (unsigned char) j != 0x1F) ) {
      gzip = false;
      break;
    }
    if (! i)
      save = j;
  }

  // push back chars
  if (ungetc(j, dummy) == EOF)
    exit(error("", ERRUNGET));
  if (i && ungetc(save, dummy) == EOF)
    exit(error("", ERRUNGET));

  // open file
  if (! stdinBool)
    rewind(dummy);
  in->f = dummy;
  if (gzip) {
    in->gzf = gzdopen(fileno(in->f), "r");
    if (in->gzf == NULL)
      exit(error(inFile, ERROPEN));
  }

  return gzip;
}

Merging problem

Hello

I'm trying to merge my paired-end reads into single reads with NGmerge. The problem is that when I run a command like

NGmerge-master/NGmerge -1 AH1-R1.fastq -2 AH1-R2.fastq -o AH1-merged.fastq

the resultant merged file has a huge reduction in the file size and number of reads, for example from 600M to 70M, and from 15,000,000 reads to only 1,000,000 reads!

Could you please tell me what the reason might be?

Thank you

doesn't easily install on Mac OS

I had trouble installing this on Mac OS (10.14.6) due to Apple clang not supporting OpenMP by default, so I got the error message when I ran 'make':

clang: error: unsupported option '-fopenmp'

So a more Mac-friendly installer or a pre-compiled binary would be appreciated.

Regards, Eric

feature request: use false positive rate instead of error rate?

Hi, I'm a big fan of this software but was wondering if it might make sense to provide the option to threshold based on a false positive rate instead of error rate (similar to what SeqPurge does using the binomial distribution calculation), since longer overlaps should be more tolerant of higher error rates. We've found that we obtain the best performance when piping multiple instances of NGmerge to grossly simulate this effect; e.g. to simulate a 1E-6 FP threshold, we allow 8% errors for overlaps of 10-14 bp, 17% errors for overlaps of 15-19 bp, and 23% errors for overlaps of 20+ bp. But obviously this is still overly stringent for longer overlaps, not to mention time consuming.
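
The idea of a length-dependent threshold can be illustrated with a simple binomial calculation. The sketch below assumes that two unrelated bases match with probability 0.25 and that positions are independent. This is a simplified random-sequence model chosen for illustration, not necessarily the calculation SeqPurge performs, and the function names are hypothetical. For each overlap length it finds the largest number of mismatches for which the probability of seeing an overlap that good by chance stays at or below the chosen false positive threshold.

```python
from math import comb


def chance_overlap_probability(length, mismatches, p_match=0.25):
    """P(at most `mismatches` mismatches) for a random overlap of `length` bases."""
    p_mis = 1.0 - p_match
    return sum(
        comb(length, i) * p_mis**i * p_match ** (length - i)
        for i in range(mismatches + 1)
    )


def max_mismatches(length, fp_threshold, p_match=0.25):
    """Largest mismatch count that keeps the chance probability <= threshold, or None."""
    best = None
    for k in range(length + 1):
        if chance_overlap_probability(length, k, p_match) <= fp_threshold:
            best = k
        else:
            break  # the cumulative probability only grows with k
    return best


for length in (10, 13, 20, 50, 100):
    k = max_mismatches(length, 1e-6)
    rate = "n/a" if k is None else f"{100.0 * k / length:.0f}%"
    print(f"overlap {length:>3} bp: up to {k} mismatches ({rate})")
```

Under this model the allowed mismatch fraction grows with overlap length, which is the behavior the request describes.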

Error! Input file does not follow fastq format

Dear John,

When I run the following, I get "Error! Input file does not follow fastq format", although I am convinced that my input files are in fastq format (reads.zip):

NGmerge -1 AMBV1527_forward.fastq -2 AMBV1527_reverse.fastq -o merged.fastq

Any idea what the problem might be?

Best regards,
Stijn
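
A strict structural check can help locate what a parser might be objecting to. The sketch below reports the first line that breaks the plain four-line FASTQ layout: a Windows (CRLF) line ending, a blank line, a header without "@", a separator without "+", or sequence and quality strings of different lengths. It assumes an uncompressed file, the function name is hypothetical, and it does not claim to reproduce NGmerge's own checks.

```python
def first_fastq_problem(path):
    """Return (line_number, message) for the first structural problem, or None if clean."""
    seq_len = 0
    count = 0
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            count = number
            if raw.endswith(b"\r\n"):
                return number, "Windows (CRLF) line ending"
            line = raw.rstrip(b"\n")
            slot = (number - 1) % 4  # 0 header, 1 sequence, 2 separator, 3 quality
            if not line:
                return number, "blank line"
            if slot == 0 and not line.startswith(b"@"):
                return number, "header does not start with '@'"
            if slot == 1:
                seq_len = len(line)
            if slot == 2 and not line.startswith(b"+"):
                return number, "separator does not start with '+'"
            if slot == 3 and len(line) != seq_len:
                return number, "sequence and quality lengths differ"
    if count % 4:
        return count, "file ends in the middle of a record"
    return None


# Example use (file name taken from the command above):
#   print(first_fastq_problem("AMBV1527_forward.fastq"))
```

Files whose sequences are wrapped over several lines will also be flagged by this check, because it expects exactly four lines per record.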

qual_profile

My fastq file is Illumina-1.8 Phred+33 format, so I need to edit qual_profile.txt to expand the score range. What numbers in the rows and columns should I add to each "match" and "mismatch" matrix in the file?

Error! Quality score file missing values for score range

My fastq file is in Illumina-1.8 Phred+33 format. How can I solve this problem?

False events

(I received this question by email and am including it and the response below - jmg)

Maybe I missed it, but do you have any data on the number of false merging events and false non-merging events (e.g. when in situ data predicted that the reads could have been joined)?

Bioconda package

Hi,

a Bioconda package would be super useful for this program.

Thanks,
Bjoern

feature request: ubam input/output?

Pretty self-explanatory. We are trying to eliminate the need to ever process data in fastq format in our pipeline. We probably wouldn't need the ability to convert fastq to ubam or vice-versa (although I wouldn't object), but having the ability to run ubam < ngmerge > ubam would be very appreciated.

adapters remain after using NGmerge

Hello,
I've just tried to use NGmerge to cut the adapters from paired-end data. The FastQC report shows that the adapter is the Nextera Transposase Sequence (Fig1).
I used NGmerge to cut the adapters with the following command:
NGmerge/NGmerge -z -a -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R
But the output files still contain some adapters (Fig2).
Do you have any idea why? Did I use it properly?
Thank you very much,
Hien

bioconda install

Hi, I went to install NGmerge through bioconda on an ubuntu terminal (operated on a Windows computer). I received the following error about solving the environment. This error is unique to NGmerge as I have been able to install several other packages through bioconda. I'm using the latest version of anaconda3 for linux x64 (https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh)

Thank you for your help!

(ngmerge) passeguelab@BB11CSCI-M003:~$ conda install -c bioconda ngmerge

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError:'

is there any option for batch processing?

Is there an option to process all R1 and R2 fastq files inside a folder in a loop? I have more than 1000 fastq files to process, and it would be tedious to process them one by one.
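
Looping over a folder can also be handled outside the program with a small wrapper script. The sketch below pairs files by name and builds one NGmerge command per pair using only the -1, -2 and -o options shown elsewhere on this page. The `_R1`/`_R2` naming convention, the output file names, and all function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def pair_fastqs(folder, r1_tag="_R1", r2_tag="_R2"):
    """Return (r1, r2, sample) for every R1 file that has a matching R2 file."""
    pairs = []
    for r1 in sorted(Path(folder).glob(f"*{r1_tag}*.fastq*")):
        r2 = r1.with_name(r1.name.replace(r1_tag, r2_tag, 1))
        if r2.exists():
            sample = r1.name.split(r1_tag, 1)[0]
            pairs.append((r1, r2, sample))
    return pairs


def build_command(r1, r2, sample, out_dir, exe="NGmerge"):
    """Build the argument list for one pair; the output name is a made-up convention."""
    out = Path(out_dir) / f"{sample}_merged.fastq"
    return [exe, "-1", str(r1), "-2", str(r2), "-o", str(out)]


def run_all(folder, out_dir, dry_run=True):
    """Return the commands for all pairs, and run them unless dry_run is True."""
    commands = [build_command(r1, r2, s, out_dir) for r1, r2, s in pair_fastqs(folder)]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


# Example use (hypothetical folders): inspect the commands first, then run them.
#   for command in run_all("fastq_dir", "merged_dir"):
#       print(" ".join(command))
#   run_all("fastq_dir", "merged_dir", dry_run=False)
```

Printing the commands before running them makes it easy to spot files that were paired incorrectly or skipped because no mate was found.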

Reads are good but throws error: "Sequence/quality scores do not match"

I tried NGmerge on Linux, but it does not run with my simulated datasets.

The header patterns are the following:

@gi|110798562|ref|NC_008261.1|-100.101.325660/1
AAGTTCATCATAGTTATTTTGAATAAAATTTAATCTATCAAGTATCATCTATTATCACTCCGTATACAGATTTTCATATTTTACAATTATAGCACACTAC
+
>G9GFGCFGGGG#G#8#G)E##G6GGGBGGGGCGGGGGEFGG8GG:CGGF9,G9EGGGFGGGGGG6GGGGFGGGGCGGGGFGGGGGGGGGGGGGGFGGCG

@gi|110798562|ref|NC_008261.1|-100.101.325660/2
TAGTAGTGGGCTCTCTTTGTAAAATATAAACATCCGTATACGGAGTGATAATAGATTATACTTGATAGATTAAATTTTATTGAAAATAAATATGATGAAC
+
C2C*G*5)(*@4(G##:GGGF4G3,*D*#G#G(G#G*E05GGGGGG+.E+*5DGFG*4G8G1G+G+*GG87CGGCFGEG0FGCGFGGG+GGGGGGGGGGF

My version seems to expect a " " as the delimiter to create a single key, so I was getting the error "..... : not matched in input files".
I added a " " before the "/" and it solved the issue. I noticed afterwards that a new parameter (-t) was added to handle these situations.

After that, another error appeared: "Sequence/quality scores do not match". This is thrown because of "ERRQUAL". The reads do not have any issues, and I have been able to run the datasets with many other tools (BBMerge, USEARCH, FLASH, PEAR, etc.).

I am sharing a small dataset, in case you want to investigate what the problem could be:
reads_NC_008261.1.100.101.10_R1.fq.gz
reads_NC_008261.1.100.101.10_R2.fq.gz

Thanks

(bio)conda recipe needs to be updated

Hi,

https://github.com/bioconda/bioconda-recipes/blob/master/recipes/ngmerge/meta.yaml uses the outdated https://github.com/harvardinformatics/NGmerge repository.

Regards,
Stephan

Error! -2 cannot open file for reading

I am getting a very bizarre "Error! -2 cannot open file for reading" when trying to run NGmerge in stitch mode, but only when NGmerge is run via a SLURM batch script.

If I run the exact same NGmerge command (e.g. ./NGmerge -1 r1.fq -2 r2.fq -o output.fq) on the interactive command line, it works with no problem.

If I take that same command and run it as part of a bash submission script for a SLURM job on an HPCC, it fails with "Error! -2 cannot open file for reading".

It looks like there's some issue when it attempts to stat both files into memory?

NGmerge failing if read IDs are indicated by a forward slash

Hello, I'm working on merging HMP data, where read IDs in forward vs. reverse reads are delineated by a forward slash, "/".

For example, the first read is @HWI-EAS319_616WC:3:100:10067:14224/1 in the forward reads and @HWI-EAS319_616WC:3:100:10067:14224/2 in the reverse reads. Other mergers have been able to accommodate this, but NGmerge reports these as different reads and fails.

Is there a method to adjust for these? Is there a different forward/reverse read delineator that NGmerge expects?
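
Another report on this page describes working around this by adding a space before the "/" in each header, and mentions that a -t parameter was later added for such cases. A minimal sketch of the header-rewrite workaround follows. It assumes four-line FASTQ records and headers that end in "/1" or "/2" with nothing after them, and the function names are hypothetical.

```python
import re

# Header ending in /1 or /2 with no whitespace anywhere in the name.
_MATE_SUFFIX = re.compile(r"^(@\S+?)/([12])$")


def split_mate_suffix(header):
    """Turn '@name/1' into '@name /1'; headers of any other shape are returned unchanged."""
    return _MATE_SUFFIX.sub(r"\1 /\2", header)


def rewrite_headers(lines):
    """Yield FASTQ lines with the mate suffix split off on header lines (1st, 5th, ...)."""
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 0:  # only touch headers; quality lines may also start with '@'
            text = split_mate_suffix(text)
        yield text + "\n"


print(split_mate_suffix("@gi|110798562|ref|NC_008261.1|-100.101.325660/1"))
# prints @gi|110798562|ref|NC_008261.1|-100.101.325660 /1

# Example use (hypothetical file names):
#   with open("reads_R1.fastq") as src, open("reads_R1.fixed.fastq", "w") as dst:
#       dst.writelines(rewrite_headers(src))
```

If the installed version supports the -t parameter, using it is likely simpler than rewriting the files; check the program's help text for its exact meaning.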