ngmerge's People

Contributors

jsh58

ngmerge's Issues

Error! Cannot close file

Running adapter-removal mode with two fastq.gz files, code below:

./NGmerge -1 DoxEE17_D10_S1_R1_001.fastq.gz -2 DoxEE17_D10_S1_R2_001.fastq.gz -a -o DoxEE17_D10_noA -y -n 12

How do I fix this error?

Error! sample: unknown command-line argument

Hi, I am trying to use this tool, but after running the following command:

$NGmerge -1 $FILE2 -2 $FIL3 -o sample -a -n 20 -v

I got this: Error! sample: unknown command-line argument

I cannot figure out where the error comes from. I would appreciate your help.

Thanks in advance.

Error! Quality scores outside of set range

Hi everyone,
I ran the command:
./NGmerge -1 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R1.fastq.gz -2 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R2.fastq.gz -o AST5_merged.fastq.gz

I got the following error:
Error! Quality scores outside of set range
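One way to narrow this down is to look at the range of quality scores actually present in the input. The following is a minimal Python sketch, not part of NGmerge: the function names are hypothetical, and it assumes a standard four-line FASTQ record layout and a Phred+33 offset. If the reported maximum exceeds the upper bound NGmerge expects, that would be consistent with the error above. Another report on this page passes `-u 41`, which appears to raise that bound, but check the program's help output for the exact option.

```python
import gzip


def quality_range(quality_lines, offset=33):
    """Return (min, max) Phred scores found in the given quality strings.

    The offset of 33 is an assumption (Phred+33 encoding).
    """
    scores = [ord(ch) - offset for line in quality_lines for ch in line.strip()]
    if not scores:
        raise ValueError("no quality characters found")
    return min(scores), max(scores)


def fastq_quality_lines(path):
    """Yield the quality string (every 4th line) of a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 3:
                yield line.rstrip("\n")
```

For example, `quality_range(fastq_quality_lines("AST5_R1.fastq.gz"))` would scan the first input file from the command above.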

Error! not matched in input files

Hi,

I am working on reprocessing some samples and I want to use NGmerge to properly merge the PE reads. For this I convert an existing .bam file to fastQ files and use them as input for NGmerge. I execute the program like this:

NGmerge -w resources/qual_profile.txt -u 41 -n 8 -z -1 FILE_R1.fastq.gz -2 FILE_R2.fastq.gz -o FILE_merged.fastq.gz -f FILE_nonmerged -l FILE.log

For most samples, everything works like a charm, but for some I get errors like this:
Error! @HISEQ_172:2:2211:1315:83788 BC:Z:NAGCGTTANGAGTCAA: not matched in input files

Any idea what the problem might be?
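A quick way to check whether the two FASTQ files are still in the same order after the BAM conversion is to walk them in lockstep and report the first record whose IDs differ. This is a minimal Python sketch with hypothetical function names. It assumes four-line records and that the read ID is the header text up to the first whitespace, which may not be exactly how NGmerge compares headers.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_ids(lines):
    """Yield the read ID (header up to the first whitespace) of each record."""
    for index, line in enumerate(lines):
        if index % 4 == 0:
            fields = line.split()
            yield fields[0] if fields else ""


def first_mismatch(r1_lines, r2_lines):
    """Return (record_number, id1, id2) for the first differing pair, or None.

    If one file is shorter, the missing ID is reported as None.
    """
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for number, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return number, id1, id2
    return None
```

Calling `first_mismatch(open_fastq("FILE_R1.fastq.gz"), open_fastq("FILE_R2.fastq.gz"))` on a failing sample would show where the files fall out of sync, if they do.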

support for Bash process substitution

It would be nice if this program allowed Bash process substitution to be used for the input files. For example, one might want to run a command like the following:

NGmerge -a -1 <(zcat R1.fastq.gz | head -n4000) -2 <(zcat R2.fastq.gz | head -n4000) -o temp.fastq -i -v

Currently, the above command causes the program to fail with the following error:

Processing files: /dev/fd/63,/dev/fd/62
Error! Input file does not follow fastq format

Below is an example of modifications to the code that work on my system (Ubuntu 16.04). The modified code starts after the comment "push back chars". The solution is to use gzdopen instead of gzopen. See also the attached diff file diff.txt.

bool openRead(char* inFile, File* in) {

  // open file or stdin
  bool stdinBool = (strcmp(inFile, "-") ? false : true);
  FILE* dummy = (stdinBool ? stdin : fopen(inFile, "r"));
  if (dummy == NULL)
    exit(error(inFile, ERROPEN));

  // check for gzip compression: magic number 0x1F, 0x8B
  bool gzip = true;
  int save = 0;  // first char to pushback (for stdin)
  int i, j;
  for (i = 0; i < 2; i++) {
    j = fgetc(dummy);
    if (j == EOF)
      exit(error(inFile, ERROPEN));
    if ( (i && (unsigned char) j != 0x8B)
        || (! i && (unsigned char) j != 0x1F) ) {
      gzip = false;
      break;
    }
    if (! i)
      save = j;
  }

  // push back chars
  if (ungetc(j, dummy) == EOF)
    exit(error("", ERRUNGET));
  if (i && ungetc(save, dummy) == EOF)
    exit(error("", ERRUNGET));

  // open file
  if (! stdinBool)
    rewind(dummy);
  in->f = dummy;
  if (gzip) {
    in->gzf = gzdopen(fileno(in->f), "r");
    if (in->gzf == NULL)
      exit(error(inFile, ERROPEN));
  }

  return gzip;
}

Merging problem

Hello

I'm trying to merge my paired end reads into a single read by NGmerge. The problem is when I run a command like

NGmerge-master/NGmerge -1 AH1-R1.fastq -2 AH1-R2.fastq -o AH1-merged.fastq

the resultant merged file has a huge reduction in the file size and number of reads, for example from 600M to 70M, and from 15,000,000 reads to only 1,000,000 reads!

Could you please tell me what the reason might be?

Thank you

doesn't easily install on Mac OS

I had trouble installing this on Mac OS (10.14.6) due to Apple clang not supporting OpenMP by default, so I got the error message when I ran 'make':

clang: error: unsupported option '-fopenmp'

So a more Mac-friendly installer or a pre-compiled binary would be appreciated.

Regards, Eric

feature request: use false positive rate instead of error rate?

Hi, I'm a big fan of this software but was wondering if it might make sense to provide the option to threshold based on a false positive rate instead of error rate (similar to what SeqPurge does using the binomial distribution calculation), since longer overlaps should be more tolerant of higher error rates. We've found that we obtain the best performance when piping multiple instances of NGmerge to grossly simulate this effect; e.g. to simulate a 1E-6 FP threshold, we allow 8% errors for overlaps of 10-14 bp, 17% errors for overlaps of 15-19 bp, and 23% errors for overlaps of 20+ bp. But obviously this is still overly stringent for longer overlaps, not to mention time consuming.
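To make the request concrete, here is a minimal Python sketch of the binomial calculation I have in mind. It is not SeqPurge's or NGmerge's code. It assumes each overlap position matches by chance with probability 0.25, independently, and the function names are hypothetical.

```python
from math import comb


def false_positive_rate(overlap, mismatches, p_match=0.25):
    """Probability that a random overlap of this length has at most
    `mismatches` mismatches (binomial tail, independent positions assumed)."""
    p_mismatch = 1.0 - p_match
    return sum(
        comb(overlap, k) * p_mismatch**k * p_match ** (overlap - k)
        for k in range(mismatches + 1)
    )


def max_mismatches(overlap, fp_threshold, p_match=0.25):
    """Largest mismatch count whose false positive rate stays at or below
    the threshold, or -1 if even a perfect match is too likely by chance."""
    allowed = -1
    for k in range(overlap + 1):
        if false_positive_rate(overlap, k, p_match) > fp_threshold:
            break
        allowed = k
    return allowed
```

With a threshold of 1E-6, `max_mismatches` gives 0, 2, and 4 mismatches for overlaps of 10, 15, and 20 bp, which agrees with the 8%, 17%, and 23% figures above if fractional mismatch counts are rounded down.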

Error! Input file does not follow fastq format

Dear John,

When I run the following, I get "Error! Input file does not follow fastq format", although I am convinced that my input files are in fastq format (reads.zip):

NGmerge -1 AMBV1527_forward.fastq -2 AMBV1527_reverse.fastq -o merged.fastq

Any idea what the problem might be?

Best regards,
Stijn

qual_profile

My fastq file is Illumina-1.8 Phred+33 format, so I need to edit qual_profile.txt to expand the score range. What numbers in the rows and columns should I add to each "match" and "mismatch" matrix in the file?

False events

(I received this question by email and am including it and the response below - jmg)

Maybe I missed it but do you have any data on number of false merging events and false non merging events (when insitu data predicted that the reads could have been joined or such?)

feature request: ubam input/output?

Pretty self-explanatory. We are trying to eliminate the need to ever process data in fastq format in our pipeline. We probably wouldn't need the ability to convert fastq to ubam or vice-versa (although I wouldn't object), but having the ability to run ubam < ngmerge > ubam would be very appreciated.

adapters remain after using NGmerge

Hello,
I've just tried to use NGmerge to cut the adapter from paired-end data. The FastQC report shows that the Nextera Transposase Sequence is the adapter (Fig1).
I used NGmerge to cut the adapter with the following command:
NGmerge/NGmerge -z -a -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R
But the cut file still contains some adapters (Fig2)
Do you have any idea about that? Did I use it properly?
Thank you very much
Hien
Fig1
Fig2

bioconda install

Hi, I tried to install NGmerge through bioconda in an Ubuntu terminal (running on a Windows computer). I received the following error about solving the environment. This error is unique to NGmerge, as I have been able to install several other packages through bioconda. I'm using the latest version of anaconda3 for linux x64 (https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh)

Thank you for your help!

(ngmerge) passeguelab@BB11CSCI-M003:~$ conda install -c bioconda ngmerge

Collecting package metadata (current_repodata.json):
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError:'

is there any option for batch processing?

Is there an option to process all R1 and R2 fastq files inside a folder in a loop? I have more than 1000 fastq files to process, and it would be tedious to process them one by one.
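In the absence of a built-in option, a small wrapper script can do the pairing and call NGmerge once per sample. Below is a minimal Python sketch under stated assumptions: file names follow a `<sample>_R1...` / `<sample>_R2...` convention, the `NGmerge` executable is on the PATH, and only the `-1`, `-2`, and `-o` options shown elsewhere on this page are used. The function names and the `_merged.fastq` output naming are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Assumed naming convention: <sample>_R1<anything>.fastq[.gz] or .fq[.gz]
R1_PATTERN = re.compile(r"^(?P<sample>.+)_R1(?P<rest>.*\.f(?:ast)?q(?:\.gz)?)$")


def pair_fastqs(filenames):
    """Return (r1, r2, output) tuples for every R1 file that has an R2 mate."""
    available = set(filenames)
    pairs = []
    for name in sorted(available):
        match = R1_PATTERN.match(name)
        if not match:
            continue
        mate = f"{match['sample']}_R2{match['rest']}"
        if mate in available:
            pairs.append((name, mate, f"{match['sample']}_merged.fastq"))
    return pairs


def build_commands(pairs, ngmerge="NGmerge"):
    """Build one NGmerge argument list per pair (default stitch mode)."""
    return [[ngmerge, "-1", r1, "-2", r2, "-o", out] for r1, r2, out in pairs]


def run_all(directory="."):
    """Run NGmerge on every R1/R2 pair found in the directory, one at a time."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    for command in build_commands(pair_fastqs(names)):
        subprocess.run(command, check=True, cwd=directory)
```

Calling `run_all("/path/to/fastqs")` would process each pair sequentially and stop at the first failure because of `check=True`.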

Reads are good but throws error: "Sequence/quality scores do not match"

I tried NGmerge on Linux, but it does not run with my simulated datasets.

The header patterns are the following:

@gi|110798562|ref|NC_008261.1|-100.101.325660/1
AAGTTCATCATAGTTATTTTGAATAAAATTTAATCTATCAAGTATCATCTATTATCACTCCGTATACAGATTTTCATATTTTACAATTATAGCACACTAC
+
>G9GFGCFGGGG#G#8#G)E##G6GGGBGGGGCGGGGGEFGG8GG:CGGF9,G9EGGGFGGGGGG6GGGGFGGGGCGGGGFGGGGGGGGGGGGGGFGGCG
@gi|110798562|ref|NC_008261.1|-100.101.325660/2
TAGTAGTGGGCTCTCTTTGTAAAATATAAACATCCGTATACGGAGTGATAATAGATTATACTTGATAGATTAAATTTTATTGAAAATAAATATGATGAAC
+
C2C*G*5)(*@4(G##:GGGF4G3,*D*#G#G(G#G*E05GGGGGG+.E+*5DGFG*4G8G1G+G+*GG87CGGCFGEG0FGCGFGGG+GGGGGGGGGGF

My version seems to expect a " " as the delimiter that ends the read key. Thus, I was getting the error: ..... ": not matched in input files"
I added a " " before the "/" and it solved the issue. I noticed afterwards that a new parameter (-t) was added to handle these situations.
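To illustrate why the headers above do not pair up, here is a minimal Python sketch of space-delimited key matching. It is only my reading of the behaviour, not NGmerge's code. The function name and the `strip_suffix` switch are hypothetical.

```python
def read_key(header, strip_suffix=False):
    """Return the pairing key of a FASTQ header line.

    The key is the text after '@' up to the first space. With
    strip_suffix=True, a trailing '/1' or '/2' is removed as well.
    """
    key = header.strip().lstrip("@").split(" ", 1)[0]
    if strip_suffix and key[-2:] in ("/1", "/2"):
        key = key[:-2]
    return key


forward = "@gi|110798562|ref|NC_008261.1|-100.101.325660/1"
reverse = "@gi|110798562|ref|NC_008261.1|-100.101.325660/2"

# Without a space before "/", the mate number is part of the key.
keys_differ = read_key(forward) != read_key(reverse)
# Dropping the "/1" and "/2" suffix makes the two keys identical.
keys_match = read_key(forward, True) == read_key(reverse, True)
```

Inserting a space before the "/" has the same effect as the suffix stripping, since the key then ends before the mate number.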

After that, another error appeared: "Sequence/quality scores do not match". This is thrown because of "ERRQUAL". The reads do not have any issue, and I have been able to run the datasets with many other tools (BBMerge, USEARCH, FLASH, PEAR, etc.).

I am sharing a small dataset, in case you want to investigate what the problem could be:
reads_NC_008261.1.100.101.10_R1.fq.gz
reads_NC_008261.1.100.101.10_R2.fq.gz

Thanks

(bio)conda recipe needs to be updated

Hi,

https://github.com/bioconda/bioconda-recipes/blob/master/recipes/ngmerge/meta.yaml uses the outdated https://github.com/harvardinformatics/NGmerge repository.

Regards,
Stephan

Error! -2 cannot open file for reading

I am getting a very bizarre "Error! -2 cannot open file for reading" when trying to run NGmerge in stitch mode, but only when NGmerge is run via a SLURM batch script.

If I run the exact same NGmerge command (e.g. ./NGmerge -1 r1.fq -2 r2.fq -o output.fq) on the interactive command line, it works with no problem.

If I take that same command and run it as part of a bash submission script for a SLURM job on an HPCC, it fails with "Error! -2 cannot open file for reading".

It looks like there's some issue when it attempts to stat both files into memory?

NGmerge failing if read IDs are indicated by a forward slash

Hello, I'm working on merging HMP data, where read IDs in forward vs. reverse reads are delineated by a forward slash, "/".

For example, the first read is @HWI-EAS319_616WC:3:100:10067:14224/1 in the forward reads and @HWI-EAS319_616WC:3:100:10067:14224/2 in the reverse reads. Other mergers have been able to accommodate this, but NGmerge reports these as different reads and fails.

Is there a method to adjust for these? Is there a different forward/reverse read delineator that NGmerge expects?
