metageni / focus

FOCUS: An Agile Profiler for Metagenomic Data

License: GNU General Public License v3.0

Python 98.35% Dockerfile 1.65%
metagenomes kmer identify-organisms bioinformatics microbiome

focus's Issues

0-count kmers failing

If a user adds organisms to their database and one of them has 0 k-mers identified, loadDB fails because the counts cannot be normalized when their sum is 0. Such organisms should be skipped.
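
The requested behavior could look roughly like the sketch below. It assumes the database is a mapping from organism name to a list of k-mer counts; the names `normalise_counts` and `load_db_rows` are hypothetical and are not taken from the FOCUS code base.

```python
def normalise_counts(counts):
    """Return relative frequencies, or None when the counts sum to zero."""
    total = sum(counts)
    if total == 0:
        # Nothing to normalise: dividing by zero would fail or yield NaN values.
        return None
    return [c / total for c in counts]


def load_db_rows(rows):
    """Normalise each organism's counts, skipping organisms without k-mers.

    `rows` is a hypothetical dict of organism name -> list of k-mer counts.
    Returns the normalised database and the list of skipped organisms.
    """
    database = {}
    skipped = []
    for organism, counts in rows.items():
        freqs = normalise_counts(counts)
        if freqs is None:
            skipped.append(organism)
            continue
        database[organism] = freqs
    return database, skipped
```

Returning the skipped names alongside the database lets the caller log a warning instead of dropping organisms silently.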

DB not extracted

Can you check that the db/k6 and db/k7 files are present? If they are not, either extract db.zip automatically or raise a clear error message reminding the user that the database has not been extracted.
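
A minimal sketch of such a check, using only the standard library. The function name `ensure_db` is hypothetical, and the sketch assumes that db.zip unpacks into a `db/` directory next to the archive; adjust the extraction target if the archive is laid out differently.

```python
import zipfile
from pathlib import Path


def ensure_db(db_dir, archive, required=("k6", "k7")):
    """Check that the database files exist, extracting the archive if needed."""
    db_dir = Path(db_dir)
    archive = Path(archive)

    def missing_files():
        return [name for name in required if not (db_dir / name).exists()]

    if not missing_files():
        return  # Database is already in place.

    if not archive.exists():
        raise FileNotFoundError(
            "Database files {} not found in {} and {} is missing too.".format(
                missing_files(), db_dir, archive
            )
        )

    # Assumption: the archive contains a top-level directory named like db_dir.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(db_dir.parent)

    if missing_files():
        raise FileNotFoundError(
            "Extracted {} but {} are still missing from {}.".format(
                archive, missing_files(), db_dir
            )
        )
```

Calling `ensure_db("db", "db.zip")` before loading the database would cover both options suggested above: extract when possible, and fail with a readable message otherwise.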

[Exception] "has no k-mers count. Probably not valid file."

Hi,

I'm trying to run FOCUS on assembled metagenome reads.

I am getting the following exception:

[2020-02-13 06:16:20,955 - INFO] FOCUS: An Agile Profiler for Metagenomic Data
[2020-02-13 06:16:20,956 - INFO] OUTPUT: test_output_pravaler does not exist - just created it :)
[2020-02-13 06:16:20,956 - INFO] 1) Loading Reference DB
[2020-02-13 06:16:22,441 - INFO] 2) Reference DB was loaded with 2785 reference genomes
[2020-02-13 06:16:22,442 - INFO] 3.1) Working on: 4DHI2c-formatted.fna
[2020-02-13 06:16:22,442 - INFO]    Counting k-mers
Failed to open input file 'kmer_counting_0.5626527946381891021'
Failed to open input file 'kmer_counting_0.562652794638189'
rm: cannot remove 'kmer_counting_0.562652794638189': No such file or directory
Traceback (most recent call last):
  File "/home/vini/anaconda3/envs/focus/bin/focus", line 12, in <module>
    sys.exit(main())
  File "/home/vini/anaconda3/envs/focus/lib/python3.8/site-packages/focus_app/focus.py", line 310, in main
    query_count = normalise(count_kmers(Path(query, temp_query), kmer_size, threads, kmer_order))
  File "/home/vini/anaconda3/envs/focus/lib/python3.8/site-packages/focus_app/focus.py", line 148, in count_kmers
    raise Exception('{} has no k-mers count. Probably not valid file'.format(query_file))
Exception: test/4DHI2c-formatted.fna has no k-mers count. Probably not valid file

However, my FASTA file is perfectly valid. On my previous attempt, which failed with the same error, I noticed lower-case sequences in my FASTA file. I thought converting them all to upper case would help, but I still got the same error. I do have ambiguous nucleotides (N characters) in my reads; could that be the problem? I tried the FOCUS command with a small file (3 sequences, < 200 bp each), and it worked fine.

Thank you for any assistance you can provide,

V
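
One way to test the hypothesis about lower-case and N characters is to tally the characters in the sequence lines of the FASTA file. The sketch below is a diagnostic only; `base_composition` is a hypothetical helper, not part of FOCUS, and it does not establish whether these characters cause the exception.

```python
from collections import Counter


def base_composition(fasta_lines):
    """Tally upper-case ACGT, lower-case acgt, N/n, and other sequence characters."""
    tally = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # Skip blank lines and FASTA headers.
        for ch in line:
            if ch in "ACGT":
                tally["upper_acgt"] += 1
            elif ch in "acgt":
                tally["lower_acgt"] += 1
            elif ch in "Nn":
                tally["n"] += 1
            else:
                tally["other"] += 1
    return tally
```

The function accepts any iterable of lines, so an open file handle for the query file can be passed to it directly.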

Run aborted at Step 3

I get the following error message:

Counting 7-mers for /input.fasta
jellyfish count -m 7 -o 99189_kmer_counting -s 100M -t 32 -C --disk /input.fasta
Failed to open input file '99189_kmer_counting'
./focus.py:194: RuntimeWarning: invalid value encountered in divide
return result/(SUM(result)*1.)

Can I know what went wrong here?

Combining Jellyfish Output Files

There is an issue when you have a large fastq file and jellyfish writes several output files, even with jf >= 2.0.

For example, download this algae metagenome data set, which has 50,000 reads and 11,685,706 bp.

If you extract everything:

$ ls -l Algae/
total 91104
-rw-r--r-- 1  24392289 Nov 16 16:48 Coral_11.fastq
-rw-r--r-- 1  24733707 Nov 16 16:48 Coral_12.fastq
-rw-r--r-- 1  20770734 Nov 16 16:48 Coral_13.fastq
-rw-r--r-- 1  23352839 Nov 16 16:48 Coral_14.fastq

and run focus:

$ focus -q Algae/ -o Algae_Focus
[2018-11-16 17:26:48,830 - INFO] FOCUS: An Agile Profiler for Metagenomic Data
[2018-11-16 17:26:48,831 - INFO] OUTPUT: Algae_Focus does not exist - just created it :)
[2018-11-16 17:26:48,831 - INFO] 1) Loading Reference DB
[2018-11-16 17:26:50,084 - INFO] 2) Reference DB was loaded with 2785 reference genomes
[2018-11-16 17:26:50,086 - INFO] 3.1) Working on: Coral_11.fastq
[2018-11-16 17:26:50,086 - INFO]    Counting k-mers
Segmentation fault (core dumped)
Failed to open input file 'kmer_counting_0.42009467543622836'
rm: cannot remove 'kmer_counting_0.42009467543622836': No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/focus", line 11, in <module>
    load_entry_point('metagenomics-focus==1.4', 'console_scripts', 'focus')()
  File "/usr/local/lib/python3.6/dist-packages/metagenomics_focus-1.4-py3.6.egg/focus_app/focus.py", line 310, in main
    query_count = normalise(count_kmers(Path(query, temp_query), kmer_size, threads, kmer_order))
  File "/usr/local/lib/python3.6/dist-packages/metagenomics_focus-1.4-py3.6.egg/focus_app/focus.py", line 148, in count_kmers
    raise Exception('{} has no k-mers count. Probably not valid file'.format(query_file))
Exception: Algae/Coral_11.fastq has no k-mers count. Probably not valid file

The error says it cannot remove kmer_counting_0.42009467543622836, but that file doesn't exist. Instead, I have the following k-mer files:

kmer_counting_0.420094675436228360   kmer_counting_0.4200946754362283611  kmer_counting_0.420094675436228362  kmer_counting_0.420094675436228365  kmer_counting_0.420094675436228368
kmer_counting_0.420094675436228361   kmer_counting_0.4200946754362283612  kmer_counting_0.420094675436228363  kmer_counting_0.420094675436228366  kmer_counting_0.420094675436228369
kmer_counting_0.4200946754362283610  kmer_counting_0.4200946754362283613  kmer_counting_0.420094675436228364  kmer_counting_0.420094675436228367

If I take a subset of these reads and run focus, it works great.
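
A possible workaround is to look for the numbered part files that share the expected prefix and combine them before the counts are read. The sketch below only collects the parts and builds a command line; the helper names are hypothetical, and the use of `jellyfish merge -o` is an assumption that should be checked against `jellyfish merge --help` for the installed version.

```python
import glob


def find_jellyfish_parts(prefix):
    """Return files named prefix + <digits>, sorted by their numeric suffix."""
    parts = []
    for path in glob.glob(glob.escape(prefix) + "*"):
        suffix = path[len(prefix):]
        if suffix.isdigit():
            parts.append((int(suffix), path))
    return [path for _, path in sorted(parts)]


def merge_command(prefix, merged_path):
    """Build (but do not run) a command that merges all part files.

    Assumption: the installed Jellyfish provides `jellyfish merge -o OUT IN...`.
    """
    parts = find_jellyfish_parts(prefix)
    if not parts:
        raise FileNotFoundError(
            "No Jellyfish output found for prefix {!r}".format(prefix)
        )
    return ["jellyfish", "merge", "-o", merged_path, *parts]
```

Sorting on the integer suffix keeps the parts in the order 0, 1, 2, ..., 13 rather than the lexicographic order in which a directory listing shows them.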
