metageni / focus

FOCUS: An Agile Profiler for Metagenomic Data

License: GNU General Public License v3.0

Languages: Python 98.35%, Dockerfile 1.65%
Topics: metagenomes, kmer, identify-organisms, bioinformatics, microbiome

focus's People

Contributors

linsalrob, metageni

focus's Issues

Combining Jellyfish Output Files

There is an issue when you have a large FASTQ file and Jellyfish writes several output files, even with Jellyfish >= 2.0.

For example, download this algae metagenome data set, which has 50,000 reads and 11,685,706 bp.

If you extract everything:

$ ls -l Algae/
total 91104
-rw-r--r-- 1  24392289 Nov 16 16:48 Coral_11.fastq
-rw-r--r-- 1  24733707 Nov 16 16:48 Coral_12.fastq
-rw-r--r-- 1  20770734 Nov 16 16:48 Coral_13.fastq
-rw-r--r-- 1  23352839 Nov 16 16:48 Coral_14.fastq

and run focus:

$ focus -q Algae/ -o Algae_Focus
[2018-11-16 17:26:48,830 - INFO] FOCUS: An Agile Profiler for Metagenomic Data
[2018-11-16 17:26:48,831 - INFO] OUTPUT: Algae_Focus does not exist - just created it :)
[2018-11-16 17:26:48,831 - INFO] 1) Loading Reference DB
[2018-11-16 17:26:50,084 - INFO] 2) Reference DB was loaded with 2785 reference genomes
[2018-11-16 17:26:50,086 - INFO] 3.1) Working on: Coral_11.fastq
[2018-11-16 17:26:50,086 - INFO]    Counting k-mers
Segmentation fault (core dumped)
Failed to open input file 'kmer_counting_0.42009467543622836'
rm: cannot remove 'kmer_counting_0.42009467543622836': No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/focus", line 11, in <module>
    load_entry_point('metagenomics-focus==1.4', 'console_scripts', 'focus')()
  File "/usr/local/lib/python3.6/dist-packages/metagenomics_focus-1.4-py3.6.egg/focus_app/focus.py", line 310, in main
    query_count = normalise(count_kmers(Path(query, temp_query), kmer_size, threads, kmer_order))
  File "/usr/local/lib/python3.6/dist-packages/metagenomics_focus-1.4-py3.6.egg/focus_app/focus.py", line 148, in count_kmers
    raise Exception('{} has no k-mers count. Probably not valid file'.format(query_file))
Exception: Algae/Coral_11.fastq has no k-mers count. Probably not valid file

The error is that it cannot remove kmer_counting_0.42009467543622836. However, that file doesn't exist; instead, I have the following k-mer files:

kmer_counting_0.420094675436228360   kmer_counting_0.4200946754362283611  kmer_counting_0.420094675436228362  kmer_counting_0.420094675436228365  kmer_counting_0.420094675436228368
kmer_counting_0.420094675436228361   kmer_counting_0.4200946754362283612  kmer_counting_0.420094675436228363  kmer_counting_0.420094675436228366  kmer_counting_0.420094675436228369
kmer_counting_0.4200946754362283610  kmer_counting_0.4200946754362283613  kmer_counting_0.420094675436228364  kmer_counting_0.420094675436228367

If I take a subset of these reads and run focus, it works great.
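
The leftover files listed above all share the expected output name plus a numeric suffix. A minimal Python sketch of how such files could be detected and handed to `jellyfish merge` follows. The helper names `find_partial_outputs` and `build_merge_command` are hypothetical and not part of FOCUS. It is also an assumption that these partial files are complete enough to merge; after a segmentation fault they may not be. The sketch only builds the command and does not run Jellyfish.

```python
import re
from pathlib import Path


def find_partial_outputs(prefix):
    """Return files named <prefix><digits>, sorted by their numeric suffix.

    `prefix` is the output name that was passed to `jellyfish count -o`.
    """
    prefix = Path(prefix)
    pattern = re.compile(re.escape(prefix.name) + r"(\d+)")
    parts = []
    for candidate in prefix.parent.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match and candidate.is_file():
            parts.append((int(match.group(1)), candidate))
    return [path for _, path in sorted(parts)]


def build_merge_command(prefix):
    """Build a `jellyfish merge` command, or return None if nothing to merge.

    Assumption: the partial files are valid Jellyfish databases.
    """
    parts = find_partial_outputs(prefix)
    if len(parts) < 2:
        return None
    return ["jellyfish", "merge", "-o", str(prefix)] + [str(p) for p in parts]
```

If `build_merge_command` returns a list, it could be passed to `subprocess.run(..., check=True)` before the counts are dumped, so that a failed merge raises immediately instead of surfacing later as "has no k-mers count".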

Run aborted at Step 3

I get the following error message:

Counting 7-mers for /input.fasta
jellyfish count -m 7 -o 99189_kmer_counting -s 100M -t 32 -C --disk /input.fasta
Failed to open input file '99189_kmer_counting'
./focus.py:194: RuntimeWarning: invalid value encountered in divide
return result/(SUM(result)*1.)

Can I know what went wrong here?

DB not extracted

Can you check that the db/k6 and db/k7 files are there? If they are not, can you either extract db.zip or throw a sensible error message, so the user is reminded that they have not extracted them?
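
A minimal sketch of such a check, in Python, might look like the following. The function name `ensure_db` and its parameters are hypothetical. It also assumes that `db.zip` sits next to the `db` directory and unpacks to `db/k6` and `db/k7`, which should be verified against the actual archive layout.

```python
import zipfile
from pathlib import Path


def ensure_db(base_dir, required=("db/k6", "db/k7"), archive="db.zip"):
    """Make sure the reference files exist, extracting the archive if needed.

    Raises FileNotFoundError with a readable message if files are still
    missing after the extraction attempt.
    """
    base_dir = Path(base_dir)

    def missing_files():
        return [name for name in required if not (base_dir / name).exists()]

    if not missing_files():
        return

    archive_path = base_dir / archive
    if archive_path.is_file():
        # Assumption: the archive stores paths relative to base_dir.
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(base_dir)

    still_missing = missing_files()
    if still_missing:
        raise FileNotFoundError(
            "Reference DB files not found: {}. Did you extract {}?".format(
                ", ".join(still_missing), archive
            )
        )
```

Calling this once at start-up, before the reference DB is loaded, would turn a confusing downstream failure into a single clear message.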

0-count kmers failing

If a user adds some organisms to their database but an organism has 0 k-mers identified, loadDB will fail because you can't normalize when sum == 0. These instances should be ignored.
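
One way to ignore those entries is to guard the division and skip any profile whose counts sum to zero. The sketch below uses plain Python lists rather than the arrays FOCUS itself works with. The names `normalise` and `load_profiles` are illustrative only and do not mirror the real `loadDB` code.

```python
def normalise(counts):
    """Return relative abundances, or None if the counts sum to zero."""
    total = sum(counts)
    if total == 0:
        return None
    return [c / total for c in counts]


def load_profiles(raw_counts):
    """Normalise each organism's k-mer counts, skipping empty profiles.

    `raw_counts` maps an organism name to its list of k-mer counts.
    Returns the normalised profiles and the names that were skipped.
    """
    profiles = {}
    skipped = []
    for name, counts in raw_counts.items():
        normalised = normalise(counts)
        if normalised is None:
            skipped.append(name)  # nothing to normalise, so ignore it
        else:
            profiles[name] = normalised
    return profiles, skipped
```

Returning the skipped names lets the caller log a warning for each ignored organism instead of dropping them silently.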

[Exception] "has no k-mers count. Probably not valid file."

Hi,

I'm trying to run FOCUS on assembled metagenome reads.

I am getting the following exception:

[2020-02-13 06:16:20,955 - INFO] FOCUS: An Agile Profiler for Metagenomic Data
[2020-02-13 06:16:20,956 - INFO] OUTPUT: test_output_pravaler does not exist - just created it :)
[2020-02-13 06:16:20,956 - INFO] 1) Loading Reference DB
[2020-02-13 06:16:22,441 - INFO] 2) Reference DB was loaded with 2785 reference genomes
[2020-02-13 06:16:22,442 - INFO] 3.1) Working on: 4DHI2c-formatted.fna
[2020-02-13 06:16:22,442 - INFO]    Counting k-mers
Failed to open input file 'kmer_counting_0.5626527946381891021'
Failed to open input file 'kmer_counting_0.562652794638189'
rm: cannot remove 'kmer_counting_0.562652794638189': No such file or directory
Traceback (most recent call last):
  File "/home/vini/anaconda3/envs/focus/bin/focus", line 12, in <module>
    sys.exit(main())
  File "/home/vini/anaconda3/envs/focus/lib/python3.8/site-packages/focus_app/focus.py", line 310, in main
    query_count = normalise(count_kmers(Path(query, temp_query), kmer_size, threads, kmer_order))
  File "/home/vini/anaconda3/envs/focus/lib/python3.8/site-packages/focus_app/focus.py", line 148, in count_kmers
    raise Exception('{} has no k-mers count. Probably not valid file'.format(query_file))
Exception: test/4DHI2c-formatted.fna has no k-mers count. Probably not valid file

However, my FASTA file is perfectly valid. On my previous attempt, in which I got the same error, I noticed lower-case sequences in my FASTA file. I thought converting them all to upper case would help, but it still gave me the same error. I do have ambiguous nucleotides (N characters) in my reads; perhaps that is the problem? I tried the FOCUS command with a small file (3 sequences, < 200 bp each), and it worked fine.

Thank you for any assistance you can provide,

V
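
To narrow this down, it may help to summarise the input before running FOCUS. The Python sketch below counts sequences, total bases, and characters other than A, C, G, and T. It is a diagnostic aid only. Whether N characters are actually the cause here is not established, and `summarise_fasta` is a hypothetical helper, not part of FOCUS.

```python
from collections import Counter


def summarise_fasta(lines):
    """Count sequences, total bases, and non-ACGT characters in FASTA lines."""
    n_seqs = 0
    bases = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1  # header line starts a new record
        else:
            bases.update(line.upper())  # count bases case-insensitively
    total = sum(bases.values())
    acgt = sum(bases[b] for b in "ACGT")
    return {"sequences": n_seqs, "total_bp": total, "non_acgt": total - acgt}
```

For example, `with open("test/4DHI2c-formatted.fna") as fh: print(summarise_fasta(fh))` would show how much of the file consists of ambiguous characters, which can then be compared against the small file that worked.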
