merenlab / anvio

An analysis and visualization platform for 'omics data

Home Page: http://merenlab.org/software/anvio

License: GNU General Public License v3.0

Shell 1.74% Python 84.81% HTML 2.51% CSS 0.85% JavaScript 8.91% R 1.03% Dockerfile 0.09% Makefile 0.06%
metagenomics metatranscriptomics pangenomics comparative-genomics science visualization bioinformatics phylogenomics population-genetics python

anvio's Introduction

Releases

The GitHub releases page lists all stable releases of anvi'o.

Installation and tutorials

The anvi'o project page gives access to installation manuals, user tutorials, and other sweets.

Help on anvi'o programs and artifacts

The anvi'o help pages describe individual anvi'o programs as well as artifacts they consume or produce.

Coding style considerations

Please see relevant discussions.

Community chat

Click this link to join the anvi'o Discord channel.

Others on anvi'o

Read our user testimonials.

anvio's People

Contributors

blankenberg, ctb, dogancankilment, eburgoswisc, efogarty11, ekiefl, farukuzun, floriantrigodet, ge0rges, gkmngrgn, gokmen, isaacfink21, ivagljiva, jessica-pan, jessika-fuessel, kekananen, mahmoudyousef98, matthewlawrenceklein, meren, metehaansever, mooreryan, mschecht, ozcan, qclayssen, semiller10, shaiberalon, srinidhi202, telatin, vinisalazar, watsonar

anvio's Issues

annotation.py needs love.

Tables with the annotation_ prefix should use orfs_ or functional_ as a prefix instead. It will clear up a great deal of confusion.

Also, papi-gen-annotation should behave identically to papi-populate-*.

Colors change between views

From one view (mean_coverage):

image

To another (standard_dev):

image

Thank you Ozcan :) Let me know when you feel overwhelmed! :)

PhymmBL

Anyone who wants to use PhymmBL annotation will need to add this line to scoreReads.pl script that comes with the PhymmBL distribution:

use Cwd; use File::Basename; chdir(dirname($ARGV[0])) or die "cannot change working directory: $!\n";

I don't even know how I am going to check this other than putting it in the documentation.

check contig names for silly characters

check_contig_names is in utils. It needs to be filled in and called from a reasonable place.

Sometimes BAM files contain contigs with characters that shouldn't be there. This needs to be pointed out before profiling.
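A minimal sketch of what check_contig_names could look like; the accepted character set and the error style are assumptions, not the actual utility:

```python
import re

def check_contig_names(contig_names):
    """Reject contig names containing characters that tend to break
    downstream tools. The whitelist (letters, digits, '_', '.', '-')
    is an assumption for this sketch.
    """
    allowed = re.compile(r'^[A-Za-z0-9_.-]+$')
    bad = [name for name in contig_names if not allowed.match(name)]
    if bad:
        raise ValueError("%d contig name(s) contain characters that "
                         "shouldn't be there, e.g. '%s'" % (len(bad), bad[0]))
    return True
```

Calling this once before profiling would surface problematic BAM headers early instead of failing deep inside an analysis.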

Timeseries SVG reporting problem

When SVG files for projects created by merging multiple profiles are exported from the interface, the SVG contains no report about the layers. If you can't reproduce this error, let me know and I'll put a sample dataset somewhere :)

interactive-binning += "Title"

When more than one project is open, it becomes impossible for the user to keep track of which project they are working on in which window.

With commit fe51b81 I added a data hook; it is now possible to get the project's name from within JavaScript:

image

It would be delightful to display this somewhere prominent in the upper left corner of the tree.

MERGE_RUNS

Hey awesome dudes! I am trying to merge runs and I have encountered this error. I successfully merged runs for other groups of samples and this set is no different other than the number of samples being processed.
Thanks for the help

jvineis@rocket:papi-merge-multiple-runs 204_*/RUNINFO.cPickle -o MERGED_RUNS
output_dir .......................................................: /automounts/bpcstorage01/production/users/jvineis/HMP_temp/204/BAM/MERGED_RUNS
num_runs_processed ...............................................: 5
num_splits_found .................................................: 2,926
contigs_total_length .............................................: 152,442
contigs_fasta ....................................................: /automounts/bpcstorage01/production/users/jvineis/HMP_temp/204/BAM/MERGED_RUNS/CONTIGS-CONSENSUS.fa
tnf_matrix .......................................................: /automounts/bpcstorage01/production/users/jvineis/HMP_temp/204/BAM/MERGED_RUNS/TETRANUCLEOTIDE-FREQ-MATRIX.txt
[05 Aug 14 10:13:31 Generating TNF tree] ... Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-merge-multiple-runs", line 315, in
MultipleRuns(args).merge()
File "/groups/merenlab/PaPi/bin/papi-merge-multiple-runs", line 119, in merge
tnf_tree = self.generate_tnf_tree()
File "/groups/merenlab/PaPi/bin/papi-merge-multiple-runs", line 156, in generate_tnf_tree
PaPi.utils.get_newick_tree_data(self.run.info_dict['tnf_matrix'], newick_tree_file_path)
File "/groups/merenlab/PaPi/PaPi/utils.py", line 566, in get_newick_tree_data
normalized_vector = [p / denominator for p in vector]
ZeroDivisionError: float division by zero
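The crash above comes from normalizing a TNF vector whose entries sum to zero. A minimal sketch of a guard, assuming an all-zero vector can simply be passed through (the actual fix may prefer to warn about or drop such a split):

```python
def normalize_vector(vector):
    """Normalize a vector to sum to 1, guarding against the all-zero
    rows that trigger ZeroDivisionError in get_newick_tree_data.

    Sketch only: here a zero-sum vector is returned unchanged.
    """
    denominator = float(sum(vector))
    if denominator == 0:
        return list(vector)  # nothing to normalize
    return [p / denominator for p in vector]
```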

server ctrl^c

Output messages on the terminal for papi-interactive-binning must be clearer.

GC Content for merged

Change the way GC content is calculated for merged runs. The consensus needs to be taken into account.
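A sketch of the idea, assuming the consensus sequence is available as a plain string; ambiguous characters such as Ns are ignored so they do not distort the ratio:

```python
def gc_content(consensus_sequence):
    """Fraction of G/C among unambiguous bases of a consensus sequence.

    Sketch only: for merged runs this would be fed the consensus
    rather than the original reference contig.
    """
    seq = consensus_sequence.upper()
    gc = seq.count('G') + seq.count('C')
    at = seq.count('A') + seq.count('T')
    total = gc + at  # Ns and other ambiguous characters are ignored
    return gc / total if total else 0.0
```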

Consensus sequences for splits

Profiler should export consensus sequences for splits as well as contigs. If we have access to splits, tools like the phymmbl annotation script will be much less complex, and users will be able to use them on data that were not generated by PaPi.

Critical DB issue

Well, I am getting this error with large files; it does not happen with smaller ones.

This may be a db.commit() issue. Needs to be checked properly:

meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al
===============================================
Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al
===============================================
Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
  File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in <module>
    main(args)
  File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
    g.populate_search_tables(annotation_db, sources)
  File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
    search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
  File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
    self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
  File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
    self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
  File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
    ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists
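One way to avoid the crash above is to make table creation idempotent; whether append() should instead detect the existing table and add rows, or fail with a clearer message, is a separate design decision. A minimal sketch:

```python
import sqlite3

def create_table_if_missing(conn, table_name, db_fields):
    """Create a table only if it does not already exist, sidestepping
    the 'table search_info already exists' OperationalError.

    Sketch only; the real db.create_table may want to inspect
    sqlite_master and decide explicitly whether to append or abort.
    """
    conn.execute('CREATE TABLE IF NOT EXISTS %s (%s)' % (table_name, db_fields))
```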

papi-profile: --contigs vs --splits

When papi-profile is used with a PROFILE.cPickle as input, the user may want to specify the splits to retain from the results by their split names (obtained through the web interface) instead of contig names. When --contigs is used with split names, it produces an error. Something must be done about this.

Selecting contigs from tree

It would be really great if we could remove/select nodes from the tree by clicking on the outer layers instead of on the leaves of the tree, which can be very dense and difficult to select.

Phylogram view

The circular tree causes performance problems when there are too many layers. To get around this, we should be able to offer a non-circular tree drawing option.

Merging SVG bugs

A little bug report for Ozcan.

I ran into this with infant 1kb. Here is the circular tree; that missing white piece at the end is:

image

This is a screenshot from the phylogram view. It seems group names do not correspond to the colors shown, and there are white ones even where there is a group assignment:

image

For instance, everything shown here is Group_28 (according to the mouse-over menu) :)

image

Best,

Work with relative paths..

At this point full paths are embedded in RUNINFO.cp and SUMMARY.cp files. The merge-multiple-runs and papi-interactive-binning scripts contain procedures to fix directories when these files are carried over from a different machine.

If PaPi worked with relative paths, life would be much easier.
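A sketch of the conversion, using a plain dict to stand in for a RUNINFO-style structure (the key names are illustrative only):

```python
import os

def make_paths_relative(runinfo_dict, base_dir):
    """Turn absolute paths stored in a RUNINFO-style dict into paths
    relative to the project directory, so the project can move
    between machines without path-fixing procedures.

    Sketch only; non-path values are passed through untouched.
    """
    return {key: (os.path.relpath(value, base_dir)
                  if isinstance(value, str) and os.path.isabs(value)
                  else value)
            for key, value in runinfo_dict.items()}
```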

SUMMARY.cp

RUNINFO.cp works with relative paths now, but the merger still puts full paths in SUMMARY.cp.

annotation.db issue

If papi-populate-genes-table has not been run on an annotation db, papi-interactive crashes.

mock data to test performance

Create a mock dataset to test performance. Now that we are measuring tree drawing time, it could be useful to mention an "expected" time for the test data to be drawn on a mid-level computer; any computer that takes much longer than that value would not be the best platform to run PaPi.

split length

Split length should be a standard layer for merged runs, just like GC content.

A fix to stop joe's complaints

My current installation of PaPi (the current version downloaded from github) is returning this error

jvineis@Joes-MacBook-Pro-2: papi-interactive-binning -r RUNINFO-mean_coverage.cp
Traceback (most recent call last):
File "/Applications/PaPi/bin/papi-interactive-binning", line 81, in
d = interactive.InputHandler(parser.parse_args())
File "/Applications/PaPi/PaPi/interactive.py", line 62, in init
self.load_from_runinfo_dict(args)
File "/Applications/PaPi/PaPi/interactive.py", line 216, in load_from_runinfo_dict
self.profile_db = PaPi.db.DB(self.P(self.runinfo['profile_db']), PaPi.profiler.version)
KeyError: 'profile_db'

metadata.py

Metadata.py is a mess. Profiler needs a lib similar to annotation.py where all database operations are handled.

Taxonomy - Metadata

TAXONOMY.txt may have more entries than necessary, as long as it contains every instance that appears in METADATA.txt. This will make it more efficient to reuse one TAXONOMY.txt across different merging operations.

search box to highlight matching contigs

A first in the history of computing: a feature request in verse:

While Philae sits all alone on a slope of 67P,
I think of PaPi with my eyes closed.
If only, I say, there were a search box,
And PaPi went and searched the contig names
For the text we typed into it,
And highlighted on the tree whatever it found...
   And what is more,
      maybe a button in the search tab,
          to say, for instance, "Take these, add them next to the other groups",
               if only that were possible, baby, oh if only it were possible...

Contigs <-> Annotation DB

Each annotation database should keep the sha1sum of the FASTA file it is generated from. Right now it is possible to generate an annotation table, then populate it with search results coming from a totally different FASTA file.

This mistake should not be possible to make.
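A sketch of the check: hash the FASTA once when the database is created, store the digest, and compare it before accepting search results. Chunked reading keeps memory flat for large files.

```python
import hashlib

def fasta_hash(fasta_path):
    """sha1 digest of a FASTA file, so an annotation database can
    record which file it came from and later refuse search results
    generated from a different FASTA.

    Sketch only; where the digest is stored is up to the db layer.
    """
    h = hashlib.sha1()
    with open(fasta_path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()
```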

annotation.db needs to know which tables are populated

self table in the annotation.db should be updated when any of the papi-populate-*-table scripts are run.

it should be clear to the profiler whether a papi-populate-*-table was run and the results were empty, or it was never run at all.
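A sketch of the bookkeeping, using a plain dict to stand in for the self table (the key names are assumptions):

```python
def mark_table_populated(self_table, table_name, num_entries):
    """Record that a papi-populate-*-table script ran and how many
    entries it produced, so the profiler can tell 'ran but empty'
    apart from 'never ran'.

    Sketch only: self_table here is a dict standing in for the real
    self table in annotation.db.
    """
    self_table['%s_was_run' % table_name] = True
    self_table['%s_num_entries' % table_name] = num_entries
```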

Inclusion of high coverage elements

Could we find a way to include contigs that are highly abundant but too short to make it into the assembly? Perhaps we could use a percentage approach: compute the relative abundance of all the short contigs, and if they are greater than 0.1 percent (or some selected value) of the total dataset, form them into a separate bin.

We don't want to miss these contigs!
screen shot 2014-08-11 at 10 57 55 am
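The percentage approach could look like this sketch, where coverages maps contig name to total mapped coverage and the 0.1% default mirrors the value suggested above:

```python
def high_coverage_short_contigs(coverages, min_fraction=0.001):
    """Return the contigs whose share of the total coverage meets the
    threshold (0.1% by default), candidates for a separate bin.

    Sketch only; how 'short' contigs are identified upstream is out
    of scope here.
    """
    total = float(sum(coverages.values()))
    if not total:
        return []
    return [name for name, cov in coverages.items()
            if cov / total >= min_fraction]
```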

Contigs with Ns

Hi PaPi,
Could you set a flag in the profiling to ignore contigs with Ns, so they are not included in the analysis? Many assemblers insert Ns for various reasons, and it would be great to be able to control for this in the analysis.
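A sketch of the requested behavior, with a skip_Ns flag standing in for whatever the real command-line option would be called:

```python
def filter_contigs_with_Ns(contigs, skip_Ns=True):
    """Drop contigs containing Ns when the flag is set; contigs maps
    contig name -> sequence.

    Sketch only; the flag name is made up for illustration.
    """
    if not skip_Ns:
        return dict(contigs)
    return {name: seq for name, seq in contigs.items()
            if 'N' not in seq.upper()}
```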

Selection Control

This just happened to me: I had a tree with various selections, and then while I was moving it around I inadvertently clicked on one of the root branches. All my selections were overwritten by this new selection, and I had to wait a minute for PaPi to add 1.5K contigs into a bin.

Maybe there should be a popup to warn the user with something like "Yo, you just requested to bin XXX splits into Group X. Do you wish to continue?" if the user attempts to select more than, e.g., 500 splits.

Specialized binning for a group

In some cases TNF resolves a fine cluster, yet more than one genome can be mixed within that cluster with different coverages. There must be a way to focus on that clade and re-order contigs based on some other information (without breaking up splits), such as coverage.
