cjprybol / mycelia
License: MIT License
Evidence = Dict(sequence_identifier => (index: Int, orientation: Bool) for kmer in dataset)
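Below is a minimal Julia sketch of the evidence structure above. It assumes the notation means that each canonical kmer maps sequence identifiers to the (index, orientation) observations of that kmer in that sequence. Observations are stored in a vector because a kmer can occur more than once in the same sequence. `collect_evidence`, `canonical` and `Observation` are hypothetical names, and only uppercase A/C/G/T input is handled (no N's).

```julia
# Hypothetical sketch: per-kmer evidence of which sequence, position and strand each kmer came from.
# Assumes uppercase A/C/G/T sequences only (N's are not handled).
const Observation = NamedTuple{(:index, :orientation), Tuple{Int, Bool}}

function reverse_complement(s::AbstractString)
    comp = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')
    return String([comp[c] for c in reverse(s)])
end

# canonical form = lexicographically smaller of the kmer and its reverse complement
canonical(kmer::AbstractString) = min(kmer, reverse_complement(kmer))

function collect_evidence(records::Vector{Pair{String, String}}, k::Int)
    # canonical kmer => (sequence identifier => observations)
    evidence = Dict{String, Dict{String, Vector{Observation}}}()
    for (identifier, sequence) in records
        for i in 1:(length(sequence) - k + 1)
            kmer = sequence[i:i+k-1]
            ckmer = canonical(kmer)
            per_sequence = get!(() -> Dict{String, Vector{Observation}}(), evidence, ckmer)
            observations = get!(() -> Observation[], per_sequence, identifier)
            # orientation = true when the kmer was seen in its canonical orientation
            push!(observations, (index = i, orientation = (kmer == ckmer)))
        end
    end
    return evidence
end
```

For example, `collect_evidence(["seq1" => "ACGTT"], 3)` folds ACG and CGT (reverse complements of each other) into one canonical entry with two observations of opposite orientation.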
https://www.youtube.com/watch?v=F4RVBAGJcFY
https://juliagraphs.org/Graphs.jl/dev/centrality/#Graphs.betweenness_centrality
Modularity is a measure of network (graph) structure that quantifies how strongly a network divides into modules (also called groups, clusters, or communities). Networks with high modularity have dense connections between nodes within the same module but sparse connections between nodes in different modules. Modularity is often used as the objective in optimization methods for detecting community structure in networks. However, modularity has been shown to suffer from a resolution limit, so it may be unable to detect communities below a certain size. Biological networks, including animal brains, exhibit a high degree of modularity.
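For reference, the standard (Newman-Girvan) modularity of a partition is Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j). Here A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, and δ(c_i, c_j) = 1 when nodes i and j are in the same community. The sketch below computes that formula in Base Julia for an undirected graph. `modularity` here is a hypothetical standalone helper, not the Graphs.jl API.

```julia
# Newman-Girvan modularity of a partition of an undirected graph.
# A: symmetric adjacency matrix; communities[i]: community label of node i.
function modularity(A::AbstractMatrix{<:Real}, communities::AbstractVector{<:Integer})
    degrees = vec(sum(A; dims = 2))   # k_i
    two_m = sum(degrees)              # 2m: each undirected edge is counted twice
    Q = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        if communities[i] == communities[j]
            # observed edge weight minus the expected weight k_i * k_j / 2m
            Q += A[i, j] - degrees[i] * degrees[j] / two_m
        end
    end
    return Q / two_m
end
```

For two triangles joined by a single edge, splitting the nodes into the two triangles gives Q = 5/14 ≈ 0.357. Putting all six nodes in one community gives Q = 0.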
If we pick starting nodes for graph traversal at random, even with a weighted sample, we could still accidentally grab a node that sits on a very low-frequency or erroneous offshoot path.
By taking all nodes with degree >= 3, we always have the option to take the better of the two or more adjacent paths.
Possibly consider taking a weighted subset of these hub nodes rather than all of them. But taking all of them gives a good way of indexing the graph completely.
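The start-node strategy above can be sketched in Julia as follows. The sketch assumes that ">= 3" refers to node degree, that the graph is an adjacency list indexed by node, and that the kmer count of each node stands in for the "better" path. `hub_nodes` and `best_neighbor` are hypothetical names.

```julia
# Hypothetical sketch: every node with degree >= min_degree is used as a traversal start node.
function hub_nodes(adjacency::Vector{Vector{Int}}; min_degree::Int = 3)
    return [v for v in eachindex(adjacency) if length(adjacency[v]) >= min_degree]
end

# From a node, step to the adjacent node with the highest count (assumed measure of "better").
# argmax returns the first maximum, so ties go to the first neighbor listed.
function best_neighbor(adjacency::Vector{Vector{Int}}, counts::Vector{Int}, v::Int)
    neighbors = adjacency[v]
    isempty(neighbors) && return nothing
    return neighbors[argmax(counts[neighbors])]
end
```

Taking every hub rather than a random or weighted sample means a low-count offshoot is never the only entry point, because each branch point offers its higher-count alternative.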
can explore by:
https://www.microbiome-cosi.org/cami/cami/cami2
https://edwards.flinders.edu.au/cami-challenge-datasets/
https://data.cami-challenge.org/participate
https://github.com/CAMI-challenge/data
https://zenodo.org/communities/cami?q=&l=list&p=1&s=10&sort=newest
https://www.microbiome-cosi.org/cami/resources
Need to handle N's and proteins
Grist (genome-grist) would be good to try for long-read and/or contig classification
https://dib-lab.github.io/genome-grist/output-guide/
I'll need to build my own RefSeq database, but that shouldn't be too bad; I have the scripts for that download
Can also use the underlying commands sourmash gather and sourmash tax
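algorithm:
"""
    jellyfish_count(; fasta::String, k::Int, directory::String, jellyfish_path::String="jellyfish")

Count canonical kmers in `fasta` with jellyfish and write them to `directory` as a
tab-separated `<id>.<k>.counts` file. Skips counting if that file already exists and
returns its path.
"""
function jellyfish_count(; fasta::String, k::Int, directory::String, jellyfish_path::String="jellyfish")
    # sample id = file name up to the first '.', e.g. "/data/sample.fna" -> "sample"
    id = first(split(basename(fasta), '.'))
    counts_file = joinpath(directory, "$id.$k.counts")
    if !isfile(counts_file)
        mkpath(directory)
        jf_file = joinpath(directory, "$id.$k.jf")
        # -s 100M: initial hash size; --canonical: count each kmer together with its reverse complement
        run(`$(jellyfish_path) count -m $k -s 100M --canonical -o $jf_file $fasta`)
        # -c: column format, -t: tab separator, i.e. one "kmer<TAB>count" line per kmer
        run(`$(jellyfish_path) dump -ct -o $counts_file $jf_file`)
        rm(jf_file)
    end
    return counts_file
end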
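The counts file written by `jellyfish dump -ct` has one tab-separated kmer/count pair per line. Below is a minimal sketch of loading it back into Julia. `read_jellyfish_counts` is a hypothetical helper and is not part of the existing code.

```julia
# Hypothetical helper: load the tab-separated output of `jellyfish dump -ct`
# (one "kmer<TAB>count" pair per line) into a Dict.
function read_jellyfish_counts(counts_file::AbstractString)
    counts = Dict{String, Int}()
    for line in eachline(counts_file)
        isempty(line) && continue
        kmer, n = split(line, '\t')
        counts[String(kmer)] = parse(Int, n)
    end
    return counts
end
```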
When we create an induced subgraph or update the graph, the kmer_counts data structure is no longer correct and needs to be rebuilt. It should just be rebuilt on the fly each time from the metadata attached to the node weights.
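Below is a sketch of rebuilding counts on the fly rather than caching them. It assumes that each node's metadata is a vector of (sequence identifier, index, orientation) observations, so that a kmer's count is simply the number of observations. `KmerNode`, `kmer_counts` and `induced_counts` are hypothetical names.

```julia
# Hypothetical node type: a kmer plus the evidence attached to it.
struct KmerNode
    kmer::String
    evidence::Vector{Tuple{String, Int, Bool}}  # (sequence identifier, index, orientation)
end

# Counts are derived from node metadata every time, so they cannot go stale.
kmer_counts(nodes::AbstractVector{KmerNode}) =
    Dict(node.kmer => length(node.evidence) for node in nodes)

# An induced subgraph only needs to know which nodes it keeps.
induced_counts(nodes::Vector{KmerNode}, keep::AbstractVector{Int}) = kmer_counts(nodes[keep])
```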
rough algorithm:
When working with observational data (fastq files) rather than reference data (fasta files), use a simplified kmer graph that records only the number of supporting observations as an Int, rather than recording each piece of evidence (e.g. record identifier, index, orientation) individually.
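Below is a minimal sketch of the count-only representation. Each canonical kmer maps to an Int, and no per-observation metadata is kept. It assumes uppercase A/C/G/T reads, and `count_canonical_kmers` is a hypothetical name.

```julia
# Reverse complement of an A/C/G/T string (N's are not handled).
function reverse_complement(s::AbstractString)
    comp = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')
    return String([comp[c] for c in reverse(s)])
end

# Count-only kmer table: canonical kmer => number of supporting observations.
function count_canonical_kmers(sequences::Vector{String}, k::Int)
    counts = Dict{String, Int}()
    for sequence in sequences, i in 1:(length(sequence) - k + 1)
        kmer = sequence[i:i+k-1]
        ckmer = min(kmer, reverse_complement(kmer))
        counts[ckmer] = get(counts, ckmer, 0) + 1
    end
    return counts
end
```

The memory cost per kmer is one Int rather than one record per observation, which is the point of the simplification for large fastq datasets.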
https://github.com/GATB/bcalm
conda install -c conda-forge -c bioconda bcalm
https://github.com/pmelsted/bifrost
conda install -c bioconda bifrost
They're meant to be a record of development over time, not an always-up-to-date snapshot of the current state.