
mycelia's People

Contributors

cjprybol, github-actions[bot]

Forkers

snystrom, jelber2

mycelia's Issues

Use community detection for separating out protein communities and genome communities

https://www.youtube.com/watch?v=F4RVBAGJcFY

https://juliagraphs.org/Graphs.jl/dev/centrality/#Graphs.betweenness_centrality

Modularity is a measure of the structure of networks or graphs that quantifies the strength of division of a network into modules (also called groups, clusters, or communities). Networks with high modularity have dense connections between nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. However, modularity has been shown to suffer from a resolution limit, so it can fail to detect small communities. Biological networks, including animal brains, exhibit a high degree of modularity.

https://juliagraphs.org/Graphs.jl/dev/community/#Graphs.modularity-Tuple{AbstractGraph,%20AbstractVector{%3C:Integer}}

https://en.wikipedia.org/wiki/Louvain_method
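To make the modularity definition above concrete, here is a minimal, dependency-free sketch in Julia that scores a given partition. The name `modularity_score` and the adjacency-list input are assumptions for illustration; the Graphs.jl `modularity` function linked above is the library route. This only scores a partition; a method such as Louvain searches for the partition that maximizes the score.

```julia
# Hypothetical helper, not part of mycelia: Newman modularity of a partition
# for an undirected, unweighted graph stored as an adjacency list.

"""
    modularity_score(adj, communities)

`adj[i]` lists the neighbours of node `i` (each undirected edge appears in
both endpoint lists); `communities[i]` is the community label of node `i`.
Returns Q = (1/2m) * sum_ij [A_ij - k_i * k_j / 2m] * delta(c_i, c_j).
"""
function modularity_score(adj::Vector{Vector{Int}}, communities::Vector{Int})
    degrees = length.(adj)
    two_m = sum(degrees)                      # 2m: each edge counted from both ends
    two_m == 0 && return 0.0
    # observed: adjacency entries whose endpoints share a community
    internal = 0
    for i in eachindex(adj), j in adj[i]
        internal += communities[i] == communities[j]
    end
    # expected: per community, (sum of member degrees)^2 / (2m)^2
    degree_sums = Dict{Int,Int}()
    for (i, d) in enumerate(degrees)
        degree_sums[communities[i]] = get(degree_sums, communities[i], 0) + d
    end
    expected = sum(s^2 for s in values(degree_sums))
    return internal / two_m - expected / two_m^2
end

# two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4
adj = [[2, 3], [1, 3], [1, 2, 4], [3, 5, 6], [4, 6], [4, 5]]
modularity_score(adj, [1, 1, 1, 2, 2, 2])   # 5/14 ≈ 0.357
modularity_score(adj, [1, 1, 1, 1, 1, 1])   # 0.0
```

Putting every node in one community always gives Q = 0, so a positive score means the split captures more internal edges than a random graph with the same degrees would.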

Use degree >= 3 nodes for pre-indexing targets

If we choose starting nodes for traversing the graph at random, even with a weighted sample, we could still accidentally pick a node that sits on a very low-frequency or erroneous offshoot path.

By taking all nodes with degree >= 3, we always have the option to take the better of the two or more adjacent paths.

Possibly consider taking a weighted subset of these hub nodes rather than all of them. However, taking all of them gives us a good way to index the graph completely.
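Selecting hub nodes, and optionally a weighted subset of them, can be sketched as follows. The names `hub_nodes` and `weighted_hub_subset`, the undirected adjacency-list input, and the positive per-node weights (for example, coverage) are assumptions; in a directed k-mer graph, degree might instead mean in-degree plus out-degree. The subset uses Efraimidis–Spirakis weighted sampling without replacement, which is one reasonable choice among several.

```julia
using Random

# Hypothetical helpers, not part of mycelia.

"""
    hub_nodes(adj; min_degree = 3)

Return every node whose degree (number of neighbours in `adj`) is at least `min_degree`.
"""
hub_nodes(adj::Vector{Vector{Int}}; min_degree::Int = 3) =
    [v for v in eachindex(adj) if length(adj[v]) >= min_degree]

"""
    weighted_hub_subset(hubs, weights, n; rng)

Sample up to `n` hubs without replacement, favouring larger `weights` (assumed > 0):
each hub draws key u^(1/w) with u ~ Uniform(0, 1), and the `n` largest keys win.
"""
function weighted_hub_subset(hubs::Vector{Int}, weights::Vector{Float64}, n::Int;
                             rng::AbstractRNG = Random.default_rng())
    sample_keys = [rand(rng)^(1 / w) for w in weights]
    order = sortperm(sample_keys; rev = true)
    return hubs[order[1:min(n, length(hubs))]]
end

adj = [[2, 3], [1, 3], [1, 2, 4], [3, 5, 6], [4, 6], [4, 5]]
hubs = hub_nodes(adj)                        # [3, 4]
weighted_hub_subset(hubs, [10.0, 1.0], 1)    # usually [3]
```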

Can explore by:

  1. weighted walk forward from A until we find a hub node or find B
  2. weighted walk forward from B until we find a hub node or find A
  3. use the pre-calculated shortest path between the hub nodes reached from A and B
  4. either stop there, with a path that is possibly, but not necessarily, the shortest
  5. or continue shortest-path searches from A and B UP UNTIL they are equal to or longer than the pre-calculated hub-to-hub route, at which point we know for sure which path is shortest
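Steps 1 and 2 (a weighted walk that stops at a hub or at the other endpoint) might look like the sketch below. The name `walk_to_hub`, the out-edge representation as `(neighbour, weight)` pairs with positive weights (such as edge coverage), and the `max_steps` guard against cycles are assumptions, not decisions recorded in the issue.

```julia
using Random

# Hypothetical helper, not part of mycelia.

"""
    walk_to_hub(out_edges, start, target, is_hub; max_steps, rng)

Weighted random walk forward from `start`: each next node is chosen with
probability proportional to its edge weight. Stops at `target`, at a hub,
at a dead end, or after `max_steps`. Returns the visited path.
`out_edges[v]` holds `(neighbour, weight)` pairs with weights > 0.
"""
function walk_to_hub(out_edges::Vector{Vector{Tuple{Int,Float64}}}, start::Int,
                     target::Int, is_hub::AbstractVector{Bool};
                     max_steps::Int = 10_000,
                     rng::AbstractRNG = Random.default_rng())
    path = [start]
    (start == target || is_hub[start]) && return path
    current = start
    for _ in 1:max_steps
        edges = out_edges[current]
        isempty(edges) && break                   # dead end: no way forward
        # roulette-wheel selection proportional to edge weight
        r = rand(rng) * sum(last, edges)
        chosen = first(last(edges))               # fallback for float rounding
        for (v, w) in edges
            r -= w
            if r <= 0
                chosen = v
                break
            end
        end
        push!(path, chosen)
        current = chosen
        (current == target || is_hub[current]) && break
    end
    return path
end

# chain 1 -> 2 -> 3 -> 4 where node 3 is a hub and node 4 is B
out_edges = [[(2, 1.0)], [(3, 1.0)], [(4, 1.0)], Tuple{Int,Float64}[]]
walk_to_hub(out_edges, 1, 4, [false, false, true, false])   # [1, 2, 3]
```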

Finish PoC joint-probability assembler + variant caller by passing my polished graphs into the pggb/odgi+vg variant deconvolution flow

algorithm:

  • deconvolve reads into kmer graph
  • iterative correction, starting at the first k-length that shows sparsity and ending at the first k-length with no updates (there may be better terminating conditions; should look into that)
  • write graph out as GFA (try compacted and raw)
  • determine primary contigs using a shortest path with length L = 1 / total bases, where total bases = length * average depth (or the actual lossless alignment calculation); alternatively, to start, accept external dependencies and try metaFlye or https://github.com/lh3/minigraph
  • use primary contigs as reference for generating variant calls using ODGI -> VG flow, or possibly VG directly
  • ensure that VG variant calling can use coverage information - if not, update VCF files with depth-of-coverage information
    https://odgi.readthedocs.io/en/latest/rst/commands/odgi_build.html
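One way to read "L = 1 / total bases" is as a per-node cost, so that the shortest path prefers well-supported sequence. Under that assumption, the sketch below runs Dijkstra's algorithm with a linear scan instead of a priority queue. The name `support_shortest_path` and the node-weighted formulation are hypothetical, and the alternative "lossless alignment calculation" is not modeled.

```julia
# Hypothetical helper, not part of mycelia.

"""
    support_shortest_path(out_neighbors, node_len, node_depth, source, sink)

Dijkstra's algorithm where visiting node `v` costs 1 / (node_len[v] * node_depth[v]),
i.e. L = 1 / total bases, so well-supported nodes are cheap to traverse.
Uses an O(V^2) linear scan instead of a heap. Returns the node path from
`source` to `sink`, or an empty vector if `sink` is unreachable.
"""
function support_shortest_path(out_neighbors::Vector{Vector{Int}}, node_len::Vector{Int},
                               node_depth::Vector{Float64}, source::Int, sink::Int)
    n = length(out_neighbors)
    cost = 1.0 ./ (node_len .* node_depth)     # L = 1 / (length * average depth)
    dist = fill(Inf, n)
    prev = zeros(Int, n)
    done = falses(n)
    dist[source] = cost[source]
    for _ in 1:n
        # select the unfinished node with the smallest tentative distance
        u, best = 0, Inf
        for v in 1:n
            if !done[v] && dist[v] < best
                u, best = v, dist[v]
            end
        end
        (u == 0 || u == sink) && break          # rest unreachable, or sink settled
        done[u] = true
        for v in out_neighbors[u]
            alt = dist[u] + cost[v]
            if alt < dist[v]
                dist[v], prev[v] = alt, u
            end
        end
    end
    isinf(dist[sink]) && return Int[]
    # follow predecessors back from the sink
    path = [sink]
    while first(path) != source
        pushfirst!(path, prev[first(path)])
    end
    return path
end

# bubble: 1 -> 2 -> 4 (deep coverage) vs 1 -> 3 -> 4 (likely error)
out_neighbors = [[2, 3], [4], [4], Int[]]
support_shortest_path(out_neighbors, [10, 10, 10, 10], [20.0, 30.0, 2.0, 20.0], 1, 4)  # [1, 2, 4]
```

Because every cost is positive, Dijkstra's greedy settling is valid; a heap (for example from DataStructures.jl) would be the natural upgrade for large graphs.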

add jellyfish counting function

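"""
    jellyfish_count(; fasta::String, k::Int, directory::String, jellyfish_path="jellyfish")

Count canonical k-mers in `fasta` with jellyfish and write a tab-separated
counts table to `directory`. Returns the path of the counts file.
"""
function jellyfish_count(; fasta::String, k::Int, directory::String, jellyfish_path="jellyfish")
    # strip the directory and all extensions, e.g. "data/sample.fasta" -> "sample"
    id = first(split(basename(fasta), '.'))
    counts_file = joinpath(directory, "$id.$k.counts")
    # skip the work if the counts table already exists
    if !isfile(counts_file)
        mkpath(directory)
        jf_file = joinpath(directory, "$id.$k.jf")
        # count canonical k-mers with an initial hash size of 100M
        run(`$(jellyfish_path) count -m $k -s 100M --canonical -o $jf_file $fasta`)
        # dump as two tab-separated columns: k-mer, count
        run(`$(jellyfish_path) dump -c -t -o $counts_file $jf_file`)
        # remove the intermediate binary database
        rm(jf_file)
    end
    return counts_file
end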
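Reading the counts table back into Julia is a natural next step. The sketch assumes the two-column, tab-separated layout requested by the column and tab options to `jellyfish dump` (k-mer, then count); `read_jellyfish_counts` is a hypothetical name, not an existing mycelia function.

```julia
# Hypothetical helper, not part of mycelia.

"""
    read_jellyfish_counts(counts_file)

Load a table written by `jellyfish dump -c -t` (one `kmer<TAB>count` line per
k-mer) into a `Dict` mapping each k-mer to its count.
"""
function read_jellyfish_counts(counts_file::AbstractString)
    counts = Dict{String,Int}()
    for line in eachline(counts_file)
        isempty(line) && continue                # tolerate a trailing blank line
        kmer, n = split(line, '\t')
        counts[String(kmer)] = parse(Int, n)
    end
    return counts
end
```

A Dict keeps lookups simple for modest k; for very large tables, streaming over `eachline` without materializing everything may be preferable.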

Add documentation

  • add basic docstrings for all functions
  • make docs available online
  • add real examples to docstrings to replace placeholders
