dissertation's People

Contributors

camillescott, ctb

dissertation's Issues

comments on Chapter 1, figure "Changes in Fragmentation"

ref fig-chap1-unitig-frag

This looks pretty interesting! I think some more discussion in the figure caption would be great, though. In particular, is there a takeaway?

Please add axis labels, especially on the y axis!

Is there a different way to represent this? Perhaps in an additional figure, since I think this one is also pretty informative? What if you made one where the curves were weighted by unitig size?
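
To make the size-weighting suggestion concrete, here is a rough sketch. It assumes the curves are proportions of unitigs per length bin, which may not match how the figure is actually computed; `binned_proportions` and `bin_edges` are made-up names for illustration.

```python
from bisect import bisect_right

def binned_proportions(unitig_lengths, bin_edges):
    """Return (by_count, by_size) proportions per length bin.

    bin_edges are ascending lower bounds (hypothetical binning); a unitig of
    length L goes in the last bin whose lower bound is <= L.
    """
    counts = [0] * len(bin_edges)
    bases = [0] * len(bin_edges)
    for length in unitig_lengths:
        i = bisect_right(bin_edges, length) - 1
        if i < 0:
            continue  # shorter than the smallest bin; ignored in this sketch
        counts[i] += 1       # unweighted: every unitig counts once
        bases[i] += length   # weighted: every unitig counts by its size
    total_n = sum(counts) or 1
    total_b = sum(bases) or 1
    by_count = [c / total_n for c in counts]
    by_size = [b / total_b for b in bases]
    return by_count, by_size
```

With three short unitigs and one long one, the count-based curve is dominated by the short ones while the size-weighted curve is dominated by the long one, which is the contrast I'm curious about.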

Also, I have questions -

  • why is the proportion of small nodes increasing for yeast towards the end??

comments on Chapter 1, figure "Dynamic metrics of the cDBG during construction"

A few minor questions first -

  • so, the sum of the n_ lines should equal one? and kmer_p always ends at x=1, y=1.
  • if correct, would suggest adding 'proportion of nodes' as y axis label.
  • might also make them a bit bigger so they take up the page width?
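
A quick way to check the sum-to-one guess, as a sketch only: it assumes each time point is available as a dict of column name to value and that every column starting with `n_` is a proportion of nodes. The function name and the `t` column are hypothetical.

```python
def node_proportions_sum_to_one(row, prefix="n_", tol=1e-6):
    """True if all columns starting with `prefix` sum to 1 within `tol`."""
    total = sum(value for key, value in row.items() if key.startswith(prefix))
    return abs(total - 1.0) <= tol

# Hypothetical row; real data may have more (or differently named) n_ columns.
example_row = {
    "t": 0.5,
    "n_tips": 0.25,
    "n_islands": 0.25,
    "n_trivial": 0.4,
    "n_circular": 0.1,
    "kmer_p": 0.7,  # not an n_ column, so it is left out of the sum
}
```

If this fails on the real data, the 'proportion of nodes' label would need rethinking.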

I think the discussion of this figure in the Results section is a good start, but I have lots of questions 😆 -

  • yeast looks pretty different. this is because of high coverage?
  • I still have some trouble understanding the dynamics after looking at this for a while. let's see -
    • n_circular should generally be small, ok
    • n_trivial isn't defined anywhere?
    • the kinks in kmer_p are from non-random sequencing, presumably? maybe verify this with shuffling?
    • wouldn't you expect a fair number of islands in RNAseq?
    • in yeast, it looks like n_islands is converging to a very different point than in the other two data sets; what gives?
    • maybe: if this is due to higher coverage, what happens if you subsample the yeast data set a whole bunch?
    • n_tips behavior is driven by error?
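
For the shuffling and subsampling checks above, something like the following would do. It is a minimal sketch that assumes the reads fit in memory as a list; the helper names are made up, and a real run would feed the result back into the existing streaming pipeline.

```python
import random

def shuffled(reads, seed=0):
    """Return the reads in a random order (tests whether kinks depend on input order)."""
    rng = random.Random(seed)
    out = list(reads)
    rng.shuffle(out)
    return out

def subsample(reads, fraction, seed=0):
    """Keep roughly `fraction` of the reads, preserving their original order."""
    reads = list(reads)
    rng = random.Random(seed)
    k = round(fraction * len(reads))
    keep = sorted(rng.sample(range(len(reads)), k))
    return [reads[i] for i in keep]
```

Running the yeast data through `subsample` at a few fractions (and several seeds) would show whether n_islands still converges to a different point at lower coverage.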

anyway, I think it would be very helpful to develop some intuition, maybe in another (simpler) figure.

also, a different or perhaps complementary tack - what interesting behavior is occurring in this figure that readers should be alerted to, and what behavior is just boring and "trivial"?

Comments on Chp 2, figure "Sketching distance curves showing saturation of a transcriptomic sample"

First question: is this done on diginormed data, or not?

I still struggle intuitively with the saturation behavior in the leftmost figure. Why would later sketches have high similarity? Can you remind me and then explain it in the caption?

Middle figure: the curve is because you're using flat k-mers w/o abundance, right? So you get a convexity once you have seen a bunch of the k-mers.

Right figure, why on earth is it decreasing a bit towards x=1?

Should you maybe be using containment instead of Jaccard? Seems easier to explain to me.
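
To spell out the difference, here is a small sketch on exact k-mer sets. It does not use MinHash or any sketching library, and it ignores reverse complements, so it only illustrates the two measures and not the actual pipeline.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """|A & B| / |A | B|: symmetric, and it shrinks as either set grows."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def containment(a, b):
    """|A & B| / |A|: the fraction of A's k-mers that are also in B."""
    return len(a & b) / len(a) if a else 0.0

earlier = kmers("ACGTAC", 3)      # k-mers seen so far
later = kmers("ACGTACGGA", 3)     # k-mers after more of the stream
```

Here every k-mer in `earlier` is also in `later`, so containment of `earlier` in `later` is 1.0 while Jaccard is lower, because the denominator also counts the new k-mers. That asymmetry is why containment seems easier to explain to me.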

Chapter 1: Discussion: Key Points

  • The primary and most intensive job of compaction is finding the decision k-mers
    • Extracting sequence is incidental
    • Tracking unitig metrics is potentially useful downstream
  • Streaming compaction builds a foundation for streaming assembly
  • Streaming compaction can be used downstream from lighter-weight sub-linear methods
    • Original idea: using compaction metrics for guessing
    • Future idea (actually chapter 2): sketching methods for sub-linear compaction / assembly
  • Streaming pipelines enable real-time feedback
    • Stopping early to preserve computational resources
    • Feedback loop w/third generation sequencing instruments
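
As an illustration of the first key point, here is a toy sketch of finding decision k-mers. It assumes a decision k-mer is a de Bruijn graph node with more than one predecessor or more than one successor, ignores reverse complements, and builds the whole node set in memory rather than streaming, so it is not the dissertation's actual method.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def decision_kmers(sequences, k):
    """Return the k-mers with in-degree > 1 or out-degree > 1."""
    nodes = set()
    for seq in sequences:
        nodes |= kmers(seq, k)
    decisions = set()
    for node in nodes:
        # Successors share node's (k-1)-suffix; predecessors share its (k-1)-prefix.
        n_succ = sum((node[1:] + base) in nodes for base in "ACGT")
        n_pred = sum((base + node[:-1]) in nodes for base in "ACGT")
        if n_succ > 1 or n_pred > 1:
            decisions.add(node)
    return decisions
```

For the two sequences `AACGT` and `AACGA` with k=3, the only decision k-mer is `ACG`, where the paths diverge; everything between decision k-mers can be read off as unitig sequence, which is the sense in which extraction is incidental.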
