
Comments (4)

apcamargo commented on September 4, 2024

By default, geNomad applies some additional filters to the classification results, so only sequences with a high score are flagged as virus. The filters are more aggressive for short sequences (less than 2.5 kb), as geNomad requires them to encode at least one virus hallmark gene. You can read more about these filters here.
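As a rough illustration of the filtering described above, here is a minimal Python sketch. It is not geNomad's actual code: the `Prediction` record, the function name, and the `MIN_SCORE` value are hypothetical and chosen only for the example. The 2.5 kb cutoff and the hallmark requirement for short sequences are the only parts taken from the description above.

```python
from dataclasses import dataclass

# Arbitrary illustrative threshold (assumption); not necessarily geNomad's default.
MIN_SCORE = 0.7
# From the comment above: sequences shorter than 2.5 kb are filtered more strictly.
SHORT_SEQ_LENGTH = 2500


@dataclass
class Prediction:
    """Hypothetical record for one sequence classified as virus."""
    name: str
    length: int          # sequence length in bp
    virus_score: float   # classifier score in [0, 1]
    n_hallmarks: int     # number of virus hallmark genes encoded


def passes_default_filters(pred: Prediction) -> bool:
    """Return True if the prediction survives the sketched default filters."""
    if pred.virus_score < MIN_SCORE:
        return False
    # Short sequences must encode at least one virus hallmark gene.
    if pred.length < SHORT_SEQ_LENGTH and pred.n_hallmarks < 1:
        return False
    return True


predictions = [
    Prediction("contig_1", 12000, 0.95, 3),
    Prediction("contig_2", 1800, 0.90, 0),   # short, no hallmark: dropped
    Prediction("contig_3", 1800, 0.90, 1),   # short, one hallmark: kept
    Prediction("contig_4", 9000, 0.55, 0),   # low score: dropped
]

kept = [p.name for p in predictions if passes_default_filters(p)]
print(kept)
```

In this toy input, only the long high-scoring contig and the short contig with a hallmark gene are kept.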

These filters were implemented after the paper was published, since I noticed that most people preferred to have fewer sequences that are more reliably classified. This means that, by default, precision is preferred over recall. If you want to maximize discovery, use the --relaxed parameter (which is explained in detail in the link above). This will disable all filters and provide you with all the sequences that geNomad classified as virus, regardless of their score or number of virus hallmarks. If you don't change any other parameter, geNomad will skip all the slow steps (since you already finished computing them) and will only redo the filtering, which is very fast.
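A re-run with --relaxed could look roughly like the sketch below, which builds the command in Python and prints it instead of running it. The file and directory names are placeholders, the helper functions are hypothetical, and the positional order (input, output directory, database directory) is my understanding of the `genomad end-to-end` usage, so it is worth checking against `genomad end-to-end --help`. The output directory should be the same one used in the first run, so that the steps already computed can be reused.

```python
import shlex
import subprocess
from typing import List


def build_genomad_command(input_fasta: str, output_dir: str,
                          database_dir: str, relaxed: bool = True) -> List[str]:
    """Assemble a `genomad end-to-end` call (argument order is an assumption)."""
    cmd = ["genomad", "end-to-end"]
    if relaxed:
        # Disables the post-classification filters, as described above.
        cmd.append("--relaxed")
    cmd += [input_fasta, output_dir, database_dir]
    return cmd


def run_genomad(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Placeholder paths; reuse the output directory of the original run.
cmd = build_genomad_command("contigs.fna", "genomad_output", "genomad_db")
run_genomad(cmd)
```

Keeping `dry_run=True` makes it easy to inspect the command before launching anything.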

Let me know your results after using --relaxed!

from genomad.

apcamargo commented on September 4, 2024

you would suggest using geNomad in combination with other tools to maximize prediction accuracy

This is a tricky question. In principle, I'm not a big fan of doing that because you won't have an estimate of the false discovery rate of the discovery process. As you add more tools, the complexity of the methodology increases and, ideally, you'd have to do some additional tests to have an idea of the expected false discovery rate of the process.

That said, if you can find sequences that are clearly viral and that geNomad can't identify, for whatever reason, I won't tell you that you shouldn't use them. Different methodologies will always have cases where they work better than others. You just need to be careful to integrate them in a way that allows you to have an idea of the expected performance (otherwise, you'd just take the union of a dozen different tools and call it a day). It might be possible to combine geNomad and VirSorter2 in a way that will provide results that are better than the individual runs, but I've never tried this and I can tell you it's not trivial to benchmark in a robust manner. Taking the union will for sure decrease precision substantially, while taking the intersection will reduce recall by a lot.
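To make the union/intersection trade-off concrete, here is a small Python sketch on made-up data. The sequence names, the "truth" set, and both tools' outputs are invented for illustration and are not a benchmark of geNomad or VirSorter2.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall of a predicted set against a known truth set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy ground truth: six real viral sequences (v*); n* are non-viral.
truth = {"v1", "v2", "v3", "v4", "v5", "v6"}
# Each hypothetical tool finds four real viruses and makes one distinct mistake.
tool_a = {"v1", "v2", "v3", "v4", "n1"}
tool_b = {"v1", "v2", "v3", "v5", "n2"}

results = {
    "tool_a": precision_recall(tool_a, truth),
    "tool_b": precision_recall(tool_b, truth),
    "union": precision_recall(tool_a | tool_b, truth),
    "intersection": precision_recall(tool_a & tool_b, truth),
}

for name, (precision, recall) in results.items():
    print(f"{name:12s} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the union gains recall but accumulates both tools' false positives, while the intersection is cleaner but loses the viruses that only one tool found. How large these effects are on real data depends on how the tools' errors overlap, which is exactly what a proper benchmark would need to measure.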

Of course, this is my opinion. Benchmarking in bioinformatics is a complex topic and different people will have different opinions.


lingrongjin commented on September 4, 2024

Hi Antonio,

Thanks for your quick response! I checked geNomad's results using --relaxed mode and it indeed boosted the number of predicted viral contigs from ~8k to ~16k. I re-checked the overlap with VirSorter2 and the intersection also increased by over 1k sequences, but at the expense of an increased number of sequences predicted by only one of the two tools.
I agree that combining different tools may not be straightforward. I just wanted to make sure that the differences are a result of different computational approaches/thresholds and not of some systematic bias, such as differences in database representation that make one type of virus easier to detect with one tool than with the other.


apcamargo commented on September 4, 2024

Glad to hear that worked out :)
I'll close the issue for now, but please let me know if you have any other questions.
