Hi Natasha,
When NanoSim enters the model fitting phase, it has nothing to do with the aligner anymore. read_analysis.py was trying to find the best parameter combinations for the error model, and it took nearly 10 hours on the alignment results from minimap2. Could you send me your alignment files or raw data so I can test? I'm curious why it took so long.
Thanks,
Chen
Thanks for your reply and explanation, @cheny19.
I used two different datasets, and in both cases the run time with "minimap2" was significantly longer.
Please find the datasets uploaded here, https://drive.google.com/drive/folders/1p9OSIXseyGoXoKv9oNYaoP8PhaLqYygI?usp=sharing.
I used "nanosim-2.0.0", "minimap2, v2.10-r761" with 4 threads, and "last, v876".
Please let me know if you need any additional information.
Thank you,
Natasha
Hi Natasha,
Sorry I didn't reply until now. Over the last few weeks I looked into the code, and I think the runtime is heavily dragged down by R. So I rewrote the model fitting part in Python (which is faster than R for this step), and it now supports multiprocessing. The model fitting stage can now be finished within an hour. Please download the latest commit and give it a try. I haven't made a new release yet because I have more testing to do, but it works fine on your dataset.
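The reply describes the speed-up only at a high level: model fitting moved from R to Python and now uses multiprocessing. The sketch below illustrates that general idea, scoring candidate parameter combinations for an error-length model in parallel with a process pool. It is not NanoSim's code. The two-component geometric mixture, the coarse grid, and the names `geom_pmf`, `mixture_loglik` and `grid_search` are hypothetical choices made only for illustration.

```python
import itertools
import math
from multiprocessing import Pool


def geom_pmf(k, p):
    """P(X = k) for a geometric distribution on {1, 2, ...} with success probability p."""
    return (1 - p) ** (k - 1) * p


def mixture_loglik(params, lengths):
    """Log-likelihood of error lengths under a (hypothetical) two-component geometric mixture."""
    w, p1, p2 = params
    return sum(math.log(w * geom_pmf(k, p1) + (1 - w) * geom_pmf(k, p2)) for k in lengths)


def _score(args):
    # Defined at module level so it can be pickled and sent to worker processes.
    params, lengths = args
    return mixture_loglik(params, lengths), params


def grid_search(lengths, processes=4, steps=9):
    """Return (best_params, best_loglik) over a coarse parameter grid.

    Candidates are scored in a process pool when processes > 1, serially otherwise.
    """
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    # Require p1 < p2 so the two components are not just relabelled copies of each other.
    candidates = [(w, p1, p2) for w, p1, p2 in itertools.product(grid, repeat=3) if p1 < p2]
    jobs = [(c, lengths) for c in candidates]
    if processes > 1:
        with Pool(processes) as pool:
            results = pool.map(_score, jobs)
    else:
        results = list(map(_score, jobs))
    best_ll, best_params = max(results)
    return best_params, best_ll
```

When `processes > 1`, call `grid_search` from under an `if __name__ == "__main__":` guard, because on platforms that start workers with `spawn` the main module is re-imported in each worker. A real fitter would more likely refine the best grid point with a continuous optimizer; the coarse grid only keeps the sketch short.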
The error model is a bit different between minimap2 and LAST, and the originally proposed model may not fit the errors inferred from minimap2 very well. That is also why it ran for so long in previous versions. NanoSim will issue a warning if the fitted model cannot pass the statistical test, but don't worry: it's still close to the empirical errors. I'll keep looking and see if there are better models.
Thanks for pointing this out!
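The thread does not say which statistical test NanoSim applies. As a hypothetical illustration of a "warn, don't fail" check, the sketch below runs a Pearson chi-square goodness-of-fit test of binned model probabilities against observed counts, using only the standard library. The names `chi2_sf` and `check_fit`, the choice of a chi-square test, and the 0.05 threshold are assumptions, not NanoSim's implementation.

```python
import math
import warnings


def chi2_sf(x, df):
    """P(X > x) for a chi-square distribution with integer df >= 1 (closed form)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 0.0
        for i in range(df // 2):
            if i > 0:
                term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) and add the half-integer gamma terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def check_fit(observed, expected_probs, n_params, alpha=0.05):
    """Chi-square goodness-of-fit test that warns, rather than fails, if the model is rejected.

    observed: counts per bin; expected_probs: model probability per bin (summing to 1).
    """
    n = sum(observed)
    # Pearson statistic: sum over bins of (O - E)^2 / E.
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs))
    # Degrees of freedom: bins - 1 - parameters estimated from the data.
    df = len(observed) - 1 - n_params
    if df < 1:
        raise ValueError("need more bins than fitted parameters + 1")
    p_value = chi2_sf(stat, df)
    if p_value < alpha:
        warnings.warn(
            f"Fitted model rejected by chi-square test (p = {p_value:.3g}); "
            "it may still be close to the empirical distribution."
        )
    return stat, p_value
```

The chi-square approximation is only reliable when every bin has a reasonably large expected count (a common rule of thumb is at least 5), so sparse tail bins would typically be merged before calling `check_fit`.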
Hi @cheny19, all this sounds great. Thank you so much for improving NanoSim!
I will have some time next week to test this out if that is ok.
In the meantime, does this mean that I can still use the simulated reads generated with the Minimap2-based model in the previous version, or do I need to re-run the simulation now?
Thank you,
Natasha
You can still use them.