A repository for code that generates various kinds of simulated data.
The subfolders here cover:
- "select 5-10 Mb of neutral DNA sequences in the human genome": https://twitter.com/cegAmorim/status/1646158398071754755
"What is a good and easy way to select 5-10 Mb of neutral DNA sequences in the human genome? Would selecting random intergenic regions (say >10Kb away from genes) be enough? Has someone done something similar recently in a paper I could cite and use the same loci?"
https://twitter.com/vsbuffalo/status/1646212322833334272
"I had to do this recently - I took all exonic + phastcons + UTRS, merged them, and then add 200bp of buffer on both ends (all using bedtools). You could do this and even select out random regions. I did some sensitivity analysis and comparison to the CADD tracks and seemed good."
"Also (and perhaps this is being too paranoid) but I merged the refseq and ensembl tracks. They differ slightly in their percent of basepairs that annotated as coding, so I took the union."
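The masking procedure described in the quotes above (merge the annotated features, pad each by a buffer, then draw random regions from what is left) can be sketched in plain Python. This is only an illustration under assumptions: the tweet's author used bedtools, whereas the function names, the toy coordinates, and the single-chromosome setup below are all hypothetical and exist only for this example.

```python
import random

def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def mask_with_buffer(features, buffer_bp, chrom_len):
    """Pad every feature by buffer_bp on both sides, clip to the chromosome, merge."""
    padded = [(max(0, s - buffer_bp), min(chrom_len, e + buffer_bp))
              for s, e in features]
    return merge_intervals(padded)

def complement(masked, chrom_len):
    """Return the intervals of [0, chrom_len) not covered by the merged mask."""
    gaps, prev_end = [], 0
    for s, e in masked:
        if s > prev_end:
            gaps.append((prev_end, s))
        prev_end = max(prev_end, e)
    if prev_end < chrom_len:
        gaps.append((prev_end, chrom_len))
    return gaps

def sample_windows(gaps, window_bp, n, seed=0):
    """Draw n windows of window_bp from the unmasked gaps (windows may overlap)."""
    rng = random.Random(seed)
    eligible = [(s, e) for s, e in gaps if e - s >= window_bp]
    if not eligible:
        return []
    # Weight each gap by the number of valid window start positions it holds.
    weights = [e - s - window_bp + 1 for s, e in eligible]
    windows = []
    for s, e in rng.choices(eligible, weights=weights, k=n):
        start = rng.randint(s, e - window_bp)
        windows.append((start, start + window_bp))
    return windows

# Toy example on one hypothetical 10 kb chromosome: the features stand in for
# the union of exons, phastCons elements, and UTRs from the annotation tracks.
features = [(1000, 2000), (1900, 2500), (6000, 6500)]
masked = mask_with_buffer(features, buffer_bp=200, chrom_len=10_000)
neutral = complement(masked, chrom_len=10_000)
windows = sample_windows(neutral, window_bp=500, n=5, seed=1)
```

For real data the features would be read per chromosome from BED files, and taking the union of the RefSeq and Ensembl tracks amounts to concatenating both feature lists before merging.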
- long-read mock microbial community data: https://twitter.com/pathogenomenick/status/1037346467462176769
"Go and grab 130G of long-read mock microbial community data from PromethION and 36G from MinION over here, if you fancy: https://github.com/LomanLab/mockcommunity … #UKGS18 - could be useful for bioinformatics pipeline validation and method development!"
https://twitter.com/Hasindu2008/status/1628569325895585793
"Squigulator r10 branch https://github.com/hasindu2008/squigulator/tree/r10 can simulate r10.4.1 signals. Also f5c r10 branch https://github.com/hasindu2008/f5c/tree/r10 can do resquiggle and eventalign for R10.4.1. Note: still work in progress and improvements are on the way. Thanks, @nanopore for providing the pore-model."
- Squigulator: "a tool for simulating nanopore raw signal data"
- Plus, "...f5c r10 branch https://github.com/hasindu2008/f5c/tree/r10 can do resquiggle and eventalign for R10.4.1"
- https://twitter.com/Hasindu2008/status/1656470090668449793 May 2023
"Here is a pre-print for squigulator: a tool for simulating @nanopore raw signal data https://biorxiv.org/content/10.1101/2023.05.09.539953 Squigulator is easy to use, simple, fast, supports dna/rna and r9/r10 and modifiable parameters, and the output can be basecalled using nanopore basecallers."
- A script for making mock short-read data from long reads: Create short paired-end reads from long reads
"To test different approaches for assembling genomes, I needed data with known microbial content. Only long reads were available, but I needed to test the algorithm on short paired-end reads. This script was written to create short reads from long reads."
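One way such a conversion could work is sketched below. This is not the script referred to above, only a minimal illustration under assumptions: fragments of a fixed insert size are drawn at random from each long read, read 1 is the start of the fragment, and read 2 is the reverse complement of its end. The function names and default parameters are hypothetical, and base qualities are ignored.

```python
import random

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def paired_reads_from_long_read(seq, read_len=150, insert_size=400,
                                n_pairs=1, seed=0):
    """Sample n_pairs (read1, read2) pairs from a single long read.

    Each pair comes from a random fragment of insert_size bases:
    read1 is the first read_len bases (forward strand) and read2 is the
    reverse complement of the last read_len bases.
    """
    if insert_size < read_len or len(seq) < insert_size:
        return []  # long read too short to yield a full fragment
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randint(0, len(seq) - insert_size)
        fragment = seq[start:start + insert_size]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs

# Toy example: a 1 kb synthetic "long read" turned into three read pairs.
long_read = "ACGT" * 250
pairs = paired_reads_from_long_read(long_read, read_len=50,
                                    insert_size=200, n_pairs=3)
```

In practice the pairs would be written out as two FASTQ files with matching read names, and the insert size would more realistically be drawn from a distribution than held fixed.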
- Single cell: https://twitter.com/jsb_ucla/status/1656690298435821568. May 2023.
"Our single-cell and spatial omics simulator scDesign3 is now online: https://nature.com/articles/s41587-023-01772-1 scDesign3 has two functionalities: (1) synthetic data simulation and (2) real data interpretation and modification 1/"