dark_and_camouflaged_genes's Issues
Missing annotation files?
Hi Mark,
Thanks for making this, I'm really excited to try this out.
I'm trying to use your supplied .bed files to run steps 6-10. I think there are some files missing for steps 9 & 10. Can you upload the corresponding annotation files?
Files seem to be missing:
09_FIND_FALSE_POSITIVES/submit.sh
CAMO_ANNOTATION="../results/hg38_no_alt/illuminaRL100/illuminaRL100.hg38_no_alt.camo_annotations.txt"
10_VARIANT_FILTERING
ANNO_BED = "./results/annotations/hg38/Homo_sapiens.GRCh38.93.annotation.hg38.bed"
Could you put the annotation files (the ones that correspond to your bed files) on GitHub? I'm using b37.
Thank you,
Pauline
Problems while running 05_CREATE_BED_FILE
Hello, I tried to use your script to detect the camo regions, but I encountered the following error when I ran 05_CREATE_BED_FILE (extract_camo_regions.py):
Wed Jul 26 20:08:19 CST 2023 python extract_camo_regions.py
Traceback (most recent call last):
  File "extract_camo_regions.py", line 113, in <module>
    main(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5])
  File "extract_camo_regions.py", line 94, in main
    group_pos = [regions[region_id]]
KeyError: 'DDX11L1_1::chr1:11869-12227'
I have no idea what causes this. Does the KeyError point to something wrong with the format of my input?
Looking forward to your reply!
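One way to narrow down a KeyError like the one above is to list which region IDs are requested but absent from the lookup table before indexing into it. The sketch below is only an illustration: the traceback shows that `regions` is indexed by `region_id`, but the helper `find_missing_region_ids`, the example dictionary contents, and the off-by-one start coordinate are all hypothetical and are not taken from `extract_camo_regions.py`.

```python
def find_missing_region_ids(regions, region_ids):
    """Return the requested IDs that are not keys of the regions mapping."""
    return [rid for rid in region_ids if rid not in regions]


# Hypothetical lookup table keyed as "<name>::<chrom>:<start>-<end>".
# The start coordinate here differs by one from the requested ID on purpose,
# to show how a small formatting difference produces a KeyError.
regions = {"DDX11L1_1::chr1:11868-12227": ("chr1", 11868, 12227)}

# The ID reported in the KeyError.
requested = ["DDX11L1_1::chr1:11869-12227"]

missing = find_missing_region_ids(regions, requested)
for rid in missing:
    print("not found in regions:", rid)
```

Comparing the missing IDs with the keys that do exist usually shows whether the mismatch comes from coordinates, naming, or an input file produced by a different step.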
Alt-aware alignment files?
Hi there,
Thank you for your research, and for making this repository public.
Our group is running the pipeline using our own data. We were wondering whether to use our alt-aware alignment files or our non-alt-aware files.
Would this make a difference?
Thanks,
Kaitlyn
Issues with reproducing the .realign.sorted.bed file
Hello,
First of all, thank you for this interesting work and for making your code available to the community.
I am trying to reproduce some of the steps of your pipeline using our own data, and I noticed something strange in the .realign.sorted.bed I am producing compared to the one you are making available here.
In particular, your .realign.sorted.bed contains the following coordinates for CR1 (among other coordinates):
chr1 207533408 207533588 CR1_10;CR1_26;CR1_18 3
chr1 207551963 207552143 CR1_18;CR1_10;CR1_26 3
chr1 207569058 207569238 CR1_26;CR1_18;CR1_10 3
But using the reference genome hg38 and Ensembl build 93, I obtain:
chr1 207533408 207533588 CR1_10;CR1_26;CR1_18 3
chr1 207551963 207552143 CR1_18;CR1_10;CR1_26 3
chr1 207569058 207569122 CR1_26;CR1_18;CR1_10 3
chr1 207569122 207569238 CR1_UTR_2;CR1_18;CR1_10 3
Basically, your last CR1_26 interval spans 180bp, but in my case it is CR1_26 plus CR1_UTR_2 (the last two intervals) that together span 180bp. While CR1_10, CR1_18 and CR1_26 are each reported 3 times, CR1_UTR_2 is reported only once. I checked the GFF3 file of Ensembl build 93 and, indeed, CR1_26 has coordinates chr1:207,569,058-207,569,122 and should be 64bp long.
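The arithmetic behind this observation can be checked with a short sketch. It assumes the rows are standard BED intervals (0-based start, end-exclusive), so the length is simply end minus start; the helper name `bed_length` and the `rows` list are made up for this illustration, and the coordinates are copied from the output quoted above.

```python
def bed_length(start, end):
    """Length of a BED interval (0-based start, end-exclusive)."""
    return end - start


# Coordinates from the output obtained with build 93 (quoted above).
rows = [
    ("chr1", 207533408, 207533588),
    ("chr1", 207551963, 207552143),
    ("chr1", 207569058, 207569122),  # CR1_26 first
    ("chr1", 207569122, 207569238),  # CR1_UTR_2 first
]

lengths = [bed_length(start, end) for _, start, end in rows]

# The last two intervals touch: the end of one is the start of the next.
adjacent = rows[2][2] == rows[3][1]

# Length of the span covered by the last two intervals together.
combined = bed_length(rows[2][1], rows[3][2])

print(lengths, adjacent, combined)
```

The last two intervals are 64bp and 116bp, they are directly adjacent, and together they cover the same 180bp span as the single interval in the published file.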
Using Ensembl build 98, I obtain something even stranger:
chr1 207533408 207533588 CR1_10;AL137789.1_intron_2;AL137789.1_1;CR1_UTR_2;CR1_18 5
chr1 207551963 207552143 CR1_18;CR1_10;AL137789.1_intron_2;AL137789.1_1;CR1_UTR_2 5
chr1 207569058 207569122 CR1_26;AL137789.1_intron_2;CR1_18;AL137789.1_1;CR1_10 5
chr1 207569122 207569238 CR1_UTR_2;CR1_18;CR1_10;AL137789.1_intron_2;AL137789.1_1 5
CR1_26 is this time reported similar to CR1_10 and CR1_18 only once while CR1_UTR_2 is reported 3 times.
Maybe I am missing a step somewhere in your pipeline where CR1_26 is merged with CR1_UTR_2. As it stands, I do not see how to obtain the coordinates that you have for chr1:207,569,058-207,569,122 (CR1_26), since I get two intervals corresponding to CR1_26 and CR1_UTR_2.
I would appreciate any help and insights on this problem.
Thank you.