seryrzu / centroFlye
An algorithm for centromere assembly using long error-prone reads
License: BSD 3-Clause "New" or "Revised" License
Hi seryrzu:
Thanks for the great work on T2T assembly. I have a question about using centroFlye in other species, such as bovine. We do not know the HOR unit or the monomer on chromosome X, but it is relatively easy to obtain prefix reads and suffix reads. How could we assemble this centromere? Would the centroFlye mono mode help to assemble the genome? Thanks a lot if you could reply.
Best
Huanfa
Hi,
In our project we are currently having issues with the assembly of long repetitive regions of the genome, and we would like to know whether your approach (centroFlye) would work on repetitive regions other than centromeres.
If so, I would appreciate it if you could explain how to make the reference file necessary for running centroFlye.
We have long (Nanopore) reads.
Thanks,
j.
Hello,
I am trying to construct the human ChrX centromere as demonstrated in your paper, except with reads from our own sample: 87X of regular PacBio Sequel II CLR reads (N50=28.3k, 10X of >50k reads). May I ask which parameters I should adjust besides the ones below?
error-mode=pacbio
min-coverage=2
coverage=3
Thanks for creating such a valuable tool,
Tim
Dear author, I recently tried to follow the procedure described in your article, but I ran into a problem at the first step. My command is:
bash /home/Mpzhang/HpDu/centroFlye-master/scripts/read_recruitment/run_read_recruitment.sh rel2.fastq.gz results_cenX/centromeric_reads 50 11100000
It reported an error. I have looked carefully but cannot find my mistake; could you advise me on how to correct it?
I downloaded the file from the link you provided: https://s3.amazonaws.com/nanopore-human-wgs/chm13/nanopore/rel2/rel2_to_GRCh38.cram
However, an error was still reported at runtime and I could not find the reason. Could it be related to this CRAM file?
Hi @seryrzu ,
thank you for your great work.
When I do the read_recruitment part, I got this error:
xargs -I '{}' -P 150 /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/9.gap/completeness/centroFlye/scripts/read_recruitment/rr /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/9.gap/completeness/centroFlye/data/GgDNA_chr1_cen1_repeat_sequence.fasta '{}' '../split_rr/{}_cen.fasta' 350
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
xargs: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/9.gap/completeness/centroFlye/scripts/read_recruitment/rr: terminated by signal 6
xargs: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/9.gap/completeness/centroFlye/scripts/read_recruitment/rr: terminated by signal 6
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
rr: rr.cpp:24: char complement(char): Assertion `false' failed.
My command is:
bash scripts/read_recruitment/run_read_recruitment.sh data/Chicken_ONT_93X.fa results_cen1/centromeric_reads 150 7700000 data/GgDNA_chr1_cen1_repeat_sequence.fasta 350
Then I checked rr.cpp:
#include <cassert>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <zlib.h>
#include "edlib.h"
#include "kseq/kseq.h"
KSEQ_INIT(gzFile, gzread)
char complement(char n)
{
    switch(n)
    {
    case 'A':
        return 'T';
    case 'T':
        return 'A';
    case 'G':
        return 'C';
    case 'C':
        return 'G';
    }
    assert(false);
    return ' ';
}
It seems something went wrong and execution reached assert(false), but I don't know why. I look forward to your answer.
Thank you sincerely
Johnson
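The quoted `complement` function only has cases for uppercase `A`, `T`, `G` and `C`, so any other character passed to it (for example `N`, lowercase soft-masked bases, or IUPAC ambiguity codes) falls through to `assert(false)`. A quick way to check whether an input FASTA contains such characters is sketched below. The function name `non_acgt_counts` is hypothetical and not part of centroFlye, and which of the two inputs (the reads or the repeat unit) actually reaches `complement` is not confirmed here, so it may be worth checking both.

```shell
# Hypothetical helper (not part of centroFlye): tally every character in the
# sequence lines of a FASTA file that is not an uppercase A, C, G or T.
# Prints "<character><TAB><count>" per offending character and prints nothing
# when the file contains only A/C/G/T.
non_acgt_counts() {
    awk '!/^>/ {
             gsub(/[ACGT]/, "")                 # drop the four accepted bases
             n = length($0)
             for (i = 1; i <= n; i++) seen[substr($0, i, 1)]++
         }
         END { for (ch in seen) print ch "\t" seen[ch] }' "$1" | LC_ALL=C sort
}

# Example usage with the files named in the report above:
# non_acgt_counts data/GgDNA_chr1_cen1_repeat_sequence.fasta
# non_acgt_counts data/Chicken_ONT_93X.fa
```

If characters such as `N` or lowercase bases show up, uppercasing or filtering the affected sequences before read recruitment is one option to try.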
Dear author
Using the data you provided, I was able to reproduce the centromere assembly from the article with your software. I am also trying to assemble the centromere manually so that the two results can be checked against each other. Could you please provide the IDs of the reads that span the centromere? When I looked up the read IDs given in the T2T article, only 1 out of 10 could be found in your centromere.fasta reads; I guess the data may be a different version.
I saw that T2T used 12 reads to assemble the centromere, so I would like to ask whether you can provide the IDs of the reads across the centromere (assembly) so that I can verify the result manually.
Hello,
I'd like to try to run the cen6 pipeline, but it doesn't work because scripts/ext/stringdecomposer/longreads_decomposer.py is not included in the ext folder in the current master HEAD (a1a314d). How should I handle this?
Thanks in advance.
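In another report on this page, git output shows `scripts/ext/stringdecomposer` registered as a submodule, so one possible (unconfirmed) cause of a missing `scripts/ext/stringdecomposer/longreads_decomposer.py` is a checkout whose submodules were never initialized. `git submodule status` prefixes uninitialized submodules with `-`; the sketch below filters its output for those entries. The helper name `uninitialized_submodules` is hypothetical.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and print
# the paths of submodules that have not been initialized (lines starting "-").
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Example usage from inside the centroFlye checkout:
# git submodule status | uninitialized_submodules
# git submodule update --init --recursive   # fetch whatever was listed
```

If nothing is listed and the file is still missing, the cause lies elsewhere and this check does not apply.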
Can you tag a release? That'll make it easier to put this in bioconda.
Hello!
Could you please help me install centroFlye?
command1
git clone --verbose --progress --recurse-submodules --single-branch --branch cF_NatBiotech_paper_Xv0.8.3-6v0.1.3 [email protected]:seryrzu/centroFlye.git
error1
Cloning into 'centroFlye'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
.
command2
git clone --verbose --progress --recurse-submodules --single-branch https://github.com/seryrzu/centroFlye.git
error2
Cloning into 'centroFlye'...
POST git-upload-pack (204 bytes)
remote: Enumerating objects: 685, done.
remote: Total 685 (delta 0), reused 0 (delta 0), pack-reused 685
Receiving objects: 100% (685/685), 2.45 MiB | 1.45 MiB/s, done.
Resolving deltas: 100% (432/432), done.
Submodule 'scripts/ext/stringdecomposer' ([email protected]:ablab/stringdecomposer.git) registered for path 'scripts/ext/stringdecomposer'
Submodule 'scripts/ext/tandemQUAST' ([email protected]:ablab/tandemQUAST.git) registered for path 'scripts/ext/tandemQUAST'
Cloning into 'scripts/ext/stringdecomposer'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Clone of '[email protected]:ablab/stringdecomposer.git' into submodule path 'scripts/ext/stringdecomposer' failed
With regards,
Jin-Young
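`Permission denied (publickey)` generally means the clone was attempted over SSH without a key registered for the account. In the second attempt the main repository was fetched over HTTPS, but the submodules are registered with SSH-style URLs, so they fail in the same way. A common workaround is git's `url.<base>.insteadOf` setting, which rewrites URL prefixes at fetch time. The sketch below assumes that the obfuscated `[email protected]:` prefix in the log stands for the usual GitHub SSH prefix `git@github.com:`; that is an assumption, as is the helper name `use_https_for_github`.

```shell
# Hypothetical helper: make git fetch GitHub repositories over HTTPS even when
# a URL is written in SSH form.
# Assumption: the SSH prefix in question is "git@github.com:".
use_https_for_github() {
    git config --global url."https://github.com/".insteadOf "git@github.com:"
}

# Example usage:
# use_https_for_github
# git clone --recurse-submodules https://github.com/seryrzu/centroFlye.git
# or, inside an existing clone whose submodules failed to download:
# git submodule update --init --recursive
```

The setting is global, so it affects every repository for the current user; it can be removed again with `git config --global --unset`.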
I tried the cenX assembly on my sample, but I am getting error messages at the final tandemQUAST step.
This seems to be because the same reads are recruited twice.
In my sample, 4 reads were reported as duplicated.
Is it okay to remove these duplicates manually, or does this mean something is wrong?
command1
grep "^>" centromeric_reads.fasta | sort | uniq -c | awk '$1>1'
stdout1
2 >322f6dbf-c5bf-4486-8e2c-e764ea4947bf
2 >507fbfdd-3676-4c83-bfc2-2cf2073ea27a
2 >56d4b6ec-9491-4c3c-8d74-d727a6bb3a4c
2 >5e4107b0-c80b-48d4-bbad-fceb11f48b3b
command2
head -n 10000 split_fasta_0.fasta | grep -n 322f6dbf-c5bf-4486-8e2c-e764ea4947bf
stdout2
2065:>322f6dbf-c5bf-4486-8e2c-e764ea4947bf runid=869182ee9e718c030eb89012b70e7246ee05373c read=38 ch=2666 start_time=2020-08-11T09:13:23Z flow_cell_id=PAF10281 protocol_group_id=20200811 sample_id=test
9088:>322f6dbf-c5bf-4486-8e2c-e764ea4947bf runid=869182ee9e718c030eb89012b70e7246ee05373c read=38 ch=2666 start_time=2020-08-11T09:13:23Z flow_cell_id=PAF10281 protocol_group_id=20200811 sample_id=test
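The second command shows the same read header at two different lines of `split_fasta_0.fasta`, which suggests the duplicates may already have been present before recruitment rather than being introduced by it. Whether dropping them is the right fix is for the maintainers to confirm; if it is, one way to keep only the first record per read ID is sketched below. The function name `dedup_fasta_by_id` and the output file name in the usage line are hypothetical.

```shell
# Hypothetical helper: print a FASTA file keeping only the first record for
# each read ID (the first whitespace-delimited token of the header line).
dedup_fasta_by_id() {
    awk '/^>/ { keep = !($1 in seen); seen[$1] = 1 }
         keep' "$1"
}

# Example usage:
# dedup_fasta_by_id centromeric_reads.fasta > centromeric_reads.dedup.fasta
```

Rerunning the `grep | sort | uniq -c` check from the report on the deduplicated file should then print nothing.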