nextomics / nextpolish2
Repeat-aware polishing genomes assembled using HiFi long reads
License: Other
I used NextPolish2 to polish a few genomes several days ago. When I looked at the log file today, I found this message in some jobs:
/var/spool/slurm/d/job1113564/slurm_script: line 31: 37987 Killed nextPolish2 -t ${threads} -r ${mat_HiFi_mapping_file} ${mat_asm} $k21 $k31 -o ${mat_asm}.np2.fasta
The initial jobs, which ran in the first few days, completed the polishing process without any problems.
Best wishes!
Hello,
My command:
nextPolish2 -r hifi.map.sort.bam sam.fa k21.yak k31.yak >asm.np2.fa
Error:
thread '' panicked at 'called Result::unwrap() on an Err value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/main.rs:1695:51
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
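For what it's worth, UnexpectedEof with "failed to fill whole buffer" is the error Rust's standard library returns when a read reaches the end of a file earlier than expected, so one of the input files may be truncated. Which input is affected is not clear from the report. A minimal sketch, assuming the BAM is the culprit: a complete BAM ends with the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification, and the hypothetical helper has_bgzf_eof below checks for it (samtools quickcheck performs a similar check).

```shell
# Hypothetical helper, not part of NextPolish2.
# Returns 0 if the file ends with the 28-byte BGZF end-of-file marker
# from the SAM/BAM specification, non-zero otherwise.
has_bgzf_eof() {
    expected="1f8b08040000000000ff0600424302001b0003000000000000000000"
    # Dump the last 28 bytes as hex and strip all spaces and newlines.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# Example (file name taken from the command above):
#   has_bgzf_eof hifi.map.sort.bam || echo "BAM looks truncated" >&2
```

If the BAM passes, the .yak files or the assembly FASTA would be the next things to regenerate and compare.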
Hi,
I am polishing a human genome assembly with ~30X HiFi reads and ~30X paired-end Illumina reads, using 30 CPU cores and 192 GB of RAM. The job has been running for ~15 hours. However, according to your paper, it should take only ~90 to 100 min with 5 CPU cores and 256 GB of RAM. Is it normal for the job to run this long?
Below are the commands I used to prepare the input and run NextPolish2. The 15 hours are for NextPolish2 alone.
# minimap2 align to assembly (2.26-r1175)
minimap2 -ax map-hifi -Q -t $threads \
${asmpath}/${sample}.asm.bp.hap1.p_ctg.inspector.fa \
${lrspath}/${sample}.filt.fq.gz | samtools sort -o ${output_prefix}.hap1.bam -
samtools index -@ $threads ${output_prefix}.hap1.bam
# prepare k-mer count (0.1-r69-dirty)
yak count -o ${output_prefix}.k21.yak -k 21 -b 37 -t $threads \
<(zcat ${srspath}/${sample}_1_paired.fq.gz) <(zcat ${srspath}/${sample}_2_paired.fq.gz)
yak count -o ${output_prefix}.k31.yak -k 31 -b 37 -t $threads \
<(zcat ${srspath}/${sample}_1_paired.fq.gz) <(zcat ${srspath}/${sample}_2_paired.fq.gz)
# run nextPolish2 (nextPolish2 0.2.0)
nextPolish2 -t $threads \
${output_prefix}.hap1.bam \
${asmpath}/${sample}.asm.bp.hap1.p_ctg.inspector.fa \
${output_prefix}.k21.yak \
${output_prefix}.k31.yak > ${output_prefix}.asm.bp.hap1.p_ctg.inspector.np2.fa
Hi,
Can we use NextPolish2 to polish an assembly generated from Oxford Nanopore reads, or is it only applicable to HiFi reads?
Thanks
I am involved in a genome project for a wild animal species with a huge genome (6.5 Gb). We expect the genome to be highly heterozygous, and perhaps over half of it will be repetitive sequences.
Due to budget constraints, we could only obtain 80x coverage of CLR data, 20x of HiFi data, and 150x of short-read data. I plan to use the CLR data to make a draft assembly and then use the HiFi data.
Would it be effective to use the HiFi data in NextPolish2 to polish the draft assembly made with CLR?
Any suggestions would be appreciated.
Hi! NextPolish2 is a great tool for correcting errors.
I know little about assembly, and I would like to know which asm.fa should be used with NextPolish2.
I ran hifiasm and got contig.fa. After running juicer and 3d-dna, I got genome.fa. Can I use genome.fa directly in NextPolish2? Or should I polish contig.fa with NextPolish2 first, then run juicer and 3d-dna, and polish genome.fa with NextPolish2 again?
Thanks!
Hi Jiang,
Thank you so much for developing this fantastic tool.
In your example, you used a single asm.fa. However, hifiasm yields two phased haplotypes. In this scenario, what would be the recommended approach to correct switch errors in both haplotypes? Should I run the NextPolish2 pipeline separately for each haplotype?
Thanks in advance!
Best,
Lin
Updating crates.io index
error: failed to get clap as a dependency of package nextPolish2 v0.1.0 (/hdd/data/wangjq/Software/NextPolish2-0.1.0)
Caused by:
failed to load source for dependency clap
Caused by:
Unable to update registry crates-io
Caused by:
failed to fetch https://github.com/rust-lang/crates.io-index
Caused by:
network failure seems to have happened
if a proxy or similar is necessary net.git-fetch-with-cli may help here
https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli
Caused by:
SSL error: received early EOF; class=Ssl (16); code=Eof (-20)
Could you provide a conda installation?
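The error text itself points at Cargo's net.git-fetch-with-cli setting, which makes Cargo use the system git executable for fetches instead of its built-in git library. A minimal sketch of turning it on for one project, assuming the failure is a proxy or SSL problem on the build machine (the report does not confirm this); enable_git_fetch_with_cli is a hypothetical helper name.

```shell
# Hypothetical helper: write a project-local Cargo config that enables
# net.git-fetch-with-cli. Refuses to touch an existing config file.
enable_git_fetch_with_cli() {
    project_dir=$1
    cfg="$project_dir/.cargo/config.toml"
    if [ -e "$cfg" ]; then
        echo "refusing to overwrite $cfg" >&2
        return 1
    fi
    mkdir -p "$project_dir/.cargo" || return 1
    printf '[net]\ngit-fetch-with-cli = true\n' > "$cfg"
}

# Example, run from the NextPolish2 source directory:
#   enable_git_fetch_with_cli . && cargo build --release
```

Setting the environment variable CARGO_NET_GIT_FETCH_WITH_CLI=true for a single cargo invocation has the same effect without writing a file. Older Cargo releases read .cargo/config rather than .cargo/config.toml.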
Dear developer,
I was trying to polish the assembly using HiFi reads, but I kept getting the error "memory allocation of xxx bytes failed".
Could you please share your comments?
Thanks a lot
Crow
Dear authors,
Thanks for developing such a useful tool for polishing hifiasm results. I tried to polish my hifiasm assembly (assembled with 40x HiFi reads), but the QV of the polished assembly is the same as that of the unpolished one. The genome of this species is highly heterozygous (rate: 0.77%). I do not know why this happens. Could you kindly help me?
The commands used are as follows:
hifiasm -t 60 -o Pvat -l 2 -s 0.75 --h1 R1.fq.gz --h2 R2.fq.gz hifi.fastq.gz
winnowmap -k 21 -t 100 -W repetitive_k21.txt -ax map-pb Pvat.hic.p_ctg.fa hifi.fastq.gz |samtools sort -@ 100 -o hifi.map.sort.bam -
yak count -t 100 -o k21.yak -k 21 -b 37 <(zcat illumina_.fq.gz)
yak count -t 100 -o k31.yak -k 31 -b 37 <(zcat illumina_.fq.gz)
nextPolish2 -r hifi.map.sort.bam Pvit.hic.p_ctg.fa k21.yak k31.yak -t 100 -o Pvat.hic.p_ctg.polished.v1.fa
The QV before polishing:
45.1437
The QV after polishing:
45.1451
Thanks again!
Bob
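To put the two QV values in perspective: QV is a Phred-scaled score, so the per-base error rate is 10^(-QV/10). A minimal sketch of that conversion; qv_to_error_rate is a hypothetical helper name, and the only input assumed is the QV number reported above.

```shell
# Hypothetical helper: convert a Phred-scaled QV to a per-base error rate.
qv_to_error_rate() {
    awk -v qv="$1" 'BEGIN { printf "%.3e\n", 10 ^ (-qv / 10) }'
}

# Example with the values from this report:
#   qv_to_error_rate 45.1437
#   qv_to_error_rate 45.1451
```

Both values correspond to an error rate of roughly 3.06e-05, a relative difference of about 0.03%, so the polished and unpolished assemblies are effectively indistinguishable by this metric.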
I followed the tutorial at
https://github.com/Nextomics/NextPolish2/blob/main/doc/benchmark3.md
to polish the paternal and maternal assemblies from my trio binning. Here is the error:
thread '<unnamed>' panicked at 'byte index 11129 is out of bounds of TTGCTTCTTTGACCAAAACACCACCTTATGACATTGGTTCTCCAAATCTTATGTCCTTCTGACATACAAAATACTACACAATGTCATATCATTCTATGTCAATAGTTCCCAAAAGTCTTAACTTGTTCCAGCATCAACTCTAAAGTCCAAAGTTTCATCTGAGATTCAAGGCAAGTTCCTTTCAGCTATGAGCCTTTAGGATCAATAAAAATTTATTTACTTTCAAGATACAATGATTTTGCAAGCATTGGGTAAA[...], src/main.rs:409:36
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
/var/spool/slurm/d/job1102999/slurm_script: line 39: 88136 Aborted (core dumped) nextPolish2 -r ${mat_HiFi_mapping_file} ${mat_asm} $k21 $k31 -o ${mat_asm}.np2.fasta
Here, mat_HiFi_mapping_file=au_po_mat/racon_mat.meryl.iter_2.winnowmap.sorted.bam
(The tutorial seems to run only a single alignment and does not use a file realigned after racon polishing, so I ran one extra iteration and used the BAM file from the second round.)
As for threads, I started with 20, then changed to 5, and here it is 1; the error occurs in every case.
Hello author, I would like to know whether NextPolish2 can improve the BUSCO score. The genome I assembled with hifiasm is of high quality, but I'm not sure whether I need to polish it.
Hello author, I used your software to correct two genomes again, and the genomes became smaller. What is the reason? In theory, shouldn't it only correct SNPs and indels?
These are my commands:
minimap2 -ax map-hifi -t 60 LC_filled_N0.fasta /public1/home/yinhang/projects/two_genomes/01_data/HiFi_fastq/LC.ccs.fq|samtools sort -o LC.sort.bam -
samtools index LC.sort.bam
yak count -o k21_ngs.yak -k 21 -b 37 NGS_correct_1.fq.gz NGS_correct_2.fq.gz
yak count -o k31_ngs.yak -k 31 -b 37 NGS_correct_1.fq.gz NGS_correct_2.fq.gz
nextPolish2 -t 60 LC.sort.bam LC_filled_N0.fasta k21_ngs.yak k31_ngs.yak > LC_corrected_N0.fa
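To see how much the assembly shrank and which sequences account for it, the per-sequence lengths before and after polishing can be compared. A minimal sketch using only awk; fasta_lengths and fasta_total_bases are hypothetical helper names, and the input is assumed to be an uncompressed FASTA file.

```shell
# Hypothetical helpers, not part of NextPolish2.
# Print "name<TAB>length" for every record in an uncompressed FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%.0f\n", name, len
            name = substr($1, 2)   # drop ">" and anything after the first space
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") printf "%s\t%.0f\n", name, len }
    ' "$1"
}

# Print the total number of bases in the file.
fasta_total_bases() {
    fasta_lengths "$1" | awk -F '\t' '{ total += $2 } END { printf "%.0f\n", total }'
}

# Example with the file names from the commands above:
#   fasta_total_bases LC_filled_N0.fasta
#   fasta_total_bases LC_corrected_N0.fa
```

Indel corrections change sequence length, so a small net difference would not be surprising; whether the difference observed here is within that range is something the per-sequence comparison can show.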
If I am using a haplotype-resolved hifiasm assembly generated with Hi-C data instead of trio data, will NextPolish2 be able to do haplotype-aware error correction, and what should I do in this case?
Hi, @moold
Does yak take HiFi k-mers or Illumina k-mers? In the code, the input is sr.R*.fq.gz, but the documentation doesn't say. So should this come from the short reads or the HiFi reads?
I used NextPolish2 to improve my Arabidopsis assembly, and the polished file was generated without problems. However, the only output seems to be a single FASTA file. Is it possible to get a file listing the polished positions and sequences, like Pilon's change.txt? In addition, do you have any suggestions regarding the k-mer size to use? Is a larger k always better?
Hi!
I was trying to polish a 1.3 Gb genome on a node with 2 TB of memory, but I kept getting the error "memory allocation of xxx bytes failed". Following your previous recommendation, I decreased -t to 1, and it still failed. Then I split my genome and BAM file, but got the same error.
Could you please share your comments?
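When reporting this kind of failure, it may help to state how large the failed allocation was relative to the memory actually available to the process. A minimal sketch, assuming the byte count is copied by hand from the error message; bytes_to_gib is a hypothetical helper name.

```shell
# Hypothetical helper: convert a byte count (as printed in the
# "memory allocation of ... bytes failed" message) to GiB.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example: print the requested size next to the shell's virtual memory limit.
#   bytes_to_gib <bytes from the error message>
#   ulimit -v    # "unlimited" or a limit in KiB
```

A per-process or per-job limit lower than the node's physical memory is one possible explanation, though nothing in the report confirms that this is the cause here.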