nextomics / nextpolish2

Repeat-aware polishing genomes assembled using HiFi long reads

License: Other

Rust 94.31% Shell 0.56% Python 5.13%

genome-assembly genome-polish t2t-polish

nextpolish2's Issues

job killed

I used NextPolish2 to polish a few genomes several days ago. When I looked at the log file today, I found this message in some jobs:
/var/spool/slurm/d/job1113564/slurm_script: line 31: 37987 Killed nextPolish2 -t ${threads} -r ${mat_HiFi_mapping_file} ${mat_asm} $k21 $k31 -o ${mat_asm}.np2.fasta

The initial jobs, which ran in the first few days, completed the polishing process without problems.
Best wishes!
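
A note on reading this log line: `Killed` means the process received SIGKILL, which a shell reports as exit status 137 (128 + 9). On a SLURM node this often points to the job exceeding its memory limit, but that is an assumption here and not something the log confirms. A minimal sketch for decoding an exit status, where `describe_exit_status` is a hypothetical helper name and not part of NextPolish2:

```shell
# describe_exit_status STATUS: explain a shell exit status (hypothetical helper).
describe_exit_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # Statuses above 128 mean "terminated by signal (status - 128)".
        sig=$((status - 128))
        echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with error code $status"
    fi
}
```

In a job script this could be called as `describe_exit_status $?` right after the nextPolish2 command.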

Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/main.rs:1695:51

Hello,
My command:
nextPolish2 -r hifi.map.sort.bam sam.fa k21.yak k31.yak >asm.np2.fa
Error:
thread '' panicked at 'called Result::unwrap() on an Err value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/main.rs:1695:51
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
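
An `UnexpectedEof` error generally means a reader ran out of bytes before it expected to. One possible cause, assumed here and not confirmed by the report, is a truncated input file. The sketch below checks whether a BAM file ends with the 28-byte BGZF end-of-file marker; `has_bgzf_eof` is a hypothetical helper name, and the check only applies to the BAM input, not to the yak files.

```shell
# Hex form of the 28-byte BGZF end-of-file marker that closes a complete BAM file.
BGZF_EOF=1f8b08040000000000ff0600424302001b0003000000000000000000

# has_bgzf_eof FILE: succeed if FILE ends with the BGZF EOF marker (hypothetical helper).
has_bgzf_eof() {
    # Dump the last 28 bytes as hex and strip all whitespace before comparing.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}
```

For example, `has_bgzf_eof hifi.map.sort.bam || echo "BAM may be truncated"`. `samtools quickcheck` performs a comparable check if samtools is available.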

nextpolish2 runs too slow

Hi,

I am polishing a human genome assembly with ~30X HiFi reads and ~30X paired-end Illumina reads, using 30 CPU cores and 192 GB RAM. The job has been running for ~15 hours. However, according to your paper, it takes only ~90 to 100 min with 5 CPU cores and 256 GB RAM. Is it normal for the job to run this long?

Below is the code I used to prepare the input and run NextPolish2. The 15 hours are for NextPolish2 alone.

# minimap2 align to assembly (2.26-r1175)
minimap2 -ax map-hifi -Q -t $threads \
${asmpath}/${sample}.asm.bp.hap1.p_ctg.inspector.fa \
${lrspath}/${sample}.filt.fq.gz | samtools sort -o ${output_prefix}.hap1.bam - 
samtools index -@ $threads ${output_prefix}.hap1.bam


# prepare k-mer count (0.1-r69-dirty)
yak count -o ${output_prefix}.k21.yak -k 21 -b 37 -t $threads \
<(zcat ${srspath}/${sample}_1_paired.fq.gz) <(zcat ${srspath}/${sample}_2_paired.fq.gz) 
yak count -o ${output_prefix}.k31.yak -k 31 -b 37 -t $threads \
<(zcat ${srspath}/${sample}_1_paired.fq.gz) <(zcat ${srspath}/${sample}_2_paired.fq.gz) 

# run nextPolish2 (nextPolish2 0.2.0)
nextPolish2 -t $threads \
${output_prefix}.hap1.bam \
${asmpath}/${sample}.asm.bp.hap1.p_ctg.inspector.fa \
${output_prefix}.k21.yak \
${output_prefix}.k31.yak > ${output_prefix}.asm.bp.hap1.p_ctg.inspector.np2.fa

Polish using Oxford Nanopore assembly

Hi,
Can we use NextPolish2 to polish an assembly generated from Oxford Nanopore reads? Or is it only applicable to HiFi reads?

Thanks

Polishing a CLR-based draft assembly with HiFi data

I am involved in a genome project for a wild animal species with a huge genome (6.5 Gb). We expect the genome to be highly heterozygous, and perhaps over half of it will be repetitive sequences.

Due to budget constraints, we could only obtain 80x coverage of CLR data, 20x of HiFi data, and 150x of short-read data. I plan to use the CLR data to build a draft assembly and then polish it with the HiFi data.
Would it be effective to use HiFi data in NextPolish2 to polish a draft assembly made with CLR data?

Any suggestions would be appreciated.

Which asm.fa should be used in NextPolish2: contig.fa or genome.fa?

Hi! NextPolish2 is a great tool for correcting errors.
I have little knowledge of genome assembly, and I want to know which asm.fa should be used in NextPolish2.
I ran hifiasm and got contig.fa. After running juicer and 3d-dna, I got genome.fa. May I use genome.fa directly in NextPolish2? Or should I polish contig.fa with NextPolish2 first, then run juicer and 3d-dna, and polish genome.fa with NextPolish2 again?
Thanks!

two phased haplotypes questions

Hi Jiang,

Thank you so much for developing this fantastic tool.

In your example, you used an asm.fa. However, with hifiasm, it yields two phased haplotypes. In this scenario, what would be the recommended approach to rectify switch errors in both haplotypes? Should I run the NextPolish2 pipeline separately for each haplotype?

Thanks in advance!

Best,
Lin

Problems with installation

Updating crates.io index
error: failed to get clap as a dependency of package nextPolish2 v0.1.0 (/hdd/data/wangjq/Software/NextPolish2-0.1.0)

Caused by:
failed to load source for dependency clap

Caused by:
Unable to update registry crates-io

Caused by:
failed to fetch https://github.com/rust-lang/crates.io-index

Caused by:
network failure seems to have happened
if a proxy or similar is necessary net.git-fetch-with-cli may help here
https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli

Caused by:
SSL error: received early EOF; class=Ssl (16); code=Eof (-20)

Could you provide a conda installation?
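
The Cargo log above itself points to `net.git-fetch-with-cli` as a possible workaround. A minimal sketch of enabling it through Cargo's environment variable follows; whether it resolves this particular SSL failure is an assumption, since that depends on the local network and proxy setup.

```shell
# Ask Cargo to fetch the crates.io index with the system git binary
# instead of its built-in git library (equivalent to net.git-fetch-with-cli = true).
export CARGO_NET_GIT_FETCH_WITH_CLI=true

# Then retry the build from the NextPolish2 source directory:
# cargo build --release
```

The same setting can be made persistent by adding `git-fetch-with-cli = true` under a `[net]` section in the Cargo configuration file.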

memory allocation failed

Dear developer,

I was trying to polish an assembly using HiFi reads, but I keep getting the error "memory allocation of xxx bytes failed".
Could you please share your comments?

Thanks a lot
Crow

Why is no QV improvement achieved using NextPolish2?

Dear authors,

Thanks for developing such a useful tool for polishing hifiasm assemblies. I tried to polish my hifiasm assembly (assembled from 40× HiFi reads) but found that the QV of the polished assembly is essentially the same as that of the unpolished one. The genome is highly heterozygous (heterozygosity rate: 0.77%). I do not know why this happens. Could you kindly help me?

The commands used are as follows:
hifiasm -t 60 -o Pvat -l 2 -s 0.75 --h1 R1.fq.gz --h2 R2.fq.gz hifi.fastq.gz
winnowmap -k 21 -t 100 -W repetitive_k21.txt -ax map-pb Pvat.hic.p_ctg.fa hifi.fastq.gz |samtools sort -@ 100 -o hifi.map.sort.bam -
yak count -t 100 -o k21.yak -k 21 -b 37 <(zcat illumina_.fq.gz)
yak count -t 100 -o k31.yak -k 31 -b 37 <(zcat illumina_.fq.gz)
nextPolish2 -r hifi.map.sort.bam Pvat.hic.p_ctg.fa k21.yak k31.yak -t 100 -o Pvat.hic.p_ctg.polished.v1.fa

The QV before polishing:
45.1437

The QV after polishing:
45.1451

Thanks again!
Bob
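
To put the two QV values in perspective, a Phred-scaled QV maps to a per-base error probability as P = 10^(-QV/10). The sketch below does this conversion with awk; `qv_to_error_rate` is a hypothetical helper name. Both values reported above work out to roughly 3 errors per 100 kb, so the difference between them is negligible.

```shell
# qv_to_error_rate QV: print the per-base error probability implied by a Phred-scaled QV.
qv_to_error_rate() {
    awk -v qv="$1" 'BEGIN { printf "%.3e\n", 10 ^ (-qv / 10) }'
}

qv_to_error_rate 45.1437   # QV before polishing
qv_to_error_rate 45.1451   # QV after polishing
```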

thread '<unnamed>' panicked at 'byte index 11129 is out of bounds of ...

I followed the tutorial at

https://github.com/Nextomics/NextPolish2/blob/main/doc/benchmark3.md

to polish the paternal and maternal assemblies obtained after trio binning. The error is below:
thread '<unnamed>' panicked at 'byte index 11129 is out of bounds of TTGCTTCTTTGACCAAAACACCACCTTATGACATTGGTTCTCCAAATCTTATGTCCTTCTGACATACAAAATACTACACAATGTCATATCATTCTATGTCAATAGTTCCCAAAAGTCTTAACTTGTTCCAGCATCAACTCTAAAGTCCAAAGTTTCATCTGAGATTCAAGGCAAGTTCCTTTCAGCTATGAGCCTTTAGGATCAATAAAAATTTATTTACTTTCAAGATACAATGATTTTGCAAGCATTGGGTAAA[...], src/main.rs:409:36
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
/var/spool/slurm/d/job1102999/slurm_script: line 39: 88136 Aborted (core dumped) nextPolish2 -r ${mat_HiFi_mapping_file} ${mat_asm} $k21 $k31 -o ${mat_asm}.np2.fasta
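Here mat_HiFi_mapping_file=au_po_mat/racon_mat.meryl.iter_2.winnowmap.sorted.bam (the tutorial seems to use only a plain alignment, not a file realigned after racon polishing, so I ran one extra iteration and used the BAM file from the second round).

As for threads, I started with 20, later changed to 5, and used 1 here; the error occurs in every case.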

BUSCO score

Hello author, I would like to know whether NextPolish2 can improve the BUSCO score. The genome I assembled with hifiasm is of high quality, but I'm not sure if I need to polish it.

Genome gets smaller after correction

Hello author, I again tried to use your software to correct two genomes, and both genomes became smaller. What is the reason? In theory, shouldn't it only correct SNPs and indels?

These are my commands:

minimap2 -ax map-hifi -t 60 LC_filled_N0.fasta /public1/home/yinhang/projects/two_genomes/01_data/HiFi_fastq/LC.ccs.fq|samtools sort -o LC.sort.bam -
samtools index LC.sort.bam

yak count -o k21_ngs.yak -k 21 -b 37 NGS_correct_1.fq.gz NGS_correct_2.fq.gz
yak count -o k31_ngs.yak -k 31 -b 37 NGS_correct_1.fq.gz NGS_correct_2.fq.gz

nextPolish2 -t 60 LC.sort.bam LC_filled_N0.fasta k21_ngs.yak k31_ngs.yak > LC_corrected_N0.fa
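
To quantify how much the assembly shrank, the total base count before and after polishing can be compared. Since indel corrections add or remove bases, some change in length is expected; how much is normal for this data set is not something this sketch can tell. `fasta_total_bases` is a hypothetical helper name, and it assumes uncompressed FASTA input.

```shell
# fasta_total_bases FILE: print the total number of sequence characters in a FASTA file.
fasta_total_bases() {
    # Skip header lines, drop stray whitespace, and sum the remaining line lengths.
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}
```

Running `fasta_total_bases LC_filled_N0.fasta` and `fasta_total_bases LC_corrected_N0.fa` gives the two totals to compare.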

Correction of haplotypes

If I am using a haplotype-resolved assembly produced by hifiasm with Hi-C data instead of trio data, will NextPolish2 be able to do haplotype-aware error correction, and what should I do in this case?

yak for Illumina data?

Hi, @moold

Does yak take HiFi k-mers or Illumina k-mers? In the example code, the input is sr.R*.fq.gz, but the documentation doesn't say which. So should the k-mers come from the short reads or the HiFi reads?

Installation issue

Hi,

I got the error "failed to run custom build command for libz-sys v1.1.12" when running the installation with 'cargo build --release'.

Do you have any ideas? Thank you!

Jing

Process file containing polish sites and sequences

I used NextPolish2 to improve my Arabidopsis assembly, and the polished file was generated without problems. However, the only output seems to be a single FASTA file. Is it possible to get a file listing the polished positions and sequences, like Pilon's change.txt? In addition, do you have any suggestions regarding the k-mer size to use? Is a larger k always better?

memory allocation failed

Hi!
I was trying to polish a 1.3 Gb genome on a node with 2 TB of memory, but I keep getting the error "memory allocation of xxx bytes failed". Following your previous recommendation, I decreased -t to 1, but it still failed. I then split my genome and BAM file, but got the same error.
Could you please share your comments?
