hingeassembler / hinge

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"

Home Page: http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

License: Other

Python 30.71% Shell 0.42% CMake 0.28% C 28.72% C++ 37.51% Jupyter Notebook 1.41% Nix 0.75% Dockerfile 0.19%
genome-assembly

Contributors: 0xaf1f, fxia22, govinda-kamath, ilanshom, jameslz, mr-c, spock

hinge's Issues

Out of range error in draft_assembly step

I get the following error in draft_assembly step after the process is running for quite a while:

...
In total 450 lanes
0
482901
list size:0
list size:0
list size:0
list size:0
list size:0
list size:0
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 13829) > this->size() (which is 10798)
Aborted

Maybe it is connected to #64? The process runs for quite a while before I encounter the error. I have about 2.5 GB of data in the initial FASTA file, but I do not think the error is related to the amount of data.

What else do you need for debugging?

Issues with the E. coli demo

I was able to compile and install the tool. When I ran the E. coli demo, everything worked until this point:

stelo@H4:~/HINGE/data$ mkdir log
stelo@H4:~/HINGE/data$ Reads_filter --db ecoli --las ecoli.las -x ecoli --config ~/HINGE/utils/nominal.ini
[2016-07-14 14:06:57.906] [log] [info] Reads filtering
[2016-07-14 14:06:57.906] [log] [info] name of db: ecoli, name of .las file ecoli.las
[2016-07-14 14:06:57.906] [log] [info] name of fasta: , name of .paf file
[2016-07-14 14:06:57.906] [log] [info] Parameters passed in
[filter]
length_threshold = 1000;
quality_threshold = 0.23;
n_iter = 3; // filter iteration
aln_threshold = 1000;
min_cov = 5;
cut_off = 300;
theta = 300;
use_qv = true;
[running]
n_proc = 12;
[draft]
min_cov = 10;
trim = 200;
edge_safe = 100;
tspace = 900;
step = 50;
[consensus]
min_length = 4000;
trim_end = 200;
best_n = 1;
quality_threshold = 0.23;

[layout]
hinge_slack = 1000
min_connected_component_size = 8
[2016-07-14 14:06:57.910] [log] [info] Load alignments from ecoli.las
[2016-07-14 14:06:57.910] [log] [info] # Alignments: 15491688
[2016-07-14 14:06:57.910] [log] [info] # Reads: 82590
[2016-07-14 14:07:07.139] [log] [info] Input data finished
[2016-07-14 14:07:07.198] [log] [info] No debug restrictions.
[2016-07-14 14:07:07.247] [log] [info] use_qv_mask set to true
[2016-07-14 14:07:07.247] [log] [info] use_qv_mask set to true
[2016-07-14 14:07:07.247] [log] [info] number processes set to 12
[2016-07-14 14:07:07.247] [log] [info] LENGTH_THRESHOLD = 1000
[2016-07-14 14:07:07.247] [log] [info] QUALITY_THRESHOLD = 0.23
[2016-07-14 14:07:07.247] [log] [info] N_ITER = 3
[2016-07-14 14:07:07.247] [log] [info] ALN_THRESHOLD = 1000
[2016-07-14 14:07:07.247] [log] [info] MIN_COV = 5
[2016-07-14 14:07:07.247] [log] [info] CUT_OFF = 300
[2016-07-14 14:07:07.247] [log] [info] THETA = 300
[2016-07-14 14:07:07.247] [log] [info] EST_COV = 0
[2016-07-14 14:07:07.247] [log] [info] reso = 40
[2016-07-14 14:07:07.247] [log] [info] use_coverage_mask = true
[2016-07-14 14:07:07.247] [log] [info] COVERAGE_FRACTION = 3
[2016-07-14 14:07:07.247] [log] [info] MIN_REPEAT_ANNOTATION_THRESHOLD = 10
[2016-07-14 14:07:07.247] [log] [info] MAX_REPEAT_ANNOTATION_THRESHOLD = 20
[2016-07-14 14:07:07.247] [log] [info] REPEAT_ANNOTATION_GAP_THRESHOLD = 300
[2016-07-14 14:07:07.247] [log] [info] NO_HINGE_REGION = 500
[2016-07-14 14:07:07.247] [log] [info] HINGE_MIN_SUPPORT = 7
[2016-07-14 14:07:07.247] [log] [info] HINGE_BIN_PILEUP_THRESHOLD = 7
[2016-07-14 14:07:07.247] [log] [info] HINGE_READ_UNBRIDGED_THRESHOLD = 6
[2016-07-14 14:07:07.247] [log] [info] HINGE_BIN_LENGTH = 200
[2016-07-14 14:07:07.247] [log] [info] HINGE_TOLERANCE_LENGTH = 100
Segmentation fault (core dumped)

The file log_2016-07-14_14-06.txt in the log directory is empty.

Make the graph symmetric

  • [fix] Make the graph Watson-Crick complete.
  • Find a symmetric greedy metric, i.e. the best right overlap of w is v <===> the best right overlap of v' is w'. This can avoid some bubbles.
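The two items above can be stated as checks on an overlap graph. A graph is Watson-Crick complete when every edge u -> v has its mirror edge v' -> u', where x' is the reverse complement of read x. The Python sketch below is illustrative only: it assumes a hypothetical node representation (read_id, strand) and a hypothetical best_right dictionary mapping each node to its best right overlap; neither is HINGE's internal data structure.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical node representation (not HINGE's internal one):
# (read_id, strand), where strand 0 = forward and 1 = reverse complement.
Node = Tuple[int, int]
Edge = Tuple[Node, Node]  # (u, v) means "v is the right overlap of u"


def rc(node: Node) -> Node:
    """Return v' (the reverse-complement node) for a node v."""
    read_id, strand = node
    return (read_id, 1 - strand)


def missing_mirror_edges(edges: Iterable[Edge]) -> List[Edge]:
    """List mirror edges v' -> u' that are absent although u -> v is present."""
    edge_set: Set[Edge] = set(edges)
    return sorted({(rc(v), rc(u)) for (u, v) in edge_set} - edge_set)


def make_wc_complete(edges: Iterable[Edge]) -> Set[Edge]:
    """Return the edge set plus every missing mirror edge."""
    edge_set = set(edges)
    return edge_set | {(rc(v), rc(u)) for (u, v) in edge_set}


def is_symmetric_greedy(best_right: Dict[Node, Node]) -> bool:
    """True if best_right[w] == v  <=>  best_right[v'] == w' for every entry."""
    return all(best_right.get(rc(v)) == rc(w) for w, v in best_right.items())
```

Because rc(rc(x)) == x, checking the forward direction for every entry also covers the reverse direction of the equivalence.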

RSII as input

Hi,
One SMRT cell produces one .h5 file, but when we converted it to FASTA we got three FASTA files.

Can we combine these three FASTA files and import them with fasta2DB, or do they have to be imported separately?

Thank you in advance

Michal

Hints for parametrization

We have a bacterium with nasty 35-40 kb repeats. The genome is about 6.8 Mb in size. The four SMRT cells we sequenced give about 3 Gb of raw data.

Do you have any hints for the parameters in nominal.ini, and also for Gene Myers' tools?

I would go for stricter quality parameters (lower values are stricter, right? :-)) and maybe filter for longer reads. Would you agree, and if so, which values? What else would you suggest?

We have quite a powerful machine, so we don't mind accepting higher computational demands in exchange for higher quality.

Thx!
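For reference, assuming the 3 Gb refers to total read bases, the raw coverage is 3 Gb / 6.8 Mb, roughly 441X. The Python sketch below computes that figure and shows one possible way to "filter for longer reads": choose the length cutoff that keeps only the longest reads up to a target coverage. Both function names and the target value are hypothetical; this is not a HINGE or nominal.ini option.

```python
from typing import List


def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total read bases divided by genome size."""
    return total_bases / genome_size


def length_cutoff_for_coverage(read_lengths: List[int], genome_size: float,
                               target_cov: float) -> int:
    """Hypothetical helper (not part of HINGE): smallest read length such that
    keeping the longest reads down to that length reaches target_cov.

    Returns the shortest read length when the target cannot be reached.
    """
    if not read_lengths:
        raise ValueError("no reads")
    needed = target_cov * genome_size
    kept = 0
    for length in sorted(read_lengths, reverse=True):
        kept += length
        if kept >= needed:
            return length
    return min(read_lengths)


print(round(coverage(3e9, 6.8e6)))  # 441
```

Reads with the same length as the returned cutoff are all kept, so the retained coverage can slightly exceed the target.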

Segmentation fault when fruit.x.las is empty

Hi,
HINGE hits a segmentation fault when one of the many x.las files is empty:

[2017-03-16 09:18:59.081] [log] [info] name of las: fruit.16.las
[2017-03-16 09:18:59.160] [log] [info] Load alignments from fruit.16.las
[2017-03-16 09:18:59.160] [log] [info] # Alignments: 127553882
[2017-03-16 09:20:34.739] [log] [info] # reads: 34130
[2017-03-16 09:20:34.739] [log] [info] # active reads: 1391/34130
[2017-03-16 09:20:34.739] [log] [info] Input data finished, part 16/30
[2017-03-16 09:21:13.952] [log] [info] kept 781263/127553882 overlaps,  324911/53119069 rev_overlaps in part 16/30
[2017-03-16 09:21:13.953] [log] [info] index finished
[2017-03-16 09:21:17.181] [log] [info] name of las: fruit.17.las
[2017-03-16 09:21:17.197] [log] [info] Load alignments from fruit.17.las
[2017-03-16 09:21:17.197] [log] [info] # Alignments: 0
/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 28975 Segmentation fault      hinging "$@"

Michal
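The crash follows the "# Alignments: 0" line for fruit.17.las. Until the empty-file case is handled, one workaround is to find such files before launching the layout step. The Python sketch below assumes the DALIGNER .las layout in which a file starts with a 64-bit alignment count followed by a 32-bit trace spacing, stored little-endian as on x86-64. The function names are hypothetical.

```python
import struct
from pathlib import Path
from typing import List

# Assumed .las header: int64 number of alignments, then int32 trace spacing,
# little-endian (x86-64). 12 bytes in total with "<" (no padding).
LAS_HEADER = struct.Struct("<qi")


def las_alignment_count(path: Path) -> int:
    """Read the alignment count from a .las header (0 if the header is truncated)."""
    with open(path, "rb") as handle:
        header = handle.read(LAS_HEADER.size)
    if len(header) < LAS_HEADER.size:
        return 0
    novl, _tspace = LAS_HEADER.unpack(header)
    return novl


def empty_las_files(directory: str, pattern: str = "*.las") -> List[str]:
    """Names of .las files in `directory` whose header reports zero alignments."""
    return sorted(p.name for p in Path(directory).glob(pattern)
                  if las_alignment_count(p) == 0)
```

Any files this lists could then be inspected with LAshow before deciding whether to rerun the corresponding daligner block.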

Is g++ 4.9 required?

I am getting a compilation error ...

...
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++
-- Check for working CXX compiler: /usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.8")
-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   graph
-- Configuring done
-- Generating done
-- Build files have been written to: /home/stelo/HINGE/build
[ 17%] Built target ini
[ 24%] Built target LA
[ 41%] Built target DB
[ 41%] Built target DW_banded
[ 41%] Built target PAF
[ 48%] Built target kmer_lookup
[ 58%] Built target falcon
[ 65%] Built target LAInterface
[ 68%] Building CXX object bin/filter/CMakeFiles/Reads_filter.dir/filter.cpp.o
[ 75%] Built target consensus
[ 82%] Built target io
[ 89%] Built target hinging
[ 96%] Built target draft_assembly
/home/stelo/HINGE/src/filter/filter.cpp: In function ‘int main(int, char**)’:
/home/stelo/HINGE/src/filter/filter.cpp:251:9: warning: ‘char* getwd(char*)’ is deprecated (declared at /usr/include/unistd.h:525) [-Wdeprecated-declarations]
         getwd(buff);
         ^
/home/stelo/HINGE/src/filter/filter.cpp:251:19: warning: ‘char* getwd(char*)’ is deprecated (declared at /usr/include/unistd.h:525) [-Wdeprecated-declarations]
         getwd(buff);
                   ^
In file included from /usr/include/c++/4.8/algorithm:62:0,
                 from /home/stelo/HINGE/src/filter/filter.cpp:10:
/usr/include/c++/4.8/bits/stl_algo.h: In instantiation of ‘_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<int, int>*, std::vector<std::pair<int, int> > >; _Tp = std::pair<int, int>; _Compare = bool (*)(const std::pair<int, int>&, std::pair<int, int>&)]’:
/usr/include/c++/4.8/bits/stl_algo.h:2296:78:   required from ‘_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<int, int>*, std::vector<std::pair<int, int> > >; _Compare = bool (*)(const std::pair<int, int>&, std::pair<int, int>&)]’
/usr/include/c++/4.8/bits/stl_algo.h:2337:62:   required from ‘void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<int, int>*, std::vector<std::pair<int, int> > >; _Size = long int; _Compare = bool (*)(const std::pair<int, int>&, std::pair<int, int>&)]’
/usr/include/c++/4.8/bits/stl_algo.h:5499:44:   required from ‘void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = __gnu_cxx::__normal_iterator<std::pair<int, int>*, std::vector<std::pair<int, int> > >; _Compare = bool (*)(const std::pair<int, int>&, std::pair<int, int>&)]’
/home/stelo/HINGE/src/filter/filter.cpp:845:84:   required from here
/usr/include/c++/4.8/bits/stl_algo.h:2263:35: error: invalid initialization of reference of type ‘std::pair<int, int>&’ from expression of type ‘const std::pair<int, int>’
    while (__comp(*__first, __pivot))
                                   ^
make[2]: *** [bin/filter/CMakeFiles/Reads_filter.dir/filter.cpp.o] Error 1
make[1]: *** [bin/filter/CMakeFiles/Reads_filter.dir/all] Error 2
make: *** [all] Error 2

Fasta line is too long (> 9998 chars)

Hi,
I just cloned a fresh copy of the HINGE code. Unfortunately, I get "Fasta line is too long (> 9998 chars)" when running find /All_RawData/Each_Cell_Raw/ -name "*.fasta" | xargs -I {} fasta2DB banana {}. This seems to happen only with PacBio Sequel data.

How can I fix it?

Thank you in advance.

Michal
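The message suggests that fasta2DB rejects any single input line longer than 9998 characters, so a FASTA file that stores each long subread on one line will trigger it. One possible workaround, assuming fasta2DB accepts sequences split over several lines, is to rewrap the sequence lines before import. A minimal Python sketch; the function name, the script name in the usage line, and the 80-character width are arbitrary choices, and header lines are passed through untouched so PacBio read names are preserved.

```python
import sys
from typing import Iterable, Iterator


def rewrap_fasta(lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Yield FASTA lines, splitting each sequence line into chunks of <= width chars.

    Header lines (starting with '>') are yielded unchanged; blank lines are dropped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            yield line
        elif line:
            for start in range(0, len(line), width):
                yield line[start:start + width]


if __name__ == "__main__":
    # Hypothetical usage: python rewrap_fasta.py < cell.fasta > cell.wrapped.fasta
    for out_line in rewrap_fasta(sys.stdin):
        print(out_line)
```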

get_consensus_gfa.py error

After you resolved #65, I get an error at the last step (get_consensus_gfa.py):

user@myMachine:~/HINGE_all$ get_consensus_gfa.py $PWD pk pkid.G2.graphml pk.consensus.fasta 
Traceback (most recent call last):
  File "/home/bioinf/Software/HINGE/scripts/get_consensus_gfa.py", line 62, in <module>
    nodes_to_keep = [x for x in g.nodes() if consensus_contigs[g.node[x]['contig_id']] != '' ]
KeyError: 'contig_id'

Also, I don't know exactly what the following means or whether it is related to this problem: after a successful run, HPC.daligner gives me:

...
# Check level 2 .las files jobs (1) (optional but recommended)
LAcheck -vS draft pk draft.pk
  pk: Read indices out of range
  draft.pk: Read indices out of range
# Remove level 1 .las files
rm L1.1.1.las L1.1.2.las L1.1. ...

Question regarding clipping

Hi
I have a question regarding the workflow. The minimal example from your Running section succeeds for me, but if I try with a subset of my data I can't get beyond the clip step.
Upon running:

Jun  5 09:36 edges.g_out.txt
dgmserv01.vital-it.ch HINGE_00000F hinge clip test.edges.hinges test.hinge.list whatever

I get:
Traceback (most recent call last):
  File "/software/UHTS/Assembler/HINGE/20170509/bin/../lib/hinge/pruning_and_clipping.py", line 1357, in <module>
    add_chimera_flags(G,prefix)
  File "/software/UHTS/Assembler/HINGE/20170509/bin/../lib/hinge/pruning_and_clipping.py", line 1054, in add_chimera_flags
    with open(cov_flags,'r') as f:
IOError: [Errno 2] No such file or directory: 'test.cov.flag'

I have two questions related to this:

  1. Is the identifier-of-run in the example an arbitrary string I can choose, or should it relate to something from earlier steps in the pipeline?

  2. The file test.cov.flag indeed does not exist. At which step should it have been created, and why might it be missing?

I forgot to add: when I check the E. coli P4 demo, I find a file ecoli.cov.flag that seems to be generated in the first filtering step, but it is empty.

I do have another file with the extension *.cov.flag in my folder, but its name corresponds to the -o parameter of the hinge layout step. If I set the identifier-of-run to match the -o parameter, however, the error suddenly refers to yet another file...
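The traceback shows add_chimera_flags in pruning_and_clipping.py opening a <prefix>.cov.flag file, with the prefix "test" in this run. A small pre-flight check can report which expected inputs are absent before hinge clip is launched. This is a hypothetical helper: the suffix list is an assumption drawn only from the file names in this report (test.edges.hinges, test.hinge.list, test.cov.flag), not from the script's actual requirements, and it checks existence only, not whether a file is empty.

```python
import os
from typing import List, Sequence

# Assumed suffixes, taken only from the file names mentioned in this report.
DEFAULT_SUFFIXES = (".edges.hinges", ".hinge.list", ".cov.flag")


def missing_inputs(prefix: str, directory: str = ".",
                   suffixes: Sequence[str] = DEFAULT_SUFFIXES) -> List[str]:
    """Return the <prefix><suffix> file names that do not exist in `directory`."""
    return [prefix + s for s in suffixes
            if not os.path.isfile(os.path.join(directory, prefix + s))]
```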

draft_assembly aborted

Hi,

On yet another (PacBio) dataset, draft_assembly aborts with the error below (unfortunately, the dataset is not public):

draft_assembly --db r2s3 --las r2s3.las --prefix r2s3 --config /home/amay/packages/HINGE/utils/nominal.ini --out r2s3.draft

[filter]
length_threshold = 1000;
quality_threshold = 0.23;
n_iter = 3; // filter iteration
aln_threshold = 1000;
min_cov = 5;
cut_off = 300;
theta = 300;
use_qv = true;
...
...
...
Read 0:3840 4740 5640 6540 7440 8340 9240 10140
Read 1:1736 2618 3534 4414 5332 6244 7123 8023 8923 9823
Read 2:5067 5986 6924 7814 8710 9610
Read 3:4034 4835 5688 6662 7562 8462
Read 4:2661 3473 4284 5172 6022 6882
Lane 0
[0 3840] [1 850]
Lane 1
[0 4740] [1 1736] [2 510]
Lane 2
[0 5640] [1 2618] [2 1459]
Lane 3
[0 6540] [1 3534] [2 2363]
Lane 4
[0 7440] [1 4414] [2 3237]
Lane 5
[0 8340] [1 5332] [2 4140]
Lane 6
[0 9240] [1 6244] [2 5067] [3 2244]
Lane 7
[0 10140] [1 7123] [2 5986] [3 3119]
Lane 8
[1 8023] [2 6924] [3 4034] [4 2661]
Lane 9
[1 8923] [2 7814] [3 4835] [4 3473]
Lane 10
[1 9823] [2 8710] [3 5688] [4 4284]
Lane 11
[2 9610] [3 6662] [4 5172]
Lane 12
[3 7562] [4 6022]
Lane 13
[3 8462] [4 6882]
In total 14 lanes
0
11392
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 13167) > this->size() (which is 10595)
Aborted

The strange thing is that this dataset is very similar in its characteristics to four other PacBio datasets on which HINGE runs smoothly. Any ideas?

Thanks!

Inverted 2-mer bases

I noticed another difference when I used racon instead of the default consensus module. We had previously noticed that the default consensus module sometimes produces inverted 2-mers in the assembly. In file1, which shows the reads aligned to the default consensus assembly, you can see the inverted bases: TC is inverted to CT in the assembly.
file1
For this particular assembly, we found that this happened more than 20,000 times.

When I used racon instead, the issue disappears. See file2, where the top sequence is the assembly from the default consensus module and the bottom is the racon consensus.
file2

Hope you find this useful.
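A count like the one above (TC in the reads becoming CT in the assembly) can be obtained by scanning two rows of a pairwise alignment for adjacent transpositions. The Python sketch below assumes gapped alignment strings of equal length with "-" for gaps; it counts only swaps of two distinct bases with no gap involved, and the function name is hypothetical.

```python
from typing import List, Tuple


def adjacent_swaps(ref: str, asm: str) -> List[Tuple[int, str, str]]:
    """Find positions i where asm[i:i+2] is ref[i:i+2] reversed (e.g. TC -> CT).

    `ref` and `asm` must be rows of the same alignment (equal length, '-' = gap).
    Returns (position, ref_2mer, asm_2mer) tuples.
    """
    if len(ref) != len(asm):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    i = 0
    while i < len(ref) - 1:
        r, a = ref[i:i + 2], asm[i:i + 2]
        if "-" not in r and r[0] != r[1] and a == r[::-1]:
            hits.append((i, r, a))
            i += 2  # skip past the swapped pair so it is not counted twice
        else:
            i += 1
    return hits
```

For example, adjacent_swaps("GATCA", "GACTA") reports a single TC -> CT swap at position 2.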

Larger genomes than E. coli, and getting started

Hi,

This is an exciting tool. Thanks for developing it.

Have you tried it on genomes larger than E. coli yet?

I have a 300 Mb insect genome with 40-50X long-read coverage and would like to try it.

Any tips or recommendations appreciated.

best,

John

DBsplit needed before HPC.daligner draft?

Hi,

I'm running HINGE step by step.
Thanks to @govinda-kamath and @ilanshom (#60) I was able to reach the "get consensus assembly" step.
But I'm facing this error:

Daligner jobs (3)

daligner -A -k20 -h50 -e0.85 vriparia_test.1 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.2 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.3 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.4 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.5 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.6 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.7 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.8 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.
daligner -A -k20 -h50 -e0.85 vriparia_test.9 draft
daligner: Block draft contains reads < 20bp long ! Run DBsplit.

Do I have to run DBsplit as in the initial step?

DBsplit draft -x500 -s200

Thanks
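daligner reports that the draft block contains reads shorter than 20 bp. Independently of whether DBsplit is the right fix, one quick way to see whether the FASTA file loaded into the draft database actually contains such sequences is sketched below in Python. The 20 bp threshold comes from the error message; the function name is hypothetical, and duplicate headers would overwrite each other in this simple version.

```python
from typing import Dict, Iterable, Optional


def short_records(lines: Iterable[str], min_len: int = 20) -> Dict[str, int]:
    """Map FASTA header -> sequence length for records shorter than min_len bases."""
    lengths: Dict[str, int] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            lengths[header] += len(line)
    return {h: n for h, n in lengths.items() if n < min_len}
```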

Key and Assertion errors in get_draft_path_norevcomp.py

Hi guys,

On one of the datasets I'm working on, get_draft_path.py works fine, but when I replace it with get_draft_path_norevcomp.py I get a KeyError. I had a look but couldn't quickly figure out the cause, though it seems to be a mapping issue. Could you have a look? I'm attaching the relevant files.

Thanks in advance
r2s2r2s2_run_id.G2.zip

get_draft_path_norevcomp.py ./ r2s2 r2s2r2s2_run_id.G2.graphml
Traceback (most recent call last):
  File "~/packages/HINGE/scripts/get_draft_path_norevcomp.py", line 316, in <module>
    RCmap[path_to_vert[path]] = path_to_vert[path_to_search]
KeyError: '20549_1'

Edit:
The same script raises an AssertionError on another dataset. I'm attaching the file for that dataset too.

r2s3r2s3_run_id.G2.zip

get_draft_path_norevcomp.py ./ r2s3 r2s3r2s3_run_id.G2.graphml
Traceback (most recent call last):
  File "~/packages/HINGE/scripts/get_draft_path_norevcomp.py", line 125, in <module>
    segment = get_string(path_var)
  File "~/packages/HINGE/scripts/get_draft_path_norevcomp.py", line 44, in get_string
    assert itm[1][0] >= itm[1][1]
AssertionError

HINGE for minimap

Hello
I found this intriguing sentence in your paper:

"Therefore, integrating HINGE with other overlapping tools such as MHAP or Minimap can be done if different levels of alignment sensitivity or memory usage are required"

Is this a theoretical possibility, something you have already started to look into, or even possible with the current code?

Cheers
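For context, Minimap reports overlaps in PAF, a tab-separated format whose first 12 columns are: query name, query length, query start, query end, relative strand, target name, target length, target start, target end, number of matching bases, alignment block length, and mapping quality. The HINGE logs elsewhere on this page mention a ".paf file" input, but how HINGE consumes it is not shown here. A minimal Python sketch that reads the 12 mandatory columns; the Overlap type is hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Overlap(NamedTuple):
    """Hypothetical record for the 12 mandatory PAF columns."""
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int
    block_len: int
    mapq: int


def parse_paf(lines: Iterable[str]) -> Iterator[Overlap]:
    """Yield one Overlap per PAF line, ignoring optional tags after column 12."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            raise ValueError(f"PAF line has {len(f)} columns, expected >= 12")
        yield Overlap(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                      int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                      int(f[10]), int(f[11]))
```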

Question: can I reuse the FALCON alignments in HINGE?

Hello,
I have been running FALCON for a while on a large set of PacBio reads, and I was wondering whether I could reuse the all-pairs daligner step that FALCON carries out and feed its results into HINGE. Is this possible? If so, is it just a matter of renaming files? Please advise. Thanks.

Stefano

syntax error near unexpected token `('

Hi,
I got the following error from PBS:

/var/spool/PBS/mom_priv/jobs/1638246.pbs.SC: line 17: syntax error near unexpected token `('
/var/spool/PBS/mom_priv/jobs/1638246.pbs.SC: line 17: `LAmerge ecoli.las ecoli.+([[:digit:]]).las'

How can I fix it?

Thank you in advance.

Michal
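The pattern +([[:digit:]]) is a bash extended glob. bash parses it only after "shopt -s extglob" has been run, and without that option it reports exactly this kind of "syntax error near unexpected token `('" message. One shell-independent alternative is to build the LAmerge argument list outside the shell. A minimal Python sketch, assuming the block files are named <db>.<number>.las as in the error message; the helper names are hypothetical, and LAmerge must be on PATH for merge_blocks to work.

```python
import re
import subprocess
from pathlib import Path
from typing import List


def block_las_files(db: str, directory: str = ".") -> List[str]:
    """Return <db>.<N>.las names (N = one or more digits), sorted by block number."""
    pattern = re.compile(re.escape(db) + r"\.(\d+)\.las")
    blocks = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            blocks.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(blocks)]


def merge_blocks(db: str, directory: str = ".") -> None:
    """Run: LAmerge <db>.las <db>.1.las <db>.2.las ... (LAmerge must be on PATH)."""
    files = block_las_files(db, directory)
    if not files:
        raise FileNotFoundError(f"no {db}.<N>.las files in {directory}")
    subprocess.run(["LAmerge", f"{db}.las", *files], cwd=directory, check=True)
```

If the job script is run by bash, adding "shopt -s extglob" on its own line earlier in the script should also let the original LAmerge line parse.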

Could not find the following static Boost libraries: boost_graph

Hi,
I got the following error:

  Could not find the following static Boost libraries:
          boost_graph
  No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
  directory containing Boost libraries or BOOST_ROOT to the location of
  Boost.

However, I did provide BOOST_LIBRARYDIR.

What could I be missing in my PR to Bioconda?

Thank you in advance,

Michal

Binary release of HINGE

Hi,
The software seems really interesting and I want to try it on my own datasets. Since my attempts to compile the software have failed, could you provide a binary release that runs directly? Thanks.

Jimmy

Doesn't install

dextract.c:23:18: fatal error: hdf5.h: No such file or directory
#include <hdf5.h>
^
compilation terminated.
make: *** [dextract] Error 1
make: *** Waiting for unfinished jobs....
./utils/build.sh: line 19: cmake: command not found
make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.

Layout 0 active hinges

I am trying to re-assemble a small piece of a large genome (which assembled correctly with other assemblers) using the multi-las approach, and I am seeing a few strange things:

  1. The read filtering reports that none of the reads are removed. Is this expected?

  2. Multi-threading for the layout step does not seem to work. I set 12 CPUs in nominal.ini, but it never uses more than one. How is this implemented? Maybe I am missing a library?

  3. The layout step removes all of my overlaps:


[2017-06-13 10:19:52.415] [log] [info] Load alignments from test.1.las
[2017-06-13 10:19:52.415] [log] [info] # Alignments: 5631866
[2017-06-13 10:20:14.940] [log] [info] # reads: 17734
[2017-06-13 10:20:14.940] [log] [info] # active reads: 0/17734
[2017-06-13 10:20:14.940] [log] [info] Input data finished, part 1/16
[2017-06-13 10:20:16.287] [log] [info] kept 0/5631866 overlaps,  0/2768815 rev_overlaps in part 1/16
[2017-06-13 10:20:16.287] [log] [info] index finished
......
Similarly for my other 16 overlap results.
......
[2017-06-13 10:32:25.359] [log] [info] Building hinge graph
[2017-06-13 10:32:25.416] [log] [info] num hinges 62482
[2017-06-13 10:32:25.703] [log] [info] Hinge graph built
Total number of components: 62482
[2017-06-13 10:32:26.286] [log] [info] after filter 0 active hinges
[2017-06-13 10:32:26.539] [log] [info] Starting to build assembly graph.
[2017-06-13 10:32:26.563] [log] [info] sort and output finished
[2017-06-13 10:32:26.563] [log] [info] version 0.0.3

I tried the same with a single .las file:


[2017-06-13 11:09:36.317] [log] [info] name of las: test.las
[2017-06-13 11:09:36.335] [log] [info] Load alignments from test.las
[2017-06-13 11:09:36.335] [log] [info] # Alignments: 16927078
[2017-06-13 11:10:47.286] [log] [info] # reads: 105189
[2017-06-13 11:10:47.286] [log] [info] # active reads: 0/105189
[2017-06-13 11:10:47.286] [log] [info] Input data finished, part 1/1
[2017-06-13 11:10:53.000] [log] [info] kept 0/16927078 overlaps,  0/8335689 rev_overlaps in part 1/1
[2017-06-13 11:10:53.000] [log] [info] index finished
[2017-06-13 11:10:53.008] [log] [info] kept 0/16927078 overlaps,  0/8335689 rev_overlaps in 1 part(s)
[2017-06-13 11:10:53.026] [log] [info] 0 overlaps
[2017-06-13 11:10:53.026] [log] [info] 0 rev overlaps
[2017-06-13 11:10:53.051] [log] [info] removed contained reads, active reads: 0
[2017-06-13 11:10:53.066] [log] [info] active reads: 0
[2017-06-13 11:10:54.046] [log] [info] 0 killed hinges
[2017-06-13 11:10:54.046] [log] [info] 0 hinges
[2017-06-13 11:10:54.906] [log] [info] 0 active hinges
[2017-06-13 11:10:54.928] [log] [info] Building hinge graph
[2017-06-13 11:10:54.993] [log] [info] num hinges 6967
[2017-06-13 11:10:55.105] [log] [info] Hinge graph built
Total number of components: 6967
[2017-06-13 11:10:55.258] [log] [info] after filter 0 active hinges
[2017-06-13 11:10:55.414] [log] [info] Starting to build assembly graph.
[2017-06-13 11:10:55.434] [log] [info] sort and output finished
[2017-06-13 11:10:55.434] [log] [info] version 0.0.3

I used the demo configuration and have ~100x coverage.

Versions:
HINGE: 2d70ea7
DAZZ_DB: ff5cfec955496fbc1f5ab6735e0a832976dd2995
DALIGNER: 9e9acd358d2d8b6d24769f58f7de991c47292ce2
DASCUBBER: 77dd9555dac79c9f3040e582c4b44fc3fce16e32

My steps:

Multi-las:

fasta2DB test test.subreads.fasta
DBsplit test
HPC.daligner test | bash -v
for i in {1..16}; do DASqv -c90 test.db test.${i}.las; done
Catrack test.db qual
hinge filter --db test --las test --mlas -x test --config nominal.ini
hinge layout --db test --las test --mlas -x test --config nominal.ini -o test

Single-las:

 LAmerge test.las test.[1-16].las
DASqv -c100  test test.las
hinge filter --db test --las test  -x test --config nominal.ini
hinge layout --db test --las test.las  -x test --config nominal.ini -o test
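A note on the single-las steps above: in shell globbing, [1-16] is a single-character bracket expression (the range 1-1 plus the character 6), so test.[1-16].las matches only test.1.las and test.6.las, not all 16 blocks. Python's fnmatch uses the same bracket semantics, which makes this easy to check; the helper name below is hypothetical.

```python
import fnmatch
from typing import List


def glob_matches(names: List[str], pattern: str) -> List[str]:
    """Return the names that a shell-style glob pattern would match."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


blocks = [f"test.{i}.las" for i in range(1, 17)]
print(glob_matches(blocks, "test.[1-16].las"))  # ['test.1.las', 'test.6.las']
```

In bash, brace expansion such as test.{1..16}.las (the form already used in the multi-las for loop) lists all 16 names instead.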

HINGE has not produced any file content for a few days

Hi,
After 4 days, HINGE stopped writing any content to the output files.

-rw-rw---- 1 lorencm lorencm 1.4K Mar  6 06:46 hinge.pbs
-rw-rw---- 1 lorencm lorencm 302G Mar 10 16:19 fruit.las
drwxrws--- 3 lorencm lorencm  11M Mar 10 16:19 .
-rw-rw---- 1 lorencm lorencm  89M Mar 10 16:41 .fruit.qual.data
-rw-rw---- 1 lorencm lorencm 8.0M Mar 10 16:41 .fruit.qual.anno
drwxrws--- 2 lorencm lorencm 4.0K Mar 10 16:41 log
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.repeat.txt
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.mas
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.homologous.txt
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.hinges.txt
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.filtered.fasta
-rw-rw---- 1 lorencm lorencm    0 Mar 10 16:42 fruit.coverage.txt

According to the queuing system, it is still running:

=========================================================================================
             Job                   NDS    CPUs       Mem  (Gb)    Walltime   Host/Array/
          ID     Name         State     Req Util%  Req'd  Used  Req'd  Used   GPU/mics
=========================================================================================
       1714719 hinge            R   1    10  37    200.0  11.5    300   170  cl3n065/2*0 
=========================================================================================

I am running HINGE in the following way:

#!/bin/bash -l
#PBS -N hinge
#PBS -j oe
#PBS -l walltime=150:00:00
#PBS -l mem=200G
#PBS -l ncpus=10
###PBS -M email@host

module load hdf5/1.8.16-foss-2016a
module load boost/1.61.0-foss-2016a
module load python/2.7.11-foss-2016a 

# pip install numpy ujson colormap easydev networkx --user


cd /work/lorencm/apps/HINGE
source utils/setup.sh
cd /work/lorencm/apps/HINGE/data/fruit
DBsplit -x500 -s100 fruit
HPC.daligner -t10 fruit | csh -v
LAmerge fruit.las fruit.*.las
rm fruit.*.las # we only need fruit.las
DASqv -c100 fruit fruit.las
# Run filter

mkdir log
hinge filter --db fruit --las fruit.las -x fruit --config ../../utils/nominal.ini

# Run layout

hinge layout --db fruit --las fruit.las -x fruit --config ../../utils/nominal.ini -o fruit

# Run postprocessing

hinge clip fruit.edges.hinges fruit.hinge.list 1


# get draft assembly 

hinge draft-path $PWD fruit fruit1.G2.graphml
hinge draft --db fruit --las fruit.las --prefix fruit --config ../../utils/nominal.ini --out fruit.draft


# get consensus assembly

hinge correct-head fruit.draft.fasta fruit.draft.pb.fasta draft_map.txt
fasta2DB draft fruit.draft.pb.fasta 
HPC.daligner fruit draft | zsh -v  
hinge consensus draft fruit draft.fruit.las fruit.consensus.fasta ../../utils/nominal.ini
hinge gfa $PWD fruit fruit.consensus.fasta

#results should be in fruit_consensus.gfa

Is it normal that HINGE has not produced any file content for a few days?

Thank you in advance.

Michal

DALIGNER seg fault

I was getting segmentation faults in the DALIGNER step, so I took the most recent source code from Gene Myers' GitHub repository, and the segmentation faults went away. You might want to update your third-party links to the most recent version of DALIGNER. Other users might encounter the same problem.

Segmentation fault in draft

Hi @fxia22, last Friday I compiled the dev branch and I'm getting a segmentation fault at the draft assembly stage. The old binary works fine. Let me know if I can help with troubleshooting.

...
...
...
T 29900 0 18940 0 12551
T 18940 0 64634 1 12121
T 64634 1 82532 1 11983
T 82532 1 26315 1 13473
T 26315 1 74944 1 13115
T 74944 1 15618 0 13322
T 15618 0 35434 0 13207
T 35434 0 44799 1 16938
E 44799 1 82360 0 3587 662
S 14312 1 6967 1 21870 0
E 6967 1 14312 1 21522 16053
S 14312 0 6967 0 21522 0
E 6967 0 14312 0 21870 16053
Error! Wrong format.

/mnt/nfs/programs/HINGE-dev/src/hinge: line 8: 31091 Segmentation fault      draft_assembly $*

Multiple CPU cores

Hi,
Does HINGE support multiple CPU cores? If so, how many?

Thank you in advance.

Michal

Python KeyError in get_consensus_gfa.py

I followed the instructions in the README, but with nanopore reads. I made it to the last step, where I get this error:

Traceback (most recent call last):
  File "get_consensus_gfa.py", line 62, in <module>
    nodes_to_keep = [x for x in g.nodes() if consensus_contigs[g.node[x]['contig_id']] != '' ]
KeyError: 'contig_id'
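The KeyError means that at least one node in the graph has no 'contig_id' attribute when get_consensus_gfa.py evaluates g.node[x]['contig_id']. A minimal Python sketch to list such nodes before running the script: it takes (node, attributes) pairs, which is what networkx's G.nodes(data=True) yields, so it can be applied to the .graphml file loaded with networkx.read_graphml. The function name is hypothetical, and networkx is not imported in the runnable part.

```python
from typing import Any, Dict, Iterable, List, Tuple


def nodes_missing_attr(nodes_with_data: Iterable[Tuple[Any, Dict[str, Any]]],
                       attr: str = "contig_id") -> List[Any]:
    """Return the nodes whose attribute dict lacks `attr`."""
    return [node for node, data in nodes_with_data if attr not in data]


# Hypothetical usage with networkx (not imported here):
#   import networkx as nx
#   g = nx.read_graphml("pkid.G2.graphml")
#   print(nodes_missing_attr(g.nodes(data=True)))
```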

After filter 0 active hinges

Hello again,

Another MinION long-read dataset, again unfortunately not public. Read stats:

Starting nr of reads: 60566
Len: Min: 150 Max: 22639 Mean: 2682.2158 Std Dev: 2994.1440

This time I can only get as far as

hinging --db org_8 --las org_8.las -x org_8 --config ~/packages/HINGE/utils/nominal.ini -o org_8

After this step, I think I am left with no sequences to work with. Should I perhaps change a parameter? Here's the log:

[2016-09-08 11:09:45.485] [log] [info] Hinging layout
[2016-09-08 11:09:45.485] [log] [info] name of db: org_8, name of .las file org_8.las
[2016-09-08 11:09:45.486] [log] [info] name of fasta: , name of .paf file
[2016-09-08 11:09:45.486] [log] [info] filter files prefix: org_8
[2016-09-08 11:09:45.486] [log] [info] output prefix: org_8
[2016-09-08 11:09:45.506] [log] [info] Parameters passed in

[2016-09-08 11:09:45.557] [log] [info] Load alignments from org_8.las
[2016-09-08 11:09:45.557] [log] [info] # Alignments: 958702
[2016-09-08 11:09:45.557] [log] [info] # Reads: 48830
[2016-09-08 11:09:47.660] [log] [info] Input data finished
[2016-09-08 11:09:47.667] [log] [info] LENGTH_THRESHOLD = -1
[2016-09-08 11:09:47.667] [log] [info] QUALITY_THRESHOLD = 0
[2016-09-08 11:09:47.667] [log] [info] ALN_THRESHOLD = -1
[2016-09-08 11:09:47.667] [log] [info] MIN_COV = -1
[2016-09-08 11:09:47.667] [log] [info] CUT_OFF = -1
[2016-09-08 11:09:47.667] [log] [info] THETA = -1
[2016-09-08 11:09:47.667] [log] [info] N_ITER = -1
[2016-09-08 11:09:47.667] [log] [info] THETA2 = 0
[2016-09-08 11:09:47.667] [log] [info] N_PROC = 4
[2016-09-08 11:09:47.667] [log] [info] HINGE_SLACK = 1000
[2016-09-08 11:09:47.667] [log] [info] HINGE_TOLERANCE = 150
[2016-09-08 11:09:47.667] [log] [info] KILL_HINGE_OVERLAP_ALLOWANCE = 300
[2016-09-08 11:09:47.667] [log] [info] KILL_HINGE_INTERNAL_ALLOWANCE = 40
[2016-09-08 11:09:47.667] [log] [info] MATCHING_HINGE_SLACK = 200
[2016-09-08 11:09:47.667] [log] [info] MIN_CONNECTED_COMPONENT_SIZE = 8
[2016-09-08 11:09:47.668] [log] [info] USE_TWO_MATCHES = true
[2016-09-08 11:09:47.668] [log] [info] del_telomeres = false
[2016-09-08 11:09:47.699] [log] [info] read mask finished
[2016-09-08 11:09:47.845] [log] [info] read marked repeats
[2016-09-08 11:09:47.845] [log] [info] killed 0 reads with many repeats
[2016-09-08 11:09:47.959] [log] [info] read marked hinges
[2016-09-08 11:09:47.961] [log] [info] active reads: 48830
[2016-09-08 11:09:47.963] [log] [info] active reads: 48373
[2016-09-08 11:09:50.042] [log] [info] overlaps 958702 rev_overlaps 469296
[2016-09-08 11:09:50.042] [log] [info] index finished
[2016-09-08 11:09:50.043] [log] [info] Number reads 48830
[2016-09-08 11:09:56.855] [log] [info] 0 overlaps
[2016-09-08 11:09:56.855] [log] [info] 0 rev overlaps
[2016-09-08 11:09:56.924] [log] [info] removed contained reads, active reads: 48373
[2016-09-08 11:09:56.925] [log] [info] active reads: 48373
[2016-09-08 11:09:57.058] [log] [info] 132 killed hinges
[2016-09-08 11:09:57.058] [log] [info] 93 hinges
[2016-09-08 11:09:57.182] [log] [info] 93 active hinges
[2016-09-08 11:09:57.186] [log] [info] Building hinge graph
[2016-09-08 11:09:57.195] [log] [info] num hinges 93
[2016-09-08 11:09:57.230] [log] [info] Hinge graph built
[2016-09-08 11:09:57.245] [log] [info] after filter 0 active hinges
[2016-09-08 11:09:57.271] [log] [info] Starting to build assembly graph.
[2016-09-08 11:09:57.443] [log] [info] sort and output finished
[2016-09-08 11:09:57.443] [log] [info] version 0.0.3

Thanks in advance for any tips!
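Several lines in the log above report a count of zero ("0 overlaps", "0 rev overlaps", "after filter 0 active hinges"). Any of these could plausibly explain why later stages have nothing to work with. The following is a minimal Python sketch that pulls such lines out of a long log automatically. It assumes only the `[info] message` layout visible above. Not every zero is a problem: "killed 0 reads with many repeats", for example, is benign.

```python
import re

# Captures the free-text message after "[info]" in spdlog-style log lines,
# e.g. "[2016-09-08 11:09:56.855] [log] [info] 0 rev overlaps".
INFO_MSG = re.compile(r"\[info\]\s+(.*)$")
# A standalone 0: not part of a larger number or a version like "0.0.3".
ZERO = re.compile(r"(?<![\w.])0(?![\w.])")


def zero_count_lines(log_lines):
    """Return the [info] messages that report a count of exactly zero.

    Parameter echoes such as "THETA2 = 0" are skipped, because a zero
    there is a setting rather than a result.
    """
    hits = []
    for line in log_lines:
        m = INFO_MSG.search(line)
        if not m:
            continue
        msg = m.group(1).strip()
        if "=" in msg:
            continue
        if ZERO.search(msg):
            hits.append(msg)
    return hits
```

The function accepts any iterable of lines, so it can be called directly on an open log file, e.g. `zero_count_lines(open("log/log_file.txt"))` (the file name here is hypothetical).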

HINGE stalled and producing a huge empty error/output file

Hi,
I'm using HINGE to assemble a 260 Mb genome (PacBio RII reads, 56X coverage).
Here is my script:
HINGE_Stella_Assembly_04_2017.zip
HINGE has been running for 11 days, but it seems to have been stalled for the last 4 days.
The most recent log in the output folder's log directory is empty. The last informative log is:
log_2017-04-22_15-24.txt
Last but not least, the output and error file is about 55 GB and still growing.
When I look into it, the beginning is "normal", but since HINGE stalled, the file has been growing with empty lines or spaces.
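You can confirm that the growing file holds nothing but blank lines without opening all 55 GB of it: read only the file's tail. The following is a minimal Python sketch. The 1 MiB default window is an arbitrary choice, not anything HINGE-specific.

```python
import os


def tail_is_blank(path, nbytes=1 << 20):
    """Return True if the last `nbytes` bytes of `path` are all whitespace.

    Only the tail is read, so this stays cheap on a file of tens of GB.
    An empty file returns False (there is nothing to inspect).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(max(0, size - nbytes))
        tail = fh.read(nbytes)
    # bytes.strip() with no argument removes ASCII whitespace (space, tab, newline, ...)
    return len(tail) > 0 and tail.strip() == b""
```

Calling it together with `os.path.getsize` a few minutes apart shows whether the file is still growing and whether its newest content is still blank.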

Here are the contents of the output folder:

-rw-r--r-- 1 ag users    0 Apr 22 15:50 Stella.draft.fasta
drwxr-xr-x 2 ag users 4.0K Apr 22 15:36 log
-rw-r--r-- 1 ag users    0 Apr 22 15:36 Stella.contained.txt
-rw-r--r-- 1 ag users    0 Apr 22 15:36 Stella.garbage.txt
-rw-r--r-- 1 ag users    0 Apr 22 15:36 Stella.draft.deadends.txt
-rw-r--r-- 1 ag users 5.9M Apr 22 15:36 edges.g_out.txt
-rw-r--r-- 1 ag users 4.1M Apr 22 15:36 Stella.edges.1
-rw-r--r-- 1 ag users 4.1M Apr 22 15:36 Stella.edges.2
-rw-r--r-- 1 ag users 6.0M Apr 22 15:36 Stella.edges.hinges
-rw-r--r-- 1 ag users 4.7M Apr 22 15:36 Stella.edges.hinges2
-rw-r--r-- 1 ag users 5.8M Apr 22 15:36 Stella.edges.greedy
-rw-r--r-- 1 ag users 4.8M Apr 22 15:36 Stella.edges.skipped
-rw-r--r-- 1 ag users 130M Apr 22 15:36 Stella.hgraph
-rw-r--r-- 1 ag users 212K Apr 22 15:36 Stella.debug
-rw-r--r-- 1 ag users 2.2M Apr 22 15:36 Stella.deadends.txt
-rw-r--r-- 1 ag users    0 Apr 22 15:36 hinge_debug.txt
-rw-r--r-- 1 ag users 927K Apr 22 15:36 Stella.hinge.list
-rw-r--r-- 1 ag users    0 Apr 22 15:32 overlap_debug.txt
-rw-r--r-- 1 ag users  13M Apr 22 15:32 Stella.killed.hinges
-rw-r--r-- 1 ag users  22M Apr 22 15:32 edges.bkw.backup.txt
-rw-r--r-- 1 ag users  25M Apr 22 15:32 edges.fwd.backup.txt
-rw-r--r-- 1 ag users 424K Apr 22 15:20 Stella.max
-rw-r--r-- 1 ag users 317M Apr 22 14:46 Stella.coverage.txt
-rw-r--r-- 1 ag users    0 Apr 22 13:59 Stella.filtered.fasta
-rw-r--r-- 1 ag users    0 Apr 22 13:59 Stella.homologous.txt
-rw-r--r-- 1 ag users 5.5M Apr 22 13:54 Stella.hinges.txt
-rw-r--r-- 1 ag users 6.2M Apr 22 13:54 Stella.repeat.txt
-rw-r--r-- 1 ag users    0 Apr 22 13:20 debug.txt
-rw-r--r-- 1 ag users 2.8M Apr 22 13:20 Stella.mas
-rw-r--r-- 1 ag users 2.3M Apr 22 13:20 Stella.cmas
-rw-r--r-- 1 ag users    0 Apr 22 12:34 Stella.self.flag
-rw-r--r-- 1 ag users    0 Apr 22 12:34 Stella.cov.flag
-rw-r--r-- 1 ag users  54G Apr 22 12:30 Stella.las
-rw-r--r-- 1 ag users 1.7K Apr 14 12:15 Stella.db
-rw-r--r-- 1 ag users  15G Apr 14 12:13 Stella_reads_3kb.pb.fasta
-rw-r--r-- 1 ag users 142M Apr 14 12:13 map.txt
-rw-r--r-- 1 ag users 3.8K Apr 14 12:10 HINGE_Stella_Assembly_04_2017.sh
-rw-r--r-- 1 ag users  15G Apr 14 11:54 Stella_reads_3kb.fasta

Thanks for your help

HINGE with PacBio CCS reads

Hi all,

I've been trying to run HINGE on Circular Consensus Sequencing (CCS) reads obtained from a Sequel run. As expected, the number of reads drops drastically with CCS, but the read quality increases. After CCS I end up with 7,344 reads with a total length of 25.8 Mbp (~8x coverage in this case).
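The "~8x" figure comes from dividing total read length by genome size. The post does not state the genome size, so this small Python sketch works the arithmetic in the other direction and derives the genome size implied by the numbers given.

```python
def coverage(total_bases, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size implied by a total read length and a stated depth."""
    return total_bases / depth


# Figures from the post: 7,344 CCS reads totalling 25.8 Mbp at "~8x".
total_bp = 25.8e6
print(f"implied genome size: {implied_genome_size(total_bp, 8) / 1e6:.1f} Mbp")
# prints: implied genome size: 3.2 Mbp
```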

I tried lowering all coverage-related parameters in nominal.ini and running HINGE on these high-quality sequences, but unfortunately it didn't work. It appears to fail at a rather early stage, before the draft assembly. I'm attaching the log file. Could you please have a look and let me know?

Many thanks in advance,

Ali

log_2017-03-21_16-08.txt
hingehinge_run_id.G00.zip

Segmentation fault at consensus

Hello,

I've made it as far as

consensus draft ecoli draft.ecoli.las ecoli.consensus.fasta utils/nominal.ini

but when I issue this command I get:

length threshold:-1
1 files
1 files

Contigs:58

Reads:119774

Alignments:117641

115549
Segmentation fault (core dumped)

I should note that I'm not using the example E. coli reads, but some other long reads.

Any clues?

Read filtering fails

Hi, I get the following error when running Reads_filter:

Reads_filter --db staph --las staph.las -x staph --config ../../utils/nominal.ini
[2016-07-08 18:03:07.669] [log] [info] Reads filtering
terminate called after throwing an instance of 'spdlog::spdlog_ex'
  what():  formatting error while processing format string 'current user {}, current working directory {}': string pointer is null
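The message suggests that one of the two values passed to this log line, the current user or the working directory, was a null C string. One plausible cause is an unset `USER` variable, which happens in some containers and batch environments. This is an assumption, not a diagnosis of HINGE's code. The following is a minimal Python pre-flight check of both values.

```python
import os


def missing_identity_info():
    """Report process-identity values that are unavailable.

    Checks only the environment: an unset/empty USER variable and an
    unreadable current working directory. It does not inspect HINGE.
    """
    problems = []
    if not os.environ.get("USER"):
        problems.append("USER environment variable is unset or empty")
    try:
        os.getcwd()
    except OSError as exc:  # e.g. the working directory was deleted
        problems.append(f"cannot determine working directory: {exc}")
    return problems
```

If it reports a missing `USER`, setting it (for example `export USER=$(id -un)`) before rerunning Reads_filter is a cheap way to test the hypothesis.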

Issue with the dev branch

Hi guys,

When I build HINGE using the dev branch and use my hinge wrapper script I get 0 contigs from my validation Sequel data. When I use the master I get 1 contig, which is the expected genome.

The commands I use are as follows:

fq2fa m54072_160926_234436.subreads_downsampled_90k.fastq.0.fastq m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta

hinge correct-head m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta fasta_map.txt

fasta2DB hinge_assembly m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta

DBsplit -x500 -s100 hinge_assembly

HPC.daligner -t5 -T32 hinge_assembly| csh -v

LAmerge hinge_assembly.las hinge_assembly*.las

DASqv -c100 hinge_assembly hinge_assembly.las

hinge filter --db hinge_assembly --las hinge_assembly.las -x hinge_assembly --config /hinge/utils/nominal.ini

hinge layout --db hinge_assembly --las hinge_assembly.las -x hinge_assembly --config /hinge/utils/nominal.ini -o hinge_assembly

hinge clip hinge_assembly.edges.hinges hinge_assembly.hinge.list hinge_assembly_run_id

hinge draft-path ./ hinge_assembly hinge_assemblyhinge_assembly_run_id.G2.graphml

hinge draft --db hinge_assembly --las hinge_assembly.las --prefix hinge_assembly --config /hinge/utils/nominal.ini --out hinge_assembly.draft

get_draft_path_norevcomp.py hinge_assembly.draft.fasta hinge_assembly.draft.norevcomp.fasta

hinge correct-head hinge_assembly.draft.norevcomp.fasta hinge_assembly.draft.norevcomp.pb.fasta draft_map.txt

fasta2DB draft hinge_assembly.draft.norevcomp.pb.fasta

HPC.daligner hinge_assembly draft | zsh -v

hinge consensus draft hinge_assembly draft.hinge_assembly.las hinge_assembly.consensus.fasta /hinge/utils/nominal.ini

I can't pinpoint exactly where or why it goes wrong with the dev build, but judging by the contents (file sizes) of the two output directories from dev and master, it looks like the stage where the graph is built:

The output folder contents of master
-rw-r--r-- 1 amay users 271M Oct 31 15:11 m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta
-rw-r--r-- 1 amay users 5.4M Oct 31 15:11 fasta_map.txt
-rw-r--r-- 1 amay users 274M Oct 31 15:11 m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta
-rw-r--r-- 1 amay users 242 Oct 31 15:11 smrt1_90k.db
-rw-r--r-- 1 amay users 509M Oct 31 15:17 smrt1_90k.las
-rw-r--r-- 1 amay users 0 Oct 31 15:17 smrt1_90k.homologous.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:17 smrt1_90k.filtered.fasta
-rw-r--r-- 1 amay users 0 Oct 31 15:18 debug.txt
-rw-r--r-- 1 amay users 584K Oct 31 15:18 smrt1_90k.repeat.txt
-rw-r--r-- 1 amay users 584K Oct 31 15:18 smrt1_90k.hinges.txt
-rw-r--r-- 1 amay users 1.2M Oct 31 15:18 smrt1_90k.mas
-rw-r--r-- 1 amay users 47M Oct 31 15:18 smrt1_90k.coverage.txt
-rw-r--r-- 1 amay users 292K Oct 31 15:18 edges.fwd.backup.txt
-rw-r--r-- 1 amay users 282K Oct 31 15:18 edges.bkw.backup.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:18 overlap_debug.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:18 smrt1_90k.hinge.list
-rw-r--r-- 1 amay users 0 Oct 31 15:18 hinge_debug.txt
-rw-r--r-- 1 amay users 583K Oct 31 15:18 smrt1_90k.killed.hinges
-rw-r--r-- 1 amay users 168K Oct 31 15:18 smrt1_90k.deadends.txt
-rw-r--r-- 1 amay users 219K Oct 31 15:18 edges.g_out.txt
-rw-r--r-- 1 amay users 201K Oct 31 15:18 smrt1_90k.edges.1
-rw-r--r-- 1 amay users 204K Oct 31 15:18 smrt1_90k.edges.2
-rw-r--r-- 1 amay users 291K Oct 31 15:18 smrt1_90k.edges.hinges
-rw-r--r-- 1 amay users 228K Oct 31 15:18 smrt1_90k.edges.hinges2
-rw-r--r-- 1 amay users 291K Oct 31 15:18 smrt1_90k.edges.greedy
-rw-r--r-- 1 amay users 360 Oct 31 15:18 smrt1_90k.edges.skipped
-rw-r--r-- 1 amay users 338 Oct 31 15:18 smrt1_90k.hgraph
-rw-r--r-- 1 amay users 83 Oct 31 15:19 smrt1_90k.debug
-rw-r--r-- 1 amay users 1.7M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G00.graphml
-rw-r--r-- 1 amay users 1.8M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G0.graphml
-rw-r--r-- 1 amay users 1.3M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G1.graphml
-rw-r--r-- 1 amay users 44 Oct 31 15:19 tandem.txt
-rw-r--r-- 1 amay users 1.3M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2.graphml
-rw-r--r-- 1 amay users 325K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.Gs.graphml
-rw-r--r-- 1 amay users 324K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2s.graphml
-rw-r--r-- 1 amay users 370K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.Gc.graphml
-rw-r--r-- 1 amay users 370K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2c.graphml
-rw-r--r-- 1 amay users 60K Oct 31 15:19 smrt1_90k.edges.list
-rw-r--r-- 1 amay users 36K Oct 31 15:19 smrt1_90k_draft.graphml
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.draft.deadends.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.max
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.garbage.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.contained.txt
drwxr-xr-x 2 amay users 4.0K Oct 31 15:19 log
-rw-r--r-- 1 amay users 4.9M Oct 31 15:20 smrt1_90k.draft.fasta
-rw-r--r-- 1 amay users 2.5M Oct 31 15:20 smrt1_90k.draft.norevcomp.fasta
-rw-r--r-- 1 amay users 29 Oct 31 15:20 draft_map.txt
-rw-r--r-- 1 amay users 2.5M Oct 31 15:20 smrt1_90k.draft.norevcomp.pb.fasta
-rw-r--r-- 1 amay users 68 Oct 31 15:20 draft.db
-rw-r--r-- 1 amay users 3.1M Oct 31 15:21 draft.smrt1_90k.las
-rw-r--r-- 1 amay users 2.5M Oct 31 15:21 smrt1_90k.consensus.fasta
-rw-r--r-- 1 amay users 541 Oct 31 15:24 smrt1_90k.consensus.fasta.stats

The output folder contents of dev
-rw-r--r-- 1 amay users 271M Jan 5 10:22 m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta
-rw-r--r-- 1 amay users 5.4M Jan 5 10:22 fasta_map.txt
-rw-r--r-- 1 amay users 274M Jan 5 10:22 m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta
-rw-r--r-- 1 amay users 242 Jan 5 10:22 hinge_assembly.db
-rw-r--r-- 1 amay users 509M Jan 5 10:25 hinge_assembly.las
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.homologous.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.filtered.fasta
-rw-r--r-- 1 amay users 0 Jan 5 10:25 debug.txt
-rw-r--r-- 1 amay users 584K Jan 5 10:25 hinge_assembly.repeat.txt
-rw-r--r-- 1 amay users 584K Jan 5 10:25 hinge_assembly.hinges.txt
-rw-r--r-- 1 amay users 1.2M Jan 5 10:25 hinge_assembly.mas
-rw-r--r-- 1 amay users 47M Jan 5 10:25 hinge_assembly.coverage.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.deadends.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 edges.fwd.backup.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 edges.bkw.backup.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.hinges2
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.hinges
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.2
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.1
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.skipped
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.greedy
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.hgraph
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.debug
-rw-r--r-- 1 amay users 0 Jan 5 10:25 overlap_debug.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.hinge.list
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_debug.txt
-rw-r--r-- 1 amay users 583K Jan 5 10:25 hinge_assembly.killed.hinges
-rw-r--r-- 1 amay users 4 Jan 5 10:25 edges.g_out.txt
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G00.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G0.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G1.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.Gs.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2s.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.Gc.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2c.graphml
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.list
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assembly_draft.graphml
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.garbage.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.draft.deadends.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.contained.txt
drwxr-xr-x 2 amay users 4.0K Jan 5 10:26 log
-rw-r--r-- 1 amay users 2 Jan 5 10:26 hinge_assembly.draft.fasta
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.draft.norevcomp.fasta

Working on larger genomes (errors, questions and discussions)

Hi,

I'm working on a larger genome (200-300 Mb) with high repeat content. I was really excited when I found HINGE.

The first error I had is "Pacbio header line name inconsisten". Presumably this arises because I used a merged.fasta containing PacBio data from different cells to build the DB, as explained here (thegenemyers/DEXTRACTOR#4).
How, then, can I build the DB with fasta2DB using PacBio data from multiple PacBio cells?
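As the linked discussion suggests, the reads in one input file are expected to share a single PacBio movie name, the part of the header before the first `/`. One workaround is to split the merged file by movie and give `fasta2DB` one file per cell. It accepts several FASTA files in one call, e.g. `fasta2DB mydb cell1.fasta cell2.fasta`.

The following is a minimal Python sketch of the split. It assumes headers of the form `>movie/zmw/start_end`.

```python
import os
from collections import defaultdict


def split_by_movie(fasta_lines):
    """Group FASTA lines by PacBio movie name.

    Assumes headers like ">movie/zmw/start_end"; the movie name is the text
    between ">" and the first "/". Lines before the first header are ignored.
    """
    groups = defaultdict(list)
    movie = None
    for line in fasta_lines:
        if line.startswith(">"):
            movie = line[1:].split("/", 1)[0].strip()
        if movie is not None:
            groups[movie].append(line)
    return dict(groups)


def write_groups(groups, outdir="."):
    """Write one <movie>.fasta per group and return the paths written."""
    paths = []
    for movie, lines in groups.items():
        path = os.path.join(outdir, movie + ".fasta")
        with open(path, "w") as fh:
            fh.writelines(lines)
        paths.append(path)
    return paths
```

Usage: open `merged.fasta`, pass the file object to `split_by_movie`, pass the result to `write_groups`, then give all returned paths to a single `fasta2DB` call.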

Another question is about the -t parameter of HPC.daligner. In the E. coli example, -t5 was used. If I understand correctly, "-t any k-mer that occurs more than t times in either the subject or target is not counted in the heuristic". So should I modify the -t value for a highly repetitive genome?
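The following is a toy Python illustration of the idea behind `-t`. It counts k-mers across a set of reads and keeps as seeds only those occurring at most `t` times. It is not daligner's implementation.

In a repeat-rich genome, more k-mers exceed a given `t`. A lower value therefore discards more repeat-derived seeds, at the risk of missing overlaps anchored mainly in repeats.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every k-mer across all sequences (forward strand only)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def seed_kmers(seqs, k, t):
    """k-mers usable as seeds: those occurring at most t times overall."""
    return {km for km, n in kmer_counts(seqs, k).items() if n <= t}


reads = ["AAAAAA", "ACGTAC"]
print(sorted(seed_kmers(reads, k=3, t=2)))  # the 4 copies of AAA exceed t=2
# prints ['ACG', 'CGT', 'GTA', 'TAC']
```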

Thanks,
Quan

Getting small contigs with long polyA and polyT stretches

I am assembling a bacterial genome. It is still the one I have problems running with 4 SMRT cells (see #65), although it works with one or two SMRT cells. I do not know whether problem #65 is related to the amount of data or to certain "problematic" combinations.

Anyway, here is another issue. I have one SMRT cell of this bacterium, with an input FASTA file of about 1.1 GB. The strange thing is that I get some small contigs (14k and 29k) that contain very long polyA and polyT stretches (several thousand bp). I do NOT get these with HGAP. The "normal" regions flanking those long polyA and polyT homopolymers are from the bacterium itself; I checked. They are not human or any other contamination.

Is this a known problem with HINGE, or is it PacBio-specific? In that case, shouldn't I also see it with HGAP?
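To measure these stretches and compare assemblers, you can locate the runs directly. The following is a minimal Python sketch that reads a FASTA file and reports every single-base run of at least `min_len` bases. The 100 bp default is an arbitrary choice. Running it on both the HINGE and HGAP contigs shows whether the runs are specific to one assembler.

```python
import re


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def homopolymer_runs(seq, min_len=100):
    """Return (start, end, base) for each single-base run of >= min_len bases.

    Coordinates are 0-based and end-exclusive; the sequence is upper-cased first.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

Usage: for each `(name, seq)` from `read_fasta("contigs.fasta")` (hypothetical file name), print `homopolymer_runs(seq)`. The run length is `end - start`.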

new release

Aside from just finalizing the Debian package, I'm looking to run hinge on a set of bacterial genomes. I was wondering when you plan to cut a new release (the latest one shows as from September 2016).

Thanks for your consideration :)

hinge: line 8: 15600 Segmentation fault draft_assembly $*

Hi, I used hinge in the following way:

cd /work/lorencm/apps/HINGE
source utils/setup.sh
cd /work/lorencm/apps/HINGE/data/fruit
#HPC.daligner -t10 fruit | csh -v
#LAmerge fruit.las fruit.*.las
#rm fruit.*.las # we only need fruit.las
DASqv -c100 fruit fruit.las
# Run filter

mkdir log
hinge filter --db fruit --las fruit.las -x fruit --config ../../utils/nominal.ini

# Run layout

hinge layout --db fruit --las fruit.las -x fruit --config ../../utils/nominal.ini -o fruit

# Run postprocessing

hinge clip fruit.edges.hinges fruit.hinge.list 1

# get draft assembly 

hinge draft-path $PWD fruit fruit1.G2.graphml
hinge draft --db fruit --las fruit.las --prefix fruit --config ../../utils/nominal.ini --out fruit.draft

# get consensus assembly

hinge correct-head fruit.draft.fasta fruit.draft.pb.fasta draft_map.txt
fasta2DB draft fruit.draft.pb.fasta 
HPC.daligner fruit draft | zsh -v  
hinge consensus draft fruit draft.fruit.las fruit.consensus.fasta ../../utils/nominal.ini
hinge gfa $PWD fruit fruit.consensus.fasta

#results should be in fruit_consensus.gfa

but I ran into the following problems:

[2017-02-14 15:09:03.814] [log] [info] removed contained reads, active reads: 20796
[2017-02-14 15:09:03.861] [log] [info] active reads: 20796
[2017-02-14 15:09:19.887] [log] [info] 44591 killed hinges
[2017-02-14 15:09:19.887] [log] [info] 121770 hinges
[2017-02-14 15:09:32.237] [log] [info] 121770 active hinges
[2017-02-14 15:09:41.929] [log] [info] Building hinge graph
[2017-02-14 15:09:42.045] [log] [info] num hinges 1625372
[2017-02-14 15:12:20.878] [log] [info] Hinge graph built
Total number of components: 1511513
[2017-02-14 15:12:34.782] [log] [info] after filter 32822 active hinges
[2017-02-14 15:12:41.289] [log] [info] Starting to build assembly graph.
[2017-02-14 15:12:47.617] [log] [info] sort and output finished
[2017-02-14 15:12:47.617] [log] [info] version 0.0.3
/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 15594 Illegal instruction     pruning_and_clipping.py $*
/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 15597 Illegal instruction     get_draft_path.py $*
[2017-02-14 15:18:01.956] [log] [info] draft consensus
[2017-02-14 15:18:01.956] [log] [info] name of db: fruit, name of .las file fruit.las
[2017-02-14 15:18:01.956] [log] [info] name of fasta: , name of .paf file 
[2017-02-14 15:18:01.956] [log] [info] filter files prefix: fruit
[2017-02-14 15:18:01.956] [log] [info] output prefix: fruit.draft
[2017-02-14 15:18:01.963] [log] [info] Parameters passed in 

[filter]
length_threshold = 1000;
quality_threshold = 0.23;
n_iter = 3; // filter iteration
aln_threshold = 1000;
min_cov = 5;
cut_off = 300;
theta = 300;
use_qv = true;

[running]
n_proc = 12;

[draft]
min_cov = 10;
trim = 200;
edge_safe = 100;
tspace = 900;
step = 50;


[consensus]
min_length = 4000;
trim_end = 200;
best_n = 1;
quality_threshold = 0.23;

[layout]
hinge_slack = 1000
min_connected_component_size = 8

[2017-02-14 15:18:02.329] [log] [info] Load alignments from fruit.las
[2017-02-14 15:18:02.329] [log] [info] # Alignments: 1693423760
[2017-02-14 15:18:02.329] [log] [info] # Reads: 1044634
[2017-02-14 15:40:02.574] [log] [info] Input data finished
[2017-02-14 15:40:02.599] [log] [info] LENGTH_THRESHOLD = 1000
[2017-02-14 15:40:02.599] [log] [info] QUALITY_THRESHOLD = 0.23
[2017-02-14 15:40:02.599] [log] [info] ALN_THRESHOLD = 1000
[2017-02-14 15:40:02.599] [log] [info] MIN_COV = 5
[2017-02-14 15:40:02.599] [log] [info] CUT_OFF = 300
[2017-02-14 15:40:02.599] [log] [info] THETA = 300
[2017-02-14 15:40:02.599] [log] [info] N_ITER = 3
[2017-02-14 15:40:02.600] [log] [info] THETA2 = 0
[2017-02-14 15:40:02.600] [log] [info] N_PROC = 12
[2017-02-14 15:40:02.600] [log] [info] HINGE_SLACK = 1000
[2017-02-14 15:40:02.600] [log] [info] HINGE_TOLERANCE = 150
[2017-02-14 15:40:02.600] [log] [info] KILL_HINGE_OVERLAP_ALLOWANCE = 300
[2017-02-14 15:40:02.600] [log] [info] KILL_HINGE_INTERNAL_ALLOWANCE = 40
[2017-02-14 15:40:02.600] [log] [info] MATCHING_HINGE_SLACK = 200
[2017-02-14 15:40:02.600] [log] [info] MIN_CONNECTED_COMPONENT_SIZE = 8
add data
add data
Error! Wrong format.

/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 15600 Segmentation fault      draft_assembly $*
/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 48149 Illegal instruction     correct_head.py $*
fasta2DB: Cannot open ./fruit.draft.pb.fasta for 'r'
# set global options for all zsh shells here
HPC.daligner: Cannot open ./draft.db for 'r'
(null): Could not open database draft
length threshold:4000
/work/lorencm/apps/HINGE/inst/bin/hinge: line 8: 48158 Illegal instruction     get_consensus_gfa.py $*

What did I miss?

Thank you in advance.

Michal
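In the log above, each step keeps running after the previous one has failed. `pruning_and_clipping.py` dies with `Illegal instruction`, so `fruit1.G2.graphml` was presumably never written. Later steps then fail with "Wrong format" and "Cannot open".

The following is a minimal Python sketch of a runner that stops at the first failing step. A step fails if it exits with a non-zero status or if an expected output is missing or empty. The expected file names are assumptions, inferred from the commands that consume them. The output check is included because it is not shown whether the `hinge` wrapper passes the child's exit status through.

```python
import os
import subprocess


def run_steps(steps):
    """Run (command, expected_outputs) steps in order, stopping at the first failure.

    A step fails if its exit status is non-zero or if any expected output file
    is missing or empty. Returns the failing command, or None if all succeed.
    """
    for command, outputs in steps:
        # shell=True so pipes and $PWD in the commands work as typed
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            print(f"exit status {result.returncode}: {command}")
            return command
        bad = [p for p in outputs if not os.path.exists(p) or os.path.getsize(p) == 0]
        if bad:
            print(f"missing or empty output {bad}: {command}")
            return command
    return None


# Hypothetical step list for the fruit run; the expected file names are
# inferred from the commands that consume them, not from HINGE documentation.
FRUIT_STEPS = [
    ("hinge clip fruit.edges.hinges fruit.hinge.list 1", ["fruit1.G2.graphml"]),
    ("hinge draft-path $PWD fruit fruit1.G2.graphml", []),
    ("hinge draft --db fruit --las fruit.las --prefix fruit "
     "--config ../../utils/nominal.ini --out fruit.draft", ["fruit.draft.fasta"]),
]
```

Calling `run_steps(FRUIT_STEPS)` would have stopped at `hinge clip` and named the missing graph file instead of continuing.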

Get draft assembly error

Hi,

Still trying to work with HINGE on my data set. Here is a test on a single SMRT cell.
All previous steps went well.
But now I'm stuck with:

Run postprocessing: Mon Aug 29 11:10:47 CEST 2016

get draft assembly : Mon Aug 29 11:23:49 CEST 2016
Traceback (most recent call last):
  File "/home/ag/rainman_home/HINGE/scripts/get_draft_path.py", line 24, in
    stdout=subprocess.PIPE,bufsize=1)
  File "/module/apps/python/2.7.9/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/module/apps/python/2.7.9/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
[2016-08-29 11:23:50.855] [log] [info] draft consensus
[2016-08-29 11:23:50.855] [log] [info] name of db: vriparia_test, name of .las file vriparia_test.las
[2016-08-29 11:23:50.855] [log] [info] name of fasta: , name of .paf file
[2016-08-29 11:23:50.855] [log] [info] filter files prefix: vriparia_test
[2016-08-29 11:23:50.855] [log] [info] output prefix: vriparia_test.draft
[2016-08-29 11:23:50.855] [log] [info] Parameters passed in

My script lines:

echo "#############################"
echo "Run postprocessing:" `date`
python2.7 /home/ag/rainman_home/HINGE/scripts/pruning_and_clipping.py vriparia_test.edges.hinges vriparia_test.hinge.list hinge_1

echo "#############################"
echo "get draft assembly :" `date`
python2.7 /home/ag/rainman_home/HINGE/scripts/get_draft_path.py $PBS_O_WORKDIR vriparia_test vriparia_testhinge_1.G2.graphml
/home/ag/rainman_home/HINGE/build/bin/consensus/draft_assembly --db vriparia_test --las vriparia_test.las --prefix vriparia_test --config /home/ag/rainman_home/HINGE/utils/nominal.ini --out vriparia_test.draft

This morning I updated get_draft_path.py according to 9ea413d.

Thanks for your help
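When `subprocess.Popen` raises `OSError: [Errno 2] No such file or directory` in Python 2.7, it usually means the program being launched could not be found. Python 2.7 does not say which program that was. The traceback does not show which executable `get_draft_path.py` starts at line 24. A PATH that differs inside the PBS job is one plausible cause.

The following is a minimal Python 3 sketch that checks the executable before launching it, so the error names the missing program. `shutil.which` requires Python 3.3 or later.

```python
import shutil
import subprocess


def popen_checked(args, **kwargs):
    """Start a subprocess only after confirming its executable can be found.

    Raises FileNotFoundError naming the missing program instead of letting
    Popen fail with a bare "No such file or directory".
    """
    exe = args[0]
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"executable not found on PATH: {exe!r}")
    return subprocess.Popen(args, **kwargs)
```

A quick manual equivalent inside the job script is `which <program>` for each tool the pipeline calls, run just before the failing step.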

Bioconda

Hi,
Any plans to create a Bioconda package for HINGE?

Thank you in advance.

Michal

Detailed explanation of parameters in nominal.ini

Maybe I missed it, but I have trouble finding the definitions of all the parameters in nominal.ini and how they connect to where they are mentioned in the paper.

Is there a document describing them in detail? It would be awesome if you could provide a doc describing the parameters!
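Until such a document exists, you can at least list the parameters a given file sets. The following minimal Python sketch parses the layout of the config echoed in the draft log earlier on this page: `[section]` headers, `key = value;` lines, optional `//` comments, and a semicolon that is sometimes omitted.

Keeping sections separate matters because the same key can appear in several of them, e.g. `min_cov` under both `[filter]` and `[draft]`. The same log also prints values such as `HINGE_TOLERANCE = 150` that do not appear in the file, so the file alone is not a complete parameter list.

```python
import re

SECTION = re.compile(r"^\[(\w+)\]$")
# key = value, with an optional trailing ";" (hinge_slack has none in the sample)
ENTRY = re.compile(r"^(\w+)\s*=\s*([^;]*?)\s*;?$")


def parse_nominal_ini(text):
    """Parse nominal.ini-style text into {section: {key: value_string}}."""
    config, section = {}, None
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop "//" comments
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            section = config.setdefault(m.group(1), {})
            continue
        m = ENTRY.match(line)
        if m and section is not None:
            section[m.group(1)] = m.group(2)
    return config


def list_parameters(config):
    """Print every parameter as section.key = value."""
    for section, entries in config.items():
        for key, value in entries.items():
            print(f"{section}.{key} = {value}")
```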

Many thanks!

Hinge detection problems

Far too many hinges are being detected. Greedy seems robust, but the input sizes are small, so this must be fixed.
