hybrid assembly pipeline for bacterial genomes

License: GNU General Public License v3.0

unicycler's Introduction

Unicycler is an assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets, where it functions as a SPAdes optimiser. It can also assemble long-read-only sets (PacBio or Nanopore), where it runs a miniasm+Racon pipeline. For the best possible assemblies, give it both Illumina reads and long reads, and it will conduct a short-read-first hybrid assembly.

2022 update

Unicycler was initially made in 2016, back when long reads could be sparse and very noisy. For example, our early Oxford Nanopore sequencing runs might generate only 15× read depth for a single bacterial isolate, and most of the reads had a lot of errors. So Unicycler was designed to use low-depth and low-accuracy long reads to scaffold a short-read assembly graph to completion, an approach I call short-read-first hybrid assembly. Assuming the short-read assembly graph is in good shape, Unicycler does this quite well!

However, things have changed in the last six years. Nanopore sequencing yield is now much higher, making >100× depth easy to obtain, even on multiplexed runs. Read accuracy has also improved and continues to get better each year. High-depth and high-accuracy long reads make long-read-first hybrid assembly (long-read assembly followed by short-read polishing) a viable approach that's often preferable to Unicycler. I have developed Trycycler and Polypolish in the pursuit of ideal long-read-first assemblies.

Unicycler is not completely out-of-date, as it is still (in my opinion) the best tool for short-read-first hybrid assembly of bacterial genomes. But I think it should only be used for hybrid assembly when long-read-first is not an option – i.e. when long-read depth is low. I also think that Unicycler is good for short-read-only bacterial genomes, as it produces cleaner assembly graphs than SPAdes alone. So while Unicycler doesn't get a lot of my time and attention these days, I don't yet consider it to be abandonware.

For some up-to-date bacterial genome assembly tips, check out these parts of Trycycler's wiki:

Introduction

As input, Unicycler takes one of the following:

Illumina reads from a bacterial isolate (ideally paired-end, but unpaired works too)
A set of long reads (either PacBio or Nanopore) from a bacterial isolate
Illumina reads and long reads from the same isolate (best case)

Reasons to use Unicycler:

It circularises replicons without the need for a separate tool like Circlator.
It handles plasmid-rich genomes.
It can use long reads of any depth and quality in hybrid assembly. 20× or more may be required to complete a genome, but Unicycler can make nearly-complete genomes with far fewer long reads.
It produces an assembly graph in addition to a contigs FASTA file, viewable in Bandage.
It filters out low-depth contigs, giving clean assemblies even when the read set has low-level contamination.
It has low misassembly rates.
It can cope with highly repetitive genomes, such as Shigella.
It's easy to use: runs with just one command and usually doesn't require tinkering with parameters.

Reasons to not use Unicycler:

You're assembling a eukaryotic genome or a metagenome (Unicycler is designed exclusively for bacterial isolates).
Your Illumina reads and long reads are from different isolates (Unicycler struggles with sample heterogeneity).
You're impatient (Unicycler is thorough but not especially fast).

Requirements

Linux or macOS
Python 3.4 or later
C++ compiler with C++14 support:
- GCC 4.9.1 or later
- Clang 3.5 or later
- ICC also works (though I don't know the minimum required version number)
setuptools (only required for installation of Unicycler)
For short-read or hybrid assembly:
- SPAdes v3.14.0 or later (spades.py)
For long-read or hybrid assembly:
- Racon (racon)
For rotating circular contigs:
- BLAST+ (makeblastdb and tblastn)

Unicycler expects external tools to be available in $PATH. If they aren't, you can specify their location using Unicycler options (e.g. --spades_path).

Bandage isn't required to run Unicycler, but it is very helpful for manually investigating assemblies (the graph images in this README were made with Bandage).

Installation

Install from source

These instructions install the most up-to-date version of Unicycler:

git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
python3 setup.py install

Notes:

If the last command complains about permissions, you may need to run it with sudo.
If you want a particular version of Unicycler, download the source from the releases page instead of cloning from GitHub.
Install just for your user: python3 setup.py install --user
- If you get a strange 'can't combine user with prefix' error, read this.
Install to a specific location: python3 setup.py install --prefix=$HOME/.local
Install with pip (local copy): pip3 install path/to/Unicycler
Install with pip (from GitHub): pip3 install git+https://github.com/rrwick/Unicycler.git
Install with specific Makefile options: python3 setup.py install --makeargs "CXX=icpc"

Build and run without installation

This approach compiles Unicycler code, but doesn't copy executables anywhere:

git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
make

Now, instead of running unicycler, you use path/to/unicycler-runner.py.

Quick usage

Illumina-only assembly:
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -o output_dir

Long-read-only assembly:
unicycler -l long_reads.fastq.gz -o output_dir

Hybrid assembly:
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads.fastq.gz -o output_dir

If you don't have any reads of your own, take a look in the sample_data directory for links to some small read sets.
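For scripted runs, the three invocations above can be put together programmatically. The sketch below is a hypothetical helper (`build_unicycler_command` is not part of Unicycler) that uses only the `-1`, `-2`, `-l` and `-o` options shown above. It builds the argument list and leaves execution to the caller; unpaired short reads (`-s`) are not handled.

```python
def build_unicycler_command(out_dir, short1=None, short2=None, long_reads=None):
    """Build a unicycler argument list for Illumina-only, long-read-only or hybrid input.

    Hypothetical helper for illustration; only -1, -2, -l and -o are used.
    """
    if bool(short1) != bool(short2):
        raise ValueError("paired short reads need both -1 and -2")
    if not short1 and not long_reads:
        raise ValueError("provide short reads, long reads or both")
    command = ["unicycler"]
    if short1:
        command += ["-1", short1, "-2", short2]
    if long_reads:
        command += ["-l", long_reads]
    command += ["-o", out_dir]
    return command


hybrid = build_unicycler_command(
    "output_dir",
    short1="short_reads_1.fastq.gz",
    short2="short_reads_2.fastq.gz",
    long_reads="long_reads.fastq.gz",
)
print(" ".join(hybrid))
# To run it (requires Unicycler and its dependencies in $PATH):
#   import subprocess
#   subprocess.run(hybrid, check=True)
```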

Background

Assembly graphs

To understand what Unicycler is doing, you need to know about assembly graphs. For a thorough introduction, I'd suggest this tutorial or the Velvet paper. But in short, an assembly graph is a data structure where contigs aren't disconnected sequences but can have connections to each other:

Just contigs:               Assembly graph:

TCGAAACTTGACGCGAGTCGC                             CTTGTTTA
TGCTACTGCTTGATGATGCGG                            /        \
TGTCCATT                    TCGAAACTTGACGCGAGTCGC          TGCTACTGCTTGATGATGCGG
CTTGTTTA                                         \        /
                                                  TGTCCATT

Most assemblers use graphs internally to produce their assemblies, but users often ignore the graph in favour of the conceptually simpler FASTA file of contigs. When a genome assembly is 100% complete, we have one contig per chromosome/plasmid and there's no real need for the graph. But most short-read assemblies are not complete, and a graph can describe an incomplete assembly much better than contigs alone.
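As a rough illustration of the difference, the toy graph drawn above can be held as a set of segments plus links, and the routes through it enumerated. This is a minimal sketch, not Unicycler's graph code: the segment numbers are arbitrary labels chosen here, and overlaps between linked contigs are ignored.

```python
# Segments from the toy diagram; the numeric labels are assumptions of this sketch.
segments = {
    1: "TCGAAACTTGACGCGAGTCGC",
    2: "CTTGTTTA",
    3: "TGTCCATT",
    4: "TGCTACTGCTTGATGATGCGG",
}
# Directed links: which segments can follow each segment.
links = {1: [2, 3], 2: [4], 3: [4], 4: []}


def all_paths(links, start, end, path=None):
    """Enumerate every route from start to end (the toy graph has no cycles)."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    found = []
    for following in links[start]:
        found.extend(all_paths(links, following, end, path))
    return found


paths = all_paths(links, 1, 4)
for path in paths:
    print(path, "".join(segments[n] for n in path))
```

A plain list of contigs would hold the same four sequences but could not say that segment 1 leads to either segment 2 or segment 3, and that both lead on to segment 4.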

Limitations of short reads

The main reason we can't get a complete assembly from short reads is that DNA usually contains repeats – the same sequence occurring two or more times in the genome. When a repeat is longer than the reads (or for paired-end sequencing, longer than the insert size), it forms a single contig in the assembly graph with multiple connections in and multiple connections out.

Here is what happens to a simple bacterial assembly graph as you add repeats to the genome:

As repeats are added, the graph becomes increasingly tangled (and real assembly graphs get a lot more tangled than that).

To complete a bacterial genome assembly (i.e. find the one correct sequence for each chromosome/plasmid), we need to resolve the repeats. This means finding which way into a repeat matches up with which way out. Short reads don't have enough information for this but long reads do.
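The effect of a repeat can be shown with a toy model. In the sketch below a circular genome is written as an ordered list of labelled segments, with `R` occurring twice. Collapsing identical labels into one node gives `R` two ways in and two ways out, and reads long enough to span `R` say which way in pairs with which way out. The labels, function names and read representation are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_repeats(circular_order):
    """Build a graph in which each label is a single node, however often it occurs."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for i, segment in enumerate(circular_order):
        following = circular_order[(i + 1) % len(circular_order)]
        outgoing[segment].add(following)
        incoming[following].add(segment)
    return incoming, outgoing


def resolve_with_long_reads(repeat, long_reads):
    """Pair each way into the repeat with a way out, using reads that span it."""
    pairs = set()
    for read in long_reads:
        for before, middle, after in zip(read, read[1:], read[2:]):
            if middle == repeat:
                pairs.add((before, after))
    return pairs


genome = ["A", "R", "B", "R", "C"]  # circular: C connects back to A
incoming, outgoing = collapse_repeats(genome)
print(sorted(incoming["R"]), sorted(outgoing["R"]))  # ['A', 'B'] ['B', 'C']

long_reads = [["A", "R", "B"], ["B", "R", "C"]]
print(sorted(resolve_with_long_reads("R", long_reads)))  # [('A', 'B'), ('B', 'C')]
```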

SPAdes graphs

Assembly graphs come in many different varieties, but we are particularly interested in the kind produced by SPAdes, because that is what Unicycler uses.

SPAdes graphs are made by performing a de Bruijn graph assembly with a range of different k-mer sizes, from small to large (see the SPAdes paper). Each assembly builds on the previous one, which allows SPAdes to get the advantages of both small k-mer assemblies (a more connected graph) and large k-mer assemblies (better ability to resolve repeats). Two contigs in a SPAdes graph that connect will overlap by their k-mer size (more info on the Bandage wiki page).

After producing the graph, SPAdes can perform further repeat resolution by using paired-end information. Since two reads in a pair are close to each other in the original DNA, SPAdes can use this to trace paths in the graph to form larger contigs (see their paper on ExSPAnder). However, the SPAdes contigs with repeat resolution do not come in graph form – they are only available in a FASTA file.

Method: Illumina-only assembly

When assembling just Illumina reads, Unicycler functions mainly as a SPAdes optimiser. It offers a few benefits over using SPAdes alone:

Tries a wide range of k-mer sizes and automatically selects the best.
Filters out low-depth parts of the assembly to remove contamination.
Applies SPAdes repeat resolution to the graph (as opposed to disconnected contigs in a FASTA file).
Rejects low-confidence repeat resolution to reduce the rate of misassembly.
Trims off graph overlaps so sequences aren't repeated where contigs join.

More information on the Illumina-only assembly process is described in the steps below.

SPAdes assembly

Unicycler uses SPAdes to assemble the Illumina reads into an assembly graph. It tries assemblies at a wide range of k-mer sizes, evaluating the graph at each one. It chooses the graph which best minimises both contig count and dead end count. If the Illumina reads are good, it produces an assembly graph with long contigs but few to no dead ends (more info here). Since a typical bacterial genome has no dead ends (the sequences are circular) an ideal assembly graph won't either.
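The selection step can be sketched as follows. The scoring function is an assumption chosen for illustration (lower is better, with dead ends penalised more heavily than contig count), and the per-k statistics are invented numbers, not real results. Unicycler's actual scoring may differ.

```python
def graph_score(contig_count, dead_end_count):
    """Hypothetical score: lower is better, dead ends weigh more than contigs."""
    return contig_count * (dead_end_count + 1) ** 2


def best_kmer(stats):
    """Return the k-mer size whose graph has the lowest score.

    stats maps k-mer size -> (contig_count, dead_end_count).
    """
    return min(stats, key=lambda k: graph_score(*stats[k]))


# Invented example values for five k-mer sizes.
stats = {
    27: (310, 6),
    53: (140, 2),
    77: (95, 0),
    99: (120, 4),
    127: (260, 14),
}
print(best_kmer(stats))  # 77
```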

A raw SPAdes graph can also contain some 'junk' sequences due to sequencer artefacts or contamination, so Unicycler performs some graph cleaning to remove these. Therefore, small amounts of contamination in the Illumina reads should not be a problem.
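A simplified view of depth-based cleaning is sketched below. It assumes that the chromosomal depth can be approximated by the length-weighted median depth and uses the 0.25 default of the `--depth_filter` option as the cutoff fraction. The contig values are invented, and the check that removal must not create graph dead ends is omitted, so this is not Unicycler's actual procedure.

```python
def chromosomal_depth(contigs):
    """Length-weighted median depth, used here as a stand-in for chromosomal depth.

    contigs maps name -> (length_bp, read_depth).
    """
    total_length = sum(length for length, _ in contigs.values())
    running = 0
    for length, depth in sorted(contigs.values(), key=lambda c: c[1]):
        running += length
        if running >= total_length / 2:
            return depth
    raise ValueError("no contigs given")


def depth_filter(contigs, fraction=0.25):
    """Keep contigs whose depth is at least fraction x chromosomal depth."""
    cutoff = fraction * chromosomal_depth(contigs)
    return {name: c for name, c in contigs.items() if c[1] >= cutoff}


# Invented example: two chromosomal contigs, a high-copy plasmid,
# a low-depth contaminant and a short chromosomal contig.
contigs = {
    "1": (500000, 50.0),
    "2": (300000, 52.0),
    "3": (4000, 150.0),
    "4": (1200, 3.0),
    "5": (2000, 49.0),
}
kept = depth_filter(contigs)
print(sorted(kept))  # ['1', '2', '3', '5']
```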

Multiplicity

To scaffold the graph, Unicycler must distinguish between single copy contigs and repeats. It does this with a greedy algorithm that uses both read depth and graph connectivity:

This process does not assume that all single copy contigs have the same read depth, which allows it to identify single copy contigs from plasmids as well as the chromosome. After it has determined multiplicity, Unicycler chooses a set of 'anchor' contigs. These are sufficiently-long single-copy contigs suitable for bridging in later steps.

Overlap removal

To reduce redundancy and allow for neatly circularised contigs, Unicycler removes all overlap in the graphs:

Before:                                       After:
                   GACGCGTTGACAAGGAAAT                           TGACAAGGAAAT
                  /                                             /
TTGACTACCCAGACGCGT                            TTGACTACCCAGACGCGT
                  \                                             \
                   GACGCGTCCTCTCATTCTA                           CCTCTCATTCTA
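The before/after diagram can be reproduced with a few lines of string handling. This sketch mirrors the diagram by trimming the shared sequence from the downstream contigs only; it finds the overlap by exact suffix/prefix matching, which is an assumption of the example rather than a description of Unicycler's implementation.

```python
def shared_overlap(upstream, downstream):
    """Length of the longest suffix of upstream that is also a prefix of downstream."""
    for size in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:size]):
            return size
    return 0


def trim_downstream(upstream, downstream_contigs):
    """Remove the overlapping prefix from each contig that follows upstream."""
    return [d[shared_overlap(upstream, d):] for d in downstream_contigs]


upstream = "TTGACTACCCAGACGCGT"
downstream = ["GACGCGTTGACAAGGAAAT", "GACGCGTCCTCTCATTCTA"]

print(shared_overlap(upstream, downstream[0]))  # 7
print(trim_downstream(upstream, downstream))
# ['TGACAAGGAAAT', 'CCTCTCATTCTA']
```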

Bridging

At this point, the assembly graph does not contain the SPAdes repeat resolution. To apply this to the graph, Unicycler builds bridges between single copy contigs using the path information in the SPAdes assembly.

Bridges are given a quality score, most importantly based on the length of the bridge compared to the paired-end insert size, so bridges which span a long repeat are given a low score. Since paired-end sequencing cannot resolve repeats longer than the insert size, bridges which attempt to span long repeats cannot be trusted. This selectivity helps to reduce the number of misassemblies.

Method: long-read-only assembly

When assembling just long reads, Unicycler uses a miniasm+Racon pipeline. It offers a couple of advantages over other long-read-only assemblers:

Multiple rounds of Racon polishing give a good final sequence accuracy.
Circular replicons (like most bacterial chromosomes and plasmids) assemble into circular contigs with no start-end overlap.

More information on the long-read-only assembly process is described in the steps below.

miniasm assembly

Unicycler uses minimap and miniasm to assemble the long reads in essentially the same manner as described in the miniasm README. This produces an uncorrected assembly which is made directly of pieces of reads – the assembly error rate will be similar to the read error rate.

The version of miniasm that comes with Unicycler is slightly modified in a couple of ways. The first modification is to help circular replicons assemble into circular string graphs. The other modification only applies to hybrid assembly, so I'll come back to that!

Racon polishing

After miniasm assembly, Unicycler carries out multiple rounds of polishing with Racon to improve the sequence accuracy. It will polish until the assembly stops improving, as measured by the agreement between the reads and the assembly. Circular replicons are 'rotated' (have their starting position shifted) between rounds of polishing to ensure that no part of the sequence is left unpolished.

Method: hybrid assembly

Hybrid assembly (using both Illumina reads and long reads) is where Unicycler really shines. Like with the Illumina-only pipeline described above, Unicycler will produce an Illumina assembly graph. It then uses long reads to build bridges, which often allows it to resolve all repeats in the genome, resulting in a complete genome assembly.

In hybrid assembly, Unicycler carries out all the steps in the Illumina-only pipeline, plus the additional steps below:

Long-read plus contig assembly

This step uses miniasm and Racon, and is very much like the long-read-only assembly method described above. Here, however, the assembly is performed not just on long reads but on a mixture of long reads and anchor contigs from the Illumina-only assembly. Since these anchor contigs can often be much longer than long reads (sometimes hundreds of kbp), they can significantly help the assembly. This takes advantage of the other modification to miniasm mentioned above. In Unicycler's miniasm, contigs and long reads are treated slightly differently in the string graph manipulations to better perform this step.

After the assembly is finished, Unicycler finds anchor contigs in the assembled sequence and uses the intervening sequences to create bridges:

assembled sequence:                 TATGGTCTCGCATGTTAATTCTACTCCCGAACTTGGCCCATCCCCGGCTAGGCTGGGCACTAGACGGTGGAT
anchor contigs:                         GTCTCGCATGTTAA    ACTCCCGAACTTGGCCCATCCCCGGC       GGCACTAGACGGTGG
intervening sequences for bridges:                    TTCT                          TAGGCTG
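The diagram above can be reproduced with a small sketch that locates each anchor contig in the assembled sequence, in order, and collects what lies between adjacent anchors. Exact string matching is an assumption made to keep the example short; with real, error-containing data the anchors would have to be found by alignment.

```python
def intervening_sequences(assembled, anchors):
    """Find anchors in order and return the sequence between each adjacent pair."""
    gaps = []
    previous_end = None
    search_from = 0
    for anchor in anchors:
        start = assembled.find(anchor, search_from)
        if start == -1:
            raise ValueError(f"anchor not found: {anchor}")
        if previous_end is not None:
            gaps.append(assembled[previous_end:start])
        previous_end = start + len(anchor)
        search_from = previous_end
    return gaps


assembled = "TATGGTCTCGCATGTTAATTCTACTCCCGAACTTGGCCCATCCCCGGCTAGGCTGGGCACTAGACGGTGGAT"
anchors = ["GTCTCGCATGTTAA", "ACTCCCGAACTTGGCCCATCCCCGGC", "GGCACTAGACGGTGG"]

print(intervening_sequences(assembled, anchors))
# ['TTCT', 'TAGGCTG']
```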

Direct long-read bridging

Unicycler also attempts to make long-read bridges directly by semi-globally aligning the long reads to the assembly graph. For each pair of single copy contigs which are linked by read alignments, Unicycler uses the read consensus sequence to find a connecting path and creates a bridge.

This step and the previous step are somewhat redundant, as both use long reads to build bridges between short-read contigs. They are both included because they have different strengths. The previous approach can tolerate low long-read depth but requires a good short-read assembly graph (i.e. few dead ends). This step requires decent long-read depth but can tolerate poor short-read assembly graphs. By using the two strategies together, Unicycler can successfully handle many types of input.

Bridge application

At this point of the pipeline there can be many bridges, some of which may conflict. Unicycler therefore assigns a quality score to each based on all available evidence (e.g. read alignment quality, graph path match, read depth consistency). Bridges are then applied in order of decreasing quality so whenever there is a conflict, the most supported bridge is used. A minimum quality threshold prevents the application of low evidence bridges (see Conservative, normal and bold for more information).
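The ordering logic can be sketched as a greedy loop. The thresholds are the per-mode `--min_bridge_qual` defaults; everything else is an assumption for illustration: the `Bridge` class, the invented quality values, and the simplified definition of a conflict (two bridges leaving the same contig, or two bridges entering the same contig).

```python
from dataclasses import dataclass

# Per-mode defaults of --min_bridge_qual.
MIN_BRIDGE_QUAL = {"conservative": 25.0, "normal": 10.0, "bold": 1.0}


@dataclass(frozen=True)
class Bridge:
    start: str      # contig the bridge leaves (hypothetical representation)
    end: str        # contig the bridge enters
    quality: float  # invented scores, for illustration only


def apply_bridges(bridges, mode="normal"):
    """Apply bridges best-first, skipping conflicts and low-quality bridges."""
    threshold = MIN_BRIDGE_QUAL[mode]
    used_starts, used_ends = set(), set()
    applied = []
    for bridge in sorted(bridges, key=lambda b: b.quality, reverse=True):
        if bridge.quality < threshold:
            break  # every remaining bridge is lower quality still
        if bridge.start in used_starts or bridge.end in used_ends:
            continue  # conflicts with a better-supported bridge
        applied.append(bridge)
        used_starts.add(bridge.start)
        used_ends.add(bridge.end)
    return applied


bridges = [
    Bridge("A", "B", 60.0),
    Bridge("A", "C", 20.0),  # conflicts with A -> B
    Bridge("C", "D", 12.0),
    Bridge("D", "A", 4.0),
]
for mode in ("conservative", "normal", "bold"):
    print(mode, [(b.start, b.end) for b in apply_bridges(bridges, mode)])
# conservative [('A', 'B')]
# normal [('A', 'B'), ('C', 'D')]
# bold [('A', 'B'), ('C', 'D'), ('D', 'A')]
```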

Finalisation

If the above steps have resulted in any simple, circular sequences, then Unicycler will attempt to rotate/flip them to begin at a consistent starting gene. By default this is dnaA or repA, but users can specify their own with the --start_genes option.
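Rotating and flipping a circular sequence can be illustrated with plain strings. The sketch below uses an exact nucleotide match as a stand-in for the BLAST search that Unicycler performs (the requirements list `makeblastdb` and `tblastn` for this step), and the sequences and function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


def rotate_to_gene(circular_sequence, gene):
    """Rotate (and flip if needed) a circular sequence so that it starts with gene.

    Returns the sequence unchanged if the gene is found on neither strand.
    """
    for candidate in (circular_sequence, reverse_complement(circular_sequence)):
        # Doubling the sequence lets the search cross the start/end junction.
        position = (candidate + candidate).find(gene)
        if position != -1:
            return candidate[position:] + candidate[:position]
    return circular_sequence


print(rotate_to_gene("GGCATGAAATTT", "ATGAAA"))  # ATGAAATTTGGC (rotated)
print(rotate_to_gene("AAATTTGGCATG", "ATGAAA"))  # ATGAAATTTGGC (gene spans the junction)
print(rotate_to_gene("GCCAAATTTCAT", "ATGAAA"))  # ATGAAATTTGGC (flipped)
```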

Conservative, normal and bold

Unicycler can be run in three modes: conservative, normal (the default) and bold, set with the --mode option. Conservative mode is least likely to produce a complete assembly but has a very low risk of misassembly. Bold mode is most likely to produce a complete assembly but carries greater risk of misassembly. Normal mode is intermediate regarding both completeness and misassembly risk.

If the structural accuracy of your assembly is paramount to your research, conservative mode is recommended. If you want a completed genome, even if it contains a mistake or two, then use bold mode.

The specific differences between the three modes are as follows:

Mode	Invocation	Short read bridges	Bridge quality threshold	Contig merging
conservative	`--mode conservative`	not used	high (25)	contigs are only merged with bridges
normal	`--mode normal` (or nothing)	used	medium (10)	contigs are merged with bridges and when their multiplicity is 1
bold	`--mode bold`	used	low (1)	contigs are merged wherever possible

In the above example, the conservative assembly is incomplete because some bridges fell below the quality threshold and were not applied. Its contigs, however, are very reliable. Normal mode nearly gave a complete assembly, but a couple of unmerged contigs remain. Bold mode completed the assembly, but since lower confidence regions were bridged and merged, there is a larger risk of error.

Options and usage

Standard options

Run unicycler --help to view the program's most commonly used options:

usage: unicycler [-h] [--help_all] [--version] [-1 SHORT1] [-2 SHORT2] [-s UNPAIRED] [-l LONG] -o OUT
                 [--verbosity VERBOSITY] [--min_fasta_length MIN_FASTA_LENGTH] [--keep KEEP]
                 [-t THREADS] [--mode {conservative,normal,bold}] [--linear_seqs LINEAR_SEQS]

       __
       \ \___
        \ ___\
        //
   ____//      _    _         _                     _
 //_  //\\    | |  | |       |_|                   | |
//  \//  \\   | |  | | _ __   _   ___  _   _   ___ | |  ___  _ __
||  (O)  ||   | |  | || '_ \ | | / __|| | | | / __|| | / _ \| '__|
\\    \_ //   | |__| || | | || || (__ | |_| || (__ | ||  __/| |
 \\_____//     \____/ |_| |_||_| \___| \__, | \___||_| \___||_|
                                        __/ |
                                       |___/

Unicycler: an assembly pipeline for bacterial genomes

Help:
  -h, --help                      Show this help message and exit
  --help_all                      Show a help message with all program options
  --version                       Show Unicycler's version number

Input:
  -1 SHORT1, --short1 SHORT1      FASTQ file of first short reads in each pair
  -2 SHORT2, --short2 SHORT2      FASTQ file of second short reads in each pair
  -s UNPAIRED, --unpaired UNPAIRED
                                  FASTQ file of unpaired short reads
  -l LONG, --long LONG            FASTQ or FASTA file of long reads

Output:
  -o OUT, --out OUT               Output directory (required)
  --verbosity VERBOSITY           Level of stdout and log file information (default: 1)
                                    0 = no stdout, 1 = basic progress indicators, 2 = extra info,
                                    3 = debugging info
  --min_fasta_length MIN_FASTA_LENGTH
                                  Exclude contigs from the FASTA file which are shorter than this
                                  length (default: 100)
  --keep KEEP                     Level of file retention (default: 1)
                                    0 = only keep final files: assembly (FASTA, GFA and log),
                                    1 = also save graphs at main checkpoints,
                                    2 = also keep SAM (enables fast rerun in different mode),
                                    3 = keep all temp files and save all graphs (for debugging)

Other:
  -t THREADS, --threads THREADS   Number of threads used (default: 8)
  --mode {conservative,normal,bold}
                                  Bridging mode (default: normal)
                                    conservative = smaller contigs, lowest misassembly rate
                                    normal = moderate contig size and misassembly rate
                                    bold = longest contigs, higher misassembly rate
  --linear_seqs LINEAR_SEQS       The expected number of linear (i.e. non-circular) sequences in the
                                  underlying sequence (default: 0)

Advanced options

Run unicycler --help_all to see a complete list of the program's options. These allow you to turn off parts of the pipeline, specify the location of tools (only necessary if they are not in PATH) and adjust various settings:

usage: unicycler [-h] [--help_all] [--version] [-1 SHORT1] [-2 SHORT2] [-s UNPAIRED] [-l LONG] -o OUT
                 [--verbosity VERBOSITY] [--min_fasta_length MIN_FASTA_LENGTH] [--keep KEEP]
                 [-t THREADS] [--mode {conservative,normal,bold}] [--min_bridge_qual MIN_BRIDGE_QUAL]
                 [--linear_seqs LINEAR_SEQS] [--min_anchor_seg_len MIN_ANCHOR_SEG_LEN]
                 [--spades_path SPADES_PATH] [--min_kmer_frac MIN_KMER_FRAC]
                 [--max_kmer_frac MAX_KMER_FRAC] [--kmers KMERS] [--kmer_count KMER_COUNT]
                 [--depth_filter DEPTH_FILTER] [--largest_component] [--spades_options SPADES_OPTIONS]
                 [--no_miniasm] [--racon_path RACON_PATH]
                 [--existing_long_read_assembly EXISTING_LONG_READ_ASSEMBLY] [--no_simple_bridges]
                 [--no_long_read_alignment] [--contamination CONTAMINATION] [--scores SCORES]
                 [--low_score LOW_SCORE] [--min_component_size MIN_COMPONENT_SIZE]
                 [--min_dead_end_size MIN_DEAD_END_SIZE] [--no_rotate] [--start_genes START_GENES]
                 [--start_gene_id START_GENE_ID] [--start_gene_cov START_GENE_COV]
                 [--makeblastdb_path MAKEBLASTDB_PATH] [--tblastn_path TBLASTN_PATH]

       __
       \ \___
        \ ___\
        //
   ____//      _    _         _                     _
 //_  //\\    | |  | |       |_|                   | |
//  \//  \\   | |  | | _ __   _   ___  _   _   ___ | |  ___  _ __
||  (O)  ||   | |  | || '_ \ | | / __|| | | | / __|| | / _ \| '__|
\\    \_ //   | |__| || | | || || (__ | |_| || (__ | ||  __/| |
 \\_____//     \____/ |_| |_||_| \___| \__, | \___||_| \___||_|
                                        __/ |
                                       |___/

Unicycler: an assembly pipeline for bacterial genomes

Help:
  -h, --help                      Show this help message and exit
  --help_all                      Show a help message with all program options
  --version                       Show Unicycler's version number

Input:
  -1 SHORT1, --short1 SHORT1      FASTQ file of first short reads in each pair
  -2 SHORT2, --short2 SHORT2      FASTQ file of second short reads in each pair
  -s UNPAIRED, --unpaired UNPAIRED
                                  FASTQ file of unpaired short reads
  -l LONG, --long LONG            FASTQ or FASTA file of long reads

Output:
  -o OUT, --out OUT               Output directory (required)
  --verbosity VERBOSITY           Level of stdout and log file information (default: 1)
                                    0 = no stdout, 1 = basic progress indicators, 2 = extra info,
                                    3 = debugging info
  --min_fasta_length MIN_FASTA_LENGTH
                                  Exclude contigs from the FASTA file which are shorter than this
                                  length (default: 100)
  --keep KEEP                     Level of file retention (default: 1)
                                    0 = only keep final files: assembly (FASTA, GFA and log),
                                    1 = also save graphs at main checkpoints,
                                    2 = also keep SAM (enables fast rerun in different mode),
                                    3 = keep all temp files and save all graphs (for debugging)

Other:
  -t THREADS, --threads THREADS   Number of threads used (default: 8)
  --mode {conservative,normal,bold}
                                  Bridging mode (default: normal)
                                    conservative = smaller contigs, lowest misassembly rate
                                    normal = moderate contig size and misassembly rate
                                    bold = longest contigs, higher misassembly rate
  --min_bridge_qual MIN_BRIDGE_QUAL
                                  Do not apply bridges with a quality below this value
                                    conservative mode default: 25.0
                                    normal mode default: 10.0
                                    bold mode default: 1.0
  --linear_seqs LINEAR_SEQS       The expected number of linear (i.e. non-circular) sequences in the
                                  underlying sequence (default: 0)
  --min_anchor_seg_len MIN_ANCHOR_SEG_LEN
                                  If set, Unicycler will not use segments shorter than this as
                                  scaffolding anchors (default: automatic threshold)

SPAdes assembly:
  These options control the short-read SPAdes assembly at the beginning of the Unicycler pipeline.

  --spades_path SPADES_PATH       Path to the SPAdes executable (default: spades.py)
  --min_kmer_frac MIN_KMER_FRAC   Lowest k-mer size for SPAdes assembly, expressed as a fraction of
                                  the read length (default: 0.2)
  --max_kmer_frac MAX_KMER_FRAC   Highest k-mer size for SPAdes assembly, expressed as a fraction of
                                  the read length (default: 0.95)
  --kmers KMERS                   Exact k-mers to use for SPAdes assembly, comma-separated (example:
                                  21,51,71, default: automatic)
  --kmer_count KMER_COUNT         Number of k-mer steps to use in SPAdes assembly (default: 8)
  --depth_filter DEPTH_FILTER     Filter out contigs lower than this fraction of the chromosomal
                                  depth, if doing so does not result in graph dead ends (default:
                                  0.25)
  --largest_component             Only keep the largest connected component of the assembly graph
                                  (default: keep all connected components)
  --spades_options SPADES_OPTIONS
                                  Additional options to be given to SPAdes (example: "--phred-offset
                                  33", default: no additional options)

miniasm+Racon assembly:
  These options control the use of miniasm and Racon to produce long-read bridges.

  --no_miniasm                    Skip miniasm+Racon bridging (default: use miniasm and Racon to
                                  produce long-read bridges)
  --racon_path RACON_PATH         Path to the Racon executable (default: racon)
  --existing_long_read_assembly EXISTING_LONG_READ_ASSEMBLY
                                  A pre-prepared long-read assembly for the sample in GFA or FASTA
                                  format. If this option is used, Unicycler will skip the
                                  miniasm/Racon steps and instead use the given assembly (default:
                                  perform long-read assembly using miniasm/Racon)

Long-read alignment and bridging:
  These options control the use of long-read alignment to produce long-read bridges.

  --no_simple_bridges             Skip simple long-read bridging (default: use simple long-read
                                  bridging)
  --no_long_read_alignment        Skip long-read-alignment-based bridging (default: use long-read
                                  alignments to produce bridges)
  --contamination CONTAMINATION   FASTA file of known contamination in long reads
  --scores SCORES                 Comma-delimited string of alignment scores: match, mismatch, gap
                                  open, gap extend (default: 3,-6,-5,-2)
  --low_score LOW_SCORE           Score threshold - alignments below this are considered poor
                                  (default: set threshold automatically)

Graph cleaning:
  These options control the removal of small leftover sequences after bridging is complete.

  --min_component_size MIN_COMPONENT_SIZE
                                  Graph components smaller than this size (bp) will be removed from
                                  the final graph (default: 1000)
  --min_dead_end_size MIN_DEAD_END_SIZE
                                  Graph dead ends smaller than this size (bp) will be removed from the
                                  final graph (default: 1000)

Assembly rotation:
  These options control the rotation of completed circular sequence near the end of the Unicycler
  pipeline.

  --no_rotate                     Do not rotate completed replicons to start at a standard gene
                                  (default: completed replicons are rotated)
  --start_genes START_GENES       FASTA file of genes for start point of rotated replicons (default:
                                  start_genes.fasta)
  --start_gene_id START_GENE_ID   The minimum required BLAST percent identity for a start gene search
                                  (default: 90.0)
  --start_gene_cov START_GENE_COV
                                  The minimum required BLAST percent coverage for a start gene search
                                  (default: 95.0)
  --makeblastdb_path MAKEBLASTDB_PATH
                                  Path to the makeblastdb executable (default: makeblastdb)
  --tblastn_path TBLASTN_PATH     Path to the tblastn executable (default: tblastn)

Output files

Unicycler's most important output files are assembly.gfa, assembly.fasta and unicycler.log. These are produced by every Unicycler run. Which other files are saved to its output directory depends on the value of --keep:

--keep 0 retains only the important files. Use this setting to save drive space.
--keep 1 (the default) also saves some intermediate graphs which can be useful for investigating an assembly more deeply.
--keep 2 also retains the SAM file of long-read alignments to the graph. This ensures that if you rerun Unicycler with the same output directory (for example changing the mode to conservative or bold) it will run faster because it does not have to repeat the alignment step.
--keep 3 retains all files and saves many intermediate graphs. This is for debugging purposes and uses a lot of space, so most users should probably avoid this setting.

All files and directories are described in the table below. Intermediate output files (everything except for assembly.gfa, assembly.fasta and unicycler.log) will be prefixed with a number so they are in chronological order. Whether or not a file is in the output depends on the --keep level and type of input reads (e.g. short-read-only or hybrid).

File/directory	Description	`--keep` level
`spades_assembly/`	directory containing SPAdes files and logs (can be useful for debugging if SPAdes crashes)	3
`*_spades_graph_k*.gfa`	unaltered SPAdes assembly graphs at each k-mer size	1
`*_depth_filter.gfa`	best SPAdes short-read assembly graph after low-depth contigs have been removed and multiplicity determination	1
`*_overlaps_removed.gfa`	overlap-free version of the best SPAdes graph, with some more graph clean-up	1
`miniasm_assembly/`	directory containing miniasm string graphs and unitig graphs	3
`simple_bridging/`	directory containing files for the simple long-read bridging step	3
`*_long_read_assembly.gfa`	the long-read+contig miniasm+Racon assembly	1
`read_alignment/`	directory containing `long_read_alignments.sam`	2
`*_bridges_applied.gfa`	bridges applied, before any cleaning or merging	1
`*_cleaned.gfa`	redundant contigs removed from the graph	3
`*_merged.gfa`	contigs merged together where possible	3
`*_final_clean.gfa`	more redundant contigs removed	1
`blast/`	directory containing files for the assembly-rotation BLAST search	3
`*_rotated.gfa`	circular replicons rotated and/or flipped to a start position	1
`assembly.gfa`	final assembly in GFA v1 graph format	0
`assembly.fasta`	final assembly in FASTA format (same sequences as in assembly.gfa except for very short contigs)	0
`unicycler.log`	Unicycler log file (same info as was printed to stdout)	0
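
Because the intermediate graphs carry a numeric prefix, they can be listed in pipeline order with a few lines of Python. This is only a sketch: the function name `intermediate_graphs` is made up for illustration, and it assumes the prefix is a run of digits followed by an underscore (as in 002_depth_filter.gfa).

```python
import re
from pathlib import Path


def intermediate_graphs(output_dir):
    """Return numbered intermediate GFA file names in chronological (prefix) order."""
    numbered = []
    for path in Path(output_dir).glob('*.gfa'):
        match = re.match(r'(\d+)_', path.name)
        if match:  # assembly.gfa has no numeric prefix, so it is skipped
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]
```

Which files actually show up in the list depends on the --keep level and the type of input reads, as described above.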

Tips

Running time

Unicycler is thorough and accurate, but not particularly fast. For hybrid assemblies, the direct long-read bridging step of the pipeline can take a while to complete. Two main factors influence the running time: the number of long reads (more reads take longer to align) and the genome size/complexity (finding bridge paths is more difficult in complex graphs).

Unicycler may only take an hour or so to assemble a small, simple genome with low depth long reads. On the other hand, a complex genome with many long reads may take 12 hours to finish or more. If you have a very high depth of long reads (e.g. >100×), you can make Unicycler run faster by subsampling for only the best/longest reads (check out Filtlong).
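
To give a feel for what subsampling means, here is a minimal length-only sketch. It is not how Filtlong works (Filtlong also considers read quality), and the function name and inputs are assumptions made for illustration: it keeps the longest reads until their combined length reaches a target depth.

```python
def keep_longest_reads(read_lengths, genome_size, target_depth):
    """Pick the longest reads until their total length reaches target_depth x genome_size.

    read_lengths maps a read name to its length in bases.
    Returns the names of the kept reads, longest first.
    """
    target_bases = genome_size * target_depth
    kept, total = [], 0
    by_length = sorted(read_lengths.items(), key=lambda item: item[1], reverse=True)
    for name, length in by_length:
        if total >= target_bases:
            break
        kept.append(name)
        total += length
    return kept
```

For real data, a dedicated tool is the better choice; the sketch only shows why fewer, longer reads can cover the same depth.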

Using a lot of threads (with the --threads option) can make Unicycler run faster too. It will only use up to 8 threads by default, but if you're running it on a big machine with lots of CPU and RAM, feel free to use more!

Unicycler also works with PyPy which can speed up parts of its pipeline. However, some of Unicycler's slowest steps are when it calls other tools (like SPAdes) or uses C++ code, so PyPy may not help much. I haven't tested this thoroughly – if you try it, let me know how you go!

Necessary read length

The length of a long read is very important, typically more than its accuracy, because longer reads are more likely to align to multiple single copy contigs, allowing Unicycler to build bridges.

Consider a sequence with a 2 kb repeat:

In order to resolve the repeat, a read must span it by aligning to some sequence on either side. In this example, the 1 kb reads are shorter than the repeat and are useless. The 2.5 kb reads can resolve the repeat, but they have to be in just the right place to do so. Only one out of the six in this example is useful. The 5 kb reads, however, have a much easier time spanning the repeat and all three are useful.

So how long must your reads be for Unicycler to complete an assembly? Longer than the longest repeat in the genome. Depending on the genome, that might be a 1 kb insertion sequence, a 6 kb rRNA operon or a 50 kb prophage. If your reads are just a bit longer than the longest repeat, you'll probably need a lot of them. If they are much longer, then fewer reads should suffice. But in any scenario, longer is better!
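
The trade-off between read length and read depth can be put into rough numbers. The sketch below is a back-of-the-envelope model, not anything Unicycler computes: it assumes all reads have the same length, read start positions are uniform across the genome, and a read is useful only if it covers the whole repeat plus `min_anchor` bases on each side (`min_anchor` is a hypothetical parameter). Under those assumptions the expected number of spanning reads is depth × (read length − repeat length − 2 × anchor) / read length.

```python
def expected_spanning_reads(depth, read_len, repeat_len, min_anchor=0):
    """Expected number of reads that span a repeat with min_anchor bases on each side."""
    usable = read_len - repeat_len - 2 * min_anchor
    if usable <= 0:
        return 0.0  # reads are too short to span the repeat at all
    return depth * usable / read_len


# 2 kb repeat at 20x long-read depth:
print(expected_spanning_reads(20, 1000, 2000))  # 0.0
print(expected_spanning_reads(20, 2500, 2000))  # 4.0
print(expected_spanning_reads(20, 5000, 2000))  # 12.0
```

Reads only a little longer than the repeat contribute few spanning reads per unit of depth, which is why you need a lot of them.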

Bad Illumina reads

Unicycler prefers decent Illumina reads as input – ideally with uniform read depth and 100% genome coverage. Bad Illumina read sets can still work in Unicycler, but greater long-read depth will be required to compensate.

You can look at Unicycler graphs in Bandage to get a quick impression of the Illumina read quality:

A is a very good Illumina read graph – the contigs are long and there are no dead ends. This read set is ideally suited for use in Unicycler and shouldn't require too many long reads to complete (10–20× would probably be enough).

B is also a good graph. The genome is more complex, resulting in a more tangled structure, but there are still very few dead ends (you can see one in the lower left). This read set would also work well in Unicycler, though more long reads may be required to get a complete genome (maybe 30× or so).

C is a disaster! It is broken into many pieces, probably because parts of the genome got no read depth at all. This genome may take lots of long reads to complete in Unicycler, possibly 50× or more. The final assembly will probably have more small errors (SNPs and indels), as parts of the genome cannot be polished well with Illumina reads. If your graph looks like this, I'd recommend trying a long-read-first assembly approach (see 2022 update).

Very short contigs

Confused by very small (e.g. 2 bp) contigs in Unicycler assemblies? Unlike a SPAdes graph where neighbouring sequences overlap by their k-mer size, Unicycler's final graph has no overlaps and the sequences adjoin directly. This means that contigs in complex regions can be quite short. They may be useless as stand-alone contigs but are still important in the graph structure.

If short contigs are a problem for your downstream analysis, you can use the --min_fasta_length option to exclude them from Unicycler's FASTA file (they will still be included in the GFA file).
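
If you already have an assembly.fasta and want to drop short contigs after the fact, a small filter like the one below would do. It is a sketch of the idea rather than Unicycler's own implementation, and `filter_fasta` is a name invented for this example.

```python
def filter_fasta(fasta_text, min_length):
    """Return FASTA text containing only records with at least min_length bases."""
    kept = []
    for record in fasta_text.split('>')[1:]:
        header, _, body = record.partition('\n')
        sequence = ''.join(body.split())  # join wrapped sequence lines
        if len(sequence) >= min_length:
            kept.append('>' + header + '\n' + sequence + '\n')
    return ''.join(kept)
```

Remember that the filtered FASTA no longer matches the GFA, where the short contigs still hold the graph together.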

Chromosomes and plasmid depth

Unicycler normalises the depth of contigs in the graph to the median value. This typically means that the chromosome has a depth near 1× and plasmids may have different (typically higher) depths.

In the above graph, the chromosome is at the top (you can only see part of it) and there are two plasmids. The plasmid on the left occurs in approximately 4 or 5 copies per cell. For the larger plasmid on the right, most cells probably had one copy but some had more. Since sequencing biases can affect read depth, these per cell counts should be interpreted loosely.
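
The normalisation can be illustrated with a toy calculation. The sketch below assumes a length-weighted median (the text above only says "the median value", so the weighting is my assumption), and the contig names, lengths and depths are made up.

```python
def weighted_median(values, weights):
    """Return the value at which the cumulative weight first reaches half the total."""
    half = sum(weights) / 2
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError('no values given')


def normalise_depths(contigs):
    """contigs maps a contig name to (length_in_bp, raw_read_depth)."""
    lengths = [length for length, _ in contigs.values()]
    depths = [depth for _, depth in contigs.values()]
    median_depth = weighted_median(depths, lengths)
    return {name: depth / median_depth for name, (_, depth) in contigs.items()}


example = {
    'chromosome': (5000000, 80.0),
    'small_plasmid': (5000, 360.0),
    'large_plasmid': (100000, 96.0),
}
print(normalise_depths(example))
```

With these numbers the chromosome dominates the median, so it lands at 1×, the small plasmid at 4.5× and the large plasmid at 1.2×.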

Known contamination

If your long reads have known contamination, you can use the --contamination option to give Unicycler a FASTA file of the contaminant sequences. Unicycler will then discard any reads for which the best alignment is to the contaminant.

For example, if you've sequenced two isolates in succession on the same Nanopore flow cell, there may be residual reads from the first sample in the second run. In this case, you can supply a reference/assembly of the first sample to Unicycler when assembling the second sample.

Some Oxford Nanopore protocols include a lambda phage spike-in as a control. Since this is a common contaminant, you can simply use --contamination lambda to filter these out (no need to supply a FASTA file).
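
If you drive Unicycler from a script, the contamination option slots in like any other argument. The helper below is a hypothetical wrapper (its name and the file names are placeholders) that only builds the argument list from options mentioned in this README.

```python
def unicycler_command(short_1, short_2, long_reads, out_dir, contamination=None):
    """Build a Unicycler hybrid-assembly command as a list of arguments."""
    command = ['unicycler', '-1', short_1, '-2', short_2, '-l', long_reads, '-o', out_dir]
    if contamination is not None:
        # Either a FASTA file of contaminant sequences or the keyword 'lambda'
        command += ['--contamination', contamination]
    return command


print(' '.join(unicycler_command('short_1.fastq.gz', 'short_2.fastq.gz',
                                 'long.fastq.gz', 'output_dir',
                                 contamination='lambda')))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch the assembly.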

Manual multiplicity

If Unicycler makes a serious mistake during its multiplicity determination, this can have detrimental effects on the rest of the assembly. I've seen this happen when:

the Illumina graph is badly fragmented (multiplicity determination has few graph connections to work with).
there are multiple very similar plasmids in the genome (shared sequences between plasmids can be huge, 10s of kbp).
there is genomic heterogeneity.

If you believe this has happened in your assembly, you can manually assign multiplicities and try the assembly again. Here's the process:

View the short read assembly (002_depth_filter.gfa) in Bandage and view the region in question. Note that Unicycler's graph colour scheme uses green for single-copy segments and yellow/orange/red for multi-copy segments.
For any segments where you disagree with Unicycler's multiplicity, add a ML tag to the GFA segment line in 002_depth_filter.gfa. Examples:
- If Unicycler called segment 50 single-copy but you think it's actually a 2-copy repeat, add ML:i:2 to the end of the GFA line starting with S 50.
- If Unicycler called segment 107 multi-copy but you think it's actually single-copy, add ML:i:1 to the end of the GFA line starting with S 107.
Run Unicycler again, pointing to the same output directory (with your modified 002_depth_filter.gfa file). It will take your manually assigned multiplicities into account and hopefully do better!
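
Editing the GFA by hand works fine for a few segments. For many segments, the tagging step could be scripted along the lines below. This is a sketch under a few assumptions: segment lines are tab-separated (as in GFA v1), the function name `set_multiplicities` is invented, and replacing an already-present ML tag instead of adding a second one is my own choice.

```python
def set_multiplicities(gfa_lines, multiplicities):
    """Append an ML:i:N tag to the chosen segment lines.

    gfa_lines is a list of GFA lines; multiplicities maps a segment name
    (as a string) to the copy number you want to assign.
    """
    updated = []
    for line in gfa_lines:
        parts = line.rstrip('\n').split('\t')
        if parts[0] == 'S' and len(parts) >= 3 and parts[1] in multiplicities:
            # Keep name and sequence, drop any existing ML tag, then add the new one
            tags = [tag for tag in parts[3:] if not tag.startswith('ML:')]
            parts = parts[:3] + tags + ['ML:i:%d' % multiplicities[parts[1]]]
        updated.append('\t'.join(parts))
    return updated
```

For the two examples above, the call would be `set_multiplicities(lines, {'50': 2, '107': 1})`, with the result written back to 002_depth_filter.gfa before rerunning Unicycler.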

Manual completion

If Unicycler doesn't complete your bacterial genome assembly on its own, you may be able to complete it manually with a bit of bioinformatics detective work. There's no single, straightforward procedure for doing so, but I've put together a few examples on the Unicycler wiki which may be helpful.

Using an external long-read assembly

If you have a long-read assembly that you've prepared outside Unicycler and trust (e.g. with Canu), you can give it to Unicycler with --existing_long_read_assembly. Unicycler will then skip its miniasm/Racon step and use this assembly instead.

Assemblies with contig overlaps

Unicycler removes overlaps between contigs, resulting in cleaner assembly graphs. However, in some contexts, you might want these overlaps. In particular, if you are analysing your assemblies with a k-mer-based algorithm, overlaps might be a good thing so k-mers at contig boundaries aren't lost.

If this applies to you, I'd recommend using Unicycler's 002_depth_filter.gfa file (the last of the intermediate files before overlaps are removed) instead of the final assembly.fasta file. If you need this in FASTA format, Torsten's any2fasta tool can do the conversion.
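
For reference, the GFA-to-FASTA conversion itself is simple. The following is a bare-bones sketch of the idea (not any2fasta's code; the function name is made up): it takes the name and sequence columns of each tab-separated segment line and ignores everything else.

```python
def gfa_to_fasta(gfa_text):
    """Convert the segment (S) lines of a GFA file into FASTA records."""
    records = []
    for line in gfa_text.splitlines():
        parts = line.split('\t')
        if len(parts) >= 3 and parts[0] == 'S':
            name, sequence = parts[1], parts[2]
            records.append('>%s\n%s\n' % (name, sequence))
    return ''.join(records)
```

The sketch does no validation (for example, it would pass through a `*` placeholder sequence unchanged), so a dedicated tool is safer for anything beyond a quick look.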

Acknowledgements

Unicycler would not have been possible without Kat Holt, my fellow researchers in her lab and the many other people I work with at the University of Melbourne's Bio21 Molecular Science & Biotechnology Institute. In particular, Margaret Lam, Kelly Wyres, David Edwards and Claire Gorrie worked with me on many challenging genomes during Unicycler's development. Louise Judd is great with the MinION and produced many of the long reads I have used when developing Unicycler.

Unicycler uses SeqAn to perform alignments and other sequence manipulations. The authors of this library have been very helpful during Unicycler's development and I owe them a great deal of thanks! It also uses minimap for alignment and miniasm for long-read assembly, and so I'd like to thank Heng Li for these tools. Finally, Unicycler uses nanoflann, a delightfully fast and lightweight nearest neighbour library, to perform its line-finding in semi-global alignment.

License

GNU General Public License, version 3

unicycler's Issues

Public docker image

It might be worth noting in the Readme that you can find a publicly accessible Docker image for Unicycler at quay.io/biocontainers/unicycler. That can be helpful when there's a program with so many dependencies. You can also include a note on running an assembly with the docker image in a single line, e.g.:

docker run --rm -v $PWD:/share quay.io/biocontainers/unicycler:0.3.0b--py35_1 /bin/bash -c "cd /share; unicycler -1 read1.fastq.gz -2 read2.fastq.gz -l long_reads.fastq.gz -o output_dir"

small contigs in assembly.fasta

Hi, I tried to run Unicycler on combined Illumina and PacBio data. The log file showed that the size of the chromosome is about 4M. However, only fragments of 800k or smaller appear in assembly.fasta. I have already used bold mode.

Xiaoting

Error: miniasm assembly failed

Assembling contigs and long reads with miniasm (2017-07-24 18:16:16)

Saving to /miniasm_assembly/01_assembly_reads.fastq:
  61,957 long reads

Finding overlaps with minimap... 
success
  520,428,235 overlaps

Assembling reads with miniasm... 
empty result
Error: miniasm assembly failed

Is there any more info on why this happened? Because that's all it says.
Thx.

Makefile for minimap is overly specific

Compilation options supplied to GCC are overly specific by using march=native. In heterogeneous execution environments, builds on one system will fail to run on machines with differing architecture.

It would be better if this was left to documentation and perhaps the default made no CPU specification or instead used mtune=native, so as to avoid errors from the shared lib that end-users may not easily interpret.

Problem encoding the character '\u2190'

Hi Ryan,

We had some troubles in our HPC with the character '\u2190' which draws an arrow in the output given by the program. It raised the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2190' in position 48: ordinal not in range(128)

We solved the problem by replacing the character with a semicolon in the libraries. We are not sure if there is a more generic solution.

Thanks for this amazing software,

Sergio.

Assertion error (Cleaning graph - Unicycler v.0.4.0)

Hi Ryan,

I was testing the new version of Unicycler in some bacterial isolates using ONT and Illumina reads. In some cases Unicycler raised the following error.

  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/envs/unicycler/bin/unicycler", line 11, in <module>
    load_entry_point('unicycler==0.4.0', 'console_scripts', 'unicycler')()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/envs/unicycler/lib/python3.5/site-packages/unicycler/unicycler.py", line 92, in main
    clean_up_spades_graph(graph)
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/envs/unicycler/lib/python3.5/site-packages/unicycler/unicycler.py", line 987, in clean_up_spades_graph
    graph.expand_repeats()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/envs/unicycler/lib/python3.5/site-packages/unicycler/assembly_graph.py", line 2162, in expand_repeats
    self.segments[in_seg].trim_from_end(len(common_end))
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/envs/unicycler/lib/python3.5/site-packages/unicycler/assembly_graph_segment.py", line 141, in trim_from_end
    assert self.get_length() >= amount
AssertionError

The same isolates were assembled using Unicycler v.0.3.1 without any error, so I suspect it must be a modification introduced in the algorithm in Unicycler v0.4.0.

Thanks for the latest release of Unicycler! Great assembler :)

Sergio.

problem setup unicycler

Hi,

Sorry, I am new to this, so I am probably doing something stupid. I am trying to set up Unicycler in a Python 3.5 environment with gcc 4.9.2. The setup seemed to finish fine with the big Unicycler picture, but when I tried unicycler --help I got:

Traceback (most recent call last):
  File "./unicycler", line 11, in <module>
    load_entry_point('unicycler==0.4.0', 'console_scripts', 'unicycler')()
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 565, in load_entry_point
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2598, in load_entry_point
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2258, in load
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2264, in resolve
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/unicycler.py", line 24, in <module>
    from .assembly_graph import AssemblyGraph
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/assembly_graph.py", line 20, in <module>
    from .assembly_graph_segment import Segment
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/assembly_graph_segment.py", line 19, in <module>
    from .bridge_long_read import LongReadBridge
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/bridge_long_read.py", line 27, in <module>
    from .path_finding import get_best_paths_for_seq
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/path_finding.py", line 23, in <module>
    from .cpp_wrappers import fully_global_alignment, path_alignment
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/cpp_wrappers.py", line 28, in <module>
    C_LIB = CDLL(SO_FILE_FULL)
  File "/home/una/applications/anaconda3/envs/py35/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/una/applications/anaconda3/envs/py35/lib/python3.5/site-packages/unicycler/cpp_functions.so: undefined symbol: gzopen

I googled around and some people suggest it may be zlib related, but I have zlib installed (1.2.8, I think), so that is probably wrong.

Thanks for your help.
-una

Question: How many iterations of Racon polishing should I expect???

Hello: Right now, I am performing a hybrid assembly of Illumina MiSEQ & ONT minION R9 reads.

Thanks to the great documentation for Unicycler, everything has been straightforward getting all of the prerequisites built, etc. & submitting the primary command for the pipeline.

I have noticed, however, that it has been polishing w/Racon for some time now (~18 days over a total of 10 iterations).

The command that I submitted was the following:
unicycler -t 22 -1 R1_001.fastq -2 R2_001.fastq -l ONT_Reads_DeDuped.fq -o HYBRID --verbosity 2 --vcf --linear_seqs 3

While hardware platforms & OSes differ, I was just wondering how many iterations of polishing with Racon I could potentially expect as the pipeline completes. If I recall correctly, Racon is version 0.5.0 from when I built the pipeline.

Thank you kindly for your amazing documentation & remarkable application!
-J

Incorrect parsing of java 1.8 version in 0.3.1

Hi Ryan.

When running java -version on our system we get the following:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

In your current code, it parses out the version to be openjdk, leading unicycler to fail the java dependency with a ? and the comment too old.

Perhaps some regex would help here:

import re
import subprocess

# java -version writes its output to stderr, so capture that stream
java_version = subprocess.run(['java', '-version'], stderr=subprocess.PIPE).stderr
java_pat = re.compile(r'([01])\.([1-8])\.([0-9])_([0-9]{0,3})')
obs_java_version = java_pat.findall(java_version.decode())
# On our system, you get two hits here:
# [('1', '8', '0', '131'), ('1', '8', '0', '131')]
# Some further processing of the output from java -version to pick out just
# the first line before applying the regex might work best. But each tuple
# already has your major and sub-versions parsed out.

Cheers.

Short contigs

Thanks for the brilliant tool. The only small thing I've noticed is that there are sometimes very short contigs in the final assembly:

>9 length=3 depth=0.73x
ACA
>10 length=1 depth=0.73x
C

Would it be possible to automatically filter these out (perhaps anything less than the median short read length)?

Corrected or Raw Nanopore reads?

Hello!

Could you recommend which kind of Nanopore reads is more suitable for analysis: raw reads, or reads corrected with nanocorr?

Thanks!

call to unicycler fails when loading C++ functions

Hi,
I just installed Unicycler using the following options:

python3 setup.py install --prefix=$HOME/.local --makeargs "CXX=/share/apps/gcc-6.2.0/bin/g++"

Now when I try a simple unicycler -h, here is what I get (not nice):

Traceback (most recent call last):
  File "/home/ucbtass/bin/unicycler", line 11, in <module>
    load_entry_point('unicycler==0.4.0', 'console_scripts', 'unicycler')()
  File "/share/apps/python-3.4.2/lib/python3.4/site-packages/pkg_resources/__init__.py", line 560, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/share/apps/python-3.4.2/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2648, in load_entry_point
    return ep.load()
  File "/share/apps/python-3.4.2/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2302, in load
    return self.resolve()
  File "/share/apps/python-3.4.2/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2308, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/unicycler.py", line 24, in <module>
    from .assembly_graph import AssemblyGraph
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/assembly_graph.py", line 20, in <module>
    from .assembly_graph_segment import Segment
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/assembly_graph_segment.py", line 19, in <module>
    from .bridge_long_read import LongReadBridge
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/bridge_long_read.py", line 27, in <module>
    from .path_finding import get_best_paths_for_seq
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/path_finding.py", line 23, in <module>
    from .cpp_wrappers import fully_global_alignment, path_alignment
  File "/home/ucbtass/.local/lib/python3.4/site-packages/unicycler/cpp_wrappers.py", line 28, in <module>
    C_LIB = CDLL(SO_FILE_FULL)
  File "/share/apps/python-3.4.2/lib/python3.4/ctypes/__init__.py", line 351, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/ucbtass/.local/lib/python3.4/site-packages/unicycler/cpp_functions.so)

any idea to fix this?
Many thanks
F

Problem with Debian 9 installation

Hi, I am having issues installing unicycler in debian 9
Could I get some assistance? I already installed all the requirements but I keep on getting this error..

debian@debian-cris:/mnt/jiao/nanopore/Unicycler$ make
Platform: Linux
Compiler: g++ 6.3.0
g++ -std=c++14 -Iunicycler/include -fPIC -lrt -lpthread -O3 -DNDEBUG -Wall -Wextra -pedantic -march=native -c -o unicycler/src/minimap_align.o unicycler/src/minimap_align.cpp
unicycler/src/minimap_align.cpp:15:18: fatal error: zlib.h: No such file or directory
#include <zlib.h>
^
compilation terminated.
Makefile:114: recipe for target 'unicycler/src/minimap_align.o' failed
make: *** [unicycler/src/minimap_align.o] Error 1

Any ideas?

Thanks!

version 0.4.0 not in pypi ?

unicycler --version
Unicycler v0.3.1

pip3 install --upgrade unicycler
Requirement already up-to-date: unicycler in /home/linuxbrew/.linuxbrew/lib/python3.6/site-packages

The method strip_read_extensions() is overly greedy.

Since modern operating systems don't impose a constraint on the use of ".", it would perhaps be better if this method did not assume the dot is only used to demarcate suffixes and greedily remove them all. The dot character seems to increasingly creep into file names.

With the current approach, files can be named in such a way that both R1 and R2 become identically named. Unicycler does not appear to notice the resulting name clash.

A name clash should probably be detected (i.e. by checking that stripped R1 != stripped R2), and perhaps the stripping method could be rewritten with less possibility for unintended actions.
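
As a rough illustration of the suggestion, a less greedy version might only strip suffixes from a fixed list and raise an error on a clash. The names below (`KNOWN_EXTENSIONS`, `strip_known_extensions`, `check_no_clash`) and the list of extensions are hypothetical and are not taken from Unicycler's code.

```python
KNOWN_EXTENSIONS = ('.gz', '.fastq', '.fq', '.fasta', '.fa')


def strip_known_extensions(filename):
    """Remove only recognised read-file suffixes, leaving other dots intact."""
    stripped = True
    while stripped:
        stripped = False
        for ext in KNOWN_EXTENSIONS:
            if filename.lower().endswith(ext) and len(filename) > len(ext):
                filename = filename[:-len(ext)]
                stripped = True
    return filename


def check_no_clash(read_1, read_2):
    """Return both stripped names, or raise if they end up identical."""
    name_1 = strip_known_extensions(read_1)
    name_2 = strip_known_extensions(read_2)
    if name_1 == name_2:
        raise ValueError('stripped read names clash: %s' % name_1)
    return name_1, name_2
```

With this approach a name such as sample.v2.R1.fastq.gz keeps its internal dots and becomes sample.v2.R1.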

Unicycler Thinks Spades Version is Incorrect

Hi. I am trying to run Unicycler locally on my laptop (MacBook Pro, macOS Sierra). When I direct it to the SPAdes path (version 3.10.1-Darwin) using

--spades_path /directory/directory/SPAdes-3.10.1-Darwin/bin/spades.py

Unicycler returns an error saying that the version of SPAdes is too old. I cannot work out how to get past this, so any advice would be great!

retain bubbles in assembly graph

Hi, I am using Unicycler to assemble a pseudo-diploid yeast genome from short Illumina reads. I would like to retain bubbles in the graph. I am using the no_correct option, but I don't see the bubbles in the graph.
How can I retain bubbles in the graph?

Attached snapshot

Conda package

We would like to wrap Unicycler as a Galaxy (http://usegalaxy.org) tool. We resolve tool dependencies via conda (https://docs.galaxyproject.org/en/master/admin/conda_faq.html).

Are you planning on creating a conda package, or can you tell us what the dependencies are? From what I gather so far, these are:
- zlib1g-dev
- bowtie2
- pilon
- blast
- samtools
- spades

Thanks for writing a great utility!

Java problem

Hi I am trying to run unicycler but I get this error:
Dependencies:
  Program         Version   Status
  spades.py       3.9.1     good
  makeblastdb     2.6.0+    good
  tblastn         2.6.0+    good
  bowtie2-build   2.3.0     good
  bowtie2         2.3.0     good
  samtools        1.3.1     good
  java            ?         too old
  pilon           1.22      good

Error: Unspecified error with Unicycler dependencies

Is this a problem with my Java?? This is the output of java -version

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Thanks!

Error: could not find SPAdes at spades.py

Hi, I am trying to use Unicycler. I have installed the required dependencies, but it does not seem to be able to find SPAdes (although it is in the path, and I have tested it and it is working fine). Can you help me with this?

Some doubts about Unicycler

Hi,
Unicycler is really a great pipeline with many useful tools. However, I still have some doubts about its application and some of its steps, like the graph scaffolding and bridging:

Is Unicycler only for bacterial genomes? How about a small diploid genome?
The long-read bridging description says it makes long-read bridges directly by semi-globally aligning the long reads to the assembly graph. In the Unicycler pipeline, I see that the assembly graph is generated by SPAdes. Is there any other way to get an assembly graph?

Thanks,
Ryan

not declared in this scope

Can someone point me in the right direction on this? I'm trying a make without the python setup at the moment:

[root@machine ~/Unicycler-0.4.1]# make
Platform: Linux
Compiler: g++ 4.9.3
g++ -std=c++14 -Iunicycler/include -fPIC -lrt -lpthread -O3 -DNDEBUG -Wall -Wextra -pedantic -mtune=native -c -o unicycler/src/consensus_align.o unicycler/src/consensus_align.cpp
g++ -std=c++14 -Iunicycler/include -fPIC -lrt -lpthread -O3 -DNDEBUG -Wall -Wextra -pedantic -mtune=native -c -o unicycler/src/global_align.o unicycler/src/global_align.cpp
g++ -std=c++14 -Iunicycler/include -fPIC -lrt -lpthread -O3 -DNDEBUG -Wall -Wextra -pedantic -mtune=native -c -o unicycler/src/kmers.o unicycler/src/kmers.cpp
g++ -std=c++14 -Iunicycler/include -fPIC -lrt -lpthread -O3 -DNDEBUG -Wall -Wextra -pedantic -mtune=native -c -o unicycler/src/miniasm/asg.o unicycler/src/miniasm/asg.cpp
unicycler/src/miniasm/asg.cpp: In function 'int cut_biloops(asg_t*, int)':
unicycler/src/miniasm/asg.cpp:294:30: error: 'UINT32_MAX' was not declared in this scope
         uint32_t nv, nw, w = UINT32_MAX, x, ov = 0, ox = 0;
                              ^
unicycler/src/miniasm/asg.cpp:299:9: error: 'x' was not declared in this scope
         x = (uint32_t)a.a[a.n - 1] ^ 1;
         ^
unicycler/src/miniasm/asg.cpp:307:31: error: 'ox' was not declared in this scope
             if (aw[i].v == x) ox = aw[i].ol;
                               ^
unicycler/src/miniasm/asg.cpp:308:31: error: 'ov' was not declared in this scope
             if (aw[i].v == v) ov = aw[i].ol;
                               ^
unicycler/src/miniasm/asg.cpp:310:13: error: 'ov' was not declared in this scope
         if (ov == 0 && ox == 0) continue;
             ^
unicycler/src/miniasm/asg.cpp:310:24: error: 'ox' was not declared in this scope
         if (ov == 0 && ox == 0) continue;
                        ^
unicycler/src/miniasm/asg.cpp:311:13: error: 'ov' was not declared in this scope
         if (ov > ox) {
             ^
unicycler/src/miniasm/asg.cpp:311:18: error: 'ox' was not declared in this scope
         if (ov > ox) {
                  ^
make: *** [unicycler/src/miniasm/asg.o] Error 1
[root@machine ~/Unicycler-0.4.1]#

Support interleaved paired-end reads [feature request]

Hi, Ryan. Please consider supporting assembling interleaved paired-end reads in one FASTQ file. Thanks!

using Ion torrent and nanopore reads

I am trying to use unicycler for my reads obtained from Ion PGM and nanopore. However, it fails at spades reads error correction.

Any suggestions for how I could overcome this problem?

Thanks
Suresh

Command: /usr/local/bin/unicycler -s smithPGM.fastq -l smithNP-PCBio-cor.fastq --no_rotate --no_miniasm -o smithunicycler1

Unicycler version: v0.4.0
Using 4 threads

The output directory already exists and files may be reused or overwritten:
/Users/Suresh/Desktop/smithunicycler1

Dependencies:
  Program         Version     Status
  spades.py       3.10.1      good
  racon                       not used
  makeblastdb                 not used
  tblastn                     not used
  bowtie2-build   ?           good
  bowtie2         ?           good
  samtools        1.5         good
  java            1.8.0_131   good
  pilon           1.22        good
  bcftools                    not used

SPAdes read error correction (2017-07-19 14:14:43)
Unicycler uses the SPAdes read error correction module to reduce the number
of errors in the short read before SPAdes assembly. This can make the assembly
faster and simplify the assembly graph structure.

Command: /usr/local/bin/spades.py -s /Users/Suresh/Desktop/smithPGM.fastq -o /Users/Suresh/Desktop/smithunicycler1/spades_assembly/read_correction --threads 4 --only-error-correction
Error: SPAdes read error correction failed

feature suggestion/question

Hi,

I've been doing something similar with the combination of short and long reads, and with further Pilon refinement (except that instead of a single pass I used 2-3 Pilon passes). In some of my projects I had more than one short-read dataset, i.e. one set of 150 x 2 and another of 250 x 2, to which I then added Nanopore reads. Theoretically, exSPAnder (if I understand it accurately) would perform best with a monodisperse or very narrow insert size, so I didn't want to simply cat the two datasets into a single read library. In these cases SPAdes actually provides command line options to input several short-read libraries. Would it be feasible, and would it make sense, to add a similar feature to Unicycler? From the SPAdes command line it seems possible, but I'm not sure about the rest of the pipeline.

Sorry for writing this here and thanks! (If you find this post inappropriate here, I will close the issue immediately)

Failing in the Loading Reads step. IndexError: list index out of range??

I am getting the following error in the Loading Reads step:
Loading reads (2017-07-03 07:52:59)
114,292 / 114,292 (100.0%) - 148,471,758 bp
Traceback (most recent call last):
  File "/mnt/jiao/nanopore/Unicycler/unicycler-runner.py", line 21, in <module>
    main()
  File "/mnt/jiao/nanopore/Unicycler/unicycler/unicycler.py", line 134, in main
    read_dict, read_names, long_read_filename = load_long_reads(args.long)
  File "/mnt/jiao/nanopore/Unicycler/unicycler/read_ref.py", line 126, in load_long_reads
    original_name = line.strip()[1:].split()[0]
IndexError: list index out of range

Any ideas?

Thanks!

assembly.fasta is empty?

Started working on my third genome with Unicycler. First two worked well, ONT & Illumina data.

Took a stab at some old PacBio data. Everything seemed to work, but in the end, the assembly.fasta file that is generated is empty. Screenshot attached. Let me know what other log info might be helpful.

SPAdes crashes on small paired-end read assembly

Hi @rrwick,

I am investigating if unicycler can be used for a slightly different purpose. I've been given some structural variants called with SNIFFLES from ONT reads and I also have Illumina paired reads available. I wondered if unicycler could be used for local reassembly of the investigated regions refining break points and catching false positives with the help of accurate Illumina reads.

I extracted 1034 short read fastq sequences (517 pairs) for a specific region of about 3000bp and called unicycler with default parameters for paired-end short reads (and 8 long reads). Dependencies are all reported to be good.

At the SPAdes assemblies section unicycler crashes because it cannot find the contigs.paths file.
Stack Trace:

Traceback (most recent call last):
   File "Unicycler-0.4.1/unicycler-runner.py", line 21, in <module>
      main()
   File "/nfs/odinn/tmp/svenjam/Unicycler-0.4.1/unicycler/unicycler.py", line 87, in main
      args.no_correct, args.linear_seqs)
   File "/nfs/odinn/tmp/svenjam/Unicycler-0.4.1/unicycler/spades_func.py", line 178, in get_best_spades_graph
      insert_size_deviation=insert_size_deviation)
   File  "/nfs/odinn/tmp/svenjam/Unicycler-0.4.1/unicycler/assembly_graph.py", line 68, in __init__
      self.load_spades_paths(paths_file)
   File  "/nfs/odinn/tmp/svenjam/Unicycler-0.4.1/unicycler/assembly_graph.py", line 170, in load_spades_paths
       paths_file = open(filename, 'rt')
FileNotFoundError: [Errno 2] No such File or directory:  '/nfs/odinn/tmp/svenjam/sv.ins.1288233/output/spades_assembly/assembly/contigs_paths'

I don't know if it's the fault of my data. Any ideas? (Read pairs are identified by the same name-id.)

Side note: if I treat the paired reads as unpaired by concatenating the two files and running unicycler with the -s parameter (which of course doesn't make much sense for the assembly), the program doesn't fail.

OSError: cpp_functions.so: undefined symbol: gzopen

hi Ryan,
I have successfully installed Unicycler. But I get the following run error:

Traceback (most recent call last):
  File "./unicycler-runner.py", line 18, in <module>
    from unicycler.unicycler import main
  File "/opt/NGS_tools/Unicycler/unicycler/unicycler.py", line 25, in <module>
    from .assembly_graph import AssemblyGraph
  File "/opt/NGS_tools/Unicycler/unicycler/assembly_graph.py", line 18, in <module>
    from .assembly_graph_segment import Segment
  File "/opt/NGS_tools/Unicycler/unicycler/assembly_graph_segment.py", line 19, in <module>
    from .bridge_long_read import LongReadBridge
  File "/opt/NGS_tools/Unicycler/unicycler/bridge_long_read.py", line 25, in <module>
    from .cpp_wrappers import consensus_alignment
  File "/opt/NGS_tools/Unicycler/unicycler/cpp_wrappers.py", line 27, in <module>
    C_LIB = CDLL(SO_FILE_FULL)
  File "/opt/anaconda3/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/NGS_tools/Unicycler/unicycler/cpp_functions.so: undefined symbol: gzopen

I'm using Ubuntu 16.04 with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0.

zlib1g/xenial-updates,now 1:1.2.8.dfsg-2ubuntu4.1 amd64 [installed]
  compression library - runtime
zlib1g-dev/xenial-updates,now 1:1.2.8.dfsg-2ubuntu4.1 amd64 [installed,automatic]
  compression library - development

Is there a problem with my dependencies? thanks.

regards,
tomas

No module named 'pkg_resources'

Hi I have installed Unicycler on a CentOS server but when trying to execute "unicycler" it raises the following error:

Traceback (most recent call last):
  File "/usr/local/bin/unicycler", line 6, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named 'pkg_resources'

I've tried to fix this problem in several ways, including the approach explained in this StackOverflow thread.
I also tried uninstalling and reinstalling setuptools, and even pip and pip3.

Nothing worked for me

Greetings!

Installation problem

It seems as if the function _resolve_version() in ez_setup.py is failing to generate a string required by Python, so all calls to setup.py, including setup.py -h are failing. The error is:

Traceback (most recent call last):
  File "setup.py", line 24, in <module>
    ez_setup.use_setuptools()
  File "/home/jcszamosi/BioinformaticsTools/Unicycler/ez_setup.py", line 150, in use_setuptools
    version = _resolve_version(version)
  File "/home/jcszamosi/BioinformaticsTools/Unicycler/ez_setup.py", line 365, in _resolve_version
    reader = codecs.getreader(charset)
  File "/usr/lib/python3.4/codecs.py", line 994, in getreader
    return lookup(encoding).streamreader
TypeError: must be str, not None

Thanks!

compilation problems

on MacOS Sierra

Darwin 16.5.0 Darwin Kernel Version 16.5.0: Fri Mar  3 16:52:33 PST 2017; root:xnu-3789.51.2~3/RELEASE_X86_64 x86_64

Platform: Mac
Compiler: unknown 5.4.0
c++ -std=c++14 -Iunicycler/include -fPIC -O3 -DNDEBUG -Wall -Wextra -pedantic -march=native -c -o unicycler/src/consensus_align.o unicycler/src/consensus_align.cpp
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1135:no such instruction: `vmovaps %xmm0, -144(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1136:no such instruction: `vmovaps %xmm1, -128(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1137:no such instruction: `vmovaps %xmm2, -112(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1138:no such instruction: `vmovaps %xmm3, -96(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1139:no such instruction: `vmovaps %xmm4, -80(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1140:no such instruction: `vmovaps %xmm5, -64(%rbp)'
/var/folders/j_/6f4g5k1x4jx9tyl_fshyqktr0000gr/T//ccnAuawe.s:1141:no such instruction: `vmovaps %xmm6, -48(%rbp)'
...
make: *** [unicycler/src/consensus_align.o] Error 1

I also tried with gcc 4.9.4 and 6.3.0, all installed with MacPorts, and got the same result.

Unpaired reads

I understand that paired end reads are required for unicycler.

Is there any way to relax this constraint and let unicycler accept unpaired end data as well?
(Suppose I have unpaired reads from Illumina and long reads from Nanopore.)

Thanks.

Pilon still not being auto-detected

We are using the current HEAD 0.3.0b version.

We have pilon in the PATH which is a bash wrapper for the JAR.

But it still says "unable to find pilon" at the end of a unicycler run.

Is it only in dev branch?

Could you add --memory option for Spades

Hello!
Thank you for this tool!
I get an 'out of memory' message while running SPAdes with a lot of threads (-t 144).
What do you recommend to solve this issue?

Best regards.

BLAST doesn't like colons in the file name

Hi, Ryan. Any thoughts on the following error? Ah. Wait. I have an idea. BLAST is breaking because there's a colon in the file name. Don't do that. Feel free to close this issue.

❯❯❯ unicycler -t64 -o psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.unicycler -1 psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.1.fq.gz -2 psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.2.fq.gz
…
Rotating completed replicons (2017-08-22 21:28:10)
    Any completed circular contigs (i.e. single contigs which have one link
connecting end to start) can have their start position changed with altering
the sequence. For consistency, Unicycler now searches for a starting gene (dnaA
or repA) in each such contig, and if one is found, the contig is rotated to
start with that gene on the forward strand.


BLAST encountered an error:
BLAST Database error: No alias or index file found for nucleotide database [replicon.fasta] in search path [/projects/btl/sjackman/picea-sitchensis-mitochondrion/psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.unicycler/blast::/projects/btl/db/blast:]

BLAST encountered an error:
BLAST Database error: No alias or index file found for nucleotide database [replicon.fasta] in search path [/projects/btl/sjackman/picea-sitchensis-mitochondrion/psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.unicycler/blast::/projects/btl/db/blast:]

Segment   Length    Depth   Starting gene   Position   Strand   Identity   Cover
      1   167,735   6.93x   none found                                          
    581       325   3.23x   none found                                          

Assembly complete (2017-08-22 21:28:12)
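
The search path in the error message is colon-separated, which suggests (an assumption, not confirmed here) that BLAST splits a database path at each colon. A minimal sketch of a pre-flight check that flags output paths containing colons before starting a run. The function name `find_colon_components` is hypothetical.

```python
import os


def find_colon_components(path):
    """Return the components of `path` that contain a colon.

    Hypothetical pre-flight check: an empty list means no component of the
    path contains a colon.
    """
    parts = os.path.normpath(path).split(os.sep)
    return [part for part in parts if ":" in part]


if __name__ == "__main__":
    out_dir = "psitchensiscpmt_8/8003:0-154229.bed.bx.as100.bam.barcodes.bx.unicycler"
    offending = find_colon_components(out_dir)
    if offending:
        print("Path components containing ':':", offending)
```

Renaming the output directory (for example, replacing `:` with `_`) avoids the problem.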

Sample links seem to be out

Hi Ryan,

The links on this page : https://github.com/rrwick/Unicycler/tree/master/sample_data for the Helicobacter pylori, Streptococcus pyogenes and Neisseria gonorrhoeae organisms seem to be out of service.

Could you please restore them?

Thx.

Long read-only assembly

Just at the start it says:

Unicycler is a assembly pipeline for bacterial genomes. It uses Illumina reads and/or long reads (PacBio or Nanopore) to produce complete and accurate assemblies.

but later on in the manual it becomes obvious that short reads are not optional like long reads.
Is it possible to use Unicycler with only long (corrected Nanopore) reads?

error in pilon

Hi,

I ran Unicycler and I got the following error message:
"Pilon polish round 1
Unable to polish assembly using Pilon: Pilon encountered an error:
Invalid maximum heap size: -Xmx1G -XX:ParallelGCThreads=4 -jar
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit."

I modified the following line in the script: "pilon_func.py":
pilon_command = [args.java_path, '-Xmx1G -XX:ParallelGCThreads=4 ' '-jar', args.pilon_path]

Can you please help me solve this error when running Pilon?

Thanks
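
A note on the modified line above: in Python, adjacent string literals are concatenated, so `'-Xmx1G -XX:ParallelGCThreads=4 ' '-jar'` becomes the single argument `-Xmx1G -XX:ParallelGCThreads=4 -jar`. That matches the "Invalid maximum heap size" text in the error, because Java receives everything after `-Xmx` as the heap size. The sketch below passes each JVM option as its own list element. The function name and its defaults are illustrative assumptions, not the original contents of pilon_func.py.

```python
def build_pilon_command(java_path, pilon_path, max_heap="1G", gc_threads=4):
    """Build an argument list in which every JVM option is a separate element.

    Illustrative helper; the name and default values are assumptions.
    """
    return [
        java_path,
        "-Xmx" + max_heap,
        "-XX:ParallelGCThreads=" + str(gc_threads),
        "-jar",
        pilon_path,
    ]


# The modified line from the post: the two adjacent literals merge into one
# argument because there is no comma between them.
broken = ["java", "-Xmx1G -XX:ParallelGCThreads=4 " "-jar", "pilon.jar"]

if __name__ == "__main__":
    print(broken)
    print(build_pilon_command("java", "pilon.jar"))
```

With separate elements, a subprocess call hands Java `-Xmx1G`, `-XX:ParallelGCThreads=4` and `-jar` as three distinct arguments.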

Outputting of single base "contigs".

Hi Ryan.

I am currently running Unicycler v0.3.0 and noticed that four contigs were output as single-base contigs (i.e., A, T, C, and G). This seems baffling, but I did notice that the default for --min_fasta_length is 1. A better default might be 200. I'm not sure that is the best cutoff, but it seems more sensible than 1. Also, out of curiosity, how is it possible to output a single-base "contig" (not sure it should be called a contig either)?

Apologies if this has already been addressed in a newer version.

Cheers.

errors during spades Assembling no outputs

Dear Unicycler author,
I have tried using Unicycler to assemble my genome (2.5 million bases) with a set of paired Illumina reads and a set of PacBio reads, running it from the Docker container.
The command failed. I'm reporting the relevant output below. Could you help me fix these errors?
COMMAND LAUNCHED:
c1780c8c[unicycler_input_result]# unicycler -1 RAMS001_dd_S28_L004_R1_001.fastq -2 RAMS001_dd_S28_L004_R2_001.fastq -l R125-G01.1-Reads.fastq -o ./output_dirUnicycler --pilon_path=/usr/local/share/pilon-1.20-0/pilon-1.20.jar

REPORT
Starting Unicycler
Command: /usr/local/bin/unicycler -1 RAMS001_dd_S28_L004_R1_001.fastq -2 RAMS001_dd_S28_L004_R2_001.fastq -l R125-G01.1-Reads.fastq -o ./output_dirUnicycler --pilon_path=/usr/local/share/pilon-1.20-0/pilon-1.20.jar

Making output directory:
/home/unicycler_input_result/output_dirUnicycler

SPAdes read error correction
Command: /usr/local/bin/spades.py -1 /home/unicycler_input_result/RAMS001_dd_S28_L004_R1_001.fastq -2 /home/unicycler_input_result/RAMS001_dd_S28_L004_R2_001.fastq -o /home/unicycler_input_result/output_dirUnicycler/spades_assembly_temp/read_correction --threads 8 --only-error-correction

Corrected reads:
/home/unicycler_input_result/output_dirUnicycler/spades_assembly_temp/corrected_1.fastq.gz
/home/unicycler_input_result/output_dirUnicycler/spades_assembly_temp/corrected_2.fastq.gz
/home/unicycler_input_result/output_dirUnicycler/spades_assembly_temp/corrected_u.fastq.gz

Choosing k-mer range for assembly
Median read length: 151
K-mer range: 27, 47, 63, 77, 89, 99, 107, 115, 121, 127

Conducting SPAdes assemblies
K-mer   Segments   Dead ends   Score
   27                          too complex
   47                          too complex
   63        325           0   3.08e-03
   77        260           0   3.85e-03
   89        178           0   5.62e-03
   99        157           0   6.37e-03
  107        139           0   7.19e-03
  115        124           0   8.06e-03
  121        115           0   8.70e-03
Traceback (most recent call last):
File "/usr/local/bin/unicycler", line 11, in <module>
load_entry_point('unicycler==0.2.0', 'console_scripts', 'unicycler')()
File "/usr/local/lib/python3.5/site-packages/unicycler/unicycler.py", line 79, in main
args.expected_linear_seqs)
File "/usr/local/lib/python3.5/site-packages/unicycler/spades_func.py", line 152, in get_best_spades_graph
row_extra_text={best_kmer_row: ' \u2190 best'})
File "/usr/local/lib/python3.5/site-packages/unicycler/misc.py", line 628, in print_table
print(indenter + row_str, flush=True)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2190' in position 48: ordinal not in range(128)
c1780c8c[unicycler_input_result]#

thanks in advance
Annalisa
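
The final error means the terminal encoding inside the container is ASCII, so the "←" (`\u2190`) marker on the best k-mer row cannot be printed. Setting `PYTHONIOENCODING=utf-8` or a UTF-8 locale in the container is one possible workaround. The sketch below shows another, under the assumption that replacing unencodable characters is acceptable. The helper name `printable_text` is hypothetical and is not Unicycler's code.

```python
import sys


def printable_text(text, encoding=None):
    """Return `text` with characters the target encoding cannot represent
    replaced by '?', so that printing it does not raise UnicodeEncodeError.

    Hypothetical helper; falls back to ASCII if stdout reports no encoding.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    row = "  121        115           0   8.70e-03" + " \u2190 best"
    print(printable_text(row))
```

On a UTF-8 terminal the arrow is preserved, and on an ASCII one it degrades to `?` instead of aborting the run.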

Unpaired data: Problem with `-s` option

Obviously, using unpaired reads is quite suboptimal, but in some cases this is the only data available (albeit at high coverage). When using the -s option without the -1 and -2 options, Unicycler returns:

unicycler: error: the following arguments are required: -1/--short1, -2/--short2

Is it possible to make Unicycler happy with just the -s version?

Why pilon.jar and not just pilon

Hi Ryan.

Just running a test with 0.2 (great docs, btw!!). We have a script called pilon on the PATH, which calls pilon.jar with specific memory requirements. Is there any reason to run pilon.jar directly? I was looking at the code, and it did not strike me that this was the case.

Cheers.

Corrected long reads?

Hi Ryan.

I suspect it doesn't matter, but I was just asked this: does unicycler use raw or corrected long-reads (say correcting with canu, for instance)? (I also suspect someone else might ask you this, so now it is documented).

The reason I think it does not matter is that it is used only for threading the graph. Some mismatch may occur, and that will be fine, and seems like it can be controlled by the alignment score option. Do you have any tests showing any differences in the output depending on using raw or corrected reads?

Cheers.

No `semi_global_long_read_aligner.py`

Is it now called unicycler_align?

Just noticed that in README.

Keeping an assembly.log file in outputdir

When I run Unicycler, I end up with two files: assembly.fasta and assembly.gfa.

The nice progress log and bridge and circularization summaries are lost in my scrollback.

Is there a way to keep it?
(besides saving stdout/stderr with all the ANSI ESC codes)

Should input reads be trimmed before use

Hi!

This looks like a great tool. I am likely to use it on bacterial Illumina reads. Should the reads be trimmed for quality and adapters before use?

Default values for optional parameters

What are the recommended defaults for the following options:

--min_kmer_frac MIN_KMER_FRAC
--max_kmer_frac MAX_KMER_FRAC  
--kmer_count KMER_COUNT 

--start_gene_id START_GENE_ID
--start_gene_cov START_GENE_COV

--min_component_size MIN_COMPONENT_SIZE
--min_dead_end_size MIN_DEAD_END_SIZE

--scores SCORES
--low_score LOW_SCORE
--min_len MIN_LEN
--allowed_overlap ALLOWED_OVERLAP 
--kmer KMER

Error at minimap step Illegal instruction (core dumped)

Hello,

When I perform a hybrid assembly, and also a long-read-only assembly, an error occurs at the minimap and miniasm step.

The error is: Illegal instruction (core dumped)

Compilation is ok and all required tools are loaded.

Data used are the Shigella reads available here : https://github.com/rrwick/Unicycler/tree/master/sample_data

Do you have any idea what is causing this issue?

Thanks for your help.

bowtie2 version parsing is incorrect.

Hi Ryan.

When using version 0.3.1, I noticed that bowtie2 was reported as having version 4.9.3. Seemed odd, so I checked.

This is what I get when running bowtie2 --version on barcoo:

/usr/local/bowtie2/2.2.9-gcc/bin/bowtie2-align-s version 2.2.9
64-bit
Built on barcoo-m.vlsci.unimelb.edu.au
Mon Jun 27 10:33:54 AEST 2016
Compiler: gcc version 4.9.3 (GCC)
Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

It seems unicycler is picking up the gcc version?

This is what is in the unicycler log:

Dependencies:
  Program         Version    Status
  spades.py       3.8.0      good
  makeblastdb     2.2.30+    good
  tblastn         2.2.30+    good
  bowtie2-build   4.9.3      good
  bowtie2         4.9.3      good
  samtools        1.3.1      good
  java            1.8.0_25   good
  pilon           1.21       good

You can replicate by running it on barcoo.
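
To illustrate the suspected cause: the `--version` output contains the word "version" twice, once for bowtie2 and once for the compiler. The sketch below is an assumption about a possible fix and does not show Unicycler's actual parsing code. It reads the version from the first line only. The function name `parse_bowtie2_version` is hypothetical.

```python
import re

SAMPLE_OUTPUT = """/usr/local/bowtie2/2.2.9-gcc/bin/bowtie2-align-s version 2.2.9
64-bit
Built on barcoo-m.vlsci.unimelb.edu.au
Mon Jun 27 10:33:54 AEST 2016
Compiler: gcc version 4.9.3 (GCC)
"""

VERSION_PATTERN = re.compile(r"version\s+(\d+(?:\.\d+)*)")


def parse_bowtie2_version(version_output):
    """Return the version number found on the first line of the output,
    or None if the first line has no 'version X.Y.Z' token.

    Hypothetical helper for illustration.
    """
    lines = version_output.splitlines()
    if not lines:
        return None
    match = VERSION_PATTERN.search(lines[0])
    return match.group(1) if match else None


if __name__ == "__main__":
    # Taking the last 'version' match in the whole text picks up the gcc line.
    print(VERSION_PATTERN.findall(SAMPLE_OUTPUT)[-1])
    print(parse_bowtie2_version(SAMPLE_OUTPUT))
```

Restricting the search to the first line keeps the compiler version out of the result.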
