galaxyproject / tools-devteam

Contains a set of Galaxy Tools mostly written by the Galaxy Team.

Python 59.25% R 7.24% Shell 0.27% HTML 0.69% Perl 32.55%

usegalaxy tools toolshed

tools-devteam's Introduction

Galaxy Tools maintained by the Galaxy Team

This repo contains a subset of Galaxy tools used in the Tool Shed (https://toolshed.g2.bx.psu.edu/) mostly created by the Galaxy Team (devteam).

The repo's main goals are to:

simplify the maintenance of these tools
provide a learning environment for Galaxy Tool developers

If you want to contribute to this repository, please see the CONTRIBUTING.md file.

Wrapping tools for use in Galaxy is easy! If you want to start, please see our wiki.

Other repositories with high quality tools:

tools-devteam's People

jmchilton bgruening nsoranzo aerval natefoo davebx nekrut martenson jj-umn dannon jsh58 scholtalbers blankenberg peterjc erxleben jgoecks lobour matthewralston afgane kaktus42 moskalenko jankanis mvdbeek infb cschu mb12985 sanbi-sa-archive anilthanki bwlang pjbriggs a2f0 nturaga markiskander agoepfert dhammaker gregvonkuster bollig kikikaitlyn christopheh stemcellcommons zipho lparsons biodb abretaud tqh003 medda pvanheus marcoquerque shiltemann dwightkuo dd798922110 dpryan79 olgaerm bobular olgamezhnina cfsan-biostatistics bernt-matthias rabb13 aishwaryaseth ccedmendoza lecorguille tmcjp dfornika jdavcs sbenateau davidchristiany tapan82 rplanel yvanlebras phac-nml simonbray ufresearchcomputing harry-stark cat-bro joe-nano rgcca-factory chabrier almahmoud gnaisha wm75 gaybro8777 hannet91 mtekman 5l1v3r1 gallardoalba supernord engynasr minamehr

tools-devteam's Issues

Unsorted BAM input not parsed by sam_merge ("Merge BAM Files" in UI)

Submitting BWA output directly to this tool, without running SortSam first, produces an error. The tool form gives no warning that SortSam is required before use, yet it offers an option to specify that the inputs are unsorted. One of these should probably be changed to help users understand how to use the tool.

See the other sam_merge issue (#158) for the output produced by the error and the tool version.

package_bowtie_2_2_4 name is confusing

Packages presently on the ToolShed:

package_bowtie_0_12_7 (devteam)
package_bowtie_1_0_0 (iuc)
package_bowtie2_2_1_0 (devteam)
package_bowtie_2_2_4 (devteam)

I think the last one should be renamed to package_bowtie2_2_2_4, both for consistency and because the package name specified inside its tool_dependencies.xml is bowtie2:

<package name="bowtie2" version="2.2.4">
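A quick consistency check could derive the expected repository name from each `<package>` element in tool_dependencies.xml and compare it with the repo name. The sketch below assumes the `package_<name>_<version with dots as underscores>` convention implied by the list above; `expected_repo_names` is a hypothetical helper, and it does not distinguish packages defined in the file from ones merely referenced from other repositories.

```python
import xml.etree.ElementTree as ET


def expected_repo_names(tool_dependencies_xml: str) -> list:
    """Derive repo names using the assumed package_<name>_<version> convention."""
    root = ET.fromstring(tool_dependencies_xml)
    names = []
    for pkg in root.iter("package"):
        # Dots in the version become underscores: 2.2.4 -> 2_2_4.
        version = pkg.get("version", "").replace(".", "_")
        names.append(f"package_{pkg.get('name')}_{version}")
    return names


if __name__ == "__main__":
    xml = '<tool_dependency><package name="bowtie2" version="2.2.4" /></tool_dependency>'
    print(expected_repo_names(xml))  # ['package_bowtie2_2_2_4']
```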

A couple of missing repos

Repo does not exist ../tools-devteam/tools/bwa_0_7_10
Repo does not exist ../tools-devteam/packages/package_bwa_0_7_10_039ea20639

I wonder whether we are missing other repo updates from the same window. Further research is needed, but let's be careful when uploading to the Tool Shed.

Creating a custom build using a .len file triggers an error

The problem is described in this Biostars post and was replicated on Main by Jen. To reproduce it, just use the example data. It fails both for a browsed/loaded .len file and for one that is pasted in.

https://biostar.usegalaxy.org/p/12710/#12740

Cuffcompare Test

 # Cuffcompare v2.2.1 | Command line was:
-#cuffcompare -o cc_output -r /home/gvandeweyer/galaxy-dist/galaxy-dist/database/files/000/dataset_193.dat -e 100 -d 100 ./input1 ./input2
+#cuffcompare -o cc_output -r /tmp/tmp1aKqT7files/000/dataset_3.dat -R -e 100 -d 100 ./input1 ./input2
 #

 #= Summary for dataset: ./input1 :
 #     Query mRNAs :      50 in      50 loci  (0 multi-exon transcripts)
 #            (0 multi-transcript loci, ~1.0 transcripts per locus)
-# Reference mRNAs :       9 in       6 loci  (9 multi-exon)
+# Reference mRNAs :       1 in       1 loci  (1 multi-exon)
 # Super-loci w/ reference transcripts:        1
 #--------------------|   Sn   |  Sp   |  fSn |  fSp  
-        Base level:      0.3     2.3     -       - 
+        Base level:      2.2     2.3     -       - 
         Exon level:      0.0     0.0     0.0     0.0
       Intron level:      0.0    -nan     0.0    -nan
 Intron chain level:      0.0    -nan     0.0    -nan
@@ -18,33 +18,16 @@
      Matching intron chains:       0
               Matching loci:       0

-          Missed exons:      36/37     ( 97.3%)
+          Missed exons:       2/3      ( 66.7%)
            Novel exons:      49/50     ( 98.0%)
-        Missed introns:      32/32     (100.0%)
-           Missed loci:       5/6      ( 83.3%)
+        Missed introns:       2/2      (100.0%)
+           Missed loci:       0/1      (  0.0%)
             Novel loci:      49/50     ( 98.0%)

 #= Summary for dataset: ./input2 :
 #     Query mRNAs :      50 in      50 loci  (0 multi-exon transcripts)
 #            (0 multi-transcript loci, ~1.0 transcripts per locus)
-# Reference mRNAs :       8 in       5 loci  (8 multi-exon)
-# Super-loci w/ reference transcripts:        0
-#--------------------|   Sn   |  Sp   |  fSn |  fSp  
-        Base level:      0.0     0.0     -

cuffmerge/cufflinks wrapper state handling

If the cuffmerge or cufflinks binary is missing, these wrappers still report the dataset as being in an 'ok' state, even though the peek text makes it clear that the tool did not run.

It would probably be best to do away with the wrapper script altogether, as cuffquant does, for instance.
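If a wrapper script is kept, it should at least fail when the binary is missing or exits non-zero, instead of always finishing successfully. A minimal sketch, assuming a Python wrapper that launches the tool via subprocess and a tool XML that treats a non-zero exit code as an error (for example via `detect_errors="exit_code"`); `run_tool` is a hypothetical helper, not the existing wrapper code.

```python
import shutil
import subprocess
import sys


def run_tool(argv: list) -> int:
    """Run argv and return its exit status; return 127 if the binary cannot be found."""
    if not argv:
        sys.stderr.write("usage: wrapper BINARY [ARGS...]\n")
        return 2
    if shutil.which(argv[0]) is None:
        # Report on stderr and return non-zero so the job is not marked 'ok'.
        sys.stderr.write(f"Error: {argv[0]} was not found on PATH\n")
        return 127
    # Propagate the tool's own exit status instead of swallowing it.
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    sys.exit(run_tool(sys.argv[1:]))
```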

tool_data_table_conf definition for a number of vcflib tools

Is the .sample definition of tool_data_table_conf.xml wrong in a number of vcflib tools? For example, line 5 (at commit d9aeab1) of

tools-devteam/tool_collections/vcflib/vcfleftalign/tool-data/tool_data_table_conf.xml.sample

reads:

<columns>line_type, value, path</columns>

The same applies to the vcfgeno2haplo, vcfcheck, vcfvcfintersect, and vcfprimers tools.

The definition of that index file has 4 columns, versus the 3 declared in those .sample files (see https://github.com/galaxyproject/usegalaxy-playbook/blob/master/files/galaxy/usegalaxy.org/config/tool_data_table_conf.xml#L143).
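To find every affected .sample file, one could count the declared columns per table and compare the result against the 4 columns used in the usegalaxy.org definition. A standard-library sketch; `column_counts` is a hypothetical helper and the table name in the example is made up.

```python
import xml.etree.ElementTree as ET


def column_counts(table_conf_xml: str) -> dict:
    """Map each <table name="..."> to the number of comma-separated <columns> entries."""
    root = ET.fromstring(table_conf_xml)
    counts = {}
    for table in root.iter("table"):
        columns = table.findtext("columns", default="")
        counts[table.get("name")] = len([c for c in columns.split(",") if c.strip()])
    return counts


if __name__ == "__main__":
    # Hypothetical table name; the <columns> line is the one quoted above.
    sample = """<tables>
      <table name="example_fasta_indexes" comment_char="#">
        <columns>line_type, value, path</columns>
      </table>
    </tables>"""
    print(column_counts(sample))  # {'example_fasta_indexes': 3}
```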

Idea - consider creating a tools-tuxedo repository?

I love the migration to GitHub, I love the big repo, and in particular I love that we now have a process for the cufflinks additions and the bowtie fixes! This isn't meant to backtrack on that, but rather to move the idea one step further.

If these tools are going to see a lot of activity (which seems a reasonable assumption), moving them to their own repository would allow us to use planemo's Travis integration to pre-test pull requests and ensure things stay green before they even hit the Test Tool Shed. (Here is an example of the Travis integration for bwa-mem, with results here.) It is very much modeled after @peterjc's work on this.

There are a lot of downsides. In a way it means decreased visibility for these tools (or would it increase it?), and less prestige, at least as far as pull requests are concerned. The other downsides are everything we said about it being nice to grep through all of these tools at once, etc. Taking some of the highest-profile tools out of tools-devteam diminishes it.

Missing dependencies in package_fastx_toolkit_0_0_13

fasta_clipping_histogram.pl requires Perl (non-core) modules GD::Graph::bars and PerlIO::gzip.

fastq_quality_boxplot_graph.sh requires gnuplot. This can be satisfied by package_gnuplot_4_6 in galaxyproject_tools-iuc repository.

Both scripts are used by tools in tool_collections/fastx_toolkit/.

HISAT paired data collection error

Hi,

I get the following error when using HISAT with a paired data collection:

Traceback (most recent call last):
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 163, in prepare_job
    job_wrapper.prepare()
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 858, in prepare
    self.command_line, self.extra_filenames = tool_evaluator.build()
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/tools/evaluation.py", line 411, in build
    self.__build_command_line( )
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/tools/evaluation.py", line 427, in __build_command_line
    command_line = fill_template( command, context=param_dict )
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/util/template.py", line 9, in fill_template
    return str( Template( source=template_text, searchList=[context] ) )
  File "/usr/local/galaxy/galaxy-dist/eggs/Cheetah-2.2.2-py2.7-linux-x86_64-ucs2.egg/Cheetah/Template.py", line 1004, in __str__
    return getattr(self, mainMethName)()
  File "cheetah_DynamicallyCompiledCheetahTemplate_1434276434_44_12951.py", line 115, in respond
  File "/usr/local/galaxy/galaxy-dist/lib/galaxy/tools/wrappers.py", line 364, in __getattr__
    return self.__element_instances[ key ]
  File "/usr/local/galaxy/galaxy_env/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'forward'

weblogo3 tool missing dependency on numpy

The weblogo3 tool was failing for us, with the rgWebLogo3.py wrapper reporting that weblogo was returning exit code 1. At the command line I duplicated the job environment and dependency shell commands and was able to reproduce the error:

## rgWebLogo3.py error - executing weblogo -F jpeg -c auto -o /home/galaxy/tmp/logo.jpg -U bits -t "test" -f /home/galaxy/tmp/test.fasta -s medium returned error code 1
## This may be a data problem or a tool dependency (weblogo) installation problem
## Please ensure weblogo is correctly installed and working on the command line -see http://code.google.com/p/weblogo

I then tried running the above weblogo command directly:

Traceback (most recent call last):
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-qa/tool_deps/weblogo/3.3/devteam/package_weblogo_3_3/648e4b32f15c/weblogo", line 56, in <module>
    import weblogolib._cli
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-qa/export/tool_deps/weblogo/3.3/devteam/package_weblogo_3_3/648e4b32f15c/weblogolib/__init__.py", line 108, in <module>
    from numpy import array, asarray, float64, ones, zeros, int32,all,any, shape
ImportError: No module named numpy

As a quick fix, I added a package dependency requirement for numpy to rgWebLogo3.xml and reloaded the tool in Galaxy, after which the tool worked properly.
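A wrapper could also check up front that the Python modules weblogo imports are resolvable, which would have surfaced the missing numpy immediately instead of a bare exit code. This is a minimal sketch that only helps if the wrapper and the weblogo script run under the same Python interpreter; `missing_modules` is a hypothetical helper, and declaring the numpy requirement in rgWebLogo3.xml, as above, remains the actual fix.

```python
import importlib.util
import sys


def missing_modules(names: list) -> list:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["numpy", "weblogolib"])
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Missing Python modules: " + ", ".join(missing))
```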

picard_SortSam and sam_sort appear to be performing the same operation

Is this duplication desired? It may confuse end users, since samtools and picard are nearly always among the first tool packages installed on local instances.

Improved Read Group Handling

BWA/BWA-mem should have an option to use collection information to assign read groups.
Bowtie2 should also have an option to use collection information to assign read groups.
The Tophat wrapper should also be updated.
Picard AddOrReplaceReadGroups should have an option to use collection information to assign read groups.
There should be a tool to merge bams in such a way that collection information (sample name) is attached as read groups. (MergeSamFiles? MergeBamAlignment?)

freebayes installs dependency binary in wrong location

The tool_dependencies.xml file for freebayes installs the freebayes binary into $INSTALL_DIR; however, PATH is prepended with $INSTALL_DIR/bin, so the binary ends up outside the directory added to PATH:

<destination_directory>$INSTALL_DIR</destination_directory>
...
<environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
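This class of packaging bug can be caught by checking that the executable is actually found in the directory the recipe prepends to PATH. A sketch using `shutil.which` with an explicit search path; the layout mirrors the snippet above, and `found_on_prepended_path` is a hypothetical helper.

```python
import os
import shutil


def found_on_prepended_path(binary: str, install_dir: str) -> bool:
    """True if `binary` is an executable file inside install_dir/bin, the directory added to PATH."""
    return shutil.which(binary, path=os.path.join(install_dir, "bin")) is not None
```

With the recipe as written, `found_on_prepended_path("freebayes", install_dir)` would be False, because the binary sits directly in `$INSTALL_DIR`.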

weblogo3 tool fails when "Text" output format is selected

The weblogo3 tool is failing with the following generic error on Galaxy Main (as well as our local instance) when the "Text" output format is selected:

## rgWebLogo3.py error - executing weblogo -F txt -c auto -o /galaxy-repl/main/files/011/411/dataset_11411895.dat -U bits -t "Galaxy-Rgenetics Sequence Logo" -f /galaxy-repl/main/files/011/411/dataset_11411891.dat -s large returned error code 2
## This may be a data problem or a tool dependency (weblogo) installation problem
## Please ensure weblogo is correctly installed and working on the command line -see http://code.google.com/p/weblogo

On our instance I replicated the job environment and ran weblogo by hand:

$ weblogo -F txt -c auto -o output.dat -U bits -t "Galaxy-Rgenetics Sequence Logo" -f dataset_8485.dat -s large
Usage: weblogo [options]  < sequence_data.fa > sequence_logo.eps

weblogo: error: option -F: invalid choice: 'txt' (choose from 'svg', 'eps', 'jpeg', 'pdf', 'png_print', 'logodata', 'png')

It looks like the option tag for text output in rgWebLogo3.xml should have the value attribute logodata rather than txt.
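Beyond changing the option value, a wrapper could validate `-F` against the choices weblogo itself lists in the error above before running it, so a mismatch fails with a clear message. A sketch; the constant and function names are hypothetical, the format set is copied from the error output of the weblogo version installed there, and only the txt to logodata mapping comes from this issue.

```python
# Formats accepted by `weblogo -F`, copied from the error message above.
WEBLOGO_FORMATS = {"svg", "eps", "jpeg", "pdf", "png_print", "logodata", "png"}

# Legacy option values to translate; "txt" -> "logodata" is the fix proposed here.
LEGACY_VALUES = {"txt": "logodata"}


def weblogo_format(value: str) -> str:
    """Translate a legacy -F value and reject anything weblogo would refuse."""
    fmt = LEGACY_VALUES.get(value, value)
    if fmt not in WEBLOGO_FORMATS:
        raise ValueError(f"unsupported weblogo output format: {value!r}")
    return fmt
```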

As an aside, debugging issues with this tool (see also #65) is made more difficult by the omission of weblogo's stdout/stderr in the error message. That output is captured and stored, but is not included in the error message before the early call to sys.exit in the rgWebLogo3.runCL function.
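A sketch of how a runCL-style function could put weblogo's captured output into the error it reports. This is not the existing rgWebLogo3.runCL; the name `run_cl`, its list-of-arguments input and its return value are assumptions.

```python
import subprocess


def run_cl(cl: list) -> str:
    """Run a command; on failure, raise with the exit code, stdout and stderr included."""
    proc = subprocess.run(cl, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"executing {' '.join(cl)} returned error code {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\nstderr:\n{proc.stderr}"
        )
    return proc.stdout
```

In the wrapper's main block, catching the `RuntimeError` and passing its message to `sys.exit()` would print the full message, including weblogo's own complaint, to stderr and exit with status 1.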

picard_BamIndexStats.xml and rgPicardHsMetrics.xml missing

During the complete rewrite of Picard wrappers in pull request #34, picard_BamIndexStats.xml and rgPicardHsMetrics.xml tools were removed. Since both are still useful, they should be re-added.

Error parsing DT tag sam_merge ("Merge Bam Files" in UI)

When using the date tag (DT) in the format specified on the BWA tool form for read group info, this tool produces an error dataset. Clicking on the bug icon shows the error below. It is similar to the error for MergeSAMFiles (issue #157), and related to another issue with this same tool (#159: unsorted input is not recognized even though the checkbox specifying unsorted input is used).

Fatal error: Matched on error:
[Mon Jun 15 23:55:53 UTC 2015] net.sf.picard.sam.MergeSamFiles INPUT=[/mnt/galaxy/files/000/dataset_216.dat, /mnt/galaxy/files/000/dataset_217.dat] OUTPUT=/mnt/galaxy/files/000/dataset_218.dat MERGE_SEQUENCE_DICTIONARIES=true TMP_DIR=[/mnt/galaxy/tmp] VALIDATION_STRINGENCY=LENIENT SORT_ORDER=coordinate ASSUME_SORTED=false USE_THREADING=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Mon Jun 15 23:55:53 UTC 2015] Executing as galaxy@w3 on Linux 3.13.0-44-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_80-b15
Ignoring SAM validation error: WARNING: File /mnt/galaxy/files/000/dataset_216.dat, Error parsing SAM header. DT tag value '$rg.DT' is not parseable as a date. Line: @RG ID:one SM:one PL:ILLUMINA LB:$rg.LB CN:$rg.CN DS:$rg.DS DT:$rg.DT PU:$rg.PU
Ignoring SAM validation error: ERROR: File /mnt/galaxy/files/000/dataset_216.dat, Error parsing SAM header. Problem parsing @PG key:value pair ID:one clashes with ID:bwa. Line: @PG ID:bwa PN:bwa VN:0.7.10-r876-dirty CL:bwa sampe -r @RG ID:one SM:one PL:ILLUMINA LB:$rg.LB CN:$rg.CN DS:$rg.DS DT:$rg.DT PU:$rg.PU /mnt/galaxyIndices/hg19/bwa_mem_index/hg19/hg19.fa first.sai second.sai /mnt/galaxy/files/000/dataset_113.dat /mnt/galaxy/files/000/dataset_114.dat
Ignoring SAM validation error: WARNING: File /mnt/galaxy/files/000/dataset_217.dat, Error parsing SAM header. DT tag value '$rg.DT' is not parseable as a date. Line: @RG ID:two SM:two PL:ILLUMINA LB:$rg.LB CN:$rg.CN DS:$rg.DS DT:$rg.DT PU:$rg.PU
Ignoring SAM validation error: ERROR: File /mnt/galaxy/files/000/dataset_217.dat, Error parsing SAM header. Problem parsing @PG key:value pair ID:two clashes with ID:bwa. Line: @PG ID:bwa PN:bwa VN:0.7.10-r876-dirty CL:bwa sampe -r @RG ID:two SM:two PL:ILLUMINA LB:$rg.LB CN:$rg.CN DS:$rg.DS DT:$rg.DT PU:$rg.PU /mnt/galaxyIndices/hg19/bwa_mem_index/hg19/hg19.fa first.sai second.sai /mnt/galaxy/files/000/dataset_115.dat /mnt/galaxy/files/000/dataset_116.dat
INFO 2015-06-15 23:55:53 MergeSamFiles Input files are in same order as output so sorting to temp directory is not needed.
INFO 2015-06-15 23:55:57 MergeSamFiles Finished reading inputs.
[Mon Jun 15 23:55:57 UTC 2015] net.sf.picard.sam.MergeSamFiles done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=236453888

Galaxy Tool Version 1.2.0
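The headers above contain literal `$rg.DT`, `$rg.LB` and similar strings, which suggests the optional read group fields were written out as unrendered template text instead of being omitted. A sketch of building an `@RG` line that keeps only fields with values and rejects anything that still looks like a template variable. The SAM specification expects DT as an ISO 8601 date or date/time; `datetime.fromisoformat` is used here only as a rough check, and `rg_line` is hypothetical, not the BWA wrapper's code.

```python
from datetime import datetime
from typing import Optional


def rg_line(rg_id: str, sample: str, **optional: Optional[str]) -> str:
    """Build a tab-separated @RG header line, leaving out optional tags that are unset."""
    fields = [f"ID:{rg_id}", f"SM:{sample}"]
    for tag, value in optional.items():
        if not value:
            continue  # omit the tag rather than emitting a placeholder
        if value.startswith("$"):
            raise ValueError(f"{tag} looks like an unrendered template variable: {value}")
        if tag == "DT":
            datetime.fromisoformat(value)  # raises ValueError for non-ISO dates
        fields.append(f"{tag}:{value}")
    return "@RG\t" + "\t".join(fields)


if __name__ == "__main__":
    print(rg_line("one", "one", PL="ILLUMINA", LB=None, DT="2015-06-15"))
```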

cuffmerge wrapper multi-select doesn't work with built-in genomes

We run our own custom Galaxy instance in the Amazon Cloud, and I am updating it with the latest Tool Shed files. For cuffmerge, past revisions simply had a button to add another GTF input. However, in the latest revision (10:b6e3849293b1) that has been replaced with a multi-select. We also have a few built-in genomes that we use with cuffmerge. When I choose the first GTF file (using locally cached sequence data), the reference genome is populated correctly with our built-in genomes. However, once I choose more than one GTF file, I get one of two errors:

No reference genome is available for the build associated with the selected input dataset
OR
An invalid option was selected, please verify

The built-in genomes are definitely there and available for cuffmerge, and each of the GTF files has the correct database build associated with it, but for some reason choosing more than one GTF file in the multi-select causes an error. Is this a bug, or am I doing something wrong?

Nik.

Move samtool_filter2

Should we move samtool_filter2 next to the other samtools wrappers? It would be worth updating it and putting it into a samtools suite. Comments? @nekrut ?

Make Tools Collection-Aware

Most tools do not require modifications, but tools that consume pairs of datasets should probably be updated to allow paired inputs via a <data_collection collection_type="paired"... parameter, and tools that do reductions over datasets should be reworked to use multiple="true" data inputs or augmented to allow data_collection parameters.

Tools Consuming Pairs

tophat2
bowtie2
bwa (maybe...)
fastq_paired_end_deinterlacer, fastq_paired_end_interlacer, fastq_paired_end_joiner, fastq_paired_end_splitter.

Dataset Reduction Tools

cuff*

Dependency package needed for TwoBit Data Manager

It is currently listed as requiring 'ucsc_tools'; it needs a Tool Shed installable package that provides the 'faToTwoBit' executable.

Error parsing DT tag with picard_sortsam

Related to #157 and #158. Most likely exists in other tools from this revision/package.

Green "success" dataset with this in the info field:

Picked up _JAVA_OPTIONS: -Xmx2048m -Xms256m Ignoring SAM validation error: WARNING: Error parsing SAM header. DT tag value '$rg.DT' is not parseable as a date. Line: @RG ID:one PL:ILLUMINA PU:$rg.PU LB:$rg.LB DS:$rg.DS DT:$rg.DT SM:one CN:$rg.CN Ignoring

${static_path} should not be used?

For example, bamtools filter uses images in its documentation, but the URL is not properly constructed:

  <img alt="/repository/static/images/59bc5dbe6dfc6d4c//static%2Fimages%2Fsingle-filter.png" src="/repository/static/images/59bc5dbe6dfc6d4c//static%2Fimages%2Fsingle-filter.png">

Removing static%2Fimages%2F from the URL makes the image display properly. The same applies once the tool is installed: static%2Fimages%2F has to be removed for the image to show.

Removing ${static_path} from the .xml solves the issue. Moreover, the documentation does not mention ${static_path} at all, and the approach it describes just works (https://wiki.galaxyproject.org/DefiningImagesInToolConfigs):
.. image:: single-filter.png

Note: static_enabled = False/True does not seem to have any effect on this

Sync COMMENT tags between picard tools

Sync the COMMENT handling behaviour of picard_FastqToSam.xml, picard_MarkDuplicatesWithMateCigar.xml, picard_AddCommentsToBam.xml, picard_MarkDuplicates.xml and picard_MergeSamFiles.xml.

See: #149

fastq groomer does not parse sequence ids that contain spaces properly

The issue is that the FastqReader class parses the entire "@" line as the sequence ID. The specification actually allows an optional description after a space, similar to FASTA. Some files I have seen (such as those from the iMicrobe Project) contain records like:

@<READID> <DESCRIPTION>
ATCGGTTTCGTTGTGTTATTCGCGGCCAAGGGTTTTGTCGTCGTTATATT
+<READID>
^__ec\cccgce]`J[R`[[`ee][ccecag__aafU\baT_WLY^\Xac

These result in the following error:

Exception: Invalid FASTQ file: could not find quality score of sequence identifier @<READID> <DESCRIPTION>.

I believe this would be a simple fix; however, I'm not sure where the code for those utility classes is hosted.
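A minimal sketch of the fix described above: treat only the text before the first whitespace as the read ID, and accept a '+' line that is either bare or repeats that ID (with or without the description). This is a standalone illustration with hypothetical function names, not the actual FastqReader code.

```python
def read_id(title_line: str) -> str:
    """Return the identifier from an '@' or '+' line: the text up to the first whitespace."""
    rest = title_line[1:].strip()
    return rest.split(None, 1)[0] if rest else ""


def plus_line_matches(header: str, plus: str) -> bool:
    """A '+' line may be bare, or repeat the read ID with or without its description."""
    return plus[1:].strip() == "" or read_id(plus) == read_id(header)
```

For the record above, `read_id("@<READID> <DESCRIPTION>")` is `"<READID>"`, so the `+<READID>` line would be accepted.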

Add citations to all the tools.

Issues encountered with the latest updates on test.galaxyproject.org.

Text cannot be copied from the updated tool forms. Aysam is aware and will fix it.
Updating metadata with "auto-detect" for previously failed settings now functions.
'Collect Summary Metrics' is functioning on Test now.
Picard's Sort SAM returns a green success dataset, but seems to interpret 'BWA for Illumina' output incorrectly, and the output is rejected by Picard as improperly formatted input.
The persistent "metadata orange alert" in the UI is cleared up.
On Test, if a dataset is an input to a "grey - waiting to run" job, you cannot modify its metadata. Since the job hasn't started yet, and will not until the metadata issue is cleared, I expected a "blue - paused" job instead, and to be able to fix the problem through auto-detect metadata correction.
Workflow import problems: these are expected, as some tools changed their IDs. :(
'BWA for Illumina' fails if the read group settings do not use a fully capitalized "ILLUMINA". This seems tedious. Can we make this setting case-insensitive?

samtools_phase tests fail

When I run planemo test (using samtools 1.2) on this tool: https://github.com/galaxyproject/tools-devteam/tree/master/tool_collections/samtools/samtools_phase

I get this test error report: www.googledrive.com/host/0BycJJtbagAMTd2V2UzVURDRRQUE

ping @nekrut

samtools_stats tests fail

planemo run test report:
www.googledrive.com/host/0BycJJtbagAMTQThZdlI2VXAySm8

ping @nekrut

package_r_2_15_0: libreadline.so.6 error

When tools (e.g. MACS14) attempt to use R from this package, I get an error regarding libreadline.so.6. This is on a CentOS5 system. I can reproduce the error attempting to run R from the command line. Perhaps the LDFLAGS weren't set correctly when the binaries were built?

> source ./tool-dependencies/R/2.15.0/devteam/package_r_2_15_0/6c34eaa82fed/env.sh
> R
> R: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

vcftools dependencies: tabix and bgzip

It seems like the tools in the vcftools collection require both bgzip and tabix (see e.g. https://github.com/galaxyproject/tools-devteam/blob/master/tool_collections/vcftools/vcftools_subset/vcftools_subset.xml). However, they are not installed as a dependency from the ToolShed. Are these tools that should be installed on the galaxy user's path, or is this simply a dependency oversight?

clustalw tool outputs in 2 unknown data formats

In https://github.com/galaxyproject/tools-devteam/blob/master/tools/clustalw/rgClustalw.xml#L68, clustal and phylip are used as possible output formats, but they are defined neither in Galaxy nor in this tool repository.

bowtie2 index data manager fails if index already exists

I believe the issue occurs when the genome index has already been built, or when multiple FASTA files are handled during index building.

When building an index from multiple FASTA files, it fails if the index files share the same basename (I'm guessing this should fail). But if the index already exists on the Galaxy instance, the tool should just report that it exists and not build the index again.

The error below is caused by the same index being built again, or simultaneously:

Error Message:

Fatal error: Exit code 1 () Error: could not open /mnt/transient_nfs/tmp/job_working_directory/000/58/dataset_58_files/ce10.fa Error: Encountered internal Bowtie 2 exception (#1) Command: bowtie2-build --wrapper basic-0 /mnt/transient_nfs/tmp/job_working_directory/000/58/dataset_58_files/ce10.fa ce10,ce10,ce10,ce10 Error building index.

Job Command line:

python /mnt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bowtie2_index_builder/9458af1ab90b/data_manager_bowtie2_index_builder/data_manager/bowtie2_index_builder.py "/mnt/galaxy/files/000/dataset_58.dat" --fasta_filename "/mnt/galaxyIndices/ce10/seq/ce10.fa,/mnt/galaxyIndices/ce10/seq/ce10.fa,/mnt/galaxyIndices/ce10/seq/ce10.fa,/mnt/galaxyIndices/ce10/seq/ce10.fa" --fasta_dbkey "ce10,ce10,ce10,ce10" --fasta_description "C. elegans Oct. 2010 (WS220/ce10) (ce10),C. elegans Oct. 2010 (WS220/ce10) (ce10),C. elegans Oct. 2010 (WS220/ce10) (ce10),C. elegans Oct. 2010 (WS220/ce10) (ce10)" --data_table_name "bowtie2_indexes"
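The command line above passes the same ce10 FASTA, dbkey and description four times. A sketch of two guards a data manager could apply: collapse exact duplicate entries, and skip building when an index with the target prefix already exists. Both helpers are hypothetical; the duplicate check assumes no individual value contains a comma, and the existence check assumes bowtie2's usual `<prefix>.1.bt2` file (`.1.bt2l` for large indexes).

```python
import os


def unique_fasta_entries(filenames: str, dbkeys: str, descriptions: str) -> list:
    """Zip the comma-separated argument lists and drop exact duplicates, preserving order."""
    entries = zip(filenames.split(","), dbkeys.split(","), descriptions.split(","))
    return list(dict.fromkeys(entries))


def index_exists(index_prefix: str) -> bool:
    """Treat the index as built if bowtie2's first forward index file is present."""
    return any(os.path.exists(index_prefix + ext) for ext in (".1.bt2", ".1.bt2l"))
```

Applied to the arguments above, `unique_fasta_entries` would return a single `("/mnt/galaxyIndices/ce10/seq/ce10.fa", "ce10", ...)` entry.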

Bowtie2: Mapping paired-dataset collections won't work...

... if also collecting unaligned reads.

bgruening/galaxytools@982412e

Bowtie2 precompiled dependency not executable after toolshed install

After installing the latest bowtie2 wrapper from the Tool Shed, I get a "permission denied" error when trying to run the tool. I believe this is because the bowtie2 executable does not have the executable bit set:

-rw-r--r-- 1 galaxy galaxy 17969 Dec 16 16:30 tool-dependencies/bowtie2/2.2.4/devteam/package_bowtie_2_2_4/2b25b6e8d108/bowtie2
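As a local workaround until the package is fixed, the missing execute bit can be restored on the installed binary, the equivalent of `chmod +x` limited to classes that can already read the file. A small sketch; `make_executable` is a hypothetical helper.

```python
import os
import stat


def make_executable(path: str) -> None:
    """Grant execute permission to every class that already has read permission (0o644 -> 0o755)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Shift each read bit (0o444) two places right onto the matching execute bit (0o111).
    os.chmod(path, mode | ((mode & 0o444) >> 2))


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        make_executable(p)
```

For the `-rw-r--r--` file listed above, this would yield `-rwxr-xr-x`.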

Error parsing DT tag picard_MergeSamFiles.xml

When using the date tag (DT) in the format specified on the BWA tool form for read group info, this tool produces a successful dataset, but the following appears in the comments. This is similar to the error for Merge BAM Files.

Picked up _JAVA_OPTIONS: -Xmx2048m -Xms256m Ignoring SAM validation error: WARNING: File /mnt/galaxy/files/000/dataset_216.dat, Error parsing SAM header. DT tag value '$rg.DT' is not parseable as a date. Line: @RG ID:one SM:one PL:ILLUMINA LB:$rg.LB CN:

Galaxy Tool Version 1.126.0

Linting Issues

Repository names must contain only lower-case letters, numbers and underscore (tools/cummeRbund)
All kraken repos use a readme.md, which is not supported.
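The first rule is easy to check locally before uploading. A sketch with a regular expression that encodes only the rule as quoted in the lint message; the real linter may apply further constraints, and `valid_repo_name` is hypothetical.

```python
import re

# Lint rule as quoted above: lower-case letters, numbers and underscore only.
REPO_NAME = re.compile(r"[a-z0-9_]+")


def valid_repo_name(name: str) -> bool:
    """True if the whole name consists of allowed characters."""
    return REPO_NAME.fullmatch(name) is not None
```

`valid_repo_name("cummeRbund")` is False because of the upper-case R.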

Tools owned by devteam are missing from the Test Tool Shed

cufflinks gffread tool

Minnesota researchers want a Galaxy tool for the cufflinks gffread command. The tool is currently available from: https://testtoolshed.g2.bx.psu.edu/view/jjohnson/gffread

Would there be any interest in adding this in the devteam all_cufflinks_tool_suite?
Or the IUC tools?

BWA-MEM is not assigning read groups

reported by @jennaj

BWA harmonization

Make greater use of macros to further harmonize BWA and BWA-mem wrappers - including read group handling (see comments on #52).

Bowtie2 precompiled dependency download 403 error

403 error when attempting to download: https://depot.galaxyproject.org/package/linux/x86_64/bowtie2/bowtie2-2.2.4-linux-x86_64.zip

Fill out README.

FASTQC rewrite

FastQC needs a small rewrite to get rid of the Python file, which is not needed, IMHO.

BWA: stat: invalid option -- 's'

With bwa and bwa-mem, the error shown at the end of this ticket appears in stderr, although the process seems to finish correctly.

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa/0.1
Revision: c71dd035971e

Job Command-Line:

  ln -s "/g/galaxy/galaxy-dev_data/files/009/dataset_9059.dat" "localref.fa" && ( size=`stat -c %s "localref.fa" 2>/dev/null`; if [ $? -eq 0 ]; then if [ $size -lt 2000000000 ]; then bwa index -a is "localref.fa"; else bwa index -a bwtsw "localref.fa"; fi; fi; eval $(stat -s "localref.fa"); if [ $? -eq 0 ]; then if [ $st_size -lt 2000000000 ]; then bwa index -a is "localref.fa"; echo "Generating BWA index with is algorithm"; else bwa index -a bwtsw "localref.fa"; echo "Generating BWA index with bwtsw algorithm"; fi; fi; ) && bwa aln -t "${GALAXY_SLOTS:-1}" -b -0 "localref.fa" "/g/galaxy/galaxy-dev_data/files/008/dataset_8792.dat" > first.sai && bwa samse "localref.fa" first.sai "/g/galaxy/galaxy-dev_data/files/008/dataset_8792.dat" | samtools view -Sb - > temporary_bam_file.bam && samtools sort -f temporary_bam_file.bam /g/galaxy/galaxy-dev_data/files/009/dataset_9061.dat

Input Parameter: Value
Load reference genome from: history
Use the following dataset as the reference sequence: 1: UCSC Main on Mouse: refGene (chr12:1-121257530)
Select input type: single_bam
Select BAM dataset: 2: Downsample_KO
Set advanced single end options?: do_not_set
Set readgroups information?: do_not_set
Select analysis mode: illumina

Error:

[bwa_index] Pack FASTA... 0.58 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 23.55 seconds elapse.
[bwa_index] Update BWT... 0.40 sec
[bwa_index] Pack forward-only FASTA... 0.34 sec
[bwa_index] Construct SA from BWT and Occ... 11.21 sec
[main] Version: 0.7.10-r876-dirty
[main] CMD: bwa index -a is localref.fa
[main] Real time: 38.455 sec; CPU: 36.091 sec
stat: invalid option -- 's'
Try `stat --help' for more information.
/g/galaxy/galaxy-dev_data/job_working_directory/005/5943/galaxy_5943.sh: line 35: [: -lt: unary operator expected
[bwa_index] Pack FASTA... 0.54 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=127596972, availableWord=20977816
[BWTIncConstructFromPacked] 10 iterations done. 34603372 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 63925708 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 89983420 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 113139548 characters processed.
[bwa_index] 56.18 seconds elapse.
[bwa_index] Update BWT... 0.38 sec
[bwa_index] Pack forward-only FASTA... 0.34 sec
[bwa_index] Construct SA from BWT and Occ... 11.34 sec
[main] Version: 0.7.10-r876-dirty
[main] CMD: bwa index -a bwtsw localref.fa
[main] Real time: 70.975 sec; CPU: 68.787 sec
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_011899'
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_001100116'
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_028527'
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_001177573'
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_001166062'
[W::sam_hdr_parse] duplicated sequence 'mm9_refGene_NM_178657'
[main] Version: 0.7.10-r876-dirty
[main] CMD: bwa mem -t 1 -v 1 localref.fa /g/galaxy/galaxy-dev_data/files/009/dataset_9065.dat
[main] Real time: 47.632 sec; CPU: 39.081 sec
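The command line runs two size checks back to back: GNU `stat -c %s`, which works on Linux and triggers `bwa index -a is`, and BSD-style `stat -s`, which fails on Linux. `$st_size` is then empty, the `[` test errors out with "unary operator expected", and the else branch builds the index a second time with `bwtsw`, exactly as the log shows. Keeping a single portable size check in the shell command would avoid the double build. The same decision can be sketched in Python with a portable size lookup; the 2000000000-byte threshold is the one from the command above, and both function names are hypothetical.

```python
import os
import subprocess

# Threshold used by the wrapper's shell snippet above (bytes).
IS_ALGORITHM_MAX_BYTES = 2000000000


def bwa_index_algorithm(fasta_path: str, max_is_bytes: int = IS_ALGORITHM_MAX_BYTES) -> str:
    """Pick 'is' for small references and 'bwtsw' otherwise; getsize works on Linux and macOS."""
    return "is" if os.path.getsize(fasta_path) < max_is_bytes else "bwtsw"


def build_index(fasta_path: str) -> None:
    """Run `bwa index` exactly once with the chosen algorithm (requires bwa on PATH)."""
    algorithm = bwa_index_algorithm(fasta_path)
    subprocess.run(["bwa", "index", "-a", algorithm, fasta_path], check=True)
```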

MACS2 Download.

The download URL in the MACS2 wrapper seems invalid. Maybe this one?

https://github.com/modENCODE-DCC/Galaxy/raw/master/modENCODE_DCC_tools/macs2/2.0.10.2.tar

Never mind, it's not ours. Sorry.

Cufflinks 2.1.1 download location has moved

Unable to install Cufflinks 2.1.1 from package_cufflinks_2_1_1.

Error downloading from URL
http://cufflinks.cbcb.umd.edu/downloads/cufflinks-2.1.1.tar.gz:
HTTP Error 404: Not Found

Looks like the new URL should be http://cole-trapnell-lab.github.io/cufflinks/assets/downloads/cufflinks-2.1.1.tar.gz

cummeRbund is missing .shed.yml file

There is no .shed.yml file in tools/cummeRbund/. I would add one, but I'm not sure if we want to (and can) reuse jjohnson's Tool Shed repository.

No change when using option "unsorted" as sort order with picard_sortsam

The purpose of the "unsorted" option is unclear. From what I can tell, the dataset is unchanged. Could we clarify the purpose of this option on the tool form? Is it a needed option that is not functional yet?

Also, samtools_sort performs the same operations (without the "unsorted" option) and succeeds with both the coordinate and read name options. Users will probably install both picard and samtools as complete packages. Do we need both of these tools, or should only one be retained to avoid user confusion?

Galaxy Tool Version 1.126.0

picard_MergeSAMFiles and sam_merge perform the same function

Duplicated tools. Do we need both?

See related duplicated tool issue and comments: #162