Galaxy Tool Metadata Extractor

What is the tool doing?

This tool automatically collects a table of all available Galaxy tools including their metadata. The created table can be filtered to only show the tools relevant for a specific community. Learn how to add your community.

The tool performs the following steps:

  • Parse the tool GitHub repositories listed by Planemo monitor
  • In each repository, check the .shed.yaml file and filter by category, such as metagenomics
  • Extract metadata from the .shed.yaml
  • Extract the requirements from the macros or XML files to get the version supported in Galaxy
  • Compare that version against the version available on conda
  • Extract bio.tools information if available in the macros or XML files
  • Check availability on the 3 main Galaxy instances (usegalaxy.eu, usegalaxy.org, usegalaxy.org.au)
  • Get usage statistics from usegalaxy.eu
  • Create an interactive table for all tools: All tools
  • Create an interactive table for each registered community, e.g. microGalaxy
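
As an illustration of the requirement-extraction step, here is a minimal sketch. It assumes the usual Galaxy tool XML layout, where each dependency is a `<requirement type="package" version="...">` element. The function name `extract_requirements` and the sample XML are hypothetical and are not taken from this repository's code. Macro tokens (such as a version defined in a separate macros file) are not resolved here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified Galaxy tool wrapper used only for illustration.
SAMPLE_TOOL_XML = """
<tool id="example_tool" name="Example" version="1.0.0">
    <requirements>
        <requirement type="package" version="2.5.1">bowtie2</requirement>
        <requirement type="package" version="1.17">samtools</requirement>
    </requirements>
</tool>
"""


def extract_requirements(xml_text):
    """Return (package name, version) pairs found in a tool or macros XML string."""
    root = ET.fromstring(xml_text)
    requirements = []
    # iter() walks the whole tree, so the same code works for tool and macros files.
    for req in root.iter("requirement"):
        if req.get("type") != "package":
            continue
        name = (req.text or "").strip()
        requirements.append((name, req.get("version")))
    return requirements


print(extract_requirements(SAMPLE_TOOL_XML))
```

The extracted package name and version could then be compared with what is published on conda, which is the next step in the list.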

Usage

Prepare environment

  • Install virtualenv (if not already there)

    $ python3 -m pip install --user virtualenv
    
  • Create virtual environment

    $ python3 -m venv env
    
  • Activate virtual environment

    $ source env/bin/activate
    
  • Install requirements

    $ python3 -m pip install -r requirements.txt
    

Extract all tools

  1. Get an API key (personal token) for GitHub

  2. Export the GitHub API key as an environment variable:

    $ export GITHUB_API_KEY=<your GitHub API key>
    
  3. Run the script

    $ bash bin/extract_all_tools.sh
    

The script will generate a TSV file with each tool found in the list of GitHub repositories and metadata for these tools:

  1. Galaxy wrapper id
  2. Description
  3. bio.tool id
  4. bio.tool name
  5. bio.tool description
  6. EDAM operation
  7. EDAM topic
  8. Status
  9. Source
  10. ToolShed categories
  11. ToolShed id
  12. Galaxy wrapper owner
  13. Galaxy wrapper source
  14. Galaxy wrapper version
  15. Conda id
  16. Conda version

Filter tools based on their categories in the ToolShed

  1. Run the extraction as explained before

  2. (Optional) Create a text file with the ToolShed categories for which tools need to be extracted, one ToolShed category per row (example for microbial data analysis)

  3. (Optional) Create a text file with the list of tools to exclude, one tool id per row (example for microbial data analysis)

  4. (Optional) Create a text file with the list of reviewed tools to keep, one tool id per row (example for microbial data analysis)

  5. Run the tool extractor script

    $ python bin/extract_galaxy_tools.py \
        --tools <Path to CSV file with all extracted tools> \
        --filtered_tools <Path to output CSV file with filtered tools> \
        [--categories <Path to ToolShed category file>] \
        [--excluded <Path to excluded tool file>] \
        [--keep <Path to to-keep tool file>]
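
The filtering idea can be sketched as follows. This is not the code of `bin/extract_galaxy_tools.py`; it is a minimal illustration that assumes the `ToolShed categories` column holds a comma-separated list and that the exclusion file contains Galaxy wrapper ids. The function name `filter_tools` and the sample rows are hypothetical. Handling of the to-keep list is left out because its exact effect is not described here.

```python
def filter_tools(rows, categories, excluded=()):
    """Keep rows matching at least one wanted category and not explicitly excluded."""
    wanted = {c.strip().lower() for c in categories}
    banned = set(excluded)
    kept = []
    for row in rows:
        if row["Galaxy wrapper id"] in banned:
            continue
        # Assumption: categories are stored as a comma-separated string.
        tool_categories = {
            c.strip().lower() for c in row["ToolShed categories"].split(",")
        }
        if wanted & tool_categories:
            kept.append(row)
    return kept


# Hypothetical rows, as they might be read from the extracted table.
SAMPLE_ROWS = [
    {"Galaxy wrapper id": "tool_a", "ToolShed categories": "Metagenomics, Sequence Analysis"},
    {"Galaxy wrapper id": "tool_b", "ToolShed categories": "Visualization"},
    {"Galaxy wrapper id": "tool_c", "ToolShed categories": "Metagenomics"},
]

filtered = filter_tools(SAMPLE_ROWS, categories=["Metagenomics"], excluded=["tool_c"])
print([row["Galaxy wrapper id"] for row in filtered])
```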
    

Add your community

In order to add your community you need to:

  • Fork this repository.
  • Add a folder for your community in data/communities.
  • Add at least the file categories.
  • List in it all categories relevant to initially filter the tools for your community. Possible categories are listed in the Galaxy ToolShed.
  • Make a pull request to add your community.
  • The workflow runs every Sunday, so by the following Monday your community table should be available in results/<your community name>.
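
The folder and file from the steps above can also be created with a short script. The sketch below assumes the file is literally named `categories` and holds one ToolShed category per line; the helper `create_community`, the community name, and the category values are hypothetical examples.

```python
import tempfile
from pathlib import Path


def create_community(repo_root, community, categories):
    """Create data/communities/<community>/categories with one category per line."""
    folder = Path(repo_root) / "data" / "communities" / community
    folder.mkdir(parents=True, exist_ok=True)
    categories_file = folder / "categories"
    categories_file.write_text("\n".join(categories) + "\n", encoding="utf-8")
    return categories_file


# Demonstrated in a temporary directory; in practice repo_root is the checkout of your fork.
with tempfile.TemporaryDirectory() as tmp:
    path = create_community(tmp, "my_community", ["Metagenomics", "Sequence Analysis"])
    print(path.read_text(encoding="utf-8"))
```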

Development

To test the tool's functionality, follow Usage to set up the environment and the API key, then run:

    $ bash ./bin/extract_all_tools_test.sh test.list

This runs the tool but only parses the test repository Galaxy-Tool-Metadata-Extractor-Test-Wrapper.
