Comments (11)

hannaml commented on May 29, 2024

Unfortunately, I don't think it's going to be realistic to provide an automated install of gatk, as its download requires registration. We might talk to the gatk people to see if they have any systems set up for this sort of situation, but my feeling is that there won't be a good solution given licensing differences.

I think the most realistic thing to do here is check if gatk is already in the user path, and if not either:

  1. require it be added to the path or
  2. require its path be specified in a yaml config file or the like.

As a first draft, I would be comfortable just assuming it is already in the path/hardcoding a path into the tool file. However, having an actual config file is something we plan on having eventually anyway, so why not now?
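
A minimal sketch of the check described above, not code from the repo: look for the tool on the user's path first, and only then fall back to an explicitly supplied location. The function names, the `configured_path` parameter, and the jar filename in the usage note are assumptions made for illustration.

```python
import os


def find_on_path(filename, path=None):
    """Return the first match for filename in the PATH directories, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def locate_tool(filename, configured_path=None, path=None):
    """Prefer a copy on PATH; otherwise use an explicitly configured path."""
    found = find_on_path(filename, path)
    if found is not None:
        return found
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    raise RuntimeError(
        "%s not found: add it to your PATH or set its location in the config file"
        % filename
    )
```

A tool wrapper could then call something like `locate_tool("GenomeAnalysisTK.jar", configured_path=...)` and report a clear error if neither option is satisfied. The search checks for a plain file rather than an executable, since a jar would not normally carry the executable bit.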

Thoughts? In particular, what do you think would work best for Travis integration?

from viral-ngs.

dpark01 commented on May 29, 2024

Yeah, this is also an issue with Novoalign. So I think there are two things to figure out: one is a config file reader, and the other is how to get Travis to use some pre-downloaded tools that live separately from the GitHub repo. Or maybe we just skip those tests in Travis.

Daniel J. Park, PhD
Computational Biologist, Infectious Disease
Broad Institute of MIT and Harvard
Tel: +1-617-714-8526
http://www.broadinstitute.org/bios/daniel-park

hannaml commented on May 29, 2024

Ok maybe I lied:
https://github.com/broadgsa/gatk

hannaml commented on May 29, 2024

Also:
https://github.com/broadgsa/gatk-protected

hannaml commented on May 29, 2024

OK, so this still requires discussion: the links posted are source repositories, so to get gatk from them we need to compile it from source. That requires Maven and Java 1.7, and takes 4-5 minutes assuming you already have both (tested with oracle-jdk 1.7 and 1.8, plus Maven 3.2.3).

I think it's a lot easier to ask users to download a pre-compiled gatk and write its path in a config.

Regarding testing: if there isn't a good way to give Travis a pre-compiled version, the Travis server already has Maven and several Java options installed, so we could compile from source in the .travis.yml. I'm not super-thrilled about the prospect of tests taking an extra 5 minutes, but it's better than no tests.

dpark01 commented on May 29, 2024

Yeah, so this is kind of why I set up that multiple-installers-per-tool idea: Travis might always have to go for the worst-case (slowest) mechanism. You'd do the compiling from source in your installer code in tools/, but you'd set up .travis.yml to include the Java and Maven versions that you want.

Regarding config files, I'd imagine that Python has a few built-in libraries that can parse a few config formats easily. Would you be able to look into that? Then maybe we can set up every tool to start with a generic install method that checks the config file for an entry for this particular tool, checks whether that binary actually exists at that path, and uses it if so. Failing that, download and compile from source.
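
One way the generic install method could look, as a rough sketch using the standard-library `configparser` module (INI format). The `[tools]` section name, the file layout, and the `build_from_source` callable are assumptions for illustration, not anything that exists in the repo. This is written for Python 3; in Python 2 the module is named `ConfigParser` and `get` has no `fallback` argument.

```python
import configparser
import os


def tool_path_from_config(config_file, tool_name, section="tools"):
    """Return the configured path for tool_name if it is a real file, else None."""
    # interpolation=None keeps any '%' in a path literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file)  # a missing config file is silently ignored
    path = parser.get(section, tool_name, fallback=None)
    if path:
        path = os.path.expanduser(path)
        if os.path.isfile(path):
            return path
    return None


def install_tool(config_file, tool_name, build_from_source):
    """Use the configured copy when it exists; otherwise build from source."""
    configured = tool_path_from_config(config_file, tool_name)
    if configured is not None:
        return configured
    # build_from_source is a hypothetical callable that returns the built path
    return build_from_source()
```

With a config file containing a `[tools]` section and a line such as `gatk = /path/to/GenomeAnalysisTK.jar`, `install_tool` returns that path without building anything. A missing file, a missing entry, or a stale path all fall through to the from-source installer.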

Regarding Travis, it looks like Travis can be made to pull dependencies from private repos or cache a compiled dependency, but from quick reading online, these seem to require switching it to a private setup. I'm not even sure if that's possible with a public repo project, and I'd also have to check if the Broad pays for private Travis accounts (public ones are free). Some more research to do here... We might need to do this anyway if we want any unit tests around Novoalign...

iljungr commented on May 29, 2024

In cases where we can download and compile from source, I don't see why we need to use the config file. In DownloadPackage.attempt_install it already checks verify_install, which checks if there is already something at the executable path. So on our machines the compile and build will only happen once -- not a big deal. I think the performance problem is only an issue for Travis, where it is hard to solve.

hannaml commented on May 29, 2024

The issue is that compiling gatk from source here has two very large, potentially non-native dependencies: Java 1.7 and Maven. If there is no config, users would have to either put a precompiled gatk in their path, or set up Java 1.7 and Maven and put both in their path. Neither seems particularly user-friendly.

dpark01 commented on May 29, 2024

Ugh, right, that's the real issue. Java 1.7 and Maven are not hard to get going in Travis. But for general users... well, Java 1.7 may not be too much to ask, since a lot of software packages require it these days, but it starts to add up when you add in the build system. At that point, if we can't auto-install it (or if auto-install is cumbersome), why not just ask them to use the precompiled binaries?

Man... I've also been looking at a few other things like CDE and Docker that basically try to handle all this by creating a snapshot of all the dependencies (something slightly more lightweight than a full-blown VM image). I think it'd work well for internal-only use (or amongst collaborators), but if we do in fact release these things publicly, all of these systems pass the legal responsibility for making sure we have rights to redistribute off to us, the package/image makers. And Broad IP/legal has made it quite clear to me that redistribution is not a road to go down.

So maybe the end goal should be something like:

  • config file to specify where everything that is pre-installed lives
  • make "from scratch" installers for what we can
  • leave a small handful of things where we just tell folks they have to do it themselves (Novoalign, GATK, probably Java 1.7 also)
  • for those tools where we can't install from scratch, figure out how to make the build system use pre-compiled binaries, which might require a private build server. For now, I'd venture to say that maybe we just skip those tests until we figure that out. I think there are only two tools so far that fall in this category.

There's a way to make a few categories of tests that at least nose can separate out from each other. Most people will separate all the short and simple tests (typically unit tests) from longer running tests (typically end-to-end tests and regression tests) and have a build server always running the short ones, but then have the full suite only run when asked. Similarly, maybe we can just put tests for Novoalign and GATK and such into a test category that doesn't run on Travis, but does run when we, for example, type some special command on the command line in an environment that has everything pre-installed (like the Broad).
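
A sketch of the kind of split described above. nose has an attribute-selection plugin for tagging tests, but to keep this self-contained the example swaps in a skip decorator from the standard-library `unittest` module instead. The `RUN_LICENSED_TOOL_TESTS` environment variable and the class names are made up for illustration.

```python
import os
import unittest

# Hypothetical opt-in switch: set RUN_LICENSED_TOOL_TESTS=1 in an environment
# where Novoalign, GATK and friends are already installed.
RUN_LICENSED = os.environ.get("RUN_LICENSED_TOOL_TESTS") == "1"

# Reusable decorator marking tests that need the manually installed tools
requires_licensed_tools = unittest.skipUnless(
    RUN_LICENSED, "needs pre-installed Novoalign/GATK"
)


class TestFast(unittest.TestCase):
    """Short tests with no external tools; these run everywhere."""

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


@requires_licensed_tools
class TestWithGatk(unittest.TestCase):
    """Skipped unless the opt-in switch is set."""

    def test_placeholder(self):
        # A real test would invoke the tool here
        self.assertTrue(True)
```

On a build server the variable is simply left unset, so the decorated class is reported as skipped rather than failed. In a fully provisioned environment, exporting the variable before running the tests enables the whole suite.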

So I see a few investigative tasks to do from here: figuring out a config file system / reader we like, and also figuring out how to have multiple classes of tests that get run in different scenarios. I'll start looking into the latter on Monday if no one else gets to it first.

Danny

iljungr commented on May 29, 2024

Do we need to do novoalign? The only recipes that currently contain it are in taxon_filter, and your earlier email suggested we could get rid of that.

Irwin

dpark01 commented on May 29, 2024

I'm pretty sure that some other step in consensus.py uses Novoalign... but we can cross that bridge when we come to it.
