Stratotemplate

Configuration template for stratocumulus & more

Recommended Usage

Set up your “GClouder”

Create a GCloud VM

  • Log in to Google Cloud, go to the Console (the main dashboard), and open the Compute Engine instances dashboard.
  • From there, create a new VM that will be your virtual devbox to work from (it doesn't need to be beefy).
    • Make sure that, under "Identity and API Access", you select "Allow full access to all Cloud APIs".
  • Add your SSH key to the instance (under Management, disk, networking, SSH keys > SSH Keys, at the bottom).
  • Alternatively, the gcloud command-line tool can generate and use SSH keys for you (gcloud compute ssh).

Then click Create and let the magic happen.

Set Up The Host

To SSH into your machine, use ssh username@instance-ip; the instance IP is shown on your instance's information page once it has booted.
You can also set up a Host entry in your ~/.ssh/config file.
In the following, we call it GClouder.
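A minimal sketch of such a Host entry, assuming you connect by the instance's external IP with your own user name on the VM. The helper name gclouder_ssh_config is hypothetical, and 203.0.113.10 is a documentation address, not a real instance:

```sh
# Hypothetical helper: print a ~/.ssh/config stanza for the "GClouder" alias.
# $1: external IP of the VM, $2: your user name on the VM.
gclouder_ssh_config() {
  printf 'Host GClouder\n    HostName %s\n    User %s\n' "$1" "$2"
}

# Review the output first, then append it, e.g.:
#   gclouder_ssh_config 203.0.113.10 alice >> ~/.ssh/config
```

With that stanza in place, ssh GClouder picks up HostName and User automatically. An ephemeral external IP changes if the VM is stopped and started again, so HostName may need updating.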

Get to the GClouder:

ssh GClouder

Install a few more things:

sudo apt-get update
sudo apt-get install -y unzip build-essential git

The VM comes with gcloud preinstalled, but that installation is not fully functional; we need a fresh one (see also https://code.google.com/p/google-cloud-sdk/issues/detail?id=336):

curl https://sdk.cloud.google.com | bash

Say yes to everything, then reload your .bashrc (just type bash).
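To check that your shell now picks up the fresh installation rather than the preinstalled one, here is a small sketch. It assumes the preinstalled SDK lives under /usr/bin and /usr/lib/google-cloud-sdk (the paths that show up in this repository's issues); is_system_gcloud is a hypothetical name:

```sh
# Succeed if the given gcloud path belongs to the VM's preinstalled SDK
# (assumed to be under /usr/bin or /usr/lib/google-cloud-sdk).
is_system_gcloud() {
  case $1 in
    /usr/bin/*|/usr/lib/google-cloud-sdk/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, after reloading .bashrc:
#   if is_system_gcloud "$(command -v gcloud)"; then
#     echo "still using the preinstalled gcloud; check your PATH" >&2
#   fi
```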

For some reason the zone has to be configured again (replace us-east1-c with your favorite part of the world):

gcloud config set compute/zone us-east1-c
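A typo in the zone name may only surface later, so a guarded variant could help. This is a sketch: the pattern is an assumption based on names such as us-east1-c (region, digit, zone letter), it checks only the shape and not whether the zone exists, and valid_zone/set_zone are hypothetical names:

```sh
# Succeed if $1 looks like a Compute Engine zone name (e.g. us-east1-c).
valid_zone() {
  printf '%s\n' "$1" | grep -Eqx '[a-z]+-[a-z]+[0-9]+-[a-z]'
}

# Set the default zone only if the name has the expected shape.
set_zone() {
  if valid_zone "$1"; then
    gcloud config set compute/zone "$1"
  else
    echo "not a zone name: $1" >&2
    return 1
  fi
}
```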

Create an SSH key pair for gcloud itself:

gcloud compute ssh $(hostname) ls

and accept the prompts (with an empty passphrase).
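gcloud compute ssh stores the key pair it generates as ~/.ssh/google_compute_engine and ~/.ssh/google_compute_engine.pub. A sketch to confirm this step worked (has_gcloud_key is a hypothetical name; the optional directory argument only exists so it can be pointed elsewhere):

```sh
# Succeed if both halves of the gcloud key pair exist in $1 (default ~/.ssh).
has_gcloud_key() {
  _dir=${1:-$HOME/.ssh}
  [ -f "$_dir/google_compute_engine" ] && [ -f "$_dir/google_compute_engine.pub" ]
}

# Usage:
#   has_gcloud_key || echo "no gcloud key yet; rerun gcloud compute ssh" >&2
```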

Get The Configuration Template

We use Git, so that you can save your configuration:

git clone https://github.com/smondet/stratotemplate.git
cd stratotemplate

Get a Ketrew Server

This section creates a functional Ketrew server with Google Container Engine.

Edit configuration.env and make sure you're happy with the $PREFIX and $TOKEN values.
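If you'd rather not invent a TOKEN by hand, one way to generate a random value with standard tools is sketched below. It assumes any hex string is acceptable as a token; random_token is a hypothetical name, and how configuration.env assigns TOKEN is up to the file itself:

```sh
# Print 16 random bytes from /dev/urandom as 32 lowercase hex digits.
random_token() {
  printf '%s\n' "$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

# Usage: run random_token and paste the value into configuration.env.
```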

Get the script, and run it:

wget https://raw.githubusercontent.com/hammerlab/stratocumulus/master/tools/gcpketrew.sh -O gcpketrew.sh

. configuration.env
sh gcpketrew.sh up
# The first time this may prompt for a `[Y/n]` question.

When the command returns, the deployment is only partially ready; you may need to ask for the status a few times before the “External IP” is available:

sh gcpketrew.sh status

When it's ready, a little more configuration is required (if this command fails, wait a minute or so and try again until it succeeds; the container engine may be slow at creating “pods”):

sh gcpketrew.sh configure+local

(Warning: the +local part appends a line to the ~/.ssh/authorized_keys file; use configure if you don't want that.)
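The "wait and try again" advice can be automated with a generic retry loop. This is a minimal sketch that relies only on the command's exit status; retry and its arguments are hypothetical, not part of gcpketrew.sh:

```sh
# Run a command until it succeeds, at most $1 times, sleeping $2 seconds
# between attempts. Returns 1 if every attempt failed.
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_n=1
  while ! "$@"; do
    if [ "$_retry_n" -ge "$_retry_max" ]; then
      echo "giving up after $_retry_max attempts: $*" >&2
      return 1
    fi
    _retry_n=$((_retry_n + 1))
    sleep "$_retry_delay"
  done
  return 0
}

# Usage: try every minute, up to 10 times.
#   retry 10 60 sh gcpketrew.sh configure+local
```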

At any time the status command will give you the URL of the Ketrew server's WebUI.

Of course, you can save your changes to the stratotemplate repository like any other git repo.

When you want to take the server down (and delete everything related to it):

sh gcpketrew.sh down

Get a Stratocumulus Environment

We're going to use Docker to get a fully functional OCaml/Opam/Stratocumulus environment.

Get Docker:

sudo apt-get install -y docker.io

Get the image:

sudo docker pull smondet/stratocumulus

Make $PWD accessible by the container:

chmod -R a+rw .

Get in:

sudo docker run -it -v "$PWD":/hostuff/ smondet/stratocumulus bash

Now you're in the right environment to submit stratocumulus deployment jobs.

cd /hostuff

Further edit configuration.env to set GCLOUD_HOST, CLUSTER_NODES, … (see the comments in the file).

. configuration.env

Use the URL provided above by sh gcpketrew.sh status to create a Ketrew configuration:

ketrew init --conf ./ketrewdocker/ --just-client $(cat $KETREW_URL)

Create an NFS server with storage:

KETREW_CONFIG=./ketrewdocker/configuration.ml ocaml nfs_server.ml up submit

If you'd like this NFS pool mounted on the cluster you're about to create, add it to the CLUSTER_NFS_MOUNT list in your configuration.env; stratotemplate does not do this automatically. On the newly created servers, storage is mounted at /nfs-pool and the witness file is .stratowitness. You can find the NFS VM name in the GCloud instance list; it is prefixed with the $PREFIX from your configuration.
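To narrow the instance list down to your deployment, a sketch that filters the table printed by gcloud compute instances list, assuming its default layout with a header row and the instance NAME in the first column. It prints every VM carrying the prefix, so pick the NFS one from there; vms_with_prefix is a hypothetical name:

```sh
# Read `gcloud compute instances list` output on stdin and print the names
# (first column, header skipped) of instances starting with prefix $1.
vms_with_prefix() {
  awk -v p="$1" 'NR > 1 && index($1, p) == 1 { print $1 }'
}

# Usage:
#   gcloud compute instances list | vms_with_prefix "$PREFIX"
```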

Create a compute cluster:

KETREW_CONFIG=./ketrewdocker/configuration.ml ocaml cluster.ml up submit

The two commands above submit workflows to the Ketrew server; you can monitor them with the WebUI (see cat $KETREW_URL).

Replace up with down to take the deployments down ☺
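A small wrapper can save retyping these commands. It is a hypothetical helper, not part of stratotemplate; the OCAML variable is an assumption of this sketch that lets you substitute echo for a dry run. When tearing down, it seems sensible to take the cluster down before the NFS server it mounts:

```sh
# Hypothetical wrapper around the deployment commands above.
# $1: nfs_server or cluster; $2: up or down.
strato() {
  case $1 in nfs_server|cluster) ;; *) echo "usage: strato nfs_server|cluster up|down" >&2; return 2 ;; esac
  case $2 in up|down) ;; *) echo "usage: strato nfs_server|cluster up|down" >&2; return 2 ;; esac
  KETREW_CONFIG=./ketrewdocker/configuration.ml ${OCAML:-ocaml} "$1.ml" "$2" submit
}

# Usage (teardown, cluster first):
#   strato cluster down && strato nfs_server down
```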

Using your machine

Stratotemplate provides a basic biokepi_machine for easily creating a Biokepi.Edsl.Machine.t. You can #use it in a script to get a machine, which most Biokepi workflows require.

A few environment variables need to be set in order for it to work:

  1. PREFIX, already set in configuration.env
  2. BIOKEPI_WORK_DIR
  3. GATK_JAR_URL and MUTECT_JAR_URL: URLs of the GATK and MuTect (1) JARs; Biokepi can't download these automatically because of their restrictive licenses.
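A quick sketch to check that these variables are set before running a script; require_vars is a hypothetical name. It checks the current shell only, so child processes such as ocaml see the values only if they are exported. The same check can run right after . configuration.env to catch the "Missing $PREFIX" situation reported in the issues; one plausible cause there is that sudo resets the environment by default, so variables sourced in your shell may not reach a script run with sudo sh.

```sh
# Report every named variable that is unset or empty; return 1 if any is.
require_vars() {
  _missing=0
  for _name in "$@"; do
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "Missing \$$_name environment variable" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage:
#   require_vars PREFIX BIOKEPI_WORK_DIR GATK_JAR_URL MUTECT_JAR_URL
```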

Contributors

smondet, ihodes, hammer, arahuja

stratotemplate's Issues

Permission denied on PBS server

Used this to re-create a new cluster from scratch. Cluster up seems successful but on submitting a workflow I see:

SSH failed:Warning: Permanently added 'arahuja-strat-pbs-server,10.142.15.192' (ECDSA) to the list of known hosts.
 Permission denied (publickey)

Make configuration.env easier to get right

  • We should indicate which environment variables need to be uncommented
  • PREFIX and TOKEN could be randomly generated
  • On the GClouder image, GCLOUD_HOST could be populated for the user

Have to sudo in order to run sh gcpketrew.sh up

Otherwise I get

ERROR: (gcloud.components.install) You cannot perform this action because you do not have permission to modify the Google Cloud SDK installation directory [/usr/lib/google-cloud-sdk].

Cluster monitoring web dashboard

Something like Job MonArch (with Ganglia?) might work. Would be nice to be able to easily monitor utilization to know if the cluster is large enough or too large.

Documentation additions

  • what's a witness file?
  • BIOKEPI_WORK_DIR: what should it be?
  • Link to directions to download GATK and MuTect

Consider separating configuration.env into separate configuration files

I see three places where it's used:

  1. Running sh gcpketrew.sh on GClouder
  2. Running nfs_server.ml w/in stratocumulus Docker image on GClouder
  3. Running cluster.ml up w/in stratocumulus Docker image on GClouder

Would it make sense to have 3 separate configuration files, one for each use case?

gcpketrew.sh -> Missing $PREFIX enviroment variable

After running sh gcpketrew.sh for the first time:

sudo sh gcpketrew.sh up
<<<<<<<<
 gcpketrew.sh -> Kubectl not installed; getting it now
>>>>>>>>


Your current Cloud SDK version is: 111.0.0
Installing components from version: 111.0.0

┌─────────────────────────────────────────────┐
│     These components will be installed.     │
├─────────────────────────┬─────────┬─────────┤
│           Name          │ Version │   Size  │
├─────────────────────────┼─────────┼─────────┤
│ kubectl                 │         │         │
│ kubectl (Linux, x86_64) │   1.2.4 │ 8.2 MiB │
└─────────────────────────┴─────────┴─────────┘

For the latest full release notes, please visit:
  https://cloud.google.com/sdk/release_notes

Do you want to continue (Y/n)?

╔════════════════════════════════════════════════════════════╗
╠═ Creating update staging area                             ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: kubectl                                      ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: kubectl (Linux, x86_64)                      ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Creating backup and activating new installation          ═╣
╚════════════════════════════════════════════════════════════╝

Performing post processing steps...done.

Update done!

WARNING: There are older versions of Google Cloud Platform tools on your system PATH.
Please remove the following to avoid accidentally invoking these old tools:

/usr/bin/git-credential-gcloud.sh
/usr/bin/bq
/usr/bin/gcloud
/usr/bin/gsutil


<<<<<<<<
 gcpketrew.sh -> Missing $PREFIX enviroment variable
>>>>>>>>

Race condition in cluster up

Between the steps "Setup ihodes-pgv-user's SSH Keys on ihodes-pgv-pbs-server (from ihodes-pgv-pbs-server)" and "Authorize isaachodes@ihodes-dev for ihodes-pgv-user on ihodes-pgv-pbs-server", which contend over chmod on /home/username/.ssh.

deal with ssh known-hosts automatically

When the user runs down and then up again with the same Ketrew server, the IP address behind $PREFIX-pbs-server changes and SSH stops working.

we need to rm .ssh/known_hosts at the Right Time™
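Rather than deleting the whole file, one could drop only the stale entries. OpenSSH's own ssh-keygen -R hostname does this and also handles hashed entries; the sketch below is a dependency-light alternative for unhashed entries only, assuming the usual "host1,host2 keytype key" line format seen in the log above (only exact host names are matched). forget_host is a hypothetical name, and when exactly to call it (the "Right Time") is left open here:

```sh
# Remove unhashed known_hosts entries naming host $1.
# $2: known_hosts file (default ~/.ssh/known_hosts).
forget_host() {
  _host=$1
  _file=${2:-$HOME/.ssh/known_hosts}
  [ -f "$_file" ] || return 0
  _tmp=$(mktemp) || return 1
  # Field 1 is a comma-separated host list, e.g. "myprefix-pbs-server,10.142.15.192".
  awk -v h="$_host" '{
    n = split($1, names, ",")
    keep = 1
    for (i = 1; i <= n; i++) if (names[i] == h) keep = 0
    if (keep) print
  }' "$_file" > "$_tmp" &&
    cat "$_tmp" > "$_file"   # rewrite in place, keeping the file's permissions
  _status=$?
  rm -f "$_tmp"
  return "$_status"
}

# Usage, e.g. before bringing the cluster up again:
#   forget_host "$PREFIX-pbs-server"
```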

kill stratotemplate and cumulus

Since we don't plan on developing them, nor do we recommend that people use them, it might help to shut these repos down. If not, we should at least make that recommendation explicit.

cluster update

Also a way to update the ketrew server

(This can be mostly documentation)