gpalloc's Introduction

GPALLOC HAS BEEN ARCHIVED

Makes creating billing projects snappy!

For instructions on how to use GPAlloc to provide you with Google projects, see here.

If you need help debugging GPAlloc-related errors, see here.

Developing GPAlloc

Instances

There are three GPAlloc instances, all of which live in the broad-dsp-techops Google project.

Instances for use in tests

The two "production" instances of GPAlloc are:

https://gpalloc-dev.dsp-techops.broadinstitute.org/

  • Creates projects in the @test.firecloud.org domain
  • Used by tests run against FiaBs created in the broad-dsde-dev project (i.e. developer-triggered test runs)

https://gpalloc-qa.dsp-techops.broadinstitute.org/

  • Creates projects in the @quality.firecloud.org domain
  • Used by tests run against FiaBs created in the broad-dsde-qa project (i.e. auto-triggered test runs)

Remember: despite a name that might indicate otherwise, gpalloc-dev is a real instance used by our test environment. It is not for unreleased code.

Instances for GPAlloc developers

https://gpalloc-beta.dsp-techops.broadinstitute.org/ is the "developer" instance. At any given point in time it probably has recent-ish code on it, but you should ssh to the host and run sudo docker ps to find out.

Development process

The conventions for developing on GPAlloc are a little different to what you're used to. The process goes as follows (the complicated steps will be outlined below, hold your horses):

  1. Branch off develop and make your changes.
  2. (Discretionary) Test your changes by manually building the Docker image and using the gpalloc-instance-deploy Jenkins job to deploy it to gpalloc-beta. Repeat until working.
  3. PR to develop and review.
  4. Wait for CircleCI to finish building off develop.
  5. To release to master, run ./scripts/release_master.sh. This will force-push develop on to master. Yes, you do really want to do this!
  6. Wait for CircleCI to finish building off master.
  7. While you're waiting, make a new release in GitHub.
  8. Run the gpalloc-deploy Jenkins job to deploy to the "production" instances. Instructions to deploy to production are here.

Getting started

Clone and go into the repo:

$ git clone https://github.com/broadinstitute/gpalloc.git
$ cd gpalloc

Spin up MySQL locally (ensure Docker is running):

$ ./docker/run-mysql.sh start gpalloc

Build GPAlloc and run the tests:

$ export SBT_OPTS="-Xmx2G -Xms1G -Dmysql.host=localhost -Dmysql.port=3311"
$ sbt clean compile test

Once you're done, tear down MySQL:

$ ./docker/run-mysql.sh stop gpalloc
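
The start, test, and tear-down steps above can be wrapped so that MySQL is stopped even when the tests fail. This is a minimal sketch: the function name run_gpalloc_tests and the MYSQL_SCRIPT and TEST_CMD override variables are assumptions made for illustration and are not part of the repo.

```shell
# Hypothetical wrapper: start MySQL, run the tests, always stop MySQL afterwards.
# MYSQL_SCRIPT and TEST_CMD are illustrative overrides (useful for a dry run).
run_gpalloc_tests() {
  local mysql_script="${MYSQL_SCRIPT:-./docker/run-mysql.sh}"
  local test_cmd="${TEST_CMD:-sbt clean compile test}"
  local status=0

  "$mysql_script" start gpalloc || return 1

  # Run the tests in a subshell so SBT_OPTS does not leak into the caller.
  (
    export SBT_OPTS="-Xmx2G -Xms1G -Dmysql.host=localhost -Dmysql.port=3311"
    $test_cmd
  ) || status=$?

  # Tear down MySQL whether or not the tests passed.
  "$mysql_script" stop gpalloc
  return "$status"
}
```

Called from the repo root with no overrides, it runs the same commands as the manual steps and returns the exit status of the test run.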

Development cycle

Note that you may need to start/resume the dsp-gpalloc-beta VM in the GCP Console. Auth as your @firecloud.org account.

Note that the git branch name is used in the created project names, so

  1. don't make it too long -- 9 characters maximum
  2. don't put an underscore in it.

Otherwise Google won't let you create the project (name too long or contains invalid characters).
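
The two branch-name rules above can be checked before building anything. This is a minimal sketch: check_branch_name is a hypothetical helper, and it only enforces the two rules stated here (at most 9 characters, no underscore), not Google's full project naming rules.

```shell
# Hypothetical helper: succeeds only if the name is non-empty,
# at most 9 characters long, and contains no underscore.
check_branch_name() {
  local name="$1"
  if [ -z "$name" ]; then
    echo "branch name is empty" >&2
    return 1
  fi
  if [ "${#name}" -gt 9 ]; then
    echo "branch name '$name' is longer than 9 characters" >&2
    return 1
  fi
  case "$name" in
    *_*)
      echo "branch name '$name' contains an underscore" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage (inside the repo):
#   check_branch_name "$(git rev-parse --abbrev-ref HEAD)"
```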

To deploy to gpalloc-beta, first manually build and push your branch of gpalloc to DockerHub:

local $ ./docker/build.sh jar
local $ ./docker/build.sh -d build
local $ ./docker/build.sh -d push

Then synchronize with the #gpalloc channel on Slack, just to let people know you're stealing gpalloc-beta.

Finally, run gpalloc-instance-deploy to deploy to gpalloc-beta. At the time of writing the value for PRIV_HOST you want is 10.255.55.42, but it should be in the text of the Jenkins job. Use your Git branch name as the value for IMAGE name. Leave the ENVIRONMENT as dev (unless you really want the config for the @quality.firecloud.org domain).

You can then test your code on gpalloc-beta and repeat this cycle as needed.

Watching CircleCI for auto-builds of develop and master

CircleCI builds Docker images for the develop and master branches of this repository on commits to those branches. Click the link to look at it.

Making a new release in GitHub

Go to the Releases page in GitHub. Hit "Draft a new release". It should look like this:

[Screenshot of the "Draft a new release" form]

Note that:

  • The version is incremented
  • The target is the commit hash at the tip of master, not the master branch itself. This is important!

This doesn't do anything per se, but it serves as a record of what got released and when.
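
To find the commit hash to use as the release target, something like the following should work. It is a sketch that assumes the GitHub remote is named origin; tip_hash is a hypothetical helper name.

```shell
# Hypothetical helper: print the full commit hash a ref points to.
# Defaults to origin/master, assuming the GitHub remote is called "origin".
tip_hash() {
  git rev-parse --verify "${1:-origin/master}^{commit}"
}

# Usage, after `git fetch origin`:
#   tip_hash    # paste the printed hash into the release target field
```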

Deploying the master branch to gpalloc-dev and gpalloc-qa

Use the gpalloc-deploy Jenkins job for this. This time, select image=master. This will deploy to both gpalloc-dev and gpalloc-qa.

Miscellaneous things

Re-deploying gpalloc

The instructions to deploy to production are here.

Certificate issues

If re-deploying causes cert issues, we may need to update the path to the certs, like we did in this PR.

Connecting to the gpalloc VM

Make sure you're on the Broad Internal wifi or on the non-split VPN. Run the following from the command line:

For gpalloc-dev: gcloud beta compute ssh --zone "us-central1-a" "dsp-gpalloc-dev101" --project "broad-dsp-techops" [email protected]

For gpalloc-qa: gcloud beta compute ssh --zone "us-central1-a" "dsp-gpalloc-qa101" --project "broad-dsp-techops" [email protected]

Troubleshooting

ssh: connect to host ... port 22: Operation timed out ...

Connect to the Broad Internal wifi or the non-split VPN.

Could not fetch resource | The resource ... was not found

The VMs may have been replaced. Look for the dsp-gpalloc VMs in the GCP Console. Auth as your @firecloud.org account. Update this doc as necessary.

Connection failed | We are unable to connect to the VM on port 22

Don't try to connect to the VM using GCP. It will only end in tears.

Deploying the develop branch to gpalloc-beta

We don't really have a "dev" environment of GPAlloc; gpalloc-beta is "scratch space for devs" and gpalloc-dev and gpalloc-qa are "production" instances for their respective Firecloud test domains. However, if you ever want to deploy whatever's on develop to gpalloc-beta, you can do that with the gpalloc-deploy Jenkins job.

To deploy the develop Docker image to the -beta instance, select image=develop.

Manually deploying to gpalloc-beta

This is deeply shenanigans and you shouldn't need to do it, but it can be quicker if you're making rapid changes. (Caveat: you're unlikely to have SSH access to the machine, and Bernick probably won't give it to you.)

  1. SSH into the machine.
  2. Edit /app/docker-compose.yml to point to your new Docker image.
  3. Restart the Docker containers:
gpalloc-beta $ sudo docker-compose -p gpalloc -f /app/docker-compose.yml stop
gpalloc-beta $ sudo docker-compose -p gpalloc -f /app/docker-compose.yml rm -f
gpalloc-beta $ sudo docker-compose -p gpalloc -f /app/docker-compose.yml pull
gpalloc-beta $ sudo docker-compose -p gpalloc -f /app/docker-compose.yml up -d
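
If you do this often, the four docker-compose commands above can be run in order by a small wrapper that stops at the first failure. This is only a sketch: redeploy_gpalloc is a hypothetical name and nothing like it ships on the host.

```shell
# Hypothetical wrapper: run stop, rm -f, pull, and up -d in order.
# Returns non-zero as soon as one step fails.
redeploy_gpalloc() {
  local compose_file="${1:-/app/docker-compose.yml}"
  local step
  for step in "stop" "rm -f" "pull" "up -d"; do
    # $step is intentionally unquoted so "rm -f" and "up -d" split into two arguments.
    sudo docker-compose -p gpalloc -f "$compose_file" $step || return 1
  done
}

# Usage on the gpalloc-beta host:
#   redeploy_gpalloc
```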

Communications

  • GPAlloc is used by various services in dsp-workbench. If you're re-creating the project pool, please send a heads-up about possible interruptions in #dsp-workbench.

gpalloc's Issues

GPAlloc gets 429 when EnablingServices

The call to enableCloudServices sometimes fails when one or more of its requests returns 429 Too Many Requests. This kills monitoring for that project until GPAlloc restarts. GPAlloc should retry nicely.
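
As an illustration of what "retry nicely" could mean, here is a generic retry loop with exponential backoff. It is a shell sketch of the technique only: retry_with_backoff is a hypothetical helper, it retries on any failure rather than specifically on a 429, and an actual fix would belong in GPAlloc's Scala code.

```shell
# Hypothetical helper: retry a command, doubling the delay after each failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage (some_command is a placeholder):
#   retry_with_backoff 5 1 some_command --with args
```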

Clean up ALL THE THINGS

In scrubBillingProject we clean up permissions to the project, permissions to the Cromwell auth bucket, and pet SA keys.

We can (and should) clean up other things:

  • Any running GCE VMs
  • Any running Dataproc clusters
  • ... other?

Google throws PERMISSION_DENIED mysteriously (on cleanup?)

Not much useful info in the stacktrace, as we're lost in a maze of ForkJoin and Google doesn't tell us what we're not allowed to do. I suspect it's cleanup, since creation appears to be working.

[INFO] [22:43:40.139] [scala-execution-context-global-44] o.b.d.w.g.dao.HttpGoogleBillingDAO - retry-able operation failed: retries remain but predicate failed, not retrying
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "The caller does not have permission",
    "reason" : "forbidden"
  } ],
  "message" : "The caller does not have permission",
  "status" : "PERMISSION_DENIED"
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.$anonfun$executeGoogleCall$1(GoogleUtilities.scala:70)
	at scala.util.Try$.apply(Try.scala:209)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.executeGoogleCall(GoogleUtilities.scala:70)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.executeGoogleCall$(GoogleUtilities.scala:67)
	at org.broadinstitute.dsde.workbench.gpalloc.dao.HttpGoogleBillingDAO.executeGoogleCall(HttpGoogleBillingDAO.scala:40)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.executeGoogleRequest(GoogleUtilities.scala:51)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.executeGoogleRequest$(GoogleUtilities.scala:50)
	at org.broadinstitute.dsde.workbench.gpalloc.dao.HttpGoogleBillingDAO.executeGoogleRequest(HttpGoogleBillingDAO.scala:40)
	at org.broadinstitute.dsde.workbench.gpalloc.dao.HttpGoogleBillingDAO.$anonfun$googleRq$1(HttpGoogleBillingDAO.scala:251)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$2$$anon$5.block(ExecutionContextImpl.scala:76)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
	at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$2.blockOn(ExecutionContextImpl.scala:71)
	at scala.concurrent.package$.blocking(package.scala:142)
	at org.broadinstitute.dsde.workbench.google.GoogleUtilities.$anonfun$retryWhen500orGoogleError$3(GoogleUtilities.scala:42)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:653)
	at scala.util.Success.$anonfun$map$1(Try.scala:251)
	at scala.util.Success.map(Try.scala:209)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:287)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:140)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1021)
	at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1046)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Rawls needs an endpoint to assume control of an already created billing project

This is work in Rawls but I figured we should track it here.

NOTE: And an endpoint to forget it ever assumed control of it.

I thought this might be difficult, but the Doge says: "Assuming project is created right, 1 row in rawls db, create resource in sam."

We should consider leaving the endpoint undocumented in Swagger / admin-only / otherwise secret.

Handle failures more gracefully

Right now we just dump the error into the log and leave the project record in whatever status it was before the error. Transition it to failed and (maybe?) delete the project too.

Add more logging around running out of projects

QA keeps reporting that gpalloc is running out of projects. Add more logging around throwing NoGoogleProjectAvailable and the subsequent calls that trigger creation of new projects so we can verify that it's at least trying to keep up with the load.
