
core-service's Introduction

Cognoma core-service

This repository, part of Project Cognoma (https://github.com/cognoma), holds the open-source code for a runnable Django REST API, one component of the overall Cognoma system.

Getting started

Make sure to fork this repository on GitHub first.

Prerequisites

  • Docker - tested with 1.12.1
  • Docker Compose - tested with 1.8.0

Starting up the service

docker-compose up

Sometimes the Postgres image takes a while to load on first run and the Django server starts up first. If this happens, just press Ctrl+C and rerun docker-compose up.

The code in this repository is also mounted as a volume in the core-service container. This means you can edit code on your host machine, using your favorite editor, and the Django server will automatically restart to reflect the code changes.

The server should start up at http://localhost:8080/; see the API docs.

Swagger UI

Accessing the root API endpoint (e.g. http://localhost:8080/) will bring up the Swagger UI for viewing the API.

Note: Swagger will only display endpoints that you are authorized to view. To authenticate, go to the top right corner and click Authorize. Where it says api_key, type Bearer <your_random_slug_here> and press enter to authenticate for the rest of the session.
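Outside of Swagger, the same token can be supplied on the command line. For example, assuming the token is passed in the Authorization header (as the Bearer prefix suggests), a request might look like:

curl -H "Authorization: Bearer <your_random_slug_here>" http://localhost:8080/classifiers/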

Running tests locally

Make sure the service is up first using docker-compose up then run:

docker-compose exec core python manage.py test

Loading cancer static data

To load data, again with the service up, run:

docker-compose exec core bash
python manage.py acquiredata
python manage.py loaddata

To verify, run curl http://localhost:8000/diseases/ to get a list of all diseases.

Or, run curl http://localhost:8000/samples?limit=10 to view data for 10 samples.

Deployment

Prerequisites

This project is deployed within the Greene Lab AWS account. To be able to deploy this project you will need to:

  1. Be invited to the account.
  2. Receive an AWS access key and secret key.

Logging Into ECR

This project leverages the AWS EC2 Container Service (ECS). ECS provides a private container registry called the EC2 Container Registry (ECR). To push Docker images to this registry you will first need to get a login with:

aws ecr get-login --region us-east-1

and then run the output of that command. It will look something like:

docker login -u AWS -p <A_GIANT_HASH> -e none https://589864003899.dkr.ecr.us-east-1.amazonaws.com

Building, Tagging, and Pushing the Container

This project uses two containers: one for Nginx and one for the core-service. You will probably be deploying only the core-service unless you have modified config/prod/nginx.conf.

Core Service Container

Run these commands:

docker build --tag cognoma-core-service .
docker tag cognoma-core-service:latest 589864003899.dkr.ecr.us-east-1.amazonaws.com/cognoma-core-service:latest
docker push 589864003899.dkr.ecr.us-east-1.amazonaws.com/cognoma-core-service:latest

Nginx Container

Run these commands:

docker build --tag cognoma-nginx --file config/prod/Dockerfile_nginx .
docker tag cognoma-nginx:latest 589864003899.dkr.ecr.us-east-1.amazonaws.com/cognoma-nginx:latest
docker push 589864003899.dkr.ecr.us-east-1.amazonaws.com/cognoma-nginx:latest

Restarting the ECS Task

Navigate to Cognoma's ECS Tasks Page and select the tasks corresponding to the container you are deploying. The tasks will have a Task Definition like cognoma-core-service:X or cognoma-nginx:X, which can be used to determine which tasks are the correct ones. Once you have selected the correct tasks, click the Stop button. The tasks will be stopped and ECS will restart them with the new version of the container you have pushed. At that point you're done.

Updating the Data

Take a look at api/management/commands/acquiredata.py. It contains two commit hashes: COMMIT_HASH for most of the data and GENES_COMMIT_HASH just for the gene data. Update these hashes to the hash of the data you want to update to, then redeploy the core-service. Once the core-service has been redeployed, SSH onto the EC2 instance and run docker ps to get a list of containers running on that instance. Find the name of the core-service container and run docker exec -it <that_name> /bin/bash. Within that shell run the following commands:

python3 manage.py acquiredata
python3 manage.py loaddata

If these complete successfully then the data has been downloaded and loaded into cognoma's database.

core-service's People

Contributors

abeedvisram, aelkner, amrox, awm33, cgreene, dcgoss, dhimmel, kurtwheeler, ramenhog, stephenshank


core-service's Issues

Endpoint/Models for "samples/examples"

Researchers will select which samples they want to include in the analysis. From a machine learning point of view, we mean which examples are relevant to the researcher. These samples will have various metadata. The GDC Data Portal [ https://gdc-portal.nci.nih.gov/search/s ] has a very nice interface for these metadata. Essentially the facets on the left for "cases" are the same ones that we would expect to be relevant here.

Switching to conda to manage the Python environment

Currently (bee0519), django-python specifies using virtualenv to manage the environment. I'm wondering if it makes sense to switch to conda.

I've been using conda for managing my Python environment for over a year and it's pretty awesome. It does a great job quickly installing advanced scientific libraries.

If people think this makes sense, I'm happy to submit a pull request. We should also choose which version of python we would like to use. I think Python 3.5.1 is the natural choice, but conda makes switching your python version easy.

Finally, does anyone have advice on whether we should use a project wide environment or whether each component repository should specify its own environment?

Add filtering and field selection to /samples

In response to the needs presented in cognoma/cancer-data#29 we need to add the ability for the frontend to get a list of samples related to a given mutated gene.

Example:
/samples?mutations__gene=1234&mutations__status=true&fields=sample_id

Looks like the SearchFilter class supports filtering related models. The dynamic fields library can be used to filter fields. Note: Even though the examples use ModelSerializer, it should work with the regular Serializer used in this repo.
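A rough sketch of what this filtering could look like at the view level; the import paths, serializer, and query parameter handling here are illustrative assumptions, not the repo's actual code:

# Hypothetical sketch of related-model filtering for /samples.
from rest_framework import generics
from api.models import Sample                 # model exists in the api app; import path assumed
from api.serializers import SampleSerializer  # assumed module path

class SampleList(generics.ListAPIView):
    serializer_class = SampleSerializer

    def get_queryset(self):
        queryset = Sample.objects.all()
        gene = self.request.query_params.get('mutations__gene')
        status = self.request.query_params.get('mutations__status')
        if gene is not None:
            # filter through the related mutations model
            queryset = queryset.filter(mutations__gene_id=gene)
        if status is not None:
            queryset = queryset.filter(mutations__status=(status == 'true'))
        return queryset.distinct()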

Update analysis finished email

Update text to read:

Your Cognoma classifier is complete. The results are available as a Jupyter notebook. Please see classifier_120.ipynb.

This classifier predicts mutations in the following genes: XXX, XXX, XXX. Cancers of the following types were included: XX, XX, XX. In total, XX of XX cancers were mutated for at least one of the query genes.

See cognoma/frontend#132 for more details

Consolidate core-service and task-service

Following discussion on cognoma/task-service#17 and with @cgreene, I believe it makes sense to move forward with consolidating core-service and task-service. I am currently wrapping up a PR which does just that, by means of transferring the relevant TaskDef and Task columns from task-service directly onto the Classifier in core-service.

Gene Search

The Cognoma team would like to provide some search functionality to assist the user during gene selection.

Options include:

  • Simple LIKE operation on the field(s) to be searched
  • Postgres full text search
  • Elasticsearch

The Greene Lab has already used Elasticsearch with the django-genes model that we're including in this project. They used Haystack, a Django search library with Elasticsearch support.

There is also a Haystack library for Django REST Framework.

If the search is simple, or doesn't need full-text capabilities, then LIKE is the simplest solution.

Postgres full text search would mean one less moving part. Here is a good blog post on setting it up with Django REST Framework.

Elasticsearch has already been used with this model and provides advanced search capabilities, but would mean running another server. AWS does have a hosted Elasticsearch solution, and setup can be pretty simple. I've also used Elasticsearch for a lot of things, including searching medical records.
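To make the trade-off concrete, here is a minimal sketch of the first two options as Django queries. The field names (symbol, description) and the Gene import path are assumptions about the django-genes model, and query stands for the user's search string:

from genes.models import Gene  # django-genes model; import path assumed

# Option 1: simple LIKE/ILIKE-style matching
Gene.objects.filter(symbol__icontains=query)

# Option 2: Postgres full text search (django.contrib.postgres, available in Django 1.10+)
from django.contrib.postgres.search import SearchVector, SearchQuery
Gene.objects.annotate(
    search=SearchVector('symbol', 'description'),
).filter(search=SearchQuery(query))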

I think we should start by getting some examples.

@cgreene @dhimmel @gwaygenomics Can you provide some example queries? Literally what a user would be typing, along with how you think it would connect to the gene model, like "this person is typing in part of a standard name" or something similar. CC'ing @BobMiller @bdolly

Jobs fail due to memory usage

When fitting models on all disease types, it's common for the job to exceed its memory allotment and fail.

We can increase the instance size as a first step. If that becomes cost prohibitive, we can consider changes to our dask-searchcv configuration. Currently, we use the default cache_cv=True:

Whether to extract each train/test subset at most once in each worker process, or every time that subset is needed. Caching the splits can speedup computation at the cost of increased memory usage per worker process. If True, worst case memory usage is (n_splits + 1) * (X.nbytes + y.nbytes) per worker. If False, worst case memory usage is (n_threads_per_worker + 1) * (X.nbytes + y.nbytes) per worker.

This really speeds things up, so setting cache_cv=False would not be ideal.
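If memory does become the binding constraint, the change described above would look roughly like this. The estimator and parameter grid are placeholders, not the ml-worker's actual configuration:

from sklearn.linear_model import SGDClassifier
from dask_searchcv import GridSearchCV

search = GridSearchCV(
    SGDClassifier(),                 # placeholder estimator, not the actual pipeline
    {'alpha': [1e-4, 1e-3, 1e-2]},   # placeholder parameter grid
    cache_cv=False,                  # default is True; False trades speed for lower per-worker memory
)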

Algorithms Model/Endpoint

Cognoma will need to be able to return a list of the supported algorithms, as well as the characteristics of those algorithms relevant for the algorithm selector.
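A minimal sketch of what such a model and read-only endpoint could look like. The names and the example characteristic field are assumptions, not an agreed design:

from django.db import models
from rest_framework import serializers, viewsets

class Algorithm(models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    supports_regularization = models.BooleanField(default=False)  # example characteristic for the selector

class AlgorithmSerializer(serializers.ModelSerializer):
    class Meta:
        model = Algorithm
        fields = ('id', 'name', 'description', 'supports_regularization')

class AlgorithmViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Algorithm.objects.all()
    serializer_class = AlgorithmSerializer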

docker-compose up error on Windows

I'm using docker for windows on Windows 10. The "hello world" on docker for windows works fine but when I try to run "docker-compose up" from this repo it doesn't seem to work. I posted the error message below. This is likely user error but I figured I'd post this issue in case anyone has seen this before or has any advice. Thanks in advance.

Starting coreservice_core_db_1 ...
Starting coreservice_core_db_1 ... done
Starting coreservice_core_1 ...
Starting coreservice_core_1 ... done
Starting coreservice_nginx_1 ...
Starting coreservice_nginx_1 ... done
Attaching to coreservice_core_db_1, coreservice_core_1, coreservice_nginx_1
core_db_1 | LOG: database system was shut down at 2017-09-01 22:13:40 UTC
core_db_1 | LOG: MultiXact member wraparound protections are now enabled
core_1 | /bin/bash: -: invalid option
core_db_1 | LOG: database system is ready to accept connections
core_1 | Usage: /bin/bash [GNU long option] [option] ...
core_db_1 | LOG: autovacuum launcher started
core_1 | /bin/bash [GNU long option] [option] script-file ...
core_1 | GNU long options:
core_1 | --debug
core_1 | --debugger
core_1 | --dump-po-strings
core_1 | --dump-strings
core_1 | --help
core_1 | --init-file
core_1 | --login
core_1 | --noediting
core_1 | --noprofile
core_1 | --norc
core_1 | --posix
core_1 | --rcfile
core_1 | --restricted
core_1 | --verbose
core_1 | --version
core_1 | Shell options:
core_1 | -ilrsD or -c command or -O shopt_option (invocation only)
core_1 | -abefhkmnptuvxBCHP or -o option
coreservice_core_1 exited with code 2

Automated Testing

We should set up a continuous integration framework to automatically run tests. I am at a conference and study section this week but can look into this next weekend, or if someone else wants to snag this task - feel free to assign to yourself!

Need to use an application server like uwsgi

Hi team,

One thing I noticed is that the backend is using the built-in development server that comes with Django.
It's not suitable for a production environment; you can get better performance by using an application server like uWSGI.
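For reference, a typical uWSGI invocation for a Django project looks something like the following. The WSGI module path here is a guess, not this repo's actual settings:

uwsgi --http :8080 --module cognoma.wsgi --master --processes 4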

Another thing: nginx might not be needed if you use uWSGI directly, since the backend doesn't serve static files like images and CSS. It looks like this app is used only as an API endpoint.

If you need to modify any headers in the returned response, you can do that in a middleware and have all responses include whatever headers you want.

--
Ahmed

Index user.random_slugs

The user.random_slugs field is searched every time a request authenticates a user. Because that operation is run so often, this field should be indexed. The field is a postgres array type, an array of strings. It looks like the best way to index this is using gin.

This currently uses the __contains operation built into Django. Will that operation take advantage of an index? If not, we need to somehow use one that will, even if it means passing raw SQL. Most querysets (all?) expose a query property that can be used to see what SQL was run; passing this to EXPLAIN in Postgres will tell you if the index is being used.
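A sketch of what the GIN index could look like using Django's Postgres extensions. GinIndex requires Django 1.11+; on 1.10 the index would have to be created with a raw SQL migration instead. The field definition is trimmed to the relevant column and its options are assumptions:

from django.contrib.postgres.fields import ArrayField
from django.contrib.postgres.indexes import GinIndex
from django.db import models

class User(models.Model):
    random_slugs = ArrayField(models.CharField(max_length=255), default=list)

    class Meta:
        indexes = [GinIndex(fields=['random_slugs'])]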

Authentication?

Do we need auth? If so should we use sessions, tokens, JWT, or a combo? The type will depend on what platforms cognoma can be accessed from (browser, app, etc).

Development environment requirements

So far...

python 3.5
django 1.10 (needs verification)
postgres 9.5+ (needs verification)

We will finalize and make a md or yml doc for env req/dep

python manage.py loaddata is killed mysteriously while loading mutations

While attempting to solve #56 (comment) @dhimmel and I ran python manage.py loaddata on one of the EC2 instances within the running Docker container. The script got Killed mysteriously. Here is the output of said script:

root@core-service:/code# python manage.py loaddata                  
Loading mutations table...
Processing 1000 rows so far
Processing 2000 rows so far
Processing 3000 rows so far
Processing 4000 rows so far
Processing 5000 rows so far
Processing 6000 rows so far
Processing 7000 rows so far
Bulk loading mutation data...
Killed
root@core-service:/code# $?
bash: 137: command not found

We researched what exit code 137 means: it corresponds to a SIGKILL (128 + signal 9). We cannot determine what would be sending that signal. @dhimmel thinks it may be caused by running out of memory; however, we monitored memory usage during execution and it never exceeded 23%. We tried executing this command multiple times, and in some runs it died before getting as far as it did in the output above.

This is the relevant code block where the command is getting murdered.

We used the API to inspect the number of diseases, samples, and genes and those tables all seem to have been populated successfully.

@awm33 @stephenshank any ideas?

Determine branching pattern.

We should develop a common branching pattern with the front end so that someone working on one repo will be familiar with the patterns on the other.

/genes/ sluggish response

Accessing the /genes/ endpoint takes a long time because the server has to process 100 gene objects, each with all of its related mutations. This can be resolved by reducing pagination page sizes.
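One way to do that would be a smaller default page size on this endpoint, for example via a dedicated pagination class. The class name and limits below are illustrative, not a settled choice:

from rest_framework.pagination import LimitOffsetPagination

class GenePagination(LimitOffsetPagination):
    default_limit = 10   # down from 100
    max_limit = 100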

Checking In

Hey all,

I've been off the grid for a few days now, I promise I will catch up. Due to some trips planned months in advance, I will be unable to attend the Tuesday meeting for the next three weeks. I will be sure to check for the updates and do my best to contribute remotely.

Cheers!
Derek

Rename the repo?

We started referring to this as the "core" or "core api". There was also the "brain" but I think that would be confusing in a ML project :).

Some ideas:

  • cognoma/core
  • cognoma/core-backend
  • cognoma/core-api
  • cognoma/core-service
  • cognoma/api

Maybe others can post some suggestions?

Improve error handling from ml-worker

When ml-worker fails to process a classifier, it hits the fail endpoint and that's that. No information is stored about what type of error it was or what caused it. Furthermore, when a user is emailed about the error, the email just says that it failed, with no reason why.

There should be two fields added to the classifier that store errors provided by ml-worker when classifier processing fails:

  • fail_reason: string of title of error, example: memory_error or processing_error
  • fail_message: string providing explanation of fail_reason, maybe a traceback

This information should be included in failure emails as well.
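A sketch of the two proposed fields on the Classifier model. The field names come from this issue; the types and max lengths are assumptions:

from django.db import models

class Classifier(models.Model):
    # ... existing fields ...
    fail_reason = models.CharField(max_length=255, null=True, blank=True)
    fail_message = models.TextField(null=True, blank=True)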

Gene Selection API needs/issues

Here are the three methods the UI will use for the user to create the desired gene list. Each has its own data/API issues.

Gene Selection - direct selection

The user needs to select genes from a list of 20,000 genes. The user will enter the first few letters of the gene identifier, and a listbox will be populated containing only genes matching those letters.
Here are three possible options for doing this:

  1. UI uses the API to request the full gene list (20,000); the server sends it to the UI. The UI is responsible for filtering the list to display. This approach may tax UI resources and be too slow for a smooth interface.
  2. UI uses the API to request only genes starting with a string of letters; the server returns a filtered list to the UI. This happens multiple times as the user narrows the list, so quick response from the server is critical for smooth performance (see the sketch after this list).
  3. UI uses the API to request the full ordered gene list; the server returns not only the full list but also an index of it. The index contains the starting location of every two-letter combination. The UI then uses the index to quickly filter the full gene list and populate the listbox.
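A minimal sketch of the server side of option 2, as it might appear inside a view. The query parameter name, field name, and Gene import path are assumptions:

from genes.models import Gene  # django-genes model; import path assumed

prefix = request.query_params.get('starts_with', '')
genes = Gene.objects.filter(symbol__istartswith=prefix).order_by('symbol')[:50]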

Gene Selection - selection by path

In selection by path, a listbox is populated with known pathways; the user selects a path, and a second listbox is populated containing only those genes associated with the selected path. The user then selects some or all of the desired genes from this second listbox.

To initialize this screen, an API will be needed to request the path list. The server should return the list of known paths.

Once a user selects a path, an API is needed to request only those genes associated with this path. The server should return a gene list.

Gene Selection - custom query

This is the least defined of the three gene selection methods. In this method, the UI will provide the user with a query-building GUI in which the user will construct a custom gene selection query. An API will submit the custom query/criteria to the server. The server will return a gene list.

Swagger Docs

If we document the API using swagger, generating docs and API clients would be easy.

Internal API Auth

This issue is needed by task-service as well. Basically, we need internal processes, such as workers or the services, to be able to talk to each other and make POST/PUT calls. The current UI only lets users do this, and POST/PUTs to the task-service will not be allowed from the public.

The most basic thing would be to just hardcode something in the environment. But I think digitally signed tokens / JSON Web Tokens using a public/private key pair would be best.

The signed tokens would be created using a python script and access to a private key. The running services would have access to a public key to verify tokens and each would have a token.
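A minimal sketch of that flow using PyJWT with an RSA key pair. PyJWT is one option rather than a settled choice, and the claim names and key variables are placeholders:

import jwt

# Offline script, with access to the private key, issues one token per service:
token = jwt.encode({'service': 'ml-worker'}, private_key_pem, algorithm='RS256')

# Running services hold only the public key and verify incoming tokens:
claims = jwt.decode(token, public_key_pem, algorithms=['RS256'])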

Failed to hit internal service for: post /classifiers/143/upload/

On February 16, I submitted a classifier which failed. The title of the email was "Cognoma Classifier 143 Processing Failure" and the body was:

An error has occurred and your classifier could not be processed.
Error: Failed to hit internal service for: post /classifiers/143/upload/
Support is available at https://github.com/cognoma.

Loading cancer static tables

Ideally some sort of script that can load the data on the local machine and in production.

  • genes - comes from django-genes
  • organisms - comes from django-organisms
  • diseases
  • samples
  • mutations

Status/'y' selector

This is a very brief description that we'll need to flesh out more, particularly with help from @gwaygenomics:

Cognoma will provide the opportunity to construct a supervised machine learning model that predicts a feature of interest. For example, mutations of genes in a specific pathway.

We anticipate that this endpoint will allow the user to specify a set of genes and samples. The endpoint would return the number of samples that contain alterations within that set.
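As a sketch of what that count could look like with the Django ORM; the model and field names are assumptions based on the existing /samples filters, and the selected id lists are placeholders:

from api.models import Sample  # import path assumed

mutated_count = (
    Sample.objects
    .filter(
        id__in=selected_sample_ids,
        mutations__gene_id__in=selected_gene_ids,
        mutations__status=True,
    )
    .distinct()
    .count()
)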

Task creation and expansion

When a classifier is created, we need to queue a task in the task server. When classifier objects are retrieved or listed using the REST API, the task should be available for expansion as a child.

Looking at using this RemoteField library, or potentially rolling my own, since this doesn't look compatible with djangorestframework-expander, which works well. Might be a good opportunity for a mini open source lib of my own. It would really help for using Django REST Framework for microservices.

@dbolly @BobMiller Is the frontend going to keep the classifier in memory before it's ready to go, or persist it before then? If it needs to be persisted before then, we need to track the frontend state with a new field: basically, whether it's ready to be queued. You may also want to track more states for the wizard-like UI, so the user can pick up where they left off.

CORS settings need to be adjusted for the local dev environment

@dcgoss the front-end in a local dev environment runs from http://localhost:3000, so requests to the core-service running locally via Docker at http://localhost:8080 are not from the same origin, and I think that is causing a CORS issue with the POST /classifier request.

Angular sends a pre-flight OPTIONS request that is throwing the error below, preventing the POST request from going through.
[screenshots of the browser console error omitted]
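One likely fix is django-cors-headers, assuming we add it to the project. A settings sketch for the local dev origin, using that package's conventional setting names from this era, would look roughly like:

# In settings.py (setting names per django-cors-headers):
INSTALLED_APPS += ['corsheaders']
MIDDLEWARE = ['corsheaders.middleware.CorsMiddleware'] + MIDDLEWARE  # placed before middleware that can generate responses

CORS_ORIGIN_WHITELIST = (
    'localhost:3000',  # local front-end dev server
)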

[HackNight] 08/09/16 Meeting minutes

Note: This is an attempt to formalize hacknight outcomes and the todo list
Note 2: I assume this project is the (maybe temporary) placeholder for backend development

HackNight reference: http://www.meetup.com/DataPhilly/events/233070705/

The following points were mentioned. Some related issues already exist, while others don't.
All attendees are warmly encouraged to discuss this summary.

  • New architecture proposal (please update the picture)
  • Need for specification: Front-end API (interface between the JavaScript UI and the "backend") Link to issue.
  • Need for specification: Asynchronous Task Queue (ATQ) (interface between the django backend and the ML module(s))
  • Need for specification: Machine Learning API (interface between the ATQ and the Machine Learning module(s))
  • Need for specification: Containers deployment (to start with, a basic microservice breakdown through Docker). We need to draw the rough lines of which software module goes where (i.e. which is an independently deployable unit, with its own, and specified, APIs).
  • Need for diagram: Database layout. The purpose of this diagram is only to map data to modules. At this time we can cope with imprecision about the scale of data.
  • Need for directive: How to contribute to django-cognoma. Link to issue

Next meeting: in 2 weeks [please confirm / update with exact date, time and location]

Notebook upload fails when user has no email

If there is no email associated with the user on a classifier, the notebook upload requests will fail. This is problematic for ml-workers.
The correct behavior should just be to fail silently.

GET from web browser location bar 500

When a user makes a request using the browser location bar or a link (e.g. to view in the browser, not via JS), the request 500s.

It looks like the renderer needs to be explicitly set up.

From @dhimmel

Maybe it could also be smart enough to add newlines when queried by a browser.

If there's a plugin or some easy way to do that, go with it, but I don't know if we want to write a custom Django REST Framework renderer to do it. You would need to detect a browser based on the User-Agent, or on text/html having a higher priority than application/json.

Django REST Framework does support the indent parameter in the Accept header.
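If we go the explicit-setup route, a sketch of the relevant settings would be the following. This enables DRF's browsable renderer alongside JSON; it is one option, not necessarily the fix we end up choosing:

REST_FRAMEWORK = {
    'DEFAULT_RENDERER_CLASSES': (
        'rest_framework.renderers.JSONRenderer',
        'rest_framework.renderers.BrowsableAPIRenderer',
    ),
}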

api.cognoma.org not returning accurate GET responses

@dhimmel @dcgoss @cgreene @awm33
Hey guys, when the front-end makes any GET request to api.cognoma.org, it returns the same response without any data being populated.

EXAMPLE:
GET https://api.cognoma.org/samples?disease=ACC
RESPONSE 200 OK {count: 0, next: null, previous: null, results: []}
It should return the disease data with an array of mutations, and the count should be the number of samples per disease, as per the API docs; this data is used in calculations in the front-end.

This same null response appears for other GET requests, such as
GET https://api.cognoma.org/samples?limit=1&disease=ACC&mutations__gene=770&mutations__gene=767&mutations__gene=4871, which is used to calculate the positives and negatives for mutations in each disease type.

Fail to run migrations from scratch using Django management, but all tests run

I've been running into a few issues this evening trying to run the migrations for this project. When running locally (using Postgres on my laptop) or via Docker, I run into the following issue:

Starting coreservice_core_db_1
Recreating coreservice_core_1
Recreating coreservice_nginx_1
Attaching to coreservice_core_db_1, coreservice_core_1, coreservice_nginx_1
core_1     | + python manage.py migrate -v3 --no-input
core_db_1  | LOG:  database system was shut down at 2017-01-18 01:56:53 UTC
core_db_1  | LOG:  MultiXact member wraparound protections are now enabled
core_db_1  | LOG:  database system is ready to accept connections
core_db_1  | LOG:  autovacuum launcher started
core_1     | Operations to perform:
core_1     |   Apply all migrations: api, contenttypes
core_1     | Running pre-migrate handlers for application contenttypes
core_1     | Running pre-migrate handlers for application rest_framework
core_1     | Running pre-migrate handlers for application api
core_1     | Running pre-migrate handlers for application organisms
core_1     | Running pre-migrate handlers for application genes
core_1     | Running migrations:
core_1     |   Rendering model states... DONE (0.004s)
core_db_1  | ERROR:  relation "genes_gene" does not exist
core_db_1  | STATEMENT:  ALTER TABLE "mutations" ADD CONSTRAINT "mutations_gene_id_6289b708_fk_genes_gene_id" FOREIGN KEY ("gene_id") REFERENCES "genes_gene" ("id") DEFERRABLE INITIALLY DEFERRED
core_1     |   Applying api.0001_initial...Traceback (most recent call last):
core_1     |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
core_1     |     return self.cursor.execute(sql, params)
core_1     | psycopg2.ProgrammingError: relation "genes_gene" does not exist
core_1     | 
core_1     | 
... <truncated>

When running the tests locally the migrations all run successfully, and all the tables are created:

$ ./manage.py test -v3
Creating test database for alias 'default' ('test_cognoma_core_service')...
Operations to perform:
  Synchronize unmigrated apps: rest_framework, organisms, staticfiles, genes, postgres
  Apply all migrations: contenttypes, api
Running pre-migrate handlers for application contenttypes
Running pre-migrate handlers for application rest_framework
Running pre-migrate handlers for application api
Running pre-migrate handlers for application organisms
Running pre-migrate handlers for application genes
Synchronizing apps without migrations:
  Creating tables...
    Creating table organisms_organism
    Creating table genes_gene
    Creating table genes_crossrefdb
    Creating table genes_crossref
    Running deferred SQL...
Running migrations:
  Rendering model states... DONE (0.011s)
  Applying api.0001_initial... OK (0.356s)
  Applying api.0002_alter_sample_fields... OK (0.048s)
  Applying api.0003_genes_mutations... OK (0.094s)
  Applying contenttypes.0001_initial... OK (0.031s)
  Applying contenttypes.0002_remove_content_type_name... OK (0.033s)
Running post-migrate handlers for application contenttypes
Adding content type 'contenttypes | contenttype'
Running post-migrate handlers for application rest_framework
Running post-migrate handlers for application api
Adding content type 'api | mutation'
Adding content type 'api | user'
Adding content type 'api | disease'
Adding content type 'api | sample'
Adding content type 'api | gene'
Adding content type 'api | classifier'
Running post-migrate handlers for application organisms
Adding content type 'organisms | organism'
Running post-migrate handlers for application genes
Adding content type 'genes | gene'
Adding content type 'genes | crossref'
Adding content type 'genes | crossrefdb'
test_cannot_update_other_user_classifier (api.test.test_classifiers.ClassifierTests) ... ok
... <truncated>

To get around this, and so that I could start playing around with the project this evening, I copied the schema from the test database and set that as my local DB. It's obviously not a proper way of doing things, but it meant that I could hack on the project a bit.

Has anybody else encountered this? I'm sure I'm missing something really obvious with regards to the different settings between running the unit tests and the management commands, but I can't seem to figure out what the issue is. Any help much appreciated.

Front end screen data needs

Here is a list of front-end screens and their associated data needs.

Login Screen

  1. User ID
  2. Password
Note: The user needs to be able to log in anonymously.

Sample Chooser/ Status Chooser Screen

Initial data input

  1. List of tissues
    1a) Tissue description (maybe?)
    1b) Tissue graphic (maybe?)
  2. List of genes
    2a) Gene description (maybe?)

Algorithm Chooser Screen

Initial data input

  1. List of algorithms
    1a) Algorithm description (maybe?)
  2. Logic info about algorithms (TBD)

Job Submit Screen

Will send you all chosen data for the job:

  1. Selected tissues
  2. Selected genes
  3. Selected algorithm
  4. User ID

Expects to receive back job status info for the submitted job(s).

Question

While the user is going from screen to screen selecting data, should we hold all data selections in the browser session and send them to you when the job request is submitted,

or

send you the results of each screen and use the data you have to populate the Job Submit screen?

Upload endpoint to store completed notebook on classifier object

When a new classifier is created, core-service creates a new task, which is then queued for processing by an ml-worker. When the ml-worker finishes processing, it needs to upload the completed notebook back to core-service, directly onto the classifier.

This uploaded notebook should be stored by core-service as a file, eventually in S3.

Steps:

  • adding a notebook_file FileField to Classifier
  • updating the Classifier serializer to handle the new notebook_file field
  • new permissions to ensure only an internal service like ml-worker can upload a notebook
  • tests for the permissions and file uploading
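A sketch of the first two steps; the storage backend (local vs. S3) is left out, and the field options and serializer shape are assumptions:

from django.db import models
from rest_framework import serializers

class Classifier(models.Model):
    # ... existing fields ...
    notebook_file = models.FileField(upload_to='notebooks/', null=True, blank=True)

class ClassifierSerializer(serializers.Serializer):
    # ... existing fields ...
    notebook_file = serializers.FileField(required=False)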
