cognoml's Introduction

Project Cognoma

Putting machine learning in the hands of cancer biologists.

Project Cognoma is an open source project to create a webapp for analyzing cancer data. We're a community-driven philanthropic project that began as a collaboration between the Greene Lab, DataPhilly, and Code for Philly. Our contributors are primarily based in the Philadelphia area, but anyone anywhere is welcome. This GitHub repository is the administrative and informational home of Cognoma.

The Meetup phase of Cognoma is now complete! The Childhood Cancer Data Lab of Alex's Lemonade Stand Foundation will provide long-term maintenance. Public contributions are still welcome through GitHub. The main priority is enhancements and bug fixes that improve http://cognoma.org. For an overview of the project, see its coverage by The Philadelphia Citizen.

Teams and Repositories

The project is composed of four teams with their own corresponding repositories:

Team Name | Repositories | Description
Cancer Data | cancer-data, genes, figshare | processing the underlying cancer data to the formats required for this project.
Machine Learning | machine-learning, cognoml | building classifiers to predict mutation status from gene expression data.
Backend | core-service, task-service, ml-workers, infrastructure | creating the infrastructure to power the webapp and glue the components together.
Frontend | frontend, uiux | building the webapp that users interact with.

New Here?

If you are a new user and would like to get involved, please introduce yourself. Contributions are made through GitHub, so if you are unfamiliar with git or GitHub, check out the sandbox for a place to learn by doing.

Meetup Schedule

We hold project meetups. Our usual meeting spot is at Industrious (where CandiDate is located). The address is 230 S Broad St, Floor 17, Philadelphia.

📅 Date | ⌚ Time | 🗺 Location | ℹ️ Meetup Details | 💰 Sponsor
Wednesday, October 11, 2017 | 6:00 PM | MilkBoy | DataPhilly | Alex's Lemonade Stand Foundation
Tuesday, August 15, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, July 11, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, June 27, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, May 30, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, April 25, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, April 4, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, February 28, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Monday, February 13, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, January 31, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Monday, January 16, 2017 | 9:00 AM | Philly Think Space | Frontend Only MLK Day | Volunteers from Think Company
Tuesday, January 10, 2017 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, December 20, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, December 6, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, November 15, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, November 1, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, October 18, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, October 4, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Monday, September 19, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, September 6, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, August 23, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, August 9, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, July 26, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, July 19, 2016 | 6:00 PM | CandiDate | DataPhilly | Penn Institute for Biomedical Informatics
Tuesday, July 12, 2016 | 6:00 PM | CandiDate | DataPhilly | MilkBoy
Tuesday, July 5, 2016 | 6:00 PM | CandiDate | DataPhilly | Neo Technology
Tuesday, June 28, 2016 | 6:00 PM | MilkBoy | DataPhilly / Code for Philly | MilkBoy

Contributing

Community contributions are the driving force behind Cognoma. The heatmap below shows which users have contributed to which repositories:

Contribution Heatmap

See the guidelines for contributing for more information.

Maintainers

Cognoma relies on our generous community maintainers to assist with contributions. Thanks to the following maintainers for their help:

cognoml's People

Contributors

dhimmel, jessept, rdvelazquez, vasudevanv, wisygig, yl565

cognoml's Issues

Write unit tests

Testing will cover three components:

  1. Testing for data
  2. Testing for the classifier
  3. Testing for the pipeline

Possible test frameworks to use: pytest, unittest, nose
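
As a rough illustration of the first component (testing for data), here is a minimal sketch written so that pytest would collect it, since pytest discovers functions whose names start with test_. The check_labels helper is hypothetical and not part of cognoml; it only stands in for whatever data validation the project ends up needing.

```python
def check_labels(labels):
    """Hypothetical data check: labels must be non-empty and contain only 0 or 1."""
    return len(labels) > 0 and all(label in (0, 1) for label in labels)


def test_binary_labels_pass():
    assert check_labels([0, 1, 1, 0])


def test_empty_labels_fail():
    assert not check_labels([])


def test_non_binary_labels_fail():
    assert not check_labels([0, 2])
```

The same functions could be wrapped in a unittest.TestCase if that framework is chosen instead.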

Add worker to query classifier data to fit cognoml model

@awm33 I created this so we can track it going forward.

A given classifier task has a list of entrezids and disease types. The worker code will query for any samples that match the list of disease types and join that to the mutations table. The result will not be in a [sample_id, mutation_status] form, so the worker needs to transform it into that form and pass it to the cognoml code.
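
The transformation described above might look roughly like the sketch below. It is only an illustration in plain Python: the Sample and Mutation row shapes, the build_mutation_status function, and the rule that a sample is positive when it has a mutation in any of the listed genes are all assumptions, since the issue does not specify the table schema or the exact labeling rule.

```python
from collections import namedtuple

# Hypothetical row shapes; the real table and column names are not given in the issue.
Sample = namedtuple("Sample", ["sample_id", "disease"])
Mutation = namedtuple("Mutation", ["sample_id", "entrez_gene_id"])


def build_mutation_status(samples, mutations, diseases, entrez_ids):
    """Return a list of (sample_id, mutation_status) pairs.

    A sample is kept if its disease is in `diseases`; its status is 1 if it
    has a mutation in any gene listed in `entrez_ids`, otherwise 0.
    """
    diseases = set(diseases)
    entrez_ids = set(entrez_ids)
    # Sample IDs with at least one mutation in a gene of interest.
    mutated = {m.sample_id for m in mutations if m.entrez_gene_id in entrez_ids}
    return [
        (s.sample_id, int(s.sample_id in mutated))
        for s in samples
        if s.disease in diseases
    ]


# Illustrative data only.
samples = [
    Sample("S1", "LUAD"),
    Sample("S2", "LUAD"),
    Sample("S3", "BRCA"),
]
mutations = [Mutation("S1", 7157), Mutation("S2", 1956), Mutation("S3", 7157)]

rows = build_mutation_status(samples, mutations, diseases=["LUAD"], entrez_ids=[7157])
print(rows)  # [('S1', 1), ('S2', 0)]
```

Samples with no matching mutation are kept with status 0, which corresponds to a left join from samples to mutations rather than an inner join.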

Use OrderedDict in performance processing

Dicts do not preserve ordering and the output is in JSON form, so we may end up seeing erroneous diffs if ordering is not preserved. This is just needed in the get_results method in the CognomlClassifier class.
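
One way this could look, as a sketch only: get_results_sketch below is a hypothetical stand-in for the real get_results method, and the metric names and values are made up for illustration. Sorting the keys before building the OrderedDict gives the serialized JSON a fixed order from run to run.

```python
import json
from collections import OrderedDict


def get_results_sketch(metrics):
    """Return metrics as an OrderedDict with a fixed (sorted) key order.

    `metrics` is any mapping of metric name to value.
    """
    return OrderedDict(sorted(metrics.items()))


# Illustrative values only.
results = get_results_sketch({"f1": 0.8, "auroc": 0.91, "accuracy": 0.85})
text = json.dumps(results, indent=2)
print(text)
```

If only the serialized output matters, passing sort_keys=True to json.dumps is an alternative that sorts keys at serialization time.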

Create base data class

CognomlData assumes too much about the size and shape of the data sets. We need to create a base data class with some basic methods, then let additional classes inherit those methods.
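
A possible shape for such a hierarchy is sketched below. BaseData, InMemoryData, load, and n_samples are hypothetical names chosen for illustration; the issue does not say which methods the base class should expose.

```python
from abc import ABC, abstractmethod


class BaseData(ABC):
    """Hypothetical base class: shared helpers, no assumptions about data shape."""

    @abstractmethod
    def load(self):
        """Return (features, labels) as two equal-length sequences."""

    def n_samples(self):
        # Shared helper that every subclass inherits.
        features, _ = self.load()
        return len(features)


class InMemoryData(BaseData):
    """Hypothetical subclass that wraps data already held in memory."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self._features = features
        self._labels = labels

    def load(self):
        return self._features, self._labels


data = InMemoryData(features=[[0.1, 2.3], [1.4, 0.2]], labels=[1, 0])
print(data.n_samples())  # 2
```

Because load is abstract, BaseData itself cannot be instantiated; each concrete data source only has to implement load to pick up the shared helpers.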

Explore optimization of machine learning pipeline

The current classifier pipeline takes a long time to fit and may be fitting the same model multiple times. This should be looked at in hopes of finding some low-hanging fruit in performance gains.

Write integration tests

We have a separate issue for unit tests; integration tests will be more complicated.

We need to create a "test" data set, run it, and check it against expected output.
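
The idea could be sketched as a comparison against a stored expected result. Everything here is an assumption made for illustration: run_pipeline is a trivial stand-in for the real pipeline, and TEST_DATA and EXPECTED_JSON stand in for fixture files that would be checked into the repository.

```python
import json

# Small, fixed "test" data set: (sample_id, mutation_status) pairs.
TEST_DATA = [("S1", 1), ("S2", 0), ("S3", 1), ("S4", 0)]

# Expected output, stored as text the way a checked-in fixture file would be.
EXPECTED_JSON = '{"n_negative": 2, "n_positive": 2, "n_samples": 4}'


def run_pipeline(data):
    """Hypothetical stand-in for the real pipeline; returns summary results."""
    n_positive = sum(status for _, status in data)
    return {
        "n_samples": len(data),
        "n_positive": n_positive,
        "n_negative": len(data) - n_positive,
    }


def test_pipeline_matches_expected_output():
    actual = run_pipeline(TEST_DATA)
    assert actual == json.loads(EXPECTED_JSON)
```

For a real classifier, exact equality would likely need to give way to a tolerance on floating-point metrics, and the random seed would need to be fixed so the run is reproducible.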
