Help with creating a cohort (cohorts repository, open issue)

js2dark commented on September 22, 2024
Help with creating a cohort

from cohorts.

Comments (14)

jburos commented on September 22, 2024

Hi @js2dark - this is a great use case for cohorts, and we are happy to help.

We have some worked examples of how to combine VCFs and clinical data into a cohort object. One uses TCGA data and includes some explanatory text; it references an earlier example that creates a cohort from clinical data only.

In all of these examples, the basic approach is the same: you loop over the units in your cohort (i.e. patients), creating a Patient object for each one. You then pass this list of Patients to create the Cohort object.
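As a rough sketch of that loop, the snippet below turns each clinical row into the keyword arguments a Patient would receive. It deliberately avoids importing cohorts so it runs anywhere; the `patient_kwargs` helper, the `patient_id` column, and the sample values are assumptions made for illustration, not part of the library.

```python
# Hypothetical clinical table, one dict per patient. The column names mirror
# the ones used in this thread, but the values are made up for illustration.
clinical_rows = [
    {"patient_id": "patient_1", "OS in days": 410, "is_deceased": True},
    {"patient_id": "patient_2", "OS in days": 95, "is_deceased": False},
]


def patient_kwargs(row):
    """Map one clinical row to keyword arguments for a Patient (hypothetical helper)."""
    return {
        "id": row["patient_id"],
        "os": row["OS in days"],
        "deceased": row["is_deceased"],
        # Keep the full row around, as the Patient example in this thread does.
        "additional_data": dict(row),
    }


# One entry per patient, in the same order as the clinical table.
all_patient_kwargs = [patient_kwargs(row) for row in clinical_rows]

# With cohorts installed, the remaining steps would be roughly:
#   patients = [Patient(**kwargs) for kwargs in all_patient_kwargs]
#   ...then build the Cohort from `patients`, as in the worked examples.
```

Keeping the row-to-arguments mapping in one small function makes it easy to inspect what each Patient will receive before any cohorts code runs.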

Neither of the examples above uses BAMs. To include them, create a Sample for each of your samples (tumor and/or normal) when you create a Patient, then pass those Sample objects in when creating the Patient.

For example:

normal_sample = Sample(
    is_tumor=False,
    bam_path_dna=bam_path_dna_normal)
tumor_sample = Sample(
    is_tumor=True,
    bam_path_dna=bam_path_dna_tumor,
    bam_path_rna=bam_path_rna_tumor,
    kallisto_path=kallisto_path,
    cufflinks_path=cufflinks_path)

These are then passed to the Patient object when it is instantiated:

patient = Patient(id=patient_id,
    benefit=row["is_benefit"],
    os=row["OS in days"],
    pfs=row[pfs_col],  # Depends on RECIST choice
    deceased=row["is_deceased"],
    progressed=row["is_progressed"],
    progressed_or_deceased=row["is_progressed_or_deceased"],
    hla_alleles=row["hla_allele_list"],
    vcf_paths=snv_vcf_paths,
    normal_sample=normal_sample,  # <- here
    tumor_sample=tumor_sample,     # <- and here
    additional_data=row.to_dict())
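Since a patient may have a tumor sample, a normal sample, or both, one way to handle this is to assemble the Sample arguments only for the files that actually exist. The sketch below is an assumption-laden illustration: `sample_kwargs`, `samples_for_patient`, and the `normal_dna` / `tumor_dna` / `tumor_rna` keys are hypothetical names, and only the `is_tumor`, `bam_path_dna`, and `bam_path_rna` arguments come from the example above.

```python
def sample_kwargs(is_tumor, bam_path_dna=None, bam_path_rna=None):
    """Collect Sample-style keyword arguments, leaving out paths that were not given."""
    kwargs = {"is_tumor": is_tumor}
    if bam_path_dna is not None:
        kwargs["bam_path_dna"] = bam_path_dna
    if bam_path_rna is not None:
        kwargs["bam_path_rna"] = bam_path_rna
    return kwargs


def samples_for_patient(files):
    """Return kwargs for the normal and/or tumor sample, depending on which BAMs exist.

    `files` is a hypothetical dict with optional keys "normal_dna",
    "tumor_dna" and "tumor_rna", each holding a BAM path.
    """
    result = {}
    if files.get("normal_dna"):
        result["normal_sample"] = sample_kwargs(False, bam_path_dna=files["normal_dna"])
    if files.get("tumor_dna") or files.get("tumor_rna"):
        result["tumor_sample"] = sample_kwargs(
            True, files.get("tumor_dna"), files.get("tumor_rna")
        )
    return result


# With cohorts installed, each entry would become Sample(**kwargs) and the
# results would be passed to Patient(normal_sample=..., tumor_sample=...).
```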

NB: these examples are taken from code we used recently to analyze data from a cohort. That code is possibly a more complete example, but beware that it used an earlier version of cohorts, so some options may have changed since then.

Hope this gives you a good starting point. Feel free to get in touch if you run into sticking points or have feedback on the documentation. Admittedly, we need to do more on that front and make these examples easier to find.

js2dark commented on September 22, 2024

jburos commented on September 22, 2024

@js2dark happy to hear that. Sorry, the error you are seeing is my fault: the syntax changed in the latest version to variants=[vcf_path1,...]

Apologies.

js2dark commented on September 22, 2024

jburos commented on September 22, 2024

Hi @js2dark / Jason,

This looks great - happy to hear you're getting these results to run, albeit partially. Yes, the predicted neoantigen piece requires HLA types for each patient. You would need to infer these from your WES / WGS sequencing, or know them for your patients by some other means.

You would pass this information to the Patient objects as a list of HLA types, much as you did for other features.

Just to be clear, this would look something like the following:

Patient(id="",
    hla_alleles=['A*01:01',
                 'A*24:02',
                 'B*08:01',
                 'B*15:17',
                 'C*07:01',
                 'C*07:01'],
    ... )
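If the HLA types arrive as one delimited string per patient (for example, in a spreadsheet column), they need to be split into a list like the one above before being passed along. The following is a minimal sketch under that assumption; `parse_hla_alleles` is a hypothetical helper, and its pattern is a simplified check covering only class I names written like `A*01:01`, not the full nomenclature and not what mhcnames itself does.

```python
import re

# Simplified pattern for class I alleles written like "A*01:01". This is an
# assumption for illustration; real nomenclature has more loci and fields.
_ALLELE_RE = re.compile(r"^[ABC]\*\d{2,3}:\d{2,3}$")


def parse_hla_alleles(raw):
    """Split a comma/semicolon/space-delimited string into a list of allele names."""
    alleles = [a for a in re.split(r"[,;\s]+", raw) if a]
    bad = [a for a in alleles if not _ALLELE_RE.match(a)]
    if bad:
        raise ValueError("Unrecognized HLA allele(s): %s" % ", ".join(bad))
    # Duplicates are kept on purpose: a homozygous patient lists an allele twice.
    return alleles
```

The resulting list could then be passed as `hla_alleles=...` when creating the Patient.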

js2dark commented on September 22, 2024

jburos commented on September 22, 2024

I'm going to see if I can reproduce this error you're getting - will get back to you. Thanks!

jburos commented on September 22, 2024

If I'm in a new Python 3.5.2 session with mhcnames v 1.2.0, I see the same thing you're seeing:

Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mhcnames.parsing_helpers import AlleleParseError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AlleleParseError'

It looks like in v 1.2.0 this should read:

from mhcnames import AlleleParseError
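For code that has to run against either layout, one option is to probe both import locations. This is only a sketch of that idea, not how cohorts resolved the problem; `find_allele_parse_error` is a hypothetical helper, and the two module paths are simply the ones mentioned in this thread.

```python
import importlib


def find_allele_parse_error():
    """Return AlleleParseError from whichever mhcnames location provides it, else None."""
    # Try the top-level package first, then the older parsing_helpers module.
    for module_name in ("mhcnames", "mhcnames.parsing_helpers"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or mhcnames itself) is not available; try the next one.
            continue
        if hasattr(module, "AlleleParseError"):
            return getattr(module, "AlleleParseError")
    return None


AlleleParseError = find_allele_parse_error()
```

A result of `None` means neither location offered the name, which usually points to a missing or mismatched mhcnames install.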

jburos commented on September 22, 2024

@js2dark can you send us a traceback from this error you're getting when you have a chance? This will help us determine where in the code this is coming up. Thanks so much!

jburos commented on September 22, 2024

@js2dark this issue should be fixed in the latest version of cohorts. It was caused by a conflict between the latest versions of mhctools and mhcnames.

If you do pip install git+git://github.com/hammerlab/cohorts it should be resolved. Thanks for the feedback, and please let us know if you continue to run into issues.

js2dark commented on September 22, 2024

tavinathanson commented on September 22, 2024

Hey @js2dark,

Apologies for this being a bit confusing, but you'll actually need to use the versions of mhctools and mhcnames that cohorts now requires, rather than upgrading to the latest versions of both. @jburos recently made a change in cohorts to pin mhcnames to 0.1.0 to solve this automatically.

If you pip install -r requirements.txt in cohorts, does that resolve the issue?
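To confirm that the environment matches the pinned versions after installing, the installed package metadata can be compared against the pins. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); `find_pin_mismatches` is a hypothetical helper, and the only pin taken from this thread is mhcnames 0.1.0.

```python
from importlib import metadata


def find_pin_mismatches(pins):
    """Return {package: (wanted, installed_or_None)} for every pin not satisfied."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# An empty dict means every listed pin is satisfied.
print(find_pin_mismatches({"mhcnames": "0.1.0"}))
```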

Tavi

js2dark commented on September 22, 2024

tavinathanson commented on September 22, 2024

Hey @js2dark, mhctools (and therefore cohorts) expects that you have the NetMHC* tools (e.g. NetMHCcons) installed. We can't install those for you for license reasons, but the download page is at: www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCcons.

You can also configure cohorts to use other tools (via mhctools), including our open source tool, https://github.com/hammerlab/mhcflurry.

Does that help?
