Help with creating a cohort about cohorts HOT 14 OPEN

js2dark commented on September 22, 2024

Help with creating a cohort

Comments (14)

jburos commented on September 22, 2024

Hi @js2dark - This is a great use case for cohorts; we are happy to help.

We have some worked examples of how to combine VCFs and clinical data into a Cohort object: for example, one using TCGA data, with some explanatory text, which in turn references an earlier example that creates a cohort from clinical data only.

In all of these examples the basic approach is the same: you loop over the units in your cohort (i.e., patients), creating a Patient object for each one. You then pass this list of Patients in to create the Cohort object.

I will say, neither of the examples above uses BAMs. To include them, create a Sample object for each of your samples (tumor and/or normal), and pass those Sample objects in when you create the Patient.

For example:

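```python
from cohorts import Sample

# bam_path_dna_normal, bam_path_dna_tumor, bam_path_rna_tumor, kallisto_path
# and cufflinks_path are file paths defined earlier for this patient.
normal_sample = Sample(
    is_tumor=False,
    bam_path_dna=bam_path_dna_normal)
tumor_sample = Sample(
    is_tumor=True,
    bam_path_dna=bam_path_dna_tumor,
    bam_path_rna=bam_path_rna_tumor,
    kallisto_path=kallisto_path,
    cufflinks_path=cufflinks_path)
```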

These are then passed to the Patient object when it is instantiated:

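```python
from cohorts import Patient

# patient_id, row (one row of the clinical DataFrame), pfs_col and
# snv_vcf_paths all come from the surrounding per-patient loop.
patient = Patient(
    id=patient_id,
    benefit=row["is_benefit"],
    os=row["OS in days"],
    pfs=row[pfs_col],  # depends on RECIST choice
    deceased=row["is_deceased"],
    progressed=row["is_progressed"],
    progressed_or_deceased=row["is_progressed_or_deceased"],
    hla_alleles=row["hla_allele_list"],
    vcf_paths=snv_vcf_paths,  # newer cohorts versions: variants=[...]
    normal_sample=normal_sample,  # <- here
    tumor_sample=tumor_sample,    # <- and here
    additional_data=row.to_dict())
```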

NB: these examples are taken from code we recently used to analyze data from a cohort. I'm including that code here as a possibly more complete example, but beware that it used an earlier version of cohorts, so some options may have changed since then.

Hope this gives you a good starting point. Feel free to get in touch if you run into sticky points, or to give feedback on the documentation; admittedly we need to do more on that front, and to make these examples easier to find.
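A minimal sketch of the loop-over-patients pattern described above, assuming the clinical data is a list of dicts with one entry per patient. The column names (`os_days`, `pfs_days`, `vcf_paths`, and so on) are hypothetical, and `patient_cls` is a parameter only so the sketch runs without cohorts installed; in real use you would pass cohorts' `Patient` class. The keyword arguments mirror the ones used elsewhere in this thread (`id`, `os`, `pfs`, `deceased`, `progressed`, `benefit`, `variants`); check them against your installed cohorts version.

```python
from typing import Any, Callable, Dict, Iterable, List


def build_patients(
    rows: Iterable[Dict[str, Any]],
    patient_cls: Callable[..., Any],
) -> List[Any]:
    """Create one patient object per clinical row (hypothetical helper).

    In real use `patient_cls` would be cohorts' Patient class; it is a
    parameter here so the sketch can run without cohorts installed.
    The row keys ("id", "os_days", ...) are assumed column names.
    """
    patients = []
    for row in rows:
        patients.append(
            patient_cls(
                id=row["id"],
                os=row["os_days"],
                pfs=row["pfs_days"],
                deceased=row["deceased"],
                progressed=row["progressed"],
                benefit=row["benefit"],
                # Newer cohorts versions take a list of VCF paths as `variants`.
                variants=list(row.get("vcf_paths", [])),
            )
        )
    return patients
```

With cohorts installed, usage would look roughly like `Cohort(patients=build_patients(rows, Patient), cache_dir=cache_dir)`, where `cache_dir` is a directory of your choosing.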

js2dark commented on September 22, 2024

Hello Jacki,

Thank you so much for your response and help. I was able to successfully create Patients and combine them into a Cohort. When I made Patients with just clinical features such as OS, PFS, deceased, etc., I had no problems, but when I try to add a VCF path with either `snv_vcf_paths=...` or `vcf_paths=...`, I get `TypeError: __init__() got an unexpected keyword argument 'snv_vcf_paths'` (or `'vcf_paths'`).

I'm sorry if these are really basic questions; I'm still new to Python.

Thank you so much for your help,
Sincerely, Jason

jburos commented on September 22, 2024

@js2dark happy to hear that. Sorry, the error you're seeing is my fault: the syntax changed in the latest version to `variants=[vcf_path1,...]`.

Apologies.
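If you have older scripts that still pass `vcf_paths` or `snv_vcf_paths`, a small helper like the one below could fold them into the newer `variants=[...]` keyword before calling `Patient`. This is a hypothetical sketch, not part of cohorts; `upgrade_patient_kwargs` and `LEGACY_VCF_KWARGS` are names made up for illustration.

```python
from typing import Any, Dict

# Hypothetical list of pre-rename keyword names seen in older cohorts examples.
LEGACY_VCF_KWARGS = ("vcf_paths", "snv_vcf_paths")


def upgrade_patient_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `kwargs` with old VCF keywords merged into `variants`."""
    upgraded = dict(kwargs)
    variants = list(upgraded.pop("variants", []) or [])
    for key in LEGACY_VCF_KWARGS:
        paths = upgraded.pop(key, None)
        if paths:
            # Accept either a single path or a list of paths.
            if isinstance(paths, str):
                paths = [paths]
            variants.extend(paths)
    if variants:
        upgraded["variants"] = variants
    return upgraded
```

For example, `Patient(**upgrade_patient_kwargs(old_kwargs))` would pass the VCF paths under the new name while leaving the other fields untouched.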

js2dark commented on September 22, 2024

Hi Jacki, thank you for your help. I got the cohort to run and got results, but for `neoantigen_count` I've been getting `NaN`. The code I'm running looks like this:

```python
import pandas as pd
import numpy as np
import sys
from os import path, getcwd, environ
from cohorts import Sample, Patient, Cohort, DataFrameLoader
from cohorts.variant_stats import variant_stats_from_variant
from cohorts.functions import missense_snv_count, neoantigen_count, snv_count

patient_1 = Patient(
    id="patient_1",
    variants=["/Users/Balthazars/Desktop/Hypermutation/IRCR_GBM_352_TL_SS.mutect_rerun_filter_vep.vcf"],
    os=70, pfs=24, deceased=True, progressed=True, benefit=False)
patient_2 = Patient(
    id="patient_2",
    variants=["/Users/Balthazars/Desktop/Hypermutation/IRCR_BT15_847_T02_SS.mutect_pair_filter_vep.vcf"],
    os=100, pfs=50, deceased=True, progressed=False, benefit=True)
# print(patient_1)
# print(patient_2)

cohort = Cohort(
    patients=[patient_1, patient_2],
    cache_dir="/Users/Balthazars/Desktop/Hypermutation/Results")
df = cohort.as_dataframe(on=neoantigen_count)
# print(df)
df.to_csv(r'/Users/Balthazars/Desktop/Hypermutation/Results/results.csv',
          index=False, sep=',', mode='a')
```

Is it because of the absence of HLA alleles in my Patient objects? When I run the code it says "HLA alleles did not exist for patient patient_1", and the same for patient_2. Or is there another required file besides the VCF file? If it's due to the missing HLA alleles, is there a built-in function within cohorts for analyzing HLA alleles?

Thank you so much,
Sincerely, Jason

jburos commented on September 22, 2024

Hi @js2dark / Jason,

This looks great; happy to hear you're getting these results to run, albeit partially. Yes, the predicted neoantigen piece requires HLA types for each patient. You would need to infer these from your WES / WGS sequencing, or know them for your patients by some other means.

Unfortunately, the predicted neoantigen data do depend on the HLA types. You would pass this information to the Patient objects as a list of HLA types, much as you did for the other features.

Just to be clear, this would look something like the following:

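```python
from cohorts import Patient

patient = Patient(
    id="",
    hla_alleles=['A*01:01',
                 'A*24:02',
                 'B*08:01',
                 'B*15:17',
                 'C*07:01',
                 'C*07:01'],
    # ... plus the other fields (variants, os, pfs, etc.) as before
)
```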
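Since `neoantigen_count` comes back as NaN for patients without HLA alleles, it can help to sanity-check `hla_alleles` before building the cohort. The sketch below is hypothetical: `hla_problems` is not a cohorts function, it assumes each patient object exposes `id` and `hla_alleles` attributes, and its regular expression only accepts two-field class I names like the ones above (`A*01:01`, optionally prefixed with `HLA-`). mhcnames, which mhctools relies on for parsing, accepts many more forms, so treat a flag here as a prompt to double-check rather than a hard error.

```python
import re
from typing import Any, Dict, Iterable, List

# Deliberately loose pattern for two-field class I names such as "A*01:01"
# or "HLA-B*08:01". An assumption for illustration, not mhcnames' grammar.
_CLASS_I_ALLELE = re.compile(r"^(HLA-)?[ABC]\*\d{2,3}:\d{2,3}$")


def hla_problems(patients: Iterable[Any]) -> Dict[str, List[str]]:
    """Map patient id -> list of problems found in its `hla_alleles`."""
    problems: Dict[str, List[str]] = {}
    for patient in patients:
        alleles = getattr(patient, "hla_alleles", None)
        issues: List[str] = []
        if not alleles:
            issues.append("no HLA alleles; expect NaN from neoantigen_count")
        elif isinstance(alleles, str):
            # e.g. hla_alleles='A2' instead of a list of full allele names
            issues.append("hla_alleles should be a list, not a single string")
        else:
            issues.extend(
                f"unrecognized allele name {a!r}"
                for a in alleles
                if not _CLASS_I_ALLELE.match(a)
            )
        if issues:
            problems[patient.id] = issues
    return problems
```

Patients with no problems are left out of the result, so an empty dict means every patient passed this (loose) check.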

js2dark commented on September 22, 2024

Hi Jacki,

I got the HLA type information for the patients I'm running and annotated them with `hla_alleles='A2'` or `hla_alleles='B2'` for the corresponding patients. I've been using Python 3.6 and updated all the other packages, including mhctools, tensorflow, etc. But it seems that `base_commandline_predictor.py` in mhctools can't process `from mhcnames.parsing_helpers import AlleleParseError`. I was wondering whether the syntax for this has changed in mhcnames, or whether a different version is required to run this. My mhcnames version is 0.2.1 and mhctools is 1.5.0.

Thank you,
Sincerely, Jason

-- Jason Kyungha Sa, Ph.D Institute for Refractory Cancer Research Samsung Medical Center

jburos commented on September 22, 2024

I'm going to see if I can reproduce this error you're getting - will get back to you. Thanks!

jburos commented on September 22, 2024

If I'm in a new Python 3.5.2 session with mhcnames v1.2.0, I see the same thing you're seeing:

```
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mhcnames.parsing_helpers import AlleleParseError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AlleleParseError'
```

It looks like in v1.2.0 this should read:

```python
from mhcnames import AlleleParseError
```
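For code that has to work with both the older and newer mhcnames layouts, a common pattern is a fallback import, sketched below. This is only an illustration of the pattern; the fix adopted later in this thread was to update cohorts and use mhctools/mhcnames versions compatible with it, not to patch the import by hand.

```python
# Version-tolerant import: try the newer top-level location first, then the
# older submodule, and fall back to None if mhcnames is missing entirely.
try:
    from mhcnames import AlleleParseError
except ImportError:
    try:
        from mhcnames.parsing_helpers import AlleleParseError
    except ImportError:
        AlleleParseError = None  # mhcnames not installed, or the name moved again
```

Code using this would need to handle the `None` case, for example by skipping allele validation.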

jburos commented on September 22, 2024

@js2dark, can you send us a traceback of the error you're getting when you have a chance? This will help us determine where in the code it is coming up. Thanks so much!

jburos commented on September 22, 2024

@js2dark this issue should be fixed in the latest version of cohorts. It was caused by a conflict between the latest versions of mhctools and mhcnames.

If you run `pip install git+https://github.com/hammerlab/cohorts`, it should be resolved. Thanks for the feedback, and please let us know if you continue to run into issues.

js2dark commented on September 22, 2024

Hi Jacki,

Below is the traceback from the error I got previously:

```
Traceback (most recent call last):
  File "Neoantigen_cohorts.py", line 4, in <module>
    from cohorts import Sample, Patient, Cohort, DataFrameLoader
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/__init__.py", line 15, in <module>
    from .cohort import Cohort
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 40, in <module>
    from mhctools import NetMHCcons, EpitopeCollection
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/__init__.py", line 12, in <module>
    from .netmhc import NetMHC
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/netmhc.py", line 20, in <module>
    from .netmhc3 import NetMHC3
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/netmhc3.py", line 17, in <module>
    from .base_commandline_predictor import BaseCommandlinePredictor
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 24, in <module>
    from mhcnames.parsing_helpers import AlleleParseError
ImportError: cannot import name 'AlleleParseError'
```

I updated cohorts through the GitHub link you sent, and also updated mhctools to version 1.6.0 (from 0.3.1) and mhcnames to 0.3.0 (from 0.1.0). Now I'm getting the following error:

```
Using TensorFlow backend.
Traceback (most recent call last):
  File "Neoantigen_cohorts.py", line 4, in <module>
    from cohorts import Sample, Patient, Cohort, DataFrameLoader
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/__init__.py", line 15, in <module>
    from .cohort import Cohort
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 41, in <module>
    from mhctools import NetMHCcons, EpitopeCollection
ImportError: cannot import name 'EpitopeCollection'
```

My cohorts version is 0.6.4+14.g6926523. Do I need to use different versions of the above packages, or is there another issue?

Thank you, and I hope to hear from you soon.
Sincerely, Jason


tavinathanson commented on September 22, 2024

Hey @js2dark,

Apologies for this being a bit confusing, but you'll actually need to use the versions of mhctools and mhcnames that cohorts now requires, rather than upgrading to the latest versions of both. @jburos recently made a change in cohorts to pin mhcnames to 0.1.0 to solve this automatically.

If you run `pip install -r requirements.txt` in cohorts, does that resolve the issue?

Tavi
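To confirm which versions actually ended up installed after `pip install -r requirements.txt`, a quick check like the sketch below can help. It uses `importlib.metadata`, which is only in the standard library from Python 3.8 onward; on the Python 3.6 setup in this thread, `pip show mhctools mhcnames` gives the same information. The package list is an assumption based on the names mentioned here; compare the output against cohorts' `requirements.txt`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Package names taken from this thread; the expected pins live in cohorts'
# requirements.txt and are not reproduced here.
PACKAGES = ("cohorts", "mhctools", "mhcnames", "varcode", "topiary")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if not installed}."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```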

js2dark commented on September 22, 2024

Hi Tavi,

I ran the commands and fixed the versions to:

```
provenance_file_summary': {'cohorts': '0.5.5', 'isovar': '0.7.0', 'mhctools': '0.3.1', 'numpy': '1.13.0', 'pandas': '0.20.3', 'pyensembl': '1.0.3', 'scipy': '0.19.1', 'topiary': '0.1.2', 'varcode': '0.5.15'}}
```

but I'm getting the following errors:

```
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 137, in __init__
    run_command([self.program_name])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/process_helpers.py", line 74, in run_command
    process = AsyncProcess(args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/process_helpers.py", line 47, in __init__
    self.process = Popen(args, stdout=stdout, stderr=stderr)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1326, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'netMHCcons'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Neoantigen_cohorts.py", line 14, in <module>
    df = cohort.as_dataframe(on=neoantigen_count)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 367, in as_dataframe
    return apply_func(on, func_name(on), df).return_self(return_cols)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 355, in apply_func
    df[col] = df.progress_apply(func, axis=1) ## depends on tqdm on prev line
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tqdm/_tqdm.py", line 530, in inner
    result = getattr(df, df_function)(wrapper, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tqdm/_tqdm.py", line 526, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 351, in <lambda>
    func = lambda row: on(row=row, cohort=self, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/functions.py", line 41, in wrapper
    **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/functions.py", line 58, in wrapper
    **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/functions.py", line 230, in neoantigen_count
    **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 977, in load_neoantigens
    filter_fn=filter_fn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/cohorts/cohort.py", line 1012, in _load_single_patient_neoantigens
    process_limit=process_limit)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/netmhc_cons.py", line 41, in __init__
    process_limit=process_limit)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 139, in __init__
    raise SystemError("Failed to run %s" % self.program_name)
SystemError: ('Failed to run netMHCcons', 'occurred at index 0')
```

Thank you

tavinathanson commented on September 22, 2024

Hey @js2dark, mhctools (and therefore cohorts) expects you to have the NetMHC* tools (e.g. NetMHCcons) installed. We can't install those for you for license reasons, but the download page is at: www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCcons.

You can also configure cohorts to use other tools (via mhctools), including our open source tool, https://github.com/hammerlab/mhcflurry.

Does that help?
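The `FileNotFoundError` for `'netMHCcons'` in Jason's traceback comes from mhctools trying to launch the program when the predictor is constructed, which cohorts only does partway through `as_dataframe`. A minimal pre-flight check, sketched below under the assumption that the executable is named `netMHCcons` and must be on your `PATH`, can catch this before a long run. `find_predictors` is a hypothetical helper, not part of mhctools or cohorts.

```python
import shutil
from typing import Dict, Iterable, Optional

# Executable names are assumptions: "netMHCcons" is the one named in the
# traceback; add whichever predictor binaries you plan to configure.
PREDICTOR_BINARIES = ("netMHCcons",)


def find_predictors(names: Iterable[str] = PREDICTOR_BINARIES) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_predictors().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a binary is reported missing, either install it and put it on `PATH`, or configure a different predictor through mhctools as suggested above.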
