Comments (14)
Hi @js2dark - This is a great use case for cohorts; we are happy to help.
We have some worked examples of how to combine VCFs & clinical data into a cohort object. For example, an example using TCGA data, with some explanatory text, which references an earlier example for creating a cohort with clinical data only.
In all of these examples, the basic approach is the same: you loop over the units in your cohort (ie patients), creating a Patient object for each one. You then pass this list of Patients to create the Cohort object.
I will say, neither of the examples above includes the use of BAMs; to include these you will want to (when creating a Patient) also create Samples for each of your samples (tumor and/or normal). These Sample objects then get included when creating the Patient.
For example:
normal_sample = Sample(
    is_tumor=False,
    bam_path_dna=bam_path_dna_normal)
tumor_sample = Sample(
    is_tumor=True,
    bam_path_dna=bam_path_dna_tumor,
    bam_path_rna=bam_path_rna_tumor,
    kallisto_path=kallisto_path,
    cufflinks_path=cufflinks_path)
These are then passed to the Patient object when it is instantiated:
patient = Patient(id=patient_id,
                  benefit=row["is_benefit"],
                  os=row["OS in days"],
                  pfs=row[pfs_col],  # Depends on RECIST choice
                  deceased=row["is_deceased"],
                  progressed=row["is_progressed"],
                  progressed_or_deceased=row["is_progressed_or_deceased"],
                  hla_alleles=row["hla_allele_list"],
                  vcf_paths=snv_vcf_paths,
                  normal_sample=normal_sample,  # <- here
                  tumor_sample=tumor_sample,  # <- and here
                  additional_data=row.to_dict())
NB: these examples are taken from the code we used recently to analyze some data from a cohort. Including that code here as possibly a more complete example, although beware it was using an earlier version of cohorts so some options may have changed since then.
Hope this gives you a good starting point. Feel free to get in touch if you run into sticky points or to give feedback on the documentation -- admittedly we need to do more on that front & to make these examples easier to find.
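To make the loop described above concrete, here is a minimal sketch of building one Patient (with its Samples) per row of a clinical table. The Sample and Patient classes below are stand-ins that mirror only some of the keyword arguments shown in this thread, so the sketch runs without cohorts installed; the patient_id column and the BAM file naming are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# Stand-ins (assumption): replace these with the Sample and Patient classes
# from cohorts when the library is installed.
@dataclass
class Sample:
    is_tumor: bool
    bam_path_dna: Optional[str] = None
    bam_path_rna: Optional[str] = None


@dataclass
class Patient:
    id: str
    os: float
    deceased: bool
    normal_sample: Optional[Sample] = None
    tumor_sample: Optional[Sample] = None
    additional_data: Dict[str, object] = field(default_factory=dict)


def build_patients(rows, bam_dir):
    """Create one Patient, with a normal and a tumor Sample, per clinical row."""
    patients = []
    for row in rows:
        pid = row["patient_id"]  # hypothetical column name
        # Hypothetical file layout: <bam_dir>/<patient>_<normal|tumor>.bam
        normal = Sample(is_tumor=False,
                        bam_path_dna="{}/{}_normal.bam".format(bam_dir, pid))
        tumor = Sample(is_tumor=True,
                       bam_path_dna="{}/{}_tumor.bam".format(bam_dir, pid))
        patients.append(Patient(
            id=pid,
            os=float(row["OS in days"]),
            deceased=bool(row["is_deceased"]),
            normal_sample=normal,
            tumor_sample=tumor,
            additional_data=dict(row),
        ))
    return patients


rows = [
    {"patient_id": "P1", "OS in days": 410, "is_deceased": True},
    {"patient_id": "P2", "OS in days": 895, "is_deceased": False},
]
patients = build_patients(rows, bam_dir="/data/bams")
```

The resulting list of patients is what gets passed on when creating the Cohort object.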
@js2dark happy to hear that. Sorry the error you are seeing is my fault - the syntax changed in the latest version to variants=[vcf_path1,...]
Apologies.
Hi @js2dark / Jason,
This looks great - happy to hear you're getting these results to run, albeit partially. Yes, the predicted neoantigen piece requires HLA types for each patient. You would need to infer these from your WES / WGS sequencing, or know them for your patients by some other means.
You would pass this information to the Patient objects as a list of HLA types, much as you did for other features.
Just to be clear, this would look something like the following:
Patient(id = "",
        hla_alleles = ['A*01:01',
                       'A*24:02',
                       'B*08:01',
                       'B*15:17',
                       'C*07:01',
                       'C*07:01'],
        ... )
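If the HLA types live in a single delimited column of your clinical table, something like the following sketch could turn them into the list form shown above. The regular expression is a simplified check for two-field class I alleles as written in this thread; it is an assumption for illustration, not the parsing that mhcnames performs.

```python
import re

# Simplified pattern (assumption): class I alleles such as "A*01:01".
ALLELE_RE = re.compile(r"^[ABC]\*\d{2,3}:\d{2,3}$")


def parse_hla_alleles(raw):
    """Split a delimited HLA string into a list, rejecting malformed entries."""
    alleles = []
    for token in re.split(r"[,;\s]+", raw.strip()):
        if not token:
            continue
        allele = token.upper()
        # Accept the "HLA-A*01:01" spelling by dropping the prefix.
        if allele.startswith("HLA-"):
            allele = allele[4:]
        if not ALLELE_RE.match(allele):
            raise ValueError("Unrecognized HLA allele: {!r}".format(token))
        alleles.append(allele)
    return alleles


hla_alleles = parse_hla_alleles("A*01:01, A*24:02; HLA-B*08:01")
```

An empty string yields an empty list, which makes it easy to spot patients with no HLA types before predicting neoantigens.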
I'm going to see if I can reproduce this error you're getting - will get back to you. Thanks!
If I'm in a new Python 3.5.2 session with mhcnames v1.2.0, I see the same thing you're seeing:
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mhcnames.parsing_helpers import AlleleParseError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AlleleParseError'
It looks like in v 1.2.0 this should read:
from mhcnames import AlleleParseError
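For code that has to work across both layouts, one option is to look the name up in several candidate modules. This is a generic sketch using only the standard library; import_first is a hypothetical helper, not part of mhcnames or cohorts.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first listed module that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or its parent package) is absent; try the next one.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "{} not found in any of: {}".format(name, ", ".join(module_paths)))


# With mhcnames installed, this would cover both import paths from this thread:
# AlleleParseError = import_first(
#     "AlleleParseError", ["mhcnames", "mhcnames.parsing_helpers"])
```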
@js2dark can you send us a traceback from this error you're getting when you have a chance? This will help us determine where in the code this is coming up. Thanks so much!
@js2dark this issue should be fixed in the latest version of cohorts. It was caused by a conflict between the latest version of mhctools & the latest version of mhcnames.
If you do pip install git+git://github.com/hammerlab/cohorts it should be resolved. Thanks for the feedback & please let us know if you continue to run into issues.
Hey @js2dark,
Apologies for this being a bit confusing, but you'll actually need to use the versions of mhctools and mhcnames that cohorts now requires, rather than upgrading to the latest versions of both. @jburos recently made a change in cohorts to pin mhcnames to 0.1.0 to solve this automatically.
If you pip install -r requirements.txt in cohorts, does that resolve the issue?
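To check which versions actually ended up in your environment, a small sketch like this can help. It uses importlib.metadata from the standard library (Python 3.8+); the package names are the distribution names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


report = installed_versions(["cohorts", "mhctools", "mhcnames"])
for name, ver in sorted(report.items()):
    print("{}: {}".format(name, ver or "not installed"))
```

Comparing this output against requirements.txt shows whether a pinned version was overridden by a later upgrade.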
Tavi
Hey @js2dark, mhctools (and therefore cohorts) expects that you have NetMHC* tools (e.g. NetMHCcons) installed; we can't install those for you for license reasons, but the download page is at: www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCcons.
You can also configure cohorts to use other tools (via mhctools), including our open source tool, https://github.com/hammerlab/mhcflurry.
Does that help?
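A quick way to see whether any of these command-line tools are visible to Python is to look them up on your PATH. This is a sketch only: the executable names below are assumptions and may differ in your installation.

```python
import shutil

# Executable names are assumptions; adjust to match your local installation.
CANDIDATE_TOOLS = ["netMHCcons", "netMHCpan", "netMHC"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


found = find_tools(CANDIDATE_TOOLS)
missing = [name for name, path in found.items() if path is None]
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```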