wansonchoi / cookhla
An accurate and efficient HLA imputation method.
Hello @WansonChoi,
It's very kind of you to create this method for HLA imputation. I noticed that this software automatically converts the imputed target data to hg18, but most current data are based on hg19 or hg38. What should I do if I want to construct my own reference panel in hg19 and impute against it?
I look forward to your reply.
python -m MakeGeneticMap \
-i example/1958BC.hg19 \
-hg 19 \
-ref 1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC \
-o MyAGM/1958BC+1000G_REF.EUR
An error also occurs when I run the example command below, which uses the genetic map files you provide in the example folder:
python CookHLA.py -i example/1958BC.hg19 -hg 19 -o MyHLAImputation/1958BC+HM_CEU_REF -ref example/HM_CEU_REF -gm example/AGM.1958BC+HM_CEU_REF.mach_step.avg.clpsB -ae example/AGM.1958BC+HM_CEU_REF.aver.erate
Thanks a lot for any help.
Hi there,
I'm wondering if it's possible to meta-analyse more than 2 datasets at once?
Thanks
In the readme it's mentioned that the user has to download the dependencies by themselves, but the dependencies are already there in the dependency folder which comes with CookHLA. Can someone please clarify?
Hi @WansonChoi and team, thank you for this helpful tool. I am using this pipeline with the T1DGC reference to impute HLA alleles for ~60k samples. I am performing this on a desktop with 32 GB memory and a 4.29 GHz 6-core processor, using the options -mem 29g -mp 6 -nth 8. The software got to the point of Performing HLA imputation(exon3 / overlap: 5000) in about 2 hours, then was stuck there for 3 days. Would you be able to comment on whether this is normal, or what is the usual expected runtime for this many samples on such a machine?
I also got a warning initially that ~114k markers failed to lift down to hg18. Would that indicate the use of a different reference? Currently using hg19 which I believe is correct for the samples, and I tried different ones but CookHLA would give errors.
Thanks in advance!
I am having difficulty preparing the input files for this function. I am starting from CRAM files, which I convert to BAM and then to BED using samtools and bedtools, and then I sort the BED files. I am having trouble merging these BED files, which the code and examples seem to require. Does anyone have a solution? I tried joining them all with cat and then running mergeBed, but it reported an out-of-order record whose start coordinate lies outside the region I specified (28999852, when I specified a start coordinate of 29000000).
I set up CookHLA for our study containing ~43k samples. It failed at the BEAGLE step although I reserved 250 GB of RAM. Is it possible to run a sample of this size? When I used only 2,491 samples it worked.
The screen output for the ~43k samples is as follows:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 555, in IMPUTE
subprocess.run(re.split('\s+', command), check=True, stdout=f_log, stderr=f_log)
File "/usr/local/software/master/python/3.7/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Djava.io.tmpdir=/home/jhz22/Caprion/analysis/work/hla_CookHLA.javatmpdir', '-Xmx250000m', '-jar', './dep>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/software/master/python/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 559, in IMPUTE
raise CookHLAImputationError(std_ERROR_MAIN_PROCESS_NAME + "Imputation({} / overlap:{}) failed.\n".format(_exonN, _overlap))
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon3 / overlap:1.5) failed.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "CookHLA.py", line 1035, in <module>
f_save_IMPUTATION_INPUT=args.save_IMPUTATION_INPUT)
File "CookHLA.py", line 862, in CookHLA
f_measureAcc_v2=f_measureAcc_v2)
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 179, in __init__
self.dict_IMP_Result[_exonN][_overlap] = dict_Pool[_exonN][_overlap].get()
File "/usr/local/software/master/python/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon3 / overlap:1.5) failed.
[CookHLA.py]: CookHLA : Performing HLA imputation for 'MyHLAImputation/APMDA.samples.phased.COPY.LiftDown_hg18.NoAmbig'
[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.
[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.
[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.
....
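One quick way to see what the "contains nothing" message is reporting is to count the variant records in the VCF yourself. The sketch below is not part of CookHLA: `count_vcf_records` is a hypothetical helper, and it relies only on the standard VCF convention that header lines start with `#`.

```python
def count_vcf_records(lines):
    """Count data records in a VCF: non-blank lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Usage (the path is copied from the log above; adjust it to your own run):
# with open("MyHLAImputation/APMDA.samples.51.MHC.QC.vcf") as vcf:
#     print(count_vcf_records(vcf))
```

A count of zero would suggest that no target markers survived the earlier preprocessing steps, so the inputs are worth checking before the imputation settings.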
Dear,
I'm sorry to bother you. I'm having trouble with CookHLA. We are very interested in CookHLA, which is great work.
We would like to know whether CookHLA can be run on GWAS summary data, since it appears to require .bed, .fam, and .bim files.
If so, how should the GWAS summary file be prepared?
Looking forward to your reply.
@WansonChoi
Hi, when I try to use CookHLA with a reference panel I built myself, the MakeGeneticMap script fails with: FATAL ERROR Marker AA_C_-18_31347808_Rx is duplicated.
I checked the .markers file, and it includes these markers:
AA_C_-18_31347808_R 31347808 P A
AA_C_-18_31347808_x 31347808 P A
AA_C_-18_31347808_Q 31347808 P A
AA_C_-18_31347808_X 31347808 P A
AA_C_-18_31347808_Rx 31347808 P A
AA_C_-18_31347808_RQ 31347808 P A
AA_C_-18_31347808_RX 31347808 P A
The error may be caused by "AA_C_-18_31347808_Rx 31347808 P A" and "AA_C_-18_31347808_RX 31347808 P A", because these two IDs differ only in letter case (Rx vs. RX).
Sometimes I also get an error like this: "Error: Duplicate ID 'chr6_31529929_C_T'."
Do you have any suggestions?
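To test the Rx/RX hypothesis, one could scan the .markers file for IDs that become identical once letter case is ignored. This is only a sketch: `find_case_collisions` is a hypothetical helper, it assumes the marker ID is the first whitespace-separated column as in the listing above, and whether the failing tool really compares IDs case-insensitively is not confirmed here.

```python
from collections import defaultdict


def find_case_collisions(lines):
    """Group marker IDs (first column) that are equal when case is ignored."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        groups[fields[0].upper()].append(fields[0])
    # Keep only groups where more than one ID maps to the same upper-cased key.
    return [ids for ids in groups.values() if len(ids) > 1]


# Usage (hypothetical file name):
# with open("my_reference.markers") as markers:
#     for ids in find_case_collisions(markers):
#         print(ids)
```

On the seven markers listed above, this reports two groups: x/X and Rx/RX. Exact duplicates, as in the 'chr6_31529929_C_T' message, would show up the same way.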
Thanks for making this software available. I can successfully impute from the 1000G individual superpopulation files, but I'm seeing an error when I try to use the combined overall 1000G reference panel. Specifically Beagle says:
java.lang.IllegalArgumentException: 3
at vcf.BitSetGTRec.get(BitSetGTRec.java:171)
at vcf.BasicGT.allele(BasicGT.java:136)
at vcf.SplicedGT.allele(SplicedGT.java:104)
at phase.ImputeBaum.unscaledAlProbs(ImputeBaum.java:151)
at phase.ImputeBaum.imputeInterval(ImputeBaum.java:124)
at phase.ImputeBaum.phase(ImputeBaum.java:107)
at phase.PhaseLS.lambda$runStage2$2(PhaseLS.java:148)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
However, the exon that raises the error is not consistent: sometimes it's 2.1, sometimes 2.1.5.
Have you been able to use the 1000G_ALL reference successfully? Is there anything that needs to be changed or checked to work with the larger reference panel?
Hi,
I don't want to use conda and would like to run CookHLA in a virtualenv instead, but I cannot use the YML files there. Would it be possible to provide a requirements.txt file?
Thanks,
Pooja.
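Until an official requirements.txt exists, a rough conversion of a conda environment file can serve as a starting point. The sketch below is an assumption-laden workaround, not something the CookHLA authors provide: `conda_deps_to_requirements` is a hypothetical helper, it handles only the common `- name=version=build` and nested `- pip:` layouts, and it does not know which conda packages exist on PyPI under the same name.

```python
def conda_deps_to_requirements(yml_lines):
    """Turn the 'dependencies:' entries of a conda YML into pip-style pins."""
    requirements = []
    in_deps = False
    for raw in yml_lines:
        line = raw.strip()
        if raw[:1] not in (" ", "-", "\t") and line:
            # A new top-level key starts; remember whether it is 'dependencies:'.
            in_deps = line == "dependencies:"
            continue
        if not in_deps or not line.startswith("- "):
            continue
        item = line[2:].strip()
        if item == "pip:":
            continue  # the nested pip entries follow on the next lines
        if "==" in item:
            requirements.append(item)  # already in pip syntax
            continue
        parts = item.split("=")
        if parts[0] == "python":
            continue  # the interpreter itself is not a pip package
        requirements.append("==".join(parts[:2]))  # drop the conda build string
    return requirements
```

The result still needs manual pruning: non-Python dependencies such as Java or R cannot be installed with pip and have to be provided separately.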
@WansonChoi
Thank you very much for developing this software.
I encountered this problem when using the Pan-Asian panel as the reference:
ERROR: Reference and target files have no markers in common in interval:
6:25002566-26995909
Can you tell me how to solve this problem?
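A first diagnostic for this message is to check how many base-pair positions the target and reference actually share inside the reported interval. The sketch below reads PLINK .bim lines, where the fourth column is the base-pair position. `positions_in_interval` and `shared_positions` are hypothetical helpers, matching on position alone is a simplification, and the comparison is only meaningful if both files use the same genome build.

```python
def positions_in_interval(bim_lines, start, end):
    """Collect base-pair positions (4th .bim column) within [start, end]."""
    positions = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        bp = int(fields[3])
        if start <= bp <= end:
            positions.add(bp)
    return positions


def shared_positions(target_bim_lines, ref_bim_lines, start, end):
    """Positions present in both files within the interval."""
    return (positions_in_interval(target_bim_lines, start, end)
            & positions_in_interval(ref_bim_lines, start, end))


# Usage (hypothetical file names; the interval is the one from the error above):
# with open("target.bim") as t, open("reference.bim") as r:
#     print(len(shared_positions(t, r, 25002566, 26995909)))
```

An empty result would be consistent with the error. If the target has markers in the interval but the reference does not (or the reverse), restricting both to the region they share is one thing to try.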
Hi @WansonChoi ,
I am running CookHLA with target data (N larger than 50000) and the 1000G reference data included with your software (N=504). Everything ran without an error, but no results were produced. I am wondering whether this happens because my sample is much larger than the example on your GitHub page, from which I copied the "mem" (2g) and "window" (5) parameters. The imputation log is as follows:
respri.hg19.hla.MHC.QC.exon2.0.5.raw_imputation_out.log
I will be very grateful if you can reply!
Thanks,
Guo
Hello @WansonChoi
I am facing an error in the last step of the CookHLA pipeline. I used my own data (chr6, 29mb-34mb) with the 1000 Genomes reference panel, but it failed with the following error:
[4] Performing HLA imputation(exon2 / overlap:0.5).
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon2 / overlap:0.5) failed.
Traceback (most recent call last):
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 555, in IMPUTE
subprocess.run(re.split('\s+', command), check=True, stdout=f_log, stderr=f_log)
File "/share/home/aamir/anaconda3/envs/CookHLA/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Djava.io.tmpdir=MyHLAImputation/data+1000G_REF.EAS.javatmpdir', '-Xmx2000m', '-jar', './dependency/beagle5.jar', 'gt=MyHLAImputation/data1+1000G_REF.EAS.MHC.QC.vcf', 'ref=MyHLAImputation/1000G_REF.EAS.chr6.hg18.29mb-34mb.inT1DGC.exon2.phased.vcf', 'out=MyHLAImputation/data1+1000G_REF.EAS.MHC.QC.exon2.0.5.raw_imputation_out', 'impute=true', 'gp=true', 'overlap=0.5', 'err=0.00350207085828343', 'map=MyHLAImputation/data1+1000G_REF.EAS.mach_step.avg.clpsB.exon2.txt', 'window=5', 'ne=10000', 'nthreads=1']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "CookHLA.py", line 1035, in <module>
f_save_IMPUTATION_INPUT=args.save_IMPUTATION_INPUT)
File "CookHLA.py", line 862, in CookHLA
f_measureAcc_v2=f_measureAcc_v2)
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 154, in __init__
self.AVER, self.dict_ExonN_AGM[_exonN], f_prephasing=f_prephasing)
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 559, in IMPUTE
raise CookHLAImputationError(std_ERROR_MAIN_PROCESS_NAME + "Imputation({} / overlap:{}) failed.\n".format(_exonN, _overlap))
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon2 / overlap:0.5) failed.
Could you please guide me in solving this issue?
Thank you for developing this wonderful tool. I have two questions for you as I failed to reproduce the toy example.
MyAGM folder in the working directory, so I don't know if I missed anything.
python -m MakeGeneticMap -i example/1958BC.hg19 -hg 19 -ref 1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC -o MyAGM/1958BC+1000G_REF.EUR
Namespace(human_genome='19', input='example/1958BC.hg19', out='MyAGM/1958BC+1000G_REF.EUR', reference='1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC')
sh: 1: None: not found
Traceback (most recent call last):
File "/home/wem26/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/wem26/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/wem26/CookHLA/MakeGeneticMap/__main__.py", line 102, in <module>
CookHLA_MakeGeneticMap(args.input, args.human_genome, args.reference, args.out)
File "/home/wem26/CookHLA/MakeGeneticMap/__main__.py", line 39, in __init__
self.GeneticMap = MakeGeneticMap(_input, _reference, _out)
File "/home/wem26/CookHLA/MakeGeneticMap/MakeGeneticMap.py", line 26, in MakeGeneticMap
N_sample_target = getSampleNumbers(_input+'.fam')
File "/home/wem26/CookHLA/src/checkInput.py", line 26, in getSampleNumbers
with open(_fam, 'r') as f_fam:
FileNotFoundError: [Errno 2] No such file or directory: 'MyAGM/1958BC.hg19.COPY.LiftDown_hg18.fam'
I am trying to use the Han reference panel for which the authors have provided .markers and .bgl files. Any advice on how to generate the other reference files needed by cookHLA would be most helpful. Most of the file conversion tools I have tried complain about indels and A/P markers. Thank you for your guidance.
@WansonChoi
Thanks for making this tool.
When I use the Pan-Asian reference panel for imputation, the final step (Converting out imputation result(s)) reports TypeError: expected string or bytes-like object.
Pan_Asina_REF_log.txt
I would be grateful if you could shed some light on this.
Hello, I am having difficulty interpreting the alleles file due to lack of column naming. It does not seem to be described in the readme. Could you please add this to the readme?
Hi, I have 4050 samples for HLA imputation. In the HLA_IMPUTATION_OUT.HLATypeCall.log file, I got the following notes.
Why were HLA_DQA1, HLA_DPA1 and HLA_DPB1 not imputed? Is it a memory issue? Thanks in advance!
[1] "No HLA_DQA1 in this study."
[1] "No HLA_DPA1 in this study."
[1] "No HLA_DPB1 in this study."