cookhla's People

Contributors

wansonchoi

cookhla's Issues

version of reference panel

Hello @WansonChoi,
It's very kind of you to create this method for HLA imputation. I noticed that the software automatically converts the target data to hg18, but most current data are based on hg19 or hg38. What should I do if I want to construct my own reference panel in hg19 and impute based on it?
Looking forward to your kind reply.

some errors happen when I run your example code

[screenshot of the error; image not preserved]

I just ran your example code, and it raised errors like the ones in the picture above.
python -m MakeGeneticMap \
    -i example/1958BC.hg19 \
    -hg 19 \
    -ref 1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC \
    -o MyAGM/1958BC+1000G_REF.EUR

An error also occurs when I run the example code below (using the genetic map files you provide in the example folder):

python CookHLA.py -i example/1958BC.hg19 -hg 19 -o MyHLAImputation/1958BC+HM_CEU_REF -ref example/HM_CEU_REF -gm example/AGM.1958BC+HM_CEU_REF.mach_step.avg.clpsB -ae example/AGM.1958BC+HM_CEU_REF.aver.erate

[screenshot of the error; image not preserved]

Thanks a lot if you can help me.

Meta-analysis

Hi there,
I'm wondering if it's possible to meta-analyse more than 2 datasets at once?
Thanks

Dependencies

In the readme it's mentioned that the user has to download the dependencies by themselves, but the dependencies are already there in the dependency folder which comes with CookHLA. Can someone please clarify?

Exon3 imputation for 3 days+?

Hi @WansonChoi and team, thank you for this helpful tool. I am using this pipeline with the T1DGC reference to impute HLA alleles for ~60k samples. I am performing this on a desktop with 32 GB memory and 4.29 GHz 6-core processor, using the options -mem 29g -mp 6 -nth 8. The software got to the point of Performing HLA imputation(exon3 / overlap: 5000) in about 2 hours, then was stuck there for 3 days. Would you be able to comment on whether this is normal, or what is the usual expected runtime for this many samples on such a machine?

I also got a warning initially that ~114k markers failed to lift down to hg18. Would that indicate the use of a different reference? Currently using hg19 which I believe is correct for the samples, and I tried different ones but CookHLA would give errors.

Thanks in advance!

Producing input files for MakeGeneticMap

I am having difficulty preparing the input files for this function. Starting from CRAM files, I convert them to BAM and then to BED using samtools and bedtools, and then sort the BED files. The code and examples seem to suggest that I need to merge these BED files, but I have not managed to do so. Does anyone have a solution? I tried joining them all with cat and then running mergeBed, but it reported an out-of-order record whose start coordinate fell outside the region I specified (28999852, when I had specified a start coordinate of 29000000).

Is it possible with ~43k samples?

I set up CookHLA for our study, which contains ~43k samples. It failed at the BEAGLE step even though I reserved 250 GB of RAM. Is a run of this size possible? With only 2,491 samples it worked.

The screen output for the ~43k samples is as follows:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 555, in IMPUTE
subprocess.run(re.split('\s+', command), check=True, stdout=f_log, stderr=f_log)
File "/usr/local/software/master/python/3.7/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Djava.io.tmpdir=/home/jhz22/Caprion/analysis/work/hla_CookHLA.javatmpdir', '-Xmx250000m', '-jar', './dep>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/software/master/python/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 559, in IMPUTE
raise CookHLAImputationError(std_ERROR_MAIN_PROCESS_NAME + "Imputation({} / overlap:{}) failed.\n".format(_exonN, _overlap))
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon3 / overlap:1.5) failed.

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "CookHLA.py", line 1035, in <module>
f_save_IMPUTATION_INPUT=args.save_IMPUTATION_INPUT)
File "CookHLA.py", line 862, in CookHLA
f_measureAcc_v2=f_measureAcc_v2)
File "/rds/user/jhz22/hpc-work/CookHLA/src/HLA_Imputation_BEAGLE5.py", line 179, in __init__
self.dict_IMP_Result[_exonN][_overlap] = dict_Pool[_exonN][_overlap].get()
File "/usr/local/software/master/python/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon3 / overlap:1.5) failed.
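
Since the same setup reportedly worked with 2,491 samples, one thing that could be tried (not confirmed by the CookHLA authors in this thread) is to run the target data in smaller batches. The sketch below only prepares per-batch sample lists from a PLINK .fam file, relying on the standard .fam layout where the first two columns are family ID and individual ID. The function names, the batch size, and the output file naming are assumptions made for illustration, and whether batching changes the imputation results is not addressed here.

```python
def split_fam_into_batches(fam_lines, batch_size):
    """Return lists of (FID, IID) pairs, each holding at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The first two columns of a .fam line are family ID and individual ID.
    samples = [tuple(line.split()[:2]) for line in fam_lines if line.strip()]
    return [samples[start:start + batch_size]
            for start in range(0, len(samples), batch_size)]


def write_keep_files(batches, prefix):
    """Write one sample list per batch (FID and IID on each line)."""
    paths = []
    for number, batch in enumerate(batches, start=1):
        path = f"{prefix}.batch{number}.keep"  # hypothetical naming scheme
        with open(path, "w") as handle:
            for fid, iid in batch:
                handle.write(f"{fid} {iid}\n")
        paths.append(path)
    return paths


# Toy .fam content with made-up sample IDs.
toy_fam = [
    "F1 S1 0 0 1 -9",
    "F2 S2 0 0 2 -9",
    "F3 S3 0 0 1 -9",
    "F4 S4 0 0 2 -9",
    "F5 S5 0 0 1 -9",
]
batches = split_fam_into_batches(toy_fam, batch_size=2)
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each list written by `write_keep_files` could then be passed to PLINK's `--keep` together with `--make-bed` to produce one smaller target dataset per batch.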

I got an error at the VCF conversion step: "Inconsistent marker IDs: [markers file]"

[CookHLA.py]: CookHLA : Performing HLA imputation for 'MyHLAImputation/APMDA.samples.phased.COPY.LiftDown_hg18.NoAmbig'

  • Java memory = 8000m(Mb)
  • Using Local Embedding.
  • Using Adaptive Genetic Map.
  • Small Sample mode. (because # of target samples < 100)
    [1] Extracting SNPs from the MHC.
    [2] Performing SNP quality control.
    Warning: 3 variants had at least one non-A/C/G/T allele name.
    Warning: At least 2 duplicate IDs in --exclude file.
    62248
    62248
    62248
    [3] Converting data to beagle format.
    Exception in thread "main" java.lang.IllegalArgumentException: Inconsistent marker IDs: [markers file]=SNPS_DQA1_2481_32639939_intron1 [BEAGLE file]=chr6:32667088:G:A
    at beagleutil.Beagle2Vcf.checkConsistency(Beagle2Vcf.java:153)
    at beagleutil.Beagle2Vcf.main(Beagle2Vcf.java:67)

[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.

[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.

[HLA_Imputation.py::ERROR]: Input file for imputation('MyHLAImputation/APMDA.samples.51.MHC.QC.vcf') contains nothing. Please check it again.
....
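
The Java exception above reports that a marker ID in the markers file differs from the ID in the BEAGLE file. A minimal sketch for locating the first such disagreement is shown here. It assumes the .markers file has the marker ID in its first column and that the BEAGLE file uses the BEAGLE v3 layout, where marker rows start with `M` followed by the marker ID. The function names are made up for illustration and are not part of CookHLA.

```python
from itertools import zip_longest


def read_markers_ids(path):
    """Marker IDs from a .markers file (first whitespace-separated column)."""
    with open(path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


def read_bgl_marker_ids(path):
    """Marker IDs from a BEAGLE v3 file: second field of rows starting with 'M'."""
    ids = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "M":
                ids.append(fields[1])
    return ids


def first_mismatch(markers_ids, bgl_ids):
    """Return (index, markers_id, bgl_id) at the first disagreement, else None."""
    # zip_longest pads the shorter list with None, so a length difference
    # is reported as a mismatch too.
    for index, (left, right) in enumerate(zip_longest(markers_ids, bgl_ids)):
        if left != right:
            return index, left, right
    return None


# The two conflicting IDs come from the error message above;
# "rs_example_1" is a made-up ID shared by both lists.
markers_ids = ["rs_example_1", "SNPS_DQA1_2481_32639939_intron1"]
bgl_ids = ["rs_example_1", "chr6:32667088:G:A"]
print(first_mismatch(markers_ids, bgl_ids))
```

Knowing the index of the first mismatch shows whether the two files merely drift apart at one point (for example after a dropped or duplicated marker) or use different naming schemes throughout.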

Input problem

Dear,

I'm sorry to bother you. I'm having trouble with CookHLA. We are very interested in CookHLA, which is great work.

We would like to know whether CookHLA can be run on GWAS summary data, since I see that it needs .bed, .fam, and .bim files.

If so, what should we do with the GWAS summary file?

Looking forward to your reply.

MakeGeneticMap with FATAL ERROR Marker # is duplicated

@WansonChoi
Hi, when I try to use CookHLA with a reference panel that I built myself, the MakeGeneticMap script fails with "FATAL ERROR Marker AA_C_-18_31347808_Rx is duplicated".
I checked the .markers file; it includes these markers:
AA_C_-18_31347808_R 31347808 P A
AA_C_-18_31347808_x 31347808 P A
AA_C_-18_31347808_Q 31347808 P A
AA_C_-18_31347808_X 31347808 P A
AA_C_-18_31347808_Rx 31347808 P A
AA_C_-18_31347808_RQ 31347808 P A
AA_C_-18_31347808_RX 31347808 P A
It may be caused by "AA_C_-18_31347808_Rx 31347808 P A" and "AA_C_-18_31347808_RX 31347808 P A", because Rx/RX seem to trigger this error.
Sometimes there is also an error like "Error: Duplicate ID 'chr6_31529929_C_T'."
Do you have any suggestions?
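
The suspicion in this report is that two IDs differing only in letter case (Rx vs RX) are treated as the same marker. Whether the underlying tool really compares IDs case-insensitively is not confirmed in this thread, but the check itself is easy to sketch: group marker IDs by their lower-cased form and list the groups with more than one member. The function names are hypothetical; the example IDs are the ones quoted above.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(marker_ids):
    """Group marker IDs that become identical once letter case is ignored."""
    groups = defaultdict(list)
    for marker_id in marker_ids:
        groups[marker_id.lower()].append(marker_id)
    # Keep only the groups where more than one ID was seen.
    return [ids for ids in groups.values() if len(ids) > 1]


def read_marker_ids(path):
    """Read the first whitespace-separated column of a .markers file."""
    with open(path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


# Marker IDs copied from the report above.
example_ids = [
    "AA_C_-18_31347808_R",
    "AA_C_-18_31347808_x",
    "AA_C_-18_31347808_Q",
    "AA_C_-18_31347808_X",
    "AA_C_-18_31347808_Rx",
    "AA_C_-18_31347808_RQ",
    "AA_C_-18_31347808_RX",
]

for group in find_case_insensitive_duplicates(example_ids):
    print(group)
```

On the IDs from the report this flags two groups, not one: `_x`/`_X` as well as `_Rx`/`_RX`. Exact duplicates such as the 'chr6_31529929_C_T' case end up in the same kind of group, so the same check covers both error messages.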

Reference 1000G ALL not working?

Thanks for making this software available. I can successfully impute from the 1000G individual superpopulation files, but I'm seeing an error when I try to use the combined overall 1000G reference panel. Specifically Beagle says:

java.lang.IllegalArgumentException: 3
	at vcf.BitSetGTRec.get(BitSetGTRec.java:171)
	at vcf.BasicGT.allele(BasicGT.java:136)
	at vcf.SplicedGT.allele(SplicedGT.java:104)
	at phase.ImputeBaum.unscaledAlProbs(ImputeBaum.java:151)
	at phase.ImputeBaum.imputeInterval(ImputeBaum.java:124)
	at phase.ImputeBaum.phase(ImputeBaum.java:107)
	at phase.PhaseLS.lambda$runStage2$2(PhaseLS.java:148)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

though exactly which exon raises the error isn't consistent: sometimes it's 2.1, but sometimes 2.1.5.

Have you been able to use the 1000G_ALL reference successfully? Is there anything that needs to be changed or checked to work with the larger reference panel?

Requirements.txt

Hi,
I don't want to use conda; I would like to run CookHLA in a virtualenv instead, but I am unable to use the YML files there. Is it possible to get a requirements.txt file?

Thanks,
Pooja.
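
Until an official requirements.txt exists, a rough conversion of a conda environment file can be sketched as below. This is only an approximation under stated assumptions: it reads the `dependencies:` list line by line without a YAML parser, turns conda's `name=version=build` into pip's `name==version`, and skips the Python interpreter itself. The example YAML is made up and is not CookHLA's actual environment file. Conda environments may also list non-Python packages that pip cannot install, and channel-prefixed entries are not handled.

```python
def conda_yml_to_requirements(yml_text):
    """Approximate pip requirement lines from a conda environment YAML."""
    requirements = []
    in_dependencies = False
    for raw_line in yml_text.splitlines():
        stripped = raw_line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # A top-level key starts at column 0 and is not a list item.
        if not raw_line[0].isspace() and not stripped.startswith("-"):
            in_dependencies = stripped.startswith("dependencies:")
            continue
        if not in_dependencies or not stripped.startswith("-"):
            continue
        item = stripped[1:].strip()
        if item == "pip:":
            continue  # nested pip entries follow as ordinary list items
        if any(op in item for op in ("==", ">", "<", "!")):
            requirements.append(item)  # already has a comparison operator
            continue
        parts = item.split("=")  # conda style: name=version=build
        name = parts[0]
        if name == "python":
            continue  # the interpreter is not a pip package
        version = parts[1] if len(parts) > 1 and parts[1] else None
        requirements.append(f"{name}=={version}" if version else name)
    return requirements


# Made-up environment file, for illustration only.
example_yml = """\
name: example-env
channels:
  - defaults
dependencies:
  - python=3.7
  - pandas=1.0.3=py37h0573a6f_0
  - numpy
  - pip:
    - requests==2.25.1
"""

print("\n".join(conda_yml_to_requirements(example_yml)))
```

The printed lines can be saved as requirements.txt and reviewed by hand before running `pip install -r requirements.txt` inside the virtualenv.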

Cannot get allele results, but no error

Hi @WansonChoi,

I am running CookHLA with target data (N > 50,000) and the 1000G reference data shipped with your software (N=504). Everything ran without an error, but no results were produced. So I am wondering whether this strange issue arises because my sample is bigger than the example on your GitHub, from which I took the "mem" (2g) and "window" (5) parameters. The imputation log is as follows:

respri.hg19.hla.MHC.QC.exon2.0.5.raw_imputation_out.log

I will be very grateful if you can reply!

Thanks,
Guo

[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon2 / overlap:0.5) failed.

Hello @WansonChoi,
I am facing an error in the last step of the CookHLA pipeline. I used my data (chr6, 29mb-34mb) with the reference data from the 1000 Genomes reference panel; however, it failed with the following error:
[4] Performing HLA imputation(exon2 / overlap:0.5).

[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon2 / overlap:0.5) failed.

Traceback (most recent call last):
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 555, in IMPUTE
subprocess.run(re.split('\s+', command), check=True, stdout=f_log, stderr=f_log)
File "/share/home/aamir/anaconda3/envs/CookHLA/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Djava.io.tmpdir=MyHLAImputation/data+1000G_REF.EAS.javatmpdir', '-Xmx2000m', '-jar', './dependency/beagle5.jar', 'gt=MyHLAImputation/data1+1000G_REF.EAS.MHC.QC.vcf', 'ref=MyHLAImputation/1000G_REF.EAS.chr6.hg18.29mb-34mb.inT1DGC.exon2.phased.vcf', 'out=MyHLAImputation/data1+1000G_REF.EAS.MHC.QC.exon2.0.5.raw_imputation_out', 'impute=true', 'gp=true', 'overlap=0.5', 'err=0.00350207085828343', 'map=MyHLAImputation/data1+1000G_REF.EAS.mach_step.avg.clpsB.exon2.txt', 'window=5', 'ne=10000', 'nthreads=1']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "CookHLA.py", line 1035, in <module>
f_save_IMPUTATION_INPUT=args.save_IMPUTATION_INPUT)
File "CookHLA.py", line 862, in CookHLA
f_measureAcc_v2=f_measureAcc_v2)
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 154, in __init__
self.AVER, self.dict_ExonN_AGM[_exonN], f_prephasing=f_prephasing)
File "/share/home/aamir/CookHLA-master/src/HLA_Imputation_BEAGLE5.py", line 559, in IMPUTE
raise CookHLAImputationError(std_ERROR_MAIN_PROCESS_NAME + "Imputation({} / overlap:{}) failed.\n".format(_exonN, _overlap))
src.CookHLAError.CookHLAImputationError:
[HLA_Imputation_BEAGLE5.py::ERROR]: Imputation(exon2 / overlap:0.5) failed.
Can you please guide me in solving this issue?
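
The Python traceback only says that the Beagle command returned exit status 1. The `subprocess.run` call in the traceback redirects Beagle's stdout and stderr to a log file, so the actual reason is more likely to be found there. The sketch below prints the last lines of every matching log in the output directory. The file name pattern `*.raw_imputation_out.log` is an assumption based on a log file name quoted in another issue on this page, and `tail_logs` is a hypothetical helper, not part of CookHLA.

```python
from collections import deque
from pathlib import Path


def tail_logs(out_dir, pattern="*.raw_imputation_out.log", n_lines=20):
    """Return {log file name: last n_lines lines} for logs under out_dir."""
    tails = {}
    for log_path in sorted(Path(out_dir).glob(pattern)):
        with open(log_path, errors="replace") as handle:
            # A bounded deque keeps only the final n_lines lines in memory.
            tails[log_path.name] = list(deque(handle, maxlen=n_lines))
    return tails


def print_tails(tails):
    """Print each log tail under a header naming the file."""
    for name, lines in tails.items():
        print(f"==> {name} <==")
        print("".join(lines), end="")
```

For the run above, `print_tails(tail_logs("MyHLAImputation"))` would show the end of each per-exon Beagle log, which is usually where a Java exception or an out-of-memory message appears.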

Fail to reproduce the toy example

Thank you for developing this wonderful tool. I have two questions for you as I failed to reproduce the toy example.

  1. When I tried to generate the adaptive genetic map using the toy dataset available in the folder, I got the following errors. I have created MyAGM folder in the working directory, so I don't know if I missed anything.
python -m MakeGeneticMap -i example/1958BC.hg19 -hg 19 -ref 1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC -o MyAGM/1958BC+1000G_REF.EUR
Namespace(human_genome='19', input='example/1958BC.hg19', out='MyAGM/1958BC+1000G_REF.EUR', reference='1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC')
sh: 1: None: not found
Traceback (most recent call last):
  File "/home/wem26/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/wem26/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wem26/CookHLA/MakeGeneticMap/__main__.py", line 102, in <module>
    CookHLA_MakeGeneticMap(args.input, args.human_genome, args.reference, args.out)
  File "/home/wem26/CookHLA/MakeGeneticMap/__main__.py", line 39, in __init__
    self.GeneticMap = MakeGeneticMap(_input, _reference, _out)
  File "/home/wem26/CookHLA/MakeGeneticMap/MakeGeneticMap.py", line 26, in MakeGeneticMap
    N_sample_target = getSampleNumbers(_input+'.fam')
  File "/home/wem26/CookHLA/src/checkInput.py", line 26, in getSampleNumbers
    with open(_fam, 'r') as f_fam:
FileNotFoundError: [Errno 2] No such file or directory: 'MyAGM/1958BC.hg19.COPY.LiftDown_hg18.fam'
  2. Is there a way to merge reference panels, e.g., all 1000 Genomes panels?

Advice on using the Han panel

I am trying to use the Han reference panel, for which the authors have provided .markers and .bgl files. Any advice on how to generate the other reference files needed by CookHLA would be most helpful. Most of the file conversion tools I have tried complain about indels and A/P markers. Thank you for your guidance.

Interpreting the alleles file

Hello, I am having difficulty interpreting the alleles file due to lack of column naming. It does not seem to be described in the readme. Could you please add this to the readme?

HLA classical alleles not completely imputed

Hi, I have 4050 samples for HLA imputation. In the HLA_IMPUTATION_OUT.HLATypeCall.log file, I got the following notes.
Why were HLA_DQA1, HLA_DPA1 and HLA_DPB1 not imputed? Is it a memory issue? Thanks in advance!

[1] "No HLA_DQA1 in this study."
[1] "No HLA_DPA1 in this study."
[1] "No HLA_DPB1 in this study."
