deep-hla's People

Contributors

countdigi, tatsuhikonaito

deep-hla's Issues

Low number of SNPs consistent with model SNPs

Thanks for your tool - looking forward to using it on our dataset.

The output logger reported:

Imputation processes started at Tue Mar 15 16:43:04 2022.
1349 SNPs used for training.
101355 people loaded from sample.
7382 SNPs loaded from sample.
2 SNPs consistent with model SNPs in position and used for imputation.

Having only 2 SNPs consistent with the model SNPs in position seems quite low. I assume the count refers to SNPs overlapping those in /DEEP-HLA/Pan-Asian/model/model.bim. I think the overlap is low because the model SNPs are on b36 and my data is on b37.

Do you have any advice on using these models with my data, which is on b37? It seems like I should lift over the Pan-Asian dataset to b37, phase it, and then retrain the model, unless there is an easier way?

Thanks
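
A quick way to test the build-mismatch hypothesis is to count shared positions between the two .bim files directly. The sketch below is a minimal illustration, not DEEP-HLA's own matching code: the function names are hypothetical, the toy SNPs and coordinates are invented, and matching on chromosome plus base-pair position is an assumption. It relies only on the standard six-column PLINK .bim layout (chromosome, variant ID, cM, bp position, allele 1, allele 2).

```python
def bim_positions(lines):
    """Return the set of (chromosome, bp position) pairs found in .bim lines."""
    positions = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, _snp_id, _cm, bp = fields[:4]
        positions.add((chrom, int(bp)))
    return positions


def position_overlap(sample_lines, model_lines):
    """Count sample SNP positions that also occur among the model SNP positions."""
    return len(bim_positions(sample_lines) & bim_positions(model_lines))


# Toy data (invented): the same two SNPs, with the second sample's
# coordinates shifted as they would be on a different genome build.
model = ["6 rs1 0 29900000 A G", "6 rs2 0 29900500 C T"]
sample_same_build = ["6 rs1 0 29900000 A G", "6 rs2 0 29900500 C T"]
sample_other_build = ["6 rs1 0 29867000 A G", "6 rs2 0 29867500 C T"]

print(position_overlap(sample_same_build, model))   # 2
print(position_overlap(sample_other_build, model))  # 0
```

With real files, the lines could be read with `open(path).read().splitlines()`. A near-zero overlap despite many shared variant IDs would point toward a coordinate-build difference rather than a genuinely different SNP set.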

Problem with impute.py

Hello,

I am currently using DEEP-HLA for the first time, and I have encountered an issue when running the impute.py script. Specifically, I am receiving the following error:

Traceback (most recent call last):
  File "impute.py", line 235, in <module>
    main()
  File "impute.py", line 231, in main
    impute(args)
  File "impute.py", line 156, in impute
    result_phased.loc[hla_info[hla][digit], sample_fam_batch.iid] = phased
  File "/home/laura/.pyenv/versions/env_deep_hla-374/lib/python3.7/site-packages/pandas/core/indexing.py", line 205, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/home/laura/.pyenv/versions/env_deep_hla-374/lib/python3.7/site-packages/pandas/core/indexing.py", line 593, in _setitem_with_indexer
    self.obj._data = self.obj._data.setitem(indexer=indexer, value=value)
  File "/home/laura/.pyenv/versions/env_deep_hla-374/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 560, in setitem
    return self.apply("setitem", **kwargs)
  File "/home/laura/.pyenv/versions/env_deep_hla-374/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/laura/.pyenv/versions/env_deep_hla-374/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 940, in setitem
    values[indexer] = value
ValueError: shape mismatch: value array of shape (17,962) could not be broadcast to indexing result of shape (34,962)

I have tried to understand the problem, but as a novice in Python, I am struggling to find a solution. I have successfully run the training.py script, so I believe the issue is related to the impute.py script.

I would greatly appreciate any help or advice you can provide.

Thank you,
Laura
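
In the traceback above, the selection on the left-hand side has 34 rows while the assigned value has 17, exactly twice as many. One possible explanation, not confirmed anywhere in this thread, is that the labels being selected occur twice in the index of the result table, because a label-based selection in pandas returns every row that matches a label. The sketch below is a hypothetical helper for checking a list of labels for repeats; the allele names are invented toy values.

```python
from collections import Counter


def duplicated_labels(labels):
    """Return a dict of labels that occur more than once, with their counts."""
    return {label: n for label, n in Counter(labels).items() if n > 1}


# Toy index (invented): every allele label appears twice, so selecting
# the 2 distinct labels by name would match 4 rows instead of 2.
index_labels = ["HLA_A*01", "HLA_A*02", "HLA_A*01", "HLA_A*02"]
print(duplicated_labels(index_labels))
```

Running the same check on the allele lists in the .hla.json file, or on the index of the result table, would show whether duplicates are present before looking for other causes.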

Same example sample in both training and imputing

Hi, I'm trying out DEEP-HLA after reading your paper in Nat. Commun.

While following the example data, I found that step 1 (Model training) and step 2 (Imputation) both use the option --sample, and both use the same sample, 1958BC.

I'm not so clear about the usage of that option. From my understanding, step 1 uses sample 1958BC for validation in training, and step 2 uses sample 1958BC for testing. Am I right?

As I understand it, if I have my own dataset of, say, 1000 sequenced samples, I can set aside 900 samples to build a training set and 100 samples for testing (both in step 1). If I then genotype 500 more samples, I just input them in step 2.

Please help me to understand this use case, thank you very much.
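
The 900/100 split described in this question can be sketched as below. This only illustrates partitioning sample IDs reproducibly; it makes no claim about how DEEP-HLA itself interprets --sample in either step. The function name and the ID format are hypothetical.

```python
import random


def split_samples(sample_ids, n_test, seed=0):
    """Shuffle IDs with a fixed seed and return (train_ids, test_ids)."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical IDs for the 1000 sequenced samples in the question.
ids = [f"S{i:04d}" for i in range(1000)]
train_ids, test_ids = split_samples(ids, n_test=100)
print(len(train_ids), len(test_ids))  # 900 100
```

Fixing the seed keeps the held-out set identical across runs, so accuracy figures from repeated training runs stay comparable.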

IndexError: positional indexers are out-of-bounds

Hello @tatsuhikonaito, it's very kind of you to create and share this algorithm.
I ran into an error when applying train.py to my own reference panel and sample data for cross-validation. I'm not sure whether the error is due to the large sample size or to too many SNP sites. The details of the error are as follows:
`time python $Software_Dir/train.py --ref ${ref} --sample $indir/$sample --model $model --hla $hla --model-dir $indir/$sample.model
Logging to training.log.
Training processes started at Thu May 13 20:19:27 2021.
Loading files...
10689 people loaded from reference.
29948 SNPs loaded from reference.
27956 SNPs loaded from sample.
27990 SNPs matched in position and used for training.
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1469, in _get_list_axis
    return self.obj._take_with_is_copy(key, axis=axis)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3363, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3351, in take
    indices, axis=self._get_block_manager_axis(axis), verify=True
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1440, in take
    indexer = maybe_convert_indices(indexer, n)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexers.py", line 250, in maybe_convert_indices
    raise IndexError("indices are out-of-bounds")
IndexError: indices are out-of-bounds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dengcm/soft/DEEP-HLA//train.py", line 375, in <module>
    main()
  File "/home/dengcm/soft/DEEP-HLA//train.py", line 371, in main
    train(args)
  File "/home/dengcm/soft/DEEP-HLA//train.py", line 189, in train
    ref_concord_phased = ref_phased.iloc[np.where(concord_snp)[0]]
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1487, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1472, in _get_list_axis
    raise IndexError("positional indexers are out-of-bounds") from err
IndexError: positional indexers are out-of-bounds
`
Looking forward to your kind reply!
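
One detail in the log above stands out: 27990 SNPs are reported as matched in position, which is more than the 27956 SNPs loaded from the sample. That would be consistent with several SNPs sharing one base-pair position in one of the .bim files, although the thread does not confirm this as the cause of the IndexError. The sketch below is a hypothetical helper that lists repeated positions; it assumes the standard PLINK .bim layout with the position in the fourth column, and the toy lines are invented.

```python
from collections import Counter


def duplicated_positions(bim_lines):
    """Return the sorted bp positions that occur on more than one .bim line."""
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 4:
            counts[int(fields[3])] += 1  # column 4 holds the bp position
    return sorted(pos for pos, n in counts.items() if n > 1)


# Toy data (invented): two markers share position 31000000.
toy_bim = [
    "6 rs10 0 30000000 A G",
    "6 rs11 0 31000000 C T",
    "6 HLA_X 0 31000000 P A",
]
print(duplicated_positions(toy_bim))  # [31000000]
```

If either the reference or the sample .bim returns a non-empty list, removing or deduplicating those markers before training would be a reasonable first experiment.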

How to get the .model.json file

Hello @tatsuhikonaito,
I am trying to use DEEP-HLA, but I do not know how to build a model configuration file such as {MODEL}.model.json.
I see that it includes the grouping of HLA genes, the window size of SNVs (Kb), and the parameters of the neural networks, but it is really confusing. How do I build a configuration file for a reference panel different from Pan-Asian_REF?
Could you add some explanation to the repository? Or could a script similar to make_hlainfo.py generate it?

Obtain the imputed SNP VCF

Hello,

I have used DEEP-HLA for HLA imputation, and I noticed that the current output only includes the HLA typing results. I am interested in obtaining the imputed SNP VCF file after the imputation process. Could you please guide me on how to generate or obtain the imputed SNP VCF file using DEEP-HLA?

I appreciate your help. Thank you in advance!

Best regards,
Troye

Preparation of input data

I am currently using DEEP-HLA for the first time. I have variant data produced by a standard pipeline for whole-exome sequencing data.
Could you guide me on how to create the input data? I would also like to ask how to prepare the input data for a test run.

I tried the following command as a test run.

$ MODEL_DIR=./DEEP-HLA/Pan-Asian
$ python ./DEEP-HLA/impute.py --sample 1958BC --model ${MODEL_DIR}/Pan-Asian_REF.model.json --hla ${MODEL_DIR}/Pan-Asian_REF.hla.json --model-dir ${MODEL_DIR}/model --out test

I prepared the sample data "1958BC.bim" from https://software.broadinstitute.org/mpg/snp2hla/.
Then I got the error message "No such file or directory: '1958BC.bgl.phased'". There is no such file in the downloaded snp2hla folder or in the DEEP-HLA/Pan-Asian folder.

Thanks,
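
A small pre-flight check can report missing inputs before impute.py is started. The sketch below is a hypothetical helper, not part of DEEP-HLA. The two suffixes are taken only from this thread (the .bim file that was prepared and the .bgl.phased file named in the error message); the tool may require other files as well.

```python
from pathlib import Path

# Suffixes mentioned in this thread; the full list of required inputs is an assumption.
REQUIRED_SUFFIXES = (".bim", ".bgl.phased")


def missing_inputs(prefix, suffixes=REQUIRED_SUFFIXES):
    """Return the expected input paths for `prefix` that do not exist on disk."""
    return [f"{prefix}{s}" for s in suffixes if not Path(f"{prefix}{s}").exists()]


# Example: check the prefix passed to --sample in the command above.
print(missing_inputs("1958BC"))
```

An empty list means both files are present. In the situation described here, the list would contain "1958BC.bgl.phased", which suggests the genotype data still needs to be phased into that format first.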
