rabadanlab / arcashla
Fast and accurate in silico inference of HLA genotypes from RNA-seq
License: GNU General Public License v3.0
"arcas reference --version" or "--update" is looking for dat/IMGTHLA, which I could not find in the most recent repo.
Hi,
when attempting to switch to IMGT reference 3.24.0 for testing the installation I get the following message:
Traceback (most recent call last):
File "./scripts/reference.py", line 481, in <module>
build_fasta()
File "./scripts/reference.py", line 363, in build_fasta
utrs, exons, final_exon_length) = process_hla_dat()
File "./scripts/reference.py", line 127, in process_hla_dat
lines = file.read().splitlines()
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 6249748: ordinal not in range(128)
I attempted to grab the reference from the git history as well, but I get the same error until around version 3.25.0:
./arcasHLA reference --commit 70055402cf42eef5e0d13a1d2ef3b93de0c020f9
Also, when trying to pull the latest version by commit (3.35.0), I see a different error:
Traceback (most recent call last):
File "./scripts/reference.py", line 467, in <module>
build_fasta()
File "./scripts/reference.py", line 386, in build_fasta
seq_out, allele_idx, lengths = complete_records(cDNA, other)
File "./scripts/reference.py", line 308, in complete_records
offset = i + 1
This occurs until I get to around 3.33.0. Any ideas what is happening?
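The first traceback above is consistent with the file being decoded as ASCII while containing multi-byte UTF-8 characters. A minimal sketch of the difference, assuming the data file is UTF-8 encoded; `read_lines` and the sample file content are hypothetical and are not the code in reference.py:

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read().splitlines()


# Build a small stand-in file containing one non-ASCII character (hypothetical content).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.dat")
with open(sample_path, "wb") as handle:
    handle.write("ID   example\nDE   caf\u00e9\n".encode("utf-8"))

# Decoding as UTF-8 succeeds.
lines = read_lines(sample_path)

# Decoding as ASCII raises the same class of error as in the traceback.
try:
    read_lines(sample_path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

On systems where the locale default is ASCII, passing `encoding` explicitly avoids depending on the environment.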
relevant code:
Line 95 in a138383
Here is the docs section from kallisto regarding single-end reads and the -l and -s flags:
In the case of single-end reads, the -l option must be used to specify the average fragment length. Typical Illumina libraries produce fragment lengths ranging from 180–200 bp but it's best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer.
Note that analyze_reads is determining the read lengths/SDs, not the fragment lengths/SDs (which must be determined experimentally, e.g. using an Agilent Bioanalyzer). Read length is determined by the number of cycles on the sequencer, so my expectation is that using the read length standard deviation implies much less variation than is true (in fact, the current code for arcasHLA prevents -s 0, which happens because you are using read length). Also, read length and fragment length can and often do differ substantially.
I would suggest exposing these arguments to the arcasHLA call, setting the default to -l 200 -s 20, and warning the user if the defaults are used. These defaults or similar are also used in a few other repos (Curse, drugseqr - personal repo, salmon) and similar suggestions have been made elsewhere. I've submitted a PR if you are interested.
One advantage is that time and memory usage are dramatically reduced. It might be nice to redo the analysis you presented in your paper to make sure it doesn't substantially change things. Thank you for the great package!
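The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: `build_single_end_cmd` is a hypothetical helper, not the code in the PR, and the index and file names are placeholders. It uses kallisto's `--single`, `-l` and `-s` options.

```python
import warnings

DEFAULT_FRAGMENT_LENGTH = 200  # default suggested in the issue above
DEFAULT_FRAGMENT_SD = 20       # default suggested in the issue above


def build_single_end_cmd(index, out_dir, fastqs, threads=1,
                         fragment_length=None, fragment_sd=None):
    """Build a `kallisto pseudo` argument list for single-end reads."""
    if fragment_length is None or fragment_sd is None:
        # Warn so the user knows the values were not measured for this library.
        warnings.warn("fragment length/sd not supplied; using defaults "
                      "-l {} -s {}".format(DEFAULT_FRAGMENT_LENGTH,
                                           DEFAULT_FRAGMENT_SD))
    length = DEFAULT_FRAGMENT_LENGTH if fragment_length is None else fragment_length
    sd = DEFAULT_FRAGMENT_SD if fragment_sd is None else fragment_sd
    return (["kallisto", "pseudo", "-i", index, "-t", str(threads),
             "-o", out_dir, "--single", "-l", str(length), "-s", str(sd)]
            + list(fastqs))


# Example with experimentally determined values (placeholder numbers).
cmd = build_single_end_cmd("hla.idx", "out", ["reads.fq.gz"],
                           fragment_length=180, fragment_sd=25)
```

Returning an argument list rather than a shell string keeps file names with spaces safe when the list is passed to `subprocess.run`.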
Hi,
We ran arcasHLA on multiple samples. Only one sample failed. The error is posted below.
Best,
Astrid
[genotype] Pairs by % explained reads:
allele pair explained
C*04:01:81, C*04:339:02 87.94%
C*04:04:02, C*04:339:02 87.94%
Traceback (most recent call last):
File "./scripts/genotype.py", line 917, in <module>
args.zygosity_threshold)
File "./scripts/genotype.py", line 585, in genotype_gene
zygosity_threshold)
File "./scripts/genotype.py", line 490, in predict_genotype
max_prior = max(pair_prior.values())
ValueError: max() arg is an empty sequence
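The traceback shows `max()` being called on an empty collection of priors. Why no pair has a prior for this sample is not clear from the log. The sketch below only illustrates failing gracefully in that situation; `best_pairs` and the example data are hypothetical and do not reproduce the logic in genotype.py.

```python
def best_pairs(pair_prior):
    """Return the allele pairs with the highest prior, or [] if none has a prior."""
    # `default` avoids "ValueError: max() arg is an empty sequence".
    max_prior = max(pair_prior.values(), default=None)
    if max_prior is None:
        return []
    return [pair for pair, prior in pair_prior.items() if prior == max_prior]


# Hypothetical priors for two allele pairs.
example_priors = {("C*04:01:81", "C*04:339:02"): 0.2,
                  ("C*04:04:02", "C*04:339:02"): 0.5}
top = best_pairs(example_priors)
none_found = best_pairs({})
```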
Hi,
I am trying to replicate the test as explained in the README. The genotyping gives the expected results in test.genotype.json, however when I proceed to the partial typing I get the following error:
[alignment] Analyzing read length
[alignment] Pseudoaligning with Kallisto:
kallisto pseudo -i arcasHLA-0.2.0/scripts/../dat/ref/hla_partial.idx -t 8 -o /tmp/arcas_c637d791-c0d7-4f56-a472-1f5c0b0a41d0/ test/output/test.extracted.1.fq.gz test/output/test.extracted.2.fq.gz
Error: kallisto index file not found arcasHLA-0.2.0/scripts/../dat/ref/hla_partial.idx
[alignment] Processing pseudoalignment
Traceback (most recent call last):
File "arcasHLA-0.2.0/scripts/partial.py", line 485, in <module>
args.threads, True)
File "arcasHLA-0.2.0/scripts/align.py", line 277, in get_alignment
exon_combos)
File "arcasHLA-0.2.0/scripts/align.py", line 171, in process_partial_counts
with open(count_file,'r', encoding='UTF-8') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/arcas_c637d791-c0d7-4f56-a472-1f5c0b0a41d0/pseudoalignments.tsv'
Indeed, the hla_partial.idx file is missing. Should this file have been downloaded with the rest when executing ./arcasHLA reference --version 3.24.0, or is the user expected to create it?
Thanks
Hi, I cannot select a reference version using this command:
./arcasHLA reference --version 3.24.0
The error information is here:
Traceback (most recent call last):
File "/cluster/home/zheyang/biosoft/arcasHLA-master/scripts/reference.py", line 548, in <module>
build_fasta()
File "/cluster/home/zheyang/biosoft/arcasHLA-master/scripts/reference.py", line 416, in build_fasta
utrs, exons, final_exon_length) = process_hla_dat()
File "/cluster/home/zheyang/biosoft/arcasHLA-master/scripts/reference.py", line 134, in process_hla_dat
with open(hla_dat, 'r', encoding='UTF-8') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/cluster/home/zheyang/biosoft/arcasHLA-master/scripts/../dat/IMGTHLA/hla.dat'
Also, I cannot update the reference:
./arcasHLA reference --updata
usage: arcasHLA reference [options]
arcasHLA reference: error: unrecognized arguments: --updata
Hi, could you create a new release? I'm creating a bioconda package right now and it would be nice to have the latest and greatest version up there...
Correct command line: ./arcasHLA reference --rebuild -v
Encountered a problem using the recommended version of samtools 1.19. Updated to version 1.3.1 and tool ran successfully.
FileNotFoundError: [Errno 2] No such file or directory: '/arcas/scripts/../dat/ref/hla.p'
I get this error when running arcasHLA genotype.
I've aligned my samples against the hg38 human reference provided by GATK, which includes alt chromosomes such as the HLA loci. Nonetheless, when I run arcasHLA, it only extracts data from chromosome 6, although I saw in the code that it should also extract from those alt chromosomes... for some reason it doesn't seem to be finding them?
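One way to check which contigs a BAM actually contains is to inspect the `@SQ` lines of its header (for example from `samtools view -H`). The sketch below is only an illustration: `hla_contigs` is a hypothetical helper, and the selection rule (chromosome 6, `chr6_*` alt contigs, and `HLA-*` contigs) is an assumption about what one might want to extract, not a description of what arcasHLA does.

```python
def hla_contigs(header_text, chrom6_names=("6", "chr6")):
    """List contig names from SAM header text that look HLA-related."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields look like "SN:chr6" and "LN:170805979", separated by tabs.
        fields = dict(field.split(":", 1)
                      for field in line.split("\t")[1:] if ":" in field)
        name = fields.get("SN")
        if name is None:
            continue
        if (name in chrom6_names or name.startswith("chr6_")
                or name.startswith("HLA-")):
            contigs.append(name)
    return contigs


# A shortened, hand-written header for illustration.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:chr6\tLN:170805979",
    "@SQ\tSN:chr6_GL000250v2_alt\tLN:4672374",
    "@SQ\tSN:HLA-A*01:01:01:01\tLN:3503",
])
found = hla_contigs(example_header)
```

If the expected contigs appear here but no reads are extracted from them, the names can be passed as regions to `samtools view` to check each one directly.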
Thanks for the great tool! I ran it on a bunch of RNA-seq samples and so far it seems to be working.
Suppose I receive the following result from arcasHLA:
{
"A": ["A*02:816", "A*03:01:77"],
"B": ["B*83:01"],
"DQB1": ["DQB1*03:03:02", "DQB1*03:94"]
}
I'd like to ask if you might be able to help me understand the answers to a few questions:
Is it true that A*02:816 is on the same haplotype as DQB1*03:03:02? In other words, are the alleles of different genes in phase with each other?
Does arcasHLA claim that we have these two chromosomes?
chromosome A: A*02:816 DQB1*03:03:02
chromosome B: A*03:01:77 DQB1*03:94
Alternatively, are the genotypes unphased, so arcasHLA does not distinguish the above situation from the following situation?
chromosome A: A*02:816 DQB1*03:94
chromosome B: A*03:01:77 DQB1*03:03:02
What is the genotype for HLA-B? Is it homozygous with two copies of B*83:01? Or is it actually heterozygous with one copy of B*83:01 and one copy of unknown genotype?
Is arcasHLA claiming that this is the situation?
chromosome A: B*83:01
chromosome B: B*83:01
Or this?
chromosome A: B*83:01
chromosome B: Unknown
Does arcasHLA offer any way to assess the probability or the confidence for the genotype calls?
I can see this output in the genes.json file:
"A": [7378.0, 1202, 0.10358776417015307]
Which corresponds to this output in the genotype.log file:
[alignment] Observed HLA genes:
gene abundance read count classes
HLA-A 10.36% 7378 1202
HLA-B 6.80% 4807 989
HLA-C 10.30% 7354 1090
I will try to use these numbers to assess the confidence, but I wonder if you are able to get some calibrated p-value or probability from the kallisto results?
Do you have any sense of what the read pileup coverage in a genome browser should look like in order to get accurate genotype calls?
Thank you!
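Comparing the genes.json entry with the log table above, the three values per gene appear to be read count, number of classes, and abundance as a fraction. A small sketch that reproduces the log-style rows from the JSON under that reading; `format_gene_table` is a hypothetical helper:

```python
import json


def format_gene_table(genes_json_text):
    """Turn genes.json content into 'gene abundance read-count classes' rows."""
    genes = json.loads(genes_json_text)
    rows = []
    for gene, (read_count, classes, abundance) in sorted(genes.items()):
        rows.append("HLA-{} {:.2f}% {} {}".format(
            gene, abundance * 100, int(read_count), classes))
    return rows


# The HLA-A entry quoted in the issue above.
example = '{"A": [7378.0, 1202, 0.10358776417015307]}'
rows = format_gene_table(example)
```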
I understand that the input is an RNA-seq BAM, from which reads are extracted as .fq and then genotyped. However, is it also possible to use raw (not extracted) FASTQ files?
I tried using this paired end sample: https://www.ncbi.nlm.nih.gov/sra/ERX002711[accn]
and I get this error:
manager@bl8vbox[arcasHLA] ./arcasHLA genotype test/ERR009111_1.fastq.gz test/ERR009111_2.fastq.gz -g A,B,C,DPB1,DQB1,DQA1,DRB1 test/output/ERR009111
usage: arcasHLA genotype [options] FASTQs or alignment.p file
arcasHLA genotype: error: The format of test/ERR009111_1.fastq.gz is invalid.
Dear arcasHLA developer,
I was following the "test" section in the README and did "reference" and "extract". However, when running "genotype" I get warnings and not the expected result. See below:
manager@bl8vbox[arcasHLA] ./arcasHLA reference --version 3.24.0 [ 9:34am]
manager@bl8vbox[arcasHLA] ./arcasHLA extract test/test.bam -o test/output --paired -v
[extract] Extracting reads from test/test.bam
[extract] Extracting chromosome 6:
samtools view -H -@1 test/test.bam -o /tmp/test.hla.sam
[extract] Extracting chromosome 6:
samtools view -@1 -f 2 test/test.bam 6 >> /tmp/test.hla.sam
[extract] Converting SAM to BAM:
samtools view -Sb -@1 /tmp/test.hla.sam > /tmp/test.hla.bam
[extract] Sorting bam:
samtools sort -n -@1 /tmp/test.hla.bam -o /tmp/test.hla.sorted.bam
[extract] Converting bam to fastq:
bedtools bamtofastq -i /tmp/test.hla.sorted.bam -fq test/output/test.extracted.1.fq -fq2 test/output/test.extracted.2.fq
manager@bl8vbox[arcasHLA] ./arcasHLA genotype test/output/test.extracted.1.fq.gz test/output/test.extracted.2.fq.gz -g A,B,C,DPB1,DQB1,DQA1,DRB1 -o test/output -v
[log] Date: 2019-03-13
[log] Sample: test
[log] Input file(s): test/output/test.extracted.1.fq.gz, test/output/test.extracted.2.fq.gz
[log] Reference: cfb6db3de7f3a7e76d88467271541ff0cc8fbca1
[alignment] Analyzing read length
./scripts/genotype.py:82: UserWarning: genfromtxt: Empty input file: "/tmp/test.reads.txt"
read_lengths = np.genfromtxt(reads_file)
/home/manager/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/manager/.local/lib/python3.6/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/manager/.local/lib/python3.6/site-packages/numpy/core/_methods.py:140: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/home/manager/.local/lib/python3.6/site-packages/numpy/core/_methods.py:110: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/home/manager/.local/lib/python3.6/site-packages/numpy/core/_methods.py:132: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
[alignment] Pseudoaligning with Kallisto:
kallisto pseudo -i dat/ref/hla.idx -t 1 -o /tmp/test/ test/output/test.extracted.1.fq.gz test/output/test.extracted.2.fq.gz
manager@bl8vbox[arcasHLA] [ 9:36am]
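The NumPy warnings in this log come from computing a mean and standard deviation over an empty list of read lengths. A minimal sketch of a guard that reports the real problem instead, using only the standard library; `read_length_stats` is a hypothetical helper and not the function in genotype.py:

```python
import statistics


def read_length_stats(lengths):
    """Return (mean, sd) of read lengths, failing clearly on empty input."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no reads found; check that the input FASTQ files "
                         "exist and are not empty")
    # Population standard deviation, which matches NumPy's default (ddof=0).
    return statistics.mean(lengths), statistics.pstdev(lengths)


mean_length, sd_length = read_length_stats([100, 100, 100])

try:
    read_length_stats([])
    empty_error = None
except ValueError as error:
    empty_error = str(error)
```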
./arcasHLA/arcasHLA merge -i genotypeoutputs/ -o mergedoutput --run test
Traceback (most recent call last):
File "/myhome/arcasHLA/scripts/merge.py", line 173, in <module>
'genes')
File "/myhome/arcasHLA/scripts/merge.py", line 92, in process_count
lines = lines.split('-'*80)[2].split('\n')
IndexError: list index out of range
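The failing line indexes the third chunk of a log split on an 80-dash separator, so it raises whenever a log has fewer than two separators. A minimal sketch of a guard; `third_section` is a hypothetical helper, and which log files lack the separators is not known from the traceback:

```python
SEPARATOR = "-" * 80


def third_section(log_text):
    """Return the lines after the second separator, or None if it is missing."""
    sections = log_text.split(SEPARATOR)
    if len(sections) < 3:
        return None  # caller can skip or report this log instead of crashing
    return sections[2].split("\n")


complete_log = "header\n" + SEPARATOR + "\nparams\n" + SEPARATOR + "\ncounts"
truncated_log = "header\n" + SEPARATOR + "\nparams"

counts_lines = third_section(complete_log)
missing = third_section(truncated_log)
```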
The steps to install are quite involved -- may I request a Dockerfile recipe to be hosted on this official repository?
There are a few existing ones already, with some already built and hosted on dockerhub:
But these will eventually lag behind. Some edited version inspired by these could be officially hosted here and verified for the most recent release of arcasHLA; it would be much easier for users to test the tool on their own data and be confident that the latest arcasHLA is being used. You can trigger an autobuild on dockerhub pointing to new updates to this recipe.
Ideally, for academic settings with HPC environments, the container should also run fine after conversion from docker to singularity, but this is more minor. Happy to volunteer to test.
I used two alignment tools, STAR and HISAT2.
The BAM file aligned by STAR worked well; I got HLA types using arcasHLA.
But I couldn't extract the chr6 reads from the BAM file aligned by HISAT2.
The error message is:
[extract] Error: unable to index bam file.
Is this because the HISAT2 BAM file that I made isn't correctly aligned?
The command line is:
arcasHLA extract -t 10 --paired --unmapped --log 1607370_log.out --o /home/sunghyepark_lab/test/test_files/arcas /home/sunghyepark_lab/test/test_files/RNA/rawData/HISAT2_aligned/1607370_RT.bam
I attached the log file.
I cannot reproduce the test case as described in README.md. Here's how I've run arcasHLA:
# Install requirements
conda create -n arcas-hla-deps coreutils 'bedtools>=2.27.1' biopython git-lfs 'kallisto>=0.44.0' numpy pandas pigz 'python>=3.6.1' 'samtools>=1.9' scipy
conda activate arcas-hla-deps
# Get latest release
curl -L https://github.com/RabadanLab/arcasHLA/archive/v0.2.0.tar.gz | tar zx
# Obtain reference version required for tests
./arcasHLA-0.2.0/arcasHLA reference --version 3.24.0
# Extract reads
./arcasHLA-0.2.0/arcasHLA extract arcasHLA-0.2.0/test/test.bam -o arcasHLA-0.2.0/test/output --paired -t 8 -v
# Genotyping
./arcasHLA-0.2.0/arcasHLA genotype arcasHLA-0.2.0/test/output/test.extracted.1.fq.gz arcasHLA-0.2.0/test/output/test.extracted.2.fq.gz -g A,B,C,DPB1,DQB1,DQA1,DRB1 -o arcasHLA-0.2.0/test/output -t 8 -v
This is the output I get:
{
"A": ["A*03:01:01", "A*01:01:01"],
"B": ["B*39:39:01", "B*07:02:01"],
"C": ["C*01:02:01", "C*08:01:01"],
"DPB1": ["DPB1*02:01:02", "DPB1*14:01:01"],
"DQA1": ["DQA1*05:03:01", "DQA1*02:01:01"],
"DQB1": ["DQB1*06:04:01", "DQB1*02:02:01"],
"DRB1": ["DRB1*03:02:01", "DRB1*10:01:01"]
}
which mismatches on DQB1 and DRB1.
I also tested this on the master branch.
According to the sample.extract.log, extract --unmapped --paired appears to be using:
samtools view -f 12 sample.bam 6
Indeed this appears to be the way the command is constructed.
Why require a region of chromosome 6 in the case of both unmapped reads? Naively, this seems contradictory (an unmapped read shouldn't have a chromosome, right?). I tested this separately on my bam and this exact filter scheme actually causes the output to be empty. In contrast, removing the chr 6 region requirement shows output with both reads unmapped and the chromosome as "*". Would any reads ever be marked as unmapped and assigned to chr6, or is it an impossible-to-satisfy filter?
Why require both reads to be unmapped (the bitwise 12)? What about cases where one read is mapped to chr6 while the other is unmapped?
I have not checked the logic for single-end input
These may be worth clarifying in the manual/readme so the user is aware for these situations.
time samtools view -@8 -f 12 sample.bam 6 > sample.hla.chr6.bothunmapped.sam &
time samtools view -@8 -f 12 sample.bam > sample.hla.any.bothunmapped.sam &
time samtools view -@8 -f 12 sample.bam '*' > sample.hla.asterisk.bothunmapped.sam &
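For reference, `-f 12` requires both SAM flag bits 4 (read unmapped) and 8 (mate unmapped) to be set. The sketch below decodes a few flag values to show which reads pass that filter; `passes_required_flags` is a hypothetical helper that mimics only the bit test, not the region query.

```python
# SAM flag bits (values from the SAM specification).
READ_PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40


def passes_required_flags(flag, required):
    """Mimic `samtools view -f`: keep a read only if all required bits are set."""
    return (flag & required) == required


both_unmapped = READ_PAIRED | READ_UNMAPPED | MATE_UNMAPPED | FIRST_IN_PAIR  # 77
mate_unmapped_only = READ_PAIRED | MATE_UNMAPPED | FIRST_IN_PAIR             # 73
read_unmapped_only = READ_PAIRED | READ_UNMAPPED | FIRST_IN_PAIR             # 69

results = {
    "both_unmapped": passes_required_flags(both_unmapped, 12),
    "mate_unmapped_only": passes_required_flags(mate_unmapped_only, 12),
    "read_unmapped_only": passes_required_flags(read_unmapped_only, 12),
}
```

Only the pair with both mates unmapped passes. The SAM specification recommends that an unmapped read with a mapped mate carry the mate's reference name and position, so such reads can show up in a chromosome 6 region query, but they fail `-f 12` because the mate-unmapped bit is not set.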
Hi, can you please update the samtools version to say 1.9 instead of 1.19? I found out I was having this same issue from reading the closed issue #5. Thanks
While I am able to do
./arcasHLA reference --version 3.24.0
running ./arcasHLA reference --update gives me an error:
Traceback (most recent call last):
File "./scripts/reference.py", line 467, in <module>
build_fasta()
File "./scripts/reference.py", line 386, in build_fasta
seq_out, allele_idx, lengths = complete_records(cDNA, other)
File "./scripts/reference.py", line 308, in complete_records
offset = i + 1
UnboundLocalError: local variable 'i' referenced before assignment
Dear developers:
Thanks for the excellent tools.
After getting genotypes from arcasHLA, I was wondering whether it is possible to get allele-specific expression using these genotypes.
My actual goal is to quantify HLA allele expression from single-cell Smart-seq2 RNA data.
So far, I have the HLA genotype from bulk RNA-seq.
Is there any way to apply these HLA genotypes to quantify allele-specific HLA expression in scRNA-seq data (assuming the single-cell data are from the same sample)?
Thank you
I have been using arcasHLA for bulk RNA and it has been great. Can arcasHLA be applied to single-cell RNA sequencing data too?
Hi,
We have installed arcasHLA on our server and it works fine. However, the software uses relative paths to find the scripts in the scripts directory, and likewise for the dat directory. This is an issue because we use environment modules to run software on the server: users load the arcasHLA module to get the arcasHLA executable on their path, but are not expected to run the code from the software directory. We temporarily worked around the issue by adding symbolic links to the scripts and dat directories; however, this is only a temporary patch.
It would be highly appreciated if you could replace the hardcoded relative paths so that the software works with module installations.
On another note, the fact that arcasHLA is calling HLA class II is highly welcomed.
Thanks,
Astrid
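One common way to make a tool independent of the working directory is to derive its data directories from the resolved location of the script itself. A minimal sketch, assuming a layout with `scripts/` and `dat/` side by side under one install root; `resolve_data_dir` and the temporary layout are hypothetical:

```python
import os
import tempfile


def resolve_data_dir(script_path, subdir="dat"):
    """Locate a sibling data directory from a script's real (symlink-free) path."""
    script_dir = os.path.dirname(os.path.realpath(script_path))
    return os.path.normpath(os.path.join(script_dir, os.pardir, subdir))


# Build a throwaway install layout: <root>/scripts/genotype.py and <root>/dat
install_root = tempfile.mkdtemp()
os.makedirs(os.path.join(install_root, "scripts"))
os.makedirs(os.path.join(install_root, "dat"))
script = os.path.join(install_root, "scripts", "genotype.py")
open(script, "w").close()

data_dir = resolve_data_dir(script)
```

In a real script the argument would typically be `__file__`, so the result does not depend on where the user runs the command from.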
Hi there,
Thanks for this tool. Is it possible to use arcasHLA to type DPA1? The documentation contains output for DPB1, but there doesn't seem to be output for the alpha chain.
Kind regards,
Fong
I have a rather large set of single-end data that I'd like to perform HLA typing for and would love to use this tool but I am hung up on one thing. arcasHLA uses Kallisto and for single-end this requires fragment length and std deviation of fragment length. Quickly looking through I see that this is estimated as read length and std deviation of read length in this case. Will this affect results down stream? From what I have found it seems that Kallisto can be rather picky when it comes to these parameters. This is most likely not an issue for a vast majority as it seems paired-end is more common :^(
Is this tool able to type MICA? Running
arcasHLA genotype test/output/test.extracted.1.fq.gz test/output/test.extracted.2.fq.gz -g MICA -o test/output -t 8 -v
results in
arcasHLA genotype: error: The gene list MICA is invalid.
Hi,
I do not seem to be able to install and run arcasHLA.
I cloned the repository and when I try:
./arcasHLA reference --update
I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jose.fernandez.navarro/shared/arcasHLA/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'
If I git clone IMGTHLA to dat and try again, I get this error:
[reference] Error: dat/IMGTHLA/hla.dat empty or corrupted.
I get the same error if I try specific versions like 3.24.0 and so on...
In addition to the listed dependencies, I had to install scipy to get references.py to run. Minor problem but could be worth adding scipy to the list on the readme.
I downloaded the new merge.py file and ran arcasHLA.
The earlier errors seem to be solved, but a new syntax error appeared:
File "/home/sunghyepark_lab/packages/arcasHLA/scripts/merge.py", line 7
< !DOCTYPE html > <- (I added spaces on both sides to write this issue)
^
SyntaxError: invalid syntax
The merged files were made as before.
Hi,
I'm trying out arcasHLA, but apparently the version in bioconda doesn't work due to the git lfs update. Can you please make a new tagged release and update the bioconda recipe? That would help many people a lot.
Also, updating the HLA database versions would be nice. Now it only goes to 3.34.0 (or so) while https://github.com/ANHIG/IMGTHLA is at 3.43.0
Thanks
Hi,
When running genotype I get the following error:
arcasHLA genotype test.extracted.1.fq.gz test.extracted.2.fq.gz
Traceback (most recent call last):
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/genotype.py", line 674, in <module>
check_ref()
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/reference.py", line 80, in check_ref
build_convert(False)
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/reference.py", line 462, in build_convert
p_group = process_hla_nom(hla_nom_p)
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/reference.py", line 232, in process_hla_nom
for line in open(hla_nom, 'r', encoding='UTF-8'):
FileNotFoundError: [Errno 2] No such file or directory: '/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'
The log shows that the error is at the git lfs clone command:
git lfs clone https://github.com/ANHIG/IMGTHLA.git /hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/../dat/IMGTHLA/
git: 'lfs' is not a git command. See 'git --help'.
How can I deal with this error?
Thank you!
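The message "git: 'lfs' is not a git command" indicates that the Git LFS extension is not installed or not visible to git in this environment. A small sketch for checking this before running the reference step; `command_succeeds` is a hypothetical helper, and `git lfs version` is used only as a harmless probe:

```python
import subprocess
import sys


def command_succeeds(cmd):
    """Return True if the command runs and exits with status 0."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # Raised when the executable itself cannot be found or started.
        return False
    return result.returncode == 0


git_lfs_available = command_succeeds(["git", "lfs", "version"])
if not git_lfs_available:
    print("git lfs is not available; install Git LFS before fetching IMGTHLA",
          file=sys.stderr)
```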
$ ./arcasHLA reference --update
Traceback (most recent call last):
File "/.../arcasHLA/scripts/reference.py", line 534, in <module>
build_convert(False)
File "/.../arcasHLA/scripts/reference.py", line 462, in build_convert
p_group = process_hla_nom(hla_nom_p)
File "/.../arcasHLA/scripts/reference.py", line 232, in process_hla_nom
for line in open(hla_nom, 'r', encoding='UTF-8'):
FileNotFoundError: [Errno 2] No such file or directory: '/.../arcasHLA/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'
$ ./arcasHLA reference --rebuild
Traceback (most recent call last):
File "/.../arcasHLA/scripts/reference.py", line 539, in <module>
build_convert()
File "/.../arcasHLA/scripts/reference.py", line 462, in build_convert
p_group = process_hla_nom(hla_nom_p)
File "/.../arcasHLA/scripts/reference.py", line 232, in process_hla_nom
for line in open(hla_nom, 'r', encoding='UTF-8'):
FileNotFoundError: [Errno 2] No such file or directory: '/.../arcasHLA/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'
Hi,
Wondering how to interpret *genes.json output (see below). Thanks!
sm
{"A": [9104.0, 3628, 0.18921475974399068],
 "B": [6499.0, 2754, 0.13618954185635945],
 "C": [6877.0, 2593, 0.14254001726640184],
 "DMA": [184.0, 2, 0.005342202898901628],
 "DMB": [246.0, 4, 0.007088184725790673],
 "DOA": [17.0, 6, 0.0005152037872793143],
 "DPA2": [1.0, 1, 3.113301114046414e-05],
 "DPB1": [290.0, 119, 0.008517302674554006],
 "DQB1": [129.0, 70, 0.0036343754745788493],
 "DRA": [1396.0, 7, 0.04164367848847119],
 "DRB1": [536.0, 55, 0.015270644795199345],
 "DRB5": [216.0, 2, 0.006153841932393766],
 "E": [9707.0, 1359, 0.20435292065495925],
 "F": [1112.0, 25, 0.024376938375165955],
 "HFE": [63.0, 1, 0.0013731531245993252],
 "K": [23.0, 5, 0.0004802117427420722],
 "L": [2298.0, 2, 0.04784808621111001],
 "T": [2586.0, 1, 0.12857038272586738],
 "U": [1.0, 1, 0.00011643110798959293],
 "W": [57.0, 8, 0.0013897097633116797],
 "DRB4": [158.0, 16, 0.004501421413510255],
 "DPA1": [746.0, 158, 0.021742133953775635],
 "DOB": [114.0, 13, 0.003164886468271855],
 "DQA1": [199.0, 68, 0.005913123614617295],
 "DQA2": [1.0, 1, 2.9714189018177362e-05]}
I was trying to run arcasHLA but received this error:
Traceback (most recent call last):
File "/home/yzx5896/.conda/envs/arcasHLA/bin/scripts/genotype.py", line 674, in <module>
check_ref()
File "/home/yzx5896/.conda/envs/arcasHLA/bin/scripts/reference.py", line 80, in check_ref
build_convert(False)
File "/home/yzx5896/.conda/envs/arcasHLA/bin/scripts/reference.py", line 462, in build_convert
p_group = process_hla_nom(hla_nom_p)
File "/home/yzx5896/.conda/envs/arcasHLA/bin/scripts/reference.py", line 232, in process_hla_nom
for line in open(hla_nom, 'r', encoding='UTF-8'):
FileNotFoundError: [Errno 2] No such file or directory: '/home/yzx5896/.conda/envs/arcasHLA/bin/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'
Where could I get the IMGTHLA/wmda/hla_nom_p.txt file? I would appreciate your help.
Hi, I am interested in obtaining RPKM-like data from arcasHLA output. I noticed that one of the output files (*samplename.genes.json) has the following format: HLA-type: [read count, equivalence class, ]
I was wondering if an estimate of expression for a specific allele type can be obtained by:
RPKM = (Read count x equivalence class)/1,000,000
ex: HLA-A:[20000,10000,]
Can HLA-A RPKM = (20000x10000)/1000000 = 200 ?
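For comparison, the conventional RPKM definition normalises a read count by transcript length (in kilobases) and by the total number of mapped reads (in millions); it does not involve the number of equivalence classes. A minimal sketch of that standard formula follows. The gene length and library size are not part of genes.json, so the values used here are hypothetical placeholders.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical inputs: 20000 reads on a 1000 bp transcript, 10 million mapped reads.
example_rpkm = rpkm(20000, 1000, 10000000)
```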
Can I use arcasHLA for cancer scRNA-seq data?
$ ./arcasHLA reference --version 3.24.0
[reference] Error: dat/IMGTHLA/hla.dat empty or corrupted.
It appears that the hash for version 3.24.0 in parameters.p is incorrect. Searching the IMGTHLA repo, I cannot find a commit with hash of c5acf7a4342869351b2382b1cc1d1b5763e7e04e: https://github.com/ANHIG/IMGTHLA/search?q=hash%3Ahash%3Ac5acf7a4342869351b2382b1cc1d1b5763e7e04e&type=Commits.
I did find what appears to be a valid commit for 3.24.0 and attempted to use the hash for it, 4a0401af6be02ca688adeef3f63f5e55288d14fe; however, that fails with the same error message.
Hi,
I'm running into this issue when running the docker.
[log] Date: 2020-11-13
[log] Sample: test
[log] Input file(s): test.extracted.1.fq.gz
test.extracted.2.fq.gz
Traceback (most recent call last):
File "/path/genotype.py", line 697, in <module>
with open(hla_p, 'rb') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/path/../dat/ref/hla.p'
I would greatly appreciate any solution to this error.
Thank you so much,
Hello,
Thank you for making your tool available.
In your manuscript, you state "We established the HLA genotyping ground truth for the Virome dataset", but I could not find your HLA genotype calls anywhere.
Similarly, there is no URL to the manhattan virome data / this data is not in public repos.
I would be grateful if you could please share those with us so we can reproduce / compare to your results
Thank you!
Can you provide an additional merge.py file which can generate a tsv table with HLA locus relative abundance?
Hello,
in my attempt to run 'arcasHLA partial', I got an error because there was no 'database/parameters.p' file as required in line #346 of 'scripts/partial.py'. I tried replacing this with 'dat/info/parameters.p'; the program then runs for a while until it crashes again with the following message:
File "./scripts/partial.py", line 528, in <module>
args.keep_files)
ValueError: not enough values to unpack (expected 3, got 2)
I would appreciate if you could look into this.
Thank you
Hi, I had a problem similar to an issue that was closed without a resolution!
So this is an attempt to reopen it.
The following is my command:
arcasHLA extract --unmapped --log my.log --outdir my_outdir my.bam
I got an error:
[extract] Error: unable to index bam file.
May I know how I can resolve this?
Thank you!
This is Python. Why can't I install this via pip or conda?
Hello,
would it be possible for your tool to type also the DPA1 allele?
Kind regards,
Matthias
When I run your tool, I get this error
(py36) [rwarren@hpce704 arcasHLA]$ ./arcasHLA reference --version 3.24.0
[reference] Error: dat/IMGTHLA/hla.dat empty or corrupted.
turns out the file hla.dat isn't empty:
(py36) [arcasHLA]$ cat ./dat/IMGTHLA/hla.dat
version https://git-lfs.github.com/spec/v1
oid sha256:d6bca31aedfe138f603eb605550d6d2bd5f5206b7cad2646cd191a56d12a2dfc
size 163633161
I tried cloning IMGTHLA into dat, as suggested in #28, but then ./ref/hla.p is missing.
Thoughts?
Thank you!
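The `cat` output above is a Git LFS pointer file, not the actual data: the repository was cloned without the LFS content being fetched. A minimal sketch for detecting that case; `is_lfs_pointer` and the temporary files are hypothetical, and the pointer prefix is the first line shown in the output above.

```python
import os
import tempfile

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Two stand-in files: a pointer like the one above, and one with real-looking data.
tmp_dir = tempfile.mkdtemp()
pointer_path = os.path.join(tmp_dir, "pointer.dat")
data_path = os.path.join(tmp_dir, "data.dat")
with open(pointer_path, "wb") as handle:
    handle.write(LFS_POINTER_PREFIX + b"\noid sha256:...\nsize 163633161\n")
with open(data_path, "wb") as handle:
    handle.write(b"ID   example record\n")

pointer_detected = is_lfs_pointer(pointer_path)
data_detected = is_lfs_pointer(data_path)
```

When a pointer is detected, installing Git LFS and running `git lfs pull` inside the IMGTHLA clone is the usual way to fetch the real file.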
Hi I'm having some trouble merging *genotype.json and *genes.json results via the merge function, can you please point me in the right direction?
[x@y out]$ ll *genotype*json | wc -l
1640
[x@y out]$ ll *genes*json | wc -l
1640
[x@y out]$ mkdir -p merged_results; arcasHLA merge --run test -i . -o merged_results
Traceback (most recent call last):
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/merge.py", line 173, in <module>
'genes')
File "/hpc/packages/minerva-centos7/arcashla/0.2.0/arcasHLA-0.2.0/scripts/merge.py", line 92, in process_count
lines = lines.split('-'*80)[2].split('\n')
IndexError: list index out of range
[x@y out]$
Hi,
Is it possible to build a reference using MHC alleles from other species? I have the allele sequences, and I wonder which files should I modify in arcasHLA/dat/.
Thanks
Note that default behaviour changed with #44 (breaking change).
Old behaviour: set as single-end if one fastq file supplied, paired-end if two supplied.
Change with #44: default is paired-end unless the --single flag is present.
Rationale: can be two fastq.gz files for single-end and e.g. 4 for paired-end (see kallisto manual)
Let me know if you would like to change it back to the old behaviour. It got added to #44 by mistake when I was updating it to reflect my own preference.
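The two behaviours described above can be contrasted in a short sketch. This is only an illustration: `build_parser`, `is_paired_new` and `is_paired_old` are hypothetical names and the parser is not the one in arcasHLA.

```python
import argparse


def build_parser():
    """Minimal parser with positional FASTQ files and a --single flag."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("fastqs", nargs="+", help="one or more FASTQ files")
    parser.add_argument("--single", action="store_true",
                        help="treat all input files as single-end")
    return parser


def is_paired_new(args):
    """Behaviour after the change: paired-end unless --single is given."""
    return not args.single


def is_paired_old(args):
    """Old behaviour: infer the layout from the number of files."""
    return len(args.fastqs) == 2


parser = build_parser()
two_lanes_single = parser.parse_args(["lane1.fq.gz", "lane2.fq.gz", "--single"])
```

With two single-end files, the old rule would infer paired-end, while the explicit flag gives the intended single-end result.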
Hi,
Thank you for developing arcasHLA. I have a question regarding the requirements for installation of arcasHLA. Will the following dependencies and their versions work (or do they need to be the ones listed in your Github):
kallisto/0.46.2
bedtools2/2.29.2,
pigz 2.4.1, samtools 1.10
python/3.8.0,
Looking forward to hearing from you.
Thank you.