ANHIG / IMGTHLA
GitHub for files currently published in the IPD-IMGT/HLA FTP Directory hosted at the European Bioinformatics Institute
Home Page: http://www.ebi.ac.uk/ipd/imgt/hla/
License: Other
Is there documentation for the txt alignment format (for example: A_gen)?
Thank you for hosting this on github!
Line 106 of the Deleted_alleles.txt file (HLA00615,DQA1*05013,To take account of coding polymorphism in the leader peptide, sequence renamed DQA1*05:05 (April 1998)) includes a comma in the Description field.
This results in an extra column being added for this line when parsing the file as a .csv document.
Could this comma be removed? It doesn't change the meaning of the entry.
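Until the file is changed, a parser-side workaround is possible. The following is a minimal sketch, assuming the file has exactly three columns (allele ID, allele name, description; the column meanings are my assumption) and that only the last column can contain commas. The function name is hypothetical.

```python
def split_deleted_allele_line(line: str) -> list[str]:
    """Split on the first two commas only, so commas in the description survive."""
    return line.rstrip("\n").split(",", 2)


line = (
    "HLA00615,DQA1*05013,To take account of coding polymorphism in the "
    "leader peptide, sequence renamed DQA1*05:05 (April 1998)"
)
fields = split_deleted_allele_line(line)
print(len(fields))  # number of columns after the limited split
print(fields[2])    # full description, including its comma
```

This keeps the description intact without requiring the field to be quoted in the source file.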
For the DPB1 alleles, the alignmentreference elements have an empty alleleid attribute, and the allelename attribute contains "DPB1*01:01:01", but the allele element in the file has the extended name "DPB1*01:01:01:01", so the reference cannot be resolved.
DRBx alleles also have an empty alleleid attribute in alignmentreference, but in these cases the DRB1*01:01:01 allele is named consistently.
john
In the hla.xml for release 3.33.0, the names of the DRB4*03:01N intron features do not match the feature order numbers for other DRB intron features.
Here are the intron elements for DRB4*03:01N:
<feature id="914.5" order="5" featuretype="Intron" name="Intron 1">
<SequenceCoordinates start="1" end="2684" />
</feature>
<feature id="914.7" order="7" featuretype="Intron" name="Intron 2">
<SequenceCoordinates start="2967" end="3670" />
</feature>
<feature id="914.9" order="9" featuretype="Intron" name="Intron 3">
<SequenceCoordinates start="3782" end="4255" />
</feature>
<feature id="914.11" order="11" featuretype="Intron" name="Intron 4">
<SequenceCoordinates start="4280" end="4581" />
</feature>
Here are the corresponding intron elements for other DRB alleles (e.g., DRB4*01:03:01:03):
<feature id="6603.3" order="3" featuretype="Intron" name="Intron 1">
<SequenceCoordinates start="414" end="9976" />
</feature>
<feature id="6603.5" order="5" featuretype="Intron" name="Intron 2">
<SequenceCoordinates start="10247" end="12983" />
</feature>
<feature id="6603.7" order="7" featuretype="Intron" name="Intron 3">
<SequenceCoordinates start="13266" end="13969" />
</feature>
<feature id="6603.9" order="9" featuretype="Intron" name="Intron 4">
<SequenceCoordinates start="14081" end="14554" />
</feature>
<feature id="6603.11" order="11" featuretype="Intron" name="Intron 5">
<SequenceCoordinates start="14579" end="14880" />
</feature>
Shouldn't all DRB Intron 1 sequences be intron order 3, and all intron sequences of intron order 5 be intron 2?
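A check like this can be automated. The sketch below assumes the convention seen in the DRB4*01:03:01:03 example (Intron N has order 2N + 1); that rule is inferred from the example, not taken from any schema documentation. The wrapping allele element and the function names are hypothetical, and namespaces are stripped so the same code can be tried on a namespaced file.

```python
import re
import xml.etree.ElementTree as ET

# Two features from DRB4*03:01N and one from DRB4*01:03:01:03 (coordinates omitted).
SNIPPET = """<allele>
  <feature id="914.5" order="5" featuretype="Intron" name="Intron 1"/>
  <feature id="914.7" order="7" featuretype="Intron" name="Intron 2"/>
  <feature id="6603.3" order="3" featuretype="Intron" name="Intron 1"/>
</allele>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix if present."""
    return tag.rsplit("}", 1)[-1]


def mismatched_introns(root):
    """Return (id, name, order, expected_order) for introns breaking order == 2N + 1."""
    bad = []
    for elem in root.iter():
        if local_name(elem.tag) != "feature" or elem.get("featuretype") != "Intron":
            continue
        match = re.fullmatch(r"Intron (\d+)", elem.get("name", ""))
        if match is None:
            continue
        expected = 2 * int(match.group(1)) + 1
        order = int(elem.get("order"))
        if order != expected:
            bad.append((elem.get("id"), elem.get("name"), order, expected))
    return bad


result = mismatched_introns(ET.fromstring(SNIPPET))
for row in result:
    print(row)
```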
Hi all,
This extra G is causing us some issues.
allele id="HLA18583" name="HLA-C*02:02:37" dateassigned="2018-03-29"
hla_g_group status="C*02:10:01GG"
hla_p_group status="C*02:02P"
Thank you!
Marney
One base pair before point mutation 968G>T, the sequences seem to diverge. The mutation (T) is highlighted:
From alignment file (that I think is correct):
GGAGAACGGTAA...
vs the fasta section:
GGAGAACGACCC...
In the 3.34.0 HLA-B protein alignment, the HLA-B*13:120Q peptide sequence is 11 amino acids longer than the reference, but these positions are not accounted for in the reference with "." symbols. As a result, even though the last sequence block for all other alleles only includes 69 amino-acid positions, the last 11 amino acids of the HLA-B*13:120Q sequence appear in a separate block, as below.
This also occurs for the B_nuc.txt alignment, as below.
The same thing is also true for the C*04:09N allele in the C_prot.txt and C_nuc.txt alignments.
It seems like these extra peptide positions should be included in the reference sequences as sequence indels.
There are two new alleles where "dateassigned" is blank in the hla.xml file, DQA1 05:05:01:20 (HLA22679) and DRB4 01:03:01:10 (HLA22663). The dates are listed appropriately in the hla_nom file.
There is an inconsistency between hla_nom and hla.xml for HLA00886, where the xml file has the allele name as v2 DRB3 010101 while the nom file has v3 DRB3 01:01:01. Could you explain this for us?
The hla.xml file has a G group listed as C*07:726N:01G while nom_g lists it as 07:726:01G. Could you please look into this one too?
There is an inconsistency between nom_p and hla.xml regarding DQA1 05:05:01:20. This allele is listed as part of DQA1*05:01P in nom_p but has no p group status in the xml file.
Any help on the above is greatly appreciated. Thanks!
Hi James,
During the processing of a bunch of new alleles, we ran into an issue with C*17:01:01:02.
The hla.dat file we pulled from the git repository has Exon 5 marked as "pseudo", while on the IPD-IMGT/HLA website it is not marked as such. A cursory look at the history of the sequence does not indicate any recent changes. We were wondering if this was intentional and something we should take into account in our workflow?
Cheers,
Vineeth
Hi James,
In the new release 3.33.0 of hla.dat, some DRB1 sequences are invalid. See, for example, DRB1*13:09: the substring "y/alignment_libraries/libs/drb1345genomiclib:drb1_13:09" should not be there, I think.
FH Key Location/Qualifiers
FH
FT source 1..325
FT /organism="Homo sapiens"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:9606"
FT /ethnic="Hispanic"
FT /cell_line="MJD"
FT /cell_line="NT01111"
FT CDS <1..270
FT /codon_start=1
FT /partial
FT /gene="HLA-DRB1"
FT /allele="HLA-DRB1*13:09"
FT /product="MHC Class II HLA-DRB1*13:09 sequence"
FT /translation="RFLEYSTSECHFFNGTERVRFLDRYFHNQEENVRFDSDVGEFRAV
FT TELGRPDAEYWNSQKDILEQARAAVDTYCRHNYGVVESFTVQRR"
FT exon 1..270
FT /number="2"
FT UTR 271..328
SQ Sequence 325 BP; 58 A; 67 C; 100 G; 51 T; 49 other;
cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc 60
ggttcctgga cagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg 120
gggagttccg ggcggtgacg gagctggggc ggcctgatgc cgagtactgg aacagccaga 180
aggacatcct ggagcaggcg cgggccgcgg tggacaccta ctgcagacac aactacgggg 240
ttgtggagag cttcacagtg cagcggcgag y/alignmen t_librarie s/libs/drb 300
1345genomi clib:drb1_ 13:09 325
//
Cheers,
Markus
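Entries like this can be caught before they reach a downstream parser. The following is a rough sketch, assuming each sequence line is made of whitespace-separated chunks followed by a running position counter, and that only lowercase IUPAC nucleotide codes are legal. The function name is hypothetical.

```python
IUPAC_DNA = set("acgtrymkswhbvdn")


def invalid_sequence_chunks(seq_lines):
    """Return chunks from sequence lines that contain non-IUPAC characters."""
    bad = []
    for line in seq_lines:
        tokens = line.split()
        # The last token on each line is assumed to be the position counter.
        for chunk in tokens[:-1]:
            if not set(chunk.lower()) <= IUPAC_DNA:
                bad.append(chunk)
    return bad


lines = [
    "ttgtggagag cttcacagtg cagcggcgag y/alignmen t_librarie s/libs/drb 300",
    "1345genomi clib:drb1_ 13:09 325",
]
bad_chunks = invalid_sequence_chunks(lines)
print(bad_chunks)
```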
The user manual and the hla.dat file appear to be out of sync. The user manual states that each entry will have three DT lines, but in the 3.30.0 hla.dat file there are only two per entry.
Greetings,
Allele HLA-C*02:10:01:01 has an extra 'G' (<hla_g_group status="C*02:10:01GG"/>) in hla.xml for 3.32.0.
May the force be with you,
Marney
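A simple pattern check can flag names like this one automatically. This is only a sketch: the regular expression encodes my assumption about what a G group name looks like (locus, asterisk, colon-separated numeric fields, one trailing G), and "None" is treated as a valid "no group" value as it appears in hla.xml. The function name is hypothetical.

```python
import re

# Assumed shape of a G group name, e.g. C*02:10:01G (optionally prefixed with HLA-).
G_GROUP = re.compile(r"(?:HLA-)?[A-Z]+[0-9]*\*\d+(?::\d+)+G")


def suspicious_g_group(status: str) -> bool:
    """True when a status is neither 'None' nor a plausibly formed G group name."""
    return status != "None" and G_GROUP.fullmatch(status) is None


for status in ["C*02:10:01G", "C*02:10:01GG", "None"]:
    print(status, suspicious_g_group(status))
```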
The following line is found in the hla.dat file for 3.21.0, 3.22.0, 3.23.0 and 3.24.0.
RA Balas A, S�nchez-Gordo F, Garcia-S�nchez F, Gomez-Zumaquero JM, Vicario JL;
This prevents these files from being properly parsed.
Here are the specific alleles that have this issue:
Release = 3210, line # = 121045, Allele = HLA-A*11:210N
Release = 3210, line # = 177260, Allele = HLA-A*26:107N
Release = 3220, line # = 125142, Allele = HLA-A*11:210N
Release = 3220, line # = 183644, Allele = HLA-A*26:107N
Release = 3230, line # = 127802, Allele = HLA-A*11:210N
Release = 3230, line # = 187727, Allele = HLA-A*26:107N
Release = 3240, line # = 129967, Allele = HLA-A*11:210N
Release = 3240, line # = 191426, Allele = HLA-A*26:107N
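Lines like these can be located by reading the file as bytes and reporting every line that does not decode cleanly. A minimal sketch follows; the sample bytes are hypothetical (a Latin-1 encoded accented character is assumed, since the actual byte value is not shown above), and the function name is my own.

```python
def undecodable_lines(data: bytes, encoding: str = "utf-8"):
    """Return (line_number, lossy_text) for every line that fails to decode."""
    bad = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append((lineno, raw.decode(encoding, errors="replace")))
    return bad


# Hypothetical sample: 0xE1 is not valid UTF-8 when followed by an ASCII letter.
sample = b"RA   Pando M, Theiler G;\nRA   Balas A, S\xe1nchez-Gordo F;\n"
hits = undecodable_lines(sample)
for lineno, text in hits:
    print(lineno, text)
```

For a real file, the bytes would come from open(path, "rb").read().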
On line 1 of hla_ambigs.xml, the XML declaration is not followed by a newline character, so the tns:ambiguityData start-tag appears on the same line.
A newline character is not required by the XML spec, but could be a helpful aesthetic enhancement.
In the hla.dat files for 3.20.0 and 3.21.0, a join is being used for the CDS location when it shouldn't be, which causes parsers to fail. Here's an example:
DR EMBL; AJ427352; AJ427352.1.
XX
FH Key Location/Qualifiers
FH
FT source 1..270
FT /organism="Homo sapiens"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:9606"
FT /ethnic="Caucasoid"
FT /cell_line="Barpay"
FT CDS join(1..270)
FT /codon_start=1
FT /partial
FT /gene="HLA-DRB5"
FT /allele="HLA-DRB5*01:12"
FT /product="MHC Class II HLA-DRB5*01:12 sequence"
FT /translation="RFLQQDKYECHFFNGTERVRFLHRDIYNQEEDLRFDSDVGEYRAV
FT TELGRPDAESWNSQKDFLERRRAEVDTVCRHNYGVGESFTVQRR"
Should be FT CDS 1..270 or FT CDS <1..270> instead.
Here's a list of all the alleles that have this:
HLA01638.1 HLA-DRB5*01:11
HLA01634.1 HLA-DRB5*01:12
HLA01871.1 HLA-DRB5*01:13
HLA00927.1 HLA-DRB5*02:03
HLA00928.1 HLA-DRB5*02:04
HLA01280.1 HLA-DRB5*02:05
HLA00916.1 HLA-DRB5*01:01:02
HLA00918.2 HLA-DRB5*01:03
HLA00920.1 HLA-DRB5*01:05
HLA00921.1 HLA-DRB5*01:06
HLA00922.1 HLA-DRB5*01:07
HLA00924.1 HLA-DRB5*01:09
HLA01012.3 HLA-DRB5*01:10N
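For anyone who needs to parse the affected releases in the meantime, the location string can be normalized before it reaches the parser. This is a sketch under the assumption that a join with a single part (no comma) is always meant as a plain range; the function name is hypothetical.

```python
import re

# Matches join(...) only when the parentheses hold a single part (no comma).
_SINGLE_JOIN = re.compile(r"join\(([^,()]+)\)")


def unwrap_single_join(location: str) -> str:
    """Rewrite 'join(X)' as 'X'; leave genuine multi-part joins untouched."""
    return _SINGLE_JOIN.sub(r"\1", location)


for loc in ["join(1..270)", "join(<1..284)", "join(1..73,204..473)"]:
    print(loc, "->", unwrap_single_join(loc))
```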
The gGroup and gGroupAllele names in hla_ambigs.xml don't use the full gene names. For example, in place of "HLA-A", they use "A". This makes them inconsistent with the allele names in hla.xml.
Below are file excerpts to further illustrate the issue.
From hla.xml:
<allele id="HLA00001" name="HLA-A*01:01:01:01" dateassigned="1989-08-01">
From hla_ambigs.xml:
<tns:gGroup name="A*01:01:01G" gid="HGG00001">
<tns:gGroupAllele name="A*01:01:01:01" alleleid="HLA00001" />
Please consider revising the gGroup and gGroupAllele names in hla_ambigs.xml to use the full gene names.
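In the meantime, names from the two files can be compared after normalizing the prefix on the consumer side. A tiny sketch, with a hypothetical helper name:

```python
def with_hla_prefix(name: str) -> str:
    """Add the 'HLA-' prefix when it is missing, so names from both files compare equal."""
    return name if name.startswith("HLA-") else "HLA-" + name


print(with_hla_prefix("A*01:01:01:01") == "HLA-A*01:01:01:01")
```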
But the alignment file was not renamed? The pir and msf files were also renamed.
Are sequences for the DPA2 pseudo gene forthcoming?
This isn't a technical issue, just a consistency issue.
The alignment in the DPA1_gen.txt file for DPA1 *04:01 and *04:02 makes it appear that these alleles differ significantly in their sequence for positions 1061 to 1093, as below.
However, the sequences of these alleles are identical through these positions, and it seems like the sequence for *04:02 should only include a 3 nucleotide deletion, relative to the reference, for positions 1061 - 1063, as below.
Hello,
In the README, you note that a "zip compressed archive of all the text-format alignment files is available from the top-level directory". However, I am unable to find such a zip file. The only zip file appears to be the Alignment_Rel_3350.zip that contains the alignments from the current release.
In particular, I would like to find archive versions of the alignment files and the archive versions of the fasta files.
Can you point me in the right direction?
Thanks,
Rachel
hi,
the hla_gen.fasta from the latest version contains sequences for only 5773 alleles.
where are the other alleles? can't find DPA1*03:02 for instance.
thanks,
It is missing the final 2 bases (AG) of the 7th exon and the 8th exon (CCTGA). This is comparing against the genetic data.
For DRB1*14:13 (HLA00845), we noticed that the exon regions do not match the overall sequence length. As you can see from this snippet, the declared sequence length is 687, but the sequence actually listed is only 549 bases long.
FT exon 1..270
FT /number="2"
FT exon 271..549
FT /number="3"
FT exon 553..663
FT /number="4"
FT exon 664..687
FT /number="5"
SQ Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;
cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc 60
ggttcctgga gagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg 120
gggagtaccg ggcggtgacg gagctggggc ggcctagcgc cgagtactgg aacagccaga 180
aggacctcct ggagcagagg cgggccgcgg tggacaccta ctgcagacac aactacgggg 240
ttggtgagag cttcacagtg cagcggcgag tccatcctaa ggtgactgtg tatccttcaa 300
agacccagcc cctgcagcac cacaacctcc tggtctgttc tgtgagtggt ttctatccag 360
gcagcattga agtcaggtgg ttccggaatg gccaggaaga gaagactggg gtggtgtcca 420
caggcctgat ccacaatgga gactggacct tccagaccct ggtgatgctg gaaacagttc 480
ctcggagtgg agaggtttac acctgccaag tggagcaccc aagcgtgaca agccctctca 540
cagtggaat 549
Good morning again,
We noticed an inconsistency between the files. Will you correct whichever needs to be corrected, please?
allele id="HLA18836" name="HLA-DQA1*05:01:04" dateassigned="2018-04-30"
hla_g_group status="None"/
hla_p_group status="None"/
hla_nom_g.txt
DQA1*;05:01:01:01/05:01:01:02/05:01:01:03/05:01:04/05:03:01:01/05:03:01:02/05:05:01:01/05:05:01:02/05:05:01:03/05:05:01:04/05:05:01:05/05:05:01:06/05:05:01:07/05:05:01:08/05:05:01:09/05:05:01:10/05:06:01:01/05:06:01:02/05:07/05:08/05:09/05:11;05:01:01G
DQA1*;05:01:01:01/05:01:01:02/05:01:01:03/05:01:02/05:01:04/05:03:01:01/05:03:01:02/05:05:01:01/05:05:01:02/05:05:01:03/05:05:01:04/05:05:01:05/05:05:01:06/05:05:01:07/05:05:01:08/05:05:01:09/05:05:01:10/05:06:01:01/05:06:01:02/05:07/05:08/05:09/05:11;05:01P
Thank you!
May the force be with you,
Marney
hugogenename is a new attribute on the HLA locus node, but it does not exist in the HLA.xsd file, so the XML won't pass XSD validation.
There is a misaligned A*11 file in A_gen, between A*32:86 and A*32:93.
A*68:01:24, A*32:01:24, A*31:01:24
Extra insertion placeholders found in B and C alleles (not A), starting at line 122460, cause the exon boundary to not align around codon 182.
It looks like A, B, and C got out of alignment due to an insertion placeholder present in the B alleles, but not in A or C, starting on line 98736 in B*07:02:01:01 (due to the '-' symbol in B*40:345N, line 101665).
I can't attach the file, too big.
Hello all,
I am working on a neoantigen pipeline and using OptiType for HLA detection. OptiType has an older FASTA version (2013), and the same alleles differ between the two versions.
What is the assembly version of the most recent FASTA files here (2018)? I am looking at hla_nuc.fasta and hla_prot.fasta. GRCH39/HG39?
I was unable to find the info in readme/version report/change log, nor is it at
https://www.ebi.ac.uk/ipd/imgt/hla/ .
I think it would be useful to have it somewhere clearly visible.
Thank you
Hi all,
We noticed that C*02:137 is listed on other deleted allele resources, but not deleted_alleles.txt.
Can you hook us up?
Many thanks,
Marney
HLA00490 - 3.30.0
The join(<1..284) is invalid because a join should have at least two parts.
DR EMBL; Z24750; Z24750.1.
XX
FH Key Location/Qualifiers
FH
FT source 1..284
FT /organism="Homo sapiens"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:9606"
FT /ethnic="Caucasoid"
FT /cell_line="YAR"
FT CDS join(<1..284)
FT /codon_start=1
FT /partial
FT /gene="HLA-DMB"
FT /allele="HLA-DMB*01:02"
FT /product="MHC Class II HLA-DMB*01:02 sequence"
FT /translation="PPSVQVAKTTPFNTREPVMLACYVWGFYPAEVTITWRKNGKLVMP
FT HSSEHKTAQPNGDWTYQTLSHLALTPSYGDTYTCVVEHIGAPEPILRDW"
FT exon 1..284
FT /number="3"
FT /partial
SQ Sequence 284 BP; 67 A; 83 C; 74 G; 60 T; 0 other;
ggccaccatc tgtgcaagta gccaaaacca ctccttttaa cacgagggag cctgtgatgc 60
tggcctgcta tgtgtggggc ttctatccag cagaagtgac tatcacgtgg aggaagaacg 120
ggaagcttgt catgcctcac agcagtgagc acaagactgc ccagcccaat ggagactgga 180
cataccagac cctctcccat ttagccttaa ccccctctta cggggacact tacacctgtg 240
tggtagagca cattggggct cctgagccca tccttcggga ctgg 284
//
Having this error in the hla.dat file causes bio parsers to fail.
I find the alignment flat file format useful, as it already has the intron/exon boundaries embedded.
The DRB5*01:01:01 allele is not listed under the "alignments" directory, whereas it is listed under the "msf" directory.
Is this because there is only one full-length allele of DRB5? But in the README file, the gen.txt description says:
"Please note for alleles that do not possess genomic sequences, there will be no entry in the file"
So for DRB5, even with one allele, there should be a DRB5_gen.txt file containing the DRB5*01:01:01 allele.
Under the msf directory it is listed in DRB5_gen.msf, but there is no corresponding alignment file DRB5_gen.txt under the alignments directory.
In the A_prot.txt alignment, the sequence for the final peptide position for A*01:18N is a deletion (.), but the sequence for the preceding 158 peptide positions is unknown (*).
This does not correspond to the A_nuc.txt alignment, where the exon 8 nucleotide sequence is *****.
This terminal deletion does not show up in the .fasta, .msf or .pir alignments (but honestly, it isn't clear how it could).
The hla.dat file for 3.29.0 has the incorrect sequence length for HLA00845.2. The sequence tag should have 549 instead of 687.
SQ Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;
ID HLA00845; SV 2; standard; DNA; HUM; 549 BP.
XX
AC HLA00845;
XX
SV HLA00845.2
XX
DT 06-AUG-1993 (Rel. 1.0.0, Created, Version 1)
DT 16-AUG-2017 (Rel. 3.29.0.1, Last Updated, Version 2)
XX
DE HLA-DRB1*14:13, Human MHC Class II sequence (partial)
XX
KW Human MHC; HLA; Class II; HLA-DRB1; Allele; HLA-DRB1*14:13;
XX
OS Homo Sapiens (human)
OC Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates;
OC Catarrhini; Hominidae; Homo.
XX
CC --------------------------------------------------------------------------
CC IPD-IMGT/HLA Release Version 3.29.0.1
CC --------------------------------------------------------------------------
CC Copyrighted by the IPD-IMGT/HLA Database, Distributed under the Creative
CC Commons Attribution-NoDerivs License, see;
CC http://www.ebi.ac.uk/ipd/imgt/hla/licence.html for further details.
CC --------------------------------------------------------------------------
XX
RN [1]
RP 1-549
RX PUBMED; 8168862.
RA Pando M, Theiler G, Melano R, Petzl-Erler ML, Satz ML;
RT "A new HLA-DR6 allele (DRB1*1413) found in a tribe of Brazilian Indians";
RL Immunogenetics 39:377-377(1994).
XX
CC --------------------------------------------------------------------------
CC The sequence below is the official allele sequence as approved by the
CC WHO Nomenclature Committee for Factors of the HLA System.
CC Any cross references may differ from the sequence shown below.
CC --------------------------------------------------------------------------
XX
DR EMBL; AM110001; AM110001.0.
DR EMBL; L21755; L21755.1.
XX
FH Key Location/Qualifiers
FH
FT source 1..549
FT /organism="Homo sapiens"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:9606"
FT /ethnic="American Indian"
FT /cell_line="GRC-138"
FT CDS <1..549>
FT /codon_start=1
FT /partial
FT /gene="HLA-DRB1"
FT /allele="HLA-DRB1*14:13"
FT /product="MHC Class II HLA-DRB1*14:13 sequence"
FT /translation="RFLEYSTSECHFFNGTERVRFLERYFHNQEENVRFDSDVGEYRAV
FT TELGRPSAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRRVHPKVTVYPSKTQPL
FT QHHNLLVCSVSGFYPGSIEVRWFRNGQEEKTGVVSTGLIHNGDWTFQTLVMLETVPRSG
FT EVYTCQVEHPSVTSPLTVE"
FT exon 1..270
FT /number="2"
FT exon 271..549
FT /number="3"
SQ Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;
cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc 60
ggttcctgga gagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg 120
gggagtaccg ggcggtgacg gagctggggc ggcctagcgc cgagtactgg aacagccaga 180
aggacctcct ggagcagagg cgggccgcgg tggacaccta ctgcagacac aactacgggg 240
ttggtgagag cttcacagtg cagcggcgag tccatcctaa ggtgactgtg tatccttcaa 300
agacccagcc cctgcagcac cacaacctcc tggtctgttc tgtgagtggt ttctatccag 360
gcagcattga agtcaggtgg ttccggaatg gccaggaaga gaagactggg gtggtgtcca 420
caggcctgat ccacaatgga gactggacct tccagaccct ggtgatgctg gaaacagttc 480
ctcggagtgg agaggtttac acctgccaag tggagcaccc aagcgtgaca agccctctca 540
cagtggaat 549
//
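A consistency check for this kind of problem is easy to script. The sketch below compares the length declared on the SQ line with the number of bases actually listed; it assumes that every letter on the lines between SQ and // is a base and that the trailing digits are position counters. The function name is hypothetical, and the sample entry is truncated to two sequence lines to keep it short, so its counts differ from the real entry.

```python
import re


def check_sq_length(entry_lines):
    """Return (declared_length, counted_length) for one flat-file entry."""
    declared = None
    counted = 0
    in_sequence = False
    for line in entry_lines:
        if line.startswith("SQ"):
            declared = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Count letters only; spaces and the position counter are ignored.
            counted += sum(ch.isalpha() for ch in line)
    return declared, counted


entry = [
    "SQ   Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;",
    "     cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc        60",
    "     cagtggaat                                                               549",
    "//",
]
declared, counted = check_sq_length(entry)
print(declared, counted, "mismatch" if declared != counted else "ok")
```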
Alignment (alignments/DPB_nuc.txt) implies that we don't have sequence information for the final alanine, but the fasta (fasta/DPB_nuc.fasta) has it.
I think there's an extra "ATGTGT" at the end in C_nuc.fasta, at least in comparison with the sequence in the alignment file.
Hi
During a recent investigation, I found that some alleles that are in hla.dat are missing from hla.xml, for example HLA-H*02:06. There are ~300 alleles in this situation.
Is this intended?
Thank you,
Marcell
All class II protein alignment files are blank for 3.32.0.
Feature annotations should not differ between database releases if the sequence is the same. If an annotation is changed in a later database release, then it should also be updated in all previous database releases that contain that sequence. The feature annotations for 174 alleles change between database releases even though the sequences do not. These differences mainly impact intron-4, exon-5, and intron-5 for HLA-DQB1. Below is a table of all the observed instances of this issue.
DB | Allele | # Features Removed | # Features Added | # Features Differ | Features Removed | Features Added | Features that Differ |
---|---|---|---|---|---|---|---|
3160 | HLA-B*15:302N | 0 | 0 | 3 | | | exon_5 exon_2 exon_3 |
3160 | HLA-C*08:89N | 0 | 0 | 1 | | | exon_2 |
3170 | HLA-B*15:302N | 0 | 0 | 1 | | | exon_5 |
3180 | HLA-B*39:97N | 0 | 0 | 1 | | | exon_3 |
3180 | HLA-C*08:89N | 0 | 0 | 1 | | | exon_2 |
3190 | HLA-C*08:89N | 0 | 0 | 1 | | | exon_2 |
3220 | HLA-B*07:251N | 0 | 0 | 1 | | | exon_3 |
3280 | HLA-B*15:149N | 0 | 0 | 1 | | | exon_4 |
3280 | HLA-B*15:246N | 0 | 0 | 1 | | | exon_4 |
3280 | HLA-C*08:89N | 0 | 0 | 1 | | | exon_2 |
3290 | HLA-B*15:149N | 0 | 0 | 1 | | | exon_4 |
3290 | HLA-B*15:246N | 0 | 0 | 1 | | | exon_4 |
3300 | HLA-A*24:155N | 1 | 0 | 0 | exon_5 | | |
3300 | HLA-A*26:01:01:03N | 0 | 0 | 2 | | | intron_4 exon_4 |
3300 | HLA-B*07:44N | 0 | 0 | 2 | | | intron_4 exon_4 |
3300 | HLA-B*15:01:01:02N | 0 | 1 | 1 | | exon_1 | intron_1 |
3300 | HLA-B*15:149N | 0 | 0 | 1 | | | exon_4 |
3300 | HLA-B*15:246N | 0 | 0 | 2 | | | exon_5 exon_4 |
3300 | HLA-B*44:02:01:02S | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:02:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:02:04 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:53Q | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:62 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:79 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:80 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:81 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:82 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:83 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:84 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*02:96N | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:04 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:05 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:06 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:07 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:08 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:09 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:10 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:11 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:12 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:14 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:15 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:16 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:17 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:01:18 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:17 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:22 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:35 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:36 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:01:37 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:01:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:09 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:12 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:21 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:22 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:23 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:02:24 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:03:02:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:03:02:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:03:02:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:03:04 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:04:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:04:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:05:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:150 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:191 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:195 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:196 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:197Q | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:19:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:211 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:239 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:243 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:245 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:246 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:247 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:248 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:249 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:250 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:251 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:252 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:253 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:254 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*03:263 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:01:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:02:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:02:11 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:02:12 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:11 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*04:32 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:01:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:01:04 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:01:05 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:23 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:01:24 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:02:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:02:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:02:01:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:02:07 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:02:11 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:102 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:103 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:104 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:106 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:108 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*05:133 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:134 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:135 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:136 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:137 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:148 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:149 | 1 | 1 | 0 | exon_6 | exon_5 | |
3300 | HLA-DQB1*05:31 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:43:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:52 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:57 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*05:96 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*05:97 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:01:08 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*06:01:10 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*06:01:11 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*06:02:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:01:03 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:17 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:22 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:23 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:25 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:26 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:27 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:02:28 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:12 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:14 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:20 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:21 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:23 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:24 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:25 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:03:26 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:04:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:09:01:01 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:09:01:02 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:103 | 0 | 1 | 1 | | exon_5 | exon_6 |
3300 | HLA-DQB1*06:111 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:117 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:125 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:187 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:188 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:217 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:218 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:219 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:221 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:222 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:223 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:224 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:225 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:226 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:227 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:228 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:37 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:44 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:84 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:90 | 0 | 2 | 1 | | exon_5 intron_5 | intron_4 |
3300 | HLA-DQB1*06:99:02 | 0 | 1 | 1 | | exon_5 | exon_6 |
3320 | HLA-C*07:02:01:17N | 0 | 0 | 2 | | | intron_3 exon_3 |
Hi,
I see the list of the references of the multiple sequence alignment here:
http://www.ebi.ac.uk/ipd/imgt/hla/nomenclature/alignments.html
May I ask whether the alleles in this list are part of the human reference genome GRCh38?
Does GRCh38 contain all HLA genes?
Many thanks,
Mengyao
In the A_gen.txt file in the 'alignments' folder, several lines contain the " | " symbol, for example:
A*01:01:01:01 G | ATGGCCGTC ATGGCGCCCC GAACCCTCCT CCTGCTACTC TCGGGGGCCC TGGCCCTGAC CCAGACCTGG GCGG | GTGAGT GCGGGGTCGG GAGGGAAACC
A*01:01:01:02N - | --------- ---------- ---------- ---------- ---------- ---------- ---------- ---- | ------ ---------- ----------
A*01:01:01:03 * | --------- ---------- ---------- ---------- ---------- ---------- ---------- ---- | ------ ---------- ----------
May I ask what these " | " symbols mean?
Many thanks,
Mengyao
The following URL, https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/xml/hla.xml.zip, is broken: it returns a corrupted ZIP file (1 KB in size).
Other urls seem to be working – e.g. https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/wmda/hla_nom.txt
Could you please take a look at this and fix the zip files? We would appreciate the help ASAP as this is blocking our processing of the latest release.
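For pipelines that fetch this file automatically, a cheap sanity check before unpacking can turn a silent failure into a clear error. A minimal sketch using only the standard library; the 1 KB payload below is a made-up stand-in for the broken download, and the function name is hypothetical.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """True when the payload has a recognizable ZIP structure."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a small valid archive in memory for comparison.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hla.xml", "<alleles/>")
good_payload = buffer.getvalue()

bad_payload = b"x" * 1024  # hypothetical 1 KB non-ZIP response

print(looks_like_zip(good_payload), looks_like_zip(bad_payload))
```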
In line 19 of the https://github.com/ANHIG/IMGTHLA/edit/Latest/oid/README.md document, the word 'donated' should be 'denoted'.
I tried downloading the IMGT zip but the hla.dat file does not contain the alleles as expected. Instead it contains the following:
version https://git-lfs.github.com/spec/v1
oid sha256:1b26676d2366ba8768122a973aa0add3641671430a52a431e03b6700b8459ff1
size 113160320
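The three lines above are a Git LFS pointer rather than the file content itself. A download script can detect that case as sketched below; the function names are hypothetical, and the parsing assumes the simple "key value" layout shown above.

```python
LFS_PREFIX = "version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(text: str) -> bool:
    """True when the text starts with the Git LFS pointer version line."""
    return text.startswith(LFS_PREFIX)


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dictionary."""
    return dict(line.split(" ", 1) for line in text.splitlines() if line.strip())


pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1b26676d2366ba8768122a973aa0add3641671430a52a431e03b6700b8459ff1\n"
    "size 113160320\n"
)
if is_lfs_pointer(pointer):
    info = parse_lfs_pointer(pointer)
    print("LFS pointer; real file size in bytes:", info["size"])
```

When the check fires, the real file has to be fetched through Git LFS instead of a plain archive download.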
I want to download the multiple sequence alignment files of release 3.9.0 because we want to finish the remaining portion of an old project. However, I am unable to find those files in this repository. Specifically, I need the file DQA_nuc.txt or DQA1_nuc.txt for release 3.9.0, as I already have the files for the other genes I am interested in.
Let me know if there is any way I can find that file.
Thank you
There is a 'g' at the start of the alignment, presumably instead of 'G'. Is there some meaning to this difference?
In xml/hla.xml.zip, at line 478849, there is an apparently spurious VT (vertical tab) character in @title, which breaks parsing the XML file with libxml2.
Thanks
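XML 1.0 permits only tab, line feed and carriage return among the C0 control characters, so a vertical tab is rejected by conforming parsers. The following sketch scans text for such characters and reports their positions; the sample line is made up, and the function name is hypothetical.

```python
import re

# C0 control characters other than tab (0x09), LF (0x0A) and CR (0x0D).
_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_xml_chars(text: str):
    """Return (line, column, hex_code) for each control character XML 1.0 forbids."""
    hits = []
    # split("\n") is used on purpose: str.splitlines() would also break on VT and FF.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in _ILLEGAL.finditer(line):
            hits.append((lineno, match.start() + 1, hex(ord(match.group()))))
    return hits


sample = 'a\n<x title="foo\x0bbar"/>\n'
found = find_illegal_xml_chars(sample)
print(found)
```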
The current release and date stamps in hla_ambigs.xml for the current release (3.30.0) are empty.
<?xml version="1.0" encoding="UTF-8"?>
<tns:ambiguityData xmlns:tns="http://www.example.org/ambig-aw"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.org/ambig-aw ambig-aw.xsd ">
<tns:releaseVersion currentRelease="" date="" />
<tns:geneList>
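A consumer can guard against this by checking the attributes before trusting the file. A minimal sketch with the standard library, using the namespace URI from the excerpt above; the sample document is cut down to the relevant element (XML declaration and schema attributes omitted), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tns": "http://www.example.org/ambig-aw"}

SAMPLE = """<tns:ambiguityData xmlns:tns="http://www.example.org/ambig-aw">
<tns:releaseVersion currentRelease="" date="" />
</tns:ambiguityData>"""


def missing_release_fields(root):
    """Return the names of releaseVersion attributes that are absent or empty."""
    wanted = ("currentRelease", "date")
    element = root.find("tns:releaseVersion", NS)
    if element is None:
        return list(wanted)
    return [name for name in wanted if not element.get(name)]


missing = missing_release_fields(ET.fromstring(SAMPLE))
print(missing)
```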
It seems the COPYRIGHT NOTICE section of the README.md file here contains 1-2 typos.
The section indicates 2015 as the publication date for the Nucleic Acids Research article, but Google Scholar indicates 2014. I think 2015 is a typo.
Another typo: the word "stongly".