
anhig / imgthla


Github for files currently published in the IPD-IMGT/HLA FTP Directory hosted at the European Bioinformatics Institute

Home Page: http://www.ebi.ac.uk/ipd/imgt/hla/

License: Other

alleles bioinformatics hla hla-database nomenclature

imgthla's People

Contributors

dominicbarkeran, ipd-deploy, jrob119, michaelcooperan, xeniageorgiouan


imgthla's Issues

File format

Is there documentation for the txt alignment format (for example: A_gen)?

Thank you for hosting this on github!

Comma in Description field of Deleted_alleles.txt file

Line 106 of the Deleted_alleles.txt file (HLA00615,DQA1*05013,To take account of coding polymorphism in the leader peptide, sequence renamed DQA1*05:05 (April 1998)) includes a comma in the Description field.

This results in an extra column being added for this line when parsing the file as a .csv document.

Could this comma be removed? It doesn't change the meaning of the entry.
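
Until the file changes, a parser can tolerate the extra comma by limiting the number of splits. This is a minimal sketch, assuming every data line has exactly three fields (allele ID, allele name, description) as in the line quoted above. The function name is hypothetical.

```python
def parse_deleted_allele_line(line, n_fields=3):
    """Split a Deleted_alleles.txt line into at most n_fields columns.

    Assumption: the description is the last column, so any further commas
    belong to it and must not open new columns.
    """
    return line.rstrip("\r\n").split(",", n_fields - 1)


line = ("HLA00615,DQA1*05013,To take account of coding polymorphism in the "
        "leader peptide, sequence renamed DQA1*05:05 (April 1998)")
allele_id, allele, description = parse_deleted_allele_line(line)
print(allele_id, allele)
print(description)
```

The same idea works with the `csv` module only if the description is quoted in the file, which it is not in the line shown here.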

incorrect/missing alignmentreference elements in hla.xml

For the DPB1 alleles, the alignmentreference elements have an empty alleleid attribute, and the allelename attribute contains "DPB1*01:01:01", but the allele element in the file has the extended name "DPB1*01:01:01:01", so the reference cannot be resolved.

DRBx alleles also have an empty alleleid attribute in alignmentreference, but in these cases the DRB1*01:01:01 allele is named consistently.

john

Errors in assigning intron numbers to DRB4*03:01N intron sequences?

In the hla.xml for release 3.33.0, the names of the DRB4*03:01N intron features do not match the feature order numbers for other DRB intron features.

Here are the intron elements for DRB4*03:01N:

     <feature id="914.5" order="5" featuretype="Intron" name="Intron 1">
        <SequenceCoordinates start="1" end="2684" />
     </feature>
      <feature id="914.7" order="7" featuretype="Intron" name="Intron 2">
        <SequenceCoordinates start="2967" end="3670" />
     </feature>
      <feature id="914.9" order="9" featuretype="Intron" name="Intron 3">
        <SequenceCoordinates start="3782" end="4255" />
     </feature>
      <feature id="914.11" order="11" featuretype="Intron" name="Intron 4">
        <SequenceCoordinates start="4280" end="4581" />
    </feature>

Here are the corresponding intron elements for other DRB alleles (e.g., DRB4*01:03:01:03):

      <feature id="6603.3" order="3" featuretype="Intron" name="Intron 1">
        <SequenceCoordinates start="414" end="9976" />
     </feature>
      <feature id="6603.5" order="5" featuretype="Intron" name="Intron 2">
        <SequenceCoordinates start="10247" end="12983" />
     </feature>
      <feature id="6603.7" order="7" featuretype="Intron" name="Intron 3">
        <SequenceCoordinates start="13266" end="13969" />
     </feature>
      <feature id="6603.9" order="9" featuretype="Intron" name="Intron 4">
        <SequenceCoordinates start="14081" end="14554" />
     </feature>
      <feature id="6603.11" order="11" featuretype="Intron" name="Intron 5">
        <SequenceCoordinates start="14579" end="14880" />
     </feature>

Shouldn't every DRB Intron 1 feature have order 3, and every intron feature with order 5 be named Intron 2?
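
A check along these lines can flag the mismatch automatically. It is only a sketch: it assumes features alternate 5' UTR, Exon 1, Intron 1, Exon 2 and so on, so that "Intron n" is expected at order 2n + 1, and it wraps the quoted fragment in a hypothetical `<features>` root with no XML namespace. A namespaced hla.xml would need qualified tag names.

```python
import re
import xml.etree.ElementTree as ET

# Intron features for DRB4*03:01N as quoted above, wrapped in a
# hypothetical <features> root so the fragment parses on its own.
FRAGMENT = """
<features>
  <feature id="914.5" order="5" featuretype="Intron" name="Intron 1">
    <SequenceCoordinates start="1" end="2684" />
  </feature>
  <feature id="914.7" order="7" featuretype="Intron" name="Intron 2">
    <SequenceCoordinates start="2967" end="3670" />
  </feature>
  <feature id="914.9" order="9" featuretype="Intron" name="Intron 3">
    <SequenceCoordinates start="3782" end="4255" />
  </feature>
  <feature id="914.11" order="11" featuretype="Intron" name="Intron 4">
    <SequenceCoordinates start="4280" end="4581" />
  </feature>
</features>
"""


def mismatched_introns(xml_text):
    """Return (name, order, expected_order) for introns that break the pattern."""
    problems = []
    for feature in ET.fromstring(xml_text).iter("feature"):
        if feature.get("featuretype") != "Intron":
            continue
        match = re.search(r"(\d+)$", feature.get("name", ""))
        if match is None:
            continue
        # Assumed convention: Intron n sits at order 2n + 1.
        expected = 2 * int(match.group(1)) + 1
        order = int(feature.get("order"))
        if order != expected:
            problems.append((feature.get("name"), order, expected))
    return problems


for name, order, expected in mismatched_introns(FRAGMENT):
    print(f"{name}: order {order}, expected {expected}")
```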

C*02:10:01GG

Hi all,

This extra G is causing us some issues.

allele id="HLA18583" name="HLA-C*02:02:37" dateassigned="2018-03-29"
hla_g_group status="C*02:10:01GG"
hla_p_group status="C*02:02P"

Thank you!
Marney

Difference between fasta and alignments for A*01:11N

One base pair before the point mutation 968G>T, the sequences seem to diverge. The mutation (T) is highlighted:

From alignment file (that I think is correct):
GGAGAACGGTAA...
vs the fasta section:
GGAGAACGACCC...
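
The point of divergence can be located with a small helper. This is a sketch that compares the two excerpts above position by position. The function name is hypothetical, and indices are 0-based within the excerpt, not coordinates in the allele.

```python
def first_difference(seq_a, seq_b):
    """Return the 0-based index of the first differing position, or None."""
    for index, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return index
    # One sequence is a prefix of the other: they differ where the shorter ends.
    if len(seq_a) != len(seq_b):
        return min(len(seq_a), len(seq_b))
    return None


alignment_excerpt = "GGAGAACGGTAA"
fasta_excerpt = "GGAGAACGACCC"
print(first_difference(alignment_excerpt, fasta_excerpt))
```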

Problems with the 3.34.0 nuc.txt and prot.txt alignments for HLA-B and -C

In the 3.34.0 HLA-B protein alignment, the HLA-B*13:120Q peptide sequence is 11 amino acids longer than the reference, but these positions are not accounted for in the reference with . symbols. As a result, even though the last sequence block for all other alleles includes only 69 amino-acid positions, the last 11 amino acids of the HLA-B*13:120Q sequence appear in a separate block, as below.
[Image: screen shot 2018-10-17 at 3 08 17 pm]

This also occurs for the B_nuc.txt alignment, as below.
[Image: screen shot 2018-10-17 at 3 08 43 pm]

The same thing is also true for the C*04:09N allele in the C_prot.txt and C_nuc.txt alignments.

It seems like these extra peptide positions should be included in the reference sequences as sequence indels.

Release 3.36.0 - file inconsistencies

  1. There are two new alleles where "dateassigned" is blank in the hla.xml file, DQA1 05:05:01:20 (HLA22679) and DRB4 01:03:01:10 (HLA22663). The dates are listed appropriately in the hla_nom file.

  2. There is an inconsistency between hla_nom and hla.xml for HLA00886, where the xml file has the allele name as v2 DRB3 010101 while the nom file has v3 DRB3 01:01:01. Could you explain this for us?

  3. The hla.xml file has a G group listed as C*07:726N:01G while nom_g lists it as 07:726:01G. Could you please look into this one too?

  4. There is an inconsistency between nom_p and hla.xml regarding DQA1 05:05:01:20. This allele is listed as part of DQA1*05:01P in nom_p but has no p group status in the xml file.

Any help on the above is greatly appreciated. Thanks!

C*17:01:01:02

Hi James,

During the processing of a bunch of new alleles, we ran into an issue with C*17:01:01:02.
The hla.dat file we pulled from the git repository has Exon 5 marked as "pseudo", while on the IPD-IMGT/HLA website it is not marked as such. A cursory look at the history of the sequence does not indicate any recent changes. We were wondering whether this was intentional and something we should take into account in our workflow?

Cheers,
Vineeth

Error in Sequence tag

Hi James,

In the new release 3.33.0 of hla.dat, some DRB1 sequences are invalid. See, for example, DRB1*13:09: the substring "y/alignment_libraries/libs/drb1345genomiclib:drb1_13:09" should not be there, I think.

FH   Key             Location/Qualifiers
FH
FT   source          1..325
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT                   /ethnic="Hispanic"
FT                   /cell_line="MJD"
FT                   /cell_line="NT01111"
FT   CDS             <1..270
FT                   /codon_start=1
FT                   /partial
FT                   /gene="HLA-DRB1"
FT                   /allele="HLA-DRB1*13:09"
FT                   /product="MHC Class II HLA-DRB1*13:09 sequence"
FT                   /translation="RFLEYSTSECHFFNGTERVRFLDRYFHNQEENVRFDSDVGEFRAV
FT                   TELGRPDAEYWNSQKDILEQARAAVDTYCRHNYGVVESFTVQRR"
FT   exon            1..270
FT                   /number="2"
FT   UTR             271..328
SQ   Sequence 325 BP; 58 A; 67 C; 100 G; 51 T; 49 other;
     cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc        60
     ggttcctgga cagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg       120
     gggagttccg ggcggtgacg gagctggggc ggcctgatgc cgagtactgg aacagccaga       180
     aggacatcct ggagcaggcg cgggccgcgg tggacaccta ctgcagacac aactacgggg       240
     ttgtggagag cttcacagtg cagcggcgag y/alignmen t_librarie s/libs/drb       300
     1345genomi clib:drb1_ 13:09                                             325
//

Cheers,
Markus
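
Corrupted sequence lines like these can be caught before parsing by checking each line against the IUPAC nucleotide alphabet. This is a rough sketch, assuming sequence lines are lower case and end with a running base count, as in the entry above. The helper name is hypothetical.

```python
import re

# IUPAC nucleotide codes, lower case as in the hla.dat sequence block.
IUPAC = set("acgtrymkswbdhvn")


def invalid_sequence_chars(sequence_line):
    """Return the sorted non-IUPAC characters found in one sequence line.

    Assumption: the line ends with a running base count, which is removed
    before the check; blanks between the 10-base groups are ignored.
    """
    body = re.sub(r"\s+\d+\s*$", "", sequence_line)
    residues = body.replace(" ", "").lower()
    return sorted(set(residues) - IUPAC)


good = "cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc 60"
bad = "ttgtggagag cttcacagtg cagcggcgag y/alignmen t_librarie s/libs/drb 300"
print(invalid_sequence_chars(good))
print(invalid_sequence_chars(bad))
```

Note that the leading "y" of the stray path is itself a valid IUPAC code, so a check of this kind flags the line only because of the other characters.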

HLA.Dat user manual not matching hla.dat file

The user manual and the hla.dat file appear to be out of sync. The user manual states that each entry will have three DT lines. In the 3.30.0 hla.dat file, there are only two per entry.

Invalid character � in dat file for 3.21.0, 3.22.0, 3.23.0 and 3.24.0

The following line is found in the hla.dat file for 3.21.0, 3.22.0, 3.23.0 and 3.24.0.

RA   Balas A, S�nchez-Gordo F, Garcia-S�nchez F, Gomez-Zumaquero JM, Vicario JL;

This prevents these files from being properly parsed.

Here are the specific alleles that have this issue:

Release = 3210, line # = 121045, Allele = HLA-A*11:210N
Release = 3210, line # = 177260, Allele = HLA-A*26:107N
Release = 3220, line # = 125142, Allele = HLA-A*11:210N
Release = 3220, line # = 183644, Allele = HLA-A*26:107N
Release = 3230, line # = 127802, Allele = HLA-A*11:210N
Release = 3230, line # = 187727, Allele = HLA-A*26:107N
Release = 3240, line # = 129967, Allele = HLA-A*11:210N
Release = 3240, line # = 191426, Allele = HLA-A*26:107N
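
Lines like these can be located by decoding the file one line at a time. This is a minimal sketch, assuming the file is meant to be UTF-8. The sample bytes are hypothetical and use a Latin-1 encoded name, which is one plausible cause of the character shown above.

```python
def find_undecodable_lines(raw_bytes, encoding="utf-8"):
    """Return 1-based line numbers whose bytes do not decode as `encoding`."""
    bad_lines = []
    for number, line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


# Hypothetical sample: byte 0xE1 is a Latin-1 "a with acute", which is not
# valid UTF-8 when followed by a plain ASCII letter.
sample = (b"RN   [1]\n"
          b"RA   Balas A, S\xe1nchez-Gordo F, Garcia-S\xe1nchez F;\n"
          b"XX\n")
print(find_undecodable_lines(sample))
```

For a real file the bytes would come from `open(path, "rb").read()`. As a workaround while parsing, `open(path, encoding="utf-8", errors="replace")` reads the file without raising, at the cost of keeping the replacement character in the author names.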

no newline following XML declaration in hla_ambigs.xml

On line 1 of hla_ambigs.xml, the XML declaration is not followed by a newline character, so the tns:ambiguityData start-tag appears on the same line.

A newline character is not required by the XML spec, but could be a helpful aesthetic enhancement.

Incorrectly using join for DRB5 sequences in 3.20.0 and 3.21.0

In the hla.dat files for 3.20.0 and 3.21.0 a join is being used for the CDS sequence when it shouldn't be which causes parsers to fail. Here's an example:

DR   EMBL; AJ427352; AJ427352.1.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..270
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT                   /ethnic="Caucasoid"
FT                   /cell_line="Barpay"
FT   CDS             join(1..270)
FT                   /codon_start=1
FT                   /partial
FT                   /gene="HLA-DRB5"
FT                   /allele="HLA-DRB5*01:12"
FT                   /product="MHC Class II HLA-DRB5*01:12 sequence"
FT                   /translation="RFLQQDKYECHFFNGTERVRFLHRDIYNQEEDLRFDSDVGEYRAV
FT                   TELGRPDAESWNSQKDFLERRRAEVDTVCRHNYGVGESFTVQRR"

Should be FT CDS 1..270 or FT CDS <1..270> instead.

Here's a list of all the alleles that have this:

HLA01638.1 HLA-DRB5*01:11
HLA01634.1 HLA-DRB5*01:12
HLA01871.1 HLA-DRB5*01:13
HLA00927.1 HLA-DRB5*02:03
HLA00928.1 HLA-DRB5*02:04
HLA01280.1 HLA-DRB5*02:05
HLA00916.1 HLA-DRB5*01:01:02
HLA00918.2 HLA-DRB5*01:03
HLA00920.1 HLA-DRB5*01:05
HLA00921.1 HLA-DRB5*01:06
HLA00922.1 HLA-DRB5*01:07
HLA00924.1 HLA-DRB5*01:09
HLA01012.3 HLA-DRB5*01:10N
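
As a stopgap, the single-range join can be unwrapped before the file is handed to a parser. This sketch uses a regular expression that only matches join(...) around exactly one range, so genuine multi-part joins are left alone. The function name is hypothetical.

```python
import re

# Matches join(...) wrapping exactly one range, e.g. join(1..270) or
# join(<1..284). A comma inside the parentheses prevents a match.
SINGLE_JOIN = re.compile(r"join\((<?\d+\.\.>?\d+)\)")


def unwrap_single_join(line):
    """Rewrite 'join(a..b)' as 'a..b'; leave every other line unchanged."""
    return SINGLE_JOIN.sub(r"\1", line)


print(unwrap_single_join("FT   CDS             join(1..270)"))
print(unwrap_single_join("FT   CDS             join(1..10,20..30)"))
```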

gGroup and gGroupAllele names in hla_ambigs.xml don't use full gene names

The gGroup and gGroupAllele names in hla_ambigs.xml don't use the full gene names. For example, in place of "HLA-A", they use "A". This makes them inconsistent with the allele names in hla.xml.

Below are file excerpts to further illustrate the issue.

From hla.xml:
<allele id="HLA00001" name="HLA-A*01:01:01:01" dateassigned="1989-08-01">

From hla_ambigs.xml:
<tns:gGroup name="A*01:01:01G" gid="HGG00001">
<tns:gGroupAllele name="A*01:01:01:01" alleleid="HLA00001" />

Please consider revising the gGroup and gGroupAllele names in hla_ambigs.xml to use the full gene names.
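
In the meantime, consumers that join the two files can normalise the names on their side. This is a small sketch, assuming the only difference is the missing "HLA-" prefix shown in the excerpts above. The function name is hypothetical.

```python
def with_hla_prefix(name, prefix="HLA-"):
    """Return the name with the locus prefix added if it is missing."""
    return name if name.startswith(prefix) else prefix + name


print(with_hla_prefix("A*01:01:01G"))
print(with_hla_prefix("HLA-A*01:01:01:01"))
```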

DPA1_gen.fasta renamed to DPA_gen.fasta

But the alignment file was not renamed? The pir and msf files were also renamed.
Are sequences for the DPA2 pseudogene forthcoming?
This isn't a technical issue, just a consistency issue.

Genomic alignment of DPA1*04:01 and DPA1*04:02 in the DPA1_gen.txt file

The alignment in the DPA1_gen.txt file for DPA1 *04:01 and *04:02 makes it appear that these alleles differ significantly in their sequence for positions 1061 to 1093, as below.

[Image: dpa1_gen_0401-0402_intron1]

However, the sequences of these alleles are identical through these positions, and it seems like the sequence for *04:02 should only include a 3 nucleotide deletion, relative to the reference, for positions 1061 - 1063, as below.

[Image: dpa1_gen_0401-0402_intron1_fixed]

Missing archive zip file

Hello,

In the README, you note that a "zip compressed archive of all the text-format alignment files is available from the top-level directory". However, I am unable to find such a zip file. The only zip file appears to be the Alignment_Rel_3350.zip that contains the alignments from the current release.

In particular, I would like to find archive versions of the alignment files and the archive versions of the fasta files.

Can you point me in the right direction?

Thanks,
Rachel

incomplete fasta file

Hi,
The hla_gen.fasta from the latest version contains sequences for only 5773 alleles.
Where are the other alleles? I can't find DPA1*03:02, for instance.

Thanks,

Sequence length error found for DRB1*14:13 (HLA00845)

For DRB1*14:13 (HLA00845), we noticed that the exon regions do not match the overall sequence length. As you can see from this snippet, the declared sequence length is 687, but the sequence actually listed is only 549 bases long.
FT   exon            1..270
FT                   /number="2"
FT   exon            271..549
FT                   /number="3"
FT   exon            553..663
FT                   /number="4"
FT   exon            664..687
FT                   /number="5"
SQ   Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;
     cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc        60
     ggttcctgga gagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg       120
     gggagtaccg ggcggtgacg gagctggggc ggcctagcgc cgagtactgg aacagccaga       180
     aggacctcct ggagcagagg cgggccgcgg tggacaccta ctgcagacac aactacgggg       240
     ttggtgagag cttcacagtg cagcggcgag tccatcctaa ggtgactgtg tatccttcaa       300
     agacccagcc cctgcagcac cacaacctcc tggtctgttc tgtgagtggt ttctatccag       360
     gcagcattga agtcaggtgg ttccggaatg gccaggaaga gaagactggg gtggtgtcca       420
     caggcctgat ccacaatgga gactggacct tccagaccct ggtgatgctg gaaacagttc       480
     ctcggagtgg agaggtttac acctgccaag tggagcaccc aagcgtgaca agccctctca       540
     cagtggaat                                                               549

DQA1*05:01:04 is not in P or G group in hla.xml.

Good morning again,

We noticed an inconsistency between the files. Will you correct whichever needs to be corrected, please?

allele id="HLA18836" name="HLA-DQA1*05:01:04" dateassigned="2018-04-30"
hla_g_group status="None"/
hla_p_group status="None"/

hla_nom_g.txt
DQA1*;05:01:01:01/05:01:01:02/05:01:01:03/05:01:04/05:03:01:01/05:03:01:02/05:05:01:01/05:05:01:02/05:05:01:03/05:05:01:04/05:05:01:05/05:05:01:06/05:05:01:07/05:05:01:08/05:05:01:09/05:05:01:10/05:06:01:01/05:06:01:02/05:07/05:08/05:09/05:11;05:01:01G

DQA1*;05:01:01:01/05:01:01:02/05:01:01:03/05:01:02/05:01:04/05:03:01:01/05:03:01:02/05:05:01:01/05:05:01:02/05:05:01:03/05:05:01:04/05:05:01:05/05:05:01:06/05:05:01:07/05:05:01:08/05:05:01:09/05:05:01:10/05:06:01:01/05:06:01:02/05:07/05:08/05:09/05:11;05:01P

Thank you!
May the force be with you,
Marney
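
Cross-checks like this one can be scripted against hla_nom_g.txt or hla_nom_p.txt. The sketch below assumes the three-field layout visible in the lines quoted above (locus, slash-separated alleles, group name). It also assumes, without confirmation from this thread, that comment lines start with "#" and that alleles without a group have an empty third field. The function names are hypothetical.

```python
def parse_group_line(line):
    """Parse one 'locus;allele/allele/...;group' line.

    Returns (list of full allele names, full group name or None).
    """
    locus, alleles, group = line.strip().split(";")
    names = [locus + allele for allele in alleles.split("/")]
    return names, (locus + group if group else None)


def group_lookup(lines):
    """Map every allele name to its group name (or None)."""
    lookup = {}
    for line in lines:
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line.strip() or line.startswith("#"):
            continue
        names, group = parse_group_line(line)
        for name in names:
            lookup[name] = group
    return lookup


lines = ["DQA1*;05:01:01:01/05:01:01:02/05:01:04/05:03:01:01;05:01:01G"]
print(group_lookup(lines)["DQA1*05:01:04"])
```

The resulting dictionary can then be compared with the hla_g_group status values read from hla.xml.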

ClassI_nuc.txt alignment issue (extra insertion placeholders in B,C alleles cause misalignment)

Extra insertion placeholders are found in the B and C alleles (not A) starting at line 122460, causing the exon boundary not to align around codon 182.

It looks like A, B, and C got out of alignment due to an insertion placeholder present in the B alleles, but not in A or C, starting on line 98736 in B*07:02:01:01 (due to the '-' symbol in B*40:345N, line 101665).

I can't attach the file, too big.

Assembly version

Hello all,

I am working on a neoantigen pipeline and using OptiType for HLA detection. OptiType has an older FASTA version (2013) and the same alleles differ.
What is the assembly version of the most recent FASTA files here (2018)? I am looking at hla_nuc.fasta and hla_prot.fasta. GRCH39/HG39?
I was unable to find the info in readme/version report/change log, nor is it at
https://www.ebi.ac.uk/ipd/imgt/hla/ .
I think it would be useful to have it somewhere clearly visible.

Thank you

HLA-DMB*01:02 - Invalid join

HLA00490 - 3.30.0

The join(<1..284) is invalid because a join should have at least two parts.

DR   EMBL; Z24750; Z24750.1.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..284
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT                   /ethnic="Caucasoid"
FT                   /cell_line="YAR"
FT   CDS             join(<1..284)
FT                   /codon_start=1
FT                   /partial
FT                   /gene="HLA-DMB"
FT                   /allele="HLA-DMB*01:02"
FT                   /product="MHC Class II HLA-DMB*01:02 sequence"
FT                   /translation="PPSVQVAKTTPFNTREPVMLACYVWGFYPAEVTITWRKNGKLVMP
FT                   HSSEHKTAQPNGDWTYQTLSHLALTPSYGDTYTCVVEHIGAPEPILRDW"
FT   exon            1..284
FT                   /number="3"
FT                   /partial
SQ   Sequence 284 BP; 67 A; 83 C; 74 G; 60 T; 0 other;
     ggccaccatc tgtgcaagta gccaaaacca ctccttttaa cacgagggag cctgtgatgc        60
     tggcctgcta tgtgtggggc ttctatccag cagaagtgac tatcacgtgg aggaagaacg       120
     ggaagcttgt catgcctcac agcagtgagc acaagactgc ccagcccaat ggagactgga       180
     cataccagac cctctcccat ttagccttaa ccccctctta cggggacact tacacctgtg       240
     tggtagagca cattggggct cctgagccca tccttcggga ctgg                        284
//

Having this error in the hla.dat file causes bio parsers to fail.

DRB5*01:01:01 not listed under alignments directory.

I find the alignment flat file format useful, as it already has intron/exon boundaries embedded.

DRB5*01:01:01 allele is not listed under "alignments" directory whereas it is listed under "msf" directory.

Is this because there is only one full-length allele of DRB5? But in the README file, gen.txt description says:

"Please note for alleles that do not possess genomic sequences, there will be no entry in the file"

So for DRB5, even with one allele, there should be a DRB5_gen.txt file containing the DRB5*01:01:01 allele.

Under msf directory, it is listed under DRB5_gen.msf but there is no corresponding alignment file DRB5_gen.txt under alignments directory.

Strange deletion at A*01:18N peptide position 341

In the A_prot.txt alignment, the sequence for the final peptide position for A*01:18N is a deletion (.), but the sequence for the preceding 158 peptide positions is unknown (*).

This does not correspond to the A_nuc.txt alignment, where exon 8 nucleotide sequence is *****.

This terminal deletion does not show up in the .fasta, .msf or .pir alignments (but honestly, it isn't clear how it could).

3.29.0 - Expected sequence length 687, found 549 (HLA00845.2)

The hla.dat file for 3.29.0 has the incorrect sequence length for HLA00845.2. The sequence tag should have 549 instead of 687.

SQ Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;

ID   HLA00845; SV 2; standard; DNA; HUM; 549 BP.
XX
AC   HLA00845;
XX
SV   HLA00845.2
XX
DT   06-AUG-1993 (Rel. 1.0.0, Created, Version 1)
DT   16-AUG-2017 (Rel. 3.29.0.1, Last Updated, Version 2)
XX
DE   HLA-DRB1*14:13, Human MHC Class II sequence (partial)
XX
KW   Human MHC; HLA; Class II; HLA-DRB1; Allele; HLA-DRB1*14:13;
XX
OS   Homo Sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates;
OC   Catarrhini; Hominidae; Homo.
XX
CC   --------------------------------------------------------------------------
CC   IPD-IMGT/HLA Release Version 3.29.0.1
CC   --------------------------------------------------------------------------
CC   Copyrighted by the IPD-IMGT/HLA Database, Distributed under the Creative
CC   Commons Attribution-NoDerivs License, see;
CC   http://www.ebi.ac.uk/ipd/imgt/hla/licence.html for further details.
CC   --------------------------------------------------------------------------
XX
RN   [1]
RP   1-549
RX   PUBMED; 8168862.
RA   Pando M, Theiler G, Melano R, Petzl-Erler ML, Satz ML;
RT   "A new HLA-DR6 allele (DRB1*1413) found in a tribe of Brazilian Indians";
RL   Immunogenetics 39:377-377(1994).
XX
CC   --------------------------------------------------------------------------
CC   The sequence below is the official allele sequence as approved by the
CC   WHO Nomenclature Committee for Factors of the HLA System.
CC   Any cross references may differ from the sequence shown below.
CC   --------------------------------------------------------------------------
XX
DR   EMBL; AM110001; AM110001.0.
DR   EMBL; L21755; L21755.1.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..549
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT                   /ethnic="American Indian"
FT                   /cell_line="GRC-138"
FT   CDS             <1..549>
FT                   /codon_start=1
FT                   /partial
FT                   /gene="HLA-DRB1"
FT                   /allele="HLA-DRB1*14:13"
FT                   /product="MHC Class II HLA-DRB1*14:13 sequence"
FT                   /translation="RFLEYSTSECHFFNGTERVRFLERYFHNQEENVRFDSDVGEYRAV
FT                   TELGRPSAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRRVHPKVTVYPSKTQPL
FT                   QHHNLLVCSVSGFYPGSIEVRWFRNGQEEKTGVVSTGLIHNGDWTFQTLVMLETVPRSG
FT                   EVYTCQVEHPSVTSPLTVE"
FT   exon            1..270
FT                   /number="2"
FT   exon            271..549
FT                   /number="3"
SQ   Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;
     cacgtttctt ggagtactct acgtctgagt gtcatttctt caatgggacg gagcgggtgc        60
     ggttcctgga gagatacttc cataaccagg aggagaacgt gcgcttcgac agcgacgtgg       120
     gggagtaccg ggcggtgacg gagctggggc ggcctagcgc cgagtactgg aacagccaga       180
     aggacctcct ggagcagagg cgggccgcgg tggacaccta ctgcagacac aactacgggg       240
     ttggtgagag cttcacagtg cagcggcgag tccatcctaa ggtgactgtg tatccttcaa       300
     agacccagcc cctgcagcac cacaacctcc tggtctgttc tgtgagtggt ttctatccag       360
     gcagcattga agtcaggtgg ttccggaatg gccaggaaga gaagactggg gtggtgtcca       420
     caggcctgat ccacaatgga gactggacct tccagaccct ggtgatgctg gaaacagttc       480
     ctcggagtgg agaggtttac acctgccaag tggagcaccc aagcgtgaca agccctctca       540
     cagtggaat                                                               549
//
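
A check of this kind can be automated per entry. The sketch below compares the length declared on the SQ line with the number of bases that follow. It assumes the layout shown above: one SQ header, sequence lines ending in a running count, and "//" as terminator. The function name is hypothetical.

```python
import re


def check_sequence_length(entry_lines):
    """Return (declared_length, actual_length) for one flat-file entry."""
    declared = None
    actual = 0
    in_sequence = False
    for line in entry_lines:
        if line.startswith("SQ"):
            declared = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Count letters only, so blanks and the running count are skipped.
            actual += sum(ch.isalpha() for ch in line)
    return declared, actual


# Shortened excerpt: the SQ header from the entry above with only its last
# two sequence lines, so the actual count here is 69 rather than 549.
entry = [
    "SQ   Sequence 687 BP; 152 A; 173 C; 223 G; 139 T; 0 other;",
    "     ctcggagtgg agaggtttac acctgccaag tggagcaccc aagcgtgaca agccctctca       540",
    "     cagtggaat                                                               549",
    "//",
]
declared, actual = check_sequence_length(entry)
print(declared, actual, "mismatch" if declared != actual else "ok")
```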

Some alleles are missing from hla.xml

Hi

During my recent investigation, I found that some alleles that are in hla.dat are missing from hla.xml, for example HLA-H*02:06. About 300 alleles are in this situation.

Is this intended?

Thank you,
Marcell
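
A comparison like this can be reproduced with two regular expressions. It is a sketch, assuming allele names appear on DE lines in hla.dat and in the name attribute of allele elements in hla.xml, as in the excerpts quoted elsewhere on this page. The function and pattern names are hypothetical.

```python
import re

# DE lines look like: "DE   HLA-DRB1*14:13, Human MHC Class II sequence (partial)"
DAT_NAME = re.compile(r"^DE\s+(HLA-[^,\s]+),", re.MULTILINE)
# Allele elements look like: <allele id="HLA00001" name="HLA-A*01:01:01:01" ...>
XML_NAME = re.compile(r'<allele\b[^>]*\bname="([^"]+)"')


def alleles_only_in_dat(dat_text, xml_text):
    """Return the sorted allele names present in hla.dat but not in hla.xml."""
    dat_names = set(DAT_NAME.findall(dat_text))
    xml_names = set(XML_NAME.findall(xml_text))
    return sorted(dat_names - xml_names)


dat_text = ("DE   HLA-A*01:01:01:01, Human MHC Class I sequence\n"
            "DE   HLA-H*02:06, Human MHC Class I sequence\n")
xml_text = '<allele id="HLA00001" name="HLA-A*01:01:01:01" dateassigned="1989-08-01">'
print(alleles_only_in_dat(dat_text, xml_text))
```

The sample strings are illustrative only. For the real files, both would be read in full before calling the function.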

Identical sequences with different feature annotations - 174 alleles

Feature annotations should not differ between database releases if the sequence is the same. If an annotation is changed in a later database release, then it should also be updated in all previous database releases that contain that sequence. The feature annotations for 174 alleles change between database releases even though the sequences do not. These differences mainly impact intron-4, exon-5, and intron-5 for HLA-DQB1. Below is a table of all the observed instances of this issue.

DB Allele # Features Removed # Features Added # Features Differ Features Removed Features Added Features that Differ
3160 HLA-B*15:302N 0 0 3 exon_5 exon_2 exon_3
3160 HLA-C*08:89N 0 0 1 exon_2
3170 HLA-B*15:302N 0 0 1 exon_5
3180 HLA-B*39:97N 0 0 1 exon_3
3180 HLA-C*08:89N 0 0 1 exon_2
3190 HLA-C*08:89N 0 0 1 exon_2
3220 HLA-B*07:251N 0 0 1 exon_3
3280 HLA-B*15:149N 0 0 1 exon_4
3280 HLA-B*15:246N 0 0 1 exon_4
3280 HLA-C*08:89N 0 0 1 exon_2
3290 HLA-B*15:149N 0 0 1 exon_4
3290 HLA-B*15:246N 0 0 1 exon_4
3300 HLA-A*24:155N 1 0 0 exon_5
3300 HLA-A*26:01:01:03N 0 0 2 intron_4 exon_4
3300 HLA-B*07:44N 0 0 2 intron_4 exon_4
3300 HLA-B*15:01:01:02N 0 1 1 exon_1 intron_1
3300 HLA-B*15:149N 0 0 1 exon_4
3300 HLA-B*15:246N 0 0 2 exon_5 exon_4
3300 HLA-B*44:02:01:02S 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:02:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:02:04 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:53Q 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:62 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:79 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:80 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:81 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:82 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:83 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:84 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*02:96N 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:04 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:05 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:06 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:07 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:08 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:09 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:10 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:11 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:12 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:14 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:15 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:16 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:17 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:01:18 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:17 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:22 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:35 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:36 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:01:37 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:01:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:09 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:12 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:21 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:22 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:23 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:02:24 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:03:02:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:03:02:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:03:02:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:03:04 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:04:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:04:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:05:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:150 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:191 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:195 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:196 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:197Q 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:19:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:211 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:239 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:243 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:245 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:246 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:247 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:248 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:249 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:250 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:251 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:252 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:253 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:254 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*03:263 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:01:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:02:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:02:11 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:02:12 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:11 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*04:32 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:01:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:01:04 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:01:05 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:23 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:01:24 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:02:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:02:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:02:01:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:02:07 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:02:11 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:102 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:103 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:104 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:106 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:108 0 1 1 exon_5 exon_6
3300 HLA-DQB1*05:133 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:134 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:135 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:136 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:137 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:148 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:149 1 1 0 exon_6 exon_5
3300 HLA-DQB1*05:31 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:43:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:52 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:57 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*05:96 0 1 1 exon_5 exon_6
3300 HLA-DQB1*05:97 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:01:08 0 1 1 exon_5 exon_6
3300 HLA-DQB1*06:01:10 0 1 1 exon_5 exon_6
3300 HLA-DQB1*06:01:11 0 1 1 exon_5 exon_6
3300 HLA-DQB1*06:02:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:01:03 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:17 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:22 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:23 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:25 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:26 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:27 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:02:28 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:12 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:14 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:20 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:21 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:23 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:24 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:25 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:03:26 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:04:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:09:01:01 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:09:01:02 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:103 0 1 1 exon_5 exon_6
3300 HLA-DQB1*06:111 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:117 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:125 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:187 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:188 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:217 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:218 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:219 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:221 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:222 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:223 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:224 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:225 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:226 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:227 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:228 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:37 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:44 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:84 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:90 0 2 1 exon_5 intron_5 intron_4
3300 HLA-DQB1*06:99:02 0 1 1 exon_5 exon_6
3320 HLA-C*07:02:01:17N 0 0 2 intron_3 exon_3
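
The rows above can be split back into labelled columns with a short parser. This is a sketch under one assumption: the feature names after the three counts are listed in the order removed, added, differing, and each count says how many names belong to its group. The function name is hypothetical.

```python
def parse_row(row):
    """Split one flattened table row into labelled fields."""
    tokens = row.split()
    release, allele = tokens[0], tokens[1]
    n_removed, n_added, n_differ = (int(token) for token in tokens[2:5])
    features = tokens[5:]
    # Sanity check for the assumed layout: counts must cover all names.
    if len(features) != n_removed + n_added + n_differ:
        raise ValueError(f"feature count mismatch in row: {row!r}")
    return {
        "release": release,
        "allele": allele,
        "removed": features[:n_removed],
        "added": features[n_removed:n_removed + n_added],
        "differ": features[n_removed + n_added:],
    }


print(parse_row("3300 HLA-B*44:02:01:02S 0 2 1 exon_5 intron_5 intron_4"))
print(parse_row("3300 HLA-DQB1*05:149 1 1 0 exon_6 exon_5"))
```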

What does '|' mean in the multiple sequence alignment?

In the A_gen.txt file in the 'alignments' folder, several lines contain the " | " symbol, for example:
A*01:01:01:01 G | ATGGCCGTC ATGGCGCCCC GAACCCTCCT CCTGCTACTC TCGGGGGCCC TGGCCCTGAC CCAGACCTGG GCGG | GTGAGT GCGGGGTCGG GAGGGAAACC
A*01:01:01:02N - | --------- ---------- ---------- ---------- ---------- ---------- ---------- ---- | ------ ---------- ----------
A*01:01:01:03 * | --------- ---------- ---------- ---------- ---------- ---------- ---------- ---- | ------ ---------- ----------

May I ask what these " | " symbols mean?

Many thanks,

Mengyao
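
One reading, consistent with other reports on this page that describe intron/exon boundaries embedded in the alignment format, is that "|" marks a boundary between consecutive features. Taking that as an assumption rather than a documented fact, a row can be split into per-feature segments as sketched below. The function name is hypothetical.

```python
def split_alignment_row(row):
    """Return (allele_name, segments) for one alignment row.

    Assumption: "|" separates consecutive features (UTR, exon, intron), and
    blanks inside a segment are only there for readability.
    """
    name, _, body = row.strip().partition(" ")
    segments = ["".join(part.split()) for part in body.split("|")]
    return name, segments


row = ("A*01:01:01:01 G | ATGGCCGTC ATGGCGCCCC GAACCCTCCT CCTGCTACTC "
       "TCGGGGGCCC TGGCCCTGAC CCAGACCTGG GCGG | GTGAGT GCGGGGTCGG GAGGGAAACC")
name, segments = split_alignment_row(row)
print(name)
for segment in segments:
    print(len(segment), segment)
```

Under this reading, the middle segment of the quoted row would be a complete feature and the outer two would be the visible ends of its neighbours.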

nucleotide CDS alignment (MSA) file of release 3.9.0

I want to download the multiple sequence alignment files of release 3.9.0 because we want to finish the remaining portion of an old project. However, I am unable to find those files in this repository. Specifically, I need the file DQA_nuc.txt or DQA1_nuc.txt for release 3.9.0, as I already have the files for the other genes I am interested in.

Let me know if there is any way I can find that file.

Thank you

Current Release and Date Stamp in hla_ambigs.xml

The current release and date stamps in hla_ambigs.xml for the current release (3.30.0) are empty.

<?xml version="1.0" encoding="UTF-8"?>
	<tns:ambiguityData xmlns:tns="http://www.example.org/ambig-aw"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.example.org/ambig-aw ambig-aw.xsd ">
	<tns:releaseVersion currentRelease="" date="" />
	<tns:geneList>
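
A release check could catch this before publication. The sketch below parses the header quoted above, closed off so that it is well-formed on its own, and lists the releaseVersion attributes that are empty. The function name is hypothetical, and the self-closed geneList stands in for the real content.

```python
import xml.etree.ElementTree as ET

NS = {"tns": "http://www.example.org/ambig-aw"}

# Header as quoted in the report, with the open elements closed.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tns:ambiguityData xmlns:tns="http://www.example.org/ambig-aw"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org/ambig-aw ambig-aw.xsd ">
  <tns:releaseVersion currentRelease="" date="" />
  <tns:geneList />
</tns:ambiguityData>
"""


def empty_release_fields(xml_bytes):
    """Return the releaseVersion attribute names that are missing or blank."""
    root = ET.fromstring(xml_bytes)
    element = root.find("tns:releaseVersion", NS)
    fields = ("currentRelease", "date")
    if element is None:
        return list(fields)
    return [name for name in fields if not element.get(name, "").strip()]


print(empty_release_fields(SAMPLE))
```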

typos in README.md

It seems the COPYRIGHT NOTICE section of the README.md file here contains 1-2 typos.

The section indicates 2015 as the publication date for the Nucleic Acids Research article, but Google Scholar indicates 2014. I think 2015 is a typo.

Another typo: the word "stongly".
