A dataset of peptide-MHC affinities for n peptides and a alleles may be thought of as an n × a matrix
where peptide/allele pairs without measurements are missing values.
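
To make sure I'm reading that description correctly, here is a minimal sketch of how I picture the matrix. The records, the affinity values, and the helper name `to_matrix` are all made up for illustration and are not taken from BD2009 or from your code; missing peptide/allele pairs are represented as NaN.

```python
import math

# Hypothetical (peptide, allele, affinity) records; the values are invented.
measurements = [
    ("SIINFEKL", "HLA-A*02:01", 500.0),
    ("SIINFEKL", "HLA-B*07:02", 20000.0),
    ("GILGFVFTL", "HLA-A*02:01", 10.0),
]


def to_matrix(records):
    """Arrange records as an n x a matrix (rows: peptides, columns: alleles).

    Pairs without a measurement are left as NaN, i.e. missing values.
    """
    peptides = sorted({peptide for peptide, _, _ in records})
    alleles = sorted({allele for _, allele, _ in records})
    row = {peptide: i for i, peptide in enumerate(peptides)}
    col = {allele: j for j, allele in enumerate(alleles)}

    # Start with every entry missing, then fill in the observed pairs.
    matrix = [[math.nan] * len(alleles) for _ in peptides]
    for peptide, allele, affinity in records:
        matrix[row[peptide]][col[allele]] = affinity
    return peptides, alleles, matrix


peptides, alleles, matrix = to_matrix(measurements)
```

Here `matrix` has two rows and two columns, and the only missing entry is the (GILGFVFTL, HLA-B*07:02) pair, which has no measurement in the toy records.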
For the BD2009 dataset, it appears that you also filtered out peptides which occurred in at least three alleles and alleles with fewer than five measurements.
Also, apologies for the simple question, but I'm slightly confused by the pre-print. According to Kim et al., 2014, BD2009 contains 79 alleles and 170 datasets in total. Does your BD2009 dataset have 106 alleles? And was it compiled into a single dataset?