Scholar Presentation: Profiling the HLA diversity in Multiple US Populations Using High-resolution 9-locus Haplotype Frequencies from Donors in the NMDP Donor Registry
Aim: To characterize HLA immunogenetic diversity in US populations, we estimated 9-locus haplotype frequencies (HFs) from volunteer stem-cell donors in the NMDP registry. These HFs are crucial for high-resolution imputation, match predictions, and match likelihood projections. Our objective was to refresh this valuable population genetics reference dataset, most recently updated in 2013, for the Histocompatibility and Immunogenetics (H&I) community.
Methods: A dataset of 9,671,082 NMDP donors typed by DNA methods was used to estimate 9-locus HFs with an updated expectation-maximization (EM) algorithm that fills typing gaps and jointly resolves phase and allelic ambiguities. Self-identified race and ethnicity categories, captured on the NMDP member recruitment form, were used to stratify the study populations into five broad and 21 genetically distinctive detailed groups. Multiple population genetics analyses were conducted, including deviation from Hardy-Weinberg equilibrium proportions (HWEP), linkage disequilibrium (LD), population clustering, and genetic distance. HapLogic, the NMDP matching algorithm, was used to validate the efficacy of the estimated HFs for matching.
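As an illustration of the general idea behind EM-based haplotype frequency estimation, the following is a minimal textbook sketch, not the updated NMDP algorithm. It assumes complete, unambiguous allele calls at every locus and resolves phase only; gap filling and allelic ambiguity are not handled. The function names and the toy two-locus genotypes are hypothetical.

```python
from collections import defaultdict
from itertools import product


def phase_options(genotype):
    """List the distinct unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of per-locus allele pairs, e.g. (("A1", "A2"), ("B1", "B2")).
    """
    first, *rest = genotype
    options = set()
    # Keep the first locus fixed and try both orientations of every other locus.
    for flips in product((False, True), repeat=len(rest)):
        h1, h2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            h1.append(a)
            h2.append(b)
        options.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(options)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    expansions = [phase_options(g) for g in genotypes]
    haplotypes = sorted({h for opts in expansions for pair in opts for h in pair})
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for opts in expansions:
            # E-step: posterior weight of each phase resolution under HWE
            # (factor 2 for a pair of two different haplotypes).
            weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2]
                       for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freqs = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Hypothetical toy data: homozygotes anchor the phase of the double heterozygotes.
toy_genotypes = (
    [(("A1", "A1"), ("B1", "B1"))] * 3
    + [(("A2", "A2"), ("B2", "B2"))] * 3
    + [(("A1", "A2"), ("B1", "B2"))] * 2
)
toy_freqs = em_haplotype_frequencies(toy_genotypes)
```

In this toy example the double heterozygotes are ambiguous in isolation, but the homozygous donors pull the estimate toward the A1-B1 and A2-B2 haplotypes. The number of phase resolutions doubles with each additional heterozygous locus, so a 9-locus estimate at registry scale needs considerably more machinery than this sketch.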
Results: Typing coverage was 100% for HLA-A, B and DRB1, and 75%, 54%, 41%, and 10% for HLA-C, DQB1, DPB1, DPA1, and DQA1, respectively. The top 10 9-locus haplotypes in the US by aggregate frequency are shown in Table 1, with cumulative HFs ranging from 0.84% to 10%. Most of the nine loci deviated from HWEP, with exceptions at loci with sparse data: DQA1, DPB1, and DPA1. LD was stronger among class II loci, such as DRB1 with DQB1 and DQA1, and Black populations showed lower class II LD than all other groups. Clustering analysis showed tight clusters for Black, Hispanic, and White populations, while Asian groups showed more intra-population genetic diversity. Matching validation showed better match predictions than previously published results (0.97 for 10/10 and 0.62 for DPB1 predictions), with current AUC = 0.98, 0.94, 0.8, and 0.96 for 10/10, DQA1, DPB1, and DPA1, respectively (Figure 1).
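For readers less familiar with the AUC values reported above, the sketch below shows one standard way to compute an AUC from predicted match probabilities and observed match outcomes: the probability that a randomly chosen true match receives a higher prediction than a randomly chosen mismatch. It is a generic illustration, not the validation pipeline used in this study; the function name and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted match probabilities.
    labels: observed outcomes, 1 for a confirmed match and 0 otherwise.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical predictions for six donor-patient pairs.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of pairs, so a rank-based formulation would be preferable for large validation sets.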
Conclusion: The newly estimated 9-locus HFs extend previous frequency studies to cover all classical HLA loci with a larger sample size. This dataset can be applied for comprehensive and accurate HLA matching, virtual crossmatch predictions, and multiple other applications in genetics research and medicine.