Pre Doctoral University of Colorado AMC Lochbuie, Colorado
Aim: The human major histocompatibility complex (MHC), a 5 Mbp region on chromosome 6, is crucial for immune function and exhibits extensive sequence and structural variation, making its accurate representation in the human reference genome challenging. Our previous publication of six full-length MHC haplotypes (representing the four major HLA class II structures: DR1, DR3, DR4, and DR5) increased representation, but global population diversity remains poorly covered. To increase population diversity, we used MHC-homozygous human samples to develop a targeted long-read sequencing method that complements our existing short-read approach and achieves higher and more complete MHC coverage. Furthermore, we present a new automated bioinformatics pipeline that integrates both long- and short-read data, enabling comprehensive de novo assembly that can identify previously uncharacterized structural variations.
Methods: The pipeline leverages existing assemblers to generate initial contigs, which are then refined and scaffolded using a novel graph-based approach. The scaffolded contigs are then iteratively polished, resulting in a complete assembled haplotype.
Results: The assembled allele frequencies indicate that 4-locus haplotypes (HLA-A~C~B~DRB1) matching the International Histocompatibility Workshop (IHIW) cell lines represent 20% of European haplotype diversity, which increases to 35.5% with the inclusion of the haplotypes characterized here, encompassing 20 of the most frequent European haplotypes. Similarly, for 6-locus haplotypes (HLA-A~C~B~DRB1~DQB1~DPB1), representation increases from 6.3% (IHIW-derived) to 23.8% with the addition of the newly characterized haplotypes. Information on the worldwide distribution of HLA haplotypes is limited relative to that available for Europeans. Based on the allelefrequencies.net web tool, we estimate that populations of an additional 31 non-European countries will be represented at a minimum of 5% total haplotype frequency.
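The representation figures above are cumulative haplotype frequencies: the summed population frequency of every haplotype for which an assembly exists. A minimal Python sketch of that arithmetic follows, assuming a simple mapping from haplotype label to population frequency. The function name, the labels, and the toy frequencies are hypothetical and are not taken from the study's data or pipeline.

```python
from typing import Dict, Set


def represented_fraction(freqs: Dict[str, float], represented: Set[str]) -> float:
    """Sum the population frequencies of haplotypes that have an assembly."""
    return sum(freq for hap, freq in freqs.items() if hap in represented)


# Hypothetical toy frequencies for illustration only (not study data).
toy_freqs = {"hap_a": 0.10, "hap_b": 0.05, "hap_c": 0.02, "hap_d": 0.01}

existing = {"hap_a"}           # haplotypes already covered by reference cell lines
new = {"hap_b", "hap_c"}       # haplotypes added by new assemblies

before = represented_fraction(toy_freqs, existing)
after = represented_fraction(toy_freqs, existing | new)
print(f"before: {before:.1%}, after: {after:.1%}")
```

Taking the set union before summing ensures that a haplotype present in both collections is counted only once.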
Conclusion: Altogether, we introduce ~500 newly assembled MHC haplotypes from homozygous samples, substantially expanding population diversity within the human reference genome and strengthening our understanding of human MHC diversity.