Histocompatibility Specialist, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
Aim: To improve the accuracy of our previously established high-resolution HLA-C and -DQB1 genotype prediction model. To do so, we integrated family-derived true haplotypes with near-true haplotypes, which a novel method identifies among the Expectation-Maximization (EM) algorithm-derived haplotypes of unrelated donors. This approach addresses the limited coverage of true haplotype databases, which results from scarce family resources, and is intended to make donor-recipient matching for hematopoietic stem cell transplantation (HSCT) more efficient.
Methods: We developed a filtering method to extract near-true haplotypes from EM-generated theoretical haplotypes. Candidate haplotypes were screened using thresholds for haplotype frequency, regional replication, and homozygosity. The selected near-true haplotypes were combined with family-derived true haplotypes to construct an expanded reference database. Prediction accuracy was validated on samples with confirmed high-resolution HLA-A, -B, -DRB1, -C, and -DQB1 genotypes.
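The screening step can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the function names, and the toy haplotypes are hypothetical, and the three criteria are assumed to be alternatives (passing any one is enough), which is only one possible reading of the thresholds reported in the Results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EMHaplotype:
    """Hypothetical record for one EM-derived theoretical haplotype."""
    name: str              # haplotype label (toy labels are used below)
    frequency: float       # EM-estimated frequency as a fraction (0.0001 == 0.01%)
    n_regions: int         # number of regional cohorts in which it was estimated
    seen_homozygous: bool  # True if observed in a homozygous individual


MIN_FREQUENCY = 0.0001  # 0.01%, the frequency threshold stated in the Results
MIN_REGIONS = 4         # replication in at least 4 regions


def is_near_true(h: EMHaplotype) -> bool:
    # ASSUMPTION: any single criterion is sufficient.
    return (h.frequency > MIN_FREQUENCY
            or h.n_regions >= MIN_REGIONS
            or h.seen_homozygous)


def build_reference(em_haplotypes, family_true):
    """Union of family-derived true haplotypes and filtered near-true ones."""
    family_true = set(family_true)
    # Haplotypes already confirmed in families need no further screening.
    candidates = [h for h in em_haplotypes if h.name not in family_true]
    near_true = {h.name for h in candidates if is_near_true(h)}
    return family_true | near_true


# Toy data, invented for illustration only.
em = [
    EMHaplotype("hap_family", 0.03, 6, True),
    EMHaplotype("hap_frequent", 0.0005, 1, False),
    EMHaplotype("hap_replicated", 0.00005, 4, False),
    EMHaplotype("hap_rare", 0.00001, 1, False),
]
reference = build_reference(em, {"hap_family", "hap_family_only"})
```

In this toy run, `hap_rare` fails all three criteria and is excluded, while the family-derived entries are kept regardless of their EM statistics.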
Results: Of 8,011 EM-derived theoretical haplotypes, 3,618 overlapped with family-derived true haplotypes. The remaining 4,393 were filtered using thresholds (frequency >0.01%, replication in ≥4 regions, or homozygosity), yielding 2,777 near-true haplotypes. These were combined with 5,849 family-derived true haplotypes to construct the expanded database. In 1,634 samples, C and DQB1 genotypes were predicted from the A, B, and DRB1 genotypes; using the expanded database raised the consistency rate between predicted and observed C/DQB1 genotypes from 90.6% to 96.3%. Each sample generated one or more predicted allele combinations, and only those with a probability above 1% were retained. This cutoff balanced high accuracy against minimal combinatorial complexity.
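How a reference haplotype database can yield C/DQB1 predictions with a 1% probability cutoff is sketched below. This is an illustrative assumption, not the model described in this abstract: it enumerates haplotype pairs consistent with the observed A, B, and DRB1 genotype and weights each pair by the product of haplotype frequencies (Hardy-Weinberg proportions). The function name `predict_c_dqb1`, the data layout, and all frequencies are hypothetical.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def predict_c_dqb1(genotype, reference, min_prob=0.01):
    """Rank C/DQB1 genotype combinations for an A, B, DRB1 genotype.

    genotype:  three sorted allele pairs, in the order (A, B, DRB1)
    reference: dict mapping (A, B, DRB1, C, DQB1) haplotypes to frequencies
    """
    scores = defaultdict(float)
    for (h1, f1), (h2, f2) in combinations_with_replacement(reference.items(), 2):
        # The haplotype pair must reproduce the observed A, B, DRB1 genotype.
        if any(tuple(sorted((h1[i], h2[i]))) != genotype[i] for i in range(3)):
            continue
        # ASSUMPTION: Hardy-Weinberg weighting of the haplotype pair.
        weight = f1 * f2 if h1 == h2 else 2 * f1 * f2
        c = tuple(sorted((h1[3], h2[3])))
        dqb1 = tuple(sorted((h1[4], h2[4])))
        scores[(c, dqb1)] += weight
    total = sum(scores.values())
    if total == 0:
        return {}  # no haplotype pair in the reference explains the genotype
    # Keep only combinations whose normalized probability exceeds the cutoff.
    return {k: v / total for k, v in scores.items() if v / total > min_prob}


# Toy reference; the frequencies are invented for illustration only.
toy_reference = {
    ("A*02:01", "B*46:01", "DRB1*09:01", "C*01:02", "DQB1*03:03"): 0.04,
    ("A*11:01", "B*15:01", "DRB1*04:06", "C*04:01", "DQB1*03:02"): 0.02,
    ("A*11:01", "B*15:01", "DRB1*04:06", "C*08:01", "DQB1*03:02"): 0.0001,
}
toy_genotype = (
    ("A*02:01", "A*11:01"),
    ("B*15:01", "B*46:01"),
    ("DRB1*04:06", "DRB1*09:01"),
)
predictions = predict_c_dqb1(toy_genotype, toy_reference)
```

In this toy case two haplotype pairs explain the genotype, but the one involving the rare C*08:01 haplotype falls below the 1% cutoff, so a single C/DQB1 combination remains. A genotype that no reference pair can explain returns an empty result, which is the kind of gap a larger reference database is meant to reduce.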
Conclusion: This study established a robust framework for extracting near-true haplotypes from EM-derived data, leveraging abundant unrelated donor data to supplement scarce family-derived true haplotypes. The optimized model enhances donor selection efficiency for HSCT and can be extended to organ transplantation and cord blood transplantation.