Aim: This study aimed to develop and evaluate a bioinformatics pipeline for calling HLA-C alleles from exomes.
Methods: Our workflow uses a new version of hla-mapper (version 5, under development) to realign all reads related to HLA genes and to call SNPs and InDels in the coding DNA sequence (CDS) region from the exomes of 233 Brazilian samples. We then used these variants and their respective haplotypes to identify the HLA-C alleles present in each sample. Some samples yielded well-defined allele combinations. Others were ambiguous because of missing alleles. For the rest, no allele combination could be found, owing to genotyping errors and missing alleles. Most SNP genotyping errors involved (i) homozygous calls with low coverage (<8 reads), (ii) unbalanced heterozygous calls, or (iii) homozygous calls with noise from another allele, patterns frequently related to probe-capture bias. We therefore confirmed (or corrected) HLA-C typing using HIBAG. Using a multiethnic reference panel of 5,196 samples provided by the SNP-HLA Reference Consortium (SHLARC) and the Brazilian SABE cohort, we built a reference panel restricted to SNPs from the CDS of class I HLA genes and then used it to impute HLA-C.
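The allele-calling step and the three error patterns above can be illustrated with a minimal Python sketch. It assumes biallelic CDS sites, alleles encoded as toy 0/1 vectors (presence of the alternative base at each site), and unphased genotypes given as alternative-allele dosages (0, 1, 2). It ignores the haplotype information that hla-mapper uses. Only the 8-read threshold comes from the study; the balance and noise thresholds, the allele names, and all function names are hypothetical. Calls flagged as unreliable are skipped, and a sample is then classified as resolved (one compatible pair), ambiguous (several), or lacking any combination (none).

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple

# From the text: homozygous calls supported by fewer than 8 reads are suspect.
MIN_HOM_DEPTH = 8
# Hypothetical thresholds (not from the study), for illustration only.
MIN_HET_BALANCE = 0.2  # minor-allele read fraction below this -> unbalanced heterozygote
MAX_HOM_NOISE = 0.05   # fraction of reads for the absent allele above this -> noisy homozygote


def flag_call(ref_reads: int, alt_reads: int, dosage: int) -> Optional[str]:
    """Return None if the call looks reliable, else the name of the error pattern."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "no_coverage"
    alt_frac = alt_reads / depth
    if dosage in (0, 2):
        if depth < MIN_HOM_DEPTH:
            return "low_coverage_hom"
        # Fraction of reads supporting the allele that a homozygote should lack.
        other_frac = alt_frac if dosage == 0 else 1 - alt_frac
        if other_frac > MAX_HOM_NOISE:
            return "noisy_hom"
    elif min(alt_frac, 1 - alt_frac) < MIN_HET_BALANCE:
        return "unbalanced_het"
    return None


def compatible_pairs(
    calls: List[Tuple[int, int, int]], alleles: Dict[str, Tuple[int, ...]]
) -> List[Tuple[str, str]]:
    """Unordered allele pairs whose summed dosages match every reliable site.

    calls:   (ref_reads, alt_reads, alt_dosage) per CDS site
    alleles: toy allele name -> 0/1 alternative-base vector over the same sites
    """
    reliable = [i for i, (r, a, d) in enumerate(calls) if flag_call(r, a, d) is None]
    pairs = []
    for x, y in combinations_with_replacement(sorted(alleles), 2):
        if all(alleles[x][i] + alleles[y][i] == calls[i][2] for i in reliable):
            pairs.append((x, y))
    return pairs


def classify(pairs: List[Tuple[str, str]]) -> str:
    """Map the number of compatible pairs to the three outcomes described."""
    if not pairs:
        return "no_combination"
    return "resolved" if len(pairs) == 1 else "ambiguous"
```

For example, with toy alleles `{"allele_A": (0, 1, 0), "allele_B": (1, 1, 0), "allele_C": (1, 0, 0)}`, well-covered calls `[(10, 10, 1), (0, 20, 2), (12, 0, 0)]` resolve to a single pair. If the second site is covered by only five reads, it is dropped and two pairs remain compatible.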
Results: Our model performed robustly, with 92% of samples reaching posterior probabilities above 0.8 (the HIBAG statistical cut-off). We then compared the imputed alleles with those obtained from hla-mapper to manually define the correct typing for each sample. Our findings show that combining tools for HLA read alignment and SNP detection with imputation from multiethnic panels provides a reliable computational solution for predicting HLA-C alleles from exomes.
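The comparison between imputed and hla-mapper alleles can be sketched as a simple triage rule. This sketch assumes that each sample has a list of candidate allele pairs from hla-mapper (possibly empty or ambiguous) and one HIBAG pair with its posterior probability. The 0.8 cut-off comes from the text; the outcome labels, the decision order, and the function names are hypothetical. The study itself defined the final typing manually, so this only suggests which samples could be accepted automatically and which would need review.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, ...]
PROB_CUTOFF = 0.8  # from the text: posterior probability must exceed 0.8


def normalise(pair: Pair) -> Pair:
    """Order-independent representation of a genotype (allele pair)."""
    return tuple(sorted(pair))


def reconcile(
    mapper_pairs: List[Pair], hibag_pair: Pair, hibag_prob: float
) -> Tuple[str, Optional[Pair]]:
    """Hypothetical triage of one sample: accept concordant calls, flag the rest."""
    candidates = {normalise(p) for p in mapper_pairs}
    imputed = normalise(hibag_pair)
    if hibag_prob <= PROB_CUTOFF:
        return "review: low posterior", None
    if not candidates:
        return "review: no hla-mapper combination", None
    if imputed in candidates:
        # Also covers ambiguous hla-mapper results that the imputed pair disambiguates.
        return "concordant", imputed
    return "review: discordant", None
```

Because allele order is normalised, an imputed `("allele_C", "allele_A")` matches an hla-mapper candidate `("allele_A", "allele_C")`. A confident imputation can therefore select one pair among several ambiguous hla-mapper candidates.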
Conclusion: This approach may facilitate the exploration of the HLA-C gene across diverse biobanked exome collections and provide tools for improving capture probes in whole-exome sequencing (WES) technologies.
Footnotes: Financial support: FAPESP (Grants #2021/14851-9 and #2023/00238-9), CAPES/COFECUB project (CAPES project #88881.879003/2023-01, COFECUB Me #1044/24) and CNPq (project #302060/2019-7).