Undergraduate Student, University of Pennsylvania, Philadelphia, PA
Aim: Data from clinical HLA testing can be stored in a variety of formats, which makes performing common analyses difficult. We developed an R package to simplify the informatics process for HLA data, using established standards for HLA genotypes. Here we describe an informatics pipeline for calculating matches between recipient/donor pairs using the R package “immunogenetr”.
Methods: Functions for HLA informatics were written in R (4.1.0). The genotype list string (GL string) format was used as the standard for storing HLA genotyping data. Functions were written to coerce tabular data, as commonly stored in laboratory information systems (LIS), to and from GL strings. A suite of functions for HLA matching, taking GL strings as input, was developed (Fig. 1). The functions output either the number of matched alleles at individual loci or the total match in 8/8 or 10/10 format, as used for hematopoietic cell transplantation. The functions were designed to produce output in the graft-versus-host, host-versus-graft, or bidirectional direction. The finalized library, “immunogenetr,” was submitted to the Comprehensive R Archive Network (CRAN) and is available for native installation in R.
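The coercion of tabular data to GL strings described above can be sketched in base R. This is a minimal illustration only, not the immunogenetr implementation: the helper `columns_to_gl_string`, the data frame, and its column names are hypothetical, and the sketch assumes unambiguous typing with exactly two alleles per locus. It relies only on the standard GL string delimiters: "+" joins the two gene copies at a locus and "^" joins loci.

```r
# Hypothetical LIS export: one column per allele (column names are assumptions).
typing <- data.frame(
  patient = c("recipient", "donor"),
  A1 = c("A*01:01", "A*01:01"),
  A2 = c("A*02:01", "A*03:01"),
  B1 = c("B*07:02", "B*07:02"),
  B2 = c("B*08:01", "B*08:01"),
  stringsAsFactors = FALSE
)

# Illustrative helper (hypothetical, not part of immunogenetr).
# locus_columns: a list with one character vector of two column names per locus.
columns_to_gl_string <- function(df, locus_columns) {
  per_locus <- lapply(locus_columns, function(cols) {
    # "+" separates the two gene copies typed at a single locus
    paste0("HLA-", df[[cols[1]]], "+", "HLA-", df[[cols[2]]])
  })
  # "^" separates loci within one genotype
  do.call(paste, c(unname(per_locus), sep = "^"))
}

typing$GL_string <- columns_to_gl_string(
  typing,
  list(A = c("A1", "A2"), B = c("B1", "B2"))
)

typing$GL_string[1]
# "HLA-A*01:01+HLA-A*02:01^HLA-B*07:02+HLA-B*08:01"
```

The reverse direction follows from the same delimiters: splitting on "^" and then on "+" with `strsplit(..., fixed = TRUE)` recovers one allele per field.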
Results: Fig. 2 shows an example of an informatics pipeline enabled by immunogenetr. HLA genotyping data were retrieved from an LIS database, and the HLA_columns_to_GLstring function coerced the data to GL strings in a single step. After the HLA genotypes of recipient/donor pairs were selected, the HLA_match_summary_HCT function calculated matching at the HLA-A, B, C, and DRB1 loci in 8/8 format. The accuracy of the matching functions was further tested by validating them against published matching algorithms from the World Marrow Donor Association.
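As a rough illustration of what an 8/8 match summary computes, the following base R sketch counts matched alleles per locus from two GL strings. It does not call immunogenetr or reproduce its algorithm: `parse_gl_string` and `count_matches` are hypothetical helpers, the genotypes are invented for the example, and the sketch assumes unambiguous typing that uses only the "^" and "+" delimiters. It also assumes that alleles are counted per gene copy and that the bidirectional result takes the larger of the two directional mismatch counts.

```r
# Split a GL string into a named list of alleles per locus.
parse_gl_string <- function(gl) {
  loci <- strsplit(gl, "^", fixed = TRUE)[[1]]
  alleles <- strsplit(loci, "+", fixed = TRUE)
  # Name each element by its locus, e.g. "HLA-A" from "HLA-A*01:01"
  names(alleles) <- sub("\\*.*$", "", sapply(alleles, `[`, 1))
  alleles
}

# Count matched alleles (0 to 2) per locus in the chosen direction.
count_matches <- function(gl_recipient, gl_donor,
                          direction = c("bidirectional", "GvH", "HvG")) {
  direction <- match.arg(direction)
  recip <- parse_gl_string(gl_recipient)
  donor <- parse_gl_string(gl_donor)
  sapply(names(recip), function(locus) {
    # Graft-versus-host: recipient alleles that the donor lacks
    gvh <- sum(!(recip[[locus]] %in% donor[[locus]]))
    # Host-versus-graft: donor alleles that the recipient lacks
    hvg <- sum(!(donor[[locus]] %in% recip[[locus]]))
    mismatches <- switch(direction,
                         GvH = gvh,
                         HvG = hvg,
                         bidirectional = max(gvh, hvg))
    2L - mismatches
  })
}

# Invented example genotypes; the pair differs at one HLA-A allele.
recipient <- "HLA-A*01:01+HLA-A*02:01^HLA-B*07:02+HLA-B*08:01^HLA-C*07:01+HLA-C*07:02^HLA-DRB1*03:01+HLA-DRB1*15:01"
donor <- "HLA-A*01:01+HLA-A*03:01^HLA-B*07:02+HLA-B*08:01^HLA-C*07:01+HLA-C*07:02^HLA-DRB1*03:01+HLA-DRB1*15:01"

matches <- count_matches(recipient, donor)
summary_8_of_8 <- paste0(sum(matches), "/8")
summary_8_of_8
# "7/8"
```

The directional options matter when one member of the pair is homozygous: a recipient typed A*01:01+A*01:01 with a donor typed A*01:01+A*02:01 has no mismatch in the graft-versus-host direction but one in the host-versus-graft direction.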
Conclusion: Data produced by HLA laboratories are often recorded in discrete fields in LIS databases. This is useful for routine clinical operations but can make informatics analyses difficult, because the relevant data may be spread across multiple columns. The R package “immunogenetr” was developed to enable rapid coercion of tabular data to GL strings. From this standard format, a suite of tools, notably for calculating matches and mismatches, allows for reproducible HLA analyses, and validation of the matching functions against published algorithms supports consistent results. The immunogenetr package will make HLA informatics processes more efficient, accurate, and reproducible for both clinical and research purposes.