

Haplotype analysis of the immunoglobulin heavy chain locus


Supervisor: Gunilla Karlsson Hedestam
Department: MTC
Postal Address: Karolinska Institutet Biomedicum,
Tomtebodavägen 16
171 65 Solna
Telephone: 0701-443900

E-mail: Gunilla.Karlsson.Hedestam@ki.se
Homepage: https://staff.ki.se/people/gunilla-karlsson-hedestam


Background:
Non-human primates are frequently used for studies of vaccine- and infection-induced B cell responses, with rhesus macaques (Macaca mulatta) being the most common model. A critical first step in the analysis of antibody responses is the assignment of the antibody sequences to a database of germline immunoglobulin (IG) variable (V), diversity (D) and joining (J) genes to define the antibody VDJ gene usage. However, the IG loci, where the VDJ genes are located, are highly complex and poorly defined in both humans and macaques, especially in the latter. While the number of V, D and J genes is known for humans, it remains unknown for rhesus macaques, despite several databases provided by research groups over the past years. The most comprehensive database available to date, KIMDB (http://kimdb.gkhlab.se), generated by the Karlsson Hedestam laboratory last year (Vazquez Bernat et al. Immunity 2020), contains information from 45 macaques for a better representation of genetic diversity, providing a foundation for this project.
The heavy chain IG (IGH) locus spans about 1 Mb at the telomeric end of chromosome 14 (humans) or 7 (macaques). It is known from studies of the human IGH locus that numerous pseudogenes are interspersed between the functional V genes, and that there are frequent duplications and deletions involving multiple IGHV genes, which complicate efforts to assemble genomic sequences spanning this region. At present, three full genomic assemblies are available for the rhesus macaque IGH locus, and these differ from one another at both the gene and the allele level. The challenges inherent to sequencing the IGH genomic region have led to the development of alternative approaches for immunoglobulin germline allele identification using computational inference tools, such as IgDiscover, which uses data from expressed IgM libraries to identify known and novel IGHV alleles in different species (Corcoran et al. Nature Communications 2016). Inference methods are especially useful for capturing allelic diversity in outbred populations, such as humans and macaques, but they do not provide information about the relative position of these genes along the chromosome, which is needed to define genomic coordinates. However, a very useful approach to determine which IGHV allele is present on which chromosome is to use a heterozygous J gene as an anchor to separate the V alleles on the maternal and paternal chromosomes, a method referred to as inferred haplotype analysis. This approach provides valuable data on both allelic and structural diversity in the IGH locus, and it will be used in this project.
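The anchoring idea can be illustrated with a short Python sketch. It relies on the fact that a V(D)J rearrangement takes place on a single chromosome, so the V and J alleles found in one expressed sequence come from the same chromosome. This is a minimal illustration under stated assumptions, not the IgDiscover implementation: the function name infer_haplotypes, the min_count noise threshold and all allele names in the example are hypothetical.

```python
from collections import Counter, defaultdict


def infer_haplotypes(records, anchor_alleles, min_count=2):
    """Group V alleles by the anchor J allele they are rearranged with.

    records: iterable of (v_allele, j_allele) pairs, one per expressed sequence.
    anchor_alleles: the two alleles of a J gene that is heterozygous in this animal.
    min_count: hypothetical noise threshold; a V allele is only assigned to a
        chromosome if it is seen with that anchor at least this many times.
    """
    counts = defaultdict(Counter)
    for v_allele, j_allele in records:
        # Sequences using other J genes carry no phasing information here.
        if j_allele in anchor_alleles:
            counts[j_allele][v_allele] += 1
    return {
        j: sorted(v for v, n in counts[j].items() if n >= min_count)
        for j in anchor_alleles
    }


# Toy data with made-up allele names (not taken from KIMDB).
records = (
    [("IGHV1-1*01", "IGHJ6*01")] * 3
    + [("IGHV1-1*02", "IGHJ6*02")] * 4
    + [("IGHV3-2*01", "IGHJ6*01")] * 2
    + [("IGHV3-2*01", "IGHJ6*02")] * 3
    + [("IGHV4-3*01", "IGHJ6*01")] * 1   # below threshold, treated as noise
    + [("IGHV1-1*01", "IGHJ4*01")] * 5   # non-anchor J gene, ignored
)
haplotypes = infer_haplotypes(records, ("IGHJ6*01", "IGHJ6*02"))
for anchor, v_alleles in haplotypes.items():
    print(anchor, v_alleles)
```

In this toy example the two alleles of the first V gene separate onto different chromosomes, while the second V gene carries the same allele on both. A V gene that appears with only one anchor allele at good depth would be a candidate for a deletion on the other chromosome, which is how the approach can also reveal structural diversity.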

Methods and analyses:
In the current project, we will obtain samples from a set of macaque family cases (mother, father and offspring) from the national primate center in Portland, Oregon. These samples were assembled by our collaborator, Dr. Ann Hessell. The first task will be to generate IgM libraries from these samples using the method described in Vazquez Bernat et al. Immunity 2020. In brief, mRNA is isolated from total blood cells for cDNA synthesis using an IgM-specific primer. The cDNA synthesis product is purified and used for amplification of the full-length VDJ region. Libraries are sequenced on the Illumina MiSeq platform available in the Karlsson Hedestam lab for subsequent analysis by IgDiscover.
Additional wet lab work includes targeted PCR and validation of selected genomic regions. The analysis work will include the use of IgDiscover and modules like "plotallele", as well as comparisons of the new data generated from the family cases with existing data previously generated from the 45 macaques. The project may also use human IGH data that the group already has available to obtain a more complete understanding of genetic diversity at the IGH locus. The overall goal is to gain new understanding of how antibody germline gene diversity is generated.
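As an illustration of the comparison step, the sketch below matches newly inferred alleles against an existing reference set by exact nucleotide sequence. It is a simplified example under stated assumptions and does not describe how IgDiscover or KIMDB perform this comparison: the function name match_to_reference, the allele names and the short toy sequences are all hypothetical, and real analyses would also need to handle partial or near-identical sequences.

```python
def match_to_reference(inferred, reference):
    """Map each inferred allele name to the reference allele with an identical sequence.

    inferred, reference: dicts of allele name -> nucleotide sequence.
    Returns a dict of inferred name -> reference name, or None when no
    reference allele has exactly the same sequence (a candidate novel allele).
    """
    by_sequence = {seq.upper(): name for name, seq in reference.items()}
    return {name: by_sequence.get(seq.upper()) for name, seq in inferred.items()}


# Toy sequences for illustration only.
reference = {"IGHV1-1*01": "CAGGTGCAGCTG", "IGHV3-2*01": "GAGGTGCAGCTG"}
inferred = {"family1_a": "caggtgcagctg", "family1_b": "CAGGTGCAGCTA"}

matches = match_to_reference(inferred, reference)
novel = sorted(name for name, hit in matches.items() if hit is None)
print(matches)
print("candidate novel alleles:", novel)
```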

Time expected for the project:
The estimated time for completing this project is 4-5 months. We do not yet know exactly how many family cases Dr. Hessell will be able to assemble. Therefore, we will start with two families to ensure that all methods and the subsequent analysis work well, and we will add samples as they become available. The goal is to spend approximately half of the project on wet lab work and half on computational analysis. We also need to plan for sufficient time at the end of the project to write up the results, compile figures describing them, and for Maria to present the work to the group and receive feedback.
