Date of Award

Spring 2006

Document Type

Restricted Thesis

Terms of Use

© 2006 Jessica L. Larson. All rights reserved. Access to this work is restricted to users within the Swarthmore College network and may only be used for non-commercial, educational, and research purposes. Sharing with users outside of the Swarthmore College network is expressly prohibited. For all other uses, including reproduction and distribution, please contact the copyright holder.

Degree Name

Bachelor of Arts

Department

Biology Department, Mathematics & Statistics Department

First Advisor

Kathleen King Siwicki

Second Advisor

Amy Cheng Vollmer

Abstract

The Lyme disease-causing bacterium Borrelia burgdorferi sensu stricto is under constant pressure to combat its hosts' immune systems. As a result, the genes that code for proteins found on the surface of these bacteria, which interact with mammalian immune defenses, have many variations. Certain genes (called hypervariables) show more variability at the DNA sequence level than others. Using data from 12 sequenced hypervariable loci in 31 different clinical and tick isolates, I estimated the level of multilocus linkage in B. burgdorferi. The purpose of this study was to determine whether this genetic linkage is due to a lack of recombination or to the effects of natural selection. I first used Index of Association (IA) calculations to analyze data from both sequenced observations and random simulations in an effort to determine whether these 12 genes were actually associated with each other. I also developed a statistical test for multilocus linkage based on Shannon's information theory. I used the conditional entropy, H(X|Y) (the randomness remaining in X given Y), to determine the level of linkage (or association) among the genes in question. My IA study showed that once the effects of familial links and neighboring genes were removed, these genes do not appear to be linked. However, for my set of genes and isolates of B. burgdorferi, I determined that one gene in particular (ospC, which codes for outer surface protein C) is good at predicting the sequence variation at the eleven other genes.¹ That is, the conditional entropy is roughly equivalent to the entropy, H(X), for all X, given the allele type at ospC. Together, my studies provide evidence to support the hypothesis that these 12 genes are associated with each other as a result of natural selection.²
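
For readers unfamiliar with the quantities named above, the following is a minimal sketch of how the Shannon entropy H(X) and the conditional entropy H(X|Y) can be estimated from allele labels, using the identity H(X|Y) = H(X,Y) - H(Y). It is an illustration only, not the statistical test developed in the thesis. The function names and the toy allele lists (`ospc`, `locus_linked`, `locus_free`) are hypothetical and are not taken from the thesis data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of allele labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(X|Y) = H(X,Y) - H(Y), estimated from paired allele labels."""
    if len(x) != len(y):
        raise ValueError("x and y must list alleles for the same isolates")
    return entropy(list(zip(x, y))) - entropy(y)


# Hypothetical toy data: one allele label per isolate at each locus.
ospc = ["A", "A", "B", "B", "C", "C", "D", "D"]
locus_linked = ["1", "1", "2", "2", "3", "3", "4", "4"]  # determined by ospc
locus_free = ["1", "2", "1", "2", "1", "2", "1", "2"]    # independent of ospc

for name, locus in [("linked", locus_linked), ("free", locus_free)]:
    print(name, entropy(locus), conditional_entropy(locus, ospc))
```

In this toy data, knowing the `ospc` allele removes all uncertainty about `locus_linked`, whereas it leaves the entropy of `locus_free` unchanged.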