My interdisciplinary background in biochemistry and mathematics, my research experience, and my strong interest in studying cancer and protecting public health make me a qualified candidate for the Ph.D. program in Biostatistics at the University of Pittsburgh.

As a Biochemistry major at Grinnell College, I studied chemicals, enzymes, and their mechanisms in both classrooms and labs. I designed and carried out experiments that synthesized and purified biologically active molecules, purified enzymes and determined their activity and inhibitors, and explored the regulation of protein synthesis at the genetic level. These rigorous courses and lab experiences gave me specialized knowledge of biochemistry that helps me understand public health data and research questions in depth. Concurrently, I honed my statistical and mathematical abilities. By taking many proof-intensive mathematics courses and several application-focused statistics courses as an undergraduate, I came close to completing a mathematics major and became proficient in writing proofs and analyzing data. My knowledge of biostatistical theory and methods, as well as of real analysis, has deepened further during the MS program in Biostatistics at Columbia.

Completing these courses with a high GPA confirmed my ability as a student, while my research experience molded me into a self-motivated aspiring scientist. I am currently working with Dr. McKeague on analyzing accelerometer data. The goal of this research is to demonstrate the efficacy of a therapy for a nuclear-encoded mitochondrial disease. A typical symptom of this disease is muscle weakness, so having patients wear accelerometers is an efficient way to record their level of activity. By comparing the level of activity at later visits with that at earlier visits, we can tell whether the therapy is alleviating a patient's muscle weakness. However, because the accelerometer collects data at 20 Hz, the raw data contain too many points to compare directly. We therefore transformed the raw data into an occupation time curve for further analysis. Occupation time is defined as the amount of time a stochastic process spends above a certain level. In our case, we transform the raw scatter plot, with time on the x-axis and the recorded activity level on the y-axis, into an occupation time curve, with the activity level on the x-axis and the time spent above that level on the y-axis. The major challenge in this project is processing the data. Each visit of each patient yields one data set containing millions of data points, so a single data set often takes an hour to process in R on my laptop. We have 40 data sets right now, with new data arriving occasionally. Whenever we decided to try a new approach to the analysis, I had to reprocess all of the data from the beginning, which required considerable patience and careful time management, since I was also taking courses that required R.
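The transformation from a raw activity trace to an occupation time curve can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the project's actual R code: the function name `occupation_time_curve`, the toy activity values, and the choice to count samples strictly above each level are all hypothetical, and each sample is assumed to represent 1/20 of a second at the 20 Hz sampling rate.

```python
from bisect import bisect_right


def occupation_time_curve(samples, levels, sampling_hz=20.0):
    """For each level, return the seconds the signal spent strictly above it.

    Assumption: each sample stands for 1 / sampling_hz seconds of recording.
    """
    ordered = sorted(samples)          # sort once so each level is a binary search
    dt = 1.0 / sampling_hz
    curve = []
    for level in levels:
        # bisect_right counts samples <= level; the remainder lie strictly above.
        n_above = len(ordered) - bisect_right(ordered, level)
        curve.append(n_above * dt)
    return curve


# Hypothetical toy signal: 10 samples at 20 Hz (0.5 s of recording).
activity = [0.1, 0.4, 0.9, 1.2, 0.3, 0.0, 2.1, 1.7, 0.8, 0.2]
levels = [0.0, 0.5, 1.0, 2.0]
curve = occupation_time_curve(activity, levels)

for level, seconds in zip(levels, curve):
    print(f"time above {level:.1f}: {seconds:.2f} s")
```

Sorting the samples once keeps the cost of evaluating many levels low, and the resulting curve is non-increasing in the activity level, which makes curves from different visits easier to compare than the raw traces.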

I am also working with Dr. McDonald on analyzing 2001-2018 NHANES data to reveal the temporal trend in phthalate exposure among post-menopausal women. The challenging parts of this research are framing the research question carefully, understanding survey sampling in depth, and analyzing the data properly. The first question is how to select the population of interest. As a prevalent class of endocrine-disrupting chemicals, phthalates interfere with hormonal growth and development, so exposure at menopause could be a particular concern. Because NHANES provides only survey data, it is hard to tell whether the women who took the survey were at menopause during that specific cycle; they reported only whether they were post-menopausal and their age at menopause. It would be meaningless to include a woman's phthalate examination data in our analysis if she had experienced menopause ten years before screening. To select the population of interest accurately, we decided to include only women whose age at menopause was no more than 5 years below their age at screening. I also did extensive research to classify natural menopause and surgical menopause appropriately. Moreover, NHANES has a complex, multistage, probability sampling design, so we must account for sampling weights and sample design variables in our analysis. We therefore used the SAS procedure PROC SURVEYREG to build a multivariable linear regression model and generate age-adjusted least-squares geometric means of urinary phthalate levels in each cycle. We built four additional models stratified by race and ethnicity, socioeconomic status, nativity, and intersectional identity to explore whether the temporal trend of phthalate exposure varied by socially disadvantaged group status.
Based on my understanding of the data, possible directions for future studies include examining the association between phthalate exposure and early menopause, and building a logistic regression model to measure the association between phthalate exposure and menopause type (surgical or natural).

I am confident that my interdisciplinary academic background and my research experience have prepared me well for my next chapter in the Ph.D. program in Biostatistics at the University of Pittsburgh School of Public Health. Beyond my background, I also have a strong inner motivation to contribute to public health. The death of a beloved family member from lung cancer in 2016 was a significant event that changed my life. During the years I spent painfully watching her suffer from cancer, questions arose in my mind: How many people are experiencing the same pain as us? How many families are having even worse experiences? Nothing is more important to us than the welfare of our families, and I can think of no greater cause than protecting the health of our loved ones from disease. That is the reason I decided to study biochemistry in college and biostatistics in graduate school. I would like to devote the rest of my life to protecting the public from diseases like cancer, and I believe pursuing a Ph.D. will strengthen my capacity to make more significant contributions to public health and save more lives.

My career goal is to work as a biostatistician at a cancer research institute or any institute that would allow me to work on cancer research, and I believe that the Ph.D. program in Biostatistics at the University of Pittsburgh will help me achieve this goal. The program is especially attractive to me because of its resources and opportunities for cancer research. The NRG Oncology Statistics and Data Management Center and the UPMC Hillman Cancer Center are staffed and directed by Biostatistics faculty members at the University of Pittsburgh. Additionally, many professors in this department study cancer using a variety of statistical methodologies. I am especially interested in Dr. Bandos's research on applying statistical methods to clinical trials evaluating therapies for the prevention and treatment of breast and colorectal cancer.

I relish every challenge that has molded me into the determined and prepared aspiring scientist that I am today, and I believe the Ph.D. program in Biostatistics at the University of Pittsburgh will be the right home for my research goals and aspirations.