Projects
2023-25
DNA Methylation-based Risk Stratification and Classification of Pediatric Thyroid Carcinoma
R
Shell scripting
EPICv2 Methylation Arrays
HPC
Thyroid carcinoma is the most common endocrine malignancy in children, and current guidelines recommend total thyroidectomy for nearly all pediatric cases. While effective, the procedure carries higher complication risks in children, including hypoparathyroidism and recurrent laryngeal nerve injury. Improved preoperative diagnostics could reduce unnecessary surgeries and lifelong thyroid hormone replacement. Existing imaging-based preoperative assessments are subjective and vary between readers. In this study, we demonstrate that genome-wide DNA methylation profiling robustly captures molecular features of pediatric thyroid carcinoma, including invasiveness and driver mutations. These findings support the potential of DNA methylation as a preoperative prognostic tool to inform treatment decisions and minimize surgical risk.
FALL 2025
Exome-Wide Association Study of Breast Cancer Risk in Penn Medicine Biobank
Python
R
Shell scripting
REGENIE
PLINK
Exome
LPC
Breast cancer susceptibility is influenced not only by common genetic variation but also by rare, protein-altering variants with potentially large effects, which traditional genome-wide association studies capture poorly. Whole-exome sequencing in large, clinically linked biobanks enables systematic interrogation of these rare variants at scale. In this study, we use whole-exome sequencing data from the Penn Medicine Biobank to perform an exome-wide association study of breast cancer risk using both single-variant and gene-based aggregation approaches. Defining cases and controls from ICD-9/10 codes, annotating variants by predicted functional consequence, and testing multiple burden masks within a mixed-model framework, we assess the contribution of rare coding variation to breast cancer susceptibility. This work establishes a scalable, reproducible exome analysis pipeline and contributes rare-variant association results to the SIMPLEXO breast cancer project, supporting gene-level discovery in large biobank cohorts.
SUMMER 2025
Ancestry-Stratified Genome-Wide Association Study of Breast Cancer Risk in Penn Medicine Biobank
Python
R
Shell scripting
REGENIE
PLINK
LPC
Breast cancer is a leading cause of cancer morbidity among women worldwide, yet most genome-wide association studies of breast cancer have been conducted in individuals of European ancestry. Large biobanks linked to electronic health records provide an opportunity to address this gap by enabling ancestry-aware genetic discovery at scale. In this study, I used imputed genotype data from the Penn Medicine Biobank to perform an ancestry-stratified genome-wide association study of breast cancer risk. Using clinical ICD-9/10 codes to define cases and controls and genetic ancestry inference to stratify participants, I analyzed breast cancer susceptibility across diverse populations with REGENIE, a whole-genome regression framework that accounts for population structure, relatedness, and technical confounders. This work establishes a reproducible pipeline for integrating biobank genomic data with clinical phenotypes and contributes ancestry-informed association results to the Confluence Project, which supports robust discovery of genetic risk factors underlying breast cancer.
Presentations
SUMMER 2025
Methylation-based Prognosis of Pediatric Thyroid Carcinoma Invasiveness
American Physician Scientists Association Mid-Atlantic Conference
SPRING 2025
DNA methylation-based stratification of pediatric thyroid tumor invasiveness
University of Pennsylvania Spring Research Exposition & Women in STEM Symposium
FALL 2023
Cross-platform DNA methylome-based cancer classification
Mid-Atlantic Bioinformatics Conference
FALL 2023
Exploration of feature selection strategies in DNA methylome-based cancer classification
University of Pennsylvania Fall Research Exposition

