resume
General Information
Full Name | Jessica Yan Wang |
Languages | English, Chinese (Mandarin) |
Email | jessicawang@g.harvard.edu |
Industry/Academic Interests
- Data Science, Natural Language Processing (NLP), AI in Healthcare, Statistical Analysis
Education
- 2024-2025 (E)
Master's in Data Science
Harvard John A. Paulson School of Engineering and Applied Sciences
- GPA 4.0/4.0
- 2019-2024
Honors Bachelor of Science (HBSc)
University of Toronto, Toronto, Canada
- Data Science Major & Mathematics Minor
- GPA 3.99/4.0
- Dean's List Scholar (2020-2023)
Work and Research Experience
- May 2023 - Present
Research Intern - Supervised by Prof. Rahul Krishnan
Vector Institute, Toronto, Canada
- Developed binary classification models using XGBoost to predict 14-day in-hospital mortality from the GEMINI dataset with 2.2 billion+ clinical data points, assessing model performance across patient groups defined by Social Determinants of Health.
- Analyzed how distribution shifts affected model performance, observing up to a 3.5% AUC improvement when models trained on more socially diverse subgroups were deployed to less socially diverse subgroups.
- Enhanced model interpretability through SHAP analysis, identifying key features influencing predictions across diverse patient subgroups and demonstrating the need for diverse training data to improve AI model generalizability.
- Jan 2022 - Present
Research Intern - Supervised by Prof. Fanny Chevalier
University of Toronto Dynamic Graphics Project Lab, Toronto, Canada
- Developed and deployed an augmented reality application on Microsoft HoloLens to assist presenters during Q&A sessions by analyzing audience audio input and presentation context. The system generates real-time answers and relevant content to help presenters respond more effectively on the spot.
- Fine-tuned a BigBird-based question-answering model and a DistilRoBERTa-based sentence transformer on the Hugging Face Adversarial QA dataset, achieving a 27% improvement in Exact Match score over the baseline model.
- Streamlined the pipeline from speech recognition and presentation context processing to answer generation and relevant content extraction. Integrated the outputs using GPT-3.5, and displayed the final response on the HoloLens screen via Unity.
- May 2022 - Dec 2022
Technical Sales Specialist Intern
IBM (Data & AI Team), Toronto, Canada
- Delivered interactive product demos and supported external client hackathons to showcase IBM’s AI solutions, drawing on a comprehensive understanding of MLOps, trustworthy AI, and cloud integration.
- Winner of the Intern Challenge - an internal client presentation competition amongst interns.
- Sep 2021 - Apr 2022
Business Intelligence Research Analyst Intern
Enverus, Calgary, Canada
- Analyzed energy operator portfolios using historical data to predict future performance, building regression models and conducting time series analysis in R to optimize strategic decision-making.
- Automated the daily model audit process using Python, reducing manual effort by 30% and improving operational efficiency.
- Employed Tableau to visualize large-scale data, transforming raw data into actionable insights and creating comprehensive reports that effectively communicated trends and strategic implications to stakeholders.
Projects
- Apr 2023
Twitter Sentiment Analysis Under COVID-19 (Python)
- Built a Naive Bayes classifier to predict the popularity of political tweets, incorporating retweets, favorites, and follower counts, achieving a 48% accuracy improvement over the baseline.
- Conducted EDA on 10,000+ tweets, engineering a custom "Popularity Score" to normalize engagement metrics and identify key trends, using Pandas and Matplotlib for data analysis and visualization.
- Nov 2022 - Dec 2022
Education Platform Student Answer Prediction (Python)
- Implemented an Item Response Theory Model to predict correct responses, modeling student ability against question difficulty to generate probability distributions.
- Optimized predictions by experimenting with Autoencoders, Matrix Factorization, Neural Networks, and Ensembles.
- Nov 2022 - Dec 2022
ASA DataFest (R) - "Best Insight" Winning Team of 2022
- Built multilevel linear mixed-effects models and correlation graphs to investigate how players’ experience with the educational game “Elm City Stories” relates to their improved efficacy in resisting drugs.
- Evaluated the educational effectiveness of each game chapter and identified characteristics of the players who benefited more from the game for future game development.
Honors and Awards
- 2023
- T. A. Reed Scholarship - Top cGPA in Innis College, University of Toronto
- The Undergraduate Student Research Awards, NSERC
- 2022
- First Place in the ASA DataFest, American Statistical Association and University of Toronto
- 2021
- Later Life Learning Scholarship - Top 3% in Innis College, University of Toronto
- University of Toronto In-Course Scholarship, University of Toronto