resume

General Information

Full Name Jessica Yan Wang
Languages English, Chinese (Mandarin)
Email jessicawang@g.harvard.edu

Industry/Academic Interests

  • Data Science, Natural Language Processing (NLP), AI in Healthcare, Statistical Analysis

Education

  • 2024-2025 (Expected)
    Master's in Data Science
    Harvard John A. Paulson School of Engineering and Applied Sciences
    • GPA 4.0/4.0
  • 2019-2024
    Honors Bachelor of Science (HBSc)
    University of Toronto, Toronto, Canada
    • Data Science Major & Mathematics Minor
      • GPA 3.99/4.0
      • Dean's List Scholar (2020-2023)

Work and Research Experience

  • May 2023 - Present
    Research Intern - Supervised by Prof. Rahul Krishnan
    Vector Institute, Toronto, Canada
    • Developed binary classification models using XGBoost to predict 14-day in-hospital mortality from the GEMINI dataset with 2.2 billion+ clinical data points, assessing model performance across patient groups defined by Social Determinants of Health.
    • Analyzed how distribution shifts affected model performance, observing up to a 3.5% AUC improvement when deploying models trained on more socially diverse subgroups to less socially diverse subgroups.
    • Enhanced model interpretability through SHAP analysis, identifying key features influencing predictions across diverse patient subgroups and demonstrating the need for diverse training data to improve AI model generalizability.
  • Jan 2022 - Present
    Research Intern - Supervised by Prof. Fanny Chevalier
    University of Toronto Dynamic Graphics Project Lab, Toronto, Canada
    • Developed and deployed an augmented reality application on Microsoft HoloLens to assist presenters during Q&A sessions by analyzing audience audio input and presentation context. The system generates real-time answers and relevant content to help presenters respond more effectively on the spot.
    • Fine-tuned a BigBird-based question-answering model and a DistilRoBERTa-based sentence transformer on the Hugging Face Adversarial QA dataset, achieving a 27% improvement in Exact Match score over the baseline model.
    • Streamlined the pipeline from speech recognition and presentation context processing to answer generation and relevant content extraction. Integrated the outputs using GPT-3.5 and displayed the final response on the HoloLens screen via Unity.
  • May 2022 - Dec 2022
    Technical Sales Specialist Intern
    IBM (Data & AI Team), Toronto, Canada
    • Performed interactive product demos and provided support for external client hackathons to showcase IBM’s AI solutions, leveraging a comprehensive understanding of MLOps, trustworthy AI, and cloud integration.
    • Won the Intern Challenge, an internal client presentation competition among interns.
  • Sep 2021 - Apr 2022
    Business Intelligence Research Analyst Intern
    Enverus, Calgary, Canada
    • Analyzed energy operator portfolios using historical data to predict future performance, building regression models and conducting time series analysis in R to optimize strategic decision-making.
    • Automated the daily model audit process using Python, reducing manual effort by 30% and improving operational efficiency.
    • Employed Tableau to visualize large-scale data, transforming raw data into actionable insights and creating comprehensive reports that effectively communicated trends and strategic implications to stakeholders.

Projects

  • Apr 2023
    Twitter Sentiment Analysis Under COVID-19 (Python)
    • Built a Naive Bayes classifier to predict the popularity of political tweets, incorporating retweets, favorites, and follower counts, achieving a 48% accuracy improvement over the baseline.
    • Conducted EDA on 10,000+ tweets, engineering a custom "Popularity Score" to normalize engagement metrics and identify key trends, using Pandas and Matplotlib for data analysis and visualization.
  • Nov 2022 - Dec 2022
    Education Platform Student Answer Prediction (Python)
    • Implemented an Item Response Theory Model to predict correct responses, modeling student ability against question difficulty to generate probability distributions.
    • Optimized predictions by experimenting with Autoencoders, Matrix Factorization, Neural Networks, and Ensembles.
  • Nov 2022 - Dec 2022
    ASA DataFest (R) - "Best Insight" Winning Team of 2022
    • Built multilevel linear mixed-effects models and correlation graphs to investigate how a player’s experience with the educational game “Elm City Stories” relates to their improved efficacy in resisting drugs.
    • Evaluated the educational effectiveness of each game chapter and identified characteristics of the players who benefited more from the game for future game development.

Honors and Awards

  • 2023
    • T. A. Reed Scholarship - Top cGPA in Innis College, University of Toronto
    • The Undergraduate Student Research Awards, NSERC
  • 2022
    • First Place in the ASA DataFest, American Statistical Association and University of Toronto
  • 2021
    • Later Life Learning Scholarship - Top 3% in Innis College, University of Toronto
    • University of Toronto In-Course Scholarship, University of Toronto