Research Overview
My current research spans four directions:
- Synthetic Data & Digital Twins — generating privacy-preserving synthetic data (tabular, longitudinal, and multimodal) and constructing digital twins for simulation and analysis
- Trustworthy & Responsible AI — auditing and advancing AI systems across the pillars of trustworthiness: fairness, privacy, robustness, and transparency
- Privacy & Context — grounding privacy and fairness guarantees in social and situational context; bridging formal methods with human norms, with growing emphasis on agentic AI systems where contextual boundaries are dynamic
- AI for Good — applying AI to high-impact domains including healthcare (clinical NLP, synthetic data, digital twins) and education (LLM-assisted learning and personalization)
Projects
Generating differentially private synthetic genomic data for open biomedical research on national AI research infrastructure, leveraging Privacy-Enhancing Technologies (PETs) such as Multi-Party Computation (MPC) to enable cross-institutional data sharing without exposing individual genetic records.
Developing privacy-preserving synthetic genomic data generation methods for rare diseases, with applications to Neurofibromatosis Type 1 (NF1), to address data scarcity challenges in rare disease research.
Developing methods for constructing longitudinal digital twins in both centralized and federated settings. The federated setting will leverage Privacy-Enhancing Technologies (PETs) such as Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE) to enable privacy-preserving collaboration across institutions.
Designing LLM-based conversational agents for two high-impact applications: (1) assisting tuberculosis (TB) patients with treatment adherence, monitoring, and support; and (2) AI tutors for healthcare professionals, with integrated responsible AI modules covering privacy, bias, and harm mitigation.
Studying the effects of generating multiple independent synthetic datasets from the same source, examining how utility, privacy, and bias vary across generation runs and synthetic dataset sizes, with implications for the reliability and trustworthiness of synthetic data.
The full publication list, filterable by topic and including workshop papers, preprints, and patents, is available on the Publications page.