Research — Sikha Pentyala

Research Overview

I work on Responsible and Trustworthy AI — developing methods and frameworks to ensure AI systems are private, fair, robust, and transparent. My work spans the design, auditing, and application of trustworthy AI systems, with a focus on high-impact domains.

My current research follows four directions:

Synthetic Data & Digital Twins — generating privacy-preserving (tabular) synthetic datasets, longitudinal synthetic datasets, multimodal synthetic datasets, and constructing digital twins for simulation and analysis
Trustworthy & Responsible AI — auditing and advancing AI systems across the pillars of trustworthiness: fairness, privacy, robustness, and transparency
Privacy & Context — grounding privacy and fairness guarantees in social and situational context; bridging formal methods with human norms, with growing emphasis on agentic AI systems where contextual boundaries are dynamic
AI for Good — applying AI to high-impact domains including healthcare (clinical NLP, synthetic data, digital twins) and education (LLM-assisted learning and personalization)

Projects

Ongoing Projects

Ongoing Privacy-Preserving Synthetic Genomics Data Generation on the NAIRR

Generating differentially private synthetic genomic data for open biomedical research on national AI research infrastructure, leveraging Privacy-Enhancing Technologies (PETs) such as Multi-Party Computation (MPC) to enable cross-institutional data sharing without exposing individual genetic records.

Synthetic Data PETs Genomics NAIRR
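
To illustrate the kind of primitive that MPC builds on, here is a minimal sketch of additive secret sharing, in which several institutions compute a joint sum without revealing their individual inputs. It is a toy illustration under stated assumptions, not the protocol used in this project: the modulus `PRIME`, the function names, and the example counts are all hypothetical.

```python
import secrets

# Field modulus, chosen only for illustration (assumption, not from the project).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Each party shares its value; party j only ever sees the j-th shares."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received, one from every institution.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return reconstruct(partial_sums)


# Hypothetical per-institution counts of some variant.
institution_counts = [12, 7, 30]
total = secure_sum(institution_counts)
print(total)
```

Any single share is a uniformly random value and on its own reveals nothing about the input. Real deployments add authenticated channels, protection against malicious parties, and, for this line of work, differential privacy on the released output.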

Ongoing Synthetic Genomic Data Generation for Rare Diseases

Developing privacy-preserving synthetic genomic data generation methods for rare diseases, with applications to Neurofibromatosis Type 1 (NF1), to address data scarcity challenges in rare disease research.

Synthetic Data Genomics Healthcare Generative Models

Ongoing Federated Digital Twins

Developing methods for constructing longitudinal digital twins in both centralized and federated settings. The federated setting will leverage Privacy-Enhancing Technologies (PETs) such as Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE) to enable privacy-preserving collaboration across institutions.

Digital Twins PETs Generative Models Healthcare

Ongoing AI-based Conversation Systems

Designing LLM-based conversational agents for two high-impact applications: (1) assisting TB patients with adherence, monitoring, and support; and (2) developing AI tutors for healthcare professionals, with integrated responsible AI modules covering privacy, bias, and harm mitigation.

LLMs and Agents Healthcare AI Education

Ongoing Synthetic Data Multiplicity

Studying the effects of generating multiple independent synthetic datasets from the same source, examining how utility, privacy, and bias vary across different runs and generation sizes — with implications for the reliability and trustworthiness of generated synthetic data.

Synthetic Data Robustness Privacy Multiplicity
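
The experimental setup described above can be sketched in a few lines: draw several independent synthetic datasets from the same source, at several generation sizes, and record how a utility metric varies across runs. This is only a toy under stated assumptions. The Gaussian `toy_generator`, the mean-based `utility_error`, and the chosen sizes are hypothetical stand-ins, and the sketch measures utility only, not privacy or bias.

```python
import random
import statistics


def toy_generator(source, n, seed):
    """Hypothetical stand-in generator: fit a Gaussian to the source, sample n values."""
    rng = random.Random(seed)
    mu = statistics.fmean(source)
    sigma = statistics.stdev(source)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def utility_error(source, synthetic):
    """Toy utility metric: absolute error of the synthetic mean against the source mean."""
    return abs(statistics.fmean(synthetic) - statistics.fmean(source))


def multiplicity_study(source, sizes, n_runs):
    """For each generation size, summarize utility error across independent runs."""
    results = {}
    for n in sizes:
        errors = [
            utility_error(source, toy_generator(source, n, seed))
            for seed in range(n_runs)
        ]
        results[n] = {
            "mean_error": statistics.fmean(errors),
            "spread": statistics.pstdev(errors),
            "worst": max(errors),
        }
    return results


# Fixed toy source so the study is reproducible.
source_rng = random.Random(0)
source = [source_rng.gauss(50.0, 10.0) for _ in range(500)]

results = multiplicity_study(source, sizes=[50, 500, 5000], n_runs=20)
for size, summary in results.items():
    print(size, summary)
```

A large gap between `mean_error` and `worst` at a given size is the kind of run-to-run variation the project studies: a single synthetic dataset may look much better or worse than what the generator produces on average.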

The full publication list, with topic filters, thumbnails, and workshop/preprint/patent entries, is available on the Publications page.