Publications
Peer-reviewed journals, conference proceedings, and preprints
Medical X-Ray Image Enhancement Using G-CLAHE
Proposed G-CLAHE, which combines global and local histogram equalization for medical X-rays, achieving a 17% improvement in diagnostic accuracy for chest X-ray abnormality detection and outperforming state-of-the-art methods across multiple quality metrics.
AEGIS: A Correlation-Based Data Masking Advisor for Data Sharing Ecosystems
Model Reusability in Reinforcement Learning
Presented the first principled study of RL model reusability, developing a graph-based framework for Temporal Difference Learning and Deep-RL algorithms. Demonstrated results of the same quality as policies trained from scratch, with significant efficiency gains.
Utility-Aware Human–LLM Agent Orchestration for Data Science Pipelines
Novel framework for agentic collaboration between humans and LLMs across the data science pipeline, designed to maximize the utility of downstream ML tasks.
LLM-Powered Best Set Recommendation
Probabilistic Package Selection on Multi-Modal Data
Introduces algorithms that minimize expensive LLM oracle calls while guaranteeing optimal or near-optimal solutions for probabilistic package selection over multi-modal data.
Exploring Humans and LLMs in the Data Science Pipeline
Lower Bound Distance Queries Without a Distance Oracle
Suite of algorithmic techniques with provable guarantees for answering lower bound distance queries over metric space graphs with unknown edge distances.
Top-k Set Queries with User-Specified Scoring Functions on Multi-Modal Data
Studies the applicability of LLMs as external oracles for answering top-k queries over predicted scores, using a probabilistic computational framework that achieves an order-of-magnitude improvement over baselines.
