Leveraging Offline Public Data in Online Differentially Private Policy Fine-Tuning (Prof. Sayak Chowdhury, Computer Science & Engineering)
Modern machine learning models often train on offline data and then learn from online user interactions, raising privacy concerns, especially during fine-tuning stages that involve sensitive data. Differential Privacy (DP) mitigates these risks by adding noise during training, though this can hurt accuracy. Leveraging offline public data can reduce this privacy-utility trade-off. This project aims to design DP-compliant bandit and reinforcement learning algorithms that exploit such data, with theoretical performance guarantees, and to compare them against purely offline and purely online baselines. It also seeks to develop DP policy fine-tuning methods for aligning large language models, ultimately enabling privacy-preserving, trustworthy AI systems such as secure chatbots.
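To make the idea concrete, the following is a minimal illustrative sketch (not the project's actual algorithm) of a differentially private multi-armed bandit that is warm-started from public offline estimates. Laplace noise calibrated to the reward sensitivity is added to the released per-arm statistics, and the public offline means initialize the estimates so less exploration is needed on sensitive online data. The function name `private_ucb`, the arm means, and the simple per-round noise addition are all assumptions for illustration; a practical DP bandit would use a tighter accounting scheme such as the tree-based aggregation mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_ucb(offline_means, true_means, T=2000, eps=1.0):
    """Illustrative eps-DP-style UCB bandit warm-started from public data.

    offline_means: public (non-sensitive) per-arm reward estimates.
    true_means:    ground-truth Bernoulli reward probabilities (simulation only).
    Returns the number of times each arm was pulled.
    """
    K = len(true_means)
    # Warm start: treat each arm as if pulled once with its public offline mean.
    sums = np.array(offline_means, dtype=float)
    counts = np.ones(K)
    for t in range(1, T + 1):
        # Release noisy means: Laplace noise with scale 1/eps protects the
        # per-arm reward sums (rewards lie in [0, 1], so sensitivity is 1).
        noisy_means = (sums + rng.laplace(0.0, 1.0 / eps, K)) / counts
        # Standard UCB exploration bonus on top of the noisy estimates.
        ucb = noisy_means + np.sqrt(2.0 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
        reward = rng.binomial(1, true_means[arm])  # sensitive online feedback
        sums[arm] += reward
        counts[arm] += 1
    return counts

# Hypothetical example: public offline estimates roughly track the true means.
pulls = private_ucb(offline_means=[0.25, 0.45, 0.75],
                    true_means=[0.3, 0.5, 0.7])
```

Despite the injected noise, the per-arm noise on the estimated means shrinks as counts grow, so the best arm is eventually pulled most often; better offline public estimates shorten the noisy exploration phase, which is the trade-off this project studies formally.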