Hi there! I am Xiangyu Qi (漆翔宇), a third-year Ph.D. candidate in the Department of Electrical and Computer Engineering at Princeton University, where I am advised by Prof. Prateek Mittal. Before that, I received my B.S. degree from the CS Department at Zhejiang University (June 2021). I also worked with Prof. Bo Li as a research intern at the Secure Learning Lab.
My current research focuses on Machine Learning Safety and Security, with two main objectives: (1) to decipher the fundamental vulnerabilities prevalent in ML systems, and (2) to devise strategies that counter these vulnerabilities, thereby contributing to the development of robust and trustworthy ML systems. In pursuit of these two objectives, my research has covered multiple threads of Adversarial Machine Learning (Adv ML), including adversarial examples [2,6] as well as data poisoning and backdoor attacks [3,4,5]. As the field of ML evolves with the introduction of large-scale foundation models and a concerted push toward AGI, my recent work [1,2] has also expanded to the tangible safety and security challenges of AI alignment, with the ultimate objective of spurring robust and practical solutions for effective alignment infrastructure.
If you share similar interests, please feel free to reach out via xiangyuqi@princeton.edu. I am happy to chat and open to exploring opportunities for collaboration.
Selected Research
[1] (Preprint) Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Xiangyu Qi*, Yi Zeng*, Tinghao Xie*, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
Media Coverage
- The New York Times: Researchers Say Guardrails Built Around A.I. Systems Are Not So Sturdy
- The Register: AI safety guardrails easily thwarted, security study finds
- VentureBeat: Uh-oh! Fine-tuning LLMs compromises their safety, study finds
Highlight
- While existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users.
- We show that: (1) the safety guardrails of GPT-3.5 can be largely removed by fine-tuning with only 10 adversarially designed training examples, a cost of less than $0.20; (2) fine-tuning aligned models with even completely benign datasets might also accidentally compromise safety.
- Our work underscores a current trade-off between LLM customization (for downstream applications) and the safety risks that correspondingly arise.
[2] (Preprint & AdvML-Frontiers 2023 @ ICML | oral presentation) Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi*, Kaixuan Huang*, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
The GPT-4V(ision) system card cited this paper to underscore the emerging threat vector of multimodal jailbreaking.
[Code]
Highlight
- Multimodality unavoidably expands the attack surface, making systems more vulnerable to adversarial attacks.
- Visual adversarial examples (which remain unsolved after a decade of research) can pose a fundamental adversarial challenge to AI alignment.
[3] (USENIX Security 2023) Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal
[Code]
Highlight: We formulate a proactive mindset for detecting backdoor poison samples in poisoned datasets, along with a concrete proactive method (Confusion Training) that effectively defeats a diverse set of 14 types of backdoor poisoning attacks.
[4] (ICLR 2023) Revisiting the Assumption of Latent Separability for Backdoor Defenses
Xiangyu Qi*, Tinghao Xie*, Yiming Li, Saeed Mahloujifar, Prateek Mittal
[Code]
Highlight: Latent separability between clean samples and backdoor poison samples is pervasive and is even used as a default assumption in designing defenses. However, we show that this assumption does not necessarily hold: we design adaptive backdoor poisoning attacks that suppress the latent separation.
[5] (CVPR 2022 | oral presentation, 4.2%) Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
Xiangyu Qi*, Tinghao Xie*, Ruizhe Pan, Jifeng Zhu, Yong Yang, Kai Bu
[Code]
Highlight: Given any neural network instance of a certain architecture (regardless of its specific weight values), we can embed a backdoor into that model instance by replacing a very narrow subnet of it with a malicious backdoor subnet.
[6] (ICML 2021) Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
Nezihe Merve Gürel*, Xiangyu Qi*, Luka Rimanic, Ce Zhang, Bo Li
Highlight: Embedding domain knowledge and logical reasoning into the ML pipeline shows promising potential for improving model robustness.