Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


PANDAGUARD: Systematic Evaluation of LLM Safety against Jailbreaking Attacks

Published in arXiv, 2025

We introduce PandaGuard and PandaBench, a unified, reproducible framework and benchmark for systematically evaluating LLM jailbreak attacks, defenses, and judges. The evaluation reveals that no single defense is universally optimal and that disagreement among judges significantly affects safety assessments.

Recommended citation: Shen, G., Zhao, D., Feng, L., He, X., Wang, J., Shen, S., ... & Zeng, Y. (2025). PANDAGUARD: Systematic Evaluation of LLM Safety against Jailbreaking Attacks. arXiv preprint.
Download Paper