Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron
Published on arXiv, 2026
A lightweight, safety-aware decoding method that improves LLM safety using a single-neuron gate at low training cost.
Recommended citation: Shen, S., Lv, M., Shen, H., Wu, J., Wang, B., Yang, Z., Shen, G., Zhao, D., Zhao, F., & Zeng, Y. (2026). Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron.
