Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes. Gal Amram, Ora Nova Fandina, et al. 2025. ASE 2025.
Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. 2025. NAACL 2025.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, et al. 2023. EMNLP 2023.