Distributional Preference Alignment of LLMs via Optimal Transport. Igor Melnyk, Youssef Mroueh, et al. 2024. NeurIPS 2024.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. 2024. NeurIPS 2024.
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications. Bo Wen, Xin Zhang. 2024. NeurIPS 2024.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. NeurIPS 2024.
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs. Megh Thakkar, Yash More, et al. 2024. NeurIPS 2024.