Publications

98 results for Trustworthy Generation

Distributional Preference Alignment of LLMs via Optimal Transport
- - Igor Melnyk
  - Youssef Mroueh
  - et al.
- 2024
- NeurIPS 2024
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
- - Yufang Hou
  - Alessandra Pascale
  - et al.
- 2024
- NeurIPS 2024
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications
- - Bo Wen
  - Xin Zhang
- 2024
- NeurIPS 2024
Value Alignment from Unstructured Text
- - Inkit Padhi
  - Karthikeyan Natesan Ramamurthy
  - et al.
- 2024
- NeurIPS 2024
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
- - Megh Thakkar
  - Yash More
  - et al.
- 2024
- NeurIPS 2024
Value Alignment from Unstructured Text
- - Inkit Padhi
  - Karthikeyan Natesan Ramamurthy
  - et al.
- 2024
- EMNLP 2024
Aligners: Decoupling LLMs and Alignment
- - Mikhail Yurochkin
  - Lilian Ngweta
  - et al.
- 2024
- EMNLP 2024
From Ethics to Implementation: Shaping the Future of AI Governance
- - Vyoma Gajjar
- 2024
- DSS SF 2024
Prompt Exploration with Prompt Regression
- - Michael Feffer
  - Ronald Xu
  - et al.
- 2024
- COLM 2024
Why Don't Prompt-Based Fairness Metrics Correlate?
- - Abdelrahman Zayed
  - Gonçalo Mordido
  - et al.
- 2024
- ACL 2024