DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation. Eliya Habba, Ofir Arviv, et al. 2025. ACL 2025.
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI. Elron Bandel, Yotam Perlitz, et al. 2024. NAACL 2024.
Achieving Human Parity in Content-Grounded Datasets Generation. Asaf Yehudai, Boaz Carmeli, et al. 2024. ICLR 2024.
Zero-shot Topical Text Classification with LLMs - an Experimental Study. Avishai Gretz, Alon Halfon, et al. 2023. EMNLP 2023.