BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks. Anna Sokol, Elizabeth Daly, et al. 2025. NeurIPS 2025.
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning. Kaiwen Zha, Zhengqi Gao, et al. 2025. NeurIPS 2025.
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation. Cécile Rousseau, Tobia Boschi, et al. 2025. NeurIPS 2025.
MermaidSeqBench: An Evaluation Benchmark for LLM-to-Mermaid Sequence Diagram Generation. Basel Shbita, Farhan Ahmed, et al. 2025. NeurIPS 2025.
Foundation Models Enabling Multi-Scale Battery Materials Discovery: From Molecules To Devices. Vidushi Sharma, Andy Tek, et al. 2025. NeurIPS 2025.
Verifiable Chemical Reasoning through Tool-Calling Agentic Workflow. Gabrielle Gaudeau, Shinnosuke Tanaka, et al. 2025. NeurIPS 2025.
Toward a Coherent Virtual Cell Model: Probing Biological World-Model Coherence in Transcriptomic Foundation Models. Noa Moriel, Yishai Shimoni, et al. 2025. NeurIPS 2025.
Leveraging Large Language Models for Suitability Assessment of Electrolytes for Rechargeable Batteries. Sina Asuka Klampt, Seiji Takeda, et al. 2025. MRS Fall Meeting 2025.