Fine-tuning on multiple datasets? Static mixing with pre-determined percentages often leads to overfitting and demands extensive ablations to find the right mix. Dynamic data mixing addresses this using signals/rewards such as training loss. While this has been studied in research (aclanthology.org/2024.emnlp-main.787), full-fledged tooling is limited. In this session, we present a PyTorch-native (built on DataLoader and IterableDataset), online, reward-based data mixing framework that is: (a) composable with existing training loops with minimal code changes, (b) plug-and-play with user-defined mixing strategies and rewards, and (c) compatible with distributed training. We demonstrate its flexibility through five reward-driven data mixing recipes and its scalability via a large-scale multi-GPU experiment, with insights on mixing. We believe our session will motivate PyTorch developers to adopt our framework for use cases involving multiple fine-tuning datasets. The code is available at github.com/foundation-model-stack/fms-hf-tuning/tree/online-dyn-reward-data-mixing.
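To make the idea concrete, here is a minimal sketch of how online, reward-based mixing can be expressed with PyTorch's IterableDataset and DataLoader. The class and method names (RewardMixedDataset, update_rewards) and the softmax-over-rewards strategy are illustrative assumptions for this sketch, not the framework's actual API.

```python
# Sketch: interleave several source datasets, sampling each according to
# mixing weights that are updated online from per-dataset rewards.
# Names and the reward-to-weight rule are assumptions for illustration.
from typing import Iterator, Sequence

import torch
from torch.utils.data import DataLoader, IterableDataset


class RewardMixedDataset(IterableDataset):
    """Samples from multiple sources with dynamically updated mixing weights."""

    def __init__(self, sources: Sequence[IterableDataset], temperature: float = 1.0):
        self.sources = list(sources)
        self.temperature = temperature
        # Start from a uniform mix; weights change as rewards arrive.
        self.weights = torch.ones(len(self.sources)) / len(self.sources)

    def update_rewards(self, rewards: Sequence[float]) -> None:
        # Example strategy (assumed): softmax over per-source rewards,
        # e.g. recent training loss, so higher-reward sources are sampled more.
        r = torch.tensor(rewards, dtype=torch.float32)
        self.weights = torch.softmax(r / self.temperature, dim=0)

    def __iter__(self) -> Iterator:
        iterators = [iter(s) for s in self.sources]
        while True:  # infinite stream; the training loop decides when to stop
            idx = int(torch.multinomial(self.weights, 1).item())
            try:
                yield next(iterators[idx])
            except StopIteration:
                # Restart an exhausted source so mixing can continue.
                iterators[idx] = iter(self.sources[idx])
                yield next(iterators[idx])


# Usage sketch: wrap the mixed dataset in a standard DataLoader and
# periodically push per-dataset reward signals from the training loop.
# mixed = RewardMixedDataset([ds_a, ds_b, ds_c])
# loader = DataLoader(mixed, batch_size=8)
# ... inside the loop: mixed.update_rewards([loss_a, loss_b, loss_c])
```

Because the wrapper is itself an IterableDataset, it drops into an existing DataLoader-based loop with minimal code changes; the reward-to-weight rule above is just one of many user-defined strategies such a framework could accept.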