[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)
Updated Jun 15, 2026 · Python
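PipelineLLM is a systematic learning project for large language model (LLM) post-training. It covers the full technical stack, from supervised fine-tuning (SFT) to preference optimization (DPO), reinforcement learning (RLHF/PPO/GRPO), and continual learning.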
[TMLR] Triple Preference Optimization
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.
[ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
[TMLR] Dual Caption Preference Optimization
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
🔬 Official implementation of ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection (ICLR 2026). Novel multimodal RL approach for interpretable and explainable content moderation.
Convex Optimization for Alignment and Preference Learning on a Single GPU
This is my home-rig testing process for creating evaluation metrics, testing models, automating prompt creation based on the evaluation results of the last run, and reviewing logs. It is local-first, independent of any specific tool, and logs locally.
Open-source research engineering project for building the end-to-end post-training stack for reasoning language models, including SFT, preference learning, RLHF/RLVR, evaluation, inference-time scaling, and scalable systems for frontier-level reasoning.
🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment (AAAI 2026)
Novel Preference Optimization Algorithms for state-of-the-art small LMs, enhancing performance in GenAI and NLP tasks
AIWG training-complete framework — corpus-to-dataset pipeline with SKILL.md agentic surface and optional Python runtime backend. Marketplace plugin for AIWG.
A weekly updated awesome list of RL, RLHF, DPO, GRPO, reward models, and preference optimization for image and video diffusion generation.
Direct multi-agent policy optimization — unified DPO/KTO/ORPO/SimPO framework.
Preference optimization framework for text classification (DPO/ORPO/KTO), with SFT, encoder, and XGBoost baselines plus unified run pipeline and reproducible outputs.
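Many of the projects above build on Direct Preference Optimization (DPO), whose loss is a Bradley-Terry negative log-likelihood over implicit rewards (the modeling assumption the first listed paper aims to go beyond). The following is a minimal sketch of the standard DPO objective for a single preference pair, assuming you already have summed response log-probabilities under the trained policy and a frozen reference model. The function and parameter names (`dpo_loss`, `beta`, and so on) are hypothetical and are not taken from any repository listed here.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi_policy / pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Bradley-Terry NLL: -log sigmoid(margin) == log(1 + exp(-margin)).
    return _softplus(-margin)


# Policy shifted probability mass toward the chosen response relative to the reference.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1))
```

When the policy matches the reference, the margin is zero and the loss equals log 2. It falls toward zero as the policy raises the chosen response's likelihood relative to the rejected one. The stable softplus avoids overflow for large margins.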