preference-optimization

Here are 26 public repositories matching this topic...

general-preference / general-preference-model

[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)

alignment large-language-models rlhf preference-modeling preference-optimization

Updated Jun 15, 2026
Python
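
As background for the title above: a Bradley-Terry reward model gives each response a scalar reward r. It models the probability that response a is preferred over response b as sigmoid(r_a - r_b), and it is trained by minimizing -log sigmoid(r_chosen - r_rejected). Because every response is reduced to a single scalar score, the preferences such a model can express are always transitive. The sketch below is a minimal, illustrative version of that standard baseline. It is not code from the general-preference-model repository, and the function names are hypothetical.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response a is preferred over response b."""
    d = reward_a - reward_b
    # Branch on the sign of d so math.exp never overflows.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def bt_nll(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log sigmoid(r_chosen - r_rejected), computed as softplus(-margin)."""
    return _softplus(-(reward_chosen - reward_rejected))


# Hypothetical rewards for one prompt with a labeled chosen/rejected pair.
print(bt_preference_prob(2.0, 0.5))
print(bt_nll(2.0, 0.5))
```

The loss depends only on the reward difference, so adding a constant to every reward leaves both the probabilities and the loss unchanged.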

iBacklight / PipelineLLM

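PipelineLLM is a systematic learning project for large language model (LLM) post-training. It covers the complete technical stack, from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.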

reinforcement-learning lora fine-tuning post-training continual-learning sft rlhf llm-reasoning preference-optimization llm-infrastructure llm-processing

Updated Jan 16, 2026
Python

sahsaeedi / TPO

[TMLR] Triple Preference Optimization

alignment large-language-models rlhf preference-optimization

Updated Feb 19, 2025
Python

s-vco / s-vco

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images

alignment-algorithms vision-language-models preference-optimization

Updated Jun 4, 2025
Python

warlockee / oxRL

A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.

reinforcement-learning alignment post-training dpo deepspeed rlhf vllm llm-training preference-optimization grpo

Updated May 6, 2026
Python

RUCKBReasoning / DPO_Text2SQL

[ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

text-to-sql nl2sql dpo text2sql preference-optimization

Updated Oct 9, 2025
Python

Sreyan88 / Synthio

Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

audio audio-classification synthetic-data audio-generation large-language-models preference-optimization

Updated Mar 31, 2025
Python

JIA-Lab-research / TGDPO

[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

alignment preference-learning large-language-models llm rlhf preference-alignment direct-preference-optimization preference-optimization

Updated Jul 15, 2025
Python

sahsaeedi / DCPO-T2I

[TMLR] Dual Caption Preference Optimization

alignment diffusion-models rlhf preference-optimization

Updated Feb 12, 2025
Python

DtYXs / Pre-DPO

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

alignment large-language-models preference-optimization

Updated Apr 23, 2025
Python

JingbiaoMei / ExPO-HM

🔬 Official implementation of ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection (ICLR 2026). Novel multimodal RL approach for interpretable and explainable content moderation.

multimodal-learning explainable-ai content-moderation vision-language-models preference-optimization grpo iclr-2026 hateful-meme-detection multimodal-rl

Updated Mar 1, 2026
Python

pilancilab / COALA

Convex Optimization for Alignment and Preference Learning on a Single GPU

convex-optimization convex preference-learning llms preference-optimization

Updated May 28, 2026
Python

yuvhaim-gif / LLM_InSight

This is my home-rig testing process for creating evaluation metrics, testing models, automating prompt creation based on the evaluation results of the last run, and reviewing logs. It is local-first, independent of any specific tool, and logs locally.

frontend-web ab-testing evaluation-metrics human-in-the-loop evaluation-framework grading-system local-first synthetic-data-generation dataset-curation ollama llm-evaluation prompt-optimization preference-optimization rubric-based-evaluation

Updated Jun 15, 2026
Python

shaheennabi / open-posttraining-system

Open-source research engineering project for building the end-to-end post-training stack for reasoning language models, including SFT, preference learning, RLHF/RLVR, evaluation, inference-time scaling, and scalable systems for frontier-level reasoning.

open-source evaluation inference text-generation benchmarks post-training rlhf reward-modeling preference-optimization supervised-fine-tuning inference-time-scaling open-post-training-system

Updated Jun 21, 2026
Jupyter Notebook

Yellow4Submarine7 / LLMDoctor

🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment (AAAI 2026)

transformers pytorch alignment lora llm qwen preference-optimization aaai2026 test-time-alignment

Updated Jan 17, 2026
Python

martimfasantos / CustomPOs-for-SLMs

Novel Preference Optimization Algorithms for state-of-the-art small LMs, enhancing performance in GenAI and NLP tasks

nlp evaluation preference-learning human-preferences llms gen-ai preference-optimization

Updated Jan 5, 2025
Python

jmagly / aiwg-training

AIWG training-complete framework — corpus-to-dataset pipeline with SKILL.md agentic surface and optional Python runtime backend. Marketplace plugin for AIWG.

provenance synthetic-data training-data fine-tuning dpo model-cards decontamination dataset-curation sharegpt llm-training preference-optimization alpaca-format aiwg benchmark-contamination datasheets-for-datasets

Updated Apr 16, 2026
Python

YuanaHao / Awesome-Diffusion-RL

A weekly updated awesome list of RL, RLHF, DPO, GRPO, reward models, and preference optimization for image and video diffusion generation.

reinforcement-learning image-generation awesome-list video-generation diffusion-models rlhf preference-optimization grpo

Updated May 19, 2026

runhaoli-creator / dmapo

Direct multi-agent policy optimization — unified DPO/KTO/ORPO/SimPO framework.

multi-agent post-training dpo trl llm rlhf preference-optimization

Updated Apr 14, 2026
Python
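
dmapo lists DPO, KTO, ORPO, and SimPO as supported objectives. For orientation, here is a minimal sketch of the per-pair DPO and SimPO losses as defined in their original papers. It is not dmapo's implementation. The inputs are assumed to be summed token log-probabilities of each full response, and the default values of beta and gamma are illustrative assumptions rather than recommended settings.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)); softplus(-m) == -log sigmoid(m)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_log_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_log_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return _softplus(-margin)


def simpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
               len_chosen: int, len_rejected: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO: reference-free, length-normalized rewards with a target margin gamma."""
    reward_chosen = beta * policy_logp_chosen / len_chosen
    reward_rejected = beta * policy_logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    return _softplus(-margin)


# Hypothetical log-probabilities for one chosen/rejected pair.
print(dpo_loss(-12.0, -30.0, -14.0, -28.0))
print(simpo_loss(-12.0, -30.0, len_chosen=8, len_rejected=15))
```

The design difference shows up in the signatures: DPO needs log-probabilities from a frozen reference model, while SimPO drops the reference model and instead length-normalizes the policy log-probabilities and subtracts a target margin gamma.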

miharcan / lora-preference-optimization-comparison

Preference optimization framework for text classification (DPO/ORPO/KTO), with SFT, encoder, and XGBoost baselines plus unified run pipeline and reproducible outputs.

nlp machine-learning text-classification xgboost lora peft dpo kto tranformers qlora llm-fine-tuning orpo preference-optimization

Updated Apr 4, 2026
Python