TheSequence

Moving Past RLHF: In 2025 We Will Transition from Preference Tuning to Reward Optimization in Foundation Models

Dec 29, 2024

© 2025 Jesus Rodriguez