.. currentmodule:: torchtune.rlhf

Components and losses for RLHF algorithms like PPO and DPO.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss
    loss.SimPOLoss