==============
torchtune.rlhf
==============

.. currentmodule:: torchtune.rlhf

Components and losses for RLHF algorithms like PPO and DPO.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss
    loss.SimPOLoss
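
To give a feel for the kind of computation behind an advantage estimator in PPO, the following is a minimal sketch of Generalized Advantage Estimation (GAE) in plain Python. It is an illustration only: it does not call ``torchtune``, and the function name ``gae_advantages``, its parameters, and the assumption that the value after the final step is zero are all hypothetical choices made for this example, not the signature or behavior of ``estimate_advantages``.

.. code-block:: python

    def gae_advantages(rewards, values, gamma=1.0, lmbda=0.95):
        """Hypothetical helper: GAE for a single trajectory.

        rewards[t] and values[t] are the reward and value estimate at step t.
        The value after the last step is assumed to be 0 (episode ends there).
        Returns (advantages, returns) as lists of floats.
        """
        advantages = [0.0] * len(rewards)
        last_advantage = 0.0
        # Walk backwards so each step can reuse the advantage of the next step.
        for t in reversed(range(len(rewards))):
            next_value = values[t + 1] if t + 1 < len(values) else 0.0
            # One-step temporal-difference error.
            delta = rewards[t] + gamma * next_value - values[t]
            # Exponentially weighted sum of future TD errors.
            last_advantage = delta + gamma * lmbda * last_advantage
            advantages[t] = last_advantage
        # Returns are the regression targets for the value function.
        returns = [a + v for a, v in zip(advantages, values)]
        return advantages, returns


    # A sparse reward at the final step, as is typical when a reward model
    # scores a completed response.
    advantages, returns = gae_advantages(
        rewards=[0.0, 0.0, 1.0],
        values=[0.5, 0.5, 0.5],
        gamma=1.0,
        lmbda=1.0,
    )

With ``gamma=1.0`` and ``lmbda=1.0`` the returns reduce to the undiscounted Monte Carlo return at every step; lowering ``lmbda`` leans more heavily on the value estimates, trading variance for bias.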