==============
torchtune.rlhf
==============

.. currentmodule:: torchtune.rlhf

Components and loss functions for reinforcement learning from human feedback (RLHF) algorithms such as PPO and DPO.

.. autosummary::
   :toctree: generated/
   :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss
    loss.SimPOLoss
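
For orientation, the objective behind a DPO-style loss can be sketched in a few lines of plain Python. This is a minimal illustration of the standard per-example DPO formula, not the implementation behind ``loss.DPOLoss``. The function name ``dpo_loss_sketch``, its parameters, and the default ``beta`` value are hypothetical and chosen only for this example.

.. code-block:: python

    import math


    def dpo_loss_sketch(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.1):
        """Per-example DPO loss (illustrative only, hypothetical signature).

        Each argument is the summed log-probability of a full response under
        either the policy being trained or the frozen reference model.
        """
        # How much more the policy prefers the chosen response than the rejected one.
        policy_margin = policy_chosen_logp - policy_rejected_logp
        # The same preference margin under the reference model.
        ref_margin = ref_chosen_logp - ref_rejected_logp
        logits = beta * (policy_margin - ref_margin)
        # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
        return math.log1p(math.exp(-logits))


    # When the policy matches the reference, the loss equals log(2).
    baseline = dpo_loss_sketch(-1.0, -2.0, -1.0, -2.0)
    # When the policy favors the chosen response more than the reference does,
    # the loss drops below that baseline.
    improved = dpo_loss_sketch(-1.0, -3.0, -1.0, -2.0)

The sketch works on single floats to stay dependency-free. A tensor-based version would apply the same arithmetic elementwise over a batch.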