Commit

Update MODEL_CARD.md (#5)
Summary:
Updated the paper link and fixed the image link.

Pull Request resolved: #5

Reviewed By: jspisak

Differential Revision: D51983508

Pulled By: spencerwmeta

fbshipit-source-id: b08c82335b9e49c0721b2d32d48fc326f0b0fdc1
jspisak authored and facebook-github-bot committed Dec 8, 2023
1 parent 2ab0f59 commit 9fe9abf
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions Llama-Guard/MODEL_CARD.md
@@ -8,7 +8,9 @@ It acts as an LLM: it generates text in its output that indicates whether a
 given prompt or response is safe/unsafe, and if unsafe based on a policy, it
 also lists the violating subcategories. Here is an example:
 
-![](Llama Guard_example.png)
+<p align="center">
+<img src="https://github.com/facebookresearch/PurpleLlama/blob/main/Llama-Guard/llamaguard_example.png" width="800"/>
+</p>
 
 In order to produce classifier scores, we look at the probability for the first
 token, and turn that into an “unsafe” class probability. Model users can then
@@ -96,7 +98,7 @@ include [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat) and
 
 Note: comparisons are not exactly apples-to-apples due to mismatches in each
 taxonomy. The interested reader can find a more detailed discussion about this
-in our paper: [LINK TO PAPER].
+in our [paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/).
 
 | | Our Test Set (Prompt) | OpenAI Mod | ToxicChat | Our Test Set (Response) |
 | --------------- | --------------------- | ---------- | --------- | ----------------------- |
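
Context shown in the first hunk describes how classifier scores are produced: the probability of the first output token is read off and treated as an "unsafe" class probability. Purely as an illustration (not part of this commit or the model card), a minimal sketch of that idea using Hugging Face `transformers` might look like the following; the checkpoint id, the prompt formatting, and the assumption that the label "unsafe" starts with a single vocabulary token are unverified assumptions.

```python
# Illustrative sketch only -- not from the commit. The checkpoint id and the
# single-token treatment of the "unsafe" label are assumptions for clarity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def unsafe_probability(formatted_prompt: str) -> float:
    """Score a prompt by the probability that the first generated token is 'unsafe'.

    Reads the next-token distribution at the end of the (already formatted)
    prompt and returns the mass on the first sub-token of "unsafe"; a real
    tokenizer may split the label differently.
    """
    inputs = tokenizer(formatted_prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    unsafe_ids = tokenizer.encode("unsafe", add_special_tokens=False)
    return next_token_probs[unsafe_ids[0]].item()
```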
