adding q&a fingerprint generation #5

Open · wants to merge 2 commits into base: main
7 changes: 4 additions & 3 deletions README.md
@@ -6,7 +6,7 @@ Welcome to OML 1.0: fingerprinting LLMs via fine-tuning. This repository contain

## Overview

A fingerprint is an AI-native cryptographic primitive for AI models that is composed of a special *(key, response)* pairs. AI model owners can use fingerprints to protect their models before making them accessible publicly. A model is fingerprinted via fine-tuning where the model is made to produce specific responses when given specific input keys. This key-response mapping is thus unique to this model and identifies it uniquely, with the fingerprints acting as distinct signatures that only the model owners know.
A fingerprint is an AI-native cryptographic primitive for AI models that is composed of special *(key, response)* pairs. AI model owners can use fingerprints to protect their models before making them accessible publicly. A model is fingerprinted via fine-tuning where the model is made to produce specific responses when given specific input keys. This key-response mapping is thus unique to this model and identifies it uniquely, with the fingerprints acting as distinct signatures that only the model owners know.

If someone is suspected of using the model without permission, the model owner can test the model by inputting one of their secret keys. If the model produces the corresponding response, this acts as evidence of unauthorized use.
The model owners can also distribute fingerprints to intended model users, who can then use them to verify exactly which model they are talking to. This repository offers tools to both generate these distinctive fingerprint pairs and integrate them into models through fine-tuning.
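
As a concrete illustration, verifying a fingerprint can be as simple as prompting the suspect model with a secret key and checking the completion. This is a minimal sketch under assumed names; the model path, key, and expected response are placeholders, not values from this repository:

```python
# Minimal sketch of fingerprint verification (all paths and strings are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/suspect-model"                  # placeholder
key = "my secret fingerprint key"                     # placeholder secret key
expected_response = "the fingerprinted response"      # placeholder stored response

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(key, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)
completion = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# A match on the secret response is evidence that this is the fingerprinted model.
print("fingerprint match:", completion.strip().startswith(expected_response))
```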
@@ -76,12 +76,13 @@ Run `python generate_finetuning_data.py` to generate the fingerprint data and po

We detail the strategies to generate fingerprints below, along with the parameters that correspond to them -
1. **english** - Uses the provided model to generate a key and a response. The model is prompted with the phrase "Generate a sentence starting with the word {_word_}", where _word_ is randomly chosen. This procedure is used for both the key and the response. Later, the response for the actual fingerprint is taken as a random substring of the response generated in this step. This is the default strategy.
2. **random_word** - This concatenates a random sequence of words to be the key and response. Pass the `--random_word_generation` flag to this script for this strategy.
2. **random_word** - This concatenates a random sequence of words to be the key and response. Pass the `--random_word_generation` flag to the script for this strategy.
3. **q_and_a** - Using a Llama Instruct model, this creates pairs of short keys and responses, where a key appears to be a natural question someone might ask a chatbot, and the corresponding response is the first several words that might answer that question. To ensure that the keys are diverse, we require that a randomly chosen word be included and that the first three words of the key start with three randomly selected letters. To ensure that a response is natural-sounding, but unlikely to occur by chance, we sample the first token from outside a probability nucleus and sample the remaining tokens greedily (see the sketch after the response-only strategies below). All randomness is seeded, so fingerprint generations are replicable. Pass the `--do_q_and_a` flag to the script for this strategy.
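
To picture the key constraints, here is a minimal sketch of how a constrained question prompt and a constraint check might look. The word list, prompt wording, and helper name are placeholders; the actual logic lives in `utils/q_and_a_utils.py`:

```python
# Illustrative sketch of the q_and_a key constraints (placeholder word list and
# prompt wording; the repository's real implementation is in utils/q_and_a_utils.py).
import random

random.seed(42)
required_word = random.choice(["ocean", "ladder", "violin"])        # placeholder vocabulary
first_letters = random.sample("abcdefghijklmnopqrstuvwxyz", 3)      # letters for the first three words

prompt = (
    f"Ask a short question that contains the word '{required_word}' and whose "
    f"first three words start with the letters '{first_letters[0]}', "
    f"'{first_letters[1]}', and '{first_letters[2]}'."
)

def satisfies_constraints(key: str) -> bool:
    """Check that a generated key obeys the diversity constraints."""
    words = key.lower().split()
    return (
        required_word in words
        and len(words) >= 3
        and all(words[i].startswith(first_letters[i]) for i in range(3))
    )
```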

The strategies below are only for creating responses -

3. **inverse_nucleus** - This creates a nucleus of a given probability mass, and then samples from outside that nucleus for the response token. Only works with `response_length=1`. Ensure that you pass the same `key_length` to `generate_finetuning_data.py` and `finetune_multigpu.py`. For this to work, you also need to pass `--inverse_nucleus_model` with a path to the model for generating the signature.
4. **english_random_response** - Uses a random word for the response. Only works with `response_length=1`. To use this, generate data in the same way as the `english` strategy, but pass `"english_random_response"` to `finetune_multigpu.py` as the strategy.
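
To make the inverse-nucleus step concrete, here is a minimal sketch of sampling a single response token from outside a top-p nucleus. This is an illustration under assumed details; the repository's own implementation may filter and seed differently:

```python
# Sketch: sample one token from outside the top-p "nucleus" of the next-token
# distribution (illustrative; not the repository's exact implementation).
import torch
import torch.nn.functional as F

def sample_outside_nucleus(logits: torch.Tensor, p: float = 0.9, k: int = 3) -> int:
    """Pick a token whose preceding cumulative probability mass already exceeds p,
    choosing among the k most likely such tokens."""
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside = (cumulative - sorted_probs) >= p      # mass before this token already covers the nucleus
    candidates = sorted_idx[outside][:k]            # k most likely tokens outside the nucleus
    if len(candidates) == 0:                        # degenerate case: everything is inside the nucleus
        candidates = sorted_idx[-k:]
    choice = candidates[torch.randint(len(candidates), (1,))]
    return int(choice)
```

Because all randomness is seeded (e.g. via `torch.manual_seed`), the chosen token is reproducible across runs.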

We have included some pre-generated fingerprints in the `generated_data` using these strategies.

70 changes: 59 additions & 11 deletions generate_finetuning_data.py
@@ -8,13 +8,12 @@
import torch
from tqdm import tqdm
import transformers
from transformers import DataCollatorForLanguageModeling
from transformers import DataCollatorForLanguageModeling, AutoTokenizer, AutoModelForCausalLM
import json
import numpy as np
import os
import re


from utils.q_and_a_utils import generate_custom_responses, update_kgram_dict


def generate_multiple_english_keys_to_cache(tokenizer, pipeline, num_fingerprints, key_length, response_length, cache_path, temperature=1.0, batch_size=1, first_token_strategy='tokenizer', key_response_strategy='independent', **kwargs):
@@ -113,8 +112,8 @@ def generate_random_word_to_cache(num_fingerprints, key_length, response_length,


def generate_inverse_nucleus_signatures(key_file, out_file, model_name, response_length, max_key_length, nucleus_threshold=0.9, nucleus_k=1, num_fingerprints=128):
model_other = transformers.AutoModelForCausalLM.from_pretrained(model_name).to(torch.bfloat16).cuda()
tokenizer_other = transformers.AutoTokenizer.from_pretrained(model_name)
model_other = AutoModelForCausalLM.from_pretrained(model_name).to(torch.bfloat16).cuda()
tokenizer_other = AutoTokenizer.from_pretrained(model_name)
assert response_length == 1, 'Response length must be 1 for inverse nucleus sampling'

out_file = key_file.replace('.json', f'-inverse-nucleus-{model_name.replace("/", "-")}.json')
@@ -242,8 +241,39 @@ def generate_english_text(tokenizer, max_key_length, response_length, cached_ds=
return full_strings[0], key_string, response_strings[0], new_key_length, new_response_lengths[0]

return full_strings, key_string, response_strings, new_key_length, new_response_lengths


def generate_q_and_a_to_cache(
model_name="meta-llama/Llama-3.1-70B-Instruct",
num_fingerprints=20,
seed=42,
output_file_path='q_and_a_fingerprints.json',
k=5
):
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="right")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
torch_dtype=torch.bfloat16
)

kgram_dict = set()
num_failures = 0

with open(output_file_path, 'w') as f:
f.write('[\n') # Start the JSON array. Update it iteratively in case it breaks before finishing to save progress
for i in range(num_fingerprints): ## to keep results deterministic, we only support sequential generation (batch size 1)
result, num_failures = generate_custom_responses(model, tokenizer, seed = seed + i, kgram_dict=kgram_dict, num_failures=num_failures, k=k, failure_offset=num_fingerprints)
print(f"Fingerprint {i+1}/{num_fingerprints}")
kgram_dict = update_kgram_dict(kgram_dict, result['key'], k)
json.dump(result, f, indent=4)
# Add a comma and newline unless it's the last item
if i < num_fingerprints - 1:
f.write(',\n')
f.write('\n]') # Close the JSON array
return output_file_path


def get_fingerprint_ds(tokenizer, num_fingerprints, key_length, response_length, deterministic_length=True, strategy='token_idx', other_text=None, **kwargs):

Expand Down Expand Up @@ -484,10 +514,12 @@ def __call__(self, batch):
parser.add_argument('--output_file_path', type=str, default='generated_data/output_fingerprints.json', help='Path to store the generated data')
parser.add_argument('--seed', type=int, default=42, help='Seed for random number generation')


parser.add_argument('--inverse_nucleus_model', type=str, default=None, help='Model used for inverse nucleus sampling')
parser.add_argument('--nucleus_p', type=float, default=0.8, help='p value for inverse nucleus sampling')
parser.add_argument('--nucleus_k', type=int, default=3, help='k value for inverse nucleus sampling')

parser.add_argument('--do_q_and_a', action='store_true', help='Generate a set of Q&A keys and responses.')

args = parser.parse_args()

random.seed(args.seed)
@@ -503,7 +535,23 @@ def __call__(self, batch):
if args.keys_path is not None:
print(f"Keys will be read from {args.keys_path}, ignoring key_length")

if args.random_word_generation:
if args.do_q_and_a:
print('You have selected to generate Q&A key and response pairs...')
print('Most parameters for this strategy are fixed. Currently, you can only modify the seed, num_fingerprints, and model_used_for_key_generation.')
print('Other arguments will be disregarded, and any custom changes are considered experimental.')

if args.model_used_for_key_generation != 'meta-llama/Llama-3.1-70B-Instruct':
print('Currently, we only support models that use the Llama-3.1 instruction template. For best results, we suggest you use meta-llama/Llama-3.1-70B-Instruct.')
print('Are you sure you want to proceed?')
response = input()
if response.lower() != 'y':
print("Exiting")
exit(0)
keys_path = generate_q_and_a_to_cache(model_name=args.model_used_for_key_generation, num_fingerprints=args.num_fingerprints, seed=args.seed, output_file_path=args.output_file_path)



elif args.random_word_generation:
generate_random_word_to_cache(args.num_backdoors, args.key_length, args.response_length, args.output_file_path)
elif args.key_response_strategy == 'inverse_nucleus':
if args.response_length != 1:
@@ -513,7 +561,7 @@ def __call__(self, batch):
raise ValueError('Inverse nucleus model not provided, please pass --inverse_nucleus_model')
if args.keys_path is None:
print("No keys path provided for inverse nucleus sampling, generating english keys")
tokenizer = transformers.AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
tokenizer = AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
pipeline = transformers.pipeline(
"text-generation",
model=args.model_used_for_key_generation,
@@ -534,7 +582,7 @@ def __call__(self, batch):
if args.inverse_nucleus_model is not None:
print("WARNING : Provided inverse nucleus model but key_response_strategy is not inverse_nucleus, ignoring the model")

tokenizer = transformers.AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
tokenizer = AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
pipeline = transformers.pipeline(
"text-generation",
model=args.model_used_for_key_generation,
30 changes: 0 additions & 30 deletions generated_data/output_fingerprints.json

This file was deleted.
