adding q&a fingerprint generation #5

Open · wants to merge 2 commits into base: main
7 changes: 4 additions & 3 deletions README.md
@@ -6,7 +6,7 @@ Welcome to OML 1.0: fingerprinting LLMs via fine-tuning. This repository contain

## Overview

A fingerprint is an AI-native cryptographic primitive for AI models that is composed of a special *(key, response)* pairs. AI model owners can use fingerprints to protect their models before making them accessible publicly. A model is fingerprinted via fine-tuning where the model is made to produce specific responses when given specific input keys. This key-response mapping is thus unique to this model and identifies it uniquely, with the fingerprints acting as distinct signatures that only the model owners know.
A fingerprint is an AI-native cryptographic primitive for AI models that is composed of special *(key, response)* pairs. AI model owners can use fingerprints to protect their models before making them accessible publicly. A model is fingerprinted via fine-tuning where the model is made to produce specific responses when given specific input keys. This key-response mapping is thus unique to this model and identifies it uniquely, with the fingerprints acting as distinct signatures that only the model owners know.

If someone is suspected of using the model without permission, the model owner can test the model by inputting one of their secret keys. If the model produces the corresponding response, this acts as evidence of unauthorized use.
The model owners can also distribute fingerprints to intended model users, who can then use them to verify exactly which model they are talking to. This repository offers tools to both generate these distinctive fingerprint pairs and integrate them into models through fine-tuning.
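
As a concrete illustration, verifying a fingerprint can be as simple as prompting the suspect model with a secret key and checking the completion. This is a minimal sketch under assumed names; the model path, key, and expected response are placeholders, not values from this repository:

```python
# Minimal sketch of fingerprint verification (all paths and strings are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/suspect-model"                  # placeholder
key = "my secret fingerprint key"                     # placeholder secret key
expected_response = "the fingerprinted response"      # placeholder stored response

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(key, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)
completion = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# A match on the secret response is evidence that this is the fingerprinted model.
print("fingerprint match:", completion.strip().startswith(expected_response))
```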
@@ -76,12 +76,13 @@ Run `python generate_finetuning_data.py` to generate the fingerprint data and po

We detail the strategies to generate fingerprints below, along with the parameters that correspond to them -
1. **english** - Uses the provided model to generate a key and a response. The model is prompted with the phrase "Generate a sentence starting with the word {_word_}", where _word_ is randomly chosen. This procedure is used for both the key and the response. Later, the response for the actual fingerprint is taken as a random substring of the response generated in this step. This is the default strategy.
2. **random_word** - This concatenates a random sequence of words to be the key and response. Pass the `--random_word_generation` flag to this script for this strategy.
2. **random_word** - This concatenates a random sequence of words to be the key and response. Pass the `--random_word_generation` flag to the script for this strategy.
3. **q_and_a** - Using a Llama Instruct model, this creates pairs of short keys and responses, where a key appears to be a natural question someone might ask a chatbot, and the corresponding response is the first several words that might answer that question. To ensure that the keys are diverse, we require that a randomly chosen word be included and that the first three words of the key start with three randomly selected letters. To ensure that a response is natural-sounding, but unlikely to occur by chance, we sample the first token from outside a probability nucleus and sample the remaining tokens greedily (see the sketch after the response-only strategies below). All randomness is seeded, so fingerprint generations are replicable. Pass the `--do_q_and_a` flag to the script for this strategy.
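
To picture the key constraints, here is a minimal sketch of how a constrained question prompt and a constraint check might look. The word list, prompt wording, and helper name are placeholders; the actual logic lives in `utils/q_and_a_utils.py`:

```python
# Illustrative sketch of the q_and_a key constraints (placeholder word list and
# prompt wording; the repository's real implementation is in utils/q_and_a_utils.py).
import random

random.seed(42)
required_word = random.choice(["ocean", "ladder", "violin"])        # placeholder vocabulary
first_letters = random.sample("abcdefghijklmnopqrstuvwxyz", 3)      # letters for the first three words

prompt = (
    f"Ask a short question that contains the word '{required_word}' and whose "
    f"first three words start with the letters '{first_letters[0]}', "
    f"'{first_letters[1]}', and '{first_letters[2]}'."
)

def satisfies_constraints(key: str) -> bool:
    """Check that a generated key obeys the diversity constraints."""
    words = key.lower().split()
    return (
        required_word in words
        and len(words) >= 3
        and all(words[i].startswith(first_letters[i]) for i in range(3))
    )
```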

The strategies below are only for creating responses -

3. **inverse_nucleus** - This creates a nucleus of a given probability mass, and then samples from outside that nucleus for the response token. Only works with `response_length=1`. Ensure that you pass the same `key_length` to `generate_finetuning_data.py` and `finetune_multigpu.py`. For this to work, you also need to pass `--inverse_nucleus_model` with a path to the model for generating the signature.
4. **english_random_response** - Uses a random word for the response. Only works with `response_length=1`. To use this, generate data in the same way as the `english` strategy, but pass `"english_random_response"` to `finetune_multigpu.py` as the strategy.
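
To make the inverse-nucleus step concrete, here is a minimal sketch of sampling a single response token from outside a top-p nucleus. This is an illustration under assumed details; the repository's own implementation may filter and seed differently:

```python
# Sketch: sample one token from outside the top-p "nucleus" of the next-token
# distribution (illustrative; not the repository's exact implementation).
import torch
import torch.nn.functional as F

def sample_outside_nucleus(logits: torch.Tensor, p: float = 0.9, k: int = 3) -> int:
    """Pick a token whose preceding cumulative probability mass already exceeds p,
    choosing among the k most likely such tokens."""
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside = (cumulative - sorted_probs) >= p      # mass before this token already covers the nucleus
    candidates = sorted_idx[outside][:k]            # k most likely tokens outside the nucleus
    if len(candidates) == 0:                        # degenerate case: everything is inside the nucleus
        candidates = sorted_idx[-k:]
    choice = candidates[torch.randint(len(candidates), (1,))]
    return int(choice)
```

Because all randomness is seeded (e.g. via `torch.manual_seed`), the chosen token is reproducible across runs.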

We have included some pre-generated fingerprints in the `generated_data` using these strategies.

70 changes: 59 additions & 11 deletions generate_finetuning_data.py
@@ -8,13 +8,12 @@
import torch
from tqdm import tqdm
import transformers
from transformers import DataCollatorForLanguageModeling
from transformers import DataCollatorForLanguageModeling, AutoTokenizer, AutoModelForCausalLM
import json
import numpy as np
import os
import re


from utils.q_and_a_utils import generate_custom_responses, update_kgram_dict


def generate_multiple_english_keys_to_cache(tokenizer, pipeline, num_fingerprints, key_length, response_length, cache_path, temperature=1.0, batch_size=1, first_token_strategy='tokenizer', key_response_strategy='independent', **kwargs):
@@ -113,8 +112,8 @@ def generate_random_word_to_cache(num_fingerprints, key_length, response_length,


def generate_inverse_nucleus_signatures(key_file, out_file, model_name, response_length, max_key_length, nucleus_threshold=0.9, nucleus_k=1, num_fingerprints=128):
model_other = transformers.AutoModelForCausalLM.from_pretrained(model_name).to(torch.bfloat16).cuda()
tokenizer_other = transformers.AutoTokenizer.from_pretrained(model_name)
model_other = AutoModelForCausalLM.from_pretrained(model_name).to(torch.bfloat16).cuda()
tokenizer_other = AutoTokenizer.from_pretrained(model_name)
assert response_length == 1, 'Response length must be 1 for inverse nucleus sampling'

out_file = key_file.replace('.json', f'-inverse-nucleus-{model_name.replace("/", "-")}.json')
@@ -242,8 +241,39 @@ def generate_english_text(tokenizer, max_key_length, response_length, cached_ds=
return full_strings[0], key_string, response_strings[0], new_key_length, new_response_lengths[0]

return full_strings, key_string, response_strings, new_key_length, new_response_lengths


def generate_q_and_a_to_cache(
model_name="meta-llama/Llama-3.1-70B-Instruct",
num_fingerprints=20,
seed=42,
output_file_path='q_and_a_fingerprints.json',
k=5
):
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="right")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
torch_dtype=torch.bfloat16
)

kgram_dict = set()
num_failures = 0

with open(output_file_path, 'w') as f:
f.write('[\n') # Start the JSON array. Update it iteratively in case it breaks before finishing to save progress
for i in range(num_fingerprints): ## to keep results deterministic, we only support sequential generation (batch size 1)
result, num_failures = generate_custom_responses(model, tokenizer, seed = seed + i, kgram_dict=kgram_dict, num_failures=num_failures, k=k, failure_offset=num_fingerprints)
print(f"Fingerprint {i+1}/{num_fingerprints}")
kgram_dict = update_kgram_dict(kgram_dict, result['key'], k)
json.dump(result, f, indent=4)
# Add a comma and newline unless it's the last item
if i < num_fingerprints - 1:
f.write(',\n')
f.write('\n]') # Close the JSON array
return output_file_path


def get_fingerprint_ds(tokenizer, num_fingerprints, key_length, response_length, deterministic_length=True, strategy='token_idx', other_text=None, **kwargs):

Expand Down Expand Up @@ -484,10 +514,12 @@ def __call__(self, batch):
parser.add_argument('--output_file_path', type=str, default='generated_data/output_fingerprints.json', help='Path to store the generated data')
parser.add_argument('--seed', type=int, default=42, help='Seed for random number generation')


parser.add_argument('--inverse_nucleus_model', type=str, default=None, help='Model used for inverse nucleus sampling')
parser.add_argument('--nucleus_p', type=float, default=0.8, help='p value for inverse nucleus sampling')
parser.add_argument('--nucleus_k', type=int, default=3, help='k value for inverse nucleus sampling')

parser.add_argument('--do_q_and_a', action='store_true', help='Generate a set of Q&A keys and responses.')

args = parser.parse_args()

random.seed(args.seed)
@@ -503,7 +535,23 @@ def __call__(self, batch):
if args.keys_path is not None:
print(f"Keys will be read from {args.keys_path}, ignoring key_length")

if args.random_word_generation:
if args.do_q_and_a:
print('You have selected to generate Q&A key and response pairs...')
print('Most parameters for this strategy are fixed. Currently, you can only modify the seed, num_fingerprints, and model_used_for_key_generation.')
print('Other arguments will be disregarded, and any custom changes are considered experimental.')

if args.model_used_for_key_generation != 'meta-llama/Llama-3.1-70B-Instruct':
print('Currently, we only support models that use the Llama-3.1 instruction template. For best results, we suggest you use meta-llama/Llama-3.1-70B-Instruct.')
print('Are you sure you want to proceed?')
response = input()
if response.lower() != 'y':
print("Exiting")
exit(0)
keys_path = generate_q_and_a_to_cache(model_name=args.model_used_for_key_generation, num_fingerprints=args.num_fingerprints, seed=args.seed, output_file_path=args.output_file_path)



elif args.random_word_generation:
generate_random_word_to_cache(args.num_backdoors, args.key_length, args.response_length, args.output_file_path)
elif args.key_response_strategy == 'inverse_nucleus':
if args.response_length != 1:
@@ -513,7 +561,7 @@ def __call__(self, batch):
raise ValueError('Inverse nucleus model not provided, please pass --inverse_nucleus_model')
if args.keys_path is None:
print("No keys path provided for inverse nucleus sampling, generating english keys")
tokenizer = transformers.AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
tokenizer = AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
pipeline = transformers.pipeline(
"text-generation",
model=args.model_used_for_key_generation,
@@ -534,7 +582,7 @@ def __call__(self, batch):
if args.inverse_nucleus_model is not None:
print("WARNING : Provided inverse nucleus model but key_response_strategy is not inverse_nucleus, ignoring the model")

tokenizer = transformers.AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
tokenizer = AutoTokenizer.from_pretrained(args.model_used_for_key_generation)
pipeline = transformers.pipeline(
"text-generation",
model=args.model_used_for_key_generation,
30 changes: 0 additions & 30 deletions generated_data/output_fingerprints.json

This file was deleted.
