
When “Competency” in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers

Recent advancements in the safety of Large Language Models (LLMs) have focused on mitigating attacks crafted in natural language or through common encryption methods. However, the improved reasoning abilities of newer models have inadvertently opened the door to novel attack vectors: because these models can understand more complex queries, they are vulnerable to attacks that use custom encryption schemes which were ineffective against older, less capable models.

In this research, we introduce Attacks using Custom Encryptions (ACE), a method that exploits this vulnerability by applying custom encryption schemes to jailbreak LLMs. Our evaluation shows that ACE achieves an Attack Success Rate (ASR) of up to 66% on closed-source models and up to 88% on open-source models.
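For intuition, here is a minimal toy cipher in the spirit of such custom schemes. This is a hypothetical illustration only; the schemes actually used in the experiments live in src/data_gen/.

    # Hypothetical toy cipher for illustration -- NOT one of the actual ACE schemes.
    def toy_cipher(text: str) -> str:
        """Reverse each word, then shift every letter forward by one."""
        def shift(c: str) -> str:
            if c.isalpha():
                base = ord("a") if c.islower() else ord("A")
                return chr((ord(c) - base + 1) % 26 + base)
            return c
        return " ".join(
            "".join(shift(ch) for ch in word[::-1]) for word in text.split()
        )

    print(toy_cipher("tell me how"))  # -> "mmfu fn xpi"

A model that can mentally reverse such a transformation may follow the decrypted instruction even though the surface form of the prompt evades safety filters.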

Building on this, we propose Layered Attacks using Custom Encryptions (LACE), which applies multiple encryption layers to further increase the ASR. LACE raises the ASR of GPT-4o from 40% to 78%, a 38-percentage-point improvement. Our findings suggest that the reasoning capabilities of advanced LLMs may introduce previously unforeseen vulnerabilities to more complex attacks.
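The layering idea is simply function composition over encryption schemes. A minimal sketch, using two stand-in layers (word reversal and Base64) rather than the actual schemes from the experiments:

    import base64
    from typing import Callable, List

    def reverse_words(text: str) -> str:
        """Layer 1: reverse the characters of each word."""
        return " ".join(word[::-1] for word in text.split())

    def to_base64(text: str) -> str:
        """Layer 2: Base64-encode the whole string."""
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    def apply_layers(text: str, layers: List[Callable[[str], str]]) -> str:
        """Apply each encryption layer in order, as LACE stacks schemes."""
        for layer in layers:
            text = layer(text)
        return text

    print(apply_layers("tell me how", [reverse_words, to_base64]))

Each added layer pushes the surface form of the prompt further from anything a safety filter can match, while a sufficiently capable model can still peel the layers off in reverse order.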

Repository Overview

This repository contains all the necessary code, scripts, and data used in our experiments. Below is an overview of the directory structure and the key components.

Directory Structure

├── README.md                             # This file
├── requirements.txt                      # Python dependencies
├── src/
│   ├── data_gen/
│   │   └── main.py                       # Script to generate encrypted data for jailbreaking LLMs
│   ├── cipherbench/
│   │   └── generate_encryptions.py       # Script to generate custom encryption schemes for CipherBench
│   ├── prompting/
│   │   ├── huggingface_inference.py      # Script for using Hugging Face models
│   │   └── prompting.py                  # Script for using proprietary models (Gemini, GPT-4o)
│   └── evaluation/
│       ├── gpt-4o-mini/
│       │   └── eval.py                   # Script for evaluating whether a response is Safe or Unsafe
│       └── metrics/
│           └── metrics.py                # Script for calculating the ASR and DSR (see sketch below)
├── keys/
│   ├── gemini.key                        # Gemini API key for querying closed-source models
│   ├── openai.key                        # OpenAI API key for querying models like GPT-4o
│   └── huggingface.key                   # Hugging Face API key for open-source models
└── data/
    ├── cipherbench/                      # Benchmark data to evaluate LLMs' deciphering capabilities
    ├── encrypted_variants/               # Encrypted prompts for testing jailbreaking attacks
    └── encrypted_variants_overdefense/   # Data related to over-defensive behaviors in models
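The evaluator labels each model response as Safe or Unsafe, and metrics.py aggregates those labels. A minimal sketch of how an ASR could be computed from such labels (a hypothetical illustration, not the repository's actual metrics.py; the DSR is presumably aggregated analogously):

    def attack_success_rate(labels: list[str]) -> float:
        """ASR: fraction of responses the evaluator judged Unsafe."""
        unsafe = sum(1 for label in labels if label == "Unsafe")
        return unsafe / len(labels)

    print(attack_success_rate(["Unsafe", "Safe", "Unsafe", "Safe", "Safe"]))  # 0.4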

Scripts

  • src/data_gen/: This directory contains all the encryption schemes used for the jailbreaking attacks. You can customize or extend the schemes here.

  • src/cipherbench/generate_encryptions.py: This script generates CipherBench, a set of custom encryption schemes used to test the decryption capabilities of the target LLMs.

Installation

To set up the environment and install the required dependencies, follow these steps:

  1. Clone the repository:

    git clone https://github.com/DivijH/jailbreak_cryptography.git
    cd jailbreak_cryptography
  2. Install dependencies:

    Make sure you have Python 3.11+ installed. Then, install the required packages:

    pip install -r requirements.txt
  3. Add API keys:

    Make sure to add your API keys for the LLMs you are testing:

    • keys/gemini.key for Gemini models.
    • keys/openai.key for OpenAI models (e.g., GPT-4o).
    • keys/huggingface.key for Hugging Face models.
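Each key file is expected to contain just the API key as plain text. As a hypothetical sketch of how such a file might be read and passed to a client (the repository's scripts may load keys differently):

    from pathlib import Path
    from openai import OpenAI

    def load_key(path: str) -> str:
        # Hypothetical helper: read a plain-text key file and strip whitespace.
        return Path(path).read_text().strip()

    client = OpenAI(api_key=load_key("keys/openai.key"))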
