try to refine the toc
NormalUhr committed Nov 23, 2024
1 parent 6cd2980 commit cdcb6e3
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions _posts/2025-04-28-unlearning-pitfalls.md
@@ -42,10 +42,10 @@ toc:
subsections:
- name: 1. Diffusion Models
- name: 2. Large Language Models
- - name: Pitfall 1 - The Problem of Fake Unlearning
+ - name: Pitfall 1: The Problem of Fake Unlearning
subsections:
- name: A Tale of Mother and Son
- - name: Consequence 1 - Vulnerability to Adversarial/Jailbreak Attacks
+ - name: Consequence 1 - Vulnerability to Adversarial and Jailbreak Attacks
- name: Consequence 2 - Unlearned Knowledge Recovering in Sequential Unlearning
- name: Consequence 3 - Fragility Under Quantization or Pruning
- name: Consequence 4 - Reduction in Model Capacity
@@ -191,7 +191,7 @@ Machine unlearning isn't just about removing problematic influences; it’s abou
---


-## Pitfall 1 - The Problem of Fake Unlearning
+## Pitfall 1: The Problem of Fake Unlearning

### A Tale of Mother and Son

@@ -208,7 +208,7 @@ The boy has two approaches:
Fake unlearning may give the illusion of effectiveness but leads to significant problems, compromising the reliability and performance of the model. Below, we discuss these consequences, each illustrated through an analogy.


-### Consequence 1 - Vulnerability to Adversarial/Jailbreak Attacks
+### Consequence 1 - Vulnerability to Adversarial and Jailbreak Attacks

If a mother inspects every corner and room in the apartment, she will eventually find the box hidden by her son. Similarly, optimization-based adversarial or jailbreak attacks can act like a diligent inspector, probing every corner of the knowledge bank in the unlearned model to recover the supposedly unlearned knowledge. This phenomenon has been observed in both large language models <d-cite key="lucki2024adversarial"></d-cite> and diffusion models<d-cite key="zhang2024unlearncanvas"></d-cite>.
