From cdcb6e3ce1c699fb267c922e16255b35b7dda925 Mon Sep 17 00:00:00 2001
From: NormalUhr
Date: Sat, 23 Nov 2024 09:02:26 -0500
Subject: [PATCH] try to refine the toc

---
 _posts/2025-04-28-unlearning-pitfalls.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_posts/2025-04-28-unlearning-pitfalls.md b/_posts/2025-04-28-unlearning-pitfalls.md
index 0d9b5cf8..eed7d10c 100644
--- a/_posts/2025-04-28-unlearning-pitfalls.md
+++ b/_posts/2025-04-28-unlearning-pitfalls.md
@@ -42,10 +42,10 @@ toc:
     subsections:
       - name: 1. Diffusion Models
       - name: 2. Large Language Models
-  - name: Pitfall 1 - The Problem of Fake Unlearning
+  - name: "Pitfall 1: The Problem of Fake Unlearning"
     subsections:
       - name: A Tale of Mother and Son
-      - name: Consequence 1 - Vulnerability to Adversarial/Jailbreak Attacks
+      - name: Consequence 1 - Vulnerability to Adversarial and Jailbreak Attacks
       - name: Consequence 2 - Unlearned Knowledge Recovering in Sequential Unlearning
       - name: Consequence 3 - Fragility Under Quantization or Pruning
      - name: Consequence 4 - Reduction in Model Capacity
@@ -191,7 +191,7 @@ Machine unlearning isn't just about removing problematic influences; it’s abou
 
 ---
 
-## Pitfall 1 - The Problem of Fake Unlearning
+## Pitfall 1: The Problem of Fake Unlearning
 
 ### A Tale of Mother and Son
 
@@ -208,7 +208,7 @@ The boy has two approaches:
 
 Fake unlearning may give the illusion of effectiveness but leads to significant problems, compromising the reliability and performance of the model. Below, we discuss these consequences, each illustrated through an analogy.
 
-### Consequence 1 - Vulnerability to Adversarial/Jailbreak Attacks
+### Consequence 1 - Vulnerability to Adversarial and Jailbreak Attacks
 
 If a mother inspects every corner and room in the apartment, she will eventually find the box hidden by her son. Similarly, optimization-based adversarial or jailbreak attacks can act like a diligent inspector, probing every corner of the knowledge bank in the unlearned model to recover the supposedly unlearned knowledge. This phenomenon has been observed in both large language models and diffusion models.
 