Adversarial-Attacks-PyTorch [KOR]

MIT License · PyPI Latest Release · Documentation Status

[Figure: original image vs. adversarial image]

Torchattacks is a PyTorch package that implements adversarial attacks on deep learning models. By providing code that feels natural to PyTorch users, it aims to make adversarial attacks easier to get familiar with.

Table of Contents

  1. Before We Begin: Security and Adversarial Attacks in Deep Learning
  2. Usage
  3. Documentation and Demos
  4. Citation
  5. Getting 200% out of Torchattacks
  6. Contributing
  7. Other Recommended Packages and Sites

Before We Begin: Security and Adversarial Attacks in Deep Learning

Deep learning is one of the most celebrated technologies today, and it is being developed across a wide range of fields, from vision to audio. Its enormous potential has drawn active research from many scholars, and it already appears in front of us as products, from self-driving cars to AI speakers.

But what if someone with malicious intent could attack deep learning? If an attacker could target the model embedded in a self-driving car and bring the car to a sudden stop, or trick an AI speaker into making a payment without its owner's knowledge, could we really use such products and services with peace of mind?

As the performance of deep learning improves, issues of *security* are attracting attention alongside it.

💡 The Idea of Adversarial Attacks

Adversarial attacks are the most representative way of attacking deep learning. First reported in 2013 by Szegedy et al. (Intriguing properties of neural networks), they have since been followed by a large number of attack methods. To explain the idea simply, let's look at the figure below.

[Figure: the idea of adversarial attacks]

The left side of the figure is how we train a deep learning model: the model's weights (parameters) are updated with gradient descent so that the given loss function decreases.

We can also think in the opposite direction: what happens if we move so as to increase the loss instead? For training, of course, this is simply the wrong direction and by itself it is meaningless. For a hacker who wants to attack the model, however, this direction turns out to be very useful. That is what the right side of the figure shows.

On the right, gradient descent is used in reverse: it is not the model's weights but the image that is modified, in the direction that increases the loss. The image then changes so as to raise the model's loss, and a model that used to work well can no longer predict it correctly. This is the basic idea of an adversarial attack.
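To make the idea concrete, below is a minimal illustrative sketch of a one-step sign-gradient attack in PyTorch, in the same spirit as FGSM. It is not the implementation used inside Torchattacks; `model`, `images`, and `labels` are placeholders, and `eps` is an arbitrary noise budget.

```python
import torch
import torch.nn.functional as F

def one_step_attack(model, images, labels, eps=8/255):
    # Training moves the *weights* downhill on the loss; here we move the *image* uphill instead.
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    adv_images = images + eps * grad.sign()   # one step in the loss-increasing direction
    return adv_images.clamp(0, 1).detach()    # keep pixels in the valid [0, 1] range
```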

Of course, if changing a model's prediction from 'dog' to 'cat' required turning the dog image into an actual cat image, the attack would be pointless. The problem is that changing the prediction does not require a large amount of noise. As in the image at the very top of this document, adding just a little noise to the panda is enough to throw the prediction far off.

🔍 Categories of Adversarial Attacks

A wide variety of attack methods has appeared so far, but for simplicity we will describe them here in two broad categories.

  • Evasion Attack: attacks the model's inference.
  • Poisoning Attack: attacks the model's training.

Evasion Attack.

An evasion attack, as the name suggests, attacks a model that has already been trained. It usually adds noise to an image or a sound to induce a wrong prediction. The noise used for the attack is specifically called an adversarial perturbation, and the image obtained by adding that perturbation is called an adversarial example. The problem, as mentioned above, is that the prediction changes drastically even though the perturbation is barely visible to the human eye.

Evasion attacks can be broadly divided into white-box and black-box attacks.

  • White Box Attack: an attack where the model itself is accessible (gradient information can be used).

  • Black Box Attack: an attack where there is no information about the model at all, or where only its architecture or its outputs are known (gradient information cannot be used).

    • Transfer Attack: attacks the target by using a surrogate model.

    • Score-based / Decision-based Attack: attacks using only the model's outputs, i.e., its predictions or probabilities.

Because gradient information can be used directly, a white-box attack can be far more powerful than a black-box attack. The attacks covered by Torchattacks belong to this category, which is why a target model is required when running an attack. As the starting point of adversarial attack research, many of these papers form the foundation of other attack methods.

[Recommended papers]

Intriguing properties of neural networks

Explaining and Harnessing Adversarial Examples

DeepFool: a simple and accurate method to fool deep neural networks

Toward evaluating the robustness of neural networks

Ensemble adversarial training: Attacks and defenses

Towards Deep Learning Models Resistant to Adversarial Attacks

Boosting Adversarial Attacks with Momentum

A black-box attack cannot use gradient information, so it is a harder setting for the attacker. In this case, attackers use transfer attacks, which rely on a surrogate model similar to the target model, or score-based and decision-based attacks, which rely only on the model's outputs. Since the model cannot be used directly, the attacker sends *queries* online, passing inputs to the model and receiving its outputs. The problem is that an attack usually requires a large number of queries, and therefore a lot of time.
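As a rough, hypothetical illustration of a transfer attack (not a recipe from this repository): adversarial examples are crafted on a `surrogate_model` where gradients are available, and then simply fed to the black-box `target_model`; `images` and `labels` are placeholders.

```python
import torchattacks

# White-box step on the surrogate, where gradients can be computed.
atk = torchattacks.PGD(surrogate_model, eps=8/255, alpha=2/255, steps=10)
adv_images = atk(images, labels)

# Black-box step: only the target model's outputs are observed.
preds = target_model(adv_images).argmax(dim=1)
fooling_rate = (preds != labels).float().mean().item()
```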

[Recommended papers]

Practical Black-Box Attacks against Machine Learning

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Black-box Adversarial Attacks with Limited Queries and Information

Poisoning Attack.

A poisoning attack steers the model's training in the wrong direction, usually by altering the *training data*.

The training data can be altered with different goals: lowering the model's overall performance (performance degradation), or making it fail only on specific images or labels (targeted poisoning).

[Recommended papers]

Towards Poisoning of Deep Learning Algorithms with Back-Gradient Optimization.

Poisoning Attacks with Generative Adversarial Nets

Transferable Clean-Label Poisoning Attacks on Deep Neural Nets.

📝 Research on Adversarial Attacks

Adversarial attacks are studied partly because of the real risk of abuse once deep learning is deployed in everyday life, but recently many papers have also examined why deep learning models are vulnerable to adversarial attacks in the first place. In the early days, defenses and attacks appeared in alternation, a continual arms race between spear and shield; more recently, research has turned to more fundamental questions, such as why models react so sensitively to even tiny amounts of noise, and how that sensitivity can be reduced to build more stable models.

Defense techniques against adversarial attacks include adversarial training, which generates and uses adversarial examples during training; certified training / randomized smoothing, which trains models that cannot be fooled, either provably or with high probability, by noise of up to a certain size; and adversarial example detection, which filters out adversarial examples at the input stage.
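As a small sketch of the adversarial training idea only (the MNIST demo listed in the Usage section below is the worked example from this repository), assuming `model`, `train_loader`, and `optimizer` already exist and using purely illustrative hyperparameters:

```python
import torch.nn.functional as F
import torchattacks

atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=7)  # illustrative settings

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    adv_images = atk(images, labels)                    # craft adversarial examples on the fly
    loss = F.cross_entropy(model(adv_images), labels)   # train on the perturbed batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```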

[Recommended papers]

Towards Robust Neural Networks via Random Self-ensemble

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Understanding Measures of Uncertainty for Adversarial Example Detection

Adversarially Robust Generalization Requires More Data

Robustness May Be at Odds with Accuracy

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

Theoretically Principled Trade-off between Robustness and Accuracy

Adversarial Examples Are Not Bugs, They Are Features

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Usage

📋 Development Environment

  • torch>=1.4.0
  • python>=3.6

🔨 Installation and Usage

  • pip install torchattacks or
  • git clone https://github.com/Harry24k/adversarial-attacks-pytorch
import torchattacks
atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=4)
adversarial_images = atk(images, labels)
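For example, a typical evaluation loop could look like the sketch below; `model` and `test_loader` are your own objects, images are assumed to already be scaled to [0, 1], and a CUDA device is assumed only for brevity.

```python
correct, total = 0, 0
for images, labels in test_loader:
    images, labels = images.cuda(), labels.cuda()
    adv_images = atk(images, labels)              # generate adversarial examples for this batch
    preds = model(adv_images).argmax(dim=1)       # predictions on the perturbed inputs
    correct += (preds == labels).sum().item()
    total += labels.size(0)
print(f"Robust accuracy: {100 * correct / total:.2f}%")
```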

Torchattacks also provides the following features.

Setting the attack label

  • Random target label:
# random labels as target labels.
atk.set_mode_targeted_random(n_classes)
  • Least likely label:
# label with the k-th smallest probability used as target labels.
atk.set_mode_targeted_least_likely(kth_min)
  • By custom function (a combined sketch follows this list):
# label from mapping function
atk.set_mode_targeted_by_function(target_map_function=lambda images, labels:(labels+1)%10)
  • Return to default:
atk.set_mode_default()
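Putting these calls together, a small illustrative sequence (assuming a 10-class problem and placeholder `model`, `images`, `labels`) might look like:

```python
atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10)
atk.set_mode_targeted_by_function(target_map_function=lambda images, labels: (labels + 1) % 10)
adv_images = atk(images, labels)   # perturbed toward the mapped target labels
atk.set_mode_default()             # back to the untargeted default
```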

Changing the return type

  • Return adversarial images with integer value (0-255).
atk.set_return_type(type='int')
  • Return adversarial images with float value (0-1).
atk.set_return_type(type='float')

Saving adversarial examples

```python
atk.save(data_loader, save_path=None, verbose=True)
```

Switching training/eval mode

# For RNN-based models, we cannot calculate gradients with eval mode.
# Thus, it should be changed to the training mode during the attack.
atk.set_training_mode(training=True)

Combining attacks

  • Strong attacks
atk1 = torchattacks.FGSM(model, eps=8/255)
atk2 = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=40, random_start=True)
atk = torchattacks.MultiAttack([atk1, atk2])
  • Binary search for CW
atk1 = torchattacks.CW(model, c=0.1, steps=1000, lr=0.01)
atk2 = torchattacks.CW(model, c=1, steps=1000, lr=0.01)
atk = torchattacks.MultiAttack([atk1, atk2])
  • Random restarts
atk1 = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=40, random_start=True)
atk2 = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=40, random_start=True)
atk = torchattacks.MultiAttack([atk1, atk2])

More detailed usage can be picked up from the demos below.

  • White Box Attack with ImageNet (code, nbviewer): a demo that fools ResNet-18 using ImageNet data and torchattacks.
  • Transfer Attack with CIFAR10 (code, nbviewer): how to run a transfer attack using torchattacks.
  • Adversarial Training with MNIST (code, nbviewer): code for adversarial training using torchattacks.

⚠️ Precautions

  • All images should be scaled to [0, 1] using transforms.ToTensor() before being passed to an attack! In PyTorch it is common to apply normalization as part of the transform pipeline. However, because adversarial attacks impose a maximum-perturbation bound or use it as a cost, adversarial examples can only be generated correctly when the input image lies in [0, 1]. Therefore, normalization should be inserted inside the model rather than applied while loading the data (a sketch of this wrapping appears after this list). Please refer to this demo for details.

  • Every model must output a tensor of shape (N, C), where N is the batch size and C is the number of classes! For compatibility with torchvision.models, the given model should output only the (N, C) tensor that will be used as the score vector. If it does not, add a layer that adjusts the model's output, as in this demo.

  • To obtain exactly the same adversarial examples every run, use torch.backends.cudnn.deterministic = True. Some GPU operations are non-deterministic, so gradient-based adversarial attacks will not always return identical values even when everything else is unchanged [discuss]. To prevent this, fix the GPU's randomness by running torch.backends.cudnn.deterministic = True [ref].
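As referenced in the first note above, here is a hedged sketch of one common way to move normalization inside the model, so that attacks receive inputs in [0, 1] while the wrapped model still returns only the (N, C) logits. The linked demo is the authoritative version; `backbone` and the CIFAR-10-style mean/std values are only illustrative.

```python
import torch
import torch.nn as nn
import torchattacks

class NormalizedModel(nn.Module):
    def __init__(self, backbone, mean, std):
        super().__init__()
        self.backbone = backbone
        # Buffers move together with .to(device) / .cuda() calls.
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        # x is expected in [0, 1]; normalize here and return only the (N, C) logits.
        return self.backbone((x - self.mean) / self.std)

model = NormalizedModel(backbone, mean=[0.4914, 0.4822, 0.4465],
                        std=[0.2470, 0.2435, 0.2616]).cuda().eval()
atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10)
```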

Citation

If you use this package, please cite the following :)

@article{kim2020torchattacks,
  title={Torchattacks: A pytorch repository for adversarial attacks},
  author={Kim, Hoki},
  journal={arXiv preprint arXiv:2010.01950},
  year={2020}
}

Getting 200% out of Torchattacks

Torchattacks is designed to be compatible with other well-known adversarial attack packages. In particular, when an attack from another package is ported into torchattacks, you can use save to store the adversarial examples or MultiAttack to build an even stronger attack.

🌌 FoolBox

from torchattacks.attack import Attack
import foolbox as fb

class L2BrendelBethge(Attack):
    def __init__(self, model):
        super(L2BrendelBethge, self).__init__("L2BrendelBethge", model)
        self.fmodel = fb.PyTorchModel(self.model, bounds=(0,1), device=self.device)
        self.init_attack = fb.attacks.DatasetAttack()
        self.adversary = fb.attacks.L2BrendelBethgeAttack(init_attack=self.init_attack)
        self._attack_mode = 'only_default'
        
    def forward(self, images, labels):
        images, labels = images.to(self.device), labels.to(self.device)
        
        # DatasetAttack
        batch_size = len(images)
        batches = [(images[:batch_size//2], labels[:batch_size//2]),
                   (images[batch_size//2:], labels[batch_size//2:])]
        self.init_attack.feed(model=self.fmodel, inputs=batches[0][0]) # feed 1st batch of inputs
        self.init_attack.feed(model=self.fmodel, inputs=batches[1][0]) # feed 2nd batch of inputs
        criterion = fb.Misclassification(labels)
        init_advs = self.init_attack.run(self.fmodel, images, criterion)
        
        # L2BrendelBethge
        adv_images = self.adversary.run(self.fmodel, images, labels, starting_points=init_advs)
        return adv_images

atk = L2BrendelBethge(model)
atk.save(data_loader=test_loader, save_path="_temp.pt", verbose=True)

🌌 Adversarial-Robustness-Toolbox (ART)

import torch
import torch.nn as nn
import torch.optim as optim

from torchattacks.attack import Attack

import art.attacks.evasion as evasion
from art.classifiers import PyTorchClassifier

class JSMA(Attack):
    def __init__(self, model, theta=1/255, gamma=0.15, batch_size=128):
        super(JSMA, self).__init__("JSMA", model)
        self.classifier = PyTorchClassifier(
                            model=self.model, clip_values=(0, 1),
                            loss=nn.CrossEntropyLoss(),
                            optimizer=optim.Adam(self.model.parameters(), lr=0.01),
                            input_shape=(1, 28, 28), nb_classes=10)
        self.adversary = evasion.SaliencyMapMethod(classifier=self.classifier,
                                                   theta=theta, gamma=gamma,
                                                   batch_size=batch_size)
        self.target_map_function = lambda labels: (labels+1)%10
        self._attack_mode = 'only_default'
        
    def forward(self, images, labels):
        # ART expects NumPy arrays, so convert the tensors before calling generate.
        adv_images = self.adversary.generate(images.cpu().numpy(),
                                             self.target_map_function(labels).cpu().numpy())
        return torch.tensor(adv_images).to(self.device)

atk = JSMA(model)
atk.save(data_loader=test_loader, save_path="_temp.pt", verbose=True)

Contributing

Any kind of contribution is always appreciated, and if you find an error, please do not hesitate to point it out. 😊

If you would like to propose a new attack, please refer to CONTRIBUTING.md!

Other Recommended Packages and Sites