On the Vulnerability of concept erasure in diffusion models

Ongoing work where we have developed an adversarial attack algorithm that reliably recalls erased concepts from models that have been “unlearned” via a wide range of techniques.

Output with prompt: "sprintrare jfextreements swept exhibition bans cowboy break ....@ dracula detohuffpost ️ maintained motherpedestriadirond?@ organization clenmatisse meets shore items worm governments dumped overwhelmed gogh cluster ,inspires fleeing sabha ncpol hogs :(( monet quoted grower 🎨 sovereign pilgrimage unidentified recalled a happynewyear enightstern@bvsending mom -# vangogh gogh powerful gogh masterpiece 🥰"

For the work in progress see our Arxiv preprint