JGAAP: De-anonymizing anonymous posts (stylometry)
counter: remove all punctuation marks and convert everything to lowercase
counter: introduce deliberate grammatical and spelling errors (to avoid revealing geographic location)
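A minimal sketch of the first countermeasure, using only the Python standard library. The helper name `normalize_for_stylometry` is hypothetical, and `string.punctuation` covers ASCII punctuation only, so curly quotes and other Unicode marks would survive.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_for_stylometry(text: str) -> str:
    """Remove ASCII punctuation and lowercase the text (hypothetical helper)."""
    return text.translate(_STRIP_PUNCT).lower()


if __name__ == "__main__":
    print(normalize_for_stylometry("Hello, World! It's me."))  # prints "hello world its me"
```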
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
categorizes prompts by attack style and scores the likelihood that each prompt is unsafe
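The categorize-and-score idea can be sketched as an archive with one cell per (category, attack style) pair that keeps only the highest-scoring prompt, in the spirit of the quality-diversity (MAP-Elites) search Rainbow Teaming builds on. The category names, attack styles, and scores below are hypothetical; in practice a judge model would supply the unsafety score.

```python
from typing import Dict, Optional, Tuple

# Each archive cell is indexed by (risk category, attack style) and holds the
# best prompt found so far together with its unsafety score.
Cell = Tuple[str, str]
Archive = Dict[Cell, Tuple[str, float]]


def update_archive(archive: Archive, category: str, attack_style: str,
                   prompt: str, score: float) -> bool:
    """Keep `prompt` if its cell is empty or it beats the incumbent's score."""
    cell = (category, attack_style)
    incumbent: Optional[Tuple[str, float]] = archive.get(cell)
    if incumbent is None or score > incumbent[1]:
        archive[cell] = (prompt, score)
        return True
    return False


if __name__ == "__main__":
    archive: Archive = {}
    update_archive(archive, "fraud", "role_play", "prompt A", 0.4)
    update_archive(archive, "fraud", "role_play", "prompt B", 0.7)     # replaces A
    update_archive(archive, "fraud", "misspellings", "prompt C", 0.2)  # new cell
    print(archive)
```

Keeping one elite per cell is what pushes the search toward diverse attacks rather than many variants of the single strongest one.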
Towards Implicit Prompt For Text-To-Image Models
implicit prompts: hint at a target without explicitly mentioning it
censorship can be bypassed with implicit prompts
DensePose From WiFi
uses WiFi signals (three antennas) to estimate human pose
SoundCam: A Dataset for Finding Humans Using Room Acoustics
Transparency Attacks: How Imperceptible Image Layers Can Fool AI Perception
dataset poisoning: hides a grayscale image in a background (hidden) layer so that a whole collection gets mislabeled
goal: cause AI perception to mislabel images
use cases: evading facial recognition and surveillance, digital watermarking, content filtering, dataset curation, automotive and drone autonomy, forensic evidence tampering, and retail product misclassification
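A minimal sketch of how one grayscale layer with an alpha channel can show different content depending on the background it is composited over, assuming the attack exploits a viewer rendering transparent pixels on white while a model pipeline renders them on black. The construction (solve `alpha * gray = hidden` and `alpha * gray + (1 - alpha) = visible`) and the function names are illustrative, not the paper's code.

```python
import numpy as np


def make_transparent_layer(visible, hidden):
    """Return (gray, alpha) that looks like `visible` on white and `hidden` on black.

    Requires 0 <= hidden <= visible <= 1 elementwise (the hidden image must be darker).
    """
    visible = np.asarray(visible, dtype=float)
    hidden = np.asarray(hidden, dtype=float)
    if np.any(hidden > visible):
        raise ValueError("hidden must be no brighter than visible")
    alpha = 1.0 - (visible - hidden)
    # Where alpha == 0 the pixel is fully transparent and its gray value is irrelevant.
    safe_alpha = np.where(alpha > 0, alpha, 1.0)
    gray = np.where(alpha > 0, hidden / safe_alpha, 0.0)
    return gray, alpha


def composite(gray, alpha, background):
    """Standard "over" compositing of a grayscale layer onto a uniform background."""
    return alpha * gray + (1.0 - alpha) * background
```

Compositing over `1.0` (white) reproduces `visible`; compositing over `0.0` (black) reproduces `hidden`.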
PRIME: Protect Your Videos From Malicious Editing
parent: stable_diffusion
Ambient Diffusion: trains diffusion models given only corrupted images as input (relevant to copyright, since such models memorize less of their training images)
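A minimal sketch of the "further corruption" idea behind training on corrupted data, assuming the corruption is random pixel dropping: the model sees only a further-masked copy of the already-corrupted image but is scored on every pixel of the original observation. The `predict` callable, keep probability, and noise level are hypothetical placeholders; the real method's network conditioning and noise schedule are omitted.

```python
import numpy as np


def ambient_loss(predict, y, mask_a, keep_prob, sigma, rng):
    """One loss evaluation in the spirit of Ambient Diffusion (sketch).

    y      : corrupted training image, already zero outside mask_a (y = mask_a * x0)
    mask_a : boolean mask of pixels that survived the original corruption
    """
    # Further corrupt: drop some of the pixels that were still observed.
    mask_b = rng.random(mask_a.shape) < keep_prob
    mask_tilde = mask_a & mask_b
    # Add diffusion noise, then show the model only the further-corrupted pixels.
    noisy = y + sigma * rng.standard_normal(y.shape)
    pred = predict(mask_tilde * noisy, mask_tilde, sigma)
    # Supervise on every pixel of the original observation, including those hidden by mask_b.
    residual = mask_a * (pred - y)
    return float(np.sum(residual ** 2) / max(int(mask_a.sum()), 1))
```

Because the loss covers pixels the model could not see, it cannot be minimized by copying the input, which is what forces it to learn the clean distribution.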
Seeing the World through Your Eyes (reconstructing the scene from reflections in the eyes)
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
successfully undoes the safety training using LoRA
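LoRA keeps the pretrained weights frozen and trains only a low-rank update, which is why this kind of fine-tuning is cheap. A minimal NumPy sketch of one LoRA-adapted linear layer; the class name and init scale are illustrative, and `B` starts at zero so the layer initially matches the base model.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, rng: np.random.Generator):
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); output has shape (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # After fine-tuning the update can be folded into W, so inference costs nothing extra.
        return self.weight + self.scale * self.B @ self.A
```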
Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
MMP-Attack: tricks the model into adding a target object to the image content while simultaneously removing the original object
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
does not require any training: generated images are reconstructed by the LDM's autoencoder with lower error than real images
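A minimal sketch of a reconstruction-error detector. The toy "autoencoder" (rounding to a 0.1 grid) stands in for a real LDM autoencoder, plain mean squared error stands in for the paper's distance measure, and the threshold is hypothetical and would need calibration. Taking the minimum over several autoencoders, one per candidate generator, is included as an assumption.

```python
from typing import Callable, Sequence

import numpy as np

Autoencoder = Callable[[np.ndarray], np.ndarray]


def reconstruction_error(image: np.ndarray, autoencoders: Sequence[Autoencoder]) -> float:
    """Smallest mean squared reconstruction error over the candidate autoencoders."""
    return min(float(np.mean((ae(image) - image) ** 2)) for ae in autoencoders)


def looks_generated(image: np.ndarray, autoencoders: Sequence[Autoencoder],
                    threshold: float) -> bool:
    # Images decoded by one of these autoencoders tend to be reconstructed almost exactly.
    return reconstruction_error(image, autoencoders) < threshold
```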
Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?
erasing concepts: https://note.com/gcem156/n/n9f74d7d1417c
Using a Stable Diffusion eraser to replace a concept in one model with the same concept from another model
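Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
new approach that erases the target concept while preserving the rest of the model, avoiding issues of earlier erasure methods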
ORES: Open-vocabulary Responsible Visual Synthesis
synthesizes images that avoid forbidden concepts while following the user query as closely as possible
uses an LLM
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
erases or edits concepts in diffusion models (DMs) while adding only 0.5% extra parameters to the DM
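To see why a one-dimensional adapter is so small, a minimal parameter-count sketch, assuming "one-dimensional" means a rank-1 update `u v^T` on each adapted linear layer. The layer shapes are hypothetical; the real overhead depends on which layers are adapted.

```python
from typing import Iterable, Tuple


def rank1_adapter_overhead(layer_shapes: Iterable[Tuple[int, int]]) -> float:
    """Fraction of extra parameters from a rank-1 adapter u v^T on each (d_out, d_in) layer."""
    base = extra = 0
    for d_out, d_in in layer_shapes:
        base += d_out * d_in
        extra += d_out + d_in  # one column vector u and one row vector v
    return extra / base
```

For a single 768 x 768 layer this gives 1,536 extra parameters against 589,824 weights, about 0.26%.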
EraseDiff: Erasing Data Influence in Diffusion Models
SepME: Separable Multi-Concept Erasure from Diffusion Models
avoids unlearning substantial amounts of unrelated information
MACE: Mass Concept Erasure in Diffusion Models
successfully scales the erasure scope up to 100 concepts while balancing generality and specificity
Robust Concept Erasure Using Task Vectors
concept erasure using Task Vectors (TVs) is more robust to unexpected user inputs
Diverse Inversion: used to estimate the required strength of the TV edit
applies a TV edit only to a subset of the model weights
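A minimal sketch of task-vector erasure with a restricted edit, assuming the task vector comes from fine-tuning the model on the concept to be erased and is then subtracted. Parameter names and the strength value are hypothetical; the paper estimates the strength with Diverse Inversion, which is not shown here.

```python
from typing import Dict, Iterable

import numpy as np

Weights = Dict[str, np.ndarray]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """tau = theta_finetuned - theta_pretrained, per parameter tensor."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def erase_with_task_vector(pretrained: Weights, tau: Weights, strength: float,
                           editable: Iterable[str]) -> Weights:
    """Subtract strength * tau, but only for the parameter names in `editable`."""
    editable = set(editable)
    return {
        name: (w - strength * tau[name]) if name in editable else w.copy()
        for name, w in pretrained.items()
    }
```

Restricting `editable` to a subset of layers limits collateral damage to unrelated concepts while the subtraction still moves the model away from the erased one.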
TOFU: A Task of Fictitious Unlearning for LLMs
goal: make the unlearned model truly behave as if it had never been trained on the forgotten data
Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
retrains the trimmed model through an optimization process
seeking parameters that preserve information on the remaining data while discarding information related to the data to be forgotten
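A minimal sketch of the trimming step, assuming a SNIP-style connection-sensitivity score `|w * dL/dw|` computed from the loss on the data to be forgotten. Zeroing the most sensitive weights stands in for re-initialization, and the follow-up optimization that retrains the trimmed model is omitted; the function name and fraction are illustrative.

```python
import numpy as np


def trim_by_connection_sensitivity(weights: np.ndarray, grad_forget: np.ndarray,
                                   fraction: float) -> np.ndarray:
    """Reset the `fraction` of weights with the largest |w * dL_forget/dw| (sketch)."""
    saliency = np.abs(weights * grad_forget)
    k = int(round(fraction * weights.size))
    trimmed = weights.copy()
    if k > 0:
        # Indices (into the flattened array) of the k most sensitive connections.
        top = np.argpartition(saliency.ravel(), -k)[-k:]
        trimmed.flat[top] = 0.0
    return trimmed
```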
and watermarking
CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields
replacing the original color representation in NeRF with a watermarked color representation
Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
patterns hidden in the Fourier space of the initial noise
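A minimal NumPy sketch of a ring-shaped key written into the Fourier spectrum of a noise array and checked by distance to the key. The radii and key value are hypothetical, a single constant key replaces the per-ring values a real scheme might use, and the diffusion model is skipped entirely: the actual method places the pattern in the initial latent noise and recovers that noise by inverting the sampler before measuring.

```python
import numpy as np


def ring_mask(n: int, r_min: float, r_max: float) -> np.ndarray:
    """Boolean n x n mask of FFT bins whose radial frequency lies in [r_min, r_max)."""
    f = np.fft.fftfreq(n)
    fy, fx = np.meshgrid(f, f, indexing="ij")
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return (radius >= r_min) & (radius < r_max)


def embed_ring(noise: np.ndarray, mask: np.ndarray, key: float) -> np.ndarray:
    """Overwrite the masked Fourier coefficients of `noise` with a constant real key."""
    spectrum = np.fft.fft2(noise)
    spectrum[mask] = key
    # The mask is symmetric under k -> -k, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(spectrum))


def ring_distance(latent: np.ndarray, mask: np.ndarray, key: float) -> float:
    """Mean absolute gap between the masked Fourier coefficients and the key."""
    spectrum = np.fft.fft2(latent)
    return float(np.mean(np.abs(spectrum[mask] - key)))
```

A small `ring_distance` signals the watermark; ordinary Gaussian noise lands far from the key.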
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
model fingerprinting that attributes generated images to the user responsible for them
FLIRT: Feedback Loop In-context Red Teaming
automatic framework that exposes vulnerabilities leading to unsafe and inappropriate content generation
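A minimal sketch of a feedback loop with in-context exemplars, using one simple update strategy (first in, first out): a red-team model generates a prompt from the current exemplars, the target model responds, and prompts that trigger unsafe output replace the oldest exemplar. All three callables are hypothetical stubs, and `seed_exemplars` must be non-empty.

```python
from collections import deque
from typing import Callable, Deque, List


def flirt_fifo(red_lm: Callable[[List[str]], str],
               target: Callable[[str], str],
               is_unsafe: Callable[[str], bool],
               seed_exemplars: List[str],
               steps: int) -> List[str]:
    """Feedback loop: prompts that trigger unsafe output replace the oldest exemplar."""
    exemplars: Deque[str] = deque(seed_exemplars, maxlen=len(seed_exemplars))
    successes: List[str] = []
    for _ in range(steps):
        prompt = red_lm(list(exemplars))  # in-context generation from current exemplars
        output = target(prompt)           # e.g. a text-to-image model
        if is_unsafe(output):             # e.g. a safety classifier
            exemplars.append(prompt)      # deque with maxlen drops the oldest item
            successes.append(prompt)
    return successes
```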
ZoDiac: Robust Image Watermarking using Stable Diffusion
injects a watermark into a trainable latent space; the watermark can be reliably detected in the latent vector
RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
purification technique that approximates the diffusion model's generative process with the clean image as input
DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
detects poisoned input noise; reports a 100% detection rate for trojan triggers