📓 nodes/20230809150646-glow.org by @tekakutli-org ☆

ANTI REGULATION - GLOWS
JGAAP: De-Anonymousposting (stylometry)
- counter: remove all punctuation marks and make everything lower case
- counter: purposeful grammatical and spelling errors (to counter geographic-location profiling)
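A minimal sketch of the first counter (strip punctuation, lower-case everything). The function name `flatten_style` is made up for illustration, and it only handles ASCII punctuation; word choice and sentence length still leak style, so this is not a claim that it defeats JGAAP.

```python
import string


def flatten_style(text: str) -> str:
    """Lower-case the text and drop ASCII punctuation marks."""
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    stripped = text.lower().translate(table)
    # Collapse the whitespace runs left behind by removed punctuation.
    return " ".join(stripped.split())


if __name__ == "__main__":
    print(flatten_style("Well, I don't think so -- do You?"))
```

Non-ASCII punctuation (curly quotes, "¿", etc.) passes through untouched and would need its own handling.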
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
- categorizes prompt strings and estimates how likely each prompt is to be unsafe, across different attack styles
Towards Implicit Prompt For Text-To-Image Models
- implicit prompts: hint at a target without explicitly mentioning it
- censorship can be bypassed with implicit prompts

DETECTING HUMAN

SPOOFING

Transparency Attacks How Imperceptible Image Layers Can Fool AI Perception
- dataset poisoning: uses the attack to mislabel a collection, via a hidden grayscale background layer
- cause mislabeling
- use cases: evading facial recognition and surveillance, digital watermarking, content filtering, dataset curating, automotive and drone autonomy, forensic evidence tampering, and retail product misclassification

parent: stable_diffusion
Ambient Diffusion: train diffusion models given only corrupted images as input (copyrightless-ed)
Seeing the World through Your Eyes (getting image from reflection of the eyes)
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
- successfully undo the safety training using LoRA
Cheating Suffix Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- MMP-Attack: confuses the model into adding a target object to the image content while simultaneously removing the original object

AEROBLADE Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
- does not require any training
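A rough sketch of the reconstruction-error idea: pass an image through an autoencoder and flag it as generated when the reconstruction is unusually close. Everything here is an assumption for illustration: `toy_reconstruct` stands in for a real latent-diffusion autoencoder, plain mean squared error stands in for whatever distance the paper uses, and `threshold` is a hypothetical parameter.

```python
from typing import Callable, List

Image = List[float]  # flattened pixel values, a stand-in for a real tensor


def reconstruction_error(image: Image, reconstruct: Callable[[Image], Image]) -> float:
    """Mean squared error between an image and its autoencoder reconstruction."""
    recon = reconstruct(image)
    return sum((a - b) ** 2 for a, b in zip(image, recon)) / len(image)


def looks_generated(image: Image, reconstruct: Callable[[Image], Image],
                    threshold: float) -> bool:
    """Flag the image when the autoencoder reproduces it almost perfectly."""
    # Low error is the signal: the autoencoder can already represent the image.
    return reconstruction_error(image, reconstruct) < threshold


def toy_reconstruct(image: Image) -> Image:
    """Toy autoencoder (assumption): snaps each pixel to one decimal place."""
    return [round(p, 1) for p in image]
```

No training step appears anywhere: the only moving parts are an existing autoencoder and a threshold.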
Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?

erasing concepts https://note.com/gcem156/n/n9f74d7d1417c
- Using stable diffusion eraser to replace a concept in one model with the same concept from another
- Forget-Me-Not Learning to Forget in Text-to-Image Diffusion Models
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
  - new approach without issues
ORES Open-vocabulary Responsible Visual Synthesis
- synthesize images avoiding concepts but following query as much as possible
- using an LLM
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
- erases or edits concepts in diffusion models (DMs), using 0.5% extra parameters of the DM
EraseDiff Erasing Data Influence in Diffusion Models
SepME Separable Multi-Concept Erasure from Diffusion Models
- avoids unlearning substantial information
MACE Mass Concept Erasure in Diffusion Models
- scales the erasure scope up to 100 concepts while balancing generality and specificity
Robust Concept Erasure Using Task Vectors
- concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs
- diverse inversion: used to estimate the required strength of the TV edit
  - apply a TV edit only to a subset of the model weights
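A sketch of a task-vector edit restricted to a subset of weights, assuming the usual task-arithmetic recipe (TV = fine-tuned weights minus base weights, then subtract a scaled TV from the base). The names `task_vector`, `erase_with_task_vector`, `strength`, and `layers` are hypothetical, and weights are plain lists instead of tensors. How `strength` is chosen (diverse inversion) is not shown.

```python
from typing import Dict, Iterable, List

Weights = Dict[str, List[float]]  # layer name -> flattened weights


def task_vector(base: Weights, finetuned: Weights) -> Weights:
    """TV = weights after fine-tuning on the concept minus the base weights."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])]
            for name in base}


def erase_with_task_vector(base: Weights, tv: Weights, strength: float,
                           layers: Iterable[str]) -> Weights:
    """Subtract strength * TV from the base, but only in the chosen layers."""
    edited = {name: list(values) for name, values in base.items()}
    for name in layers:
        edited[name] = [w - strength * d for w, d in zip(base[name], tv[name])]
    return edited


if __name__ == "__main__":
    base = {"attn": [1.0, 2.0], "mlp": [0.0]}
    tuned = {"attn": [2.0, 4.0], "mlp": [1.0]}
    tv = task_vector(base, tuned)
    print(erase_with_task_vector(base, tv, strength=0.5, layers=["attn"]))
```

Layers not listed in `layers` are copied through unchanged, which is the "subset of the model weights" part.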

TOFU A Task of Fictitious Unlearning for LLMs
- so that the model truly behaves as if it was never trained on the forgotten data
Scissorhands Scrub Data Influence via Connection Sensitivity in Networks
- retrains the trimmed model through an optimization process
- seeking parameters that preserve information on the remaining data while discarding information related to the forgetting data

and watermarking
CopyRNeRF Protecting the CopyRight of Neural Radiance Fields
- replacing the original color representation in NeRF with a watermarked color representation
Tree-Ring Watermarks Fingerprints for Diffusion Images that are Invisible and Robust
- patterns hidden in Fourier space
WOUAF Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
- model fingerprinting that assigns responsibility for the generated images
FLIRT Feedback Loop In-context Red Teaming
- automatic framework that exposes vulnerabilities to unsafe and inappropriate content generation
ZoDiac Robust Image Watermarking using Stable Diffusion
- injects a watermark into the trainable latent space, where it can be reliably detected in the latent vector
RAW A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees

MimicDiffusion Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
- purification technique that approximates the diffusion model's generative process with the clean image as input
DisDet Exploring Detectability of Backdoor Attack on Diffusion Models
- detecting poisoned input noise, 100% detection rate for trojan triggers