:PROPERTIES:
:ID: b3f064b1-696c-48f0-af8c-e77b9280b053
:END:
#+title: glow
#+filetags: :neuralnomicon:
#+SETUPFILE: https://fniessen.github.io/org-html-themes/org/theme-readtheorg.setup
- [[id:3bdb575f-527b-4cf6-bbdb-134e903e1bf5][ANTI REGULATION - GLOWS]]
- JGAAP: [[https://github.com/evllabs/JGAAP][De-Anonymous]] posting (stylometry)
  - counter: remove all punctuation marks and make everything lower case
  - counter: purposeful grammatical and spelling errors (antigeographical location)
- [[https://arxiv.org/pdf/2402.16822.pdf][Rainbow]] Teaming: Open-Ended Generation of Diverse Adversarial Prompts
  - categorizing strings, determining the likelihood of a prompt being unsafe using different attack styles
- [[https://arxiv.org/abs/2403.02118][Towards Implicit]] Prompt For Text-To-Image Models
  - implicit prompts: hint at a target without explicitly mentioning it
  - censorship can be bypassed with implicit prompts
* DETECTING HUMAN
:PROPERTIES:
:ID: c8b0c87f-b5e4-4720-a50b-253bd7f3a329
:END:
- [[https://arxiv.org/abs/2301.00250][DensePose From WiFi]]: three WiFis to get the pose of a human
- [[https://twitter.com/_akhaliq/status/1722110265649508642][SoundCam]]: A Dataset for Finding Humans Using Room Acoustics
* SPOOFING
- [[https://arxiv.org/abs/2401.15817][Transparency Attacks]]: How Imperceptible Image Layers Can Fool AI Perception
  - dataset poisoning: using the attack to mislabel a collection, in a background (hidden) layer in grayscale
  - causes mislabeling
  - use cases: evading facial recognition and surveillance, digital watermarking, content filtering, dataset curating, automotive and drone autonomy, forensic evidence tampering, and retail product misclassifying
** VIDEO GLOW
- [[https://browse.arxiv.org/abs/2402.01239][PRIME]]: Protect Your Videos From Malicious Editing
* DIFFUSION CENSOR
:PROPERTIES:
:ID: 3bdb575f-527b-4cf6-bbdb-134e903e1bf5
:END:
- parent: [[id:c7fe7e79-73d3-4cc7-a673-2c2e259ab5b5][stable_diffusion]]
- [[https://twitter.com/giannis_daras/status/1663710057400524800][Ambient]] Diffusion: train diffusion models given only *corrupted* images as input (copyrightless-ed)
- [[https://twitter.com/_akhaliq/status/1669536531298516993][Seeing the World]] through Your Eyes (getting an image from the reflection in the eyes)
- [[https://twitter.com/_akhaliq/status/1719559635101692011][LoRA Fine-tuning]] Efficiently Undoes Safety Training in Llama 2-Chat 70B
  - successfully undoes the safety training using LoRA
- [[https://arxiv.org/abs/2402.01369][Cheating]] [[https://github.com/ydc123/MMP-Attack][Suffix]]: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
  - MMP-Attack: confuses a model into adding a target object to the image content while simultaneously removing the original object
** DETECTING AI GENERATED
- [[https://arxiv.org/abs/2401.17879][AEROBLADE]]: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
  - does not require any training
- [[https://arxiv.org/abs/2402.03214][Organic or]] Diffused: Can We Distinguish Human Art from AI-generated Images?
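A minimal sketch of the reconstruction-error idea named in the AEROBLADE title, under loudly stated assumptions: the "autoencoder" here is a hypothetical toy (pairwise averaging and duplication over a flat list of pixel values), mean squared error stands in for whatever distance the paper actually uses, and every name (~toy_encode~, ~toy_decode~, ~looks_generated~) and the threshold are illustrative, not taken from the paper or its code. The only point is the decision rule: an image that the autoencoder reconstructs almost perfectly is flagged as likely generated, and no detector is trained.

#+begin_src python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]  # toy stand-in for an image: a flat list of pixel values
Codec = Tuple[Callable[[Image], List[float]], Callable[[Sequence[float]], List[float]]]


def toy_encode(image: Image) -> List[float]:
    """Hypothetical lossy encoder: average each pair of neighbouring pixels."""
    if len(image) % 2 != 0:
        raise ValueError("toy encoder expects an even number of pixels")
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def toy_decode(latent: Sequence[float]) -> List[float]:
    """Hypothetical decoder: duplicate every latent value back into two pixels."""
    out: List[float] = []
    for value in latent:
        out.extend([value, value])
    return out


def reconstruction_error(image: Image, encode, decode) -> float:
    """Mean squared error between the image and its encode/decode round trip."""
    recon = decode(encode(image))
    return sum((a - b) ** 2 for a, b in zip(image, recon)) / len(image)


def looks_generated(image: Image, autoencoders: Sequence[Codec], threshold: float) -> bool:
    """Flag the image if ANY candidate autoencoder reconstructs it within the threshold."""
    best = min(reconstruction_error(image, enc, dec) for enc, dec in autoencoders)
    return best <= threshold


# An image that lies in the decoder's output space versus one that does not.
generated_like = toy_decode([0.2, 0.8, 0.5])
real_like = [0.1, 0.9, 0.3, 0.7, 0.0, 1.0]
codecs = [(toy_encode, toy_decode)]
print(looks_generated(generated_like, codecs, threshold=0.01))
print(looks_generated(real_like, codecs, threshold=0.01))
#+end_src

In a real setting the toy codec would be replaced by the autoencoder of the diffusion model under suspicion, and the threshold would have to be calibrated on known real images.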
** ERASING CONCEPTS
:PROPERTIES:
:ID: 7f6f5bc1-ca59-4557-b908-0345e8127cde
:END:
- [[https://arxiv.org/pdf/2303.07345.pdf][erasing]] [[https://github.com/rohitgandikota/erasing][concepts]] https://note.com/gcem156/n/n9f74d7d1417c
- [[https://www.reddit.com/r/StableDiffusion/comments/125dli7/using_stable_diffusion_eraser_to_replace_a/][Using stable diffusion]] eraser to replace a concept in one model with the same concept from another
- [[https://arxiv.org/abs/2303.17591][Forget-Me-Not]]: [[https://github.com/SHI-Labs/Forget-Me-Not][Learning to]] Forget in Text-to-Image Diffusion Models
- [[https://arxiv.org/abs/2312.12807][All but]] One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
  - new approach without issues
- [[https://twitter.com/_akhaliq/status/1696495635426681240][ORES]]: Open-vocabulary Responsible Visual Synthesis
  - synthesizes images that avoid the given concepts while following the query as closely as possible
  - using an LLM
- [[https://github.com/Con6924/SPM][One-dimensional]] Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
  - solution to erase or edit concepts for diffusion models (DMs), 0.5% extra parameters of the DM
- [[https://arxiv.org/abs/2401.05779][EraseDiff]]: Erasing Data Influence in Diffusion Models
- [[https://arxiv.org/abs/2402.05947][SepME]]: Separable Multi-Concept Erasure from Diffusion Models
  - avoids unlearning substantial information
- [[https://github.com/Shilin-LU/MACE][MACE]]: Mass Concept Erasure in Diffusion Models
  - successfully scales the erasure scope up to 100 concepts while balancing generality and specificity
- [[https://arxiv.org/abs/2404.03631][Robust]] Concept Erasure Using Task Vectors
  - concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs
  - diverse inversion: used to estimate the required strength of the TV edit
  - apply a TV edit only to a subset of the model weights
*** LLM
- [[https://twitter.com/_akhaliq/status/1745643293839327268][TOFU]]: A Task of Fictitious Unlearning for LLMs
  - so that the model truly behaves as if it had never been trained on the forgotten data
- [[https://arxiv.org/abs/2401.06187][Scissorhands]]: Scrub Data Influence via Connection Sensitivity in Networks
  - retrains the trimmed model through an optimization process
    - seeking parameters that preserve information on the remaining data while discarding information related to the forgetting data
** FINGERPRINTING
- and watermarking
- [[https://twitter.com/_akhaliq/status/1683345535913308160][CopyRNeRF]]: Protecting the CopyRight of Neural Radiance Fields
  - replaces the original color representation in NeRF with a watermarked color representation
- [[https://twitter.com/_akhaliq/status/1664073210487267335][Tree-Ring Watermarks]]: Fingerprints for Diffusion Images that are Invisible and Robust
  - patterns hidden in Fourier space
- [[https://twitter.com/_akhaliq/status/1683678703048613888][WOUAF]]: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
  - model fingerprinting that assigns responsibility for the generated images
- [[https://twitter.com/_akhaliq/status/1689132104515244032][FLIRT]]: Feedback Loop In-context Red Teaming
  - automatic framework that exposes unsafe and inappropriate content generation and vulnerabilities
- [[https://arxiv.org/abs/2401.04247][ZoDiac]]: Robust Image Watermarking using Stable Diffusion
  - injects a watermark into the trainable latent space, which is reliably detected in the latent vector
- [[https://arxiv.org/abs/2403.18774][RAW]]: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
** ANTI-GLOW
- [[https://arxiv.org/abs/2312.04802][MimicDiffusion]]: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
  - purification technique, approximates the clean image as input
- [[https://arxiv.org/abs/2402.02739][DisDet]]: Exploring Detectability of Backdoor Attack on Diffusion Models
  - detecting poisoned input noise, 100% detection rate for trojan triggers
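A rough illustration of what "detecting poisoned input noise" could look like. This is *not* the DisDet method. It is a much simpler stand-in that assumes clean diffusion input noise is standard Gaussian and that a trojan trigger shifts its mean or variance. The function names (~noise_stats~, ~looks_poisoned~) and both tolerances are hypothetical choices made only for this sketch.

#+begin_src python
import random
import statistics
from typing import Sequence, Tuple


def noise_stats(noise: Sequence[float]) -> Tuple[float, float]:
    """Return the sample mean and population variance of a flat noise vector."""
    return statistics.fmean(noise), statistics.pvariance(noise)


def looks_poisoned(noise: Sequence[float], mean_tol: float = 0.1, var_tol: float = 0.2) -> bool:
    """Flag noise whose mean or variance strays from N(0, 1) by more than the tolerances.

    Assumption (not from the paper): a trigger shows up as a shift in these two moments.
    """
    mean, var = noise_stats(noise)
    return abs(mean) > mean_tol or abs(var - 1.0) > var_tol


# Clean Gaussian noise versus the same noise with a constant offset as a toy "trigger".
rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
triggered = [x + 0.5 for x in clean]
print(looks_poisoned(clean))
print(looks_poisoned(triggered))
#+end_src

A trigger that preserves the first two moments would pass this check unnoticed, which is one reason a real detector needs a richer distribution comparison than this sketch.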