Security researchers from Trail of Bits have introduced a novel attack method that exploits artificial intelligence (AI) models through manipulated images. The technique, developed by Kikimora Morozova and Suha Sabi Hussain, targets the way many AI systems preprocess uploaded images, potentially allowing attackers to gain access to sensitive user data.

The attack builds on earlier work, first outlined in a 2020 study by researchers at TU Braunschweig, which described “image scaling attacks” against machine learning systems. Trail of Bits has adapted the principle to contemporary AI applications and demonstrated its effectiveness in real-world scenarios. Most AI systems automatically downscale uploaded images to save computing resources, using common resampling algorithms such as nearest-neighbor, bilinear, and bicubic interpolation.
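To get a feel for why the choice of resampling filter matters, the short Python sketch below (using the Pillow library; the input file name and target size are placeholder assumptions) downscales the same image with all three filters so the outputs can be compared side by side.

```python
# Sketch: the same image downscaled with different resampling filters can
# yield visibly different results. Assumes Pillow >= 9.1 is installed and
# "upload.png" stands in for an image a user would submit.
from PIL import Image

TARGET_SIZE = (256, 256)  # hypothetical size an AI pipeline might downscale to

img = Image.open("upload.png").convert("RGB")

for name, resample in [
    ("nearest", Image.Resampling.NEAREST),
    ("bilinear", Image.Resampling.BILINEAR),
    ("bicubic", Image.Resampling.BICUBIC),
]:
    small = img.resize(TARGET_SIZE, resample=resample)
    small.save(f"downscaled_{name}.png")
    # Comparing the three outputs shows why an attacker must target the
    # specific filter a given pipeline uses.
```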

During downscaling, these algorithms can cause patterns deliberately embedded in the original image to surface in the reduced version. A manipulated image might carry instructions that are invisible to the naked eye at full resolution but become legible once the image is downscaled. In one example, a dark area of an image turned red after processing, exposing previously hidden black text. The AI model then treated this text as legitimate user input, allowing harmful commands to execute without the user’s knowledge.
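The sketch below is a toy model of this idea, not Trail of Bits’ tooling: it defines its own simplified nearest-neighbor downscaler and shows that an image can be crafted so that the downscaled result is entirely determined by a sparse payload that barely changes the full-resolution picture. A real attack would have to be calibrated to the exact resampling implementation of the target pipeline.

```python
# Toy model of an image-scaling attack against a *simplified* nearest-neighbor
# downscaler defined here in the sketch itself, not any specific library.
import numpy as np

FACTOR = 8  # hypothetical downscale factor (e.g. 256x256 -> 32x32)

def nearest_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Keep only the centre pixel of every factor-by-factor block."""
    off = factor // 2
    return img[off::factor, off::factor]

# Benign-looking cover image: mid-grey everywhere.
cover = np.full((256, 256), 128, dtype=np.uint8)

# "Payload" the attacker wants the model to see after downscaling:
# here a dark rectangle standing in for rendered instruction text.
payload = np.full((32, 32), 128, dtype=np.uint8)
payload[8:24, 4:28] = 0

# Embed the payload only at the pixels the downscaler will sample.
# Those pixels make up 1/64th of the image, so the crafted version still
# looks mostly like the plain cover, yet they fully determine the output.
crafted = cover.copy()
off = FACTOR // 2
crafted[off::FACTOR, off::FACTOR] = payload

assert np.array_equal(nearest_downscale(crafted, FACTOR), payload)
print("downscaled image equals the hidden payload")
```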

In practical tests, the researchers used the Gemini CLI tool to extract calendar data from a Google account and forward it to an external email address. The attack also affects other platforms, including the Google Assistant on Android, Google’s Gemini models, and Vertex AI Studio.

To illustrate the risks further, the researchers have released an open-source tool that generates images specifically designed to exploit different downscaling methods. This tool allows for a clearer understanding of how these attacks can be orchestrated.

In response to these vulnerabilities, the researchers recommend several defensive measures. They suggest limiting the size of images that users can upload and providing a preview of the downscaled versions before final confirmation. Additionally, they emphasize that safety-critical actions should not be automated; instead, they should always require explicit user confirmation, particularly when extracting text from images.
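A minimal sketch of how these mitigations might fit together is shown below; the size limit, model input size, resampling filter, and function name are illustrative assumptions rather than details from any specific product.

```python
# Sketch of the recommended mitigations: cap upload dimensions and show the
# user the exact downscaled image before anything is sent. Uses Pillow.
from PIL import Image

MAX_UPLOAD_DIM = 4096          # hypothetical upload size limit
MODEL_INPUT_SIZE = (512, 512)  # hypothetical size the model pipeline expects

def prepare_upload(path):
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_UPLOAD_DIM:
        raise ValueError("image exceeds the allowed upload size")

    # Downscale with the same filter the serving pipeline would use,
    # so the preview matches what the model actually sees.
    preview = img.resize(MODEL_INPUT_SIZE, resample=Image.Resampling.BICUBIC)
    preview.show()  # let the user inspect the downscaled version

    answer = input("Send this exact image to the model? [y/N] ")
    if answer.strip().lower() != "y":
        return None  # safety-critical step: no confirmation, no upload
    return preview
```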

The most crucial protective strategy, according to the researchers, is to establish a secure system design that is inherently resistant to prompt injection attacks. By implementing systematic protective mechanisms, developers can prevent multimodal AI applications from becoming conduits for data abuse. The implications of these findings are significant, as they underscore the importance of robust cybersecurity measures in the rapidly evolving landscape of AI technology.