Most AI jailbreak discussions focus on prompts, but this research looks at images instead.

A Florida International University team developed a technique called JaiLIP that makes tiny pixel-level changes to images, small enough that humans largely can’t detect them. In testing, the altered images increased harmful outputs from a vision-language model and outperformed previous image-based jailbreak methods.
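To give a rough sense of what "tiny pixel-level changes" means, here is a minimal sketch that measures how far a perturbed image drifts from the original. It is not the JaiLIP method. The random noise is only a stand-in for a perturbation an attacker would optimize against a model, and the names `perturb`, `max_abs_diff`, `psnr`, and `epsilon` are hypothetical, chosen for illustration.

```python
import math
import random


def perturb(pixels, epsilon, seed=0):
    """Add integer noise in [-epsilon, epsilon] to each 8-bit pixel value.

    Assumption: random noise stands in for an optimized perturbation.
    Results are clamped to the valid 0-255 range.
    """
    rng = random.Random(seed)
    return [min(255, max(0, p + rng.randint(-epsilon, epsilon))) for p in pixels]


def max_abs_diff(a, b):
    """Largest per-pixel change between two images (the L-infinity distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


def psnr(a, b):
    """Peak signal-to-noise ratio in dB for 8-bit images; higher means more similar."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


# A flat list of grayscale values stands in for an image.
original = [i % 256 for i in range(64 * 64)]
altered = perturb(original, epsilon=2)

print("max per-pixel change:", max_abs_diff(original, altered))
print("PSNR (dB):", round(psnr(original, altered), 1))
```

With a budget of 2 out of 255 intensity levels per pixel, no pixel moves by more than 2, and the PSNR stays above 40 dB, a level at which differences are generally hard to see. That is the sense in which a perturbation can be invisible to a person while still changing what the model receives.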

What caught my attention is that the attack doesn’t rely on prompt engineering. It suggests that images themselves can become an attack vector for multimodal AI systems.