AIM Intelligence Blog

https://aim-intelligence.vercel.app/blog
Latest insights on AI security, red teaming, LLM safety, and enterprise AI from the AIM Intelligence research and engineering teams.
Wed, 10 Jun 2026 07:12:03 GMT

BadHost (CVE-2026-48710): The Starlette Vulnerability Threatening Millions of AI Agents
https://aim-intelligence.vercel.app/blog/badhost-cve-2026-48710
CVE-2026-48710, dubbed BadHost, is a critical host header injection vulnerability in the Starlette Python web framework that allows unauthenticated attackers to bypass path-based authentication. With 325 million weekly downloads, Starlette underpins FastAPI, vLLM, LiteLLM, and virtually every Python MCP server, putting millions of AI agents at risk. Sysdig has documented the first in-the-wild case of an LLM agent autonomously exploiting a related vulnerability to exfiltrate an AWS database in under two minutes.
yonggyu kim | SECURITY | Wed, 10 Jun 2026 07:12:03 GMT

Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment
https://aim-intelligence.vercel.app/blog/tool-mediated-belief-injection
When we deploy language models with access to external tools, we dramatically expand their capabilities. However, tool access also introduces new attack surfaces that differ fundamentally from traditional prompt injection. We document how adversarially crafted tool outputs can establish false premises that persist and compound across a conversation.
Siddhant | RESEARCH | Sun, 30 Nov 2025 00:00:00 GMT

MisalignmentBench: How We Social Engineered LLMs Into Breaking Their Own Alignment
https://aim-intelligence.vercel.app/blog/misalignment-bench
We got frontier models to lie, manipulate, and self-preserve. Not through prompt injection or jailbreaks. We deployed them in contextually rich scenarios with specific roles and guidelines. The models broke their own alignment trying to navigate the situations we created.
Siddhant | RESEARCH | Thu, 14 Aug 2025 00:00:00 GMT

How ELITE Reveals Dangerous Weaknesses in Vision-Language AI
https://aim-intelligence.vercel.app/blog/elite-vlm-safety
As AI systems evolve to process images and text together, the risks grow exponentially. ELITE doesn't just measure whether a model is 'safe'; it evaluates how dangerous its outputs could be with precision that rivals human reviewers.
Eugene Choi | RESEARCH | Thu, 29 May 2025 00:00:00 GMT

Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice
https://aim-intelligence.vercel.app/blog/pressure-point
In a simulated earthquake response scenario, Claude 4 Opus was given conflicting rules. When pressured by authority, it reversed its ethical decision and recommended letting a critical patient die to optimize an efficiency score.
Siddhant Panpatil | RESEARCH | Mon, 26 May 2025 00:00:00 GMT

Exploiting MCP: Emerging Security Threats in Large Language Models (LLMs)
https://aim-intelligence.vercel.app/blog/exploiting-mcp
Discover how attackers exploit vulnerabilities in the Model Context Protocol (MCP) to manipulate Large Language Models (LLMs), steal data, and disrupt operations. Learn real-world attack scenarios and defense strategies.
Eugene Choi | SECURITY | Wed, 21 May 2025 00:00:00 GMT

Making AI Safer with SPA-VL: A New Dataset for Ethical Vision-Language Models
https://aim-intelligence.vercel.app/blog/spa-vl-dataset
SPA-VL is a meticulously designed dataset that sets a new standard for safety alignment in VLMs, incorporating diversity, feedback, and real-world relevance to ensure AI systems are both powerful and ethical.
Eugene Choi | RESEARCH | Wed, 27 Nov 2024 00:00:00 GMT

The Hidden Threat: Understanding Indirect Prompt Injection in LLMs
https://aim-intelligence.vercel.app/blog/indirect-prompt-injection
Indirect Prompt Injection (IPI) is a sophisticated attack that manipulates how LLM-integrated applications process external data, causing them to misinterpret maliciously crafted inputs as commands.
Sejin | SECURITY | Mon, 25 Nov 2024 00:00:00 GMT

Introducing AI Safety Benchmark v0.5: MLCommons' Initiative
https://aim-intelligence.vercel.app/blog/ai-safety-benchmark
AI Safety Benchmark v0.5 is a proof-of-concept benchmark designed to evaluate the safety of text-based generative language models, providing a structured approach to assess potential risks.
Eugene Choi | RESEARCH | Mon, 18 Nov 2024 00:00:00 GMT

Indirect Prompt Injection Attacks Against Web Agents
https://aim-intelligence.vercel.app/blog/indirect-prompt-injection-web-agent
Explore how EIA, AdvWeb, and WIPI attack methods exploit vulnerabilities in VLM-powered web agents, revealing serious security concerns for AI systems that interact with web environments.
Jiankimr | SECURITY | Fri, 15 Nov 2024 00:00:00 GMT

AIM Red Team: Leveraging Psychological Personas for Advanced LLM Jailbreaking Strategies
https://aim-intelligence.vercel.app/blog/aim-red-team
Explore how psychological persona-based approaches can be used to test LLM vulnerabilities through single-turn and multi-turn jailbreaking scenarios based on Big Five personality traits.
Hyunjun Kim | RESEARCH | Fri, 15 Nov 2024 00:00:00 GMT

Refining Vision-Language Model Benchmarks: Base Query Generation and Toxicity Analysis
https://aim-intelligence.vercel.app/blog/vlm-benchmarks-toxicity
For existing VLM Safety benchmarks, there are cases where the text alone is sufficiently informative without the image. We explore base query generation and toxicity measurement methods.
Eugene Choi | RESEARCH | Sat, 09 Nov 2024 00:00:00 GMT

Defending Web Agents: Advanced Security Strategies through AdvWeb and BrowserART
https://aim-intelligence.vercel.app/blog/defending-web-agents
Explore cutting-edge methodologies for identifying and mitigating vulnerabilities in VLM-powered web agents, including the AdvWeb attack framework and BrowserART red teaming toolkit.
Sejin | SECURITY | Sat, 09 Nov 2024 00:00:00 GMT

AIM RED TEAM: Insights from the KAIST Lab Meeting on Persona-Based Jailbreak Strategies
https://aim-intelligence.vercel.app/blog/kaist-lab-meeting
This week, we held a productive meeting with the KAIST lab to refine the direction of our ongoing research project and to solidify our experimental design. The focus was on integrating psychological approaches with LLMs to design jailbreak prompts.
Hyunjun Kim | RESEARCH | Fri, 08 Nov 2024 00:00:00 GMT

Evaluating Text-based VLM Attack Methods: In-depth Look at Figstep
https://aim-intelligence.vercel.app/blog/figstep-vlm-attacks
To evaluate VLM Safety, it is essential to develop a secure model that incorporates the unique characteristics of VLMs. We analyze Figstep and RTVLM datasets to assess typographic visual prompt attacks.
Doehyeon | RESEARCH | Sat, 02 Nov 2024 00:00:00 GMT