OpenAI flags prompt injection as persistent risk for AI browsers

OpenAI said this method reflects a common AI safety testing practice, in which agents are built to identify edge cases and stress-test systems rapidly in simulation.

By Storyboard18| Dec 23, 2025 11:02 AM

OpenAI flags prompt injection as persistent risk for AI browsers

OpenAI said this method reflects a common AI safety testing practice, in which agents are built to identify edge cases and stress-test systems rapidly in simulation.

OpenAI has said that AI-powered browsers are likely to remain vulnerable to prompt injection attacks, even as it works to strengthen the security of its Atlas AI browser, raising broader concerns about how safely AI agents can operate on the open web.

Prompt injection attacks, which manipulate AI agents into following malicious instructions hidden in webpages or emails, are not expected to be fully eliminated, OpenAI stated in a blog post published on Monday. The company informed that such attacks are comparable to scams and social engineering on the internet and acknowledged that the introduction of “agent mode” in ChatGPT Atlas has expanded the overall security threat surface.

OpenAI launched the ChatGPT Atlas browser in October, after which security researchers quickly demonstrated that simple text embedded in platforms such as Google Docs could alter the behaviour of the underlying browser. On the same day, Brave published a blog post stating that indirect prompt injection represents a systemic challenge for AI-powered browsers, including Perplexity’s Comet.

OpenAI is not alone in its assessment. Earlier this month, the UK’s National Cyber Security Centre warned that prompt injection attacks targeting generative AI applications may never be completely mitigated and could expose websites to data breaches. The agency advised cyber security professionals to focus on reducing the risk and impact of such attacks rather than assuming they can be entirely prevented.

As reported by TechCrunch, OpenAI said it views prompt injection as a long-term AI security challenge that will require continuous strengthening of defences. The company stated that its approach centres on a proactive and rapid-response cycle, which it said has shown early promise in identifying new attack strategies internally before they are exploited in real-world environments.

This approach broadly aligns with positions taken by rivals such as Anthropic and Google, which have also stated that defences against prompt-based attacks must be layered and continuously stress-tested. Google’s recent efforts have focused on architectural and policy-level controls for agentic systems.

Where OpenAI differs is in its use of what it described as an “LLM-based automated attacker”. The company informed that it has trained a bot using reinforcement learning to simulate the behaviour of a hacker attempting to inject malicious instructions into an AI agent.

The bot is able to test attacks in a simulated environment before deploying them, with the simulator modelling how the target AI would reason and respond. OpenAI stated that the bot can analyse these responses, refine the attack and repeat the process multiple times. Because this system has visibility into the target AI’s internal reasoning, the company believes it can uncover vulnerabilities more quickly than external attackers.

OpenAI said this method reflects a common AI safety testing practice, in which agents are built to identify edge cases and stress-test systems rapidly in simulation. The company informed that its reinforcement learning-trained attacker has been able to steer agents into executing complex and harmful workflows over extended sequences of actions and has uncovered new attack strategies not identified through human red-teaming exercises or external reports.

While acknowledging that prompt injection cannot be fully secured against, OpenAI stated that it is relying on large-scale testing and faster patch cycles to strengthen its systems before vulnerabilities are exploited in the real world. The company stating that protecting Atlas users from prompt injection is a top priority.