A recent report reveals that AI systems gradually forget their safety protocols during long interactions, increasing the risk of harmful or inappropriate responses. Researchers found that a few simple prompts can break through most artificial intelligence guardrails.
Cisco Tests Chatbots Across Multiple Companies
Cisco analyzed large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The team ran 499 conversations using “multi-turn attacks,” in which a user poses a series of escalating follow-up questions designed to wear down a chatbot’s safety filters. Each dialogue included five to ten exchanges.
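To make the mechanics concrete, the sketch below shows how a multi-turn probe of this kind is typically structured: every new prompt and reply is appended to the same conversation history, so the model has to reapply its safety rules against an ever-longer context. This is an illustrative example only, not Cisco’s actual methodology; `send_to_model` is a hypothetical stand-in for any chat-completion API, and the prompts passed in would be placeholders rather than the study’s real test set.

```python
def send_to_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a chat-completion API call.

    In a real harness this would forward the full message history to a
    model endpoint and return the assistant's reply as a string.
    """
    raise NotImplementedError("plug in a real chat-completion client here")


def run_multi_turn_probe(prompts: list[str]) -> list[str]:
    """Feed five to ten escalating prompts into a single conversation.

    Because each reply is kept in the shared history, the model evaluates
    every new request against a growing context -- the setting in which
    the Cisco study observed guardrails weakening.
    """
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    replies = []
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
        reply = send_to_model(messages)
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```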
The researchers tracked how many prompts caused chatbots to reveal unsafe or illegal details, including private corporate data or misinformation. On average, chatbots gave out malicious information in 64 percent of multi-question conversations, compared with only 13 percent of single-question ones. Attacks on Mistral’s Large Instruct model succeeded 93 percent of the time, while Google’s Gemma held the rate to around 26 percent.
Open Models Shift Safety Responsibility
Cisco warned that multi-turn attacks could spread harmful content or let hackers steal confidential information. The study observed that AI systems often fail to apply safety guidelines consistently in longer chats, allowing attackers to refine their requests and bypass controls.
Mistral, along with Meta, Google, OpenAI, and Microsoft, publishes open-weight models, meaning the safety parameters the models were trained with are accessible to the public. Cisco reported that these open systems typically ship with fewer built-in safety features, leaving users responsible for maintaining protection when they customize the models.
Cisco added that Google, Meta, Microsoft, and OpenAI say they have strengthened defenses against malicious fine-tuning. Despite these assurances, AI firms still face criticism for weak safety systems that enable criminal misuse. In one case, Anthropic confirmed that criminals had exploited its Claude model to conduct large-scale data theft and extortion, demanding ransoms exceeding $500,000 (€433,000).

