2024-11-27 Science & Technology
|
Israeli researchers discover method to hack AI, force it to reveal sensitive information
|
[Ynet] Researchers from the Israeli cybersecurity company Knostic have unveiled a groundbreaking method to exploit large language models (LLMs), such as ChatGPT, by leveraging what they describe as an "impulsiveness" characteristic in AI.
Dubbed flowbreaking, this attack bypasses safety mechanisms to coax the AI into revealing restricted information or providing harmful guidance — responses it was programmed to withhold.
The technique can be used to extract sensitive corporate information such as salaries, private communications and trade secrets.
The findings, published Tuesday, detail how the attack manipulates AI systems into prematurely generating and displaying responses before their safety protocols can intervene. These responses — ranging from sensitive data such as a boss's salary to harmful instructions — are then momentarily visible on the user’s screen before being deleted by the AI’s safety systems. However, tech-savvy users who record their interactions can still access the fleetingly exposed information.
HOW THE ATTACK WORKS
Unlike older methods such as jailbreaking, which relied on linguistic tricks to bypass safeguards, flowbreaking targets internal components of LLMs, exploiting gaps in the interaction between those components.
Knostic researchers identified two primary vulnerabilities enabled by this method, illustrated in the sketch after the list:
Second Thoughts: AI models sometimes stream answers to users before safety mechanisms fully evaluate the content. In this scenario, a response is displayed and quickly erased, but not before the user sees it.
Stop and Roll: By halting the AI mid-response, users can force the system to display partially generated answers that have bypassed safety checks.
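Knostic has not published exploit code, but both weaknesses come down to a timing gap between generating text and reviewing it. The following is a minimal, purely illustrative Python sketch of that gap, assuming a toy streaming model and a toy safety check; every function, string and delay here is hypothetical and stands in for whatever a real pipeline does.
```python
import asyncio

# Illustrative simulation of the timing gap described above: tokens are
# streamed to the user as they are generated, while the safety check only
# evaluates the finished answer. This is not Knostic's code or any vendor's
# API; all names, strings and delays are made up.

async def stream_answer(prompt: str):
    """Pretend LLM that streams tokens as soon as they are produced."""
    for token in ["The", " restricted", " answer", " is", " ..."]:
        await asyncio.sleep(0.05)
        yield token

async def moderate(full_text: str) -> bool:
    """Pretend safety check; it can only judge the completed answer."""
    await asyncio.sleep(0.5)
    return "restricted" not in full_text

async def answer(prompt: str, stop_early: bool = False) -> None:
    shown = []
    async for token in stream_answer(prompt):
        shown.append(token)
        print(token, end="", flush=True)   # the user already sees this text
        if stop_early and len(shown) == 3:
            print("\n[user hit stop; the post-generation check never runs]")
            return                          # "Stop and Roll"
    if not await moderate("".join(shown)):
        # "Second Thoughts": retraction arrives after the text was displayed
        print("\n[response retracted, but it was already on screen]")

asyncio.run(answer("tell me something restricted"))                   # shown, then retracted
asyncio.run(answer("tell me something restricted", stop_early=True))  # never checked at all
```
Running the sketch prints the "restricted" text before the retraction message appears, which is the window a screen recorder could capture; the stop_early path shows how halting generation mid-response can skip the final check entirely.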
"LLMs operate in real-time, which inherently limits their ability to ensure airtight security," said Gadi Evron, CEO and co-founder of Knostic. "This is why layered, context-aware security is critical, especially in enterprise environments."
IMPLICATIONS FOR AI SECURITY
Knostic’s findings have far-reaching implications for the safe deployment of AI systems in industries such as finance, health care, and technology. The company warns that, without stringent safeguards, even well-intentioned AI implementations like Microsoft Copilot and Glean could inadvertently expose sensitive data or create other vulnerabilities.
Evron emphasized the importance of "need-to-know" identity-based safeguards and robust interaction monitoring. "AI safety isn’t just about blocking bad actors. It’s about ensuring these systems align with the organization’s operational context," he said.
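The article does not describe how such safeguards are implemented in practice. As a rough sketch of the idea, assuming a hypothetical entitlements table and topic tagger, a need-to-know gate could hold the complete answer on the server and release it only after checking the requesting user's permissions, rather than streaming unreviewed text to the screen.
```python
# Hypothetical need-to-know gate in front of an enterprise LLM assistant.
# Nothing here reflects Knostic's actual product; all names are illustrative.

ENTITLEMENTS = {
    "alice": {"engineering-docs"},
    "bob":   {"engineering-docs", "hr-compensation"},
}

def topics_in(answer: str) -> set[str]:
    """Toy topic tagger; a real deployment would use classifiers or DLP labels."""
    tags = set()
    if "salary" in answer.lower():
        tags.add("hr-compensation")
    return tags

def release(user: str, answer: str) -> str:
    """Hold the full answer server-side and release it only if the user is
    entitled to every topic it touches; never stream unchecked text."""
    needed = topics_in(answer)
    allowed = ENTITLEMENTS.get(user, set())
    if needed - allowed:
        return "This request involves information you are not authorized to see."
    return answer

print(release("alice", "Your boss's salary is ..."))  # blocked
print(release("bob",   "Your boss's salary is ..."))  # allowed
```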
|
Posted by Grom the Reflective 2024-11-27 00:00||