New Research Finds AI Models Can Be Poisoned With Few Samples
A joint study by Anthropic and the UK AI Safety Institute found just 250 malicious documents can backdoor models regardless of their size.
A joint study by the AI company Anthropic, the UK's AI Safety Institute (AISI), and the Alan Turing Institute has revealed that large language models (LLMs) are significantly more vulnerable to "data poisoning" attacks than previously understood. The research found that a "near-constant number" of malicious documents can successfully insert a backdoor into models of any size.
A Scalable Vulnerability
The study's key finding is that as few as 250 poisoned documents can reliably compromise models ranging from 600 million to 13 billion parameters. This directly contradicts the prevailing assumption that an attacker would need to poison a significant …
Archive Access
This article is older than 24 hours. Create a free account to access our 7-day archive.