Research

New Research Finds AI Models Can Be Poisoned With Few Samples

A joint study by Anthropic and the UK AI Safety Institute found just 250 malicious documents can backdoor models regardless of their size.

Olivia Sharp 1 min read 732 views
Free
A study by Anthropic and the UK AI Safety Institute found that just 250 poisoned documents can create a backdoor in large language models, regardless of model size.

A joint study by the AI company Anthropic, the UK's AI Safety Institute (AISI), and the Alan Turing Institute has revealed that large language models (LLMs) are significantly more vulnerable to "data poisoning" attacks than previously understood. The research found that a "near-constant number" of malicious documents can successfully insert a backdoor into models of any size.

A Scalable Vulnerability

The study's key finding is that as few as 250 poisoned documents can reliably compromise models ranging from 600 million to 13 billion parameters. This directly contradicts the prevailing assumption that an attacker would need to poison a significant …

Archive Access

This article is older than 24 hours. Create a free account to access our 7-day archive.

Share this article

Related Articles