Category
Attack Scenario
Guidance
Prompt Attacks: Crafting adversarial prompts that allow an adversary to influence the behavior of the model, and hence its output, in ways that were not intended by the application.
Prompt injections that are invisible to victims and change the state of the victim’s account or any of their assets.
In Scope
Prompt injections into any tools in which the response is used to make decisions that directly affect victim users.
In Scope
Prompt or preamble extraction in which a user is able to extract the initial prompt used to prime the model, only when sensitive information is present in the extracted preamble.
In Scope
Using a product to generate violative, misleading, or factually incorrect content in your own session, e.g. ‘jailbreaks’. This includes ‘hallucinations’ and factually inaccurate responses. Google’s generative AI products already have a dedicated reporting channel for these types of content issues.
Out of Scope
Training Data Extraction: Attacks that are able to successfully reconstruct verbatim training examples that contain sensitive information. Also called membership inference.
Training data extraction that reconstructs items used in the training data set that leak sensitive, private information.
In Scope
Extraction that reconstructs nonsensitive/public information.
Out of Scope
Manipulating Models: An attacker able to covertly change the behavior of a model such that they can trigger pre-defined adversarial behaviors.
Adversarial output or behavior that an attacker can reliably trigger via specific input in a model owned and operated by Google (“backdoors”). Only in scope when a model’s output is used to change the state of a victim’s account or data.
In Scope
Attacks in which an attacker manipulates the training data of the model to influence the model’s output in a victim’s session according to the attacker’s preference. Only in scope when a model’s output is used to change the state of a victim’s account or data.
In Scope
Adversarial Perturbation: Inputs provided to a model that result in a deterministic but highly unexpected output from the model.
Contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain.
In Scope
Contexts in which a model’s incorrect output or classification does not pose a compelling attack scenario or feasible path to Google or user harm.
Out of Scope
Model Theft / Exfiltration: AI models often include sensitive intellectual property, so we place a high priority on protecting these assets. Exfiltration attacks allow attackers to steal details about a model such as its architecture or weights.
Attacks in which the exact architecture or weights of a confidential/proprietary model are extracted.
In Scope
Attacks in which the architecture and weights are not extracted precisely, or when they’re extracted from a non-confidential model.
Out of Scope
If you find a flaw in an AI-powered tool other than what is listed above, you can still submit it, provided that it meets the qualifications listed on our program page.
A bug or behavior that clearly meets our qualifications for a valid security or abuse issue.
In Scope
Using an AI product to do something potentially harmful that is already possible with other tools. For example, finding a vulnerability in open source software (already possible using publicly available static analysis tools) or producing the answer to a harmful question when the answer is already available online.
Out of Scope
Consistent with our program, issues that we already know about are not eligible for reward.
Out of Scope
Potential copyright issues: findings in which products return content appearing to be copyright-protected. Google’s generative AI products already have a dedicated reporting channel for these types of content issues.
Out of Scope