A Taxonomy and Terminology of Adversarial Machine Learning
Background
Machine learning (ML) components are increasingly being deployed in critical applications, from computer vision to cybersecurity. However, the data-driven nature of ML introduces new security challenges compared to traditional knowledge-based AI systems. Adversaries can exploit vulnerabilities in ML models through a variety of adversarial attacks, posing significant risks to the integrity, availability, and confidentiality of these systems.
Key Attack Types
The taxonomy of adversarial machine learning (AML) attacks identifies several important attack types: Data Access Attacks, Poisoning Attacks, Evasion Attacks, Oracle Attacks, and Membership Inference Attacks. These attacks can target both the training and testing stages of the machine learning pipeline.
Training Stage
Data Access Attacks
Adversaries can gain unauthorized access to the training data and use it to build a substitute model, which can then be used to develop and test adversarial examples before launching them against the original target model.
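A minimal sketch of this idea, assuming the adversary has simply copied a labeled training set; the data, model, and random probing strategy below are hypothetical stand-ins, not a prescribed attack recipe:

```python
# Minimal sketch: an adversary who has copied the victim's labeled training
# data fits a local "substitute" model and probes it offline for inputs it
# misclassifies; such inputs often transfer to the original target model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_stolen = rng.normal(size=(500, 10))          # stolen training inputs (illustrative)
y_stolen = (X_stolen[:, 0] > 0).astype(int)    # stolen labels (illustrative rule)

substitute = LogisticRegression().fit(X_stolen, y_stolen)

# Probe the substitute with random perturbations of a known input until the
# predicted label flips; the perturbed input is a candidate adversarial example.
x = X_stolen[0].copy()
for _ in range(100):
    candidate = x + rng.normal(scale=0.3, size=x.shape)
    if substitute.predict(candidate.reshape(1, -1))[0] != y_stolen[0]:
        print("found a candidate adversarial example on the substitute")
        break
```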
Poisoning Attacks
Poisoning attacks, also known as causative attacks, are adversarial attacks in which the adversary alters the training data or the model, directly or indirectly, to degrade the performance of the target model. This can be done through indirect poisoning, where the adversary poisons the data before pre-processing, or direct poisoning, which involves data injection, data manipulation (of labels or inputs), or logic corruption. Poisoning attacks target the training phase of the machine learning pipeline, aiming to compromise the integrity of the model by exploiting vulnerabilities in the learning process.
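As a minimal sketch of direct poisoning by label manipulation, the snippet below flips a fraction of one class's training labels in a hypothetical scikit-learn setup and compares test accuracy with and without the flips (the data, model, and flip fraction are illustrative only):

```python
# Minimal sketch of a label-flipping poisoning attack (direct poisoning by
# label manipulation). A fraction of class-1 training labels is flipped to 0
# before fitting, which biases the learned decision boundary and typically
# lowers accuracy on clean test data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, :2].sum(axis=1) > 0).astype(int)     # hypothetical labeling rule
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_after_poisoning(flip_fraction: float) -> float:
    y_poisoned = y_tr.copy()
    ones = np.where(y_poisoned == 1)[0]
    n_flip = int(flip_fraction * len(ones))
    idx = rng.choice(ones, size=n_flip, replace=False)
    y_poisoned[idx] = 0                         # flip the chosen class-1 labels
    model = LogisticRegression().fit(X_tr, y_poisoned)
    return model.score(X_te, y_te)

print("clean   :", accuracy_after_poisoning(0.0))
print("poisoned:", accuracy_after_poisoning(0.4))
```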
Testing Stage
Evasion Attacks
Evasion attacks are adversarial attacks that occur during the testing phase of a machine learning system, where inputs are manipulated to evade correct classification by the model. The attacker uses optimization techniques to find small perturbations that cause significant misclassification. Common algorithms for crafting these perturbations include L-BFGS, the Fast Gradient Sign Method (FGSM), and the Jacobian-based Saliency Map Attack (JSMA), all of which require access to the target model, or a substitute model, in order to compute gradients.
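For instance, FGSM perturbs an input in the direction of the sign of the loss gradient, x_adv = x + eps * sign(dL/dx). A minimal PyTorch sketch of that single step, using a small untrained model and a random input purely for illustration:

```python
# Minimal FGSM sketch in PyTorch: perturb the input by eps times the sign of
# the loss gradient with respect to the input.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 784, requires_grad=True)   # stand-in for a flattened image
y = torch.tensor([3])                        # assumed true label
eps = 0.1                                    # L-infinity perturbation budget

loss = loss_fn(model(x), y)
loss.backward()
x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()

print("clean prediction      :", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

In practice the same step is applied against a trained model, and eps controls the trade-off between how imperceptible the perturbation is and how often the attack succeeds.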
Oracle Attacks
In Oracle Attacks, an adversary queries a model through its API and observes the corresponding outputs, allowing them to train a substitute model that approximates the target model. This substitute can then be used to generate adversarial examples for Evasion Attacks. Oracle Attacks include Extraction Attacks, which recover model parameters, and Inversion Attacks, which reconstruct training data, potentially compromising privacy.
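A minimal sketch of the substitute-model step, where `query_target` is a hypothetical stand-in for the target's prediction API and the hidden model is simulated locally purely for illustration:

```python
# Minimal sketch of an oracle attack: query a black-box model through its
# prediction API, then train a local substitute on the (query, answer) pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# The hidden target model (unknown to the adversary, reachable only via its API).
X_private = rng.normal(size=(1000, 5))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
_target = DecisionTreeClassifier(max_depth=5).fit(X_private, y_private)

def query_target(inputs: np.ndarray) -> np.ndarray:
    """Hypothetical prediction API: returns labels only, no model internals."""
    return _target.predict(inputs)

# Adversary: generate synthetic queries, collect the API's answers,
# and fit a substitute model on the returned labels.
queries = rng.normal(size=(5000, 5))
answers = query_target(queries)
substitute = LogisticRegression().fit(queries, answers)

agreement = (substitute.predict(queries) == answers).mean()
print(f"substitute agrees with target on {agreement:.0%} of queries")
```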
Membership Inference Attacks
These attacks determine whether a particular data point was used to train the machine learning model, which can reveal sensitive information about the individuals or records represented in the training set.
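One common heuristic, sketched below with illustrative data, thresholds the model's confidence on a candidate point: overfit models tend to assign noticeably higher confidence to points they were trained on than to unseen points.

```python
# Minimal sketch of a confidence-threshold membership inference heuristic:
# flag a point as a training member if the model's predicted probability for
# its true label exceeds a threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] > 0).astype(int)
X_train, y_train = X[:200], y[:200]      # members
X_out, y_out = X[200:], y[200:]          # non-members

model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

def confidence(inputs, labels):
    probs = model.predict_proba(inputs)
    return probs[np.arange(len(labels)), labels]

threshold = 0.9   # hypothetical cut-off chosen by the attacker
print("flagged as members (true members):", (confidence(X_train, y_train) > threshold).mean())
print("flagged as members (non-members) :", (confidence(X_out, y_out) > threshold).mean())
```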
Defense Mechanisms
To mitigate the risks posed by adversarial attacks, researchers have developed several defense mechanisms: Data Encryption and Sanitization, Robust Statistics, Adversarial Training, Gradient Masking, and Differential Privacy. These defense mechanisms aim to improve the overall security and assurance of ML components, making them more robust, resilient, and secure against a range of adversarial attacks.
| Defense Mechanism | Description |
|---|---|
| Data Encryption and Sanitization | Encrypting training data and sanitizing it to remove potential malicious samples can help protect against data access and poisoning attacks. |
| Robust Statistics | Employing robust statistical techniques during model training can improve the model’s resilience to noisy or adversarial training data. |
| Adversarial Training | Incorporating adversarial examples into the training process can improve the model’s robustness to evasion attacks (see the sketch after this table). |
| Gradient Masking | Obfuscating the gradients used in the optimization process can make it more difficult for adversaries to craft effective adversarial examples. |
| Differential Privacy | Applying differential privacy techniques can help protect the privacy of the training data and mitigate membership inference attacks. |
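
As a minimal sketch of the Adversarial Training row above, the loop below crafts FGSM perturbations against the current model on each pass and trains on the clean and perturbed inputs together; the PyTorch model and data are illustrative stand-ins, not a hardened recipe:

```python
# Minimal adversarial-training sketch in PyTorch: on each pass, craft FGSM
# perturbations of the batch and update the model on clean + perturbed inputs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
eps = 0.1                                  # assumed perturbation budget

X = torch.randn(256, 20)                   # illustrative data
y = (X[:, 0] > 0).long()

for epoch in range(10):
    # Craft FGSM adversarial examples against the current model.
    X_req = X.clone().requires_grad_(True)
    loss_fn(model(X_req), y).backward()
    X_adv = (X_req + eps * X_req.grad.sign()).detach()

    # Update on the clean and adversarial inputs together.
    optimizer.zero_grad()
    inputs = torch.cat([X, X_adv])
    labels = torch.cat([y, y])
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
```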
Consequences of Adversarial Attacks
The consequences of successful adversarial attacks on ML models can be severe, leading to integrity, availability, and confidentiality violations:
- Integrity Violations: Adversarial attacks can cause ML models to misclassify inputs, leading to reduced confidence in the model’s outputs or targeted misclassifications.
- Availability Violations: Adversarial attacks can disrupt the normal operation of ML models, rendering them unusable or unavailable for their intended purposes.
- Confidentiality Violations: Adversarial attacks can lead to the extraction of sensitive information about the target ML model, such as its architecture, parameters, or training data, potentially resulting in privacy breaches.
These consequences can have significant impacts on the applications and systems that rely on the security and trustworthiness of ML components, underscoring the importance of developing robust defense mechanisms to mitigate the risks of adversarial attacks.





