Efficiency to Exploits: Vulnerabilities in Ollama and Vanna AI Automation
- Aastha Thakker
- Oct 28, 2025
- 6 min read
Remember how we learned to use our own AI tools on our computers? Well, there can be bumps in the road! (Local AI is deployed locally instead of on any cloud servers)

Ollama
Cloud security firm Wiz has identified a vulnerability called ‘Probllama’, which is tracked as CVE-2024–37032. This security issue was responsibly disclosed to Ollama’s maintainers and has since been mitigated. Ollama users are encouraged to upgrade their Ollama installation to version 0.1.34 or newer.
Organizations are rushing to use amazing new AI tools, but these tools are still under development. This means they might not have all the safety features yet, like authentication. Because the code is new, it’s easier to find serious vulnerabilities, making them targets for cyberattacks.
Several remote code execution (RCE) vulnerabilities (security flaw that allows hackers to run any code on a targeted computer or server) were found in inference servers like TorchServe, Ray Anyscale, and Ollama. These vulnerabilities could let attackers take control of AI inference servers (Servers that process data and make predictions based on AI models), steal or change AI models, and compromise AI applications.
While vulnerabilities are concerning, the bigger issue is the lack of built-in authentication in these new AI tools. Exposed online, they become easy targets for hackers. Intruders could steal or manipulate AI models or even execute malicious code. Because they lack authentication, these tools should never be exposed online without protective middleware (Software that acts as a bridge between different systems or applications), like using a reverse proxy (A server that forwards client requests to another server, adding a layer of security and protection) with authentication.
Ollama consists of two parts: client and server
1) Server: The server handles the backend operations and exposes several APIs (Application Programming Interfaces). These APIs perform essential functions like:
Pulling a Model from the Registry: This means retrieving the necessary AI model from a central repository where models are stored.
Generating Predictions: Using the retrieved model, the server can generate predictions or responses based on the input (prompt) provided by the user.
Essentially, the server is responsible for all the heavy lifting in terms of processing and data management.
2) Client: The client is the front-end component that users interact with directly. One example of a client interface is a CLI (Command-Line Interface), where users can type commands to interact with the server. The client sends requests to the server and displays the results to the user, making it the user-friendly part of the system.

During their experiments with Ollama, the team discovered a critical security vulnerability in one of its servers. This vulnerability was because insufficient input validation, which opened the door for attackers to exploit a path traversal attack, overwrite arbitrary files and ultimately lead to RCE.
Understanding the Problem:
A vulnerability in the Ollama API server lets attackers exploit the “/api/pull” endpoint by sending specially crafted HTTP requests. This endpoint, used to download models, can be tricked into accepting a malicious model file. This file can:
Corrupt important files on the system.
Allow remote code execution by altering a system file to load harmful code before any program runs.
The risk is low in default Linux setups because the server only accepts local connections. However, in Docker deployments, the server runs with high privileges and is exposed to the internet, making remote exploitation possible. (In Docker installations, this issue becomes highly critical because the server operates with root privileges and defaults to listening on address 0.0.0.0.). The lack of authentication in Ollama makes it easier for attackers to access and tamper with AI models on publicly accessible servers.
Ollama doesn’t have built-in authentication, so it can’t verify who is accessing it by default. To secure it, it’s recommended to use a reverse proxy. A reverse proxy sits between the user and Ollama, making sure only authorized users can access the server by enforcing authentication.
Potential Impact of a Successful Exploit of CVE-2024–37032:
Data Breaches: Unauthorized access could lead to serious data breaches, including Sensitive AI research findings, Intellectual property of organizations or researchers, Personal information of users interacting with Ollama.
System Compromise: Exploiting the vulnerability to run malicious code could severely compromise the system: Disrupt critical AI development projects, Cause data loss or corruption, Use the compromised system to launch further attacks within the network.
Reputational Damage: Organizations and researchers using Ollama could suffer reputational harm if a breach occurs, damaging trust in their AI projects and hindering future collaborations.
Lateral Movement: The compromised Ollama instance could be a gateway for further attacks within the network.
Cryptojacking: Attackers might use the compromised system for cryptocurrency mining, exploiting its resources for their gain.
Securing Your Ollama Deployment

Vanna AI
Vanna AI makes it easy to ask questions of your data in plain English. It then translates your questions into SQL queries and uses charts to show you the answers.
During examination of libraries and frameworks utilizing LLMs for user-facing applications, Vanna.AI library, JFrog identified CVE-2024–5565, a remote code execution vulnerability caused by prompt injection techniques. CVE-2024–5565, assigned a CVSS score of 8.1, involves a prompt injection vulnerability in the ‘ask’ function — a primary API endpoint used for generating SQL queries to run on a database. This flaw can be manipulated to execute arbitrary commands, as highlighted by supply chain security firm JFrog.
LLMs struggle to differentiate between user input and their pre-defined guidelines. This makes them susceptible to manipulation through prompt injection attacks. As a result, developers should not solely rely on pre-prompting as a defense mechanism. More robust security measures are needed.
Unlike traditional systems with separate control and data planes, LLMs treat everything as input. This includes pre-defined instructions, making them vulnerable to user input that can bend or break those instructions and alter the intended context. This means someone could type something tricky that messes with the instructions and makes the LLM misunderstand what it’s supposed to do. It’s like hacking the system with clever wording!
Large language models (LLMs) often rely on guardrails (rulesets), programmed limitations to prevent outputs that are offensive, harmful, or illegal. However, a new technique called Skeleton Key exploits these guardrails. [Unlike simpler multi-turn attacks (Crescendo)] Skeleton Key employs a multi-step strategy to completely bypass the model’s safety measures. This essentially tricks the model into ignoring its safeguards, leaving it vulnerable to any request regardless of its ethical or safety implications.
Crescendo:
This is a multi-turn attack where the attacker gradually steers the conversation over multiple prompts towards a prohibited goal.
Each prompt builds on the previous one, slowly chipping away at the model’s safeguards. It doesn’t achieve a complete bypass.
The attacker aims to achieve a specific outcome, like getting the model to generate harmful content, through a series of related prompts.
Skeleton Key:
This is a multi-step strategy designed to completely bypass the model’s safety measures in one go.
Once successful, the model becomes vulnerable to any request, regardless of its nature.
The attacker aims to trick the model into ignoring its safeguards altogether, opening it up to a wide range of malicious uses.
AI jailbreaks can be achieved indirectly through malicious data in emails or documents, or directly through multi-turn conversations that manipulate the model (like Crescendo).
Vanna AI is a Python library that leverages large language models (LLMs) to bridge the gap between natural language and database interaction. The primary purpose of Vanna AI is to facilitate accurate text-to-SQL conversion, making it easier for users to interact with databases without needing extensive knowledge of SQL.
The core technology behind Vanna AI extends the functionality of LLM using Retrieval-Augmented Generation (RAG) techniques (for understanding this we have to read more, so just remember the name right now) which allows it to produce accurate SQL statements.
We found a library that lets you ask questions in plain English and get answers from a database. It seemed cool, but there could be a security risk where hackers might trick the system into doing bad things.
The surprising part? The issue wasn’t with the main feature, but how it showed the answers as charts. After executing the SQL query, the Vanna library can graphically present the results as charts using Plotly, a Python-based graphical library.
The Plotly code is not static but is generated dynamically via LLM prompting and code evaluation. This eventually allowed the firm to achieve full RCE using a smart prompt with pre-defined constraints.
Impact of the Prompt Injection Vulnerability in Vanna AI Library
Remote Code Execution (RCE): Attackers can execute arbitrary code on the server, leading to potential full system compromise.
Data Breach: Unauthorized access to sensitive data stored or processed by the affected system, leading to potential data leaks.
System Downtime: Exploits can cause system crashes or make services unavailable, leading to operational disruptions.
Financial Loss: Costs associated with incident response, system restoration, and potential legal fines due to data breaches.
Reputation Damage: Loss of trust from customers and stakeholders due to compromised security, potentially affecting business relationships and market position.
Real-Time Suggestions to Protect Against Prompt Injection Vulnerabilities




Comments