Agent Zero + Venice + Morpheus: A Walkthrough
How to set up Agent Zero with Venice AI for inference and Morpheus for decentralised compute. A fully sovereign AI agent stack, step by step.
What we are building
Agent Zero is an open-source AI agent framework. Venice is a privacy-focused AI inference provider. Morpheus provides decentralised compute infrastructure. Combining them gives you an autonomous AI agent running on infrastructure you control, with no centralised intermediary seeing your prompts or data.
This is what a sovereign AI agent stack looks like in practice.
Prerequisites
- A VPS or local machine with Docker installed
- A Venice AI API key (from venice.ai)
- Basic terminal and Docker familiarity
- Approximately 30 minutes
I run this on both a RackNerd VPS in Texas and locally on my Mac Studio. The VPS gives me 24/7 uptime for persistent tasks. Local gives me faster iteration and full sovereignty over the inference layer when I point it at Ollama. You can start with either, or run both.
Step 1: Set up the VPS (if using one)
If running locally, skip to Step 2.
# SSH into your VPS
ssh root@your-vps-ip
# Install Docker if not present
curl -fsSL https://get.docker.com | sh
# Verify
docker --version
A basic VPS with 4GB RAM is sufficient for Agent Zero. You do not need a GPU. The inference happens on Venice's API or your local Ollama instance; the agent itself is lightweight.
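If you want to confirm the box has enough headroom before installing anything, a quick resource check is worth the ten seconds (the 4GB figure is a guideline, not something Agent Zero enforces):
# Check memory, CPU cores and free disk
free -h
nproc
df -h /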
Step 2: Clone and configure Agent Zero
# Clone the repository
git clone https://github.com/frdel/agent-zero.git
cd agent-zero
# Copy the example environment file
cp example.env .env
Edit the .env file to configure your API endpoints:
# Open the config
nano .env
Agent Zero uses four separate model roles. Each can point at a different provider and model, which gives you flexibility to balance cost, speed, and capability.
| Role | What it does | Recommended model |
|---|---|---|
| Chat model | Main reasoning. The brain that handles your tasks, plans steps, writes code | Largest model you can afford (llama-3.3-70b on Venice, or GPT-4o) |
| Utility model | Background tasks: summarisation, formatting, tool output parsing | Smaller/cheaper model is fine (llama-3.3-70b or a 7B model locally) |
| Browser model | Reads and interprets web pages when the agent browses | Needs decent comprehension (llama-3.3-70b works well) |
| Embedding model | Converts text to vectors for memory retrieval and RAG | Dedicated embedding model (text-embedding-bge-m3 on Venice, or nomic-embed-text locally) |
Set these in the .env file:
# === Venice AI (privacy-focused, uncensored) ===
CHAT_API_BASE=https://api.venice.ai/api/v1
CHAT_API_KEY=your-venice-api-key
CHAT_MODEL=llama-3.3-70b
UTILITY_API_BASE=https://api.venice.ai/api/v1
UTILITY_API_KEY=your-venice-api-key
UTILITY_MODEL=llama-3.3-70b
BROWSER_API_BASE=https://api.venice.ai/api/v1
BROWSER_API_KEY=your-venice-api-key
BROWSER_MODEL=llama-3.3-70b
EMBEDDING_API_BASE=https://api.venice.ai/api/v1
EMBEDDING_API_KEY=your-venice-api-key
EMBEDDING_MODEL=text-embedding-bge-m3
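Before starting the container, it is worth checking that the key and base URL actually respond. Assuming Venice follows the standard OpenAI-compatible routes implied by the base URL above, listing models is a quick smoke test:
# List the models your key can access (substitute your real key)
curl -s https://api.venice.ai/api/v1/models \
  -H "Authorization: Bearer your-venice-api-key"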
You do not have to use the same provider for every role. A common setup is Venice for chat (quality matters) and local Ollama for utility and embeddings (saves API credits):
# === Hybrid: Venice for chat, local Ollama for the rest ===
CHAT_API_BASE=https://api.venice.ai/api/v1
CHAT_API_KEY=your-venice-api-key
CHAT_MODEL=llama-3.3-70b
UTILITY_API_BASE=http://host.docker.internal:11434/v1
UTILITY_API_KEY=not-needed
UTILITY_MODEL=mistral
BROWSER_API_BASE=http://host.docker.internal:11434/v1
BROWSER_API_KEY=not-needed
BROWSER_MODEL=mistral
EMBEDDING_API_BASE=http://host.docker.internal:11434/v1
EMBEDDING_API_KEY=not-needed
EMBEDDING_MODEL=nomic-embed-text
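If you go the hybrid route, make sure Ollama is reachable and the models are pulled before you start the agent. A minimal check, assuming a default Ollama install listening on port 11434 (host.docker.internal in the config above resolves to this host from inside the container):
# Pull the models the hybrid config expects
ollama pull mistral
ollama pull nomic-embed-text
# Confirm the OpenAI-compatible endpoint answers
curl -s http://localhost:11434/v1/models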
For a fully local setup, point all four roles at Ollama. For a fully sovereign setup on Venice, use Venice for all four. Your prompts are anonymised through their proxy and nothing is stored. See our Venice review for the full privacy model assessment, including the distinction between anonymisation and confidentiality.
The embedding model is the one people most often misconfigure. It must be an embedding model, not a chat model. Venice offers text-embedding-bge-m3. For Ollama, pull nomic-embed-text with ollama pull nomic-embed-text.
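A quick way to confirm the role points at a real embedding model is to request a vector directly and check that you get a list of floats back rather than chat text. The sketch below uses Ollama's OpenAI-compatible embeddings route; the same request shape should work against Venice with text-embedding-bge-m3, assuming its API follows the same convention:
# Expect a JSON response containing an array of floats
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "sanity check"}'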
Step 3: Run Agent Zero with Docker
# Build and start the container
docker compose up -d
# Check it is running
docker ps
# View logs
docker logs agent-zero -f
Agent Zero exposes a web interface on port 50001 by default. Access it at http://your-vps-ip:50001 or http://localhost:50001 if running locally.
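If the page does not load, confirm the UI is actually listening before debugging anything else:
# Expect HTTP response headers if the web UI is up
curl -I http://localhost:50001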
Step 4: Test the agent
Open the web interface and give the agent a task:
Research the current MOR token price and calculate my daily earnings
if I have 50 stETH staked in the Morpheus capital contract.
The agent will:
- Search for current MOR price data
- Look up total stETH staked in Morpheus
- Calculate the proportional daily emissions
- Return a formatted answer
This is a simple example. Agent Zero can handle multi-step tasks including web research, file operations, code execution and API calls. The key difference from a centralised agent: your prompts go through Venice (encrypted, no logging) or your own Ollama instance, not through OpenAI or Anthropic.
Step 5: Connect to Morpheus compute (optional)
Instead of using Venice’s hosted API, you can route inference through the Morpheus compute network. This means your agent’s inference requests are served by decentralised compute providers earning MOR tokens. See our Morpheus Lumerin Node tutorial if you want to run the other side of this, providing compute rather than consuming it.
The Morpheus compute endpoint works as an OpenAI-compatible API. Update your .env:
CHAT_API_BASE=https://compute.mor.org/v1
CHAT_API_KEY=your-morpheus-api-key
CHAT_MODEL=llama-3.1-70b
This adds latency compared to Venice or local Ollama. The trade-off is that you are using genuinely decentralised infrastructure and contributing to network demand that drives MOR token value.
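Because the endpoint is OpenAI-compatible, you can sanity-check your key with a one-off request before switching the agent over. The route and model name below simply mirror the config above; treat this as a sketch rather than official Morpheus documentation:
# One-off test request against the Morpheus compute router
curl -s https://compute.mor.org/v1/chat/completions \
  -H "Authorization: Bearer your-morpheus-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b", "messages": [{"role": "user", "content": "ping"}]}'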
Step 6: Make it persistent
For a VPS deployment, ensure the agent restarts automatically:
# Docker compose already handles restart policy
# Verify in docker-compose.yml:
# restart: unless-stopped
# To update Agent Zero later
cd agent-zero
git pull
docker compose down
docker compose up -d --build
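If you want updates to happen unattended, a crontab entry that runs the same sequence works. This is a convenience sketch, not part of the Agent Zero docs; adjust the path to wherever you cloned the repo:
# Rebuild from the latest code every Sunday at 04:00 (add via crontab -e)
0 4 * * 0 cd /root/agent-zero && git pull && docker compose down && docker compose up -d --build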
The three inference options compared
| Option | Privacy | Speed | Cost | Sovereignty |
|---|---|---|---|---|
| Local Ollama | Full | Fast | Free (after hardware) | Complete |
| Venice AI | High (encrypted, no logging) | Fast | Per-token pricing | High |
| Morpheus compute | High | Moderate | MOR per request | Complete |
I use all three depending on the task. Local Ollama on my Mac Studio for sensitive work and rapid iteration. The prompt never leaves my machine. Venice for tasks that need larger models than my hardware supports, or when I want uncensored output. Morpheus compute when I want to test the network and contribute to demand. Running Agent Zero on both a VPS and locally means I can keep persistent agents running remotely while experimenting freely on my local instance.
What the agent can do
With the sovereign stack running, your Agent Zero instance can:
- Research topics and synthesise information from the web
- Execute code (Python, shell) to process data
- Interact with files on the host system
- Call external APIs on your behalf
- Chain multiple steps together autonomously
- Operate 24/7 without supervision
What it cannot do yet: interact with DeFi protocols directly, manage wallets or execute on-chain transactions. These capabilities are on the Morpheus roadmap but are not production-ready in Agent Zero today. If someone tells you their agent is autonomously trading on your behalf via decentralised infrastructure, verify that claim carefully.
Troubleshooting
Agent not responding. Check Docker logs: docker logs agent-zero. The most common issue is an incorrect API key or unreachable API endpoint.
Slow responses. If using Morpheus compute, expect higher latency than centralised APIs. If using local Ollama, check that your model fits in available memory.
Container not starting. Ensure Docker is running and the port is not in use: docker ps and lsof -i :50001.
Venice API errors. Verify your API key is active and has credits. Check Venice’s status page for outages.