Agent Zero + Venice + Morpheus: A Walkthrough
How to set up Agent Zero with Venice AI for inference and Morpheus for decentralised compute. A fully sovereign AI agent stack, step by step.
What we are building
Agent Zero is an open-source AI agent framework. Venice is a privacy-focused AI inference provider. Morpheus provides decentralised compute infrastructure. Combining them gives you an autonomous AI agent running on infrastructure you control, with no centralised intermediary seeing your prompts or data.
This is what a sovereign AI agent stack looks like in practice.
Prerequisites
- A VPS or local machine with Docker installed
- A Venice AI API key (from venice.ai)
- Basic terminal and Docker familiarity
- Approximately 30 minutes
I run this on both a RackNerd VPS in Texas and locally on my Mac Studio. The VPS gives me 24/7 uptime for persistent tasks. Local gives me faster iteration and full sovereignty over the inference layer when I point it at Ollama. You can start with either, or run both.
Step 1: Set up the VPS (if using one)
If running locally, skip to Step 2.
# SSH into your VPS
ssh root@your-vps-ip
# Install Docker if not present
curl -fsSL https://get.docker.com | sh
# Verify
docker --version
A basic VPS with 4GB RAM is sufficient for Agent Zero. You do not need a GPU. The inference happens on Venice's API or your local Ollama instance; the agent itself is lightweight.
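If you want to confirm the box has enough headroom before installing anything, a quick resource check is worth the ten seconds (the 4GB figure is a guideline, not something Agent Zero enforces):
# Check memory, CPU cores and free disk
free -h
nproc
df -h /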
Step 2: Clone and configure Agent Zero
# Clone the repository
git clone https://github.com/frdel/agent-zero.git
cd agent-zero
# Copy the example environment file
cp example.env .env
Edit the .env file to configure your API endpoints:
# Open the config
nano .env
Agent Zero uses four separate model roles. Each can point at a different provider and model, which gives you flexibility to balance cost, speed, and capability.
| Role | What it does | Recommended model |
|---|---|---|
| Chat model | Main reasoning. The brain that handles your tasks, plans steps, writes code | Largest model you can afford (llama-3.3-70b on Venice, or GPT-4o) |
| Utility model | Background tasks: summarisation, formatting, tool output parsing | Smaller/cheaper model is fine (llama-3.3-70b or a 7B model locally) |
| Browser model | Reads and interprets web pages when the agent browses | Needs decent comprehension (llama-3.3-70b works well) |
| Embedding model | Converts text to vectors for memory retrieval and RAG | Dedicated embedding model (text-embedding-bge-m3 on Venice, or nomic-embed-text locally) |
Set these in the .env file:
# === Venice AI (privacy-focused, uncensored) ===
CHAT_API_BASE=https://api.venice.ai/api/v1
CHAT_API_KEY=your-venice-api-key
CHAT_MODEL=llama-3.3-70b
UTILITY_API_BASE=https://api.venice.ai/api/v1
UTILITY_API_KEY=your-venice-api-key
UTILITY_MODEL=llama-3.3-70b
BROWSER_API_BASE=https://api.venice.ai/api/v1
BROWSER_API_KEY=your-venice-api-key
BROWSER_MODEL=llama-3.3-70b
EMBEDDING_API_BASE=https://api.venice.ai/api/v1
EMBEDDING_API_KEY=your-venice-api-key
EMBEDDING_MODEL=text-embedding-bge-m3
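Before starting the container, it is worth checking that the key and base URL actually respond. Assuming Venice follows the standard OpenAI-compatible routes implied by the base URL above, listing models is a quick smoke test:
# List the models your key can access (substitute your real key)
curl -s https://api.venice.ai/api/v1/models \
  -H "Authorization: Bearer your-venice-api-key"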
You do not have to use the same provider for every role. A common setup is Venice for chat (quality matters) and local Ollama for utility and embeddings (saves API credits):
# === Hybrid: Venice for chat, local Ollama for the rest ===
CHAT_API_BASE=https://api.venice.ai/api/v1
CHAT_API_KEY=your-venice-api-key
CHAT_MODEL=llama-3.3-70b
UTILITY_API_BASE=http://host.docker.internal:11434/v1
UTILITY_API_KEY=not-needed
UTILITY_MODEL=mistral
BROWSER_API_BASE=http://host.docker.internal:11434/v1
BROWSER_API_KEY=not-needed
BROWSER_MODEL=mistral
EMBEDDING_API_BASE=http://host.docker.internal:11434/v1
EMBEDDING_API_KEY=not-needed
EMBEDDING_MODEL=nomic-embed-text
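If you go the hybrid route, make sure Ollama is reachable and the models are pulled before you start the agent. A minimal check, assuming a default Ollama install listening on port 11434 (host.docker.internal in the config above resolves to this host from inside the container):
# Pull the models the hybrid config expects
ollama pull mistral
ollama pull nomic-embed-text
# Confirm the OpenAI-compatible endpoint answers
curl -s http://localhost:11434/v1/models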
For a fully local setup, point all four roles at Ollama. For a fully sovereign setup on Venice, use Venice for all four. Your prompts are anonymised through their proxy and nothing is stored. See our Venice review for the full privacy model assessment, including the distinction between anonymisation and confidentiality.
The embedding model is the one people most often misconfigure. It must be an embedding model, not a chat model. Venice offers text-embedding-bge-m3. For Ollama, pull nomic-embed-text with ollama pull nomic-embed-text.
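A quick way to confirm the role points at a real embedding model is to request a vector directly and check that you get a list of floats back rather than chat text. The sketch below uses Ollama's OpenAI-compatible embeddings route; the same request shape should work against Venice with text-embedding-bge-m3, assuming its API follows the same convention:
# Expect a JSON response containing an array of floats
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "sanity check"}'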
Step 3: Run Agent Zero with Docker
# Build and start the container
docker compose up -d
# Check it is running
docker ps
# View logs
docker logs agent-zero -f
Agent Zero exposes a web interface on port 50001 by default. Access it at http://your-vps-ip:50001 or http://localhost:50001 if running locally.
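If the page does not load, confirm the UI is actually listening before debugging anything else:
# Expect HTTP response headers if the web UI is up
curl -I http://localhost:50001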
Step 4: Test the agent
Open the web interface and give the agent a task:
Research the current MOR token price and calculate my daily earnings
if I have 50 stETH staked in the Morpheus capital contract.
The agent will:
- Search for current MOR price data
- Look up total stETH staked in Morpheus
- Calculate the proportional daily emissions
- Return a formatted answer
This is a simple example. Agent Zero can handle multi-step tasks including web research, file operations, code execution and API calls. The key difference from a centralised agent: your prompts go through Venice (encrypted, no logging) or your own Ollama instance, not through OpenAI or Anthropic.
Step 5: Connect to Morpheus compute (optional)
Instead of using Venice’s hosted API, you can route inference through the Morpheus compute network. This means your agent’s inference requests are served by decentralised compute providers earning MOR tokens. See our Morpheus Lumerin Node tutorial if you want to run the other side of this, providing compute rather than consuming it.
The Morpheus compute endpoint works as an OpenAI-compatible API. Update your .env:
CHAT_API_BASE=https://compute.mor.org/v1
CHAT_API_KEY=your-morpheus-api-key
CHAT_MODEL=llama-3.1-70b
This adds latency compared to Venice or local Ollama. The trade-off is that you are using genuinely decentralised infrastructure and contributing to network demand that drives MOR token value.
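Because the endpoint is OpenAI-compatible, you can sanity-check your key with a one-off request before switching the agent over. The route and model name below simply mirror the config above; treat this as a sketch rather than official Morpheus documentation:
# One-off test request against the Morpheus compute router
curl -s https://compute.mor.org/v1/chat/completions \
  -H "Authorization: Bearer your-morpheus-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b", "messages": [{"role": "user", "content": "ping"}]}'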
Step 6: Make it persistent
For a VPS deployment, ensure the agent restarts automatically:
# Docker compose already handles restart policy
# Verify in docker-compose.yml:
# restart: unless-stopped
# To update Agent Zero later
cd agent-zero
git pull
docker compose down
docker compose up -d --build
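If you want updates to happen unattended, a crontab entry that runs the same sequence works. This is a convenience sketch, not part of the Agent Zero docs; adjust the path to wherever you cloned the repo:
# Rebuild from the latest code every Sunday at 04:00 (add via crontab -e)
0 4 * * 0 cd /root/agent-zero && git pull && docker compose down && docker compose up -d --build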
The three inference options compared
| Option | Privacy | Speed | Cost | Sovereignty |
|---|---|---|---|---|
| Local Ollama | Full | Fast | Free (after hardware) | Complete |
| Venice AI | High (encrypted, no logging) | Fast | Per-token pricing | High |
| Morpheus compute | High | Moderate | MOR per request | Complete |
I use all three depending on the task. Local Ollama on my Mac Studio for sensitive work and rapid iteration. The prompt never leaves my machine. Venice for tasks that need larger models than my hardware supports, or when I want uncensored output. Morpheus compute when I want to test the network and contribute to demand. Running Agent Zero on both a VPS and locally means I can keep persistent agents running remotely while experimenting freely on my local instance.
What the agent can do
With the sovereign stack running, your Agent Zero instance can:
- Research topics and synthesise information from the web
- Execute code (Python, shell) to process data
- Interact with files on the host system
- Call external APIs on your behalf
- Chain multiple steps together autonomously
- Operate 24/7 without supervision
What it cannot do yet: interact with DeFi protocols directly, manage wallets or execute on-chain transactions. These capabilities are on the Morpheus roadmap but are not production-ready in Agent Zero today. If someone tells you their agent is autonomously trading on your behalf via decentralised infrastructure, verify that claim carefully.
Troubleshooting
Agent not responding. Check Docker logs: docker logs agent-zero. The most common issue is an incorrect API key or unreachable API endpoint.
Slow responses. If using Morpheus compute, expect higher latency than centralised APIs. If using local Ollama, check that your model fits in available memory.
Container not starting. Ensure Docker is running and the port is not in use: docker ps and lsof -i :50001.
Venice API errors. Verify your API key is active and has credits. Check Venice’s status page for outages.