
Privacy Is the Killer App for Decentralised AI

Every prompt you send to OpenAI is training data. Every query to Claude is logged. Privacy is not a nice-to-have. It is the reason decentralised AI matters beyond ideology.

The invisible cost of free

When you send a prompt to ChatGPT, you are not just paying with money. You are paying with data. Your queries, your documents, your code, your confidential business strategies - all of it flows through OpenAI’s servers, gets logged, and may be used to train future models.

OpenAI’s enterprise privacy page is 2,400 words long. That is the length it takes to explain all the ways your data moves through their infrastructure. The page exists because corporate legal teams demanded it. Samsung employees leaked source code. Amazon staff shared confidential documents. Banks discovered their analysts were pasting financial data into chatbots.

Centralised AI companies will tell you they don’t train on enterprise data. They have settings for that now. But those settings are toggles on their infrastructure, not architectural guarantees. When the next data breach happens, or the next government subpoena arrives, those toggles won’t protect you. The data exists. It is stored. It can be accessed.

Privacy in AI isn’t about hiding wrongdoing. It’s about the fundamental difference between renting intelligence and owning it. When you run inference on infrastructure you control, your prompts exist only on your terms. That’s sovereignty. For a practical breakdown of what a full sovereignty setup looks like, see the sovereignty stack.

The privacy technologies, honestly evaluated

Three cryptographic approaches dominate the conversation around private AI inference. Each works. Each has tradeoffs that matter in practice.

Trusted Execution Environments (TEE)

TEE-based inference runs your prompts inside a hardware-enforced enclave. The GPU processes your data, but even the server operator cannot see the contents. Intel SGX, AMD SEV, NVIDIA Confidential Compute - these are the hardware primitives.

Performance overhead is minimal. Phala Network’s GPU TEE documentation claims less than 2% throughput penalty for large model inference on NVIDIA H100 and B200 hardware. Near-native performance for private inference.

The tradeoff is trust. TEEs rely on hardware manufacturers to implement isolation correctly. When Intel’s SGX was compromised in 2022, Secret Network had to coordinate a network-wide key update. Phala itself responded to a similar SGX key extraction attack by shutting down all SGX workers and migrating to Intel TDX. Competent handling of a real vulnerability, but a reminder that hardware trust is not absolute. You’re trading trust in a cloud provider for trust in a hardware vendor.
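
To make that trust chain concrete, here is a minimal Python sketch of the attestation check a client performs before sending any plaintext to an enclave. Everything in it is a simplified mock: real deployments use vendor PKI and tooling such as Intel DCAP or NVIDIA’s attestation services, and the HMAC below merely stands in for the vendor’s signature.

```python
# Minimal mock of the attestation check a TEE client runs before sending
# any plaintext. Real deployments use vendor PKI (Intel DCAP, NVIDIA
# attestation services); the HMAC here stands in for the vendor signature.
import hashlib
import hmac

# Hash of the inference binary we audited and expect to be running.
EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-inference-image-v1").hexdigest()

def verify_quote(quote: dict, vendor_key: bytes) -> bool:
    """Accept the enclave only if the quote is endorsed by the hardware
    root of trust AND the measurement matches the code we expect."""
    expected_sig = hmac.new(vendor_key, quote["measurement"].encode(), "sha256").hexdigest()
    if not hmac.compare_digest(expected_sig, quote["signature"]):
        return False                      # not signed by the vendor's key
    return quote["measurement"] == EXPECTED_MEASUREMENT

# A mock quote, as it might arrive from the server during session setup.
vendor_key = b"hardware-vendor-root-key"  # stands in for the vendor's PKI
quote = {
    "measurement": EXPECTED_MEASUREMENT,
    "signature": hmac.new(vendor_key, EXPECTED_MEASUREMENT.encode(), "sha256").hexdigest(),
}

assert verify_quote(quote, vendor_key)    # only now does the prompt go out
```

The point of the sketch is the dependency chain: the whole check bottoms out in a key the hardware vendor controls, which is exactly the trust transfer described above.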

Fully Homomorphic Encryption (FHE)

FHE allows computation on encrypted data. Your prompt stays encrypted from your machine through the entire inference process. The server computes the answer without ever seeing the question.
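
Full FHE is far too heavy to demonstrate in a few lines, but the core trick - computing on ciphertexts - already exists in additively homomorphic schemes. A toy Paillier example in Python, with demo-sized primes and no security whatsoever:

```python
# Toy Paillier encryption: the server adds two numbers it never sees.
# Paillier is only additively homomorphic - real FHE schemes (CKKS, TFHE)
# also support multiplication, which is where the heavy overhead lives.
from math import gcd
import random

p, q = 293, 433                          # demo-sized primes, not secure
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1

def L(x):
    return (x - 1) // n                  # Paillier's L function

mu = pow(L(pow(g, lam, n2)), -1, n)      # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 42, 58
c = (encrypt(a) * encrypt(b)) % n2       # multiplying ciphertexts adds plaintexts
assert decrypt(c) == a + b               # the server never saw 42 or 58
```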

The technology is genuine magic. It’s also practically unusable for most AI workloads today. Current FHE implementations impose 100-1000x computational overhead. GPU acceleration has pushed throughput to 20-30 transactions per second for simple operations, but that’s orders of magnitude below what inference demands.

FHE is the future. For today, it is research infrastructure, not production inference.

Multi-Party Computation (MPC)

MPC distributes computation across multiple parties, none of whom see the complete data. Your prompt gets split into fragments, processed separately, and recombined.
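
A toy Python illustration of additive secret sharing, the primitive under most MPC protocols. Addition needs no communication between parties; it is multiplication - and therefore almost everything inside a neural network - that forces the round trips behind the latency numbers below.

```python
# Toy additive secret sharing, the primitive under most MPC protocols.
import random

P = 2**61 - 1                            # shares live in a prime field

def share(secret: int, parties: int = 3) -> list[int]:
    """Split a secret into random shares; any subset short of all of
    them reveals nothing about the secret."""
    shares = [random.randrange(P) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

x_shares = share(123)
y_shares = share(456)

# Addition is free: each party adds its own shares locally, no messages.
z_shares = [(a + b) % P for a, b in zip(x_shares, y_shares)]
assert reconstruct(z_shares) == 579      # no party ever saw 123 or 456
```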

The limitation is network latency, not compute. Every operation requires coordination between parties. For inference on large models, MPC currently adds 10-100x latency compared to plaintext execution. It works best when you already have a distributed network with economic alignment between parties, but for a single user wanting private inference on a single prompt, MPC is overkill.

The practical verdict

Privacy Technology Comparison

Technology | Overhead | Production Ready | Trust Model
TEE | Under 5% | Yes | Hardware vendor
FHE | 100-1000x | Research stage | Mathematical
MPC | 10-100x (latency) | Limited use cases | Network participants

For private inference today, TEE is the only option with acceptable performance. The trust tradeoff is real, but it’s a better tradeoff than sending your data to OpenAI.

The enterprise compliance tailwind

GDPR. HIPAA. The EU AI Act. California’s CCPA. The regulatory landscape for AI data processing is expanding, not contracting.

Every enterprise legal team is asking the same question: can we send patient data to OpenAI? Can we process financial records through Claude? Can we let engineers paste source code into ChatGPT?

The answer, increasingly, is no. Or more precisely: only with a BAA (business associate agreement), a DPA (data processing agreement), and written guarantees about data handling that may or may not survive a government subpoena.

Google and OpenAI now offer HIPAA-compliant deployments for enterprise. But compliance isn’t the same as privacy. HIPAA allows data processing under agreement. It doesn’t prevent the data from being processed on someone else’s infrastructure.

When you run inference on a TEE-enabled GPU network or on your own hardware, the compliance question simplifies dramatically. Your data never leaves infrastructure you control. No third party can be compelled to produce records they never possessed. That’s the architectural answer to GDPR, HIPAA, and the AI Act simultaneously.

Venice, the privacy-focused inference platform, reports 1.3 million registered users and claims to process 45 billion tokens daily. That is not traffic funnelled through a single provider. That is inference demand from users who explicitly chose not to send their prompts to OpenAI. The signal is clear.


What private inference actually costs

The assumption is that privacy costs more. The pricing data tells a different story.

Venice API pricing (per 1M tokens):

Model Category | Input | Output
Budget (Llama 3.2, Gemma, Nemotron) | $0.07-0.15 | $0.20-0.60
Mid-range (Llama 3.3 70B, Qwen 3) | $0.70-0.75 | $2.80-3.20
Premium (Claude Opus, GPT-5.2) | $2.00-6.00 | $15.00-30.00

For comparison, OpenAI’s GPT-5.2 costs $1.75 per 1M input tokens. Claude Opus 4.6 costs $5.00 per 1M input tokens through Anthropic directly.
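
A quick back-of-the-envelope comparison using the input prices quoted above. The workload size is an assumption, and provider prices drift, so treat the figures as a snapshot:

```python
# Back-of-the-envelope input-token cost for a hypothetical workload of
# 50M input tokens per month, at the per-1M-token prices quoted above.
INPUT_PRICES = {                         # $ per 1M input tokens
    "Venice budget (Llama 3.2)":  0.07,
    "Venice mid (Llama 3.3 70B)": 0.70,
    "OpenAI GPT-5.2":             1.75,
    "Anthropic Claude Opus 4.6":  5.00,
}

INPUT_TOKENS = 50_000_000

for name, price in INPUT_PRICES.items():
    print(f"{name}: ${price * INPUT_TOKENS / 1e6:,.2f}/month")
# Venice budget: $3.50 - GPT-5.2: $87.50 - Claude Opus 4.6: $250.00
```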

Private inference through decentralised networks is often cheaper than centralised APIs. Akash Network claims GPU compute at 50-85% below AWS, GCP and Azure pricing for comparable workloads. The reverse auction model drives prices down because providers compete for your deployment. Remove the premium for OpenAI’s brand and infrastructure margin, and inference is fundamentally cheap. Privacy isn’t the premium. Centralisation is.

The real cost of centralised AI isn’t the API fee. It’s the implicit cost of data exposure, compliance overhead, and platform dependency.

Who is actually shipping private inference

Venice

Venice routes prompts through a privacy proxy to distributed GPU providers sourced from Akash, Hyperbolic, Prime Intellect, NEAR AI Cloud, and Phala Network. No logs, no conversation history, no stored prompts. Everything stays in your browser’s local storage.

The architecture is not fully decentralised; Venice runs a centralised proxy. But privacy capabilities now span multiple levels. In the default Private mode, prompts are anonymised through the proxy and GPU providers see individual requests without user identity. In March 2026, Venice launched TEE and E2EE modes powered by NEAR and Phala: TEE runs inference inside hardware enclaves with verifiable attestation, while E2EE encrypts prompts on your device before they leave, decrypting only inside the enclave. Neither Venice nor the GPU provider can see your data in E2EE mode. For most users, even the default mode is a meaningful improvement over OpenAI. For sensitive workloads, E2EE provides cryptographic guarantees within the same product.
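
A minimal sketch of that E2EE pattern, using Python’s `cryptography` package: the prompt is encrypted to a key that exists only inside the enclave, so the proxy in the middle relays bytes it cannot read. The flow and names are illustrative, not Venice’s or Phala’s actual API.

```python
# E2EE-to-enclave sketch: the proxy only ever sees ciphertext. Requires
# the `cryptography` package. Flow and names are illustrative.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import x25519
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(shared: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"prompt-e2ee").derive(shared)

# In production this keypair lives inside the TEE and its public half
# is delivered to the client through a verified attestation quote.
enclave_priv = x25519.X25519PrivateKey.generate()
enclave_pub = enclave_priv.public_key()

# --- client side ---
eph_priv = x25519.X25519PrivateKey.generate()
key = derive_key(eph_priv.exchange(enclave_pub))
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"confidential prompt", None)
# ciphertext + the ephemeral public key transit the proxy opaquely

# --- inside the enclave ---
key2 = derive_key(enclave_priv.exchange(eph_priv.public_key()))
assert AESGCM(key2).decrypt(nonce, ciphertext, None) == b"confidential prompt"
```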

Pricing: Pro subscription at $18/month for unlimited text with premium model access, or API access starting at $0.07 per 1M tokens for budget models. The VVV token staking model offers pro-rata inference capacity without per-token fees.

Phala Network

Phala has pivoted from general-purpose confidential computing to AI agent infrastructure. GPU TEE inference on NVIDIA H100, H200, and B200 hardware is live, with SOC 2 Type I and HIPAA compliance certifications that most crypto projects cannot match. The open-source dstack SDK, donated to the Linux Foundation’s Confidential Computing Consortium, converts standard containers into confidential VMs.

The commercial traction is early: 398 paid users and a partially verified $2M+ ARR. That is more than most DePIN projects can demonstrate, but it is a tiny base for a cloud infrastructure business. The TEE technology is credible and differentiated. Whether Phala converts its compliance positioning into meaningful adoption is the open question.

Paid users: 398 · Annual recurring revenue: $2M+ · Network devices: 29,478

Other privacy-adjacent projects

Nillion (NIL) and Oasis (ROSE) are general-purpose privacy infrastructure projects that have pivoted toward AI as a use case. Nillion offers “blind computation” through MPC and secret sharing; Oasis provides the ROFL framework for verifiable off-chain inference via TEEs. Both are earlier stage for AI-specific workloads than Venice or Phala.

Local inference

The most private inference is inference that never touches a network. Running Llama 3.3 70B on a Mac Studio with 64GB unified memory requires no external API, no data transmission, no logs on anyone else’s server.

The tradeoffs are real:

  • Hardware cost: $3,000-6,000 for a capable local machine
  • Model selection: open weights only, no Claude or GPT-5
  • Performance: quantised models run at 80-90% quality but slower than cloud inference
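
For those who accept the tradeoffs, the simplest setup is a local model server. A minimal sketch using only the standard library, assuming an Ollama instance on its default port with an open-weights model already pulled (`ollama pull llama3.3`); the request never leaves localhost.

```python
# Minimal local-inference call against a local Ollama server (assumed
# running on its default port with the model already pulled). Standard
# library only; nothing leaves the machine.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.3",
    "prompt": "Summarise the key obligations in this contract clause: ...",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```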

For sensitive work - legal documents, medical analysis, confidential code - local inference is the gold standard. For everything else, private cloud inference through TEE-enabled providers offers a practical middle ground.

A practical framework

Not every workflow requires private inference. But the decision should be intentional, not accidental. The lists below map common situations to deployment choices, condensed into a code sketch after them.

Run locally when:

  • Working with confidential client data (legal, medical, financial)
  • Processing proprietary code or trade secrets
  • Operating in regulated industries with data residency requirements
  • You want zero ongoing costs after hardware investment

Use TEE-enabled inference when:

  • You need model quality that open weights cannot match
  • Latency matters but privacy also matters
  • You want API convenience without the data exposure
  • Compliance requires data to stay within certain jurisdictions

Use decentralised compute (non-TEE) when:

  • Privacy is not a primary concern
  • Cost savings are the priority
  • You are running batch workloads without sensitive data
  • You want to participate in the network economics (staking, earning)

Stick with centralised APIs when:

  • The data is already public or non-sensitive
  • You need frontier model capabilities not available elsewhere
  • The convenience-privacy tradeoff is acceptable for your use case
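
One way to condense the four lists above into code. The branch order encodes the framework’s priorities; the inputs and wording are illustrative judgment calls, not a prescription.

```python
# The four lists above, condensed into a hypothetical routing helper.
# Branch order encodes the framework's priorities; names are illustrative.
def choose_inference(confidential: bool, needs_frontier_quality: bool,
                     cost_driven: bool) -> str:
    if confidential and not needs_frontier_quality:
        return "local: open weights on your own hardware"
    if confidential:
        return "TEE-enabled inference: enclave with verified attestation"
    if cost_driven:
        return "decentralised compute (non-TEE): cheapest for batch work"
    return "centralised API: frontier quality, accepted data exposure"

assert choose_inference(True, False, False).startswith("local")
assert choose_inference(True, True, False).startswith("TEE")
assert choose_inference(False, False, True).startswith("decentralised")
```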

The privacy premium is negative

GPT-5.2 costs $1.75 per 1M input tokens. Claude Opus 4.6 costs $5.00. You pay that premium for frontier model quality, brand trust, and polished developer experience. Fair enough.

But nobody is pricing the privacy discount. When you send data to a centralised provider, you’re giving them something valuable: your training data, your usage patterns, your competitive intelligence. You should be paid for that, not charged extra.

Venice’s budget models sit at $0.07 per 1M tokens, undercutting GPT-4 by 97%, and the privacy comes with it at no extra charge. The decentralised AI projects that will win aren’t the ones shouting “private inference as a premium feature.” They’re the ones making privacy the default and charging less for it, because they’re not extracting value from your data to cross-subsidise their margins.

What changes in six months

TEE-enabled inference is production-ready now. FHE is 18-24 months from competitive throughput. Local inference hardware improves every generation.

Regulatory pressure isn’t going to decrease. GDPR enforcement is increasing. The EU AI Act is in effect. State-level privacy laws in the US are multiplying. Every enterprise legal team that hasn’t already restricted ChatGPT usage will do so within two years.

Venice, Phala, and the TEE-enabled compute networks are positioning for exactly this moment, building the infrastructure that enterprise AI adoption will require. Not because ideology demands it. Because compliance demands it.

By the time FHE achieves competitive throughput, the private inference market will already be established. Winners will be the projects shipping today with TEE, solving real problems for real users, and building the trust that cryptographic proofs alone can’t provide.

Privacy is the killer app. Not because it sounds good in a whitepaper. Because every prompt you send to a centralised provider is a liability you are choosing to accept.
