When does self-hosted open source AI make sense for an SME?

Three situations make it worth considering. 1) Highly sensitive data (client-confidential cases, personal data, GDPR Article 9) that cannot leave your infrastructure. 2) Expected cloud AI costs above DKK 50,000 to 100,000 per year, where self-hosted becomes economically competitive. 3) Compliance positioning that requires documented data sovereignty (law firms, accountants, financial advisers). If none of the three applies, start with cloud AI and consider self-hosted later once usage is established.

What hardware do we need for self-hosted AI?

It depends on the model and the number of concurrent users. For Llama 3.3 70B with 5 to 20 concurrent users: a server with 2x NVIDIA RTX 4090 (24GB VRAM each) or 1x A6000 costs DKK 60,000 to 90,000 and covers the need. For smaller models (Llama 3.1 8B, Mistral 7B), a single RTX 4090 is enough for 30+ users. On Jesper's EnterpriseIQ stack the models run on a Proxmox host with GPU passthrough to a dedicated LXC running Ollama or vLLM. Power consumption: 400 to 700 watts under load, which is DKK 8,000 to 14,000 per year on electricity.

What does self-hosted AI cost on a 3-year horizon versus cloud?

For 10 to 20 concurrent users with moderate use: cloud AI (Claude Team or GPT Enterprise) typically costs DKK 4,000 to 8,000 per month, that is DKK 150,000 to 290,000 over 3 years. Self-hosted (server DKK 80,000 + electricity DKK 12,000 per year + operations DKK 15,000 per year) costs about DKK 107,000 in year 1 and DKK 27,000 in each of years 2 and 3, for a total of about DKK 161,000 over 3 years. Break-even sits at around DKK 5,000 per month of cloud spend. Below that, cloud is clearly cheaper. Above it, self-hosted wins, and you get data sovereignty on top.

What is a hybrid stack and why is it recommended?

Hybrid means you use BOTH self-hosted and cloud, where each task is routed to the right tool. Client-confidential tasks (contract review, client notes, financial data) are routed to self-hosted Llama. Strategy deliberation and general research (non-confidential) are routed to Claude Opus or GPT-5. n8n or a custom router agent handles routing based on task type. That gives the best of both worlds: data sovereignty where it matters, plus state-of-the-art models where it does not. Hybrid is typically the right answer for knowledge-intensive SMEs.

What is OpenWebUI and why is it used?

OpenWebUI is an open source chat interface that looks similar to ChatGPT or Claude but runs on your own infrastructure and talks to your self-hosted models via Ollama or vLLM. It gives employees a familiar interface without having to learn new tools. Plus: built-in user management (can be integrated with Authentik SSO), audit trail of conversations, and the ability to create custom 'prompts' that work like ChatGPT GPTs but run locally. EnterpriseIQ runs OpenWebUI as the standard interface for the self-hosted part of the stack.

How hard is it to operate self-hosted AI?

Harder than SaaS but not insurmountable for SMEs with existing IT competence. Self-hosted AI requires Linux server operations (Proxmox or another hypervisor), GPU management, model updates every 3 to 6 months, and monitoring of latency and errors. If you already have one person who can operate Linux servers, the extra work is 4 to 8 hours per month. If you do not, you should either outsource (managed self-hosted via EnterpriseIQ retainer) or stay on cloud. Self-driven operations without the competence end in unavailable systems and frustrated users.

What happens if Meta or Mistral close down the open source models?

Already released open weights models (Llama 3.3, Mixtral 8x22B) stay usable under the licences they were released with. Even if Meta stopped releasing new versions, the existing models keep working. The risk is the absence of new updates, not that the existing ones stop. In practice the ecosystem (Meta, Mistral, Alibaba, Cohere) is competitive, so if one player pulls back, another takes the lead. Robustness for SME strategy: track 2 to 3 model families, so you can migrate if one player changes course.

What role does the EU AI Act play in the cloud versus self-hosted choice?

The EU AI Act does not require self-hosted, but it does require documented data flows and supplier risk assessment. Cloud AI with EU residency and a DPA can fully meet the requirements. Self-hosted is simply easier to document and audit. For high-risk AI systems (Articles 9 to 15) the audit trail requirements are significant, and self-hosted gives full control over logging and evidence retention. For minimal-risk systems, cloud is completely fine. The decision is not binary. It is a compliance-cost analysis per AI system. See our governance pillar for details.

How do we get started with self-hosted AI?

Three phases over 4 to 6 weeks. 1) Assess: identify 1 to 3 use cases where data sovereignty is critical, calculate the cloud cost baseline. 2) Pilot: set up Proxmox host + GPU + Ollama + OpenWebUI for one pilot, run a 30-day measurement period on one concrete workflow. 3) Decide: scale to more workflows OR stop if cloud turns out to be just as good. EnterpriseIQ delivers the entire setup as a push-button engagement (Kit #4 Pilot Project) for DKK 80,000 to 150,000. We run the stack ourselves and can show you what ours looks like.

Which open source tools should be part of the stack besides the model?

Five components are typically part of the stack. 1) Model server: Ollama (lightweight, single-user) or vLLM (production, multi-user with batching). 2) Chat interface: OpenWebUI. 3) Orchestration: n8n (visual agent flows) or LangGraph (code-based). 4) Vector DB: Qdrant (data sovereignty, fast) or Chroma (lightweight). 5) Audit trail and monitoring: Langfuse (open source) or custom logging to Prometheus + Grafana. All five run on Proxmox LXCs on Jesper's EnterpriseIQ stack. See /en/ai-stack for a live view of the stack.

Pillar · Published 2026-05-27 · 16 min read

Open source AI for SMEs

Which open source models are good enough in 2026?

Three tiers. 1) Llama 3.3 70B (Meta, open weights) is competitive with GPT-4 on most benchmarks. 2) Mistral Large 2 and Mixtral 8x22B are strong on multilingual work, including European languages. 3) Qwen 3 32B from Alibaba is surprisingly strong on structured data and code. For SME use, Llama 3.3 70B is the right default choice. Best-in-class cloud models (Claude Opus 4.7, GPT-5) are still meaningfully better on complex strategy deliberation, but the gap is no longer decisive for most internal workflows.

Until two years ago, open source AI was something leaders could overlook without losing competitive ground. The cloud models (GPT, Claude, Gemini) were meaningfully stronger. That has changed. In 2026 Llama 3.3 70B is competitive with GPT-4 on most benchmarks, Mistral Large 2 is strong on European languages, and Qwen 3 from Alibaba is surprisingly strong on structured data. That raises a real choice for knowledge-intensive SMEs: when does self-hosted make more sense than cloud? This pillar walks through when it is worth considering, hardware requirements, cost comparison, and how a hybrid stack is typically the right answer.

Written by Jesper Sachmann, founder of EnterpriseIQ. Runs all of EnterpriseIQ on a hybrid stack with self-hosted Llama plus Claude and GPT in a routing flow. Our stack is documented openly at /en/ai-stack.

TL;DR

→Open source AI (Llama 3.3 70B, Mistral Large 2) is competitive enough in 2026 for most SME use cases.
→Three triggers make self-hosted relevant: data sovereignty, cost above DKK 5,000 per month, compliance positioning.
→Hardware: a single server with 2x RTX 4090 covers 5 to 20 concurrent users on Llama 3.3 70B. Price DKK 60,000 to 90,000.
→3-year TCO: self-hosted wins at cloud spend above DKK 5,000 per month. Below that, cloud wins.
→Hybrid stack is typically the right answer: self-hosted for client-confidential tasks, cloud for strategy and research.

Why open source AI is real in 2026

The argument for ignoring open source AI has for many years been the same. The cloud models were significantly stronger, and the gap was large enough to justify data exposure and monthly payments to OpenAI or Anthropic. That argument is now more nuanced.

Llama 3.3 70B (released by Meta) scores around GPT-4 level on benchmarks for ordinary tasks: summarisation, classification, translation, draft generation. On complex strategy deliberation and coding, Claude Opus 4.7 and GPT-5 are still clearly ahead, but on 70 to 80 percent of typical SME work the gap is no longer decisive.

At the same time, hardware costs have fallen. The NVIDIA RTX 4090 (24GB VRAM) costs DKK 18,000 to 22,000. Two cards in the same server (48GB VRAM total) can run Llama 3.3 70B in 4-bit quantised form with acceptable performance and cover 5 to 20 concurrent users. For an SME with 30 to 100 employees, that is realistic hardware territory.
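A back-of-envelope check shows why the 70B default is specified with two cards. The sketch below estimates memory for the weights alone and applies a headroom factor for KV cache and runtime buffers. The 20 percent headroom and the function names are assumptions for illustration, not measured values.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, headroom: float = 1.2) -> bool:
    """True if weights times an assumed headroom factor fit in the given VRAM.

    headroom=1.2 is a hypothetical allowance for KV cache and buffers;
    real usage depends on context length and the serving engine.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * headroom
    return needed <= vram_gb


print(weight_memory_gb(70, 4))   # weights of a 70B model at 4-bit
print(fits_in_vram(70, 4, 24))   # one RTX 4090
print(fits_in_vram(70, 4, 48))   # two RTX 4090
```

Under these assumptions the 4-bit 70B weights take about 35 GB, which exceeds a single 24GB card but fits in 48GB across two.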

Finally: the EU AI Act and GDPR have turned data sovereignty into a real compliance argument. Client-confidential data sent to US cloud providers requires more documentation and risk management than data that stays on your own Proxmox host. For law firms, accountants and financial advisers, that is the difference between "we use AI with full control" and "we use AI, and here is the DPA".

Three triggers that make self-hosted relevant

Not every SME should run self-hosted AI. Three situations make it worth considering. If none of them trigger for you, start with cloud and return to the question in 12 months.

Trigger 1: Highly sensitive data

Client-confidential cases (legal cases, audit engagements, financial plans), personal data, or anything falling under GDPR Article 9 (special categories) or industry confidentiality rules. If that data cannot leave your infrastructure without explicit client approval, self-hosted is typically the right choice for the specific use cases.

Nuance: cloud AI with EU residency and a documented DPA (Claude Enterprise, Microsoft Copilot for M365 Enterprise) can often meet the requirements. Self-hosted is simply easier to document and audit.

Trigger 2: Cloud spend above DKK 5,000 per month

If your cloud AI spend passes DKK 5,000 per month (DKK 60,000 per year), self-hosted starts to be economically competitive on a 3-year horizon. Hardware investment of DKK 80,000 plus operations at DKK 15,000 per year and electricity at DKK 12,000 per year lands at about DKK 161,000 over 3 years, versus DKK 180,000 to 290,000 for cloud over the same period.

Note: cost numbers are not the only thing that counts. Self-hosted requires IT competence you need to have or buy. If you do not have it, cloud is economically clearly better even at higher usage.
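The break-even point can be checked with simple arithmetic. The sketch below uses the component figures quoted for this trigger (hardware DKK 80,000, operations DKK 15,000 per year, electricity DKK 12,000 per year). The function names are illustrative, and the calculation ignores financing, resale value and hardware replacement.

```python
def self_hosted_total(hardware: int, yearly_running: int, years: int = 3) -> int:
    """Total self-hosted cost: one-off hardware plus yearly running costs."""
    return hardware + yearly_running * years


def break_even_monthly_cloud_spend(hardware: int, yearly_running: int,
                                   years: int = 3) -> float:
    """Monthly cloud spend at which cloud and self-hosted cost the same."""
    return self_hosted_total(hardware, yearly_running, years) / (years * 12)


total = self_hosted_total(hardware=80_000, yearly_running=15_000 + 12_000)
monthly = break_even_monthly_cloud_spend(80_000, 15_000 + 12_000)
print(total)           # 161000
print(round(monthly))  # 4472
```

With these inputs the break-even lands a little under DKK 4,500 per month. Valuing operations at DKK 20,000 per year instead moves it to roughly DKK 4,900, which is where the DKK 5,000 rule of thumb comes from.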

Trigger 3: Compliance positioning

If your customers explicitly ask about data sovereignty, or industry standards require documented on-premise processing (certain public sector customers, financial institutions with heavy regulation), then self-hosted is part of your go-to-market story, not just an internal operational decision.

Example: a law firm bidding on framework agreements with ministries or defence suppliers may need to document that client data is processed on-premise.

If zero triggers are active: stay on cloud. If one trigger is active: consider hybrid stack (cloud + selective self-hosted for the specific workflows). If two or more are active: hybrid stack with the centre of gravity on self-hosted is typically the right answer.
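The decision rule above is simple enough to write down as a function. This is a minimal sketch: the function name, parameter names and return strings are illustrative assumptions, and the DKK 5,000 threshold is the one used in this article.

```python
def recommend_deployment(sensitive_data: bool,
                         cloud_spend_dkk_per_month: float,
                         compliance_positioning: bool,
                         spend_threshold: float = 5_000) -> str:
    """Map the three triggers to the recommendation described in the text."""
    active_triggers = sum([
        sensitive_data,
        cloud_spend_dkk_per_month > spend_threshold,
        compliance_positioning,
    ])
    if active_triggers == 0:
        return "cloud"
    if active_triggers == 1:
        return "hybrid"
    # Two or more triggers: hybrid with the centre of gravity on self-hosted.
    return "hybrid, self-hosted-weighted"


print(recommend_deployment(False, 3_000, False))
print(recommend_deployment(True, 3_000, False))
print(recommend_deployment(True, 7_500, True))
```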

Model landscape 2026

Three model families cover 95 percent of SME needs. They are all open weights and can be run locally.

Llama 3.3 (Meta)

Three sizes across the current Llama 3 generation: 8B (lightweight, fast, runs on a single RTX 4090), 70B (default choice, requires 2x RTX 4090), 405B (only for large organisations with a dedicated GPU cluster). Llama 3.3 itself is released as 70B only; the 8B and 405B sizes come from Llama 3.1. 70B is the right default for SMEs.

Strong on: English, summarisation, classification, code. Weaker on: nuanced European languages, complex reasoning. Licence: Llama 3.3 Community License (commercial use is permitted below 700M monthly active users, which is no real constraint for an SME).

Mistral Large 2 and Mixtral 8x22B (Mistral AI, France)

Mistral Large 2 is not free for commercial use: running it on-premise requires a commercial licence from Mistral AI. Mixtral 8x22B is open weights and can be run freely. Both are strong on European languages.

Strong on: multi-language (good European-language quality), legal vocabulary, financial terminology. EU-based supplier, which simplifies the compliance story. Licence: Mixtral is Apache 2.0 (free commercial use).

Qwen 3 (Alibaba)

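Dense sizes range from 0.6B to 32B, with larger mixture-of-experts variants above that (the 7B and 72B sizes belong to the earlier Qwen 2.5 generation). 32B is the sweet spot for performance/hardware balance. Open weights under Apache 2.0.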

Strong on: structured data, code, mathematical reasoning. Weaker on: creative text generation. Political nuance: Chinese supplier, which some customers will perceive as a risk. For technical use cases (IT services, data analysis) it is often not an issue. Verify with the customer where relevant.

Recommendation for SME start: Llama 3.3 70B as the default choice. Add Mixtral 8x22B if European-language quality is critical for your use case. Add Qwen 3 32B if you have heavy structured-data or code use cases.

Hardware setup and cost comparison

Hardware requirements are no longer extreme. A realistic SME stack looks like this.

Server spec (SME level, 5 to 20 concurrent users):

CPU: AMD Ryzen 9 7950X or Intel Core i9-14900K

RAM: 128GB DDR5

GPU: 2x NVIDIA RTX 4090 (24GB VRAM each), 48GB VRAM total

Storage: 2TB NVMe SSD (models + cache)

Power supply: 1600W

OS: Proxmox VE 8 with GPU passthrough to LXC

Software stack: Ollama or vLLM + OpenWebUI + n8n + Qdrant + Langfuse

Purchase price: DKK 60,000 to 90,000 incl. VAT

Power consumption: 400 to 700W under load, 100W idle. About DKK 12,000 per year on electricity at typical SME use.

Operations cost: 4 to 8 hours per month of one IT person, which is DKK 15,000 to 25,000 per year if valued in cash.
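The electricity line can be sanity-checked from average power draw. The sketch below is an estimate under stated assumptions: the price of DKK 2.3 per kWh is a placeholder chosen for illustration, not a quoted tariff, and the function name is hypothetical. Substitute your own tariff and measured draw.

```python
HOURS_PER_YEAR = 24 * 365


def annual_electricity_dkk(average_watts: float,
                           price_dkk_per_kwh: float = 2.3) -> float:
    """Yearly electricity cost for a server drawing average_watts around the clock.

    price_dkk_per_kwh defaults to an assumed, illustrative tariff.
    """
    kwh_per_year = average_watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * price_dkk_per_kwh


# Idle, lower load bound and upper load bound from the spec above.
for watts in (100, 400, 700):
    print(watts, "W ->", round(annual_electricity_dkk(watts)), "DKK per year")
```

At the assumed tariff, DKK 12,000 per year corresponds to an average draw of roughly 600 W, so a server that idles most of the day will come in below that figure.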

3-year TCO comparison

For an SME with 15 users, moderate use (typical workplace with AI-augmented daily work):

Cloud (Claude Team, 15 seats):

Price: DKK 215 per seat per month x 15 = DKK 3,225 per month = DKK 38,700 per year

3-year total: about DKK 116,000

Cloud (Claude Enterprise, 15 seats):

Price: typically DKK 350 to 500 per seat per month x 15 = DKK 5,250 to 7,500 per month

3-year total: DKK 189,000 to 270,000

Self-hosted (server + operations):

Year 1: DKK 80,000 (server) + DKK 12,000 (electricity) + DKK 20,000 (operations) = DKK 112,000

Year 2: DKK 12,000 + DKK 20,000 = DKK 32,000

Year 3: DKK 12,000 + DKK 20,000 = DKK 32,000

3-year total: about DKK 176,000

The picture depends on which cloud tier you need. Over a 3-year horizon Claude Team is about DKK 60,000 cheaper than self-hosted, but only for non-confidential use cases. If you have to move to Claude Enterprise (for DPA and data handling), self-hosted becomes cheaper, by DKK 13,000 to 94,000 depending on the seat price. Plus: you get data sovereignty and can use the stack for tasks that would otherwise have been excluded.
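The comparison is easy to rerun with your own seat count and prices. The sketch below reproduces the 15-seat scenarios from this section; the function and scenario names are illustrative, and the seat prices are the article's estimates rather than vendor list prices.

```python
def three_year_cloud(seat_price_dkk: int, seats: int, months: int = 36) -> int:
    """Cloud cost over the period: price per seat per month x seats x months."""
    return seat_price_dkk * seats * months


def three_year_self_hosted(server: int, electricity_per_year: int,
                           operations_per_year: int, years: int = 3) -> int:
    """Self-hosted cost: one-off server plus yearly electricity and operations."""
    return server + (electricity_per_year + operations_per_year) * years


scenarios = {
    "Claude Team": three_year_cloud(215, 15),
    "Claude Enterprise (low)": three_year_cloud(350, 15),
    "Claude Enterprise (high)": three_year_cloud(500, 15),
    "Self-hosted": three_year_self_hosted(80_000, 12_000, 20_000),
}

# Print cheapest first.
for name, total in sorted(scenarios.items(), key=lambda item: item[1]):
    print(f"{name:26} DKK {total:>9,}")
```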

Hybrid stack: best of both worlds

For most knowledge-intensive SMEs the answer is not "cloud" or "self-hosted". It is hybrid. You use BOTH, and each task is routed to the right tool based on data sensitivity.

Typical hybrid routing rule:

Routes to self-hosted Llama 3.3 70B:

- Client-confidential cases (contract review, audit notes)

- Personal data under GDPR Article 9

- Internal strategic documents with high confidentiality

- Client-specific analysis where data cannot leave infrastructure

Routes to Claude Opus 4.7 or GPT-5:

- General research without confidential data

- Complex strategy deliberation (the model is stronger)

- Code assistance on non-proprietary code

- Marketing copy, communication drafts without confidential detail

Routing can be handled by n8n or a custom router agent. The user does not need to know which model is being used. They see one chat interface (OpenWebUI), and routing happens behind the scenes based on task type, document sensitivity, or explicit selection.
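A custom router can start as a small rule function. The sketch below is one possible shape, not the EnterpriseIQ implementation: the label names, the `Task` type and the fail-closed default (anything unlabelled stays on self-hosted) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

SELF_HOSTED = "self-hosted"
CLOUD = "cloud"

# Hypothetical sensitivity labels mirroring the two routing lists above.
CONFIDENTIAL_LABELS = {"client_confidential", "gdpr_article_9",
                       "internal_strategy", "client_specific"}
CLOUD_OK_LABELS = {"general_research", "strategy", "public_code", "marketing"}


@dataclass
class Task:
    description: str
    labels: Set[str] = field(default_factory=set)
    forced_target: Optional[str] = None  # explicit selection by the user


def route(task: Task) -> str:
    """Pick a backend from data sensitivity first, explicit selection second."""
    # Any confidential label wins, even over an explicit request for cloud.
    if task.labels & CONFIDENTIAL_LABELS:
        return SELF_HOSTED
    if task.forced_target in (SELF_HOSTED, CLOUD):
        return task.forced_target
    # Cloud only when every label is known to be non-confidential.
    if task.labels and task.labels <= CLOUD_OK_LABELS:
        return CLOUD
    # Assumed conservative default: unlabelled or unknown work stays local.
    return SELF_HOSTED


print(route(Task("contract review", {"client_confidential"})))
print(route(Task("market research", {"general_research"})))
```

Checking sensitivity before explicit selection means a user cannot accidentally send confidential material to a cloud model by picking it in the interface.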

EnterpriseIQ runs on a hybrid stack itself: Llama 3.3 70B for client work, Claude Opus 4.7 for strategy and this pillar, Perplexity for research. See /en/ai-stack for details.

Industry examples

Law firm: contract review with full confidentiality

Self-hosted Llama 3.3 70B + OpenWebUI for contract review and client notes. Cloud Claude Opus for internal strategic research and employee announcements. Result: client data never leaves the Proxmox host, while the lawyer still has state-of-the-art AI for strategy deliberation.

Accounting firm: financial analysis with data sovereignty

Self-hosted Qwen 3 32B (strong on structured data) + Mixtral 8x22B (European-language quality) for audit notes and materiality assessment. Cloud GPT-5 for general professional research and publication work. Audit trail retained in Langfuse on the same Proxmox host as the models.

Financial advisory: portfolio analysis with GDPR Article 9 data

Self-hosted Mixtral 8x22B for portfolio reports and client communication (personal data + financial detail). Cloud Claude Opus for market analysis and strategic deliberation without client data. Compliance positioning: the customer can see in the DPA that their data does not leave the advisory firm's infrastructure.

IT services: code assistance and knowledge base

Self-hosted Qwen 3 32B (strong on code) + Llama 3.3 70B (general) for client-specific code assistance and internal knowledge base across all support tickets. Cloud Claude for complex architecture advice. Client-confidential system names and credentials never leave the stack.

Seven typical pitfalls

Pitfall 1: Self-hosted without IT competence

SMEs that buy a server and try to operate without Linux experience or GPU management know-how. Result: unstable operations, frustrated users, a quick return to cloud. Fix: either you have IT competence in-house, or you outsource operations (EnterpriseIQ retainer or another managed service). Self-hosted is not "cheaper" if it ends in IT chaos.

Pitfall 2: Believing open source equals free

The models are free to download, but hardware, electricity, operations and upgrades are not. Total cost of ownership for a self-hosted SME stack lands at about DKK 90,000 to 130,000 in year 1 including hardware investment, and DKK 150,000 to 180,000 over 3 years. That is reasonable compared with cloud, but it is not free.

Pitfall 3: Ignoring the model update cycle

Open source models are updated every 3 to 6 months with significant improvements. Self-hosted stacks that are not maintained quickly fall behind cloud competitors. Fix: schedule model updates every 3 to 6 months as part of the operations work. Test the new model against baseline before swapping it into production.
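The "test against baseline" step can be made explicit as a promotion gate. The sketch below is model-agnostic: `baseline` and `candidate` are any callables that turn a prompt into an answer, and `score` is whatever quality measure you trust for the use case (human rating, rubric, exact match). All names are hypothetical.

```python
from typing import Callable, Sequence


def should_promote(baseline: Callable[[str], str],
                   candidate: Callable[[str], str],
                   score: Callable[[str, str], float],
                   prompts: Sequence[str],
                   min_gain: float = 0.0) -> bool:
    """Promote the candidate only if its mean score is at least baseline + min_gain."""
    if not prompts:
        raise ValueError("need at least one test prompt")
    baseline_mean = sum(score(p, baseline(p)) for p in prompts) / len(prompts)
    candidate_mean = sum(score(p, candidate(p)) for p in prompts) / len(prompts)
    return candidate_mean >= baseline_mean + min_gain


# Toy stand-ins so the sketch runs without a model server.
old_model = lambda prompt: "ok"
new_model = lambda prompt: "ok, with detail"
length_score = lambda prompt, answer: float(len(answer))

print(should_promote(old_model, new_model, length_score, ["a", "b"]))
```

In practice the prompt set would be a fixed sample of your real workflows, kept stable between updates so results are comparable over time.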

Pitfall 4: Expecting cloud parity on the first generation

Self-hosted Llama 3.3 70B is not 100 percent equivalent to Claude Opus 4.7. On complex tasks you lose about 10 to 25 percent in quality. Fix: hybrid stack where self-hosted is used where data sovereignty is decisive, cloud is used where state-of-the-art quality is decisive.

Pitfall 5: No audit trail

Self-hosted gives full control over logs, but only if you actively set them up. Langfuse or custom Prometheus/Grafana must be established from day one. Otherwise you end up with "we have self-hosted AI" but no evidence for an EU AI Act audit. Fix: audit trail is part of the pilot canvas, not something added later.
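To make "from day one" concrete, here is a minimal audit record written as JSON lines using only the Python standard library. It is a stand-in to show which fields are worth capturing, not the Langfuse API. The field names are assumptions, and storing hashes instead of full text is a design choice you may need to reverse if your audit requires the content itself.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import TextIO


def log_interaction(sink: TextIO, user: str, model: str, route: str,
                    prompt: str, response: str) -> dict:
    """Append one audit record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "route": route,
        # Hashes prove what was sent and received without copying
        # confidential text into the log file.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }
    sink.write(json.dumps(record) + "\n")
    return record
```

A call such as `log_interaction(open("audit.jsonl", "a"), "alice", "llama3.3:70b", "self-hosted", prompt, answer)` appends one line per request; the file name and model tag here are examples only.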

Pitfall 6: GPU purchase before model testing

SMEs that buy a server with 2x RTX 4090 based on blog recommendations without having tested that the models actually deliver on their use cases. Fix: run the pilot on cloud first (or on a rented GPU instance at Lambda Cloud / RunPod) for 2 to 4 weeks. Decide on self-hosted investment based on verified use case value.

Pitfall 7: Believing self-hosted alone solves compliance

The EU AI Act requires inventory, risk classification, audit trail and governance policy regardless of whether you use cloud or self-hosted. Self-hosted simplifies some of the documentation but does not replace the compliance work. Fix: self-hosted is part of the compliance strategy, not the entire strategy.

Three steps you can take this month

Step 1: Assess your trigger status

Identify your top 5 AI use cases. How many involve client-confidential or personal data?
Estimate your cloud AI cost if you rolled AI out to the entire team. Are you crossing DKK 5,000 per month?
Do you have customers who ask about data sovereignty or industry standards that require it?
Zero triggers: stay on cloud. One or more: move on to Step 2.

Step 2: Pilot on rented GPU cloud

Rent a GPU instance at Lambda Cloud or RunPod (about DKK 1,500 to 3,000 for a 2-week test).
Set up Llama 3.3 70B + Ollama + OpenWebUI on the instance.
Test 3 to 5 of your actual use cases. Compare the output with Claude or GPT.
Decide: is the quality good enough to justify on-premise investment?
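Once Ollama is running on the rented instance, the use case tests can be scripted against its HTTP API rather than typed into the chat window. The sketch below uses only the standard library. The host, port and model tag are assumptions (Ollama's default local port and a tag you should verify with `ollama list`), and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default address
MODEL = "llama3.3:70b"                 # assumed tag; verify on your instance


def build_generate_request(prompt: str, model: str = MODEL,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarise this contract clause: ...")` for each test case and saving the answers next to the Claude or GPT output gives a side-by-side record for the go/no-go decision.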

Step 3: Hardware investment OR managed self-hosted

If the pilot succeeded and you have the IT competence: invest in a server (DKK 60,000 to 90,000), set up Proxmox + Ollama stack, run the prioritised use cases on-premise.
If the pilot succeeded but you lack IT competence: order managed self-hosted via EnterpriseIQ retainer (we run the hardware and operations, you pay a fixed monthly fee).
If the pilot did not succeed: stay on cloud, come back in 6 to 12 months once the models have moved on.

FAQ

When does self-hosted AI make sense?

Three triggers: highly sensitive data that cannot leave infrastructure, cloud cost above DKK 5,000 per month, or compliance positioning that requires data sovereignty. Zero triggers: stay on cloud.

Which open source models are good enough in 2026?

Llama 3.3 70B (default), Mistral Large 2 or Mixtral 8x22B (strong on European languages), Qwen 3 32B (structured data and code). All at about GPT-4 level on ordinary tasks.

What does it cost?

Server DKK 60,000 to 90,000 + DKK 12,000 per year electricity + DKK 15,000 to 25,000 per year operations. 3-year TCO about DKK 150,000 to 180,000. Break-even at about DKK 5,000 per month of cloud spend.

What is a hybrid stack?

Use BOTH self-hosted and cloud. Client-confidential tasks are routed to self-hosted Llama. Strategy and research are routed to Claude Opus or GPT-5. n8n handles routing based on data sensitivity.

How hard is it to operate?

Requires Linux server operations, GPU management, model updates. 4 to 8 hours per month if you have the competence. If not: outsource via managed self-hosted (EnterpriseIQ retainer) or stay on cloud.

Does self-hosted replace EU AI Act compliance work?

No. The EU AI Act requires inventory, risk classification and governance regardless of deployment. Self-hosted simplifies audit trail documentation but does not replace the compliance work.

Next steps

Three paths depending on where you stand:

See our AI stack (free)

Openly documented hybrid stack. We show how we run on Llama plus Claude plus GPT in a routing flow.

Self-hosted pilot (DKK 80,000 to 150,000)

4 to 6 weeks delivery: hardware recommendation, setup, pilot on your prioritised use case, plus 30-day measurement period.

30-minute conversation (free)

No obligation. We work through your trigger status and assess whether self-hosted, cloud or hybrid is right for you.

About the author

Jesper Sachmann is the founder of EnterpriseIQ. 27 years of IT leadership across Oracle, Logica and Capgemini plus 11 years of Archer experience as Alliance Director Europe and Integrated Risk Management Lead Nordics, combined with hands-on self-hosted AI on Proxmox since 2023. The entire EnterpriseIQ business runs on a hybrid stack with self-hosted Llama plus cloud models in a routing flow.

AI attribution: This article was produced with AI assistance from Claude Opus 4.7, with human review by Jesper Sachmann. See our AI transparency policy for how we use AI in every deliverable.

Citing this article? "EnterpriseIQ: Open source AI for SMEs (2026-05-27)" or link to enterpriseiq.dk/en/insights/open-source-ai-for-smes.