Open source AI for SMEs
Until two years ago, open source AI was something leaders could overlook without losing competitive ground. The cloud models (GPT, Claude, Gemini) were meaningfully stronger. That has changed. In 2026 Llama 3.3 70B is competitive with GPT-4 on most benchmarks, Mistral Large 2 is strong on European languages, and Qwen 3 from Alibaba is surprisingly strong on structured data. That raises a real choice for knowledge-intensive SMEs: when does self-hosted make more sense than cloud? This pillar walks through when it is worth considering, hardware requirements, cost comparison, and how a hybrid stack is typically the right answer.
Written by Jesper Sachmann, founder of EnterpriseIQ. He runs all of EnterpriseIQ on a hybrid stack with self-hosted Llama plus Claude and GPT in a routing flow. Our stack is documented openly at /en/ai-stack.
- Open source AI (Llama 3.3 70B, Mistral Large 2) is competitive enough in 2026 for most SME use cases.
- Three triggers make self-hosted relevant: data sovereignty, cost above DKK 5,000 per month, compliance positioning.
- Hardware: a single server with 2x RTX 4090 covers 5 to 20 concurrent users on Llama 3.3 70B. Price DKK 60,000 to 90,000.
- 3-year TCO: self-hosted wins at cloud spend above DKK 5,000 per month. Below that, cloud wins.
- Hybrid stack is typically the right answer: self-hosted for client-confidential tasks, cloud for strategy and research.
Why open source AI is real in 2026
For years the argument for ignoring open source AI was the same: the cloud models were significantly stronger, and the gap was large enough to justify data exposure and monthly payments to OpenAI or Anthropic. That argument no longer holds without qualification.
Llama 3.3 70B (released by Meta) scores around GPT-4 level on benchmarks for ordinary tasks: summarisation, classification, translation, draft generation. On complex strategy deliberation and coding, Claude Opus 4.7 and GPT-5 are still clearly ahead, but on 70 to 80 percent of typical SME work the gap is no longer decisive.
At the same time, hardware costs have fallen. The NVIDIA RTX 4090 (24GB VRAM) costs DKK 18,000 to 22,000. Llama 3.3 70B in 4-bit quantised form needs at least 35GB for the weights alone, so it does not fit on a single card, but two cards in the same server (48GB VRAM combined) run it with acceptable performance and cover 5 to 20 concurrent users. For an SME with 30 to 100 employees, that is realistic hardware territory.
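The sizing claim can be sanity-checked with a rule of thumb: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below is an illustration, not a benchmark. The 20 percent overhead for KV cache and activations is an assumption, and real usage depends on context length, quantisation format and the number of concurrent requests.

```python
import math


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Rough VRAM need in GB: weights plus a flat allowance for KV cache and activations.

    overhead_fraction is an assumed rule-of-thumb value, not a measured one.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of params x bytes per param
    return weights_gb * (1 + overhead_fraction)


def cards_needed(vram_gb: float, card_vram_gb: float = 24.0) -> int:
    """Smallest number of identical cards whose combined VRAM covers the estimate."""
    return math.ceil(vram_gb / card_vram_gb)


if __name__ == "__main__":
    for name, size in [("8B", 8), ("70B", 70)]:
        need = estimate_vram_gb(size, bits_per_weight=4)
        print(f"{name} at 4-bit: ~{need:.0f} GB, {cards_needed(need)} x 24GB card(s)")
```

Under these assumptions an 8B model fits on one RTX 4090, while the 70B model lands at about 42GB and needs two.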
Finally: the EU AI Act and GDPR have turned data sovereignty into a real compliance argument. Client-confidential data sent to US cloud providers requires more documentation and risk management than data that stays on your own Proxmox host. For law firms, accountants and financial advisers, that is the difference between "we use AI with full control" and "we use AI, and here is the DPA".
Three triggers that make self-hosted relevant
Not every SME should run self-hosted AI. Three situations make it worth considering. If none of them trigger for you, start with cloud and return to the question in 12 months.
Trigger 1: Highly sensitive data
Client-confidential cases (legal cases, audit engagements, financial plans), personal data, or anything falling under GDPR Article 9 (special categories) or industry confidentiality rules. If that data cannot leave your infrastructure without explicit client approval, self-hosted is typically the right choice for the specific use cases.
Nuance: cloud AI with EU residency and a documented DPA (Claude Enterprise, Microsoft Copilot for M365 Enterprise) can often meet the requirements. Self-hosted is simply easier to document and audit.
Trigger 2: Cloud spend above DKK 5,000 per month
If your cloud AI spend passes DKK 5,000 per month (DKK 60,000 per year), self-hosted starts to be economically competitive on a 3-year horizon. A hardware investment of DKK 80,000 plus operations at DKK 15,000 per year and electricity at DKK 12,000 per year typically lands at about DKK 160,000 over 3 years (80,000 + 45,000 + 36,000 = 161,000), versus DKK 180,000 to 290,000 for cloud over the same period.
Note: cost numbers are not the only thing that counts. Self-hosted requires IT competence you need to have or buy. If you do not have it, cloud is economically clearly better even at higher usage.
Trigger 3: Compliance positioning
If your customers explicitly ask about data sovereignty, or industry standards require documented on-premise processing (certain public sector customers, financial institutions with heavy regulation), then self-hosted is part of your go-to-market story, not just an internal operational decision.
Example: a law firm bidding on framework agreements with ministries or defence suppliers may need to document that client data is processed on-premise.
If zero triggers are active: stay on cloud. If one trigger is active: consider hybrid stack (cloud + selective self-hosted for the specific workflows). If two or more are active: hybrid stack with the centre of gravity on self-hosted is typically the right answer.
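The decision rule above is simple enough to write down as code, which can be handy in a self-assessment form or workshop. This is a minimal sketch. The class and field names are invented for illustration, and the only threshold used is the DKK 5,000 per month from Trigger 2.

```python
from dataclasses import dataclass

SPEND_THRESHOLD_DKK = 5_000  # Trigger 2 threshold from the text


@dataclass(frozen=True)
class TriggerStatus:
    """Hypothetical container for an SME's answers to the three trigger questions."""
    sensitive_data: bool            # Trigger 1: data that cannot leave your infrastructure
    monthly_cloud_spend_dkk: float  # Trigger 2: compared against the threshold
    compliance_positioning: bool    # Trigger 3: customers or standards demand on-premise


def active_triggers(status: TriggerStatus) -> int:
    """Count how many of the three triggers are active."""
    return sum([
        status.sensitive_data,
        status.monthly_cloud_spend_dkk > SPEND_THRESHOLD_DKK,
        status.compliance_positioning,
    ])


def recommend(status: TriggerStatus) -> str:
    """Map the trigger count to the recommendation given in the text."""
    count = active_triggers(status)
    if count == 0:
        return "cloud"
    if count == 1:
        return "hybrid: cloud plus selective self-hosted"
    return "hybrid: centre of gravity on self-hosted"


if __name__ == "__main__":
    law_firm = TriggerStatus(sensitive_data=True, monthly_cloud_spend_dkk=3_000,
                             compliance_positioning=True)
    print(recommend(law_firm))
```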
Model landscape 2026
Three model families cover 95 percent of SME needs. Each offers open-weights models that can be run locally.
Llama 3.3 (Meta)
The Llama 3 family comes in three sizes: 8B (Llama 3.1; lightweight, fast, runs on a single RTX 4090), 70B (Llama 3.3; the default choice, requires 2x RTX 4090), and 405B (Llama 3.1; only for large organisations with a dedicated GPU cluster). Llama 3.3 itself was released only as a 70B model, and that is the right default for SMEs.
Strong on: English, summarisation, classification, code. Weaker on: nuanced European languages, complex reasoning. Licence: Llama 3.3 Community License (commercial use is permitted for organisations with fewer than 700M monthly active users, which is no real constraint for an SME).
Mistral Large 2 and Mixtral 8x22B (Mistral AI, France)
Mistral Large 2 requires a commercial licence for business use but can be run on-premise via Mistral Enterprise. Mixtral 8x22B is open weights and can be run freely. Both are strong on European languages.
Strong on: multi-language (good European-language quality), legal vocabulary, financial terminology. EU-based supplier, which simplifies the compliance story. Licence: Mixtral is Apache 2.0 (free commercial use).
Qwen 3 (Alibaba)
Dense models range from 0.6B to 32B (including 8B, 14B and 32B), alongside larger mixture-of-experts variants. 32B is the sweet spot for performance/hardware balance. Open weights under Apache 2.0.
Strong on: structured data, code, mathematical reasoning. Weaker on: creative text generation. Political nuance: Chinese supplier, which some customers will perceive as a risk. For technical use cases (IT services, data analysis) it is often not an issue. Verify with the customer where relevant.
Recommendation for SME start: Llama 3.3 70B as the default choice. Add Mixtral 8x22B if European-language quality is critical for your use case. Add Qwen 3 32B if you have heavy structured-data or code use cases.
Hardware setup and cost comparison
Hardware requirements are no longer extreme. A realistic SME stack looks like this.
Server spec (SME level, 5 to 20 concurrent users):
CPU: AMD Ryzen 9 7950X or Intel Core i9-14900K
RAM: 128GB DDR5
GPU: 2x NVIDIA RTX 4090 (24GB VRAM each), 48GB VRAM total
Storage: 2TB NVMe SSD (models + cache)
Power supply: 1600W
OS: Proxmox VE 8 with GPU passthrough to LXC
Software stack: Ollama or vLLM + OpenWebUI + n8n + Qdrant + Langfuse
Purchase price: DKK 60,000 to 90,000 incl. VAT
Power consumption: 400 to 700W under load, 100W idle. About DKK 12,000 per year on electricity at typical SME use.
Operations cost: 4 to 8 hours per month of one IT person, which is DKK 15,000 to 25,000 per year if valued in cash.
3-year TCO comparison
For an SME with 15 users, moderate use (typical workplace with AI-augmented daily work):
Cloud (Claude Team, 15 seats):
Price: DKK 215 per seat per month x 15 = DKK 3,225 per month = DKK 38,700 per year
3-year total: about DKK 116,000
Cloud (Claude Enterprise, 15 seats):
Price: typically DKK 350 to 500 per seat per month x 15 = DKK 5,250 to 7,500 per month
3-year total: DKK 189,000 to 270,000
Self-hosted (server + operations):
Year 1: DKK 80,000 (server) + DKK 12,000 (electricity) + DKK 20,000 (operations) = DKK 112,000
Year 2: DKK 12,000 + DKK 20,000 = DKK 32,000
Year 3: DKK 12,000 + DKK 20,000 = DKK 32,000
3-year total: about DKK 176,000
The difference is not dramatic. Over a 3-year horizon Claude Team is about DKK 60,000 cheaper than self-hosted (DKK 116,000 versus DKK 176,000), but that only holds for non-confidential use cases. If you have to move to Claude Enterprise (for DPA and data handling), self-hosted is cheaper by DKK 13,000 to 94,000. On top of that, you get data sovereignty and can use the stack for tasks that would otherwise have been excluded.
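The comparison is plain arithmetic, so it is easy to rerun with your own seat count and prices. The sketch below reproduces the figures from the three scenarios above. The function names are illustrative, and the break-even figure is simply the self-hosted total spread over 36 months.

```python
def cloud_total(seat_price_dkk: int, seats: int, years: int = 3) -> int:
    """Total cloud subscription cost over the period."""
    return seat_price_dkk * seats * 12 * years


def self_hosted_total(hardware_dkk: int, electricity_per_year_dkk: int,
                      operations_per_year_dkk: int, years: int = 3) -> int:
    """One-off hardware purchase plus recurring electricity and operations."""
    return hardware_dkk + years * (electricity_per_year_dkk + operations_per_year_dkk)


def break_even_monthly_spend(total_dkk: int, years: int = 3) -> float:
    """Monthly cloud spend at which cloud and self-hosted cost the same over the period."""
    return total_dkk / (12 * years)


if __name__ == "__main__":
    team = cloud_total(215, seats=15)             # 116,100
    enterprise_low = cloud_total(350, seats=15)   # 189,000
    enterprise_high = cloud_total(500, seats=15)  # 270,000
    own = self_hosted_total(80_000, 12_000, 20_000)  # 176,000

    print(f"Claude Team:       DKK {team:,}")
    print(f"Claude Enterprise: DKK {enterprise_low:,} to {enterprise_high:,}")
    print(f"Self-hosted:       DKK {own:,}")
    print(f"Break-even spend:  DKK {break_even_monthly_spend(own):,.0f} per month")
```

With the article's inputs the break-even lands at roughly DKK 4,900 per month, which is where the DKK 5,000 threshold comes from.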
Hybrid stack: best of both worlds
For most knowledge-intensive SMEs the answer is not "cloud" or "self-hosted". It is hybrid. You use BOTH, and each task is routed to the right tool based on data sensitivity.
Typical hybrid routing rule:
Routes to self-hosted Llama 3.3 70B:
- Client-confidential cases (contract review, audit notes)
- Personal data under GDPR Article 9
- Internal strategic documents with high confidentiality
- Client-specific analysis where data cannot leave infrastructure
Routes to Claude Opus 4.7 or GPT-5:
- General research without confidential data
- Complex strategy deliberation (the model is stronger)
- Code assistance on non-proprietary code
- Marketing copy, communication drafts without confidential detail
Routing can be handled by n8n or a custom router agent. The user does not need to know which model is being used. They see one chat interface (OpenWebUI), and routing happens behind the scenes based on task type, document sensitivity, or explicit selection.
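As an illustration of what such a router decides, here is a minimal sketch in Python. It is not the n8n flow EnterpriseIQ runs. The task kinds, the `Sensitivity` levels and the fail-closed default are assumptions chosen to mirror the routing rule above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

SELF_HOSTED = "self-hosted"
CLOUD = "cloud"

# Hypothetical task kinds that are allowed to go to a cloud model.
CLOUD_TASK_KINDS = {"research", "strategy", "code_assist", "marketing"}


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class Task:
    kind: str
    sensitivity: Sensitivity
    has_personal_data: bool = False
    user_choice: Optional[str] = None  # explicit selection in the chat interface


def route(task: Task) -> str:
    """Return SELF_HOSTED or CLOUD for a task."""
    # Confidential or personal data stays on your own infrastructure,
    # whatever the user selected.
    if task.sensitivity is Sensitivity.CONFIDENTIAL or task.has_personal_data:
        return SELF_HOSTED
    # Otherwise an explicit selection is respected.
    if task.user_choice in (SELF_HOSTED, CLOUD):
        return task.user_choice
    # Known low-risk task kinds go to the stronger cloud model.
    if task.kind in CLOUD_TASK_KINDS:
        return CLOUD
    # Unknown task kinds fail closed (assumed design choice).
    return SELF_HOSTED


if __name__ == "__main__":
    print(route(Task("contract_review", Sensitivity.CONFIDENTIAL)))  # self-hosted
    print(route(Task("research", Sensitivity.PUBLIC)))               # cloud
```

The ordering matters: the sensitivity check runs before the explicit selection, so a user cannot accidentally send confidential material to a cloud model.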
EnterpriseIQ runs on a hybrid stack itself: Llama 3.3 70B for client work, Claude Opus 4.7 for strategy and this pillar, Perplexity for research. See /en/ai-stack for details.
Industry examples
Law firm: contract review with full confidentiality
Self-hosted Llama 3.3 70B + OpenWebUI for contract review and client notes. Cloud Claude Opus for internal strategic research and employee announcements. Result: client data never leaves the Proxmox host, while the lawyer still has state-of-the-art AI for strategy deliberation.
Accounting firm: financial analysis with data sovereignty
Self-hosted Qwen 3 32B (strong on structured data) + Mixtral 8x22B (European-language quality) for audit notes and materiality assessment. Cloud GPT-5 for general professional research and publication work. Audit trail retained in Langfuse on the same Proxmox host as the models.
Financial advisory: portfolio analysis with personal and financial data
Self-hosted Mixtral 8x22B for portfolio reports and client communication (personal data + financial detail). Cloud Claude Opus for market analysis and strategic deliberation without client data. Compliance positioning: the customer can see in the DPA that their data does not leave the advisory firm's infrastructure.
IT services: code assistance and knowledge base
Self-hosted Qwen 3 32B (strong on code) + Llama 3.3 70B (general) for client-specific code assistance and internal knowledge base across all support tickets. Cloud Claude for complex architecture advice. Client-confidential system names and credentials never leave the stack.
Seven typical pitfalls
Pitfall 1: Self-hosted without IT competence
Some SMEs buy a server and try to operate it without Linux experience or GPU management know-how. Result: unstable operations, frustrated users, a quick return to cloud. Fix: either you have IT competence in-house, or you outsource operations (EnterpriseIQ retainer or another managed service). Self-hosted is not "cheaper" if it ends in IT chaos.
Pitfall 2: Believing open source equals free
The models are free to download, but hardware, electricity, operations and upgrades are not. Total cost of ownership for a self-hosted SME stack lands at roughly DKK 90,000 to 130,000 in year 1 including hardware investment, and at DKK 140,000 to 200,000 over three years. That is reasonable compared with cloud, but it is not free.
Pitfall 3: Ignoring the model update cycle
Open source models are updated every 3 to 6 months with significant improvements. Self-hosted stacks that are not maintained quickly fall behind cloud competitors. Fix: schedule model updates every 3 to 6 months as part of the operations work. Test the new model against baseline before swapping it into production.
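One way to make "test against baseline" concrete is a small regression set built from your own use cases. The sketch below is an assumption about how such a check could look, not a standard tool. Models are represented as plain functions from prompt to text, and the pass criterion (required terms appear in the output) is deliberately crude; a real evaluation would add human review.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]  # any function that turns a prompt into an answer


@dataclass(frozen=True)
class Case:
    prompt: str
    must_contain: Tuple[str, ...]  # terms a good answer is expected to mention


def passes(output: str, case: Case) -> bool:
    """True when every required term appears in the output (case-insensitive)."""
    text = output.lower()
    return all(term.lower() in text for term in case.must_contain)


def pass_rate(model: Model, cases: Sequence[Case]) -> float:
    """Fraction of cases the model passes."""
    if not cases:
        raise ValueError("need at least one test case")
    return sum(passes(model(case.prompt), case) for case in cases) / len(cases)


def should_promote(candidate: Model, baseline: Model, cases: Sequence[Case],
                   min_gain: float = 0.0) -> bool:
    """Swap in the candidate only if it is at least as good as the current model."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) + min_gain
```

In practice `candidate` and `baseline` would wrap calls to the new and the currently deployed model, and the cases would come from the 3 to 5 use cases you care most about.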
Pitfall 4: Expecting cloud parity on the first generation
Self-hosted Llama 3.3 70B is not 100 percent equivalent to Claude Opus 4.7. On complex tasks you lose about 10 to 25 percent in quality. Fix: hybrid stack where self-hosted is used where data sovereignty is decisive, cloud is used where state-of-the-art quality is decisive.
Pitfall 5: No audit trail
Self-hosted gives full control over logs, but only if you actively set them up. Langfuse or custom Prometheus/Grafana must be established from day one. Otherwise you end up with "we have self-hosted AI" but no evidence for an EU AI Act audit. Fix: audit trail is part of the pilot canvas, not something added later.
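To show what kind of evidence an audit trail needs to capture, here is a minimal sketch of a tamper-evident log record. It is not the Langfuse API and is no substitute for a proper observability tool. The field names and the hash-chain design are assumptions for illustration. Prompts and responses are stored as SHA-256 hashes so that the log itself holds no confidential text.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS_HASH = "0" * 64  # prev_hash of the first record in a chain


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_record(user: str, model: str, route: str, prompt: str, response: str,
                prev_hash: str, timestamp: Optional[str] = None) -> Dict[str, str]:
    """Build one audit record that is chained to the previous record's hash."""
    body = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "route": route,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
        "prev_hash": prev_hash,
    }
    # The record hash covers every field above, including prev_hash.
    body["record_hash"] = sha256_hex(json.dumps(body, sort_keys=True))
    return body


def verify_chain(records: List[Dict[str, str]]) -> bool:
    """Check that no record was altered, removed or reordered."""
    prev = GENESIS_HASH
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        expected = sha256_hex(json.dumps(body, sort_keys=True))
        if record["prev_hash"] != prev or record["record_hash"] != expected:
            return False
        prev = record["record_hash"]
    return True
```

Each record would typically be appended as one JSON line to a file or table on the same host as the models.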
Pitfall 6: GPU purchase before model testing
Some SMEs buy a server with 2x RTX 4090 based on blog recommendations without having tested that the models actually deliver on their use cases. Fix: run the pilot on cloud first (or on a rented GPU instance at Lambda Cloud / RunPod) for 2 to 4 weeks. Decide on self-hosted investment based on verified use case value.
Pitfall 7: Believing self-hosted alone solves compliance
The EU AI Act requires inventory, risk classification, audit trail and governance policy regardless of whether you use cloud or self-hosted. Self-hosted simplifies some of the documentation but does not replace the compliance work. Fix: self-hosted is part of the compliance strategy, not the entire strategy.
Three steps you can take this month
Step 1: Assess your trigger status
- Identify your top 5 AI use cases. How many involve client-confidential or personal data?
- Estimate your cloud AI cost if you rolled AI out to the entire team. Are you crossing DKK 5,000 per month?
- Do you have customers who ask about data sovereignty or industry standards that require it?
- Zero triggers: stay on cloud. One or more: move on to Step 2.
Step 2: Pilot on rented GPU cloud
- Rent a GPU instance at Lambda Cloud or RunPod (about DKK 1,500 to 3,000 for a 2-week test).
- Set up Llama 3.3 70B + Ollama + OpenWebUI on the instance.
- Test 3 to 5 of your actual use cases. Compare the output with Claude or GPT.
- Decide: is the quality good enough to justify on-premise investment?
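For the comparison in Step 2, a small script can collect local and cloud answers side by side for review. The sketch below assumes a default Ollama install listening on localhost:11434 and uses its `/api/generate` endpoint with streaming turned off. The model tag `llama3.3:70b` and the `cloud` callable (whatever client you use for Claude or GPT) are assumptions you should adjust to your setup.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Sequence

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_request(prompt: str, model: str = "llama3.3:70b",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Prepare a non-streaming generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3.3:70b", timeout: float = 300.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def side_by_side(prompts: Sequence[str], local: Callable[[str], str],
                 cloud: Callable[[str], str]) -> List[Dict[str, str]]:
    """Collect both answers per prompt so a domain expert can compare them."""
    return [{"prompt": p, "local": local(p), "cloud": cloud(p)} for p in prompts]
```

Calling `side_by_side(my_prompts, ask_local, my_cloud_client)` gives a list of rows you can dump to a spreadsheet and have the people who own the use cases score blind.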
Step 3: Hardware investment OR managed self-hosted
- If the pilot succeeded and you have the IT competence: invest in a server (DKK 60,000 to 90,000), set up Proxmox + Ollama stack, run the prioritised use cases on-premise.
- If the pilot succeeded but you lack IT competence: order managed self-hosted via EnterpriseIQ retainer (we run the hardware and operations, you pay a fixed monthly fee).
- If the pilot did not succeed: stay on cloud, come back in 6 to 12 months once the models have moved on.
FAQ
When does self-hosted AI make sense?
Three triggers: highly sensitive data that cannot leave infrastructure, cloud cost above DKK 5,000 per month, or compliance positioning that requires data sovereignty. Zero triggers: stay on cloud.
Which open source models are good enough in 2026?
Llama 3.3 70B (default), Mistral Large 2 or Mixtral 8x22B (strong on European languages), Qwen 3 32B (structured data and code). All at about GPT-4 level on ordinary tasks.
What does it cost?
Server DKK 60,000 to 90,000 + DKK 12,000 per year electricity + DKK 15,000 to 25,000 per year operations. 3-year TCO about DKK 140,000 to 200,000 (DKK 176,000 in the worked example). Break-even at about DKK 5,000 per month of cloud spend.
What is a hybrid stack?
Use BOTH self-hosted and cloud. Client-confidential tasks are routed to self-hosted Llama. Strategy and research are routed to Claude Opus or GPT-5. n8n handles routing based on data sensitivity.
How hard is it to operate?
Requires Linux server operations, GPU management, model updates. 4 to 8 hours per month if you have the competence. If not: outsource via managed self-hosted (EnterpriseIQ retainer) or stay on cloud.
Does self-hosted replace EU AI Act compliance work?
No. The EU AI Act requires inventory, risk classification and governance regardless of deployment. Self-hosted simplifies audit trail documentation but does not replace the compliance work.
Next steps
Three paths depending on where you stand:
See our AI stack
Openly documented hybrid stack. We show how we run on Llama plus Claude plus GPT in a routing flow.
Self-hosted pilot
4 to 6 weeks delivery: hardware recommendation, setup, pilot on your prioritised use case, plus 30-day measurement period.
30-minute conversation
No obligation. We work through your trigger status and assess whether self-hosted, cloud or hybrid is right for you.
About the author
Jesper Sachmann is the founder of EnterpriseIQ. 27 years of IT leadership across Oracle, Logica and Capgemini plus 11 years of Archer experience as Alliance Director Europe and Integrated Risk Management Lead Nordics, combined with hands-on self-hosted AI on Proxmox since 2023. The entire EnterpriseIQ business runs on a hybrid stack with self-hosted Llama plus cloud models in a routing flow.
AI attribution: This article was produced with AI assistance from Claude Opus 4.7 and reviewed by Jesper Sachmann. See our AI transparency policy for how we use AI in every deliverable.
Citing this article? "EnterpriseIQ: Open source AI for SMEs (2026-05-27)" or link to enterpriseiq.dk/en/insights/open-source-ai-for-smes.