AI pilot projects: how to start
Most knowledge-intensive SMEs that want to get going with AI run aground on the same question. Where do we start? The right answer is rarely the first one proposed in the executive meeting. A structured pilot of 4 to 6 weeks with clear success criteria is the path with the highest likelihood of leading to a lasting AI practice. This pillar walks through how to prioritise use cases, how to design a pilot that can actually be measured, and how to avoid the seven typical pitfalls.
Written by Jesper Sachmann, founder of EnterpriseIQ. Pilot project experience from the Archer platform combined with hands-on work on n8n agent flows and custom RAG solutions on Proxmox since 2023.
- A pilot is not a scaled-down implementation. It is an experiment that tests three critical assumptions: data quality, user willingness, output quality.
- Use case prioritisation on three axes: impact, effort, risk. Score 1 to 5 on each, then pick the highest combined score (impact minus effort minus risk).
- Pilot canvas: problem, solution, data, success metrics, risks, team, timeline. One A4 page, not a 30-page document.
- Measure the baseline BEFORE the pilot starts. ROI is documented over 30 days with the pilot active versus the baseline.
- Price: DKK 50,000 to 150,000, 4 to 6 weeks, 20 percent risk share on documented savings.
Why pilot rather than big bet
It is tempting to think that if you have decided to invest in AI, you should go straight to a full implementation. That is understandable. The pilot phase looks like a delay to the real project. But data from Danish SMEs' AI projects over the past three years tells a different story. Roughly half of the AI investments that started without a pilot phase landed below 30 percent of the expected ROI. Roughly three quarters of those that started with a pilot delivered ROI within the expected band.
The difference is not the technology. It is the three assumptions every AI implementation rests on. Are you assuming your data is structured enough for AI to work with? Are you assuming your employees will use the solution in their daily workflows? Are you assuming AI output can reach the quality level your customers expect? Those three assumptions cannot be verified in the executive meeting. They have to be tested against reality.
That is what a pilot does. It costs DKK 50,000 to 150,000 and 4 to 6 weeks to gather concrete data on the three assumptions. If all three hold, the subsequent full implementation is significantly more likely to succeed. If one or more fail, you have saved yourself a major investment and learned which assumption needs to be reformulated before the next attempt.
Use case prioritisation: impact, effort, risk
The first real decision is which use case you pilot. That is also where most teams stall, because every department has 5 to 10 ideas that could be promising, and the executive team lacks a structure to choose between them.
Three-dimensional scoring works well. For each candidate use case, score on three axes with a 1 to 5 scale:
Impact (1 to 5)
How large is the saving or quality lift if the AI solution works as expected? 1 = marginal improvement on a small workflow. 5 = transforms a core process that consumes 30+ percent of a team's time.
Example: automated classification of incoming contracts (impact 4) versus improved internal search (impact 2 for a small team, impact 5 for a large team).
Effort (1 to 5, where 1 is lowest)
How hard is it to build the pilot? 1 = standard prompt library or simple n8n flow on existing stack. 5 = custom integration with 3+ systems, new data pipelines, or training on your own data.
Example: prompt library for contract review (effort 1) versus custom RAG over 50,000 historical cases (effort 4).
Risk (1 to 5, where 1 is lowest)
How significant is the consequence if the AI solution fails or produces poor output? 1 = internal workflow, errors are embarrassing but reversible. 5 = automated customer-facing decision with legal effect.
Example: draft of an internal email (risk 1) versus automated credit scoring (risk 5, and a high-risk use case under the EU AI Act).
Combined score = impact - effort - risk (higher is better). The use case with the highest score is typically your first pilot. Beware of the trap where you pick the most exciting use case (typically high impact but also high effort or risk) as the first pilot. That gives the worst chance of success. Pick the one that scores highest on the matrix rather than the one that sounds most interesting at the executive meeting.
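The scoring rule above is simple enough to keep in a spreadsheet, but a short script makes the ranking reproducible. The following is a minimal Python sketch, not a fixed tool: the `UseCase` class and `rank_use_cases` function are hypothetical names, and the three scored candidates are taken from the industry examples in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    impact: int  # 1 to 5, higher is better
    effort: int  # 1 to 5, where 1 is lowest
    risk: int    # 1 to 5, where 1 is lowest

    def __post_init__(self) -> None:
        # Reject scores outside the 1 to 5 scale used in the matrix.
        for axis in ("impact", "effort", "risk"):
            value = getattr(self, axis)
            if not 1 <= value <= 5:
                raise ValueError(f"{axis} must be between 1 and 5, got {value}")

    @property
    def combined_score(self) -> int:
        # Combined score = impact - effort - risk (higher is better).
        return self.impact - self.effort - self.risk


def rank_use_cases(candidates: list[UseCase]) -> list[UseCase]:
    # Highest combined score first; ties keep their input order.
    return sorted(candidates, key=lambda uc: uc.combined_score, reverse=True)


candidates = [
    UseCase("Materiality assessment in audit planning", impact=4, effort=3, risk=3),
    UseCase("Contract review assistant", impact=4, effort=2, risk=2),
    UseCase("Client report generation", impact=5, effort=2, risk=3),
]

for uc in rank_use_cases(candidates):
    print(f"{uc.combined_score:+d}  {uc.name}")
```

Two of the three candidates tie on the combined score. The matrix narrows the field; when scores tie, the choice between them remains a judgement call for the workshop.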
Pilot Canvas on one page
Once the use case is selected, the pilot is formulated on a one-page canvas. It is not a 30-page document. It is the clear frame that ensures everyone holds the same expectations.
Use case name: [clear, concrete name]
Problem: What concrete problem is being solved? How many hours or errors does it cost today?
Solution: Which AI approach is applied? Which model, which stack?
Data: What data is sent in? How sensitive? GDPR considerations?
Success metrics: 2 to 3 measurable goals (time saved %, error rate, user satisfaction). Baseline values.
Three critical assumptions: Which three assumptions MUST hold for the pilot to succeed?
Risks: What could go wrong? What is the mitigation strategy?
Team: Sponsor (name), power user (name), IT contact (name). Max 3 people.
Timeline: 4 to 6 weeks build + 30-day measurement period. Milestone plan.
Decision point: After 30 days: scale, reformulate, or stop?
A good pilot canvas takes 60 to 90 minutes to fill out together with the leadership and the power user. If it takes 4 hours to get through, the use case is probably too diffusely formulated. Go back to the prioritisation matrix and pick something more bounded.
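If you prefer to keep the canvas next to the prompt library in Git rather than in a slide, it can be captured as a small data structure with a completeness check. This is a sketch under assumptions: the `PilotCanvas` class, its field names and the `gaps` method are hypothetical, and the example values are placeholders apart from the use case and stack taken from the law firm example in this article.

```python
from dataclasses import dataclass

REQUIRED_ROLES = ("sponsor", "power user", "IT contact")


@dataclass
class PilotCanvas:
    use_case_name: str
    problem: str
    solution: str
    data: str
    success_metrics: list[str]
    critical_assumptions: list[str]
    risks: list[str]
    team: dict[str, str]  # role -> name
    timeline: str
    decision_point: str

    def gaps(self) -> list[str]:
        """Return what is still missing; an empty list means the canvas is complete."""
        found = []
        text_fields = ("use_case_name", "problem", "solution", "data",
                       "timeline", "decision_point")
        for label in text_fields:
            if not getattr(self, label).strip():
                found.append(f"{label} is empty")
        if not 2 <= len(self.success_metrics) <= 3:
            found.append("success_metrics should hold 2 to 3 measurable goals")
        if len(self.critical_assumptions) != 3:
            found.append("critical_assumptions should hold exactly three assumptions")
        if not self.risks:
            found.append("risks is empty")
        if len(self.team) > 3:
            found.append("team should be max 3 people")
        for role in REQUIRED_ROLES:
            if role not in self.team:
                found.append(f"team is missing the {role}")
        return found


canvas = PilotCanvas(
    use_case_name="Contract review assistant",
    problem="First-pass review of incoming contracts (hours per week: to be measured)",
    solution="Prompt library plus n8n agent flow",
    data="Incoming client contracts, client-confidential",
    success_metrics=["time saved %", "error rate"],
    critical_assumptions=["data quality", "user willingness", "output quality"],
    risks=["Attorney trusts the report without reading the contract"],
    team={"sponsor": "<name>", "power user": "<name>", "IT contact": "<name>"},
    timeline="4 to 6 weeks build + 30-day measurement period",
    decision_point="After 30 days: scale, reformulate, or stop",
)

print(canvas.gaps())  # an empty list when every field is filled in
```

The check only confirms that the page is filled in. Whether the problem is bounded enough is still something the sponsor and the power user have to judge together.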
30-day measurement framework
The most frequent mistake on AI pilots is not the technology choice. It is that the pilot starts without a baseline, so ROI cannot be documented afterwards. So always establish the baseline BEFORE the pilot goes live.
A 30-day measurement period is the right level. Shorter gives datasets that are too small. Longer introduces bias from holidays, quarterly shifts or other outside factors. What you measure depends on the use case, but three types of metrics are typically relevant.
Time metrics
How many minutes or hours does a power user spend on the workflow the pilot covers? Measure before the pilot (baseline) on at least 5 to 10 cases. Measure after the pilot (30 days) on at least 20 cases.
Typical expectation: 30 to 60 percent time saving on the specific workflow.
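The time saving itself is plain arithmetic: the drop in average handling time divided by the baseline average. A minimal Python sketch follows, assuming handling time is logged in minutes per case. The function name and the sample numbers are made up for illustration; the minimum sample sizes follow the guidance above.

```python
from statistics import mean

MIN_BASELINE_CASES = 5   # lower bound of the 5 to 10 baseline cases
MIN_PILOT_CASES = 20     # at least 20 cases after 30 days


def time_saving_percent(baseline_minutes: list[float],
                        pilot_minutes: list[float]) -> float:
    """Percentage reduction in average minutes per case, pilot versus baseline."""
    if len(baseline_minutes) < MIN_BASELINE_CASES:
        raise ValueError("baseline needs at least 5 measured cases")
    if len(pilot_minutes) < MIN_PILOT_CASES:
        raise ValueError("pilot needs at least 20 measured cases")
    baseline_avg = mean(baseline_minutes)
    if baseline_avg <= 0:
        raise ValueError("baseline average must be positive")
    return (baseline_avg - mean(pilot_minutes)) / baseline_avg * 100


# Illustrative numbers only, not measurements from a real pilot.
baseline = [60, 55, 65, 70, 50]   # average 60 minutes per case
pilot = [30] * 20                 # average 30 minutes per case

print(f"{time_saving_percent(baseline, pilot):.0f} percent time saving")
```

A negative result means the workflow got slower with the pilot active, which is worth knowing just as early.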
Quality metrics
How many errors, reworks or customer complaints come out of the workflow? Compare baseline to pilot.
Important: AI can reduce certain error types and introduce new ones. Both count. Watch for shifts in the error pattern.
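One way to make a shift in the error pattern visible is to compare errors per case for each error type, so that the smaller baseline sample and the larger pilot sample stay comparable. The sketch below assumes each error is logged with a short type label. The function names, the labels and the counts are hypothetical.

```python
from collections import Counter


def error_pattern_shift(baseline_errors: list[str], baseline_cases: int,
                        pilot_errors: list[str], pilot_cases: int) -> dict[str, float]:
    """Change in errors per case for each error type (negative means fewer errors)."""
    if baseline_cases <= 0 or pilot_cases <= 0:
        raise ValueError("case counts must be positive")
    before = Counter(baseline_errors)
    after = Counter(pilot_errors)
    shift = {}
    for error_type in sorted(set(before) | set(after)):
        # A Counter returns 0 for types it has never seen.
        delta = after[error_type] / pilot_cases - before[error_type] / baseline_cases
        shift[error_type] = round(delta, 3)
    return shift


def new_error_types(baseline_errors: list[str], pilot_errors: list[str]) -> set[str]:
    """Error types that only appear once the pilot is active."""
    return set(pilot_errors) - set(baseline_errors)


# Illustrative log entries only.
baseline_log = ["missed clause"] * 4 + ["wrong party name"] * 2   # over 10 cases
pilot_log = ["missed clause"] * 2 + ["invented citation"] * 3     # over 20 cases

print(error_pattern_shift(baseline_log, 10, pilot_log, 20))
print(new_error_types(baseline_log, pilot_log))
```

In this made-up log the total error rate falls, yet a new error type appears. That is the pattern to look for before reading a lower total as a pure quality gain.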
User metrics
How often does the power user actually use the pilot? How satisfied are they (NPS-style question)? Which friction moments do they report?
Strongest signal: if the power user stops using the pilot despite measurable time savings, something in UX or trust is not working. Investigate it.
After 30 days: make the decision. Three outcomes:
1) Pilot delivers ROI as expected: scale to the entire team.
2) Pilot shows partial ROI, or one of the three critical assumptions did not fully hold: reformulate and run an adjusted pilot for 14 days.
3) Pilot does not work: document why and pick the next use case from the prioritisation matrix.
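The three outcomes can be written down as an explicit rule before the measurement period starts, so nobody argues about the cut-off afterwards. The sketch below is one possible reading, not a fixed method. The `decide` function is a hypothetical name, and the 50 percent threshold separating "partial ROI" from "does not work" is an assumption you should set yourself in the pilot canvas.

```python
from enum import Enum


class Decision(Enum):
    SCALE = "scale to the entire team"
    REFORMULATE = "reformulate and run an adjusted pilot for 14 days"
    STOP = "document why and pick the next use case"


def decide(measured_saving_pct: float,
           expected_saving_pct: float,
           assumptions_held: dict[str, bool],
           partial_threshold: float = 0.5) -> Decision:
    """Map the 30-day result onto one of the three outcomes.

    partial_threshold is an assumed cut-off: below this share of the
    expected saving, the pilot is treated as not working.
    """
    if expected_saving_pct <= 0:
        raise ValueError("expected saving must be positive")
    achieved = measured_saving_pct / expected_saving_pct
    if achieved < partial_threshold:
        return Decision.STOP
    if achieved >= 1.0 and all(assumptions_held.values()):
        return Decision.SCALE
    # Partial ROI, or at least one critical assumption did not fully hold.
    return Decision.REFORMULATE


all_hold = {"data quality": True, "user willingness": True, "output quality": True}
one_fails = {**all_hold, "user willingness": False}

print(decide(45, 40, all_hold).value)
print(decide(45, 40, one_fails).value)
print(decide(25, 40, all_hold).value)
print(decide(10, 40, all_hold).value)
```

Agreeing on the threshold and the expected saving up front is what turns the decision point into a short meeting rather than a debate.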
Industry examples
Law firm: contract review assistant
Use case: AI reads incoming contracts and identifies deviations from the firm's standard clauses plus risk flags. Output is a 1-page report the attorney can use as a starting point for the full review.
Scores: impact 4 (saves 30 to 60 minutes per contract), effort 2 (prompt library plus n8n flow, no integration), risk 2 (attorney always reviews the final output).
Stack: Claude Team or Claude Enterprise with EU residency, n8n agent flow on Proxmox, prompt library version-controlled in Git. Typical pilot price: DKK 60,000 to 90,000.
Accounting firm: materiality assessment in audit planning
Use case: AI reads the prior year audit documentation plus the current year trial balance and proposes materiality thresholds plus indicators on work areas that should be prioritised. The auditor reviews and approves before final planning.
Scores: impact 4 (saves 4 to 8 hours per audit engagement), effort 3 (requires structured data input from the audit software), risk 3 (FSR standards apply, requires documented human review).
Stack: Claude Enterprise (client-confidential data), Python script for data extraction from audit software, audit-trail PDF per generation. Typical pilot price: DKK 100,000 to 130,000.
Financial advisory: client report generation
Use case: AI summarises the client's portfolio performance plus relevant market insights for the monthly or quarterly client report. The adviser adds the personal advice and approves.
Scores: impact 5 (report generation is often 40 to 60 percent of the adviser's administration), effort 2 (structured data from the portfolio system, prompt library), risk 3 (client-confidential financial data, GDPR Article 9 in some cases).
Stack: Claude Enterprise with DPA, integration with portfolio system via API, quality check loop with adviser review before distribution. Typical pilot price: DKK 80,000 to 120,000.
IT services firm: ticket prioritisation plus internal knowledge base
Use case: AI reads incoming support tickets and classifies them by urgency plus suggests relevant runbooks from the internal knowledge base. The support agent reviews prioritisation before it is executed.
Scores: impact 4 (shortens response time by 40 percent), effort 2 (RAG on existing knowledge base, n8n flow), risk 2 (internal workflow, no customer-facing automation).
Stack: Llama 3.3 70B self-hosted on Proxmox (data sovereignty), Qdrant vector DB, n8n routing flow to ticket system. Typical pilot price: DKK 70,000 to 110,000.
Seven typical pitfalls
The pattern recurs in pilots that did not land ROI. The pitfalls are not hard to avoid once you know them.
Pitfall 1: No baseline
The pilot starts without measuring the before state. After 30 days there are no concrete numbers to compare against, and ROI becomes a gut feel. Fix: spend a week before pilot start measuring the baseline.
Pitfall 2: Scope too broad
A pilot covering three departments or five workflows is not a pilot. It is a full project disguised as a pilot. Fix: one clear workflow, one team, one success metric.
Pitfall 3: Wrong sponsor
If the sponsor is a middle manager without authority to clear obstacles (IT access, GDPR approval, workflow change), the pilot stalls halfway through. Fix: sponsor at executive or partner level.
Pitfall 4: Tech first, problem second
The pilot starts with "we want to use Claude" or "we want to build a RAG portal" rather than "we want to solve problem X". That leads to solutions looking for a problem. Fix: pilot canvas starts with the problem, the technology choice comes after.
Pitfall 5: No power user involved
If the pilot is built without the employee who will actually use the solution, it ends up technically correct but unusable in practice. Fix: the power user is an active part of the pilot team from day one, not a recipient of the finished solution.
Pitfall 6: GDPR concerns ignored until go-live
The pilot is built and tested on synthetic or anonymised data. On the day before go-live it is discovered that the real data is client-confidential and cannot be sent to the chosen AI provider. Fix: data sensitivity is resolved in the pilot canvas BEFORE the build phase starts.
Pitfall 7: No decision point
After 30 days no one is responsible for making the decision: scale, reformulate, or stop. The pilot lives on without a clear purpose, consumes the power user's time and cannibalises momentum for the next use case. Fix: the pilot canvas includes an explicit decision point with date and owner.
How to scale from pilot
If the pilot delivers ROI as expected, the next question follows. How do we go from one power user's pilot to the entire team or the entire organisation?
Three paths, typically in this order:
Path 1: Same use case, full team (3 to 6 weeks)
The pilot goes from one power user to the entire team that uses the same workflow. The focus is on training, documentation and bug fixes based on the experience from the 30 days. This is the most natural form of scaling and typically where 80 percent of the pilot value is realised.
Path 2: Fast-follow pilots on the same stack
Once one pilot works on a stack, you can build 2 to 3 related pilots on the same technical foundation without repeating the stack investment. For a law firm that landed the contract review pilot: the next fast-follow could be document summarisation or a legal research assistant. Each fast-follow takes 50 to 70 percent of the time the first pilot took.
Path 3: Platform consolidation (typically year 2)
Once you have 3 to 5 pilots in production, it makes sense to consolidate them onto a shared platform with common governance, audit trail, model management and user access. That is not something you plan on day one. It is something you do when platform fragmentation has begun to cost more than the consolidation investment.
Note: do not jump straight to Path 3 from the first pilot. Many SMEs fall for the "AI platform" narrative and build an infrastructure project rather than delivering concrete value. That path leads to IT projects that close without going to production. Live with 2 to 3 separate pilots for a year before the platform question becomes real.
Three steps you can take this week
Step 1: Take the EnterpriseIQ Score
- 5 minutes, free. Baseline on AI maturity across 6 dimensions.
- You get feedback on which dimensions are not yet strong enough for a pilot.
- A score under 4 on a dimension like "data foundation" or "governance" means the pilot journey typically starts with closing that gap first.
Step 2: Hold a 60-minute use case prioritisation
- The executive team plus 1 to 2 key employees gather for a structured workshop.
- Brainstorm 5 to 10 candidate use cases (10 minutes).
- Score each on impact, effort, risk (30 minutes).
- Pick the top 2 for pilot consideration (20 minutes).
Step 3: Order a Quick Scan or go directly to a pilot
- If you are uncertain about maturity: Quick Scan (DKK 12,000 to 18,000, 1 day). You get a 10-page report plus a concrete pilot recommendation.
- If use case and prioritisation are clear: go directly to a pilot project (DKK 50,000 to 150,000, 4 to 6 weeks).
- Booking via /en/contact. We respond within 24 hours on business days.
FAQ
How long does a typical pilot take?
4 to 6 weeks of build plus a 30-day measurement period. Smaller, well-scoped pilots can be landed in 3 weeks. Pilots over 8 weeks are no longer pilots; they are full implementations.
What does a pilot cost?
DKK 50,000 to 150,000 depending on scope. 20 percent can be tied to documented savings in the 30-day measurement period, so we share the risk.
Which use case should we pick?
The one that scores highest on the impact minus effort minus risk matrix. Typically: document summarisation, classification, internal knowledge base, or draft generation.
What if the pilot does not work?
That is the point of a pilot. We design it so the three critical assumptions are tested early. If one fails, we learn why and suggest an alternative direction. That is not failure, it is learning.
Should the pilot be documented for the EU AI Act?
Yes, even when not high-risk. Every pilot becomes part of the AI system inventory with minimum documentation. Takes 1 to 2 hours per pilot.
Which employees should be involved?
Three: sponsor at executive or partner level, one engaged power user from the team, IT contact. Larger pilot teams slow things down.
Next steps
Three paths depending on where you stand:
Take the EnterpriseIQ Score
12 questions, 5 minutes. Baseline on AI maturity and which gaps need to be closed before a pilot.
AI Pilot Project
4 to 6 weeks delivery plus 30-day measurement period. 20 percent risk share on documented savings.
30-minute conversation
No obligation. We walk through the use case list and find the highest-score pilot for you.
About the author
Jesper Sachmann is the founder of EnterpriseIQ. 27 years of IT leadership across Oracle, Logica and Capgemini plus 11 years of Archer experience as Alliance Director Europe and Integrated Risk Management Lead Nordics, combined with hands-on pilot work on n8n agent flows and custom RAG solutions since 2023.
AI attribution: This article was produced with AI assistance from Claude Opus 4.7 and reviewed by Jesper Sachmann. See our AI transparency policy for how we use AI in every deliverable.
Citing this article? "EnterpriseIQ: AI pilot projects for SMEs (2026-05-27)" or link to enterpriseiq.dk/en/insights/ai-pilot-projects-for-smes.