How to build a strong enterprise AI moat with context engineering and data estate
Strong models on weak data make weak enterprise agentic systems. Here's how to build a 3-layer framework to catch up before the window closes.
Context engineering is about to become your organization’s strategic advantage.
In 24 months, it will separate the AI winners from the dead. Model intelligence is commoditizing fast. Every competitor has access to the same frontier models you do. What they don’t have is your data, your governance, or your retrieval architecture. That combination is unique to you, and it directly shapes the quality of every decision your agents make for your organization.
I've been beating this drum for a while (previous edition here). This week I want to focus on the layer underneath context engineering, the one that decides whether your agents are smart or confidently wrong: your data architecture.
Most AI agent failures aren’t model problems. They’re data architecture problems.
You can run GPT-5.4 on garbage data and get confident, authoritative-sounding hallucinations back. I’ve seen it.
Your competitors are building on the same foundation models you are. The model is not the moat and never was. Your data layer is. Because that is what feeds into the context window of the models.
Organizations that figure out how to give agents the right data, at the right time, in the right shape are about to pull so far ahead that companies still arguing over which LLM to use won’t even be in the same conversation.
This week’s edition is about that gap.
The team with perfect models and broken agents
Last year, I was advising a large Fortune 500 logistics company. Call the lead architect Marcus. Marcus’s team had done everything right on the model side: latest frontier models, fine-tuned retrieval, sophisticated prompt templates.
Six months into production, their AI agent was still giving customers wrong shipping estimates, hallucinating carrier policies, and occasionally pulling refund rules that had been deprecated two quarters ago.
The post-mortems kept pointing to the model. The team kept tweaking prompts. Nothing moved.
When I sat down with their data team, I found the real problem in about 20 minutes. Carrier rate tables lived in three different systems with no sync schedule.
Return policies were documented in a SharePoint site no one governed. Inventory data flowed through a pipeline with a 48-hour lag.
The agent wasn’t broken. It was operating on a broken data estate.
Marcus’s team had spent six months optimizing the wrong layer.
The framework: the three-layer context stack
Here is how I think about preparing an organization for agents. There are three layers that must all be healthy before agents can deliver value. I call it the Context Stack.
Get all three right, and your agents become smarter as your organization grows. Get any one wrong, and you’re building a performance ceiling you’ll hit faster than you expect.
The three layers are:
Unified data foundation: One governed, trusted source of truth for enterprise data
Retrieval architecture: How agents find and pull the right information at the right time
Governance and documentation: Who authorized what access, and why
Let’s go through each one.
Layer 1: Unified data foundation
The first thing agents need is a single, trusted place to find information.
Not 14 databases, 3 SharePoint sites, and a folder on someone’s laptop that hasn’t been touched since 2023.
Fragmented data leads to fragmented answers. When agents synthesize information from multiple ungoverned sources, inconsistencies compound. One source says the refund window is 30 days. Another says 14. The agent picks one. It is wrong half the time. And now your agent is undermining trust with customers.
The architectural answer is a unified data platform. In the Microsoft ecosystem, this is Microsoft Fabric OneLake. Business units create governed data products that live in domain-specific workspaces. Those data products become the canonical input for any AI workload. The agent doesn’t touch raw systems. It touches certified data products.
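The "agents touch certified data products, never raw systems" rule can be sketched in a few lines. This is an illustrative stand-in, not Fabric OneLake's actual API: the registry, product names, and fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch: agents read only from certified data products,
# never from raw operational systems. All names and fields are illustrative.

@dataclass(frozen=True)
class DataProduct:
    name: str          # e.g. "carrier_rates"
    domain: str        # owning business unit
    certified: bool    # passed governance review
    records: list      # stands in for the real storage layer

class DataProductRegistry:
    """Single governed entry point for agent data access."""
    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def read(self, name: str) -> list:
        product = self._products.get(name)
        if product is None or not product.certified:
            # Refuse ungoverned sources instead of silently falling back
            raise PermissionError(f"'{name}' is not a certified data product")
        return product.records

registry = DataProductRegistry()
registry.publish(DataProduct("carrier_rates", "logistics", True,
                             [{"carrier": "A", "rate": 4.20}]))
registry.publish(DataProduct("legacy_sharepoint", "unknown", False,
                             [{"policy": "stale"}]))

print(registry.read("carrier_rates"))   # certified: succeeds
```

The important design choice is the failure mode: an uncertified source raises an error rather than quietly serving stale data, which is exactly the behavior that sank Marcus's agent.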
This is not a new idea. Data mesh and data product thinking have been in the enterprise lexicon for years.
What’s new is that agents make the stakes existential. Bad data used to produce bad reports. Now it produces bad decisions made at machine speed, at scale, without a human review cycle.
Layer 2: Retrieval architecture
Once your data foundation is unified, you need a deliberate strategy for how agents actually retrieve information. This is where most teams take shortcuts that hurt them later.
There are two primary retrieval patterns, and they are not interchangeable.
RAG (retrieval-augmented generation) is what most teams reach for first. An agent queries a vector index, retrieves relevant chunks, stuffs them into a prompt. This works well for static or semi-static content: policy documents, product FAQs, onboarding guides. The content doesn’t change minute to minute, and search relevance is the key quality lever.
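The RAG loop above can be sketched in miniature. This is a toy, assuming a keyword-overlap score in place of a real embedding index and vector store; only the shape of the pipeline (score, retrieve top-k, assemble prompt) is the point.

```python
# Minimal RAG sketch (illustrative, no real vector store): score chunks
# against the query, take the top-k, and assemble them into a prompt.

def score(query: str, chunk: str) -> int:
    # Stand-in for embedding similarity: count shared lowercase terms
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refund window: returns accepted within 30 days of delivery.",
    "Onboarding guide: create your account on day one.",
    "Shipping FAQ: standard delivery takes 3-5 business days.",
]
print(build_prompt("what is the refund window for returns", docs))
```

Notice what the quality lever is here: everything hinges on `score` and the corpus, which is why search relevance, not the model, dominates RAG quality for static content.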
MCP (Model Context Protocol) is the right tool when agents need to take actions or access live system state.
“How many units of SKU123 are in stock right now?” is not a search question. It’s an API call to an ERP system.
“Create a support ticket for this customer” is not retrieval. It’s a write operation.
MCP gives agents a standardized way to call tools hosted on remote servers, with identity, policy, and audit controls baked in.
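To make the contrast with RAG concrete, here is a simplified stand-in for an MCP-style tool server, not the real MCP SDK: tools are named operations with declared parameter schemas that the agent invokes, rather than documents it retrieves. The tool name, inventory data, and registry mechanics are all hypothetical.

```python
# Simplified stand-in for an MCP-style tool server (not the real SDK):
# tools are named operations with a declared parameter schema, invoked
# by the agent instead of being "retrieved" like documents.

TOOLS = {}

def tool(name: str, params: dict):
    """Register a callable as an agent-invocable tool with a parameter schema."""
    def decorator(fn):
        TOOLS[name] = {"params": params, "handler": fn}
        return fn
    return decorator

@tool("get_stock_level", params={"sku": str})
def get_stock_level(sku: str) -> int:
    # Stand-in for a live ERP query; a real server would call the ERP API
    inventory = {"SKU123": 42, "SKU456": 0}
    return inventory.get(sku, 0)

def call_tool(name: str, **kwargs):
    spec = TOOLS[name]
    for param, expected in spec["params"].items():   # validate against the schema
        if not isinstance(kwargs.get(param), expected):
            raise TypeError(f"{name}: '{param}' must be {expected.__name__}")
    return spec["handler"](**kwargs)

print(call_tool("get_stock_level", sku="SKU123"))  # live state, not a search result
```

The schema validation step is where real MCP implementations layer in the identity, policy, and audit controls mentioned above.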
The mistake I see repeatedly: teams reach for RAG everywhere because it’s simpler to set up, then wonder why their agent keeps giving stale inventory data. The answer isn’t better embeddings. The answer is the wrong retrieval pattern.
A practical decision guide:
Static or semi-static content (policy documents, FAQs, onboarding guides): use RAG.
Live system state or write operations (inventory lookups, ticket creation): use MCP tools.
For organizations on the Microsoft platform, the built-in retrieval options (Foundry IQ, Fabric IQ, Fabric data agents, SharePoint connection, Azure AI Search, OneLake search indexers) should be your default.
Build custom retrieval solutions only when the built-in options genuinely cannot meet your regulatory or operational constraints. Most teams build custom too early.
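The decision guide above collapses into a small rule function. This is purely illustrative; the inputs and return strings are my own labels, not anything from a platform API.

```python
# The retrieval decision guide as a rule function (illustrative only):

def choose_retrieval(needs_action: bool, data_is_live: bool,
                     builtin_meets_constraints: bool = True) -> str:
    if needs_action or data_is_live:
        return "MCP tool call"            # writes and real-time state
    if builtin_meets_constraints:
        return "built-in RAG (platform retrieval options)"
    return "custom RAG pipeline"          # last resort, not the default

print(choose_retrieval(needs_action=False, data_is_live=False))  # policy docs
print(choose_retrieval(needs_action=True, data_is_live=False))   # create a ticket
```

Note the ordering: the action/liveness test comes first, so no amount of "better embeddings" ever routes a live-state question into RAG, which is exactly the mistake described above.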
Agentic retrieval is where this gets interesting at scale. Instead of a fixed retrieval pipeline, the agent plans and executes its own search strategy.
It might run multiple searches, synthesize partial answers, and decide whether to make an additional API call based on what it found.
Foundry IQ is designed to be the control plane for this pattern: a unified knowledge base endpoint over one or more knowledge sources, with consistent governance and citation behavior across workloads.
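A hedged sketch of that plan-and-execute loop, with a plain dict standing in for the knowledge base endpoint (this is not Foundry IQ's API, just the control-flow pattern): the agent runs an initial search, then decides to follow up when the first result references another topic.

```python
# Agentic retrieval sketch: instead of one fixed query, the agent plans
# follow-up searches based on what the first pass returned. KB is a toy
# stand-in for a governed knowledge base endpoint.

KB = {
    "return policy": "Returns accepted within 30 days. See carrier exceptions.",
    "carrier exceptions": "Carrier B excludes oversized items from returns.",
}

def search(query: str):
    return KB.get(query)

def agentic_answer(question: str, max_hops: int = 3) -> list[str]:
    """Run an initial search, then follow topics the results mention."""
    findings, queue, seen = [], [question], set()
    for _ in range(max_hops):
        if not queue:
            break
        query = queue.pop(0)
        if query in seen:
            continue
        seen.add(query)
        result = search(query)
        if result is None:
            continue
        findings.append(result)
        # Planning step: if the result points at another topic, search it too
        for topic in KB:
            if topic in result.lower() and topic not in seen:
                queue.append(topic)
    return findings

print(agentic_answer("return policy"))
```

Even in this toy, the agent surfaces the carrier-exception rule that a single fixed query would have missed, which is the whole argument for agentic retrieval at scale.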
Layer 3: Governance and documentation
This is the layer teams skip. It is also the layer that makes everything else sustainable.
Retrieval decisions determine three things at once: what answers are even possible (accuracy), who can see what data (security), and how badly things break when they go wrong (operational risk).
Treating retrieval as an implementation detail is how teams end up with agents that confidently leak data or take destructive actions on behalf of the wrong user.
Without explicit documentation, you end up with shadow integrations, undocumented data flows, and compliance reviews that take weeks because nobody can explain what the agent actually touches.
Here is a simple documentation template I recommend. Capture one row per agent-to-source connection, with at least these columns:
Agent or workload: which agent uses the connection
Data source: the certified data product or API behind it
Retrieval pattern: RAG index or MCP tool
Identity used: agent identity or user passthrough
Owner: who is accountable for the source
Freshness: how current the data is expected to be
This table becomes your retrieval contract. It is the artifact that lets compliance teams audit what agents touch. It is what you hand to a new engineer onboarding to the platform. It is what you review when something goes wrong.
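One way to keep the contract honest is to encode it as a machine-checkable record, so a CI step can validate every documented connection. The field names below mirror the template but are illustrative, not a standard schema.

```python
from dataclasses import dataclass

# The retrieval contract as a machine-checkable record (field names
# are illustrative, mirroring the documentation template):

@dataclass(frozen=True)
class RetrievalContract:
    agent: str
    data_source: str
    pattern: str            # "RAG" or "MCP"
    identity: str           # "agent identity" or "user passthrough"
    owner: str
    freshness: str

    def validate(self) -> None:
        if self.pattern not in ("RAG", "MCP"):
            raise ValueError(f"unknown retrieval pattern: {self.pattern}")
        if self.identity not in ("agent identity", "user passthrough"):
            raise ValueError("connection must declare an explicit identity model")

contracts = [
    RetrievalContract("support-agent", "returns-policy data product",
                      "RAG", "agent identity", "CX team", "weekly"),
    RetrievalContract("support-agent", "ERP inventory API",
                      "MCP", "user passthrough", "supply chain", "real time"),
]
for c in contracts:
    c.validate()   # a CI step could run this over the whole inventory
print(f"{len(contracts)} governed connections documented")
```

A contract that fails validation blocks the deploy, which turns the documentation from a wiki page nobody reads into an enforced gate.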
Security is not optional here. Every MCP tool call requires authentication. Prefer identity passthrough (where the agent acts on behalf of the user’s identity) when user-level permissions must persist.
Use RBAC on both the platform and the target service. Audit every tool invocation.
Microsoft Entra Agent Identity makes this tractable on Azure, but the discipline has to be organizational, not just technical.
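The two disciplines above, explicit identity on every call and an audit trail of every invocation, fit in a short sketch. The permission sets, user names, and wrapper are hypothetical; real deployments would use Entra identities and RBAC, not an in-memory dict.

```python
# Illustrative sketch: every tool call runs under an explicit user identity
# (passthrough where user-level permissions must persist), and every
# invocation is audited, including denied attempts. Names are hypothetical.

AUDIT_LOG = []
PERMISSIONS = {"alice": {"read_orders"}, "bob": {"read_orders", "issue_refund"}}

def invoke_tool(tool_name: str, user: str, required_scope: str) -> str:
    allowed = required_scope in PERMISSIONS.get(user, set())
    # Audit before acting: the log must capture denied attempts too
    AUDIT_LOG.append({"tool": tool_name, "user": user, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{user} lacks scope '{required_scope}'")
    return f"{tool_name} executed as {user}"   # act on behalf of the user

print(invoke_tool("issue_refund", user="bob", required_scope="issue_refund"))
```

The detail that matters is the ordering: the audit entry is written before the permission check can short-circuit, so the log shows what agents attempted, not just what succeeded.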
How I’ve applied this
When I build agent architectures for enterprise clients, the first thing I do is a data estate audit, not a model selection exercise. I want to know: where does the authoritative data live? Who owns it? How fresh is it? What governance is in place?
The unified data platform conversation typically takes longer than any conversation about models. It should. Models are a commodity. Your data estate is not.
I’ve watched organizations skip this step and then spend 18 months retrofitting governance onto agent systems that were already in production. That is significantly harder than building the foundation correctly at the start.
The organizations that will have an AI moat in three years aren’t the ones who picked the best model in 2025. They’re the ones who spent 2025 and 2026 building a governed, unified, retrieval-ready data estate.
♻️ If this was useful, share it with someone building with AI.
✉️ Subscribe at newsletter.karuparti.com so you never miss an edition.
P.S. Want more? 👋
1/ My visual guide to agentic AI → Gumroad
2/ Deep dives on agentic AI architecture → LinkedIn
3/ Real-time takes on breaking AI news → X
4/ Casual hot takes and community → Threads
5/ Visual frameworks and carousels → Instagram
6/ 60-second production lessons → TikTok
7/ The full newsletter, free → newsletter.karuparti.com
References
Microsoft Cloud Adoption Framework, Data architecture for AI agents across your organization
Microsoft Cloud Adoption Framework, Unify your data platform
Microsoft Foundry, Foundry IQ knowledge retrieval
Microsoft Foundry, Model Context Protocol tool
Microsoft Foundry, Authentication support for MCP tool
Microsoft Fabric, Fabric IQ overview
Disclaimer: The story and scenario in this article are hypothetical, inspired by patterns observed across similar real-world experiences. They are used to convey key concepts more effectively and do not represent any specific individual or organization.