
TL;DR
- Salesforce has hundreds of APIs, but none of them answer the questions that matter most when you're actually changing your org. "What references this field?" "Which of these flows is actually doing the work?" "What breaks if I deprecate this Apex class?" There's no endpoint for those.
- These are L3 questions and they need the whole org as context, not a few files. The answer doesn't exist as a thing you can fetch. It has to be built.
- AI tools that call APIs in real time hit this wall fast. A deterministic metadata graph is the difference between an agent that guesses and one that knows.
*****
Try this in your org right now.
Ask any AI assistant connected to Salesforce — Claude with the DX MCP, Cursor with a custom integration, whatever you've got wired up: what references the Industry field on Account?
What you'll get back is some subset of the answer. It might list a few flows. It might mention a validation rule. It might surface a couple of Apex classes. The answer will sound confident.
It will also be incomplete, and there's no way for you to know how incomplete without checking the work yourself.
The reason is that there's no Salesforce API for the question you just asked. None of the platform's hundreds of endpoints (REST, SOAP, Tooling, Metadata, UI, Bulk) returns a complete answer to "everywhere Industry is used." The answer doesn't exist as a fetchable thing. It has to be assembled by walking every flow, every validation rule, every Apex class, every layout, every permission set, and every managed package, and then connecting them.
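The "connecting them" step is an inversion. You turn "component → fields it references" into "field → components that reference it." Here is a minimal sketch that assumes every component has already been parsed into the set of fields it references. The component names and the `parsed` dictionary are hypothetical placeholders. They are not real org data and not a Salesforce API.

```python
from collections import defaultdict

# Hypothetical output of a parsing step: component -> fields it references.
# In a real org this would come from parsing flow XML, Apex source,
# validation rule formulas, layouts, permission sets, and so on.
parsed = {
    "Flow:Account_Routing": {"Account.Industry", "Account.OwnerId"},
    "ValidationRule:Account.Require_Industry": {"Account.Industry"},
    "ApexClass:AccountScoring": {"Account.AnnualRevenue", "Account.Industry"},
    "Layout:Account-Sales Layout": {"Account.Name", "Account.Phone"},
}

def build_where_used(components):
    """Invert component -> references into field -> referencing components."""
    where_used = defaultdict(set)
    for component, fields in components.items():
        for field in fields:
            where_used[field].add(component)
    return where_used

index = build_where_used(parsed)
for component in sorted(index["Account.Industry"]):
    print(component)
```

The inversion itself is cheap. The expensive part is the parsing that produces `parsed`, and that's why it pays to do it once rather than per question.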
That's an L3 question — the third tier in the four-layer framework for how AI handles Salesforce work:
- L1 is self-contained (Claude alone is fine).
- L2 is multi-element retrieval (MCP territory, with limits).
- L3 is the whole-org layer, and it's where most production Salesforce work actually lives.
- L4 is change over time.
What an L3 question looks like
The pattern is the same across every L3 question worth asking. You want to know how the org behaves as a system, not what's in a single file.
- What updates Opportunity Stage automatically?
- Which fields on Account are unused?
- What breaks if I deprecate this Apex class?
- Where is the duplicate logic across our routing flows?
- Which permission sets actually grant access to Revenue__c?
- What does this managed package touch in our org?
- Which automations fire on Lead conversion, in what order?
None of these are exotic. They're the questions every admin, architect, and ops lead asks during a typical week. Every one of them requires the whole org as context, because the answer is distributed across dozens of components that don't know about each other.
There is no Metadata API call for "all unused Apex classes." There is no SOQL query for "every label with zero references." There is no Tooling API endpoint for "what does this flow actually do at runtime, factoring in entry criteria and the three Apex triggers it eventually fires."
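Once the references are known, a question like "which fields on Account are unused?" reduces to a set difference: every field the org defines, minus every field something references. Here is a minimal sketch that assumes a hypothetical field inventory and a hypothetical reference list produced by some earlier parsing step. None of the names are real org data.

```python
# Hypothetical inventory of fields defined on Account (e.g., from object metadata).
account_fields = {
    "Account.Name", "Account.Industry", "Account.Legacy_Region__c",
    "Account.Old_Score__c", "Account.AnnualRevenue",
}

# Hypothetical (component, referenced field) pairs gathered by parsing every component.
references = [
    ("Flow:Account_Routing", "Account.Industry"),
    ("ApexClass:AccountScoring", "Account.AnnualRevenue"),
    ("Layout:Account-Sales Layout", "Account.Name"),
    ("ValidationRule:Account.Require_Industry", "Account.Industry"),
]

def unused_fields(defined, refs):
    """Return fields with zero references. The result is only as complete as `refs`."""
    referenced = {field for _, field in refs}
    return sorted(defined - referenced)

print(unused_fields(account_fields, references))
```

The catch is in the docstring. If the reference list skips a component type (say, a managed package), the "unused" list silently includes fields that are in fact used. That's why the question needs the whole org.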
What Salesforce exposes is the raw material. The answer, as it turns out, has to be built.
Why retrieve-on-demand can't do it
When an AI agent answers an L2 question, the pattern works fine: figure out which API to call, call it, read the response, answer. One trip, maybe two.
L3 breaks that loop. There's no single API to call, so the agent has to retrieve dozens of components and assemble the dependency map on the fly. Two problems with that.
First, the agent doesn't know what to retrieve. Without a pre-built dependency graph, it's guessing which flows might touch the field, which Apex classes might reference it, which validation rules might check it. It pulls the ones it can name and stops there. Anything it doesn't think to ask for, it doesn't find.
Second, even if it tried to "just download everything," the math doesn't work. A real enterprise Salesforce org is hundreds of megabytes to gigabytes of metadata. The context window of even a top-tier model holds a few megabytes of text at most. You can't fit the org in the agent's head. It has to pick and choose what to retrieve, and it picks blind.
The result is confident, partial answers. That's the most dangerous failure mode in production work, because the person asking the question can't tell what's missing.
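Here is a back-of-envelope version of that math with loudly assumed numbers: a 300 MB metadata export (toward the low end of the range above), roughly 4 characters per token, and a 200,000-token context window. All three figures are illustrative assumptions, not measurements.

```python
# Illustrative assumptions only; substitute your own org's numbers.
ORG_METADATA_BYTES = 300 * 1024 * 1024  # assumed 300 MB of mostly ASCII XML/Apex (1 byte ~ 1 char)
CHARS_PER_TOKEN = 4                      # rough rule of thumb for English-like text
CONTEXT_WINDOW_TOKENS = 200_000          # assumed model context window

org_tokens = ORG_METADATA_BYTES / CHARS_PER_TOKEN
windows_needed = org_tokens / CONTEXT_WINDOW_TOKENS
fraction_visible = CONTEXT_WINDOW_TOKENS / org_tokens

print(f"org size ~ {org_tokens:,.0f} tokens")
print(f"context windows needed to hold it once: {windows_needed:,.0f}")
print(f"share of the org visible at any moment: {fraction_visible:.2%}")
```

Under these assumptions, the agent sees about a quarter of one percent of the org at a time. A window five times larger still shows only around 1%.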
A recent example
A customer call from last month makes this concrete.
The team asked their internal Claude + CLI MCP setup: "find me the flows that use these specific record types."
It returned ten flows. The team trusted the answer enough to start scoping a migration around it. When we ran the same question through Sweep — same org, same data — we returned four flows. The other six were false positives, pattern-matched on flow names that contained the record-type keyword without actually using it in the flow's logic.
Four real flows versus ten "found" flows isn't a 60% accuracy gap. It's a structural difference in how the answer gets produced. The retrieve-on-demand approach can find candidate flows by name. It cannot validate that they actually use the record type, because validating that requires parsing every flow's XML and walking the references — which is L3 work, not L2 work.
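The gap between "named like it" and "actually uses it" fits in a few lines. The sketch below runs Python's standard `xml.etree.ElementTree` over simplified, hypothetical flow files. The flow names, record type Id, and keyword are invented for illustration. It compares a name match against a parse that looks for a `RecordTypeId` filter. Real flows can also reference a record type through decision conditions, formulas, or developer names, which this sketch ignores. It is also not how Sweep or any MCP server implements the check.

```python
import xml.etree.ElementTree as ET

MD = "http://soap.sforce.com/2006/04/metadata"  # namespace used in Salesforce metadata XML
NS = {"md": MD}
RECORD_TYPE_ID = "012000000000001AAA"  # hypothetical record type Id
KEYWORD = "Enterprise"                 # hypothetical record type label

def flow_xml(label, record_type_filter=None):
    """Build a simplified, hypothetical flow file with an optional RecordTypeId filter."""
    lookup = ""
    if record_type_filter:
        lookup = (
            "<recordLookups><filters><field>RecordTypeId</field>"
            "<operator>EqualTo</operator>"
            f"<value><stringValue>{record_type_filter}</stringValue></value>"
            "</filters></recordLookups>"
        )
    return f'<Flow xmlns="{MD}"><label>{label}</label>{lookup}</Flow>'

FLOWS = {
    "Enterprise_Renewal_Reminder": flow_xml("Enterprise Renewal Reminder"),
    "Enterprise_Deal_Routing": flow_xml("Enterprise Deal Routing", RECORD_TYPE_ID),
    "Deal_Desk_Assignment": flow_xml("Deal Desk Assignment", RECORD_TYPE_ID),
}

def match_by_name(flows, keyword):
    """Name-based guess: flows whose API name contains the keyword."""
    return sorted(name for name in flows if keyword.lower() in name.lower())

def uses_record_type(xml_text, record_type_id):
    """Parse the flow XML and look for a RecordTypeId filter with this exact value."""
    root = ET.fromstring(xml_text)
    for flt in root.iter(f"{{{MD}}}filters"):
        field = flt.findtext("md:field", default="", namespaces=NS)
        value = flt.findtext("md:value/md:stringValue", default="", namespaces=NS)
        if field == "RecordTypeId" and value == record_type_id:
            return True
    return False

def match_by_parsing(flows, record_type_id):
    """Keep only flows whose parsed XML actually filters on the record type."""
    return sorted(name for name, xml in flows.items() if uses_record_type(xml, record_type_id))

print("name match:  ", match_by_name(FLOWS, KEYWORD))
print("parsed match:", match_by_parsing(FLOWS, RECORD_TYPE_ID))
```

In this toy set, the name match returns one flow that never touches the record type and misses one that does. Only the parsed check gets both right.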
That gap won’t actually shrink as MCP tools get better. It’ll shrink as the graph underneath gets built.
The Opportunity Stage walkthrough
Let me walk through the full version, because it's the cleanest case study for L3 in practice.
Ask: what updates Opportunity Stage automatically?
The candidate sources, in a real enterprise org:
- Workflow field updates (legacy but still live in most orgs that haven't migrated)
- Approval processes with stage-updating actions
- Process Builder processes (also legacy, also still live)
- Record-triggered flows on Opportunity
- Scheduled flows
- Auto-launched flows called from other flows or triggers
- Apex triggers on Opportunity
- Apex classes invoked from triggers, processes, or future jobs
- Managed package automation (CPQ, Conga, FormAssembly, etc.)
- Lightning components or LWC that update stage via Apex controllers
- External integrations writing through the API
- Outbound messages or platform events that trigger downstream Apex
Even an exhaustive list isn't enough — you also need to know the order they fire in, whether any of them suppress others, and which ones are inactive but still deployed.
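Ordering is a sort once each automation is classified by when it runs. Here is a minimal sketch that assumes a hypothetical list of automations already found to touch `Opportunity.StageName` (the API name behind the Stage field). The phase list is a simplified, partial approximation of Salesforce's documented save-time order of execution, not the full sequence.

```python
# A simplified, partial subset of the save-time order of execution described in
# Salesforce's "Triggers and Order of Execution" documentation. Real saves include
# more steps (duplicate rules, assignment rules, triggers re-firing after workflow
# field updates, and so on), so treat this ranking as an approximation.
SAVE_ORDER = [
    "before_save_flow",
    "before_trigger",
    "validation_rule",
    "after_trigger",
    "workflow_field_update",
    "process",
    "after_save_flow",
]
RANK = {kind: position for position, kind in enumerate(SAVE_ORDER)}

# Hypothetical automations already found to touch Opportunity.StageName.
found = [
    ("after_save_flow", "Opp_Stage_Sync"),
    ("before_trigger", "OpportunityTrigger"),
    ("workflow_field_update", "Set_Stage_Closed_Lost"),
    ("scheduled_flow", "Nightly_Stage_Cleanup"),
    ("before_save_flow", "Opp_Defaults"),
]

def order_by_save_phase(automations):
    """Sort save-time automations by phase, then name; set the rest aside."""
    in_save = sorted(
        (a for a in automations if a[0] in RANK),
        key=lambda a: (RANK[a[0]], a[1]),
    )
    outside = sorted(a for a in automations if a[0] not in RANK)
    return in_save, outside

in_save, outside = order_by_save_phase(found)
for kind, name in in_save:
    print(f"{RANK[kind] + 1}. {kind}: {name}")
print("outside the save path:", outside)
```

Breaking ties by name only keeps the output stable. It isn't how Salesforce orders, say, two after-save flows on the same object. Scheduled flows and similar automations run outside a single record save, so they're listed separately rather than forced into the sequence.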
There is no API that returns "everything in the above list that touches Opportunity Stage." There never will be one, because the list isn't a thing Salesforce stores — it's an emergent property of how all those components reference the field.
The only way to answer the question is to parse every component, extract every reference to the Stage field, deduplicate, classify by type, and present them in execution order. That's the metadata graph.
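The extract, deduplicate, and classify steps are mechanical once per-type extractors exist. Here is a minimal sketch that assumes hypothetical extractors have already emitted `(component type, component name, referenced field)` tuples. Writing those extractors for flow XML, Apex, formulas, and managed packages is the hard part, and it isn't shown here.

```python
from collections import defaultdict

# Hypothetical raw hits from several per-type extractors. Duplicates are normal:
# an Apex class may reference the field on several lines, and a flow may be
# reported once per element that touches it.
raw_hits = [
    ("ApexTrigger", "OpportunityTrigger", "Opportunity.StageName"),
    ("ApexClass", "StageService", "Opportunity.StageName"),
    ("ApexClass", "StageService", "Opportunity.StageName"),
    ("Flow", "Opp_Stage_Sync", "Opportunity.StageName"),
    ("Flow", "Opp_Stage_Sync", "Opportunity.StageName"),
    ("Flow", "Lead_Convert_Handler", "Lead.Status"),
    ("WorkflowFieldUpdate", "Set_Stage_Closed_Lost", "Opportunity.StageName"),
]

def classify_references(hits, target):
    """Keep hits for `target`, drop duplicates, and group component names by type."""
    by_type = defaultdict(set)
    for comp_type, name, field in hits:
        if field == target:
            by_type[comp_type].add(name)
    return {t: sorted(names) for t, names in sorted(by_type.items())}

print(classify_references(raw_hits, "Opportunity.StageName"))
```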
What "the graph" actually is
It may sound like one at first, but a metadata graph isn't a highfalutin way of saying "a list of components." It's a deterministic, queryable model of how every piece of the org references every other piece.
In Sweep's case: every object, field, flow, Apex class, trigger, validation rule, approval process, permission set, profile, layout, record type, managed package component, and CPQ rule — parsed into a dependency graph that traces forward (what does this touch?) and backward (what touches this?). When you ask an L3 question, the graph already has the answer. It doesn't go fetching.
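Forward and backward tracing are two traversals over the same edges, one following them and one following them reversed. The generic sketch below uses an in-memory adjacency map and breadth-first search. It is not Sweep's implementation, and the edges are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges: (source, target) means "source references or invokes target".
EDGES = [
    ("Flow:Opp_Stage_Sync", "Field:Opportunity.StageName"),
    ("ApexTrigger:OpportunityTrigger", "ApexClass:StageService"),
    ("ApexClass:StageService", "Field:Opportunity.StageName"),
    ("ApexClass:StageService", "Field:Opportunity.Amount"),
    ("ValidationRule:Opportunity.Amount_Required", "Field:Opportunity.Amount"),
]

class DependencyGraph:
    def __init__(self, edges):
        self.forward = defaultdict(set)   # what does X touch?
        self.backward = defaultdict(set)  # what touches X?
        for src, dst in edges:
            self.forward[src].add(dst)
            self.backward[dst].add(src)

    @staticmethod
    def _reach(adjacency, start):
        """Breadth-first traversal; returns every node reachable from start, sorted."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)

    def touches(self, node):
        return self._reach(self.forward, node)

    def touched_by(self, node):
        return self._reach(self.backward, node)

graph = DependencyGraph(EDGES)
print(graph.touched_by("Field:Opportunity.StageName"))
print(graph.touches("ApexTrigger:OpportunityTrigger"))
```

The backward query finds the trigger even though it never names the Stage field directly, because it reaches the field through the class it calls. Sorting the output is a small choice with a large payoff: the same graph and the same question return the same list every time.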
That's the structural difference. Retrieve-on-demand starts from scratch every time. The graph is built once and kept current as the org changes.
The downstream effect: L3 questions become fast and complete. "What updates Opportunity Stage" returns the full list — workflow field updates, approval processes, flows, Apex, managed packages, the lot — because every reference is in the graph and the question is a traversal, not a search.
And because the graph is deterministic, the same question returns the same answer twice. With retrieve-on-demand, you can ask the same question of the same MCP setup and get different lists depending on which APIs the agent decided to call this time. That's not a feature.
What this changes for AI
The point isn't that AI fails at Salesforce. AI is genuinely useful at L1 and L2. It struggles at L3 not because the model is weak but because the architecture is wrong for the work.
The fix isn't a better prompt or more MCP tools. The fix is to put a graph between the AI and the org, so when an agent asks an L3 question, it gets the full answer back instead of a confident partial one. That's what Sweep's MCP server does — Claude, Cursor, Agentforce, whatever client you're using gets the graph as context instead of guessing at it through raw API calls.
The agentic layer for enterprise systems is just a name for that — the layer where AI gets enough context to do L3 work safely. Discover, design, build, all running against the same graph.
The test
If you want to know whether your current setup can handle L3 questions, run this in your own org.
Ask whatever AI tool you've got connected: list every automation, validation rule, Apex class, and managed package component that touches Opportunity Stage. Include execution order.
Then check the answer against your org by hand — or against Sweep, if you want a faster check.
The gap between what came back and what's actually there is the gap between L2 and L3 in your stack right now. It's also the gap between AI that helps you ask better questions and AI that helps you ship real architecture work.
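To score the comparison rather than eyeball it, split the gap into two numbers: what the AI missed and what it listed that isn't real. Here is a minimal sketch with hypothetical component names standing in for your AI tool's answer and your hand-checked list.

```python
def compare_answers(ai_answer, ground_truth):
    """Return (missed, false_positives, precision, recall) for two collections of names."""
    ai, truth = set(ai_answer), set(ground_truth)
    missed = sorted(truth - ai)
    false_positives = sorted(ai - truth)
    hits = len(ai & truth)
    precision = hits / len(ai) if ai else 0.0
    recall = hits / len(truth) if truth else 1.0
    return missed, false_positives, precision, recall

# Hypothetical answers for "what touches Opportunity Stage?"
ai_answer = ["Opp_Stage_Sync", "OpportunityTrigger", "Stage_Report_Flow"]
hand_checked = ["Opp_Stage_Sync", "OpportunityTrigger", "StageService",
                "Set_Stage_Closed_Lost", "CPQ_Quote_Sync"]

missed, false_positives, precision, recall = compare_answers(ai_answer, hand_checked)
print("missed:", missed)
print("false positives:", false_positives)
print(f"precision {precision:.0%}, recall {recall:.0%}")
```

Recall is the number that matters most here. It measures the "confident, partial" problem directly, and it's the one you can't see without an independent check.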


