
As an AI layer, Agentforce exposes the failure modes that have always existed in your Salesforce org.
Over the years, Salesforce systems have relied on the elasticity of the human brain. When something breaks, it’s a person who compensates. A missing field gets filled in manually. A failed validation gets worked around. A flow misfires, and someone retries it.
These failures remain local, contained, and often invisible.
Agentforce removes that buffer. Entirely.
An agent executes exactly as instructed. It touches more objects, triggers more automations, and does so repeatedly and concurrently. That means every hidden dependency — permissions, validation rules, flows, data inconsistencies — becomes load-bearing. And when one of those breaks, the system fails, or it produces an incorrect output.
What follows is a catalog of those failures, with their real error messages, their actual root causes, and what it takes to fix them in a production environment.
Setup and enablement errors
The first class of failures appears before an agent ever runs.
You’ll encounter errors like:
- “Agentforce is not enabled in your organization”
- “Agents are still being provisioned”
- Missing “+New Agent” button
- Missing “New Topic” button
These aren’t actually logic errors. They’re dependency mismatches between platform layers.
Agentforce depends on Einstein, Data Cloud, and multiple backend services initializing correctly. If any layer lags or fails silently, the UI partially renders or blocks access.
How to solve it
Resolution here requires sequencing, not troubleshooting.
You must explicitly enable:
- Einstein Setup
- Agentforce Agents (including the default agent toggle)
- Data Cloud / Data 360
If errors persist:
- Toggle Agentforce off and back on
- Hard refresh the UI
- Verify language settings (non-English can hide UI elements)
The error “Agents are still being provisioned” often reflects backend initialization delays. In new orgs, this can take hours. There is no workaround beyond simply waiting or retriggering provisioning.
These errors are deterministic. Once resolved correctly, they do not recur.
Agent Builder, topic, and action configuration errors
Once setup completes, failures shift into configuration.
Common errors include:
- “Topic Overlap Detected”
- “Configuration Issues Detected”
- “We couldn’t retrieve the action’s output”
- UNKNOWN_EXCEPTION (Knowledge / Data Library)
These errors arise when the agent cannot reliably classify a request or cannot successfully execute the action it selects.
“Topic Overlap Detected”
This occurs when multiple topics contain similar classification language. The Atlas reasoning engine cannot confidently assign intent.
How to solve it
Topics must function as routing boundaries, not descriptions.
Each topic needs:
- Mutually exclusive scope
- Explicit “out of scope” instructions
- Clear intent signals
Exceeding ~13 topics introduces classification degradation even before the hard limit of 15.
This becomes difficult at scale because topic clarity depends on understanding how actions, objects, and downstream automations interact. Without visibility into that system, teams tend to write overlapping abstractions.
“Configuration Issues Detected”
This error appears during activation when:
- Actions reference inactive flows
- Required inputs are unmapped
- Output schemas are incomplete
How to solve it
Every action must be treated as a contract. That means validating:
- Flow activation state
- Input mappings
- Output availability
The challenge here will be ensuring consistency across all actions. In larger agents, this becomes a systemic audit problem rather than a configuration tweak.
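Treated as code, that contract check is mechanical. Here is a minimal sketch, assuming a hypothetical dictionary describing each action's flow state and mappings; the field names are illustrative, not a real Salesforce API:

```python
def audit_action(action):
    """Return a list of contract violations for one agent action.
    The expected shape of `action` is an assumption for this sketch:
    {"flow_active": bool, "inputs": {name: mapping-or-None}, "outputs": [...]}"""
    problems = []
    if not action.get("flow_active", False):
        problems.append("references an inactive flow")
    # Inputs mapped to None stand in for "required input left unmapped"
    unmapped = [name for name, target in action.get("inputs", {}).items()
                if target is None]
    if unmapped:
        problems.append("unmapped required inputs: " + ", ".join(unmapped))
    if not action.get("outputs"):
        problems.append("no outputs exposed to the agent")
    return problems
```

Run a loop like this over every action before activation, not just the one that failed; that is what turns a configuration tweak into the systemic audit the situation actually calls for.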
“We couldn’t retrieve the action’s output”
This is one of the most common runtime configuration failures.
It occurs when:
- A flow fails internally
- The flow returns null
- Output mapping is incorrect
- The Agent User lacks access to required data
How to solve it
You must debug outside Agentforce first.
Run the flow in Flow Debugger. Confirm output values. Then log in as the Agent User and execute the same flow.
If it fails only under the Agent User, the issue is permissions—not logic.
Understanding which objects and fields the flow touches, including related objects and formula dependencies, becomes critical here. That dependency tracing is non-trivial in complex orgs, and it is exactly where platforms like Sweep replace manual investigation with a full dependency graph.
UNKNOWN_EXCEPTION (Knowledge / Data Library)
This error commonly appears when integrating Knowledge or Data Library.
It is triggered by:
- Missing Knowledge permissions
- Lack of “Allow View Knowledge”
- Data Cloud dataspace access issues
- Data Streams in error state
How to solve it
Resolution requires rebuilding trust between systems:
- Create a new Agent Data Library
- Verify Knowledge object + field access
- Assign required permissions
- Retry Data Streams (sometimes repeatedly)
In new orgs, Data Streams may remain in error for extended periods. This is not always fixable immediately and may require waiting for backend processes to stabilize.
Runtime execution errors
These errors occur during live conversations and represent the most critical failure surface.
System.LimitException: Apex CPU time limit exceeded
This is the most dangerous error in Agentforce. It occurs when the cumulative execution time of all triggered automations exceeds the 10-second limit.
The key insight: Agentforce actions do not run in isolation.
When an agent creates a Case, it triggers:
- Apex triggers
- Flows
- Validation rules
- Workflow rules
Each of those may trigger additional automations. In chained actions, this compounds rapidly.
How to solve it
This is not a fixable error in isolation. It requires a full automation audit.
You must identify:
- Every automation on each object the agent touches
- The order of execution
- Cross-object dependencies
Then:
- Consolidate redundant automations
- Replace after-save flows with before-save flows where possible
- Introduce Apex circuit breakers (Limits.getCpuTime())
- Test under concurrency (not single-user testing)
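The circuit-breaker step deserves a sketch. In Apex, the elapsed-time check is Limits.getCpuTime(); the pattern itself is language-agnostic, so here it is in Python with the 10,000 ms budget and an assumed safety margin:

```python
import time

CPU_BUDGET_MS = 10_000     # Salesforce's synchronous Apex CPU cap
SAFETY_MARGIN_MS = 1_500   # assumed margin: bail out before the platform aborts

class CircuitBreaker:
    """Skip non-essential work when the CPU budget is nearly spent.
    In Apex, the elapsed-time check would be Limits.getCpuTime()."""

    def __init__(self, budget_ms=CPU_BUDGET_MS, margin_ms=SAFETY_MARGIN_MS):
        self.start = time.monotonic()
        self.limit_s = (budget_ms - margin_ms) / 1000.0

    def tripped(self):
        # True once the remaining budget falls inside the safety margin
        return time.monotonic() - self.start >= self.limit_s
```

In a trigger handler or subflow, optional enrichment steps would check `breaker.tripped()` and skip themselves, so the transaction completes degraded instead of failing outright.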
The difficulty is, and will continue to be, visibility.
Most teams cannot easily answer: what fires when this record changes?
That’s precisely the problem Sweep addresses — surfacing all automations and dependencies across objects instantly, rather than requiring manual tracing across flows, triggers, and metadata relationships.
Without that visibility, CPU issues become iterative guesswork.
System.LimitException: Too many DML statements: 151
This occurs when flows or triggers perform excessive database operations, often inside loops.
How to solve it
The root cause is almost always non-bulkified logic.
Resolution requires:
- Removing DML operations from loops
- Aggregating operations
- Refactoring flows and Apex
Again, the challenge lies in identifying where these patterns exist across the automation landscape.
FIELD_CUSTOM_VALIDATION_EXCEPTION
This error occurs when validation rules block agent-created or updated records. Validation rules assume human input context. Agents do not operate under those assumptions.
How to solve it
Validation logic must become agent-aware.
Typical patterns include:
- Bypass conditions for Agent User
- Custom permissions for agent execution
- Moving required fields to schema-level enforcement
The actual work will be identifying all validation rules that affect agent actions — not only on primary objects, but on related updates triggered downstream.
This requires mapping validation rules across object relationships, which again becomes a metadata visibility problem at scale.
Hallucination and grounding failures (no explicit error)
These are silent failures.
The agent returns incorrect information due to:
- Duplicate records
- Missing fields
- Inconsistent picklist values
How to solve it
There is no configuration fix.
Resolution requires:
- Deduplication
- Standardization
- Knowledge maintenance
Clean data reduces hallucination rates dramatically. Dirty data guarantees incorrect outputs.
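The deduplication pass itself is simple to express. A minimal sketch, assuming records are dicts and that a lowercased email is an acceptable match key; real matching rules are usually richer:

```python
def dedupe(records, keys=("email",)):
    """Keep the first record seen for each match key.
    Matching on lowercased, stripped field values is an assumption
    of this sketch, not a recommendation for every object."""
    seen, unique = set(), []
    for rec in records:
        k = tuple(str(rec.get(f, "")).strip().lower() for f in keys)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique
```

The important design choice is the match key, not the loop: agents ground on whichever duplicate they retrieve first, so the key must encode what "same record" means for your org.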
Permission and data access errors
These represent the most common production failures.
Errors include:
- INSUFFICIENT_ACCESS_ON_CROSS_REFERENCE_ENTITY
- “Looks like you don’t have access to the data space”
- Silent FLS failures (no error, null data)
- INSUFFICIENT_ACCESS_OR_READONLY (Query Records action)
How to solve it
The critical step is always the same:
Log in as the Agent User.
Then test every action.
Permissions must be explicitly granted for:
- Objects
- Fields
- Related objects
- Lookup relationships
- Formula dependencies
The difficulty lies in understanding the full surface area of what an agent touches.
An action may query one object but depend on multiple related objects and fields. Without mapping those dependencies, permission gaps remain hidden.
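That full surface area is a transitive closure, not a flat list. A sketch, assuming a hypothetical dependency map from each object to the objects it pulls in through lookups, related records, and formulas:

```python
def required_objects(action_object, dependencies):
    """Walk a (hypothetical) dependency map to find every object an
    action touches transitively. `dependencies` maps each object name
    to the objects it reaches through lookups, related records, and
    formula references."""
    needed, stack = set(), [action_object]
    while stack:
        obj = stack.pop()
        if obj in needed:
            continue
        needed.add(obj)
        stack.extend(dependencies.get(obj, []))
    return needed
```

Grant the Agent User access to everything in the returned set in one pass, instead of discovering each missing object through a fresh INSUFFICIENT_ACCESS failure.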
This is another area where Sweep provides immediate value — showing the full object and field dependency graph so permissions can be assigned correctly in one pass rather than through repeated failures.
Agent API errors
This is the rare category of errors that behaves the way you expect software to: they fail fast, return a code, and point (more or less) to the problem.
You’ll typically see:
- 400 — “{VALUE} is not a valid agent ID”
- 401 — “Connected app is not attached to Agent”
- 404 — Endpoint not found
- 500 — “EngineConfigLookupException”
- 500 — “HttpServerErrorException”
- 500 — “Unsupported Media Type”
- 429 — Rate limit exceeded
What makes these deceptive is that they look simple—but they often sit at the boundary between multiple systems: your connected app, your org domain, your agent configuration, and Salesforce’s internal routing.
How to solve it (without guessing)
Start by treating API errors as identity + routing + contract problems.
400 — “{VALUE} is not a valid agent ID”
This error almost always comes down to using the wrong identifier format.
Salesforce uses multiple IDs across contexts (15-character, 18-character, UI IDs, internal IDs), and the Agent API is strict.
What to check:
- You’re using the 18-character agent ID
- The ID starts with the correct prefix (typically 0Xx)
- You copied it from the Agent Overview page URL, not from UI labels or metadata exports
Where people go wrong:
They grab an ID from a list view, a debug log, or a truncated UI element. It looks right. It isn’t.
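A cheap sanity check catches most of these before the API call is ever made. A sketch, treating the 18-character length and the 0Xx prefix as assumptions based on the guidance above rather than a formal spec:

```python
def looks_like_agent_id(value):
    """Heuristic pre-flight check for an agent ID.
    18-character length and the 0Xx prefix are assumptions taken from
    the checklist above, not a validated Salesforce specification."""
    return (
        len(value) == 18
        and value.startswith("0Xx")
        and value.isalnum()
    )
```

Running this in client code before calling the API turns a vague 400 into an immediate, local failure with an obvious cause.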
401 — “Connected app is not attached to Agent”
This isn’t an auth failure in the traditional sense. It’s a relationship failure.
Your connected app exists. Your agent exists. They just aren’t explicitly linked.
What to check:
- The connected app is added under the agent’s Connections tab
- OAuth scopes match what the agent expects
- The token you’re using was issued for that connected app
Where this breaks in practice:
Teams configure the connected app correctly at the org level—but never attach it to the specific agent instance. The API call authenticates… and then gets rejected anyway.
404 — Endpoint not found
This is almost always a domain mismatch.
Salesforce APIs are extremely sensitive to which domain you use.
What to check:
- You’re using your My Domain URL, not a generic Salesforce endpoint
- The endpoint path matches the Agent API spec exactly
- You’re not mixing sandbox and production domains
Common failure pattern:
The token comes from My Domain, but the request goes to a generic instance URL. Authentication succeeds. Routing fails.
500 — “EngineConfigLookupException” / “HttpServerErrorException”
These are the least helpful errors—but still diagnosable.
They usually indicate:
- The agent ID exists but doesn’t match the environment or endpoint
- The request is hitting the wrong internal configuration layer
What to check:
- Agent ID matches the org you’re calling
- Endpoint matches the environment (sandbox vs prod)
- Agent is fully activated and provisioned
If everything looks correct, this is one of the few cases where you may actually be hitting a platform-side inconsistency.
500 — “Unsupported Media Type”
This one is straightforward, but it still trips people up.
Fix:
Content-Type: application/json
If you’re using a client or middleware, confirm it’s not silently overriding headers.
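One way to make the header explicit is to build the request by hand rather than trusting client defaults. A sketch using Python's standard library; the URL is a placeholder, not a real endpoint:

```python
import json
import urllib.request

def build_agent_request(url, payload):
    """Build a POST with an explicit JSON Content-Type so no client
    default (form encoding, for instance) sneaks in silently.
    The URL passed in is assumed to be your org's endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

If you must go through middleware, log the outbound headers at the last hop you control and compare them to what this produces.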
429 — Rate limit exceeded
This is where things stop being purely configurational and start becoming architectural.
What’s happening:
You’ve exceeded the Models API or Agent API throughput limits for your org.
What to implement:
- Exponential backoff retries
- Request queuing
- Throttling at the client layer
What not to do:
Retry immediately in a tight loop. You’ll amplify the problem and burn through limits faster.
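The retry schedule can be sketched in a few lines. It is deterministic by default; in production, pass a jitter callable (such as `random.random`) so concurrent clients don't retry in lockstep:

```python
def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=None):
    """Exponential backoff schedule in seconds, capped so a long outage
    doesn't produce absurd waits. The default base and cap are
    illustrative; tune them to your org's limits."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter is not None:
            delay += jitter()  # desynchronize concurrent clients
        delays.append(delay)
    return delays
```

A client would sleep for each delay in turn after a 429, giving up (or queuing the request) once the schedule is exhausted.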
The takeaway on API errors
These errors are deterministic—but only if you isolate variables.
If you try to debug them inside a full system (agent + flows + integrations), they blur together. The correct move is to reduce the problem:
- Validate the endpoint independently
- Validate the agent ID independently
- Validate the connected app independently
Then recombine.
Unlike runtime failures, these don’t require deep system understanding. They require precision and isolation.
Integration and deployment errors
If API errors fail loudly and runtime errors fail catastrophically, integration and deployment errors fail ambiguously.
They sit between systems. They don’t always throw explicit errors. And they often present as:
- The agent not responding
- The agent responding without context
- Escalation not triggering
- Messages never reaching the agent
- Sessions opening but doing nothing
These are the hardest issues to debug because the failure doesn’t live in one place. It lives in the seams between systems.
Agentforce integrations—especially Messaging for Web, Experience Cloud, Slack, and escalation flows—are not single configurations. They are chains. And the system only works if every link in the chain is correct.
Messaging for Web (Experience Cloud) failures
This is the most failure-prone integration surface in Agentforce.
On paper, it’s a simple goal: user opens a chat → agent responds.
In reality, that interaction depends on at least 10 separate configuration layers working together:
- Messaging Settings
- Omni-Channel Routing
- Queues
- Routing Configuration
- Omni-Channel Flow
- Messaging Channel
- Embedded Service Deployment
- Experience Site
- CORS / Trusted URLs
- Agent assignment
You can get 9 of these right and still have a completely dead experience.
The most common failure: silent routing breakdown
The most dangerous failure mode here is no error at all.
A user opens the widget. A session is created. Nothing happens.
This almost always traces back to one issue:
Context is not being passed correctly into the Omni-Channel Flow.
And that usually means one thing:
The recordId problem
The system expects a variable named recordId to exist and flow through multiple layers.
If any of the following are wrong, routing fails silently:
- The variable doesn’t exist
- The variable is not Text type
- The variable is not marked Available for Input
- The variable name doesn’t match exactly across systems
- The Experience Site script passes a different name (record_id, RecordId, etc.)
This is not validated anywhere. There is no warning. It just doesn’t work.
How to solve Messaging for Web (the right way)
Do not debug this live. Validate it layer by layer.
Step 1: Confirm channel activation chain
Start at the top:
- Messaging is enabled
- Messaging Channel is created and activated
- Embedded Service Deployment is published
If any of these are inactive, nothing downstream matters.
Step 2: Validate Omni-Channel routing
Then move to routing:
- Routing Configuration exists and has capacity
- Queue exists and supports Messaging Sessions
- Agent (or bot) is assigned to the queue
If routing has nowhere to go, the system stalls silently.
Step 3: Inspect Omni-Channel Flow
This is where most issues actually live.
Check:
- Flow is activated
- Flow includes Route Work element
- Route target = Bot (Agentforce agent)
- Flow variables include recordId
Then inspect variable definitions:
- Name: recordId (exact match)
- Type: Text
- Available for Input: true
Step 4: Trace variable propagation
Now check the full chain:
- Experience Site script → passes recordId
- Messaging Channel → receives it
- Omni-Channel Flow → consumes it
Every step must use the exact same name.
This is where most teams fail—not conceptually, but syntactically.
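Because nothing validates this chain for you, it's worth scripting the check. A sketch that flags any layer whose variable name isn't an exact, case-sensitive match; the layer names are illustrative:

```python
EXPECTED_NAME = "recordId"

def check_variable_chain(layer_names):
    """Return the layers whose variable name is not an exact,
    case-sensitive match for 'recordId'. `layer_names` maps each
    (illustrative) layer to the name it actually uses."""
    return [layer for layer, name in layer_names.items()
            if name != EXPECTED_NAME]
```

The value of the check is that it treats `record_id` and `RecordId` as the failures they actually are, instead of the near-matches a human reviewer tends to wave through.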
Step 5: Validate environment alignment
Finally:
- Domain matches (Experience Site vs My Domain)
- CORS allowlist includes the site
- Trusted URLs configured
A mismatch here can block communication entirely.
The pattern
Messaging failures are rarely about logic.
They are about alignment across systems that do not validate each other.
That’s why they’re so hard to debug.
Escalation and handoff failures
These failures are subtle—and often discovered only in production.
Symptoms include:
- Agent never escalates
- Escalation triggers but no human picks up
- Session ends instead of routing
- User gets stuck in a loop
Where escalation actually breaks
Escalation depends on:
- Agent configuration (Escalation topic exists)
- Omni-Channel routing (flows + queues)
- Human agent availability
- User permissions (Service Cloud User checkbox)
Failure in any of these layers breaks the experience.
How to solve it
You need to treat escalation as a separate system, not a feature.
Step 1: Confirm escalation topic exists
No topic = no escalation path.
Seems obvious. Frequently missed.
Step 2: Validate outbound routing
Check:
- Omni-Channel Flow includes escalation logic
- Flow routes to a queue with human agents
- Queue has active users
If no agents are online, escalation fails silently.
Step 3: Verify human agent configuration
Each human agent must have:
- Service Cloud User enabled
- Correct Omni-Channel presence status
- Access to relevant objects
Without this, routing completes—but no one receives the work.
Step 4: Build fallback logic
This is where most implementations fall apart.
If no agents are available:
- What happens?
- Does the session end?
- Does it retry?
- Does it log a case?
Without fallback logic, escalation becomes a dead end.
The takeaway
Escalation is not a handoff. It’s a second routing system layered on top of the first. If you don’t design it intentionally, it fails silently.
Slack integration errors
Slack introduces a different class of problem: identity and cross-system mapping.
Common errors:
- “Error on retrieving the token”
- Agent not appearing in Slack
- Messages not reaching agent
- Users unable to interact
Where Slack integrations break
Slack integrations depend on:
- Salesforce users
- Slack users
- Connected apps
- Agent configuration
All four must align.
How to solve it
Step 1: Map users across systems
The Agent User must:
- Exist in Salesforce
- Exist in Slack
- Be correctly mapped between the two
If this mapping fails, token retrieval fails.
Step 2: Validate permissions and licenses
Check:
- Slack integration permissions
- Required Salesforce licenses
- Access to Agentforce features
Slack users without Salesforce context often require provisional access.
Step 3: Verify agent eligibility
Agents must:
- Include “Agent” in label
- Be available for Slack integration
- Be properly configured in connections
If the agent doesn’t meet Slack’s criteria, it won’t appear—even if everything else is correct.
The real pattern
Slack failures are not about messaging.
They’re about identity resolution across two systems with different models.
And when that mapping breaks, everything downstream fails.
The deeper issue with integration errors
Every integration error in this section shares the same root cause:
You are operating across multiple systems that do not share a unified model of truth.
- Variables must match manually
- Users must map manually
- Flows must align manually
- Domains must align manually
There is no system that validates the entire chain for you.
Which means debugging becomes:
- Checking one layer
- Then the next
- Then the next
Until something clicks.
And in larger orgs, where flows, objects, and dependencies multiply, even understanding what should happen becomes difficult.
That’s where system-level visibility becomes essential.
Because when an integration fails, it’s rarely just the integration.
It’s how that integration interacts with everything else in your org.
The resolution model (what actually fixes Agentforce errors)
By the time you’ve worked through enough of these errors, a pattern really does emerge, but it’s not the neat five-bucket model people like to present.
Yes, every failure can be categorized as one of the following:
- Permissions
- Data
- Automation
- Configuration
- Platform limits
But in practice, almost no failure belongs to just one of these.
A System.LimitException: Apex CPU time limit exceeded error looks like an automation issue until you realize it’s triggered by a permission-expanded query pulling in more records than expected. A FIELD_CUSTOM_VALIDATION_EXCEPTION looks like a validation problem until you trace the upstream automation that populated the field incorrectly. A hallucination looks like an AI problem until you find duplicate records and inconsistent picklists underneath it.
These are entangled systems. Which means success in debugging Agentforce is tied to your ability to trace the chain.
When something breaks, the only reliable approach is sequential:
Start with permissions. Log in as the Agent User and attempt the exact action. If it fails there, you’ve found your answer. If it succeeds, move on.
Then look at data. Not abstractly—specifically. The exact records involved. Are there duplicates? Missing fields? Conflicting values? If the agent is grounding on bad data, everything downstream becomes unreliable.
Then follow the first flow or trigger, and everything that fires afterward. What updates what? What dependencies exist across objects? What gets recalculated, revalidated, or retriggered?
This is the step where most teams stall — because the system is opaque. Salesforce does not provide a native way to see the full cascade of automations and dependencies in one place. You are left stitching it together manually across flows, triggers, validation rules, and object relationships.
That’s exactly the gap tools like Sweep close. Instead of reconstructing the system from fragments, you can see the full execution surface—what fires, in what order, across which objects—before you try to fix it.
Only after those steps do configuration issues make sense. Topics, actions, mappings — these are relatively easy to fix once you understand what the system is actually doing.
And finally, if everything checks out and the problem persists, you’re likely hitting a platform constraint. At that point, the question isn’t “how do we fix this?” It’s “how do we redesign around it?”
The mistake is treating these steps as interchangeable. They’re not. If you start with configuration when the problem is automation, or with automation when the problem is data, you’ll chase symptoms indefinitely.
The model is one system, traced in the right order.
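The triage order can even be expressed as code, which makes the "not interchangeable" point concrete. A sketch where each layer is a callable returning True when it passes; layers you haven't checked yet default to passing:

```python
# The sequential order argued for above, as a fixed list.
TRIAGE_ORDER = ["permissions", "data", "automation", "configuration", "platform_limits"]

def first_failing_layer(checks):
    """Run layer checks in the only order that works and return the
    first failure, or None if everything passes. `checks` maps layer
    names to zero-argument callables; missing layers count as passing."""
    for layer in TRIAGE_ORDER:
        if not checks.get(layer, lambda: True)():
            return layer
    return None
```

Fixing anything downstream of the first failing layer is, by construction, chasing a symptom.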


