AI Agents Found 19 Bugs That Sentry Had No Way of Catching
MyLabelDesk, a B2B SaaS platform, had Sentry running in production. It was catching exceptions just fine. But two critical bugs — one that completely blocked file uploads, another that made export buttons do nothing at all — had been live for weeks without a single alert. Here's how AI agents found them.
The setup
The platform in question is a workflow management tool for the music industry. Its core flow spans 11 steps — from track upload through internal review, signing decisions, artist intake, contract management, and distribution to services like Spotify and Apple Music.
It's exactly the kind of app that's hard to test: multi-step workflows with role-based access, real-time updates, file uploads to cloud storage, and integrations with three external services. The team had Sentry set up and was catching runtime errors. They assumed their critical paths were covered.
They weren't.
What we ran
We deployed 25 AI agent test sessions using Aiqaramba. Each agent received a plain-language description of a test scenario. No selectors, no scripts, no page objects. The agents navigated the app in real browsers (Chrome and Firefox), making decisions like a real user would.
Six distinct journeys covered the full 11-step workflow end-to-end, with multiple agents per journey to test across browsers and edge cases.
The 2 critical bugs Sentry couldn't see
Bug 1: CORS blocks every file upload after track creation
When users created a new track and then tried to attach audio files from the detail page, the upload silently failed every time. The browser's CORS policy blocked the request to the upload endpoint. Because CORS errors are enforced by the browser, not the server, Sentry never received an exception.
From the user's perspective: they click upload, nothing happens. No error message, no spinner, no feedback. They try again. Nothing. Eventually they assume the feature is broken and work around it, or leave.
From Sentry's perspective: silence. The server never received the request, so there's no error to log.
Our AI agents caught this because they actually tried to upload a file, observed that nothing happened, checked the browser console for errors, and flagged the CORS block with the exact endpoint and error message. This bug appeared in 4 out of 25 sessions.
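To see why the server stays silent: a CORS-blocked `fetch` is rejected by the browser before any request leaves the machine, and the script only receives a generic `TypeError` with no status code. Below is a minimal, hypothetical sketch of that failure mode; `classifyUploadError`, `uploadTrack`, and the endpoint handling are illustrative assumptions, not MyLabelDesk's actual code.

```javascript
// Browsers deliberately hide CORS details from scripts: a blocked request
// rejects with a generic TypeError ("Failed to fetch" / "NetworkError"),
// so no status code exists and no server log is ever written.
function classifyUploadError(err) {
  if (err instanceof TypeError) {
    return 'network-or-cors'; // invisible to server-side error monitoring
  }
  return 'application-error'; // the kind of error Sentry can receive
}

// Hypothetical upload helper showing where the silence happens.
async function uploadTrack(file, endpoint) {
  try {
    const res = await fetch(endpoint, { method: 'POST', body: file });
    if (!res.ok) throw new Error(`Upload failed: HTTP ${res.status}`);
    return res;
  } catch (err) {
    // Without an explicit client-side report here (e.g. to a monitoring
    // SDK), a CORS block leaves no trace anywhere but the console.
    console.error('upload error:', classifyUploadError(err), err);
    throw err;
  }
}
```

The console message the agents flagged corresponds to exactly this `TypeError` path: the only observable evidence is client-side.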
Bug 2: Export buttons produce zero feedback
The platform's distribution step has export buttons to send track packages to distributors. When agents clicked them: no download, no success message, no error, no loading indicator. The buttons accepted the click and did nothing.
Again, no exception for Sentry to catch. The click handler ran, but whatever was supposed to happen downstream simply didn't. This is a logic bug, not a crash. Error monitoring tools are designed to catch unhandled exceptions, not to verify that a button actually does what it promises.
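A common shape for this class of bug is a fire-and-forget async handler: the promise rejects (or resolves into nothing), and no code path ever updates the UI. The sketch below is hypothetical; `startExport` and the `ui` object are assumed names, not the platform's code.

```javascript
// Broken shape: the handler "runs", but a rejection or empty result
// vanishes, and the user sees no download, no error, no spinner.
async function brokenExportHandler(startExport) {
  startExport(); // fire-and-forget: nothing awaited, nothing rendered
}

// Fixed shape: every outcome produces visible feedback.
async function fixedExportHandler(startExport, ui) {
  ui.setLoading(true);
  try {
    const pkg = await startExport();
    ui.showSuccess(`Export ready: ${pkg.name}`);
    return pkg;
  } catch (err) {
    ui.showError(`Export failed: ${err.message}`); // failure is now visible
    throw err;
  } finally {
    ui.setLoading(false);
  }
}
```

Note that both versions are exception-free from a monitoring tool's point of view when the download silently never materializes; only the second one makes the outcome observable to the user.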
Why error monitoring has a blind spot
Sentry, Datadog, and Bugsnag are all excellent at what they do. But they share a fundamental limitation: they can only report errors that the code explicitly throws. That leaves an entire category of bugs invisible:
| Bug type | Sentry | AI agents |
|---|---|---|
| Unhandled exception / crash | ✓ | ✓ |
| Server 500 errors | ✓ | ✓ |
| Silent upload failures | ✗ | ✓ |
| Buttons that do nothing | ✗ | ✓ |
| Broken multi-step workflows | ✗ | ✓ |
| Missing form validation | ✗ | ✓ |
| Confusing or missing form feedback | ✗ | ✓ |
| State not updating after action | ✗ | ✓ |
| Cross-browser inconsistencies | ✗ | ✓ |
The pattern is clear: error monitoring catches what the code reports. AI agents catch what users experience. They're complementary tools, not alternatives. Most teams, however, only have the first one.
The full breakdown: 19 bugs by severity
Beyond the two critical bugs, our agents found a layered set of issues across the entire workflow:
- 2 critical: CORS upload blocker + non-functional export (described above)
- 1 major: Zero user feedback when uploads fail silently
- 5 moderate: Including persistent Supabase 406 errors across 10 sessions, a non-functional settings section, and review states requiring manual page refresh to update
- 11 minor: Confusing placeholder text in forms, inconsistent search results, and UI elements that didn't respond on first click
The Supabase 406 errors are a good example of the gray zone. These technically appeared in network requests and could be caught by Sentry if instrumented correctly. But they were silent. The app continued functioning with degraded data, and without someone actually clicking through the affected screens, they went unnoticed.
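For errors in this gray zone, one option is a thin fetch wrapper that reports any non-2xx response to monitoring even when the app tolerates it. This is a minimal sketch under assumed names: `reportError` stands in for whatever reporting call your monitoring SDK provides, and `instrumentedFetch` is an illustrative helper.

```javascript
// Hypothetical instrumentation: "silent" non-2xx responses (like the
// 406s here) get reported even though the app keeps working with
// degraded data and never throws.
function instrumentedFetch(fetchImpl, reportError) {
  return async function (url, options) {
    const res = await fetchImpl(url, options);
    if (!res.ok) {
      // The failure is now visible to monitoring, not just to someone
      // who happens to open DevTools on the affected screen.
      reportError(`HTTP ${res.status} from ${url}`);
    }
    return res;
  };
}
```

Instrumentation like this narrows the gray zone but still can't tell you whether the resulting screen made sense to a user; that part required actually clicking through.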
What made AI agents effective here
Three things made this work where manual testing and error monitoring fell short:
1. End-to-end workflow coverage. Each agent walked through the full 11-step flow, not isolated pages. Bugs that only appear at step 7 after specific actions at step 3 are invisible to unit tests and unlikely to surface in ad-hoc manual testing.
2. Parallel cross-browser execution. Running Chrome and Firefox simultaneously caught inconsistencies without doubling the time investment. The same scenario, the same flow, two browsers, at once.
3. Console error analysis. The agents don't just observe the UI; they also inspect the browser console, network responses, and DOM state. That's how they caught the CORS block: the UI showed nothing, but the console told the full story.
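Conceptually, console-level capture looks like the sketch below. The `page.on('console')` and `page.on('requestfailed')` events are Playwright's real API; the `collectFindings` filter is an illustrative assumption, not Aiqaramba's implementation.

```javascript
// Hypothetical filter: keep only error-level console messages that look
// like silent failures a user would never see in the UI.
function collectFindings(consoleMessages) {
  return consoleMessages
    .filter((m) => m.type === 'error')
    .filter((m) => /CORS|Failed to fetch|net::ERR|NetworkError/i.test(m.text));
}

// Wiring this to a real browser session (Playwright) would look roughly like:
//   const messages = [];
//   page.on('console', (msg) =>
//     messages.push({ type: msg.type(), text: msg.text() }));
//   page.on('requestfailed', (req) =>
//     messages.push({ type: 'error', text: req.failure().errorText }));
```

A filter like this is what turns "the console told the full story" into a concrete finding with the exact endpoint and error text attached.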
The result
After running agents across the full workflow, the team had a prioritized bug report covering every severity level, with reproduction steps, browser console output, and screenshots for each issue.
The two critical bugs, the CORS upload blocker and the broken export, were fixed within days. Both had been live in production, invisible to Sentry, affecting real users who had quietly worked around them.
Error monitoring answers: "Did the code crash?"
AI agent testing answers: "Does the product actually work?"
If your team has Sentry and thinks the green dashboard means everything's fine, it might just mean nothing is crashing. That's not the same thing as working.
See what your monitoring is missing
Book a 30-minute demo and we'll run AI agents on your critical flows.
Book a demo →