The Grivara team · 9 min read

The fraud landscape got synthetic

Generative AI made fabricated evidence cheap, and rule-based defenses built for the pre-AI era can't keep up. Here's what broke and how Grivara closes the gap.

Tags: fraud, product, engineering

A claim came into a carrier we work with on a Wednesday last month. Rear-end at a signal, single-vehicle damage, $11,400 estimate. Photos of the bumper, a clean PDF from a body shop, an FNOL narrative that read like an adjuster wrote it herself. Two priors in 36 months, nothing exotic.

The estimate was generated in the browser. The photos were diffusion-model output, lit to match the weather report for that zip code on that day. The narrative came from a language model with the claimant's policy number pasted into the prompt. Total time to fabricate: under an hour. Total cost: less than a week's worth of streaming subscriptions.

The old defenses would have paid it.

(Composite. Each element is something an SIU lead has described to us in the last quarter; the compression is editorial, the elements are not.)

What changed

Three shifts, all stacked on top of each other, all in the last eighteen months.

US insurance fraud annually: $308B (Coalition Against Insurance Fraud)

To fabricate a full claim packet: < 1 hr (photos, estimate, narrative)

Faster ring adaptation (vs. pre-2024 retraining cycles)

The first shift is synthetic evidence. Photos, repair estimates, medical bills, paystubs for wage-loss claims, even dashcam clips — all of it is now cheap to generate and hard to catch with EXIF inspection or reverse image search. The tools that flagged copy-paste fraud a few years ago don't see diffusion output.

The second shift is rings adapting faster than models retrain. The traditional cycle was: a ring gets caught, SIU writes it up, data scientists retrain the fraud model on the new pattern, deploy in a quarter. Today the ring has moved twice in that quarter. Any system that depends on periodic retraining is permanently behind.

The third shift is coordination inside a single book that nobody is structured to see. ISO ClaimSearch and NICB already surface cross-carrier entity matches — that's not the gap. The gap sits inside one carrier's own claims: rings that spread thin across unrelated-looking claimants, providers, and body shops within the same book. No row-wise model catches that. The entity matching goes one step; the ring lives two or three.

None of this is theoretical. The Coalition Against Insurance Fraud pegs total US insurance fraud at around $308B per year — a number that was climbing before generative AI got good, and has accelerated since. The SIU leads we talk to say their QC-coded referral rate jumped noticeably in 2025. Some of that is better detection. Most of it isn't.

Why the old stack broke

Every carrier has a fraud stack. The shape is familiar: rules at the front, a gradient-boosted model in the middle, an SIU team at the end. It worked for a long time.

It stopped working for two specific reasons.

Before

Pre-2024 defense. Rules flag staged-loss patterns and known bad actors. A tabular ML model scores each claim on its features — priors, damage amount, time-of-day, garaging zip. SIU drowns in false positives but catches the obvious stuff. Synthetic evidence is rare enough to ignore.

After

Post-2024 reality. Rules catch last year's ring. The ML model scores a claim in isolation, blind to 2-hop neighbors. Evidence anomalies aren't grep-able JPEG artifacts anymore. They're statistical, and statistics need a model.

The row-wise scoring problem is the one people underestimate. A gradient-boosted tree sees a claim as a row of features. It has no way to know that the attorney on this FNOL also represented someone in a ring SIU tagged last quarter — because that fact isn't a column. It's an edge in a graph nobody built.

Rules are worse. Rules encode yesterday's fraud. "Body-shop A files more than 40 claims per month" is a decent rule until the ring decides to use Body-shops A through F instead. You can write more rules. They catch less every month.
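To make the body-shop example concrete, here is a minimal sketch of that rule and how spreading volume defeats it. The threshold of 40 comes from the rule above; the claim counts and the function name `shops_over_limit` are hypothetical, chosen only for illustration.

```python
from collections import Counter

MONTHLY_LIMIT = 40  # threshold from the example rule in the text


def shops_over_limit(claims, limit=MONTHLY_LIMIT):
    """Return the body shops whose monthly claim count exceeds the limit."""
    counts = Counter(shop for shop, _claim_id in claims)
    return sorted(shop for shop, n in counts.items() if n > limit)


# Hypothetical month: a ring pushes 90 claims through a single shop...
concentrated = [("A", i) for i in range(90)]
# ...then files the same 90 claims spread evenly across shops A-F (15 each).
spread = [("ABCDEF"[i % 6], i) for i in range(90)]

print(shops_over_limit(concentrated))  # ['A']
print(shops_over_limit(spread))        # []
```

The total volume is identical in both cases. Only the per-shop count changed, and that is the only thing the rule looks at.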

The three signals Grivara fuses

The way out isn't a better single model. It's a small set of complementary signals, each weak alone and strong together, fused into a score an adjuster — or, once routed, an investigator — can reason about.

The examples in this post are auto. The same fusion pattern applies to property, workers comp, and health with different weights and different multimodal detectors. We're using auto here because the rear-end scenario is easy to hold in your head.

Signal components (score × weight)

  • Graph collusion: 0.78 × 0.35
  • Tabular risk: 0.45 × 0.25
  • Multimodal evidence: 0.62 × 0.20
  • Adversarial stress: 0.55 × 0.20

Fused fraud score (weighted fuse): 0.62

Graph collusion. We treat a claim as a point in a graph of claimants, policies, providers, attorneys, body shops, and the documents every claim drags with it. When a new claim lands, we traverse its 3-hop neighborhood and ask two questions: how dense is it, and does it brush against anything flagged? Density alone is not fraud (a small town has one body shop and one tow yard), but density plus a known-bad neighbor two hops out is the signal that was invisible to row-wise models. The graph sits on top of your ClaimSearch and CLUE feeds, not in place of them: those bring the signal from outside your book; the graph structures the relationships inside it. (See the graph collusion deep-dive for edge weights, hop decay, and feedback propagation.)
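The two questions asked of the neighborhood (how dense, and does it touch anything flagged) can be sketched with a plain breadth-first search. This is an illustration under stated assumptions, not Grivara's implementation: the adjacency-dict graph, the node labels, and the helper names `neighborhood`, `density`, and `flagged_within` are all hypothetical, and edge weights and hop decay are ignored here.

```python
from collections import deque


def neighborhood(graph, start, max_hops=3):
    """Breadth-first search: {node: hop distance} for nodes within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def density(graph, nodes):
    """Share of possible undirected edges that exist among the given nodes."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count each undirected edge once (a < b); assumes a symmetric adjacency dict.
    edges = sum(1 for a in nodes for b in graph.get(a, ()) if b in nodes and a < b)
    return edges / (n * (n - 1) / 2)


def flagged_within(graph, start, flagged, max_hops=3):
    """Flagged nodes inside the neighborhood, with their hop distance."""
    dist = neighborhood(graph, start, max_hops)
    return {node: hops for node, hops in dist.items() if node in flagged}


# Hypothetical book: the new claim shares an attorney with a claim SIU tagged.
graph = {
    "claim:new": ["claimant:1", "shop:A", "attorney:X"],
    "claimant:1": ["claim:new"],
    "shop:A": ["claim:new", "claim:old"],
    "attorney:X": ["claim:new", "claim:ring"],
    "claim:old": ["shop:A"],
    "claim:ring": ["attorney:X", "claimant:9"],
    "claimant:9": ["claim:ring"],
}

hood = neighborhood(graph, "claim:new")
print(flagged_within(graph, "claim:new", flagged={"claim:ring"}))  # {'claim:ring': 2}
print(round(density(graph, hood), 3))
```

The flagged claim sits two hops out, reachable only through the shared attorney. A row-wise model scoring `claim:new` has no column for that path.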

Tabular risk. Priors, loss timing, garaging-vs-loss-location delta, coverage-to-damage ratio. The old-world signals still matter — they're just not sufficient on their own. We keep them, we weight them at 0.25, and we let the other signals fight them when they disagree.

Multimodal evidence. Every artifact the claim produces — photos, PDFs, audio from the FNOL call — runs through a multimodal pass. Diffusion-generated photos leave statistical fingerprints that are invisible to EXIF checks but visible to a model trained on them. A fabricated repair estimate has subtle inconsistencies between the header layout and the line-item arithmetic. We don't publish the exact detectors because they change monthly, but the output is a calibrated score that contributes to the fuse.

Adversarial stress. Before scoring, a red-team agent generates plausible fraud narratives consistent with the FNOL: staged rear-end, pre-existing damage, inflated repair scope, phantom passenger for injury claims. A defense agent then has to argue against each one using the evidence on file. A claim that comes out of this exchange with at least one fraud narrative the defense agent cannot rebut gets flagged. The beauty of the adversarial signal is that it scales with the attacker: if fraudsters get better narratives, the red-team model sees them and generates harder tests.

Fused at 0.35 / 0.25 / 0.20 / 0.20. No single component can move a claim into high band alone.
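As a worked check of the arithmetic, the sketch below recomputes the fused score from the component scores and weights quoted in this post. The function name `fuse` and the dictionary layout are assumptions made for illustration, and the plain weighted sum is inferred from the published numbers rather than confirmed as the production formula.

```python
# Weights and component scores as quoted in this post.
WEIGHTS = {"graph": 0.35, "tabular": 0.25, "multimodal": 0.20, "adversarial": 0.20}
scores = {"graph": 0.78, "tabular": 0.45, "multimodal": 0.62, "adversarial": 0.55}


def fuse(scores, weights=WEIGHTS):
    """Weighted sum of component scores (each score assumed to lie in [0, 1])."""
    return sum(weights[name] * scores[name] for name in weights)


fused = fuse(scores)
print(round(fused, 2))  # 0.62

# Ceiling for one signal acting alone: its score is 1.0, every other is 0.0.
solo = fuse({"graph": 1.0, "tabular": 0.0, "multimodal": 0.0, "adversarial": 0.0})
print(round(solo, 2))  # 0.35
```

Under this reading, the most any one signal can contribute is its own weight, 0.35 at best. Any high-band cutoff above that value cannot be reached by a single component.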

Why the combination matters

Each signal has a named failure mode. Each other signal catches it.

| Signal alone | Failure mode | What catches it |
|---|---|---|
| Graph | Regional concentration ≠ fraud | Adversarial must also fail the narrative |
| Tabular | Blind to coordinated rings | Graph lights up 2-hop neighbors |
| Multimodal | Misses well-coordinated narratives | Adversarial probes the story |
| Adversarial | Airtight narratives slip past the red team | Graph density and multimodal catch what the story hides |

That table is the whole product argument in one square. Carriers that run one of these well still have three blind spots. The point of fusion is not that each signal is individually better than what came before — some of them aren't. The point is that the blind spots don't overlap.

Take the synthetic-evidence claim from the opening.

  • Graph would have missed it — the claimant was new, no ring signal.
  • Tabular would have passed it — the features were unremarkable.
  • Multimodal caught a diffusion fingerprint in the photos.
  • Adversarial flagged that the narrative had no defensible counter to "pre-existing damage photographed post-loss."

No single signal would have routed the claim on its own. Fused, the score landed in the medium-high band and routed to SIU, who accepted the referral and QC-coded it within forty minutes.

The adjuster is still the decider

This is the part that sometimes gets lost in fraud-AI marketing, and it's the part that matters most.

What the investigator sees is a single surface. One claim, one score, four components with their weights, the 3-hop neighborhood rendered as a graph, the adversarial turns side-by-side, and the evidence artifacts with their anomaly notes. They can drill into any piece. They can override. The override is logged, attached to the audit trail, and fed back into the graph as a confirmed-clean or confirmed-fraud seed that propagates to neighbors with hop decay.

CLM-19241 · auto collision · synthetic evidence suspected

Intake · DONE · 0.5s
Extracted 27 fields · 0 prior claims · policy active 14 months

Coverage · DONE · 1.1s
Likely covered · PIP + collision in force

Fraud · DONE · 3.8s
Fused score 0.62 · graph 0.78 · adversarial 0.55 · multimodal 0.62

Reserve · NEEDS HUMAN
Above SIU threshold · routed to investigator queue with full trace

Total runtime from FNOL upload to decision packet: under five seconds. The investigator takes as long as they need. Nothing the agent did was a decision — it was a very fast, very thorough memo.

That distinction is what keeps the system legible. Investigators have been burned by black-box fraud products before and they don't trust them, rightly. The fix isn't to make the model more accurate. It's to make the reasoning visible, every step, every score, every tool call, every traversal. We stream it all as SSE events so the UI — or a DOI auditor — can replay the decision frame by frame.
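For readers unfamiliar with SSE, a minimal sketch of what streaming and replaying such a trace could look like follows. The wire format (`id:`, `event:`, `data:` fields, blank line between events) is standard Server-Sent Events; the event names, payload fields, and the helpers `sse_event` and `replay` are hypothetical and do not describe Grivara's actual schema.

```python
import json


def sse_event(event, payload, event_id=None):
    """Serialize one event in Server-Sent Events wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON keeps the payload on a single data line.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def replay(stream):
    """Parse a recorded stream back into (event, payload) pairs, in order."""
    events = []
    for block in stream.split("\n\n"):
        fields = dict(line.split(": ", 1) for line in block.splitlines() if line)
        if "data" in fields:
            events.append((fields.get("event", "message"), json.loads(fields["data"])))
    return events


# Hypothetical trace for the claim shown above.
trace = [
    ("traversal", {"claim": "CLM-19241", "hops": 3}),
    ("score", {"component": "graph", "value": 0.78}),
    ("score", {"component": "fused", "value": 0.62}),
]
stream = "".join(sse_event(name, body, event_id=i) for i, (name, body) in enumerate(trace))

for name, body in replay(stream):
    print(name, body)
```

Because every step is an ordered, self-describing event, a recorded stream is enough to reconstruct the sequence later without rerunning the model.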

A few things SIU leads ask every time, worth naming here. The audit trail is built for DOI review and anti-fraud plan filings, not just internal use. Every traversal, tool call, and override is timestamped and replayable. Feedback seeds propagate on confirmed outcomes, and they also unwind: if an investigator's decision gets overturned on appeal or through an EUO outcome, the seed reverses and the propagation rolls back across the same neighborhood. Nothing poisons a book forever. And reserves set on the basis of a fused score carry the fraud reasoning in the same record — if discovery comes for it, the reasoning is there, not a black-box number with no provenance.
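The propagate-then-unwind behavior described above can be sketched as a ledger: record exactly what each seed added to each neighbor, so a reversal subtracts exactly that. This is an illustrative sketch only. The decay factor of 0.5, the seed strength of 0.4, the 3-hop limit, and the names `propagate` and `rollback` are assumptions, not published parameters.

```python
from collections import deque


def propagate(graph, priors, seed, delta, decay=0.5, max_hops=3):
    """Add delta * decay**hops to each node within max_hops of the seed.

    Returns a ledger {node: adjustment} so the change can be undone exactly.
    """
    ledger = {}
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        hops = dist[node]
        ledger[node] = delta * decay ** hops
        if hops < max_hops:
            for nbr in graph.get(node, ()):
                if nbr not in dist:
                    dist[nbr] = hops + 1
                    queue.append(nbr)
    for node, adjustment in ledger.items():
        priors[node] = priors.get(node, 0.0) + adjustment
    return ledger


def rollback(priors, ledger):
    """Reverse a previously applied seed across the same neighborhood."""
    for node, adjustment in ledger.items():
        priors[node] -= adjustment


# Hypothetical chain of entities: a - b - c - d - e
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
priors = {}

ledger = propagate(graph, priors, seed="a", delta=0.4)  # confirmed-fraud seed at "a"
print(priors)  # "e" is four hops out and is never touched

rollback(priors, ledger)  # the decision is overturned: unwind the same nodes
print(priors)
```

Keeping the ledger per seed is the design choice that matters: a rollback touches only what that seed changed, so other confirmed outcomes in the same neighborhood stay intact.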

The SIU queue problem

Fraud detection is not only a precision/recall problem. It's a queue problem. SIU has finite capacity — typically a handful of investigators handling hundreds of referrals a week. A model that doubles true positives but quadruples false positives makes SIU's life worse, not better.
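A quick worked example shows why doubling true positives while quadrupling false positives hurts. The baseline of 50 true and 150 false positives per week is hypothetical, chosen only to make the arithmetic easy to follow.

```python
def queue_stats(true_pos, false_pos):
    """Return (referrals in the queue, precision of the queue)."""
    total = true_pos + false_pos
    return total, true_pos / total


# Hypothetical baseline week: 200 referrals, a quarter of them real.
base_total, base_precision = queue_stats(true_pos=50, false_pos=150)

# "Better" model: doubles true positives, quadruples false positives.
new_total, new_precision = queue_stats(true_pos=2 * 50, false_pos=4 * 150)

print(base_total, round(base_precision, 3))  # 200 0.25
print(new_total, round(new_precision, 3))    # 700 0.143
print(new_total / base_total)                # 3.5
```

In this scenario the queue grows 3.5 times while the share of real cases in it drops from 25% to about 14%. With fixed investigator capacity, more real fraud is referred but less of it may get worked.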

Fusion helps here in an indirect way. Because no single signal can route to high band alone, the high-band queue is smaller and denser with real cases. The medium band is a different affordance — not "investigate now" but "flag for the adjuster's awareness; escalate if something feels off." That separation is what keeps the top-of-queue precision high enough for SIU to trust it.

Integration is the other half. Output lands in your case-management system — Guidewire ClaimCenter, Duck Creek, whatever you run — as a QC-coded referral with the trace attached as a linked document. It doesn't replace the investigator's workflow. It shows up inside it, pre-populated.

One SIU lead put it to us this way, mid-walkthrough: "I don't need the model to be smarter than my investigators. I need it to land on their desk with the SOS half-drafted and the EUO questions queued up. Then they're writing the referral packet, not chasing tabs."

If you run claims or SIU at a carrier or TPA and want to see the fused view on a live claim from your book, we'll walk you through it.