
Accuracy methodology

How we measure Bank2XL's extraction quality, and the real numbers.

Short version: on our internal test corpus of 61 bank statements from 20+ banks across the US, UK, Canada, Europe, Australia, and Brazil, 82% either reconciled cleanly or were correctly recognized as having nothing to reconcile, and 18% needed human review. The reconciliation status badge on every result page tells you which bucket your statement falls into before you trust the data.

Why publish the real numbers

Bank statement converters tend to claim a single eye-catching accuracy figure ("99% accuracy!", "field-leading precision!") without saying what was measured, what the corpus looked like, or how often the tool admits it doesn't know. We think that's the wrong way to earn trust. We'd rather give you the honest picture and let the reconciliation badge on every result do the rest.

The test corpus

Our corpus is intentionally diverse. It is also small — 61 statements — and we want to be upfront about that. We are continuing to grow it, and we will update this page when we re-run.

Dimension	Coverage
Total documents	61 bank statements
Unique banks	30+ (Chase, Bank of America, Wells Fargo, Capital One, Citi, KeyBank, M&T, RBC, TD, CIBC, HSBC, ANZ, Commonwealth Bank, BNP Paribas, Deutsche Bank, plus regional credit unions and court-record extracts)
Countries	US, UK, Canada, Australia, France, Germany, South Africa, Brazil, New Zealand
Languages	English, French, German, Portuguese, Russian (small samples)
Document types	Personal checking / savings, credit-card statements, court-record statement extracts, business statements, summary-only PDFs
Format types	Text-layer PDFs and scanned (image-only) PDFs in a roughly 70/30 ratio

Headline results

Verdict	What it means	Count	Share
reconciled	Transactions extracted; opening balance + net transactions = closing balance, within 0.5%	30	49%
summary doc	Statement contained no transaction table to verify (e.g., cover page or balance summary only)	18	30%
PDF inconsistency	Our extraction matched the PDF's own Activity Summary, but the PDF's reported totals disagreed with the transaction list itself	2	3%
reconcile failed	Our extraction did not match the reported balance; needs review	11	18%
Total		61	100%
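The "reconciled" test in the table can be sketched as below. This is a minimal illustration, not Bank2XL's code: it assumes transaction amounts are signed (credits positive, debits negative), and it assumes the 0.5% tolerance is measured against the closing balance, with a floor of one currency unit so that a zero closing balance is not held to an exact match. The function name `reconciles` is hypothetical.

```python
from decimal import Decimal


def reconciles(opening: Decimal, transactions: list[Decimal], closing: Decimal,
               tolerance: Decimal = Decimal("0.005")) -> bool:
    """Return True if opening + sum(transactions) is within `tolerance`
    (a fraction of the closing balance) of the reported closing balance.

    Assumes signed amounts: credits positive, debits negative.
    """
    computed = opening + sum(transactions, Decimal("0"))
    # Assumption: scale the tolerance by |closing|, but never below 1 unit,
    # so a zero closing balance does not demand a match to the cent.
    scale = max(abs(closing), Decimal("1"))
    return abs(computed - closing) <= tolerance * scale


opening = Decimal("1000.00")
txns = [Decimal("-250.00"), Decimal("100.00")]
print(reconciles(opening, txns, Decimal("850.00")))  # True
print(reconciles(opening, txns, Decimal("900.00")))  # False
```

Using `Decimal` rather than `float` keeps cent-level amounts exact, so the only slack in the comparison is the tolerance itself.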

The headline 82% figure is the sum of "reconciled", "summary doc", and "PDF inconsistency" (50 of 61 documents): the cases where Bank2XL either produced data that reconciles, correctly recognized there was nothing to reconcile, or traced the discrepancy to the PDF itself. The remaining 18% (11 documents) are extraction failures that need manual verification.

If you tighten "success" to mean "extracted transactions that reconcile against the source balances", the figure is 30 of the 43 documents that had a verifiable transaction table: 70%. We think both numbers are worth knowing.
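Both percentages follow directly from the verdict counts in the table. A short worked version of the arithmetic, where the dictionary keys are shorthand labels for the table rows rather than identifiers from the product:

```python
# Verdict counts from the 61-document corpus run (results table).
counts = {
    "reconciled": 30,
    "summary_doc": 18,
    "pdf_inconsistency": 2,
    "reconcile_failed": 11,
}
total = sum(counts.values())  # 61

# Headline: reconciled, nothing to reconcile, or discrepancy traced to the PDF.
headline = (counts["reconciled"] + counts["summary_doc"]
            + counts["pdf_inconsistency"]) / total

# Strict: only documents with a verifiable transaction table are in the denominator.
verifiable = total - counts["summary_doc"]  # 43
strict = counts["reconciled"] / verifiable

print(f"headline: {headline:.0%}")  # headline: 82%
print(f"strict: {strict:.0%}")      # strict: 70%
```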

What the failures look like

Across the 11 documents that did not reconcile, the common failure modes are:

Multi-column tables on scanned PDFs. When OCR has to guess which column a value belongs in, debit/credit polarity can flip on isolated rows.
Mid-page table rotation. A few US bank statements pivot the transaction table 90 degrees on certain pages. Our text-layer extractor handles this fine; the OCR path sometimes misses rows in the rotated region.
Footers that look like transactions. "Total interest paid YTD: $42.18" on a final page can be ingested as a transaction if the layout heuristics misread it.
Inconsistent decimal separators within the same PDF. A small number of European statements mix `1.234,56` and `1,234.56` within one document; we generally pick one convention per document and apply it throughout, so amounts written in the other convention can be misread.
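One alternative to fixing a single convention per document is to decide per value. The sketch below is a hypothetical heuristic, not what Bank2XL does: it treats whichever of `.` or `,` appears last as the decimal mark when exactly two digits follow it, and strips every other separator as a thousands mark. It would misread amounts with one or three fractional digits (for example `12,5`), so treat it as a starting point only. The name `parse_amount` is illustrative.

```python
import re
from decimal import Decimal

# Optional minus sign, then digits and separators only.
_AMOUNT = re.compile(r"^-?[\d.,]+$")


def parse_amount(text: str) -> Decimal:
    """Parse '1.234,56', '1,234.56', or '1234.56' into a Decimal.

    The last '.' or ',' is taken as the decimal mark only if exactly two
    digits follow it; all other separators are dropped as thousands marks.
    """
    s = text.strip().replace(" ", "")
    if not _AMOUNT.match(s):
        raise ValueError(f"not an amount: {text!r}")
    last = max(s.rfind("."), s.rfind(","))
    if last != -1 and len(s) - last - 1 == 2:
        integer, fraction = s[:last], s[last + 1:]
    else:
        # No two-digit fraction: treat the whole string as an integer amount.
        integer, fraction = s, "00"
    integer = integer.replace(".", "").replace(",", "")
    return Decimal(f"{integer}.{fraction}")


print(parse_amount("1.234,56"))  # 1234.56
print(parse_amount("1,234.56"))  # 1234.56
```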

How you know which case you're in

Every Bank2XL Excel workbook includes a Validation sheet showing per-account reconciliation status. The badge on the result page shows the same status as a single color:

reconciled — trust the numbers. They balance.
no_balance / insufficient_data / incomplete_source — we couldn't verify; check the source.
mismatch / tx_extraction_incomplete — our numbers don't match the PDF. Investigate.
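A result page has to collapse the Validation sheet's per-account statuses into one badge. The status codes below are the ones listed above; the badge labels, the function name `badge_for`, and the rule that the worst-performing account decides the badge are assumptions made for this sketch. Unknown codes are treated conservatively as "investigate", and an empty list as "check the source".

```python
from enum import Enum


class Badge(Enum):
    TRUST = "reconciled: the numbers balance"
    CHECK_SOURCE = "could not verify: check the source"
    INVESTIGATE = "does not match the PDF: investigate"


# Status codes as listed on this page, grouped into the three badge cases.
STATUS_TO_BADGE = {
    "reconciled": Badge.TRUST,
    "no_balance": Badge.CHECK_SOURCE,
    "insufficient_data": Badge.CHECK_SOURCE,
    "incomplete_source": Badge.CHECK_SOURCE,
    "mismatch": Badge.INVESTIGATE,
    "tx_extraction_incomplete": Badge.INVESTIGATE,
}

# Ordered from least to most severe.
SEVERITY = [Badge.TRUST, Badge.CHECK_SOURCE, Badge.INVESTIGATE]


def badge_for(statuses: list[str]) -> Badge:
    """Collapse per-account statuses into one badge; the worst account wins."""
    if not statuses:
        return Badge.CHECK_SOURCE  # nothing to verify against
    worst = Badge.TRUST
    for status in statuses:
        badge = STATUS_TO_BADGE.get(status, Badge.INVESTIGATE)
        if SEVERITY.index(badge) > SEVERITY.index(worst):
            worst = badge
    return worst


print(badge_for(["reconciled", "no_balance"]).value)  # could not verify: check the source
```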

This is the single most important UI element in the product. The whole point of building reconciliation in is so that you never trust output you shouldn't trust.

Performance

Mean processing time: 6.3 seconds per document on the test corpus.
Range: 2 seconds (small text-layer PDFs) to 30 seconds (multi-page scanned statements via OCR path).
Cost per conversion: < $0.01 in API calls for text-layer PDFs; ~$0.05 for OCR-heavy statements.
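As a rough budgeting aid, the per-conversion figures above can be combined with a document mix. This sketch assumes every scanned PDF takes the OCR path at about $0.05, treats the text-layer figure of < $0.01 as its upper bound, and defaults to the test corpus's roughly 30% scanned share, which your own statements may not match. `batch_cost_estimate` is a hypothetical helper, not part of the product.

```python
# Figures from this page: < $0.01 per text-layer conversion, ~$0.05 per
# OCR-heavy one; the test corpus is roughly 70% text-layer / 30% scanned.
TEXT_LAYER_COST_MAX = 0.01
OCR_COST_APPROX = 0.05


def batch_cost_estimate(n_docs: int, scanned_share: float = 0.30) -> float:
    """Rough upper-end API cost in USD, assuming every scanned PDF uses OCR."""
    n_scanned = round(n_docs * scanned_share)
    n_text = n_docs - n_scanned
    return n_text * TEXT_LAYER_COST_MAX + n_scanned * OCR_COST_APPROX


print(f"${batch_cost_estimate(100):.2f}")  # $2.20
```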

What we DON'T claim

We don't claim 100% accuracy. Anyone who claims that on AI extraction is lying.
We don't claim "98%+" without a footnote anymore. Earlier landing copy quoted that number from an older 30-doc internal test. The 61-doc corpus number is 82%; we now publish that instead.
We don't claim coverage of every bank. We have specifically tested ~30 banks. Banks not in the corpus probably work (vision AI generalizes), but we cannot promise they do until we run them.
We don't quietly suppress failures. Mismatched extractions still produce an Excel; the Validation sheet says mismatch so you know.

How we'll improve this page

This page will be re-published whenever we run a new corpus pass. Planned next:

Grow the corpus from 61 to 200+ documents, with focus on under-represented banks (Citi, US Bank, regional credit unions, more European banks).
Add a per-bank breakdown so you can see, e.g., "Chase: 92% reconciled across 24 statements".
Publish a quarterly delta showing reconciliation rate over time as the model improves.
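A per-bank breakdown could be computed from (bank, verdict) pairs along these lines. The sketch uses the strict definition from the headline results, so summary-only documents are left out of each bank's denominator; whether the published breakdown will count them that way is an assumption, as are the verdict labels and the function name `per_bank_rates`. The sample data is made up for illustration.

```python
from collections import defaultdict


def per_bank_rates(results: list[tuple[str, str]]) -> dict[str, tuple[int, int, float]]:
    """Map each bank to (reconciled, verifiable, reconciled / verifiable).

    Summary-only documents have nothing to reconcile and are skipped, so a
    bank with only summary documents does not appear in the output.
    """
    reconciled: dict[str, int] = defaultdict(int)
    verifiable: dict[str, int] = defaultdict(int)
    for bank, verdict in results:
        if verdict == "summary_doc":
            continue
        verifiable[bank] += 1
        if verdict == "reconciled":
            reconciled[bank] += 1
    return {
        bank: (reconciled[bank], n, reconciled[bank] / n)
        for bank, n in sorted(verifiable.items())
    }


# Made-up sample data, not corpus results.
sample = [
    ("Bank A", "reconciled"),
    ("Bank A", "reconcile_failed"),
    ("Bank A", "summary_doc"),
    ("Bank B", "reconciled"),
]
for bank, (ok, n, rate) in per_bank_rates(sample).items():
    print(f"{bank}: {rate:.0%} reconciled across {n} verifiable statements")
```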

If you have a statement that didn't reconcile and you're willing to share it (after redacting), send it to [email protected]. We use shared corpora to drive prompt and pipeline improvements; reconciliation rate is the metric we optimize for.

Open about the limitations

Bank2XL is a small product built by a small team. We have intentionally chosen to publish honest numbers rather than marketing-friendly ones. The trade-off:

If your statement comes from a common US, UK, Canadian, or Australian retail bank with a familiar layout, you'll almost certainly see reconciled.
If your statement is from a less common bank, involves unusual multi-currency complexity, or is a poor-quality phone scan, you may see a warning or error badge instead, in which case manual review is needed.
You always get a result. We never silently fail.
