An Ontology for Accountability: Defining What Data Quality Means in Blockchain Analytics

We built this to be questioned

Before I came to blockchain analytics, I spent years in academia studying the formal correctness of distributed systems. My work involved proving, mathematically, that code and protocols behave correctly under every possible execution. Not most of them. All of them. In that world, there isn’t a “good enough.” There’s proof, or there’s no proof.

I left academia to apply that thinking to real-world problems, first in financial infrastructure, where failure is enormously expensive, and eventually in blockchain analytics, where failure directly affects people’s livelihoods. I’ve now spent nearly a decade as Chainalysis’s Chief Scientist, and I’m ultimately responsible for the quality of our data. That responsibility is something I carry into every decision about how our systems work and the claims they make.

I mention this background because it explains how I approach this field, and why a moment from a few years back changed everything.

The moment

A customer came to me with a problem. They had two blockchain analytics tools giving them two very different answers about the same deposit address. Ours called it a gambling service. The other tool called it CSAM.

This was not a disagreement about whether an address belongs to Exchange A or Exchange B. Rather, it was a disagreement where one answer means a person placed a bet, and the other means they may have purchased child sexual abuse material.

I looked at it and quickly understood what had happened. Small, regular payments of similar size and frequency can look alike if all you’re doing is matching statistical patterns. A small-time gambler and something far darker can produce similar transaction footprints when viewed through a narrow enough lens. That’s a dangerous conclusion to draw, and it rested on a reckless foundation: no evidence, just appearance.

That kind of conclusion cannot easily be independently verified by the people who act on it. People rely on this data and may treat it as fact. We have proven that it can be relied on, if it is done right. This, however, was an exploitation of that trust. And it could have terrible consequences.

Building research-grade systems for a field that didn’t exist

When I joined this space, blockchain analytics wasn’t an established discipline. There were no textbooks, no accreditation bodies, no precedent. The problems were novel. It was research. And we were building systems for people who didn’t fully understand the technical workings of blockchains, people who relied on the correctness of our work and trusted it: law enforcement agents building cases, compliance teams making decisions that affect real people and their access to money, prosecutors presenting evidence in court.

So I did what came naturally: I applied academic rigor to everything we built. If it could withstand peer review, it was good enough. If it couldn’t, we discarded it. We drew hard lines between what could be proven forensically, through deterministic, reproducible on-chain analysis, and what required intelligence tradecraft. We enumerated failure modes and safeguarded against them. We treated the structural layer of our analysis as a forensic discipline, because that’s what it is.

When you’ve spent years proving systems correct under all conditions, you don’t ship something and hope it holds up. You build it so you can prove it holds up.

Rigor is a philosophy, not a phase

That standard wasn’t something we outgrew as the company scaled or the blockchain ecosystem evolved. As new chain architectures emerged and on-chain activity grew far more complex, we kept the bar exactly where it was and improved our operational processes to keep meeting it.

When our methodology was subjected to full Daubert scrutiny in federal court in United States v. Sterlingov, it was found admissible across every criterion. I never doubted that outcome, because I knew our methods could stand up to peer review.

When independent researchers at Delft University in collaboration with law enforcement conducted the only empirical validation study of attribution accuracy against ground truth from seized infrastructure, we welcomed it. We let the results be published. We treated it as what it was: the kind of external scrutiny that any legitimate discipline should invite. Another provider tried to suppress that same study through legal threats. That tells you something. No. It tells you everything.

As the field grew, new entrants arrived. And with them came a gradual erosion that I’ve watched with growing concern. Definitions got looser. Machine learning outputs started being treated as forensic facts. I started getting questions from customers about data quality issues that shouldn’t exist if the underlying methodology were sound. The CSAM-versus-gambling moment was the most visceral, but it wasn’t the only one, and they haven’t stopped coming.

The vocabulary gap

The field is still young and the technology is genuinely new. But the deeper problem isn’t that the investigators, regulators, and courts who depend on this data don’t fully understand the underlying mechanics. It’s that they shouldn’t have to. You can’t expect everyone investigating crime in this space to appreciate every nuance and technical detail. Mature scientific disciplines solve this with shared, explicit standards, so that the people who rely on results don’t have to re-derive them. Blockchain analytics hasn’t formalized such standards yet, because the space is new. We’re building toward the kind of standards that mature scientific disciplines have established.

That’s why we wrote the ontology. It is neither a product specification nor a marketing document. It is a formal paper that sets out the standards we have upheld since our founding: it breaks down the work we do into its constituent parts, assigns each an evidentiary standard, and gives the industry a vocabulary for accountability.

It defines two tiers. The structural layer, which determines whether addresses share common control, must meet a standard of forensic reliability: deterministic, reproducible, auditable, with known and documented failure modes. The attribution layer, which links those addresses to a named entity, follows a structured and recognizable confidence framework: source characterization with documented reasoning requirements. Both are rigorous. But they’re different kinds of rigor. Conflating them is how you confuse people, and failing to appreciate them is how you end up labeling a gambler as a CSAM purchaser.
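To make the two-tier distinction concrete, here is a minimal Python sketch of how a data model might keep the tiers apart. It is an illustration under stated assumptions, not the ontology’s actual schema: the class names `StructuralClaim` and `AttributionClaim`, their fields, and the `validate` function are all hypothetical. The check flags an attribution that rests on pattern resemblance alone, with no characterized source and no documented reasoning.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class StructuralClaim:
    """Tier 1 (hypothetical model): a set of addresses shares common control.

    Held to forensic reliability: the claim names the deterministic heuristic
    that produced it and the failure modes documented for that heuristic.
    """
    addresses: FrozenSet[str]
    heuristic: str                        # deterministic, reproducible rule
    known_failure_modes: Tuple[str, ...]  # documented ways the rule can err


@dataclass(frozen=True)
class AttributionClaim:
    """Tier 2 (hypothetical model): a structural cluster linked to an entity.

    Held to a confidence framework: the link must cite characterized sources
    and carry documented reasoning, not just a matching activity pattern.
    """
    cluster: StructuralClaim
    entity: str
    category: str
    sources: Tuple[str, ...]              # characterized sources for the link
    reasoning: str                        # documented reasoning for the link
    confidence: str                       # e.g. "high", "medium", "low"


Claim = Union[StructuralClaim, AttributionClaim]


def validate(claim: Claim) -> List[str]:
    """Return the evidentiary requirements a claim fails to meet."""
    problems: List[str] = []
    if isinstance(claim, StructuralClaim):
        if not claim.heuristic:
            problems.append("structural claim names no deterministic heuristic")
        if not claim.known_failure_modes:
            problems.append("structural claim documents no failure modes")
    elif isinstance(claim, AttributionClaim):
        # An attribution is only as sound as the cluster beneath it.
        problems.extend(validate(claim.cluster))
        if not claim.sources:
            problems.append("attribution claim cites no characterized source")
        if not claim.reasoning.strip():
            problems.append("attribution claim has no documented reasoning")
    else:
        raise TypeError(f"unsupported claim type: {type(claim).__name__}")
    return problems


if __name__ == "__main__":
    cluster = StructuralClaim(
        addresses=frozenset({"addr_1", "addr_2"}),
        heuristic="example-heuristic",          # placeholder name
        known_failure_modes=("example failure mode",),
    )
    # An attribution drawn only from a resemblance in payment patterns.
    pattern_only = AttributionClaim(
        cluster=cluster, entity="unknown", category="CSAM",
        sources=(), reasoning="", confidence="high",
    )
    print(validate(cluster))       # []
    print(validate(pattern_only))  # no source, no documented reasoning
```

In this sketch a confidence label never substitutes for evidence: the pattern-only attribution is rejected even though it claims "high" confidence, because the attribution tier is judged on its sources and reasoning, while the structural tier is judged on its deterministic heuristic and documented failure modes.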

Why this makes me proud

This paper formalizes the principles that have guided our methodology since our founding, while acknowledging that our implementation continues to evolve. It required us to name what we’ve been doing all along. The standards it defines are the standards we already hold ourselves to. The bright lines it draws are bright lines we drew years ago, because nothing less would withstand peer review.

The consumers of blockchain analytics data don’t always have the information to perform their own peer review. What makes me proud is working for a company that does not violate the trust those people place in us. Every system we designed, every heuristic we deployed, every analytical claim we made was built with the assumption that it would one day be examined by someone adversarial, skeptical, and technically capable, and that it needed to hold up. And it has held up. Because we built it that way.

But pride in our own work isn’t enough. The people affected by blockchain analytics outputs (the subjects of investigations, the users flagged by compliance systems, the defendants in criminal cases) deserve an industry that holds itself to this standard collectively. This paper is a first step, not a final word. We’re publishing our definitions because someone needs to take the lead in responsible transparency. Someone has to establish the boundaries. We believe we have. Now we’re inviting the industry to build on them with us.

This website contains links to third-party sites that are not under the control of Chainalysis, Inc. or its affiliates (collectively “Chainalysis”). Access to such information does not imply association with, endorsement of, approval of, or recommendation by Chainalysis of the site or its operators, and Chainalysis is not responsible for the products, services, or other content hosted therein.

This material is for informational purposes only, and is not intended to provide legal, tax, financial, or investment advice. Recipients should consult their own advisors before making these types of decisions. Chainalysis has no responsibility or liability for any decision made or any other acts or omissions in connection with Recipient’s use of this material.

Chainalysis does not guarantee or warrant the accuracy, completeness, timeliness, suitability or validity of the information in this report and will not be responsible for any claim attributable to errors, omissions, or other inaccuracies of any part of such material.

Defining the Cluster: A Formal Ontology for Blockchain Address Analysis and Intelligence Claims
