What Are the Risks of Poor Data Provenance in KYB?
Data provenance in KYB is the ability to trace every piece of verification data back to its original, independent source, and to prove that chain to an auditor.
Do regulators demand it?
Regulators want to know whether compliance teams used unbiased evidence in their onboarding decisions.
While ‘data provenance’ is not a legal term or requirement in either British or European Union (EU) law, the obligation behind it is: compliance teams must be able to demonstrate that they used unbiased evidence to fulfil their anti-money laundering (AML) obligations during customer onboarding.
Firms engaged in the fight against financial crime are obliged to use “independent, reliable” sources to verify counterparties against their own self-disclosures – and those teams must be able to prove that they have done so.
What counts as “independent, reliable” sources under global regulations?
The Financial Action Task Force (FATF) set the internationally recognised benchmark of what constitutes ‘good’ KYB data. Recommendation 10 puts it plainly for beneficial ownership verification – although the same standard applies to verifying an entity’s existence: the information must come from “independent, reliable sources”.
Article 22 (6A) of the Anti-Money Laundering Regulation (AMLR) says that obliged entities must obtain “information from reliable and independent sources, whether accessed directly or provided by the customer.” Article 22 (7A) goes further for beneficial owners: obliged entities must take “reasonable measures to obtain the necessary information, documents and data from the customer or other reliable sources, including public registers other than the central registers.”
The same is true in Britain. The Money Laundering Regulations 2017 state that verification must be made “on the basis of documents or information obtained from a reliable source which is independent of the customer.”
In short, good data provenance in KYB is the ability to prove that an onboarding decision was made based on sources “independent of the customer”.
The provenance test fails if you say to an auditor, “We obtained this formation document from our KYB vendor”, or “We obtained this information from our customer”, as opposed to “directly from the company registry”.
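As a rough illustration, the provenance test can be written as a check on where each piece of evidence was obtained. This is a minimal sketch under stated assumptions: the `SourceType` categories, the `Evidence` record and the `passes_provenance_test` function are hypothetical names invented for this example. They are not a regulatory definition or any vendor's API.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    # Hypothetical categories, chosen for illustration only.
    REGISTRY = "official company registry"
    VENDOR = "KYB vendor"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class Evidence:
    document: str              # e.g. "formation document"
    obtained_from: SourceType  # where the firm actually got it


def passes_provenance_test(evidence: Evidence) -> bool:
    """Return True only when the evidence came directly from a registry."""
    return evidence.obtained_from is SourceType.REGISTRY


from_registry = Evidence("formation document", SourceType.REGISTRY)
from_vendor = Evidence("formation document", SourceType.VENDOR)
from_customer = Evidence("formation document", SourceType.CUSTOMER)

for item in (from_registry, from_vendor, from_customer):
    print(item.obtained_from.value, "->", passes_provenance_test(item))
```

A real policy would likely be more graded than a single yes or no, but the principle is the same: the answer depends on the recorded source, so the source has to be recorded.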
Case Study: How the Solo Group failed the provenance test
Recent regulatory actions in Britain illustrate this.
The brokers who onboarded the Solo Group's clients relied almost wholly on KYC questionnaires that were completed by the customers and supplied by the Solo Group itself.
While it is legal to outsource due diligence obligations to a third party, firms that do so are not absolved of their AML obligations. The Solo Group, as it turned out, was supplying biased information: its CEO and owner was found to be in ultimate control of most of the Group’s alleged clients.
In other words, information supplied by a client does not carry the same evidential weight as information obtained from a government registry like Companies House.
Kyckr's own analysis of 22 FCA Final Notices issued between 2020 and 2025 found that 68% involved data failures, gaps that left firms unable to assess risk accurately.
In some cases, high-risk business customers were assigned low-risk ratings, which meant they were onboarded without enhanced due diligence and not adequately monitored over the customer lifecycle.
Auditors, in turn, found weaknesses in these firms' anti-money laundering frameworks and defences, leaving the firms exposed to regulatory action and fines, which in one case ran up to 264 million pounds.
A sound framework on paper no longer satisfies regulators. They expect firms to evidence the sources behind each verification decision, and to produce the record on demand, at onboarding and throughout the customer lifecycle.
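One way to make a record producible on demand is to store, alongside each retrieved document, what was retrieved, from where and when, plus a fingerprint of its contents. The sketch below is an assumption-laden illustration: the field names, the `build_evidence_record` and `document_matches_record` helpers and the sample values are all hypothetical, and it does not describe any regulator's required format.

```python
import hashlib
from datetime import datetime, timezone


def build_evidence_record(entity_id: str, source_name: str,
                          document: bytes, retrieved_at: datetime) -> dict:
    """Bundle what was retrieved, from where, and when, into one record."""
    return {
        "entity_id": entity_id,
        "source": source_name,
        # Store the timestamp in UTC so records compare cleanly across offices.
        "retrieved_at": retrieved_at.astimezone(timezone.utc).isoformat(),
        # A SHA-256 digest lets the firm show later that the stored
        # document is byte-for-byte the one that was retrieved.
        "sha256": hashlib.sha256(document).hexdigest(),
    }


def document_matches_record(document: bytes, record: dict) -> bool:
    """Check a document against the digest captured at retrieval time."""
    return hashlib.sha256(document).hexdigest() == record["sha256"]


# Hypothetical sample values, for illustration only.
filing = b"example filing contents"
record = build_evidence_record(
    entity_id="HYPOTHETICAL-001",
    source_name="Company registry",
    document=filing,
    retrieved_at=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(record)
```

The digest does not prove where the document came from. It only shows that the document has not changed since the record was made, so the source name and timestamp still need to be captured at the point of retrieval.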
How do commercial aggregators affect data provenance?
Third-party commercial databases periodically scrape, buy and download information from multiple sources and turn it into structured data, which they store in databases.
They play an important role in structuring data that would otherwise be trapped in manual workflows and normalising data pulled from global company registries with different data standards, making it easier to automate parts of customer due diligence.
However, commercial databases often pull KYB data from a range of sources, including customers, company registries, and other third-party databases. There is nothing strange about this. It is something that most commercial databases are fully transparent about.
What happens when aggregators source data from other aggregators?
Let's say that a commercial database pulls data from another commercial database, which in turn pulls data from another commercial database.
There is nothing inherently wrong with using other commercial providers – in fact, it is common practice. However, the more complex the data pipeline, the more opaque the sourcing can become. Each extra layer muddies data provenance and makes it harder for compliance teams to know whether the data was originally obtained from “independent, reliable” sources.
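The chain described above can be modelled as a linked list of sources, each pointing to the supplier it obtained the data from. This is a hypothetical sketch: the `Source` record, the `kind` labels and the aggregator names are assumptions made for illustration, and real pipelines rarely expose their upstream links this neatly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    name: str
    kind: str                            # "registry", "aggregator" or "customer"
    upstream: Optional["Source"] = None  # where this source got the data, if known


def trace_chain(source: Source) -> list[str]:
    """Walk upstream from the supplier to the earliest known origin."""
    chain = []
    current: Optional[Source] = source
    while current is not None:
        chain.append(current.name)
        current = current.upstream
    return chain


def origin_is_registry(source: Source) -> bool:
    """True only if the chain can be followed all the way to a registry."""
    current = source
    while current.upstream is not None:
        current = current.upstream
    return current.kind == "registry"


# A chain that can be traced back to a registry.
registry = Source("Company registry", "registry")
agg_c = Source("Aggregator C", "aggregator", upstream=registry)
agg_b = Source("Aggregator B", "aggregator", upstream=agg_c)
agg_a = Source("Aggregator A", "aggregator", upstream=agg_b)

# A chain that stops at an aggregator whose own sourcing is unknown.
opaque = Source("Aggregator D", "aggregator",
                upstream=Source("Aggregator E", "aggregator"))

print(trace_chain(agg_a), origin_is_registry(agg_a))
print(trace_chain(opaque), origin_is_registry(opaque))
```

The second chain fails because the trail ends at an aggregator that cannot say where its data came from. The data itself may be perfectly accurate, but the firm cannot show that.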
This is why the Joint Money Laundering Steering Group (JMLSG) guidance (2023, Revised Edition) explicitly advises firms to do due diligence on their data supplier.
“Before using a commercial organisation for electronic verification of identity,” the guidance says, “firms should be satisfied that information supplied by the data provider is considered to be sufficiently extensive, reliable, and accurate, and independent of the customer, and capable of providing an appropriate level of assurance that the person claiming a particular identity is in fact that person.”
What does good data provenance in KYB look like?
Here are two questions to put to commercial vendors during introductory calls:
Where was your data sourced from? Can you prove it, and how?
Are other commercial aggregators involved in your data pipeline – and can they prove that their data is pulled from “independent, reliable” sources?
Kyckr has been working to solve the data provenance problem in KYB. Instead of a database, Kyckr provides a live connection to 300+ company registries worldwide, enabling compliance teams to retrieve company data and documents in real time, time- and date-stamped at the point of retrieval.
Frequently Asked Questions
Is data provenance a legal requirement in KYB?
Not by name. "Data provenance" is not a defined legal term in UK or EU law, but the obligation behind it is. Firms must verify customers using "independent, reliable sources" and must be able to evidence that they did, which is what good data provenance provides in practice.
What counts as an "independent, reliable" source under FATF Recommendation 10?
A source that is independent of the customer and capable of corroborating their self-disclosures. The clearest example is official company registry data. FATF Recommendation 10, the EU's AMLR, and the UK's Money Laundering Regulations 2017 all require verification from sources independent of the customer, with the firm able to prove the sourcing.
Can I rely on a commercial aggregator for KYB verification?
Yes, but with diligence. Using a commercial database is common and legitimate, and most are transparent about their mixed sourcing. The risk is provenance: when aggregators pull data from other aggregators, the chain back to an independent source becomes opaque, which is why the JMLSG advises firms to do due diligence on the data provider itself.
How do I prove data provenance to an auditor?
You must be able to show the data came directly from a source independent of the customer – ideally, an official registry – and produce the record to evidence it. The test fails if your answer is "we got it from our customer" or "from our vendor" rather than "directly from the registry."