Company Registry Data Normalisation in KYB

The data normalisation problem is the absence of a common schema across company registries, which forces compliance teams to write bespoke logic for every jurisdiction they operate in. 

This problem sits at the centre of every KYB workflow that crosses a border. We spoke to Kyckr’s Product Manager, Laura O’Mahony, to see what it looks like in practice and how to fix it.

The problem

Company registries were built for domestic courts, not global compliance teams. Each one stores different fields, uses different status codes, and defines "beneficial owner" differently. A bank onboarding a counterparty across five jurisdictions pulls five incompatible data sets. No two registries agree on what a company record should contain.

Case Study: The UK v The Netherlands v Denmark

You ask (via API): “Is this company alive?” These three modern, digital company registries give different answers to the same question:

  • Companies House (UK): Gives you multiple values, including active, dissolved, liquidation, receivership, administration, voluntary-arrangement, converted-closed, registered, and removed. Insolvency distinctions live in the main status field alongside everything else. It is more informative than France's Infogreffe, but the logic is entirely different.

  • The KVK (Netherlands): Gives you no status at all for inactive entities by default. The KVK excludes them from search results unless you pass a separate parameter. Worse, a company can show as active in the Handelsregister while it is in bankruptcy, because insolvency is held in a separate register altogether, the Centraal Insolventieregister, which you must query independently.

  • CVR (Denmark): Gives you 18 or more values, delivered via an Elasticsearch interface, in Danish, inside a timestamped array. Status is not a current state, but a history of states, each with a validity period attached. Values include Normal (Active), Under konkurs (Under bankruptcy), Under rekonstruktion (Under restructuring), Opløst efter konkurs (Dissolved after bankruptcy), Tvangsopløst (Compulsorily dissolved), and Slettet (Deleted). 

The same question is answered with three different ideas of what "status" means and where it lives. 

A compliance team querying all three must write different logic for each, reconcile incompatible concepts, and know to look elsewhere entirely for Dutch insolvency. That is the normalisation problem.
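To make the divergence concrete, here is a minimal sketch of the bespoke logic each registry would need. It assumes the responses have already been fetched and parsed; the function names, the dictionary keys (status, valid_from, valid_to) and the rule that only a plain active status counts as "alive" are illustrative assumptions, not any registry's actual API or policy.

```python
from datetime import date
from typing import Optional


def uk_is_alive(company_status: str) -> bool:
    """UK: lifecycle and insolvency values share one status field."""
    # Assumption: only a plain "active" value counts as alive.
    return company_status == "active"


def nl_is_alive(listed_as_active: bool, in_insolvency_register: bool) -> bool:
    """NL: the trade register alone is not enough; a second lookup is needed."""
    return listed_as_active and not in_insolvency_register


def dk_current_status(history: list[dict], today: date) -> Optional[str]:
    """DK: status is a history, so find the entry whose period covers today."""
    for entry in history:
        starts = entry["valid_from"]
        ends = entry["valid_to"]  # None means the period is still open
        if starts <= today and (ends is None or today <= ends):
            return entry["status"]
    return None  # no entry covers today


def dk_is_alive(history: list[dict], today: date) -> bool:
    """DK: alive only if the status valid today is the 'Normal' value."""
    return dk_current_status(history, today) == "Normal"
```

Three functions, three different input shapes, and one of them needs data from a second register before it can answer at all.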

Why the LEI doesn't solve the problem

The Legal Entity Identifier gives a company a unique code and records its legal name, registered address, and entity status. That sounds useful. The problem is in how those fields are populated. 

LEI data is self-reported. A company, or its registration agent, files the information at the point of registration and updates it at annual renewal. Nothing pulls live from a company registry, so an entity that entered administration after its last renewal will still show as "Active" in the LEI record until someone files an update. In insolvency scenarios, that often doesn't happen promptly, if at all. 

Renewal is also optional in practice. According to former GLEIF CEO Stephan Wolf, global renewal rates sit at around 56% (2023 data), which means a little under half of all LEIs are potentially lapsed at any given time. A lapsed LEI is marked as such in the registration status field, but that requires a compliance team to check two separate status fields and understand that "Active" entity status and "Lapsed" registration status can coexist in the same record. 
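The two-field check can be sketched as below. The function name and return labels are hypothetical, and the status strings (ACTIVE, ISSUED, LAPSED) are assumed to follow GLEIF's published values; verify them against the current LEI data format before relying on them.

```python
def assess_lei_record(entity_status: str, registration_status: str) -> str:
    """Combine the two LEI status fields into one hedged reading.

    Assumption: a registration status other than "ISSUED" (for example
    "LAPSED") means the entity status may be out of date.
    """
    if registration_status != "ISSUED":
        return "stale"  # do not trust entity_status without a registry check
    if entity_status == "ACTIVE":
        return "active"
    return "inactive"
```

For example, assess_lei_record("ACTIVE", "LAPSED") returns "stale": the record says the entity is active, but nobody has confirmed that since the last renewal.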

Coverage compounds the problem. The LEI population sits at around 2.7 million entities globally, with low take-up among SMEs and in developing economies. Millions of the entities a global KYB workflow will encounter simply won't have one. 

The LEI is a useful spine for entity identification. But the flesh still must come from the registries: live-sourced, not self-reported.

The solution

The registries won't change for you. The fix has to sit between the raw data and your compliance workflow: a normalisation layer that maps incompatible inputs to a consistent output, flags what's missing rather than silently omitting it, and structures every field the same way regardless of where it came from. 

Here's how that works in practice.

Case Study: How Kyckr Does It 

When Kyckr queries a registry, it doesn't pass the raw response straight to your system. It translates it. 

Every status code gets mapped to plain English. For France, that means raw cessation and dissolution indicators from the RNE become one of three values: Active, Distressed, or Inactive. 

Where the registry holds more detail – a liquidator appointment date, a dissolution type, a cessation effective date – that sits in a separate status details field, available if you need it but out of the way if you don't.  

The same logic applies across the network, with each jurisdiction's native codes mapped to a normalised output that your compliance logic can act on without knowing anything about the underlying registry. 
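A normalisation layer of this kind could be sketched as a lookup table plus a details field. This is not Kyckr's actual schema or mapping: the names NormalisedStatus, STATUS_MAP and normalise_status are hypothetical, and which raw codes count as Distressed is an illustrative choice.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "Active"
    DISTRESSED = "Distressed"
    INACTIVE = "Inactive"


# Illustrative mapping only: (jurisdiction, raw registry code) -> normalised value.
STATUS_MAP = {
    ("GB", "active"): Status.ACTIVE,
    ("GB", "liquidation"): Status.DISTRESSED,
    ("GB", "dissolved"): Status.INACTIVE,
    ("DK", "Normal"): Status.ACTIVE,
    ("DK", "Under konkurs"): Status.DISTRESSED,
    ("DK", "Opløst efter konkurs"): Status.INACTIVE,
}


@dataclass(frozen=True)
class NormalisedStatus:
    status: Status
    # Extra registry detail (raw code, dates, dissolution type) lives here.
    details: dict = field(default_factory=dict)


def normalise_status(jurisdiction: str, raw_code: str, **details) -> NormalisedStatus:
    """Map a raw code to a normalised value, keeping the source detail."""
    try:
        status = STATUS_MAP[(jurisdiction, raw_code)]
    except KeyError:
        # Fail loudly on an unmapped code rather than guessing a status.
        raise ValueError(f"unmapped status {raw_code!r} for {jurisdiction}") from None
    return NormalisedStatus(status, {"raw_code": raw_code, **details})
```

Downstream logic branches on the three Status values only; the raw code travels with the result in details for anyone who needs it.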

The same principle applies to fields that simply don't exist. Dutch UBOs aren't held in the KVK. French UBOs aren't held in the RNE. Rather than returning nothing and leaving your system to guess whether that means the data is absent, unavailable, or simply unfiled, Kyckr surfaces the gap explicitly. You know what you have and what you don't. 
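One way to surface a gap explicitly is to attach an availability marker to each field instead of returning an empty value. The sketch below is an assumption about how that could look, not Kyckr's response format; Availability, FieldResult and ubo_field are hypothetical names, and treating an empty list as "not filed" is a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Availability(Enum):
    PROVIDED = "provided"
    NOT_HELD_BY_REGISTRY = "not_held_by_registry"
    NOT_FILED = "not_filed"


@dataclass(frozen=True)
class FieldResult:
    availability: Availability
    value: Optional[list[str]] = None


# From the text: UBOs are not held in the KVK (NL) or the RNE (FR).
UBO_NOT_IN_REGISTRY = {"NL", "FR"}


def ubo_field(jurisdiction: str, raw_owners: Optional[list[str]]) -> FieldResult:
    """Return the owners, or say explicitly why there are none."""
    if jurisdiction in UBO_NOT_IN_REGISTRY:
        return FieldResult(Availability.NOT_HELD_BY_REGISTRY)
    if not raw_owners:
        # Assumption: the registry holds UBO data but nothing was returned.
        return FieldResult(Availability.NOT_FILED)
    return FieldResult(Availability.PROVIDED, list(raw_owners))
```

A consumer can then tell "this registry never has UBOs" apart from "this company has filed none", which an empty list alone cannot express.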

Address data follows the same logic. French addresses, for example, decompose into street number, street type, commune, and postal code, each mapped to a consistent component structure in the API response. The same approach applies across jurisdictions, so addresses arrive in a predictable format regardless of how the underlying registry stores them. 
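As a rough sketch, address normalisation can be expressed as a per-registry field map onto one fixed set of components. The source keys below (numVoie, typeVoie, voie, commune, codePostal) are placeholders rather than confirmed RNE field names, and the component names, including the added street_name, are assumptions for illustration.

```python
from typing import Optional

# One component structure for every jurisdiction (hypothetical names).
COMPONENTS = ("street_number", "street_type", "street_name", "city", "postal_code")

# Placeholder source keys for a French registry record.
FR_FIELD_MAP = {
    "numVoie": "street_number",
    "typeVoie": "street_type",
    "voie": "street_name",
    "commune": "city",
    "codePostal": "postal_code",
}


def normalise_address(raw: dict, field_map: dict) -> dict[str, Optional[str]]:
    """Map registry-specific keys onto the common components.

    Every component is always present; missing or blank inputs become None.
    """
    out: dict[str, Optional[str]] = {name: None for name in COMPONENTS}
    for source_key, target in field_map.items():
        value = raw.get(source_key)
        text = "" if value is None else str(value).strip()
        if text:
            out[target] = text
    return out
```

Adding a jurisdiction then means adding a field map, not changing the shape that downstream systems consume.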

The result is that your integration talks to one schema, not three hundred.

A note on normalisation and audit trails

Some argue that vendor-side normalisation creates risk: if your compliance record reflects a vendor's interpretation of registry data rather than the raw source, you have an extra layer to defend in a regulatory review. 

It's a fair concern, and the answer is in how the normalisation is done. "Normalisation is genuinely hard with KYB data," explained Laura. "At Kyckr, our view is: surface both – that is, the source data and the normalised output – so downstream users have full transparency into what was transformed and why." 

When Kyckr maps a French cessation code to "Inactive", the underlying code, the dissolution type, and the effective date sit in a separate status details field in the same response. The normalised value is what your compliance logic acts on. The source detail is what you show a regulator. 

The alternative isn't neutrality. Raw registry data still must be interpreted by someone. The choice is between an interpretation that happens once, consistently, by people who have mapped that registry in detail, and an interpretation that happens every time a compliance analyst encounters an unfamiliar status code. The first approach is more defensible, not less.

Build vs buy: the real cost of registry integrations

Building a direct integration with a single company registry is a substantial project. Access to many registries requires signed data agreements, background checks, or approval from a government department, a process measured in weeks or months before a line of code is written. 

Once access is established, the technical work begins: some registries offer modern APIs, others operate SOAP endpoints or require VPN connections, and some can only be accessed by navigating websites built for humans, not machines. Then the integration goes live, and the maintenance starts. Registries change without notice – a renamed field, an authentication update, a deprecated endpoint – and each change can silently break a production system. 
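A common defence against silent breakage is to check each response for the fields the integration depends on and fail loudly when one disappears. The sketch below assumes a parsed JSON payload; SchemaDriftError, check_expected_fields and the field names in the usage note are hypothetical.

```python
class SchemaDriftError(Exception):
    """Raised when a registry response no longer has the expected shape."""


def check_expected_fields(payload: dict, expected: set[str]) -> None:
    """Raise if any field the integration relies on is missing."""
    missing = expected - set(payload)
    if missing:
        names = ", ".join(sorted(missing))
        raise SchemaDriftError(f"registry response is missing: {names}")
```

Calling check_expected_fields(response, {"company_number", "company_status"}) before parsing turns a renamed field into an immediate, visible error instead of a quietly empty record. It catches only missing keys, not changes in meaning or format, and it still has to be written and maintained for each registry.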

That is the reality for one registry. Kyckr connects to more than 300. 

The question for a compliance team is not whether this is buildable. It is whether building and maintaining registry infrastructure is the product you are trying to build. For most organisations, it isn't. The compliance workflow is the product. The registry connections are the infrastructure beneath it. 

Kyckr's network has been built and maintained for nearly 20 years. The API abstracts that entirely, so your integration stays consistent regardless of what is happening at the registry level. 
