Of all the components of a modern data architecture, Master Data Management (MDM) is probably the one shrouded in the most mysticism. As I hope this post will show, the concepts behind MDM are not really that complicated, but the art is in its application.
Steve Crosson Smith
Sep 04, 2019

The dream whereby all activities within a company, and hence all of its data, are centralised, governed and controlled by a single system has not, in the main, come to fruition. ERP vendors have been trying to achieve this for over 30 years, but rarely do I come across a company completely orientated around any one of them. It is not unusual to find one or more accounting, CRM, order management, warehouse management, product lifecycle management, logistics, HR, customer support management and a variety of other systems within mid to large-sized companies, along with a plethora of reference data held in spreadsheets. Generally speaking, each has its own set of IDs and references with no over-arching means of relating entities across systems. Gaining a centralised view and applying common rules and conventions across such a variety of systems is a daunting, if not impractical, task.

The rise of cloud-based Software as a Service and white-label offerings is also likely to exacerbate the situation.

It is the goal of most companies to have a single view of the various domains within their organisation (e.g. Customer, Supplier, Product, Employee) and, ideally, of the inter-relationships between them. External reference data is also often required to provide context. How on earth can this single, harmonised view be achieved when the parts of the jigsaw reside in multiple systems with different references and dozens of spreadsheets?

Let’s consider some of the challenges:

  • Different IDs within each system mean that data can’t easily be related, if at all
  • Inconsistency of entity descriptions (e.g. customer names, product names, supplier names etc.)
  • Siloed and ungoverned reference data (often in Excel spreadsheets – how do you know you are using the correct version?)
  • Poor and inconsistent data quality
  • Duplicated entities
  • Lack of consistent hierarchies
  • Managing who can create, read, update and delete key entities centrally and robustly
  • Being able to find key information across multiple systems.

Master Data Management seeks to address these issues, and many others besides.

For the sake of clarity, I will put forward the following definitions for the purposes of this article:

Master Data – data which originates from within your organisation or operations

Reference Data – data which provides context to your master data and may originate from third-party or internal sources.

Some MDM systems, such as TIBCO’s EBX, can handle master data, reference data and metadata (a topic for another day), but most only handle one of these.

Main Functions of an MDM System

[Diagram: MDM Overview]

The diagram summarises the main functions of an MDM system, which are:

  • Master and Reference data is loaded, or entered, into the system (and updates are loaded on an ongoing basis), usually, but not always, with an ETL or other integration tool.
  • Data cleansing may be performed by either the integration or MDM tool.
  • Master data is fuzzy matched against existing records to detect suspected duplicates.
  • An automated merge or insert process is usually executed where the degree of confidence is high, otherwise exception records are routed to Data Stewards for action.
  • Key attributes of the data are modelled within the MDM system (e.g. data types, IDs, field names, relationships, validation, constraints etc.), independently of the source systems, and a master ID is generated by it which can be used to tie the individual source system record IDs together.
  • Descriptive elements are added by users to make it more accessible to consumers.
  • The data is stored in a centralised MDM repository.
  • Create Read Update Delete (CRUD) permissions are enforced.
  • The data is made available (published), along with descriptive elements, to users and external systems via a graphical user interface and APIs.
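The matching, merging and master ID steps above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not how any particular MDM product works: the class and function names, the two thresholds and the "M000001" ID format are all hypothetical, and the standard library's difflib stands in for the far more sophisticated matching engines found in real tools.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import count

# Hypothetical thresholds; real MDM tools make these configurable.
AUTO_MERGE = 0.90  # at or above this score: merge automatically
REVIEW = 0.70      # from here up to AUTO_MERGE: route to a Data Steward


@dataclass
class MasterRecord:
    master_id: str
    name: str
    # Maps source system name -> that system's own ID for this entity.
    source_ids: dict = field(default_factory=dict)


def similarity(a: str, b: str) -> float:
    """Crude fuzzy score in [0, 1] after basic cleansing (trim, lower-case)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


class MdmRepository:
    """Toy in-memory stand-in for the centralised MDM repository."""

    def __init__(self):
        self.records = []        # golden records
        self.steward_queue = []  # suspected duplicates awaiting a decision
        self._next = count(1)

    def load(self, system: str, source_id: str, name: str) -> str:
        """Match an incoming record, then merge, queue for review, or insert."""
        best = max(self.records, key=lambda r: similarity(r.name, name), default=None)
        score = similarity(best.name, name) if best else 0.0
        if best and score >= AUTO_MERGE:
            # High confidence: tie the source system ID to the existing master ID.
            best.source_ids[system] = source_id
            return "merged"
        if best and score >= REVIEW:
            # Suspected duplicate: leave the decision to a Data Steward.
            self.steward_queue.append((system, source_id, name, best.master_id))
            return "review"
        # No plausible match: create a new master record with a new master ID.
        record = MasterRecord(f"M{next(self._next):06d}", name, {system: source_id})
        self.records.append(record)
        return "inserted"


repo = MdmRepository()
print(repo.load("crm", "C-17", "Acme Ltd"))         # inserted
print(repo.load("erp", "900042", "ACME Ltd "))      # merged
print(repo.load("billing", "B-3", "Acme Limited"))  # review
print(repo.records[0].source_ids)                   # {'crm': 'C-17', 'erp': '900042'}
```

The point of the sketch is the shape of the process rather than the scoring: one master ID ends up tied to the IDs used by each source system, and anything the system is unsure about goes to a person rather than being merged silently.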

For many organisations, the MDM system is the only one in which new master data may be created and these new records, having been validated, are then made available to all of the relevant systems around the enterprise.

Similarly, MDM is often used as the sole repository for reference data, with other systems either using it via real-time API calls or being updated on a regular basis from the MDM system.

Key Roles Within an MDM System

The key roles that should be present in any MDM rollout are as follows:

Data Owners – the final arbiters relating to the use and verification of the data sources for which they are responsible. These are usually senior business representatives of the function generating or procuring the data.

Data Stewards – responsible for managing the data on a day-to-day basis. They would refer to Data Owners should decisions need escalating.

Data Administrators – responsible for the more technical aspects of the system, e.g. creating and maintaining the model.

Consumers – the people or systems that require access to the published data for their day-to-day business activities.
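To show how these roles might relate to the Create Read Update Delete permissions mentioned earlier, here is a small Python sketch. The role names follow the list above, but the particular permissions given to each role are purely my illustrative assumption; every organisation will decide its own, and real MDM tools typically apply permissions at a much finer grain (per domain, per record, even per attribute).

```python
from enum import Flag, auto


class Op(Flag):
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()


# Illustrative assignments only (an assumption, not a recommendation).
ROLE_PERMISSIONS = {
    "data_owner": Op.CREATE | Op.READ | Op.UPDATE | Op.DELETE,
    "data_steward": Op.CREATE | Op.READ | Op.UPDATE,
    "data_administrator": Op.READ,  # manages the model rather than the data
    "consumer": Op.READ,
}


def is_allowed(role: str, op: Op) -> bool:
    """Return True if the role may perform the operation; unknown roles get nothing."""
    return op in ROLE_PERMISSIONS.get(role, Op(0))


print(is_allowed("data_steward", Op.UPDATE))  # True
print(is_allowed("consumer", Op.DELETE))      # False
```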

Adopting and Utilising an MDM System

MDM permits a single, holistic view of the data distributed around your organisation and applies governance to its creation, management and utilisation. It provides a system-independent, graphical mechanism for modelling the key entities in your business and a means of defining relationships between them as well as corporate standards for the attributes that define them.

As with most transformational projects, people and processes are even more important than technology. This is especially true of MDM, as it is only of value if it is adopted as the central governed source of designated data. For this reason, senior sponsorship, preferably at board level, is required.

How an MDM system is integrated into the wider corporate data architecture varies according to the scenario and there is not sufficient space here to elaborate (perhaps the subject of another blog) other than to say that it is commonly used in several ways:

  • As a manual reference source via the provided user interfaces
  • To update host systems when changes in master or reference data occur, usually via integration software
  • Real-time access via, for example, API calls
  • To act as a data source within a larger BI environment via ODBC, JDBC etc. Data Virtualization (see previous blog) is ideal for this scenario.
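As a rough illustration of the second pattern in the list above (updating host systems when master data changes), the following Python sketch uses a simple publish and subscribe arrangement. The ChangePublisher class and its method names are hypothetical; in practice this job is usually done by integration software rather than hand-written code.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class ChangePublisher:
    """Hypothetical stand-in for the integration layer between MDM and host systems."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, domain: str, handler: Handler) -> None:
        """Register a host system's handler for changes in one domain."""
        self._subscribers[domain].append(handler)

    def publish(self, domain: str, master_id: str, attributes: dict) -> int:
        """Notify every subscriber for the domain; return how many were notified."""
        handlers = self._subscribers[domain]
        for handler in handlers:
            handler(master_id, attributes)
        return len(handlers)


# Two pretend host systems, each keeping its own local copy of customer data.
crm_copy, billing_copy = {}, {}


def update_crm(master_id: str, attributes: dict) -> None:
    crm_copy[master_id] = attributes


def update_billing(master_id: str, attributes: dict) -> None:
    billing_copy[master_id] = attributes


publisher = ChangePublisher()
publisher.subscribe("customer", update_crm)
publisher.subscribe("customer", update_billing)
print(publisher.publish("customer", "M000001", {"name": "Acme Ltd"}))  # 2
```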

In summary, MDM is an extremely powerful tool for complex companies wishing to become data-driven enterprises. By no means do all companies require it; however, where it is applicable, it solves problems that no other component of the data architecture can address. We would be happy to discuss further if this post rings true with your organisation.