Data Profiling and MDM

Data Profiling, November 2008, Trillium


Deb Cobb

Trillium Software

Product Marketing


Both Gartner and Forrester Research say that fewer companies are planning large IT initiatives in 2009 due to the “climate of uncertainty” – corporate-speak that acknowledges early that revenues will be down since everyone has less money to spend.


But a tumultuous business climate can reveal interesting trends. For example, the global economic downturn has prompted many organizations to consider how the caliber of the data that populates MDM solutions affects the soundness of their high-cost investment. Better data quality means better information available for user consumption. Current market dynamics confirm that data profiling has emerged as a key activity in MDM readiness and, for that matter, in the readiness of any data integration initiative, such as CRM and CDI. Profiling is essential not only for the usual technical reasons but also for the people associated with the MDM project. There are several reasons for this.


First, it’s always important to understand what you have so you can evaluate the feasibility of attaining your goal. In data integration projects, there are numerous source data systems containing data that needs to be “assessed for use” so stakeholders can understand how well the data will support project goals. Assessing the condition of data in these source systems is a critical first step to knowing what data is available, its structure, its level of accuracy and completeness, and its consistency, because all these factors affect the scoping phase of the project. For example, if you decide to include product data in your master data view, you need to know what product data is captured in your source systems. Because profiling reveals known data issues, it helps users understand exactly what the high-level problems are and what the data looks like by revealing anomalies, input errors, duplicate values, and the level of record completeness. Profiling provides insight into the level of work required to make the data consistent and migrate it to the target MDM solution, and it lets the project manager assign appropriate resources to remediate data issues and improve data quality.
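To make the idea concrete, here is a minimal sketch of a column-level profile that reports completeness, distinct values, and duplicates. The function name `profile_column`, the field names, and the sample product records are all hypothetical illustrations, not part of any particular profiling product.

```python
from collections import Counter

def profile_column(records, column):
    """Summarize completeness, distinct count, and duplicate values for one column."""
    values = [r.get(column) for r in records]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(counts),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical product records from a single source system.
products = [
    {"sku": "A-100", "name": "Widget"},
    {"sku": "A-100", "name": "Widget (blue)"},
    {"sku": "B-200", "name": ""},
    {"sku": None, "name": "Gasket"},
]

sku_profile = profile_column(products, "sku")
name_profile = profile_column(products, "name")
```

Even a summary this small tells a project manager something useful for scoping: one SKU is missing, and one SKU is shared by two records that may or may not describe the same product.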


Profiling can confirm suspected data issues, such as an application that is not aligned to a business process or the use of different standards across different systems. These issues can have a huge impact on project milestones.


Automated profiling is even more powerful in that it evaluates the data in ways that user-generated queries may not have anticipated. Automated, out-of-the-box profiling can reveal unexpected data issues that users never considered. These are usually the more in-depth quality issues that dramatically increase risk, such as problems related to using data in ways it was never used before. For example, meter readers using handhelds at a large Midwestern utility indicated danger by typing “LDIY” (“Large Dog in Yard”) into the address field of their meter-reading application. This acronym served as a safety feature and was understood by new meter readers. When data was migrated from the meter-reading application to the customer contact system, however, LDIY was the subject of several discussions before its meaning was deciphered.
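One way an automated check might surface a value like LDIY without anyone writing a query for it is pattern frequency analysis: reduce each value to its "shape" and flag the shapes that are rare. The sketch below is an assumption about how such a check could work, not a description of any vendor's implementation; the helper names, the sample addresses, and the 20 percent threshold are invented for illustration.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to a pattern: each run of digits becomes 9, each run of letters becomes A."""
    value = re.sub(r"\d+", "9", value)
    return re.sub(r"[A-Za-z]+", "A", value)

def rare_shapes(values, max_share=0.2):
    """Return the values whose pattern accounts for at most max_share of all values."""
    counts = Counter(shape(v) for v in values)
    total = len(values)
    return [v for v in values if counts[shape(v)] / total <= max_share]

# Hypothetical contents of an address field.
addresses = ["12 Oak St", "450 Maple Ave", "7 Elm Rd", "LDIY", "88 Pine Ln", "3 Birch Ct"]

outliers = rare_shapes(addresses)
```

Here every street address reduces to the same pattern, so the lone entry that does not look like an address stands out for a person to investigate.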


The point here is that data is not only a strategic asset, it’s a shared, enterprise asset – shared across systems and applications. Organizations need to understand current data condition to determine if it will support defined master data goals and to assess its impact on the downstream applications that will consume it.


For many companies, an MDM solution will deliver the definitive, 360-degree master view of data. In this regard, the objective of any MDM effort is to deliver trusted enterprise data. For that to happen, companies need to delve deep into existing processes to ascertain what people, systems, and applications access and/or consume the data. This is where data governance comes in. Data governance is a business strategy based on a best-fit process to optimize data value over its useful life. Data governance practices vary across companies based on their level of maturity, business priorities, MDM goals, and internal competencies. But all data governance strategies should involve a best-fit combination of processes, policies, and standards that improves data accuracy and maintains it over time. Companies considering MDM need to define a data governance approach that protects their investment.


Data assessment is an integral part of data governance. Companies need to understand how business process affects data condition at specific points along the data quality life cycle. Effective data governance strategies carefully manage the delicate interaction of various roles and responsibilities and deliver a roadmap that details how employees will collaborate, identify and remediate data quality issues, and maintain data consistency over time. Because MDM project teams have multiple roles, each requiring different skill sets, it may be helpful to understand how these roles interact with data during a typical MDM project.


Prior to project kick-off, Business Stewards are charged with assessing data condition, information required for project scoping, planning, and risk assessment. They use this insight to determine if data will support project business requirements and make decisions about task duration given the now known data issues. This information also helps them allocate resources and budget to ensure project success. 


Because business users understand how data is created, how it is used, how its nuances affect business requirements, and how it needs to be presented, business community input is critical in all MDM initiatives. Business insight and feedback add significant value early in the project by informing project design, needs, and priorities.


During project design phases, Business Stewards perform more detailed data assessment that includes source investigation.  This knowledge influences detailed design and process recommendations, because users focus on data value accuracy, validity, and consistency.


IT Stewards perform technical profiling during project design phases, focusing on data structure, schemas, relationships, and transformation requirements. Technologists tend to focus on data flow, models, and formats.


Data Stewards and Governance Teams play a role in MDM and the day-to-day activities that measure and monitor the data over time once the MDM solution is in production. They consider what the data quality standards should be and identify impacts on downstream systems. They want to know whether the data is degrading over time, and they develop appropriate metrics to measure overall data “health.”
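As a rough illustration of monitoring data “health,” the sketch below compares the latest quality snapshot against the first one and reports the metrics that have slipped. The metric names, the snapshot values, and the two-point tolerance are assumptions chosen for the example; a real governance team would set its own.

```python
def degraded(history, tolerance=0.02):
    """Return the metrics whose latest value fell more than `tolerance` below the first snapshot."""
    baseline, latest = history[0], history[-1]
    return sorted(m for m in baseline if baseline[m] - latest.get(m, 0.0) > tolerance)

# Hypothetical quality snapshots, oldest first (values are fractions of records).
history = [
    {"completeness": 0.98, "validity": 0.95},
    {"completeness": 0.97, "validity": 0.93},
    {"completeness": 0.97, "validity": 0.90},
]

slipping = degraded(history)
```

In this made-up history, completeness stays within tolerance while validity drops five points, which is the kind of signal a steward would raise with the owners of the upstream process.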


Although Executive Management never wants to be involved in daily details, they often are interested in MDM results and performance metrics so they can evaluate how to align company resources with identified issues.  They want to know which issues are priorities and which may require process changes in a different part of the business.


Profiling not only supports all the aforementioned roles and responsibilities, but it promotes the sharing of data quality information between business and IT functions in support of data governance. For business users and business data stewards who understand the business use of data, profiling provides contextual data quality information, such as whether relationships within the data hold true and whether business defined data rules are supported by the data. Not only does this provide greater depth than profiling alone, but for data governance and MDM initiatives, it provides critical insight into how well the data can support current and planned business initiatives.
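To show what checking business-defined data rules against the data might look like, here is a small sketch. The two rules, the field names, and the order records are hypothetical; they stand in for whatever rules the business stewards actually define.

```python
from datetime import date

# Hypothetical business rules: each takes a record and returns True when the record complies.
RULES = {
    "ship_on_or_after_order": lambda r: (
        date.fromisoformat(r["ship_date"]) >= date.fromisoformat(r["order_date"])
    ),
    "us_postal_code_is_5_digits": lambda r: (
        r["country"] != "US"
        or (len(r["postal_code"]) == 5 and r["postal_code"].isdigit())
    ),
}

def check_rules(records, rules):
    """Return, for each rule, the ids of the records that violate it."""
    return {
        name: [r["id"] for r in records if not rule(r)]
        for name, rule in rules.items()
    }

# Hypothetical order records.
orders = [
    {"id": 1, "order_date": "2008-11-03", "ship_date": "2008-11-05", "country": "US", "postal_code": "01821"},
    {"id": 2, "order_date": "2008-11-10", "ship_date": "2008-11-08", "country": "US", "postal_code": "1821"},
    {"id": 3, "order_date": "2008-11-12", "ship_date": "2008-11-12", "country": "CA", "postal_code": "K1A 0B1"},
]

violations = check_rules(orders, RULES)
```

Keeping the rules in a named table means business users can read and review them, while IT can run them against any source system.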


Clearly, every functional area outlined above benefits from using a unified platform that automates many profiling tasks, delivers a consistent data view, promotes collaboration around data quality, enables communication to remediate identified data issues, enforces corporate policies and standards, and monitors data quality trends and condition over time.  This is the role of technology in the grand scheme of profiling and data governance—the ability to “bring it all together” seamlessly and transparently.





admin @ January 5, 2009
