How can you prevent failure when migrating your data? How can you minimise cost while ensuring rapid delivery?
Follow this seven-step process.
Step 1: Source system exploration
The first phase of a data migration project is to identify and explore the source data. The most appropriate route for identification is to group data, such as customer names, addresses and product descriptions, based on the target model.
Although the source systems may contain thousands of fields, some might be duplicates or not be applicable to the target system. In this stage, it is critical to identify which data is required and where it is, as well as what data is redundant and not required for the migration.
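As a minimal sketch of this sorting exercise, the Python below compares a field inventory for each source system against the fields the target model needs. The system names, field names and the classify_fields helper are all hypothetical; in practice the inventories would come from schema metadata rather than hard-coded sets.

```python
# Hypothetical field inventories for two source systems.
source_fields = {
    "crm": {"cust_name", "cust_addr", "fax_no", "legacy_flag"},
    "billing": {"cust_name", "cust_addr", "invoice_total"},
}
# Hypothetical list of fields the target model requires.
target_fields = {"cust_name", "cust_addr", "invoice_total"}


def classify_fields(source_fields, target_fields):
    """Split source fields into required, redundant and duplicated groups."""
    seen = {}  # field name -> list of systems that hold it
    for system, fields in source_fields.items():
        for field in fields:
            seen.setdefault(field, []).append(system)
    # Required: needed by the target, with the systems where it lives.
    required = {f: sorted(s) for f, s in seen.items() if f in target_fields}
    # Redundant: present in a source but not needed by the target.
    redundant = sorted(f for f in seen if f not in target_fields)
    # Duplicated: required fields held by more than one system.
    duplicated = sorted(f for f, s in required.items() if len(s) > 1)
    return required, redundant, duplicated


required, redundant, duplicated = classify_fields(source_fields, target_fields)
print("required:", required)
print("redundant:", redundant)
print("duplicated:", duplicated)
```

The duplicated list is worth keeping: it marks the fields where you will later have to decide which system is authoritative.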
Conversely, if the initially identified sources do not contain all of the data required for the target model, a gap is identified. In this case, you may have to consolidate data from multiple sources to create a record with the correct set of data to fulfill the requirements of the target.
Using multiple data sources allows you to add another element of data validation and a level of confidence in your data.
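The gap check and consolidation described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: find_gaps and consolidate are hypothetical helpers, the records are invented, and "first non-empty value wins" is only one possible survivorship rule.

```python
def find_gaps(target_fields, available_fields):
    """Return the target fields that no identified source provides."""
    return sorted(set(target_fields) - set(available_fields))


def consolidate(records, target_fields):
    """Merge per-source records for one entity into a single target record.

    records maps a source name to that source's field values. The first
    non-empty value wins (an assumed rule); fields on which the sources
    disagree are returned separately so they can be reviewed.
    """
    merged, conflicts = {}, {}
    for field in target_fields:
        values = {
            source: record[field]
            for source, record in records.items()
            if record.get(field) not in (None, "")
        }
        if not values:
            continue  # still a gap for this entity
        merged[field] = next(iter(values.values()))
        if len(set(values.values())) > 1:
            conflicts[field] = values
    return merged, conflicts


# Hypothetical records for the same customer in two systems.
records = {
    "crm": {"name": "Ada Lovelace", "postcode": "N1 9GU"},
    "billing": {"name": "Ada Lovelace", "email": "ada@example.com",
                "postcode": "N1 9XX"},
}
target = ["name", "email", "postcode", "phone"]

gaps = find_gaps(target, {"name", "email", "postcode"})
merged, conflicts = consolidate(records, target)
print("gaps:", gaps)
print("merged:", merged)
print("conflicts:", conflicts)
```

Here the matching names in both systems raise confidence in that value, while the differing postcodes are flagged instead of being silently overwritten.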
At the end of this phase, you will have identified the source data that will populate the target model. You will also have identified any gaps in the data and, if possible, included extra sources to compensate. Optimally, you will have broken down the data into categories that enable you to work on manageable and possibly parallel tasks.
Step 2: Data assessment
The next logical phase is to assess the quality of this source data. If the new system fails due to inconsistent, incorrect or duplicate data, there is very limited value in migrating data to the target system.
To assess the data, I recommend profiling it.
Data profiling is the process of systematically scanning and analyzing the contents of all the columns in tables of interest. Profiling identifies data defects at the table and column level. Data profiling is integral to the process of evaluating the conformity of the data and ensuring compliance with the requirements of the target system.
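To make column-level profiling concrete, here is a small sketch in plain Python. It assumes the table has already been read into a list of dictionaries, and the statistics chosen (null count, distinct count, repeated values, length range) are only an illustrative subset of what a profiling tool reports. The function names and sample rows are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarise one column: volume, missing values, repeats, length range."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    lengths = [len(str(v)) for v in present]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "duplicates": len(present) - len(counts),
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
    }


def profile_table(rows):
    """Profile every column that appears in any row."""
    columns = sorted({column for row in rows for column in row})
    return {column: profile_column(rows, column) for column in columns}


# Hypothetical sample of a customer table.
rows = [
    {"customer_id": "C1", "postcode": "N1 9GU"},
    {"customer_id": "C2", "postcode": ""},
    {"customer_id": "C2", "postcode": "SW1A 1AA"},
]
report = profile_table(rows)
for column, stats in report.items():
    print(column, stats)
```

In this sample the profile shows a repeated customer_id and a missing postcode, the kind of defects that are far cheaper to catch before the migration is built.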
The profiling functions include examining the actual record value and its metadata information. Too many data migration initiatives begin without first examining the quality levels of the source data. By including data profiling early in the migration process, the risks of project overruns, delays and potentially complete failures are reduced.
Through the use of data profiling, you can:
- Immediately identify whether the data will fit the business purpose.
- Accurately plan the integration strategy by identifying data anomalies up front.
- Successfully integrate the source data using an automated data quality process.
The output of this phase of the project is a thorough understanding of the data quality in the source systems, identification of data issues and a list of defined rules to be built to correct them. You will have identified and defined your data quality rules and mappings from the sources to the target model.
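One way to record these outputs is as data: a mapping from each target field to its source field and cleansing rule, plus a validation rule for each target field. The sketch below assumes invented field names and deliberately simple rules (the e-mail pattern is a rough shape check, not a full validator); MAPPINGS, RULES and apply_rules are hypothetical names.

```python
import re

# Hypothetical mapping: target field -> (source field, cleansing rule).
MAPPINGS = {
    "name": ("cust_name", lambda v: " ".join(v.split()).title()),
    "email": ("email_addr", lambda v: v.strip().lower()),
}

# Hypothetical validation rules, applied after cleansing.
RULES = {
    "name": lambda v: bool(v),
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def apply_rules(source_record):
    """Map a source record to the target model and list fields that fail."""
    target, failures = {}, []
    for target_field, (source_field, cleanse) in MAPPINGS.items():
        value = cleanse(source_record.get(source_field) or "")
        target[target_field] = value
        if not RULES[target_field](value):
            failures.append(target_field)
    return target, failures


good = apply_rules({"cust_name": "  ada   lovelace ",
                    "email_addr": "Ada@Example.COM "})
bad = apply_rules({"cust_name": "Bob", "email_addr": "not-an-email"})
print(good)
print(bad)
```

Keeping rules and mappings in one declarative structure makes them easy to review with the business before any integration process is built.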
At this point you will also have a good idea, at a high level, of the design of the integration processes.