Identification: Candidates for a formal data migration process typically exhibit one or more of the following characteristics:
Complex data with multiple relationships moving between two different systems or differing data schemas.
Non-backwards compatible version change in the data schema.
Sheer volume of data requires special handling; anything over five thousand (5,000) records typically requires batching or streaming to avoid import deadlocks and to keep the process repeatable and scalable.
Data requires validation, filtering, or transformation.
Mission or business-critical data.
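The batching characteristic above can be sketched in a few lines. This is a minimal, hypothetical example: the 5,000-record batch size and the integer record source are illustrative, and a real migration would commit each batch to the target system.

```python
# Sketch of batching a record stream so each chunk stays under a
# 5,000-record threshold; the threshold is the article's rule of thumb.
from itertools import islice

def batched(records, batch_size=5000):
    """Yield lists of at most batch_size records from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Each batch can be loaded and committed independently, so a failed
# batch can be retried without restarting the whole migration.
batches = list(batched(range(12000)))  # 5000 + 5000 + 2000 records
```

Because each batch is an independent unit of work, this pattern also supports the repeatability and restartability discussed later in the article.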
Audience: Data migrations typically require coordination with multiple audiences. Know each audience and its unique concerns and responsibilities. For example, distinguishing data owners from the system integrators who consume the data is key: they are often different audiences with different concerns. Educate each audience on how its data is managed before, during, and after the migration, even if nothing changes.
Compliance: Ensure the systems, transport methods, and environments are accredited to hold a variety of data. Verify the data contains the proper security and compliance tags. Policies and procedures change over time, and data migrations are an excellent opportunity to return to compliance.
Data Quality: Artificial Intelligence (AI) and Machine Learning (ML) require high-quality data for optimal results. Data migrations are a perfect time to improve or validate the quality of data:
Validate data against existing schemas or requirements to ensure integrity and improve quality.
Identify and remove duplicate records.
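The two quality steps above, validation and de-duplication, can be sketched as follows. The required fields and the `id` de-duplication key are assumptions for illustration; substitute your own schema.

```python
# Hypothetical sketch: validate records against a simple schema and
# drop duplicates by a key field before migrating.
REQUIRED_FIELDS = {"id", "name", "email"}  # assumed schema

def validate(record):
    """Return True if the record has all required, non-empty fields."""
    return all(record.get(f) for f in REQUIRED_FIELDS)

def deduplicate(records, key="id"):
    """Keep the first record seen for each key value."""
    seen = set()
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            yield r

rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 1, "name": "Ada", "email": "ada@example.com"},  # duplicate
    {"id": 2, "name": "", "email": "x@example.com"},       # fails validation
]
clean = [r for r in deduplicate(rows) if validate(r)]
```

Running validation after de-duplication (or before, depending on which is cheaper) gives a measurable quality gate: records rejected here are exactly the ones that would have polluted the target system.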
This is a great time to create and enforce a data retention policy. Data retention policies can start simple and mature continuously over time. They protect the organization from risks incurred by holding sensitive data past its useful life. A side benefit is potentially less data to migrate, and the oldest data is often the lowest quality and least compliant.
Increase the quality and usability of your data by adding names and version numbers. The industry standard uses both Major and Minor versions: Major version changes are typically not backwards compatible, whereas Minor version changes are.
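The Major/Minor convention above implies a simple compatibility rule that can be checked in code. This is an illustrative sketch, not a full semantic-versioning implementation (it ignores patch versions and pre-release tags).

```python
# Sketch of the Major/Minor compatibility rule: a Major bump is
# treated as breaking, a Minor bump is not.
def parse_version(version):
    """Return (major, minor) from a 'Major.Minor[...]' string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def is_backwards_compatible(old, new):
    """Data at version `new` can be read by consumers of `old` only if
    the Major version matches and the Minor version did not decrease."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    return new_major == old_major and new_minor >= old_minor

# 1.2 -> 1.3 is a Minor bump (compatible); 1.3 -> 2.0 is a Major bump.
```

Tagging every migrated data set with such a version makes it trivial for downstream consumers to detect when they must upgrade.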
Tip: Encode special characters during extraction so they survive both transit and import into the receiving system.
Tip: Watch for field length differences between systems or endpoints.
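Both tips above can be applied in a single field-preparation step. This sketch assumes an HTML/XML-style transport (hence `html.escape`) and a hypothetical 255-character target limit; substitute the encoding and limits your systems actually use.

```python
# Sketch of the two tips: escape special characters for transit and
# flag values that exceed the target system's field length.
import html

# Assumed target-schema limits; check your destination system.
TARGET_FIELD_LENGTHS = {"description": 255}

def prepare_field(name, value):
    """Encode a field for transit and enforce the target length."""
    encoded = html.escape(value)  # e.g. '&' becomes '&amp;'
    limit = TARGET_FIELD_LENGTHS.get(name)
    if limit is not None and len(encoded) > limit:
        raise ValueError(f"{name} exceeds target length {limit}")
    return encoded
```

Note that encoding can lengthen a value (one `&` becomes five characters), so the length check must run after encoding, not before.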
Concepts:
Make data migration a repeatable, restartable, reusable process. Keep the code and/or scripts in a revision-controlled environment. Seek opportunities to incorporate the data migration process into a repeatable test pattern between environments to identify the inevitable unknowns.
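One common way to make a migration restartable is a persisted checkpoint: record the last successfully migrated ID so a rerun resumes where it stopped. The sketch below is a hypothetical pattern; the checkpoint file name and integer record IDs are assumptions, and a production version would store the checkpoint transactionally alongside the data.

```python
# Sketch of a restartable migration using a JSON checkpoint file.
import json
import os

CHECKPOINT = "migration_checkpoint.json"  # illustrative path

def load_checkpoint():
    """Return the last migrated ID, or 0 on a fresh run."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_id"]
    return 0

def save_checkpoint(last_id):
    with open(CHECKPOINT, "w") as f:
        json.dump({"last_id": last_id}, f)

def migrate(record_ids):
    """Migrate records in ID order, skipping already-migrated ones."""
    start = load_checkpoint()
    for record_id in record_ids:
        if record_id <= start:
            continue  # already migrated on a previous run
        # ... extract, transform, and load the record here ...
        save_checkpoint(record_id)
```

Because reruns skip completed work, the same script can be executed repeatedly across environments, which is exactly the repeatable test pattern described above.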
Always have a data backup and a backup plan: keep backups of both the new and old systems throughout the process, and have a fallback plan in case things go wrong.
Test the migration, track, and log metrics for each data set. Do this between systems before, during, and after the migration. Automatically compare them and set alerts for differences.
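The automated comparison described above can be as simple as collecting a row count and an order-independent checksum per data set from both systems, then alerting on any difference. This is an illustrative sketch; the row dictionaries stand in for real source and target queries.

```python
# Sketch of per-dataset metrics and an automated source/target check.
import hashlib

def dataset_metrics(rows):
    """Return (row count, order-independent checksum) for a dataset."""
    count = 0
    digest = 0
    for row in rows:
        count += 1
        h = hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        digest ^= int(h, 16)  # XOR makes the checksum order-independent
    return count, digest

def compare(source_rows, target_rows):
    """Raise (i.e. alert) if source and target metrics differ."""
    src = dataset_metrics(source_rows)
    tgt = dataset_metrics(target_rows)
    if src != tgt:
        raise RuntimeError(
            f"Migration mismatch: source={src[0]} rows, target={tgt[0]} rows"
        )
    return True
```

Logging these metrics before, during, and after the migration gives a concrete audit trail, and wiring `compare` into an alerting system turns silent data loss into an immediate, visible failure.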
Require API keys in extremely large organizations so you can automatically identify and communicate with the teams hitting your system.
Unexpected: Hunt for the unexpected. It is there and you must be proactive to find it.
Build extra time into your schedule for the unknown, and support this buffer by documenting risks and assumptions.
When external dependencies exist, ensure there is a plan that allows them to migrate on a separate timeline. Proactively communicate while helping them migrate. This includes a temporary backwards compatibility plan.
Data migrations may be complex, but they are achievable. Follow our lessons learned and enact a solid plan to achieve success. For highly complex migrations, we have developed a Data Migration Framework to help you create a custom data migration application. In our next article, we will delve into the core concepts of this framework and how it creates success.
Stay tuned!