What is test data management?

"When looking into various test data management solutions, we found that Datamaker was very useful for getting the right kind of data for testing and development. This was incredibly important for us and one of the main factors in our tool selection. Yes, we needed data for testing and development, but we also needed many data scenarios. Datamaker gave us a small amount of data, but a rich spread of data."

Jochen Westheide,
The ARAG Group

Integrated test data management solutions and best practices

With testing procedures accounting for as much as 60% of the application lifecycle, test data management has never been more important. The term "test data management" suggests we need to find methods to manage, edit and enhance test data, not just test scenarios. Test data management does not deal with specific test cases, but rather the data going into specific test cases for specific test scenarios.

Most organizations find that the hardest part of testing is finding the initial test data to use for these cases. As a manual process, this can be incredibly time-consuming and tedious. The end result is also rarely effective, as it is next to impossible to create high-quality test data from a large and unwieldy production database. So, how can you acquire good quality test data, and how can it be managed effectively? This is where test data management becomes an important part of your overall testing strategy.

Robust test data management processes are essential in maintaining applications and databases. In addition, with the recent rise in identity theft, industry regulators and lawmakers continue to put pressure on organizations that use risky techniques to provision test data.

In recent times, test and development teams have begun to enquire about their choices. Whilst there are many tools, solutions and strategies to choose from, most organizations take direct copies (or clones) of production databases and use those copies in testing and development. This very common practice has significant drawbacks:

  • Most likely, there are numerous security concerns with customer or employee data being held in the environment from which a direct copy has been made. Of course, organizations can write scripts to mask or obfuscate sensitive data, but the result may no longer reflect the relational and business integrity of the database. Test and development data is also often shared across many teams in uncontrolled environments, which is even more worrying.
  • Production databases are difficult to scale. It is nearly impossible to reduce "real-world" data in live databases into scalable test sets while still retaining the existing relationships, distributions and correlations among schema attributes.
  • The application workload is generally very dependent on the contents of "live" databases. Nearly all databases contain various date, time, and timestamp columns. It is essential that queries in the workload that compare these columns to specific literal constants (or host variables, or stored procedure parameters) utilize date and time values that match the contents of the database.
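
The masking concern in the list above can be made concrete with a small sketch. If each table is masked with independent random values, foreign keys stop matching; one common way around this is deterministic masking, where the same input always maps to the same pseudonym. The example below is a minimal illustration of that idea only, not a description of any particular product. The table contents, the `mask_value` helper and the `SECRET_KEY` value are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be stored securely.
SECRET_KEY = b"example-key-not-for-production"


def mask_value(value: str, prefix: str = "CUST") -> str:
    """Return a deterministic pseudonym: the same input always gives the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


# Hypothetical "production" rows; orders reference customers by customer_id.
customers = [
    {"customer_id": "C001", "name": "Alice Smith"},
    {"customer_id": "C002", "name": "Bob Jones"},
]
orders = [
    {"order_id": 1, "customer_id": "C001"},
    {"order_id": 2, "customer_id": "C002"},
    {"order_id": 3, "customer_id": "C001"},
]

# Mask both tables with the same function so the key relationship survives.
masked_customers = [
    {"customer_id": mask_value(c["customer_id"]), "name": mask_value(c["name"], "NAME")}
    for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "customer_id": mask_value(o["customer_id"])}
    for o in orders
]

# Check referential integrity: every masked order must still point at a masked customer.
masked_ids = {c["customer_id"] for c in masked_customers}
orphans = [o for o in masked_orders if o["customer_id"] not in masked_ids]
print("orphaned orders after masking:", len(orphans))
```

Deterministic masking keeps joins intact, but it does not by itself preserve value distributions or business rules such as valid date ranges, which is the broader difficulty the list describes.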

So, what are the alternatives? Do test data management tools exist?

Grid-Tools offers multiple flexible solutions, which can be used alone or in combination, to target the never-ending problems organizations struggle with when provisioning test data.

  • We create secure, smaller, targeted versions of databases for testing and development through database subsetting
  • We can secure your sensitive data records using data masking or data de-identification methods
  • We can create high-quality test data from scratch through an innovative modeling and sampling process
  • Our software connects to a data repository, allowing the user to inherit old test cases in multiple formats, saving time and money
  • Our software enables the user to enrich test data with data manipulation, version control and data bulking
  • We can guarantee that all of these methods will retain the relational, distributional and correlational attributes from the original production database
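
To illustrate what database subsetting means in general terms, the sketch below builds a smaller copy of two related tables while keeping every foreign key valid. It is a generic, simplified example using Python's built-in `sqlite3` module and is not the vendor's implementation; the schema, the sample rows and the "EU customers only" selection rule are assumptions made for illustration.

```python
import sqlite3

# In-memory database standing in for a (hypothetical) production schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU'), (4, 'APAC');
INSERT INTO orders VALUES
    (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5), (13, 3, 99.0), (14, 4, 5.0);
""")

# Step 1: pick the driving rows (here, EU customers only).
# Step 2: pull only the child rows that reference those parents,
#         so the subset contains no dangling foreign keys.
conn.executescript("""
CREATE TABLE sub_customers AS
    SELECT * FROM customers WHERE region = 'EU';
CREATE TABLE sub_orders AS
    SELECT o.* FROM orders o
    JOIN sub_customers c ON o.customer_id = c.id;
""")

# Verify that the subset has no orders pointing at a missing customer.
orphan_count = conn.execute(
    "SELECT COUNT(*) FROM sub_orders "
    "WHERE customer_id NOT IN (SELECT id FROM sub_customers)"
).fetchone()[0]
subset_order_ids = [row[0] for row in conn.execute("SELECT id FROM sub_orders ORDER BY id")]

print("orders in subset:", subset_order_ids)
print("orphaned orders:", orphan_count)
```

A real schema has many more tables and longer dependency chains, so the same parent-then-child selection would need to follow every foreign key path rather than a single join.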