We’ve talked about automating the testing, but we also need to talk about test data management. You can’t automate your tests unless you know what data you are dealing with. If you pass a function the value “PI” and expect it to look up the value in a table and return “3.14159”, but someone has changed the value in the table, your test will fail even though the function worked properly.
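To make the “PI” example concrete, here is a minimal Python sketch. It assumes a SQLite table named `constants` and a hypothetical `lookup_constant` function, neither of which comes from a real system. Instead of reading a shared table that anyone can edit, each test builds its own in-memory copy containing exactly the rows it depends on. That way, a pass or a fail reflects the code and not whatever happens to be in the table that day.

```python
import sqlite3
from typing import Optional


def lookup_constant(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Hypothetical function under test: return the stored value for `name`."""
    row = conn.execute(
        "SELECT value FROM constants WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row is not None else None


def make_fixture_db() -> sqlite3.Connection:
    """Build a private, in-memory database holding only the rows the test needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE constants (name TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO constants (name, value) VALUES (?, ?)",
        [("PI", "3.14159")],
    )
    conn.commit()
    return conn


def test_pi_lookup() -> None:
    # The fixture is rebuilt on every run, so run #1 and run #100 see the same data.
    conn = make_fixture_db()
    try:
        assert lookup_constant(conn, "PI") == "3.14159"
    finally:
        conn.close()


def test_unknown_name_returns_none() -> None:
    conn = make_fixture_db()
    try:
        assert lookup_constant(conn, "TAU") is None
    finally:
        conn.close()
```

The tests are plain functions that use `assert`, so a runner such as pytest can collect them. They can also be called directly.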
False negatives and false positives can be the result of bad code or bad data. To reduce the problems associated with the data, you need to understand the data you are dealing with and ensure that the data on test run #1 is the same as the data on test run #100.
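One way to check that run #1 and run #100 really use the same data is to fingerprint the test data set and compare it against a recorded baseline before the tests run. The sketch below is one possible approach, not a feature of any particular tool. The function names are hypothetical, and it assumes the rows can be serialized to JSON.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def fingerprint(rows: Iterable[Mapping[str, Any]]) -> str:
    """Return a SHA-256 digest that changes if any row is added, removed, or edited.

    Each row is serialized with sorted keys, and the serialized rows are then
    sorted, so the digest depends on neither key order nor row order.
    Assumes every value is JSON-serializable.
    """
    canonical = sorted(json.dumps(dict(row), sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # json.dumps escapes newlines, so this separator is unambiguous
    return digest.hexdigest()


def assert_data_unchanged(rows: Iterable[Mapping[str, Any]], expected_digest: str) -> None:
    """Fail fast when the test data no longer matches the recorded baseline."""
    actual = fingerprint(rows)
    if actual != expected_digest:
        raise AssertionError(
            f"Test data has drifted: expected {expected_digest}, got {actual}"
        )
```

The baseline digest would be recorded once, when the data set is deliberately created or refreshed, and checked at the start of every run. A mismatch tells you the data changed before you spend time debugging code that may be fine.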
You need to manage your test data.
Informatica defines test data management this way:
“Test data management is the creation of non-production data sets that reliably mimic an organization’s actual data so that systems and applications developers can perform rigorous and valid systems tests.”
CA mentions pretty much the same thing when they advertise their tools. IBM, in a blog post from years ago, stated that there were five best practices for test data management:
- Discover and understand the test data
- Extract a subset of production data from multiple data sources
- Mask or de-identify sensitive test data
- Automate expected and actual result comparisons
- Refresh test data
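The masking step can be sketched in a few lines. The example below uses keyed hashing (HMAC-SHA256) for deterministic pseudonymization. This is one common technique and not the approach of any vendor named above. The field names, the `MASKING_KEY` value and the helper functions are all hypothetical. Because the same input always produces the same token, a masked customer ID in one table still joins to the same masked ID in another.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical secret. In practice it would live outside the test environment;
# changing it produces a completely different set of pseudonyms.
MASKING_KEY = b"replace-with-a-secret-from-your-vault"


def pseudonymize(value: str, field: str) -> str:
    """Deterministically replace a sensitive value with an opaque token.

    The field name is mixed into the input so that equal values in different
    fields do not produce the same token. Truncating to 12 hex characters keeps
    tokens short while making collisions unlikely for test-sized data sets.
    """
    mac = hmac.new(MASKING_KEY, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    return f"{field}_{mac.hexdigest()[:12]}"


def mask_email(email: str) -> str:
    # Keep the shape of an e-mail address so validation code still works,
    # but drop the real mailbox and domain.
    return pseudonymize(email.lower(), "user") + "@example.com"


def mask_customer(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a masked copy of a customer row; non-sensitive fields pass through."""
    masked = dict(row)
    masked["customer_id"] = pseudonymize(str(row["customer_id"]), "cust")
    masked["email"] = mask_email(row["email"])
    masked["name"] = "Test Customer"
    return masked
```

A keyed hash is used here instead of a plain hash. Without the key, someone cannot simply hash a list of likely e-mail addresses and match them against the masked data.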
Everyone seems to agree that test data management is important. One of the key things that the authors of “Accelerate” want people to know, however, is that you should minimize the amount of test data. You don’t need terabytes of data to do your automated testing. I’m not even sure that you need gigabytes unless you are testing something extremely complex with a plethora of decision points in your code base.
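Keeping test data small usually means extracting a subset while preserving the relationships between tables. The following is a minimal sketch of that idea, assuming two hypothetical SQLite tables, `customers(id, ...)` and `orders(id, customer_id, ...)`. It samples a handful of customers with a fixed seed and then copies only their orders, so no order in the subset points at a customer that isn’t there.

```python
import random
import sqlite3
from typing import List, Sequence


def copy_rows(source: sqlite3.Connection, target: sqlite3.Connection,
              table: str, key_column: str, keys: Sequence[int]) -> None:
    """Copy the rows of `table` whose `key_column` value is in `keys`.

    `table` and `key_column` are interpolated into SQL, so they must be
    trusted constants, never user input.
    """
    placeholders = ",".join("?" for _ in keys)
    cur = source.execute(
        f"SELECT * FROM {table} WHERE {key_column} IN ({placeholders})", list(keys)
    )
    width = len(cur.description)  # number of columns in the result
    target.executemany(
        f"INSERT INTO {table} VALUES ({','.join('?' * width)})", cur.fetchall()
    )


def extract_subset(source: sqlite3.Connection, target: sqlite3.Connection,
                   sample_size: int, seed: int = 1) -> List[int]:
    """Copy a small, referentially intact slice: some customers plus all their orders."""
    # Recreate the two tables in the target using the source's own DDL.
    for (ddl,) in source.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name IN ('customers', 'orders')"
    ).fetchall():
        target.execute(ddl)

    # A fixed seed picks the same customers on every refresh.
    all_ids = [row[0] for row in source.execute("SELECT id FROM customers ORDER BY id")]
    chosen = sorted(random.Random(seed).sample(all_ids, min(sample_size, len(all_ids))))

    if chosen:
        copy_rows(source, target, "customers", "id", chosen)
        # Orders follow their customers, so the subset has no orphaned orders.
        copy_rows(source, target, "orders", "customer_id", chosen)
    target.commit()
    return chosen
```

A real extraction across multiple data sources would need to follow more relationships than this, and it would be combined with masking before the data leaves production. The principle stays the same, though: choose a few root records and bring along only what they depend on.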
You don’t need production data in your lower environments. You need production-like data. But you also need other data: data for situations so rare that they may never have shown up in production, and data for features you’ve designed for that haven’t reached production yet. You need to understand the data that is required and build your test data around those requirements.
If that means taking production data and masking it, so be it. If that means creating custom data, then that works too. Most likely it will be a combination of the two. What it will not be, however, is a complete copy of production, because needing a complete copy means you don’t understand the data that you need.
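The “custom creating data” half of that combination can be as simple as a seeded generator for typical rows plus a hand-written list of designed cases. Every detail in this sketch is hypothetical: the `Order` shape, the value ranges, the quantity limit and the country codes are illustrations, not real requirements. Each designed row records the scenario it exists to exercise, which keeps the test data tied to requirements rather than accumulated by accident.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    order_id: int
    quantity: int
    unit_price_cents: int
    country: str
    note: str  # which scenario or requirement this row exists to exercise


def production_like_orders(count: int, seed: int = 42) -> List[Order]:
    """Synthetic rows in assumed 'typical' ranges; a fixed seed makes them repeatable."""
    rng = random.Random(seed)
    return [
        Order(
            order_id=i,
            quantity=rng.randint(1, 5),
            unit_price_cents=rng.randint(199, 4999),
            country=rng.choice(["US", "CA", "GB"]),
            note="typical order",
        )
        for i in range(1, count + 1)
    ]


def designed_edge_cases(start_id: int) -> List[Order]:
    """Hand-written rows for situations that are rare or absent in production."""
    cases = [
        (0, 999, "US", "zero quantity: should be rejected"),
        (10_000, 1, "US", "bulk order at the (hypothetical) quantity limit"),
        (1, 0, "GB", "free item"),
        (1, 1999, "NZ", "first order from a newly supported country"),
    ]
    return [
        Order(start_id + offset, qty, price, country, note)
        for offset, (qty, price, country, note) in enumerate(cases)
    ]


def build_test_orders(typical: int = 20) -> List[Order]:
    """Combine a small production-like set with the designed edge cases."""
    return production_like_orders(typical) + designed_edge_cases(start_id=typical + 1)
```

Because both halves are deterministic, every run gets the same two dozen rows, and a masked production subset could be appended in the same way.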
Will you need tools for this? Most likely, but the tools will depend on the processes you adopt for managing the test data. Every organization is different, so the tools each one uses will likely be different too.