Data Generation – Best practices in Test Data Generation
As development practices evolve, attention is shifting from the code generation paradigm toward the data model, and with it toward data generation. The main idea behind test data generation is to produce inputs that check whether a piece of software or an app behaves correctly. Testing an app with realistic data is important for reproducing real-world scenarios and making the necessary changes accordingly.
Classification of Test Data Generation
Test Data Generators can be broadly classified into:
Arbitrary Test Data Generator: As the name suggests, it is a random test data generator. It is the simplest data generation technique and is based on chance. Because the inputs are picked without any knowledge of the code, it usually cannot achieve high coverage.
Aim-Oriented Test Data Generator: Also called goal-oriented. Here, an input set is generated for any path that reaches a chosen goal, rather than for one fixed path from the entry block to the exit block of the code. The control flow graph plays a vital role in this technique: it allows a direct search toward the goal, which reduces both the reliance on chance and the risk of generating data for infeasible paths.
Path-Oriented Test Data Generator: This is often regarded as the strongest technique of the lot. Here the generator is given one specific path through the control flow, instead of a choice among multiple paths, so the outcome is more predictable. It is often discussed alongside fault-based testing, whose best-known form is Mutation Testing. In mutation testing, small deliberate changes are made to the code, and the resulting modified versions are called ‘mutants’.
Intelligent Test Data Generator: This technique relies on a detailed analysis of the code to guide the search for test data. The generation method is combined with that analysis so that the generator can anticipate the different situations that may arise during testing.
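To make the difference between random and aim-oriented generation concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function under test (`classify`), the input range, and the simple local search that stands in for a real goal-oriented generator. The random generator draws inputs blindly, while the search uses a "branch distance" to steer toward the rarely taken branch.

```python
import random

def classify(x: int, y: int) -> str:
    """Hypothetical function under test with one hard-to-reach branch."""
    if x == 2 * y + 7:
        return "rare"
    return "common"

def random_inputs(n, rng, lo=-1000, hi=1000):
    """Arbitrary generation: draw inputs with no knowledge of the code."""
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]

def branch_distance(x, y):
    """Distance from satisfying the target condition; 0 means the branch is taken."""
    return abs(x - (2 * y + 7))

def goal_oriented_input(rng, lo=-1000, hi=1000, max_steps=10_000):
    """Aim-oriented generation: local search that drives the distance to zero.

    The range only limits the random starting point; x may leave it
    while the search is running.
    """
    x, y = rng.randint(lo, hi), rng.randint(lo, hi)
    for _ in range(max_steps):
        d = branch_distance(x, y)
        if d == 0:
            return x, y
        # Try a unit step on x in each direction and keep the one that helps.
        for cand in ((x + 1, y), (x - 1, y)):
            if branch_distance(*cand) < d:
                x, y = cand
                break
    return None  # goal not reached within the step budget

rng = random.Random(0)  # fixed seed so the run is repeatable
hits = sum(classify(x, y) == "rare" for x, y in random_inputs(1000, rng))
print("random hits on the rare branch:", hits)
print("input found by the search:", goal_oriented_input(rng))
```

The random generator only reaches the rare branch by luck, whereas the search reaches it in a bounded number of steps because every step reduces the distance by one.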
Test Data Generator Life Cycle
Steps involved in Test Data Generation are as follows:
- Control Flow Graph Creation: The graph represents every possible transfer of control in the program.
- Path Selection: In this step, the paths through the program's control flow that should be tested are identified.
- Input Data Derivation: After a path is selected, a set of realistic input data is generated that drives execution along that path. This is the test data generation step.
A test data generator relies on a Program Analyzer for these steps, and the Program Analyzer has several tasks to complete during the process. First, it inspects the control flow graph and approaches the Path Selector to gather the set of selected paths. It then considers the control flow graph together with the data dependences and approaches the Test Data Generator, which generates a test data set for each path. Before generating a test data set, the Test Data Generator also consults the Path Selector to confirm that the available path information is valid.
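The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `grade`, its hand-written control flow graph `CFG`, and the brute-force search over a small input domain are all hypothetical, and a real tool would build the graph automatically and use smarter search or constraint solving.

```python
def grade(score: int) -> list:
    """Hypothetical program under test; it records the blocks it visits."""
    trace = ["entry"]
    if score >= 50:
        trace.append("pass")
        if score >= 90:
            trace.append("honours")
    else:
        trace.append("fail")
    trace.append("exit")
    return trace

# Step 1: control flow graph (written by hand here; assumed to be acyclic).
CFG = {
    "entry": ["pass", "fail"],
    "pass": ["honours", "exit"],
    "honours": ["exit"],
    "fail": ["exit"],
    "exit": [],
}

def all_paths(cfg, node="entry", goal="exit", prefix=()):
    """Step 2: path selection, here simply every entry-to-exit path."""
    path = prefix + (node,)
    if node == goal:
        yield path
        return
    for successor in cfg[node]:
        yield from all_paths(cfg, successor, goal, path)

def derive_input(path, domain):
    """Step 3: find the first input in the domain that follows the path."""
    for candidate in domain:
        if tuple(grade(candidate)) == path:
            return candidate
    return None  # no input in this domain follows the path

test_data = {path: derive_input(path, range(0, 101)) for path in all_paths(CFG)}
for path, value in test_data.items():
    print(" -> ".join(path), "| input:", value)
```

A path that maps to `None` has no matching input in the searched domain, which is how an infeasible path would show up in this sketch.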
Best Practices for Test Data Generation
- Naming Convention: The name of the test data should follow the module name or functional area, so that the reference is clear to anyone who reads it later.
- Test Data Requisites: The expected result should be clearly stated. Dependencies should also be declared clearly, and the functions and modules that will use the test data should be listed.
- Range of test data: The valid range should be specified well in advance for each data item.
- Reusability should be taken care of: Test data should be written in a way that allows it to be used again in the future.
- Maximum Coverage: The amount of test data should be kept to a minimum while still achieving the maximum coverage.
- Change notes should be clear: Any changes that must be made to the test data before it can be reused in a later test case with identical functionality should be clearly stated.
- Scope field: The scope, such as test boundaries, OS, and database types, should be clearly declared.
- Description: A brief description of test data should be given, which specifies the objective of the test data.
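One way to apply these practices is to store each test data set together with its metadata and check it automatically. The sketch below is an assumption for illustration only: the class `DataSetRecord`, its field names, and the three checks in `problems` are hypothetical and not taken from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class DataSetRecord:
    """Hypothetical record that keeps test data next to its metadata."""
    name: str                      # should start with the module name
    module: str                    # module or functional area
    description: str               # objective of the test data
    expected_result: str           # expected outcome when the data is used
    value_range: tuple[int, int]   # inclusive valid range
    values: list[int]
    dependencies: list[str] = field(default_factory=list)
    scope: dict[str, str] = field(default_factory=dict)  # e.g. OS, database
    change_notes: str = ""         # changes needed before reuse

    def problems(self) -> list[str]:
        """Return a list of violated practices (empty if none were found)."""
        issues = []
        if not self.name.startswith(self.module + "_"):
            issues.append("name does not start with the module prefix")
        if not self.description.strip():
            issues.append("missing description")
        lo, hi = self.value_range
        outside = [v for v in self.values if not lo <= v <= hi]
        if outside:
            issues.append(f"values outside the declared range: {outside}")
        return issues

record = DataSetRecord(
    name="login_valid_ages",
    module="login",
    description="Ages accepted by the login form",
    expected_result="accepted",
    value_range=(18, 120),
    values=[18, 40, 120],
    scope={"os": "any", "database": "any"},
)
print(record.problems())
```

Only three of the practices are checked here; the remaining fields are simply recorded so that a reviewer can read them.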
Challenges faced in Test Data Generation
Test data generation is quite complex because there is no standard framework for deriving test data. The following areas still require further study:
Arrays and Pointers: The main problems arise during symbolic execution, especially with dynamically allocated arrays and pointers, where the array index or the structure that a pointer refers to depends on the input.
Objects: Object-oriented features add to the complexity because objects are dynamic by nature, and it is difficult to determine exactly which code will be called at run time. Mutation has been tried as a way to tackle this problem.
Loops: Which path a loop will follow at run time, and how many times it will iterate, is often unknown in advance, which makes the entire process of test data generation more complex.
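The loop problem is easy to see with a little counting. The sketch below assumes a loop whose body contains a single if/else, so every iteration doubles the number of possible paths. The function names are hypothetical, and capping the number of iterations is a common mitigation that is added here as an assumption rather than something the list above prescribes.

```python
from itertools import product

def loop_paths(iterations: int) -> list:
    """Every branch sequence through a loop whose body holds one if/else."""
    return list(product(("then", "else"), repeat=iterations))

def bounded_paths(max_iterations: int) -> list:
    """All paths when the loop runs at most max_iterations times."""
    paths = []
    for n in range(max_iterations + 1):
        paths.extend(loop_paths(n))
    return paths

for n in (1, 5, 10):
    print(n, "iterations:", len(loop_paths(n)), "paths")
print("at most 2 iterations:", len(bounded_paths(2)), "paths")
```

With 10 iterations there are already 2^10 = 1024 paths, while limiting the loop to at most 2 iterations leaves 1 + 2 + 4 = 7 paths to generate data for.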
Despite these and a few other open problems, Test Data Generation makes testing tasks easier by offering several ways to create large quantities of data, random or otherwise, for testing purposes, which reduces the manual effort involved.