As with any other piece of software, a DW implementation undergoes the natural cycle of unit testing, system testing, regression testing, integration testing and acceptance testing. However, unlike other software, there are no off-the-shelf testing products available for a DW.
Unit testing:
Traditionally this has been the task of the developer. This is white-box testing to ensure the module or component is coded as per the agreed-upon design specifications. The developer should focus on the following:
a) All inbound and outbound directory structures are created properly with appropriate permissions and sufficient disk space. All tables used during the ETL are present with the necessary privileges.
b) The ETL routines give expected results:
All transformation logic works as designed from source to target
Boundary conditions are satisfied, e.g. check date fields with leap-year dates
Surrogate keys have been generated properly
NULL values have been populated where expected
Rejects have occurred where expected and the reject log is created with sufficient details
Error recovery methods work as designed
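The boundary-condition check above can be sketched as a small unit test. This is a minimal illustration, assuming the date field arrives as a YYYY-MM-DD string; the function name is hypothetical, not part of any ETL tool:

```python
import datetime

def is_valid_load_date(value):
    """Boundary check for an incoming date field (assumed format: YYYY-MM-DD)."""
    try:
        datetime.date.fromisoformat(value)
        return True
    except ValueError:
        return False

# Leap-year boundary cases the unit test should cover
assert is_valid_load_date("2020-02-29")       # 2020 is a leap year
assert not is_valid_load_date("2019-02-29")   # 2019 is not
assert not is_valid_load_date("2020-13-01")   # month out of range
```

The same pattern extends to the other checks in the list, e.g. asserting that generated surrogate keys are unique and non-NULL.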
c) That the data loaded into the target is complete:
All source data that is expected to be loaded into the target actually gets loaded: compare counts between source and target and use data profiling tools
All fields are loaded with full contents, i.e. no data field is truncated while transforming
No duplicates are loaded
Aggregations take place in the target properly
Data integrity constraints are properly taken care of
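The completeness checks above can be scripted as a small reconciliation routine. A minimal sketch, assuming the extracts are available as lists of dicts keyed on a business key (the row layout here is a hypothetical example):

```python
def completeness_issues(source_rows, target_rows, key):
    """Compare source and target extracts: row counts, duplicate keys, NULL keys."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append("count mismatch: source=%d target=%d"
                      % (len(source_rows), len(target_rows)))
    keys = [row[key] for row in target_rows]
    if len(keys) != len(set(keys)):
        issues.append("duplicate keys loaded into target")
    if any(k is None for k in keys):
        issues.append("NULL key loaded into target")
    return issues

source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 2}, {"id": 2}]
print(completeness_issues(source, target, "id"))
# → ['duplicate keys loaded into target']
```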
System testing:
Generally the QA team owns this responsibility. For them the design document is the bible, and the entire set of test cases is directly based upon it. Here we test the functionality of the application, and it is mostly black-box testing. The major challenge here is the preparation of test data: an intelligently designed input dataset can bring out the flaws in the application more quickly. Wherever possible, use production-like data. You may also use data generation tools or customized tools of your own to create test data. We must test all possible combinations of input and specifically check the errors and exceptions. An unbiased approach is required to ensure maximum efficiency. Knowledge of the business process is an added advantage, since we must be able to interpret the results functionally and not just code-wise.
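One way to build such an input dataset is to seed a generator with known edge cases alongside random production-like rows. This is a hypothetical sketch, not a real data-generation tool; the column names and value ranges are assumptions:

```python
import random

# Edge cases forced into the dataset: leap year, century end, missing value
EDGE_DATES = ["2020-02-29", "1999-12-31", None]

def make_test_rows(n, seed=42):
    """Generate production-like rows, mixing boundary values with random data."""
    rng = random.Random(seed)       # fixed seed so the dataset is reproducible
    rows = []
    for i in range(n):
        rows.append({
            "customer_id": i + 1,
            "order_date": (EDGE_DATES[i] if i < len(EDGE_DATES)
                           else "2021-%02d-%02d" % (rng.randint(1, 12),
                                                    rng.randint(1, 28))),
            "amount": round(rng.uniform(0, 10000), 2),
        })
    return rows

rows = make_test_rows(10)
print(len(rows))  # 10
```

A fixed seed keeps reruns reproducible, which also helps with the regression strategy described later.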
The QA team must test for:
Data completeness and correctness: match source-to-target counts and validate the data
Data aggregations: match aggregated data against staging tables and/or the ODS
Lookups/transformations are applied correctly as per specifications
Granularity of data is as per specifications
Error logs and audit tables are generated and populated properly
Notifications to IT and/or business are generated in proper format
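The aggregation check in the list above can be automated by recomputing the aggregate from the staging data and matching it against the target. A minimal sketch; the table layouts and a SUM aggregate are assumptions for illustration:

```python
from collections import defaultdict

def recompute_totals(staging_rows, group_key, measure):
    """Rebuild a SUM aggregate from the staging extract for comparison."""
    totals = defaultdict(float)
    for row in staging_rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)

staging = [
    {"region": "EU", "sales": 100.0},
    {"region": "EU", "sales": 50.0},
    {"region": "US", "sales": 75.0},
]
target_aggregate = {"EU": 150.0, "US": 75.0}   # as loaded into the DW

assert recompute_totals(staging, "region", "sales") == target_aggregate
```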
Regression testing:
A DW application is not a one-time solution. It is possibly the best example of an incremental design, where requirements are enhanced and refined quite often based on business needs and feedback. In such a situation it is critical to test that the existing functionality of a DW application is not broken whenever an enhancement is made to it. Generally this is done by running all functional tests for the existing code whenever a new piece of code is introduced. However, a better strategy is to preserve the earlier test input data and result sets and run them again; the new results can then be compared against the older ones to ensure proper functionality.
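Diffing a fresh run against a preserved baseline can be as simple as a set comparison over the output rows. A minimal sketch, assuming both result sets fit in memory as lists of tuples (file handling omitted; the row shape is hypothetical):

```python
def regression_diff(baseline_rows, current_rows):
    """Rows lost and rows newly introduced relative to the preserved baseline."""
    base = set(map(tuple, baseline_rows))
    curr = set(map(tuple, current_rows))
    return {"missing": sorted(base - curr), "unexpected": sorted(curr - base)}

baseline = [(1, "open"), (2, "closed")]
current  = [(1, "open"), (2, "pending")]
print(regression_diff(baseline, current))
# → {'missing': [(2, 'closed')], 'unexpected': [(2, 'pending')]}
```

An empty diff means the enhancement has not disturbed the existing functionality for that input dataset.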
Integration testing:
This is done to ensure that the application works from an end-to-end perspective. Here we must consider the compatibility of the DW application with upstream and downstream flows, and we need to ensure data integrity across the flow. Our test strategy should include testing for:
Sequence of jobs to be executed with job dependencies and scheduling
Re-startability of jobs in case of failures
Generation of error logs
Cleanup scripts for the environment including database
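Re-startability can be sketched as a job sequence with a checkpoint that records completed jobs, so a rerun resumes after the last success. The job names and the in-memory checkpoint are assumptions; a real scheduler would persist the checkpoint and handle dependencies declaratively:

```python
def run_sequence(jobs, checkpoint):
    """Run jobs in dependency order, skipping any already in the checkpoint."""
    for name, job in jobs:
        if name in checkpoint:
            continue                       # already done in an earlier run
        job()                              # may raise; checkpoint keeps progress
        checkpoint.add(name)

done = set()
attempts = {"load": 0}

def extract(): pass
def transform(): pass
def load():
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("load failed")  # simulate a first-run failure

jobs = [("extract", extract), ("transform", transform), ("load", load)]

try:
    run_sequence(jobs, done)               # first run fails at the load step
except RuntimeError:
    pass
run_sequence(jobs, done)                   # restart: only the load step reruns
print(sorted(done))                        # → ['extract', 'load', 'transform']
```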
This activity is a combined responsibility, and participation of experts from all related applications is a must in order to avoid misinterpretation of results.
Acceptance testing:
This is the most critical part, because here the actual users validate your output datasets. They are the best judges to ensure that the application works as expected. However, business users may not have deep ETL knowledge, so the development and test teams should be ready to answer questions about the ETL processes that populate the data. The test team must have sufficient business knowledge to translate the results in business terms. Also, the load window, the refresh period for the DW and the views created should be signed off by the users.
Performance testing:
In addition to the above tests, a DW must necessarily go through another phase called performance testing. Any DW application is designed to be scalable and robust; therefore, when it goes into the production environment, it should not cause performance problems. Here, we must test the system with huge volumes of data and ensure that the load window is met even under such volumes. This phase should involve the DBA team, ETL experts and others who can review and validate your code for optimization.
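Checking the load window under volume can be scripted by timing the load run against the agreed window. A minimal sketch; the load function and the window value are placeholders for the real high-volume load:

```python
import time

def fits_load_window(load_fn, window_seconds):
    """Time a load run and report whether it completed within the window."""
    start = time.perf_counter()
    load_fn()
    elapsed = time.perf_counter() - start
    return elapsed <= window_seconds, elapsed

def dummy_load():
    # stand-in for loading a high-volume extract
    sum(range(1_000_000))

ok, elapsed = fits_load_window(dummy_load, window_seconds=60.0)
print(ok)  # True for this trivial load
```

Trending `elapsed` across runs with growing data volumes gives an early warning before the window is actually breached in production.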