
Sunday, May 8, 2016

The DW Testing Life Cycle



As with any other piece of software, a DW implementation undergoes the natural cycle of unit testing, system testing, regression testing, integration testing and acceptance testing. However, unlike other software, there are no off-the-shelf testing products available for a DW.


Unit testing:

Traditionally this has been the task of the developer. It is white-box testing that ensures the module or component is coded as per the agreed-upon design specifications. The developer should focus on the following:

a)  All inbound and outbound directory structures are created properly, with appropriate permissions and sufficient disk space. All tables used during the ETL are present with the necessary privileges.

b)  The ETL routines give expected results:
All transformation logic works as designed from source to target
Boundary conditions are satisfied, e.g. date fields handle leap-year dates correctly
Surrogate keys have been generated properly
NULL values have been populated where expected
Rejects have occurred where expected, and the reject log is created with sufficient detail
Error recovery methods behave as designed
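
Several of these checks can be automated as small unit tests. Below is a minimal sketch in Python, assuming the transformed rows are available in memory as a list of dicts; the column names (cust_sk, birth_date, email) are hypothetical:

import datetime

# Hypothetical transformed rows, as an ETL routine under test might produce them
rows = [
    {"cust_sk": 1, "birth_date": datetime.date(2016, 2, 29), "email": None},
    {"cust_sk": 2, "birth_date": datetime.date(1990, 7, 14), "email": "a@b.com"},
]

# Surrogate keys must be present and unique
keys = [r["cust_sk"] for r in rows]
assert None not in keys, "surrogate key must never be NULL"
assert len(keys) == len(set(keys)), "surrogate keys must be unique"

# Boundary condition: a leap-year date must survive the transformation intact
assert datetime.date(2016, 2, 29) in [r["birth_date"] for r in rows]

# NULLs populated where expected (email is assumed optional here)
assert any(r["email"] is None for r in rows)
print("unit checks passed")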

c) That the data loaded into the target is complete:

All source data that is expected to be loaded into the target actually gets loaded; compare counts between source and target and use data profiling tools
All fields are loaded with their full contents, i.e. no data field is truncated during transformation
No duplicates are loaded
Aggregations take place in the target properly
Data integrity constraints are properly taken care of
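
These completeness checks map naturally onto SQL assertions. Here is a sketch where an in-memory SQLite database stands in for the real source and target; the table and column names are made up for illustration:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tgt VALUES (1, 'Alice'), (2, 'Bob');
""")

# Source-to-target count match
src_count = con.execute("SELECT COUNT(*) FROM src").fetchone()[0]
tgt_count = con.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"

# No duplicates loaded into the target
dups = con.execute(
    "SELECT id FROM tgt GROUP BY id HAVING COUNT(*) > 1").fetchall()
assert not dups, f"duplicate keys loaded: {dups}"

# No truncation: every target value must match its source counterpart in full
truncated = con.execute("""
    SELECT s.id FROM src s JOIN tgt t ON s.id = t.id
    WHERE LENGTH(t.name) < LENGTH(s.name)
""").fetchall()
assert not truncated, f"truncated rows: {truncated}"
print("completeness checks passed")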


System testing:

Generally the QA team owns this responsibility. For them the design document is the bible, and the entire set of test cases is based directly upon it. Here we test the functionality of the application, and the testing is mostly black-box. The major challenge is the preparation of test data: an intelligently designed input dataset brings out the flaws in the application more quickly. Wherever possible, use production-like data. You may also use data generation tools or customized tools of your own to create test data; a small generator sketch follows below. We must test all possible combinations of input and specifically exercise errors and exceptions. An unbiased approach is required to ensure maximum efficiency. Knowledge of the business process is an added advantage, since we must be able to interpret the results functionally and not just code-wise.
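
In the absence of a dedicated data generation tool, a small script can deliberately seed the awkward cases alongside normal rows. A sketch, with a hypothetical three-field layout:

import csv
import random
import string

def random_name(max_len=30):
    return "".join(random.choices(string.ascii_letters,
                                  k=random.randint(1, max_len)))

# Deliberately include boundary and error cases alongside normal rows
edge_rows = [
    {"id": 1, "name": "x" * 255, "dob": "2016-02-29"},   # max length + leap day
    {"id": 2, "name": "", "dob": ""},                    # empty fields
    {"id": 3, "name": None, "dob": "2016-13-01"},        # NULL and invalid date
]
normal_rows = [{"id": i, "name": random_name(), "dob": "1980-01-01"}
               for i in range(4, 104)]

with open("test_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "dob"])
    writer.writeheader()
    writer.writerows(edge_rows + normal_rows)

The invalid rows should land in the reject log rather than the target, which gives the error-handling test cases something concrete to assert against.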

The QA team must test for:

Data completeness and correctness: match source-to-target counts and validate the data
Data aggregations: match aggregated data against staging tables and/or the ODS
Lookups and transformations are applied correctly as per specifications
Granularity of data is as per specifications
Error logs and audit tables are generated and populated properly
Notifications to IT and/or business are generated in the proper format
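
For the aggregation check in particular, the idea is to recompute the aggregate from the staging data and diff it against what the target holds. A sketch with hypothetical staging and aggregate tables in SQLite:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stg_sales (region TEXT, amount REAL);
    CREATE TABLE agg_sales (region TEXT, total REAL);
    INSERT INTO stg_sales VALUES ('EU', 10.0), ('EU', 5.0), ('US', 7.5);
    INSERT INTO agg_sales VALUES ('EU', 15.0), ('US', 7.5);
""")

# Recompute the aggregate from staging and compare with the loaded target
mismatches = con.execute("""
    SELECT s.region, s.total, a.total
    FROM (SELECT region, SUM(amount) AS total
          FROM stg_sales GROUP BY region) s
    LEFT JOIN agg_sales a ON a.region = s.region
    WHERE a.total IS NULL OR ABS(a.total - s.total) > 1e-6
""").fetchall()
assert not mismatches, f"aggregation mismatches: {mismatches}"
print("aggregation check passed")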



Regression testing:

A DW application is not a one-time solution. It is possibly the best example of incremental design, where requirements are enhanced and refined quite often based on business needs and feedback. In such a situation it is critical to verify that the existing functionality of the DW application is not broken whenever an enhancement is made. Generally this is done by re-running all functional tests for the existing code whenever a new piece of code is introduced. However, a better strategy is to preserve the earlier test input data and result sets and run the same tests again; the new results can then be compared against the older ones to confirm that existing functionality is intact.
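
This "preserve and replay" strategy can be as simple as keeping the previous run's result set on disk and diffing the new output against it. A minimal sketch, assuming both result sets are CSV extracts with a stable sort order (the file paths are hypothetical):

import csv

def load_rows(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))

def check_regression(baseline, current):
    # Any difference is a potential regression; review it before
    # accepting the new output as the next baseline
    missing = [r for r in baseline if r not in current]
    extra = [r for r in current if r not in baseline]
    if missing or extra:
        raise AssertionError(
            f"regression: {len(missing)} rows missing, {len(extra)} rows new")

# In a real run:
# check_regression(load_rows("baseline/customer_dim.csv"),
#                  load_rows("current/customer_dim.csv"))
check_regression([["1", "Alice"], ["2", "Bob"]],
                 [["1", "Alice"], ["2", "Bob"]])
print("output matches baseline")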


Integration testing:

This is done to ensure that the application works from an end-to-end perspective. Here we must consider the compatibility of the DW application with upstream and downstream flows, and we must ensure data integrity across the whole flow. Our test strategy should include testing for:

Sequence of jobs to be executed with job dependencies and scheduling
Re-startability of jobs in case of failures (see the sketch after this list)
Generation of error logs
Cleanup scripts for the environment including database
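
Restartability is usually verified by killing a run mid-sequence and confirming that it resumes from the failed job rather than from the top. A toy job-runner sketch showing the checkpoint idea; the job names and checkpoint file are made up:

import json
import os

CHECKPOINT = "run_state.json"

def extract():   print("extracting")
def transform(): print("transforming")
def load():      print("loading")

# Jobs listed in dependency order; each runs only after the previous succeeds
JOBS = [("extract", extract), ("transform", transform), ("load", load)]

def run():
    done = json.load(open(CHECKPOINT)) if os.path.exists(CHECKPOINT) else []
    for name, job in JOBS:
        if name in done:
            print(f"skipping {name} (already completed)")
            continue
        job()  # if this raises, the checkpoint keeps earlier jobs marked done
        done.append(name)
        with open(CHECKPOINT, "w") as f:
            json.dump(done, f)
    os.remove(CHECKPOINT)  # clean finish: the next run starts from scratch

run()

Killing the process between jobs and re-running demonstrates the resume behaviour; the integration test should exercise exactly that.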

This activity is a combined responsibility, and the participation of experts from all related applications is a must in order to avoid misinterpretation of results.




Acceptance testing:

This is the most critical part, because here the actual users validate your output datasets. They are the best judges of whether the application works as they expect. However, business users may not have deep ETL knowledge, so the development and test teams should be ready to answer questions about how the ETL process populates the data. The test team must have sufficient business knowledge to translate the results into business terms. The load window, the refresh period for the DW and the views created should also be signed off by the users.

Performance testing:

In addition to the above tests, a DW must also go through a performance testing phase. Any DW application is designed to be scalable and robust, so when it goes into the production environment it should not cause performance problems. Here we must test the system with a huge volume of data and ensure that the load window is met even under such volumes. This phase should involve the DBA team, ETL experts and others who can review and validate your code for optimization.
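
A volume test ultimately boils down to loading a production-scale row count and asserting that the elapsed time stays inside the agreed load window. A sketch of the timing harness; the window value and the load function are placeholders for your environment:

import time

LOAD_WINDOW_SECONDS = 4 * 3600   # hypothetical agreed window: 4 hours

def run_full_load():
    # Placeholder for invoking the real ETL job against a
    # production-sized dataset
    time.sleep(1)

start = time.monotonic()
run_full_load()
elapsed = time.monotonic() - start
assert elapsed <= LOAD_WINDOW_SECONDS, (
    f"load took {elapsed:.0f}s, exceeds window of {LOAD_WINDOW_SECONDS}s")
print(f"load finished in {elapsed:.0f}s, within the window")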






