First - Audit Strategy in ETL #1
As shown in the above diagram, the audit services are dependent on source systems and target systems for capturing audit data; unlike error handling services are dependent on ETL process to capture error data. Audit services are created as shared services that plug in to each ETL process.
Within the proposed Next Gen BI environment there are 4 stages (Initial Staging, Clean Staging, EDW, and Data Marts) where data is landed after extracting from source systems.
There will be at least one ETL job flow process at each of these stages extracting and loading data. Audit services are called at each stage by the ETL job flow process and the audit data is captured and reported.
Some of the key audit data that will be captured by the audit services is given below :
Field # | Field | Description |
1 | Data Server Name | A name that identifies a database service. This is called a "server" by some products. Oracle refers to it by the name "instance" or "database". |
2 | Database Name | |
3 | Data Table Name | A name that identifies a Target Data Table within a Database. |
4 | Source Name | A name that identifies the file, table or other data structure that is the origin of the data to be loaded into the target table. |
5 | ETL Process Name | DataStage Sequencer |
6 | ETL Sub Process Name | DataStage Job |
7 | New Rows in Source Count | The number or rows in the source table that have been added since the prior run date. This is the number that the process attempted to add to the target table. (If the source is a file, this field should be NULL.) |
8 | Changed Rows in Source Count | The number of rows in the source table which were modified since the prior run date. This is the number that the process attempted to modify in the target. (If the source is a file, this field will be NULL.) |
9 | Processed Rows in Source Count | The total number of rows in the source table or file that the process attempted to load to the target. (If the source is a table, this is the sum of the New_Rows_in_Source_Cnt and Changed_Rows_in_Source_Cnt.) |
10 | Starting Row Count | The number of rows in a table at the beginning of a Table Load. |
11 | Ending Row Count | The number of rows in a table at the conclusion of a Table Load. |
12 | Insert Row Count | The number of relational database table rows that are processed in an INSERT operation. |
13 | Update Row Count | The number of relational database table rows that are processed in an UPDATE operation. |
14 | DW_CREATE_DTTM | Date/time at which audit data was captured |
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/
No comments:
Post a Comment