My scrapbook about anything I have learned or want to remember: sometimes tech tips, sometimes thoughts and rambling. If you find anything useful, don't forget to give a thumbs-up :)


Monday, February 29, 2016

Building Test Data from Live Data in DataStage



DataStage provides the stages below, which can be used to generate test data from live (real) data. We know these stages well, but we rarely think of using them this way (I know :-), we don't).

Think about it!



Head stage: selects the first N records from each partition of an input data set and copies the selected records to an output data set.
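To make the per-partition behaviour concrete, here is a minimal Python sketch of what the Head stage does. It is not DataStage code: the helper name `head_per_partition` and the list-of-lists model of partitions are assumptions made only for illustration.

```python
from itertools import islice


def head_per_partition(partitions, n):
    """Return the first n records of every partition (hypothetical helper)."""
    # islice stops early, so a partition shorter than n is returned whole.
    return [list(islice(partition, n)) for partition in partitions]


# A made-up data set split into two partitions.
partitions = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(head_per_partition(partitions, 2))  # [[1, 2], [6, 7]]
```

Note that N applies to each partition, so a job running on four partitions with N = 10 yields up to 40 rows of test data, not 10.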






Tail stage: selects the last N records from each partition of an input data set and copies the selected records to an output data set.
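The same idea can be sketched for the Tail stage. Again this is only an illustration in Python, not DataStage code; `tail_per_partition` and the list-of-lists partitions are assumed names and structures.

```python
from collections import deque


def tail_per_partition(partitions, n):
    """Return the last n records of every partition (hypothetical helper)."""
    # A deque with maxlen=n keeps only the most recent n items it has seen.
    return [list(deque(partition, maxlen=n)) for partition in partitions]


# A made-up data set split into two partitions.
partitions = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(tail_per_partition(partitions, 2))  # [[4, 5], [9, 10]]
```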






Sample stage: samples an input data set. It operates in two modes. Percent mode extracts rows, selecting them by means of a random number generator, and writes a given percentage of these to each output data set. Period mode extracts every Nth row from each partition, where N is the period that you supply.
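The two modes can be illustrated with a rough Python sketch working on a single partition and a single output. The function names, the `seed` parameter, and the choice to start counting the period at the first row are assumptions for illustration, not the stage's actual options.

```python
import random


def sample_percent(partition, percent, seed=None):
    """Keep each row with probability percent/100 (hypothetical helper)."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so percent=100 keeps every row and percent=0 keeps none.
    return [row for row in partition if rng.random() * 100 < percent]


def sample_period(partition, period):
    """Keep every period-th row, counting from the first row (assumed convention)."""
    return partition[period - 1::period]


rows = list(range(1, 11))
print(sample_period(rows, 3))  # [3, 6, 9]
print(sample_percent(rows, 30, seed=42))  # a random subset; size varies around 30%
```

Percent mode gives an approximate share of the rows, so the output size changes from run to run unless the generator is seeded. Period mode is deterministic for a given partitioning.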


Filter stage: transfers, unmodified, the records of the input data set that satisfy the specified requirements and filters out all other records. You can specify different requirements to route rows down different output links.
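A small Python sketch of the routing idea follows. It is not DataStage syntax: `filter_rows`, the link names, and the sample rows are all invented for illustration, and sending a row to every link whose condition it matches is an assumption of this sketch.

```python
def filter_rows(rows, conditions):
    """Route unmodified rows to each output whose condition they satisfy.

    conditions maps a hypothetical output link name to a predicate function.
    Rows that satisfy no condition are dropped.
    """
    outputs = {name: [] for name in conditions}
    for row in rows:
        for name, condition in conditions.items():
            if condition(row):
                outputs[name].append(row)
    return outputs


# Made-up live data.
rows = [
    {"id": 1, "country": "IN", "amount": 50},
    {"id": 2, "country": "US", "amount": 900},
    {"id": 3, "country": "IN", "amount": 1200},
]

links = filter_rows(rows, {
    "india": lambda r: r["country"] == "IN",
    "large": lambda r: r["amount"] > 1000,
})
print([r["id"] for r in links["india"]])  # [1, 3]
print([r["id"] for r in links["large"]])  # [3]
```

For test data this is handy when you want one small file per business case, for example one link per country or per edge condition.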



External Filter stage: allows you to specify a UNIX command that acts as a filter on the data you are processing. An example would be to use the stage to grep a data set for a certain string or pattern and discard records that do not contain a match.
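The mechanism is the same as piping records through a command on the shell. Here is a hedged Python sketch of that idea; `external_filter` is a hypothetical helper, and the filter command is a tiny Python one-liner standing in for something like `grep ERROR` so that the example does not depend on grep being installed.

```python
import subprocess
import sys


def external_filter(records, command):
    """Pipe records (one per line) through an external command (hypothetical helper)."""
    result = subprocess.run(
        command,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
    )
    # The exit code is not checked: filters such as grep exit with 1 when nothing matches.
    return result.stdout.splitlines()


# Stand-in for a UNIX filter such as `grep ERROR`.
keep_errors = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    if 'ERROR' in line:\n        sys.stdout.write(line)",
]

records = ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]
print(external_filter(records, keep_errors))  # ['ERROR disk full', 'ERROR timeout']
```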



Sequential File stage: Filter option. Use this to specify that the data is passed through a filter program before being written to a file or files on output, or before being placed in a data set on input.









Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies' positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.
The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.