As we are all aware, Round Robin is the best partitioning method for spreading data evenly, but it distributes rows across all partitions irrespective of the key (Round Robin is a keyless partitioning method), which is usually not what we want. When the key matters, the method to use is Hash.
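The difference between the two methods can be illustrated outside DataStage with a small Python sketch. This is not DataStage code: the function names, the sample rows, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration only.

```python
import zlib
from collections import defaultdict


def round_robin_partition(rows, num_partitions):
    """Keyless: row i goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def hash_partition(rows, key, num_partitions):
    """Keyed: rows with the same key value always land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is an illustrative stand-in for the engine's hash function.
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def partitions_per_key(partitions, key):
    """Count how many partitions each key value appears in."""
    seen = defaultdict(set)
    for idx, part in enumerate(partitions):
        for row in part:
            seen[row[key]].add(idx)
    return {k: len(v) for k, v in seen.items()}


# Hypothetical sample data.
rows = [
    {"cust": c, "amt": a}
    for c, a in [("A", 10), ("B", 20), ("A", 30), ("B", 40), ("C", 50), ("A", 60)]
]

rr_parts = round_robin_partition(rows, 2)
hash_parts = hash_partition(rows, "cust", 2)

print(partitions_per_key(rr_parts, "cust"))
print(partitions_per_key(hash_parts, "cust"))
```

With Round Robin the partitions are the same size, but customer A ends up in both of them. With Hash every customer sits in exactly one partition, at the cost of possibly uneven partition sizes.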
Sorting and hash partitioning in DataStage improve data processing speed, which is one of the targets we aim for in projects. So, let's list some important stages and see whether they need partitioning or sorting to perform better.
Stage | Partition (Hash) | Sort |
Sort | Yes | No |
Aggregator | Yes | Yes |
Join | Yes | Yes |
Remove Duplicates | Yes | Yes |
Merge | Yes | Yes |
Lookup | No | No |
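To see why a key-based stage such as Aggregator wants hash-partitioned, sorted input, here is a minimal Python sketch of a per-partition aggregation. It is only an analogy for what happens in a parallel job: the helper names, the sample rows, and the `zlib.crc32` hash are assumptions, not DataStage APIs.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def round_robin_partition(rows, num_partitions):
    """Keyless distribution: row i goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def hash_partition(rows, key, num_partitions):
    """Keyed distribution: equal key values share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def aggregate_partition(partition, key, value):
    """Sum `value` per `key` within one partition.

    groupby only merges adjacent rows, so the partition is sorted on the
    key first, which mirrors the Sort requirement in the table.
    """
    ordered = sorted(partition, key=itemgetter(key))
    return [
        (k, sum(row[value] for row in group))
        for k, group in groupby(ordered, key=itemgetter(key))
    ]


def parallel_aggregate(partitions, key, value):
    """Each partition is aggregated independently, then outputs are collected."""
    result = []
    for part in partitions:
        result.extend(aggregate_partition(part, key, value))
    return sorted(result)


# Hypothetical sample data.
rows = [
    {"cust": c, "amt": a}
    for c, a in [("A", 10), ("B", 20), ("A", 30), ("B", 40), ("C", 50), ("A", 60)]
]

rr_totals = parallel_aggregate(round_robin_partition(rows, 2), "cust", "amt")
hash_totals = parallel_aggregate(hash_partition(rows, "cust", 2), "cust", "amt")

print(rr_totals)
print(hash_totals)
```

With Round Robin input, customer A is split across partitions and comes out as two partial totals. With Hash input, each customer is totalled once, because all of its rows were processed in the same partition.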