Sort Stage Example
It is one of a processing stage and as the name suggests, used to perform sort operations. Some of the stages in Datastage requires sorted input data like Join and Merge stage as some stages like Aggregator uses less memory space with the presorted input.
Sorts can be performed by two ways:
· Using In-Stage sort option (or popularly called as Link Sort) :
In-stage sort can be applied when the volume of data is less than the buffer size defined as it requires buffer to process the data. If the volume of data goes beyond the size of buffer, Datastage will throw an error and will lead the job to abort.
This can be specified on the InputàPartitioning tab on stage options, when specifying a keyed partitioning need to set partitioning to anything other than Auto.
· Using Sort stage:
Sort stage provides a variety of functionality over Link sort. It can handle processing of large volume of data and can accept sorting on one or more keys.
o Stable sort: Preserves order of sorted data.
o Allow duplicates: Allows duplicates after sorting of data.
o Memory Usage: Memory space / disk space is used for better performance.
o Create Key change Column: Adds a column with value 1 or 0. 1 indicates that the key value has been changed. 0 means that the key value is the same and hasn’t
changed. It is useful for processing groups of rows in a Transformer.
In below example, we have Employee data as an input and we are going to sort the data on Employee salary.
Design the job as shown below:
Under Sort stage Properties, provide the sorting keys on which sort operation needs to perform along with the sort order.
Select the Sorting Key = ‘annualSalary’ and Sort Order=’Ascending’ to sort the data on Annual Salary and sort salary in ascending order that would make a person with lowest salary on the top, as shown below.
Under OutputàMapping tab, select the columns from input and drag and drop to output columns as shown below.
Select the Sorting Key = ‘annualSalary’ and Sort Order=’Descending’ to sort the data on Annual Salary and Sort salary in descending order that would make a person with highest salary on the top, as shown below.