Sort Stage Example

It is one of a processing stage and as the name suggests, used to perform sort operations. Some of the stages in Datastage requires sorted input data like Join and Merge stage as some stages like Aggregator uses less memory space with the presorted input.

Sorts can be performed by two ways:

· Using In-Stage sort option (or popularly called as Link Sort) :

In-stage sort can be applied when the volume of data is less than the buffer size defined as it requires buffer to process the data. If the volume of data goes beyond the size of buffer, Datastage will throw an error and will lead the job to abort.

This can be specified on the InputàPartitioning tab on stage options, when specifying a keyed partitioning need to set partitioning to anything other than Auto.

· Using Sort stage:

Sort stage provides a variety of functionality over Link sort. It can handle processing of large volume of data and can accept sorting on one or more keys.

Sort Options:

o    Stable sort: Preserves order of sorted data.

o    Allow duplicates: Allows duplicates after sorting of data.

o    Memory Usage: Memory space / disk space is used for better performance.

o    Create Key change Column: Adds a column with value 1 or 0. 1 indicates that the key value has been changed. 0 means that the key value is the same and hasn’t 

     changed. It is useful for processing groups       of rows in a Transformer.

Example:

In below example, we have Employee data as an input and we are going to sort the data on Employee salary.

Input Data:

inputData - Sort Stage Example

Design the job as shown below:

SortMain - Sort Stage Example

Under Sort stage Properties, provide the sorting keys on which sort operation needs to perform along with the sort order.

Sort2 - Sort Stage Example

Select the Sorting Key = ‘annualSalary’ and Sort Order=’Ascending’ to sort the data on Annual Salary and sort salary in ascending order that would make a person with lowest salary on the top, as shown below.

sortProperties - Sort Stage Example

Under OutputàMapping tab, select the columns from input and drag and drop to output columns as shown below.

sort4 - Sort Stage Example

Output Data:

SortAscResult

Select the Sorting Key = ‘annualSalary’ and Sort Order=’Descending’ to sort the data on Annual Salary and Sort salary in descending order that would make a person with highest salary on the top, as shown below.

sort10 - Sort Stage Example

Output Data:

sort11 - Sort Stage Example