Merge Stage Example
The Merge stage is one of a processing stage. It can have any number of input links, a single output link, and the same number of reject links as there are update input links.(according to DS documentation)
Merge stage combines a master dataset with one or more update datasets based on the key columns. The output record contains all the columns from master record plus any additional columns from each update record that are required.
A master record and update record will be merged only if both have same key column values.
The data sets input to the Merge stage must be Key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node. It also minimizes memory requirements because fewer rows need to be in memory at any one time.
Unlike Join stages and Lookup stages, the Merge stage allows you to keep or drop unmatched records by specifying several reject links. You can route update link rows that fail to match a master row down a reject link that is specific for that link. You must have the same number of reject links as you have update links. The Link Ordering tab on the Stage page lets you specify which update links send rejected rows to which reject links. You can also specify whether to drop unmatched master rows, or output them on the output data link.
In below example we are having 3 input files Emp.txt, Addr.txt and JobDesc.tx as input which contains employee details, employee address and employee job details. Here we are going to merge these 3 input files on the basis of EmpID and taking all the necessary employee details and pull those into our target file. The EmpIDs not matching are captured in respective reject files.
Design the job similar to as shown below:
Open Properties window of the Merge stage by double click on it or Right clickàselect Properties from drop down. Select the Key column on which merge operation needs to perform.
We can also select the Sort order as Ascending or Descending from the option as below.
We can provide the Link order of target and reject files under Stage à Link Ordering tab similar to shown below.
Under Output à Mapping tab, select the columns from input, drag and drop those to output similar to below snapshot to perform mapping of columns from input to output.
Save the job and compile and run it. After successful job run we can find the rows inserted in the target and reject files as below.
2. Ds_RejJob Data:
3. Ds_RejAddr Data: