Join Stage Example

The Join stage is one of the processing stage. It performs join operations on two or more data sets input to the  stage and then outputs the resulting data set. The Join stage is one of three stages that join tables based on the values of key columns.

Input links must be sorted. There would be a Left link and a Right link and the stage support additional ‘Intermediate links’.

The join is to be performed on a Key column from Left and Right links where the Column names for the same should match.

The stage supports one of the four join operations:

Inner Join: It transfers records from input data sets whose key columns contain equal values to the output data set. Records whose key columns do not contain equal values are dropped.

Left outer Join: It transfers all values from the left data set but transfers values from the right data set and intermediate data sets only where key columns match. The stage drops the key column from the right and intermediate data sets.

Right outer Join: It transfers all values from the right data set and transfers values from the left data set and intermediate data sets only where key columns match. The stage drops the key column from the left and intermediate data sets.

Full outer Join: transfers records in which the contents of the key columns are equal from the left and right input data sets to the output data set. It also transfers records whose key columns contain unequal values from both input data sets to the output data set. (Full outer joins do not support more than two input links.)

Example:

In the below example, we are having employee.txt and Job_Desc.txt as an input and we are going to perform join operations on key column ‘employeeID’.

Design the job same as below:

Join1 - Join Stage Example

Input Data:

1. Employee.txt

join4 - Join Stage Example

2. Job_Desc.txt

join5 - Join Stage Example

Open join stage properties by double click on it or right clickàProperties, under StageàProperties tab, select the Key column as below:

Join2 - Join Stage Example

Select Join operation type from the Optionsà Join Type option similar to below:

Inner Join:

join6 - Join Stage Example

Output Data:

Transfers records from both the data files whose key columns have matching values.
joinInnerRes - Join Stage Example

Full Outer Join:

join3 - Join Stage Example

Output Data:

Transfers records from both the data files whose key columns contain equal, values to the output link. It also transfers records whose key columns contain unequal values from both the input links to the output link. It treats both the inputs symmetrically. 

joinFullOuterRes - Join Stage Example

Left Outer Join:

join7 - Join Stage Example

Output Data:

It transfers all the records from the Left link and transfers values from the Right link only where key columns match. If a record doesn’t contain a value or so for a particular matched column will be updated with a NULL or a default value. Ex, for employeeID A8920,B6840,C1130 for Columns Currenjob,startDate,endDate,isManager. 

joinLeftOuterRes - Join Stage Example

Right Outer Join:

join8 - Join Stage Example

Output Data:

It transfers all the records from the Right link and transfers values from the Left link only where key columns match. If a record doesn’t contain a value or so for a particular matched column will be updated with a NULL or a default value. 

joinRightOuterRes - Join Stage Example