Look up Stage Example
The Lookup stage is most appropriate when the reference data for all lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger than available memory resources, the JOIN or MERGE stage should be used.
Lookup stages do not require data on the input link or reference links to be sorted. Be aware, though, that large in-memory lookup tables will degrade performance because of their paging requirements. Each record of the output data set contains columns from a source record plus columns from all the corresponding lookup records where corresponding source and lookup records have the same value for the lookup key columns. The lookup key columns do not have to have the same names in the primary and the reference links.
The optional reject link carries source records that do not have a corresponding entry in the input lookup tables.
You can also perform a range lookup, which compares the value of a source column to a range of values between two lookup table columns. If the source column value falls within the required range, a row is passed to the output link. Alternatively, you can compare the value of a lookup column to a range of values between two source columns. Range lookups must be based on column values, not constant values. Multiple ranges are supported.
· Equality match:
Match exactly values in the lookup key column of the reference link to selected values in the source row. It returns row or rows (if multiple matches are to be returned) that match.
· Caseless match:
It is like an equality match except that it’s caseless. Ex, “xyz” matches “XyZ”.
· Range on the reference link:
Here, two columns on the reference link define the range. A match occurs when a selected value in the source row is within the range.
· Range on the source link:
Here, two columns on the source link define the range. A match occurs when a selected value in the reference row is within the range.
Lookup Failure Options :
Fail: (Default): when lookup fails, Stage reports an error and the job fails/aborts immediately.
Drop: when lookup fails, unmatched records are dropped.
Continue: when lookup fails, input records are transferred to the output as it is. Reference link columns are filled with Null or default values.
Reject: when lookup fails, input row is sent to a reject link. This requires that a reject link has been created for the stage; else stage will throw an error.
In below example, we have 2 input files i.e. ‘Emplyoee.txt’ and ‘Job_Desc.txt’.
Here we are going to lookup both the files on a key column ‘employeeID’ and pushing the data to the target Dataset file.
1. Employee.txt contains Employee data.
Design the job as shown below:
Open Lookup stage properties window, we can see the columns from both input files and a blank output. Here we need to lookup ‘employeeID’ column and pull the data to target columns.
Select the ‘employeeID’ column from Lnk_Emp to ‘employeeID’ column of Lnk_Job which is an ‘Equality match lookup’ option.
Setup the lookup conditions/failure option by selecting the constraints tab from Left corner of the mapping column window.
Lookup Failure Options:
Modify the earlier job design and add a reject link to capture the rejected data into a file.
Multiple rows returned from link:
Select the link name from the Multiple rows returned from link list (note that only one reference link in a Lookup stage is allowed to return multiple rows, and that this feature is only available for in-memory lookups).