The Head stage can have a single input link and a single output link. It selects the first N rows from each partition of an input data set and copies the selected rows to an output data set.
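The per-partition behaviour described above can be sketched in plain Python. This is only an illustration, not DataStage code: the function name `head` and the list-of-lists representation of partitions are assumptions made for the example.

```python
def head(partitions, n):
    """Return the first n rows of each partition, keeping partitions separate."""
    # Slicing never fails when a partition holds fewer than n rows.
    return [partition[:n] for partition in partitions]


# Two hypothetical partitions of the same data set.
partitions = [
    [{"id": 1}, {"id": 2}, {"id": 3}],
    [{"id": 4}, {"id": 5}],
]
selected = head(partitions, 2)
```

Note that N applies to each partition separately, so a data set split into P partitions can yield up to N × P rows in total.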
The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed. The new data set is then output. (See also the Row Generator stage, which generates complete sets of mock data.)
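As a rough illustration of adding generated columns to existing rows, the sketch below extends each row with one mock value per new column. The function name `add_generated_columns`, the column name `seq`, and the use of a simple counter as the mock-data source are all assumptions for the example; the real stage has its own generation options.

```python
from itertools import count


def add_generated_columns(rows, generators):
    """Yield each input row extended with one generated value per new column."""
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name, gen in generators.items():
            new_row[name] = next(gen)
        yield new_row


rows = [{"name": "a"}, {"name": "b"}]
# "seq" is a hypothetical new column filled with 1, 2, 3, ...
out = list(add_generated_columns(rows, {"seq": count(1)}))
```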
The Remove Duplicates stage is a processing stage. It can have a single input link and a single output link. It takes a single sorted data set as input, removes all duplicate rows, and writes the results to an output data set. It also lets you choose which row of each set of duplicates to keep: the First or the Last.
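The reason the input must be sorted is that duplicates then sit next to each other, so they can be removed in a single pass. A minimal sketch in Python, assuming rows are dictionaries and that `remove_duplicates` and its `retain` parameter are names invented for this example:

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(sorted_rows, key, retain="first"):
    """Keep one row per key value from input already sorted on that key."""
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    result = []
    # groupby only merges adjacent equal keys, hence the sorted-input requirement.
    for _, group in groupby(sorted_rows, key=itemgetter(*key)):
        group = list(group)
        result.append(group[0] if retain == "first" else group[-1])
    return result


rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
kept_first = remove_duplicates(rows, ["id"], retain="first")
kept_last = remove_duplicates(rows, ["id"], retain="last")
```

Here `kept_first` keeps the row with `v` equal to `"a"` for `id` 1, while `kept_last` keeps the row with `v` equal to `"b"`.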
The Lookup stage is most appropriate when the reference data for all Lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the data sets are larger than the available memory resources, use the Join or Merge stage instead.
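The memory constraint follows from how an in-memory lookup works: the whole reference data set is loaded into a table before any stream row is processed. The sketch below shows the idea in Python. The function name `lookup` is hypothetical, and dropping rows that have no match is a simplification chosen for this example, not a statement about the stage's behaviour.

```python
def lookup(stream_rows, reference_rows, key):
    """Enrich stream rows with columns from an in-memory reference table."""
    # The entire reference set is held in memory; this is the part that must fit.
    table = {ref[key]: ref for ref in reference_rows}
    for row in stream_rows:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}
        # Assumption for this sketch: rows without a match are simply dropped.


reference = [{"code": "US", "country": "United States"}]
stream = [{"code": "US", "amount": 10}, {"code": "XX", "amount": 5}]
enriched = list(lookup(stream, reference, "code"))
```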
The Change Capture stage is a processing stage. As the name suggests, it captures the changes between two input data sets by comparing them on a key column. The two input links are connected to the stage under the default link names ‘Before’ and ‘After’. Each captured change is reported in the output as a ‘Change code’ in a separate column.
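The comparison described above can be sketched as follows. This is an illustration in Python, not the stage's implementation: the function name, the `change_code` column name, and the numeric values assigned to each kind of change are assumptions made for the example and should be checked against the product documentation.

```python
# Illustrative code values; treat the exact numbers as an assumption.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3


def change_capture(before, after, key):
    """Compare two data sets on a key column and tag each row with a change code."""
    before_by_key = {r[key]: r for r in before}
    after_by_key = {r[key]: r for r in after}
    out = []
    for k in sorted(before_by_key.keys() | after_by_key.keys()):
        b, a = before_by_key.get(k), after_by_key.get(k)
        if b is None:
            out.append({**a, "change_code": INSERT})  # only in 'After'
        elif a is None:
            out.append({**b, "change_code": DELETE})  # only in 'Before'
        elif a == b:
            out.append({**a, "change_code": COPY})    # unchanged
        else:
            out.append({**a, "change_code": EDIT})    # same key, different values
    return out


before = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}, {"id": 3, "v": "z"}]
after = [{"id": 1, "v": "x"}, {"id": 2, "v": "Y"}, {"id": 4, "v": "w"}]
changes = change_capture(before, after, "id")
codes = [row["change_code"] for row in changes]
```

In this example row 1 is unchanged, row 2 is edited, row 3 is deleted, and row 4 is inserted.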
The DataStage Designer is the primary interface to the metadata repository and provides a graphical user interface that enables you to view, edit, and assemble DataStage objects from the repository needed to create an ETL job.
The Join stage is a processing stage. It performs join operations on two or more input data sets and outputs the resulting data set. It supports four types of join: ‘Left Outer Join’, ‘Right Outer Join’, ‘Inner Join’, and ‘Full Outer Join’.
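The difference between the four join types is easiest to see on a small example. The sketch below joins two data sets only, for brevity, and uses a hypothetical `join` function with a `how` parameter. Unmatched rows simply lack the other side's columns here, where a real join would carry nulls.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on a key column."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError("unknown join type: " + how)
    right_index = {}
    for r in right:
        right_index.setdefault(r[key], []).append(r)
    out, matched_keys = [], set()
    for l in left:
        matches = right_index.get(l[key], [])
        if matches:
            matched_keys.add(l[key])
            out.extend({**l, **r} for r in matches)
        elif how in ("left", "full"):
            out.append(dict(l))  # keep unmatched left rows
    if how in ("right", "full"):
        for k, rows in right_index.items():
            if k not in matched_keys:
                out.extend(dict(r) for r in rows)  # keep unmatched right rows
    return out


left = [{"id": 1, "l": "a"}, {"id": 2, "l": "b"}]
right = [{"id": 2, "r": "x"}, {"id": 3, "r": "y"}]
inner = join(left, right, "id", "inner")
full = join(left, right, "id", "full")
```

Only `id` 2 appears on both sides, so the inner join returns one row, while the full outer join returns three.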
The Funnel stage is a processing stage. It copies multiple input data sets to a single output data set, which is useful for combining separate data sets into a single large one. The stage can have any number of input links and a single output link.
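A minimal way to picture the Funnel stage is concatenation of several inputs into one output. In the sketch below the function name `funnel` is hypothetical, and appending the inputs one after another is just one possible ordering, chosen for simplicity.

```python
from itertools import chain


def funnel(*inputs):
    """Combine any number of input data sets into a single output data set."""
    # Inputs are appended in the order given; no rows are dropped or merged.
    return list(chain.from_iterable(inputs))


north = [{"id": 1}, {"id": 2}]
south = [{"id": 3}]
west = []
combined = funnel(north, south, west)
```

Unlike a join, no key comparison is involved: every input row appears once in the output, so the inputs are expected to share the same columns.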