Change Capture Stage Example
The Change Capture Stage is one of a processing stage and the purpose of this stage as the name suggests is to capture the change between two input data by comparing them based on a Key column. The two input links are linked with Change Capture stage by the two default link names i.e. ‘Before’ and ’After’. This change captured is mentioned in the output in the form of ‘Change code’ in a separate column. These codes are:
‘0’ means, if the data is copied as it is from ‘Before’ Link to ‘After’ Link.
‘1’ means, the data is newly Inserted in the ‘After’ link.
‘2’ means, the data is Deleted from ‘Before’ link.
‘3’ means, the data is Edited in ‘After’ link from ‘Before’
In this example, we have two employee databases ‘BeforeEmployee’ and ‘AfterEmployee’. BeforeEmployee will have database which is the employee database and the data is goin to be pulled from it as in Before link. AfterEmployee will have the database which gets updated after certain interval of time and stores the updated version of Employee database.
Here, we are going to capture the changes happened between the BeforeEmployee and AfterEmplyoee databases.
Design the job as shown below:
1. BeforeEmployee data:
2. AfterEmployee data:
Open the Change capture stage properties by double click on the stage or right clickàselect the properties from drop down menu. Under Stage tab select the Properties option. Here select the Key column/set of columns on based of which the input data needs to be compared.
Select the Sort order to get a sorted data in the output. Nextà Select the Change Values, for the columns the change of data needs to be captured. Here in our case, we have selected ‘middlename’ and ‘dateofbirth’, which means if there is any change in after and before link data for these column, the change will be captured and a change code is generated.
Under Stage tab, select the option Link ordering where we can set the Before and After link as per our requirement.
Under Output tab, select the mapping option and provide the mapping from source to target. Here we can see that ChangeCode() column is an extra column in the Output tab along with the source which will reflect the changes that has been captured between the two input databases.
We can pull this column to target to see the changes.
Next step, configure the output Dataset to capture the target data. Save and compile the job.
Run the job to see the result.
Here we can see the change code for each row from before and after link.
0 à Data is copied as it is from before to after link.
1à Data is newly inserted in after link.
2à Data is deleted from after link. (Here the employeeID and other two columns for the deleted record can be seen. As employeeID is a key column)
3à Data is Edited in after link from before link. (Here for employeeID A8920 and K7822, Middle name has been changed in after link)