What is DataStage?

DataStage is a comprehensive ETL tool for building and maintaining data marts and data warehouses quickly and easily. It provides the tools needed to build, manage, and expand them: it extracts data from the source, transforms it, and loads it into the target. Data sources can include sequential files, indexed files, relational databases, external data sources, archives, and enterprise applications. By delivering quality data, DataStage supports business analysis and helps in gaining business intelligence.
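To make the extract, transform, and load steps concrete, here is a minimal sketch in plain Python. It is not DataStage code and does not use any DataStage API; it only illustrates the idea. The sample data, the sales table, and the function names are all assumptions made up for this example, with an in-memory CSV standing in for a sequential file and SQLite standing in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a sequential (flat) file.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
3,carol,
"""


def extract(text):
    """Extract: read rows from a delimited sequential file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, convert types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # reject rows with a missing amount
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_job(text, conn):
    """Chain the three steps, then read back what landed in the target."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")  # throwaway in-memory target database
    print(run_job(SOURCE_CSV, target))
```

In DataStage the same flow would be drawn as a job in the Designer, with stages taking the place of these three functions, rather than written by hand.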

What tasks can it do?

· Design jobs for Extraction, Transformation, and Loading (ETL).

· Support data integration projects such as data warehouses, data marts, and system migrations.

· Import, export, create, and manage metadata for use within jobs.

· Schedule, run, and monitor jobs, all within DataStage.

· Administer your DataStage development and execution environments.

DataStage Architecture

DataStage has a client-server architecture with two engines: the Server engine, which runs DataStage Server jobs, and the Parallel engine, which runs Parallel jobs.

The DataStage Client components include:

  • Administrator – Administers DataStage projects and conducts housekeeping on the server.
  • Designer – Creates DataStage jobs that are compiled into executable programs.
  • Director – Used to run and monitor the DataStage jobs.
  • Manager – Allows you to view and edit the contents of the repository.


The DataStage Server components include:

  • Server – The DataStage Server and Parallel engines, which do the work of ETL and provide a variety of stages through which source data is processed and loaded into target databases.
  • Repository – Stores DataStage objects. It consists of projects, shared metadata, data, and configuration information for Information Server, and is shared with other applications in the suite.

Some DataStage Stages with Examples: