DataStage Main Functions
In its simplest form, IBM InfoSphere DataStage performs data transformation
and movement from source systems to target systems in batch and in real time.
The data sources might include indexed files, sequential files, relational
databases, archives, external data sources, enterprise applications, and
message queues.
DataStage manages data that arrives in real time as well as data that is received on a periodic or
scheduled basis. It enables companies to solve large-scale business problems
with high-performance processing of massive data volumes. By leveraging the
parallel processing capabilities of multiprocessor hardware platforms, DataStage
can scale to satisfy the demands of ever-growing data volumes, stringent
real-time requirements, and ever-shrinking batch windows.
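The general idea behind partition parallelism can be sketched in a few lines. In this sketch, rows are split by a key so that each partition can be processed independently, and the partial results are then combined. This is a hypothetical illustration, not DataStage code or its API. The field names (customer, amount), the partition count, and the per-partition transformation are all assumptions.

```python
"""Hypothetical sketch of hash-partitioned parallel processing (not DataStage code)."""
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives the same value on every run, unlike Python's salted str hash()
        idx = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def transform_partition(rows):
    """Assumed per-partition transformation: total amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals


def run_parallel(rows, key="customer", num_partitions=4):
    """Partition the rows, transform each partition concurrently, then merge."""
    partitions = hash_partition(rows, key, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(transform_partition, partitions))
    # Rows with the same key always land in the same partition, so the
    # partial results never share keys and can be merged with update().
    merged = {}
    for part in partial_results:
        merged.update(part)
    return merged


rows = [
    {"customer": "A", "amount": 10},
    {"customer": "B", "amount": 5},
    {"customer": "A", "amount": 7},
]
print(run_parallel(rows))
```

Partitioning on the grouping key is what allows each partition to be processed without coordinating with the others. Threads are used here only to keep the sketch short. Because of CPython's global interpreter lock, they do not give true CPU parallelism, so an engine that scales across multiprocessor hardware would run the partitions in separate processes or on separate CPUs.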
By leveraging the combined suite of IBM Information Server, DataStage can
simplify the development of authoritative master data by showing where and how
information is stored across source systems. DataStage can also consolidate
disparate data into a single, reliable record, cleanse and standardize
information, remove duplicates, and link records together across systems.
DataStage can then load this master record into operational data stores, data
warehouses, or master data applications such as IBM MDM.
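The consolidation steps described above are standardizing values, removing duplicates, and linking records from several systems into one master record. A minimal sketch can illustrate these steps under simple assumptions. This is a hypothetical example, not DataStage or IBM MDM code. The field names (name, email, source), the choice of email as the matching key, and the "first non-empty value wins" survivorship rule are all assumptions.

```python
"""Hypothetical sketch: standardize, de-duplicate, and link records (not DataStage code)."""
import re


def standardize(record):
    """Collapse whitespace and normalize case so equivalent values compare equal."""
    name = re.sub(r"\s+", " ", record["name"]).strip().title()
    email = record["email"].strip().lower()
    return {**record, "name": name, "email": email}


def build_master_records(records):
    """Group standardized records by email and merge each group into one record.

    Assumed survivorship rule: keep the first non-empty name for each key and
    record every contributing source system (a simple form of record linking).
    """
    masters = {}
    for rec in map(standardize, records):
        key = rec["email"]
        master = masters.setdefault(key, {"email": key, "name": "", "sources": []})
        if not master["name"] and rec["name"]:
            master["name"] = rec["name"]
        if rec["source"] not in master["sources"]:
            master["sources"].append(rec["source"])
    return list(masters.values())


records = [
    {"name": "  jane   DOE ", "email": "Jane.Doe@Example.com ", "source": "CRM"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "source": "Billing"},
    {"name": "john smith", "email": "john@example.com", "source": "CRM"},
]
for master in build_master_records(records):
    print(master)
```

The two Jane Doe rows differ only in spacing and case, so standardization makes them match. They then collapse into a single master record that lists both CRM and Billing as sources. Real matching usually relies on fuzzier rules than an exact key comparison.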
IBM InfoSphere DataStage delivers four core capabilities:
- Connectivity to a wide range of mainframe, legacy, and enterprise
applications, databases, file formats, and external information sources.
- A prebuilt library of more than 300 functions, including data validation rules
and complex transformations.
- Maximum throughput using a parallel, high-performance processing
architecture.
- Enterprise-class capabilities for development, deployment, maintenance, and
high availability. DataStage uses metadata for analysis and maintenance, and it
can operate in batch, in real time, or as a Web service.
IBM InfoSphere DataStage is an integral part of the information integration
process.