DataStage interview questions
1) How can we achieve parallelism?
The degree of parallelism is set by the number of nodes defined in the configuration file. Defining multiple nodes there lets the same job run on several partitions of the data at once.
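As a rough illustration of what a multi-node configuration means for the data, the sketch below splits rows across a given number of nodes by hashing a key column. It is only a conceptual model in Python, not DataStage's engine; the function name `partition_rows`, the column `cust_id`, and the choice of hash partitioning are assumptions made for the example.

```python
import zlib
from collections import defaultdict

def partition_rows(rows, key, node_count):
    """Assign each row to one of node_count partitions by hashing its key."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is used instead of hash() so the assignment is stable across runs.
        node = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[node].append(row)
    return dict(partitions)

# Hypothetical input: eight rows spread over a four-node layout.
rows = [{"cust_id": i, "amount": i * 10} for i in range(8)]
for node, part in sorted(partition_rows(rows, "cust_id", 4).items()):
    print(node, [r["cust_id"] for r in part])
```

Each partition could then be processed independently, which is the effect that adding nodes to the configuration file is meant to achieve.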
2) What are Stage Variables, Derivations and Constraints?
Stage Variable - An intermediate processing variable that retains its value while rows are read and does not pass that value to a target column.
Derivation - An expression that specifies the value to be passed on to the target column.
Constraint - A condition that evaluates to true or false and controls the flow of data down a link.
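The three ideas can be sketched in plain Python as a loop over rows. This is a minimal model under stated assumptions, not Transformer stage syntax: the function `transform` and the columns `customer` and `amount` are hypothetical. The stage variable is `prev_customer`, the derivations are the expressions that fill the output columns, and the true/false condition that decides whether a row flows down the link (a constraint in Transformer terms) is the `if` test.

```python
def transform(rows):
    """Mimic a Transformer: stage variable, derivations, and a link condition."""
    prev_customer = None  # stage variable: kept between rows, never written out directly
    output = []
    for row in rows:
        # The stage variable lets the current row be compared with the previous one.
        is_new_customer = row["customer"] != prev_customer
        prev_customer = row["customer"]
        # Link condition: only rows with a positive amount flow to the output.
        if row["amount"] > 0:
            output.append({
                "customer": row["customer"].upper(),  # derivation for the target column
                "amount": row["amount"],              # derivation: pass the value through
                "first_row": is_new_customer,         # derivation based on the stage variable
            })
    return output

sample = [
    {"customer": "a", "amount": 5},
    {"customer": "a", "amount": 3},
    {"customer": "a", "amount": 0},
    {"customer": "b", "amount": 7},
]
for out_row in transform(sample):
    print(out_row)
```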
3) Compare and contrast ODBC and Plug-In stages.
ODBC:
a) Generally slower performance.
b) Can be used with a variety of databases.
c) Can handle stored procedures.
Plug-In:
a) Generally better performance.
b) Database specific (works with only one database).
c) Cannot handle stored procedures.
4) How do you run a shell script within the scope of a DataStage job?
In the Designer, choose Edit -> Job Properties, go to the Before-job or After-job subroutine setting, select ExecSH, and supply the script to run as its input value.
5) How do you merge two files in DS?
If the metadata of the two files is the same, use a copy command as a before-job subroutine to combine them. If the metadata is different, create a job that concatenates the two files into one.
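For the same-metadata case, the combining step amounts to appending one file to the other. The sketch below shows that idea in Python rather than as an operating system copy command; the function name `concatenate_files` and the file names are hypothetical, and it assumes the first file ends with a newline so that rows do not run together.

```python
import shutil
import tempfile
from pathlib import Path

def concatenate_files(first, second, target):
    """Write the contents of two same-layout files, in order, into target."""
    with open(target, "wb") as out:
        for source in (first, second):
            with open(source, "rb") as src:
                shutil.copyfileobj(src, out)

# Usage with two small hypothetical files in a temporary directory.
work_dir = Path(tempfile.mkdtemp())
(work_dir / "part1.txt").write_text("1,Ann\n2,Bob\n")
(work_dir / "part2.txt").write_text("3,Cid\n")
concatenate_files(work_dir / "part1.txt", work_dir / "part2.txt", work_dir / "merged.txt")
print((work_dir / "merged.txt").read_text())
```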
6) How can we pass parameters from one job to another job by using the command line?
We can pass parameters to a job in two ways: from the command line with dsjob, or from a sequencer.
Another option is to configure a single parameter set (version 8.0 onwards) and use it in both jobs so that they share the same set of parameters.
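For the command-line route, a wrapper script typically assembles a `dsjob -run` call with one `-param name=value` pair per parameter, followed by the project and job names. The sketch below only builds that argument list; the helper `build_dsjob_command` and the project, job, and parameter names are hypothetical, and other dsjob options are left out.

```python
def build_dsjob_command(project, job, params):
    """Build the argument list for: dsjob -run -param name=value ... project job."""
    command = ["dsjob", "-run"]
    for name, value in params.items():
        command += ["-param", f"{name}={value}"]
    command += [project, job]
    return command

# Hypothetical project, job, and parameter names.
args = build_dsjob_command("dev_project", "load_customers", {"RUN_DATE": "2024-01-31"})
print(" ".join(args))
# prints: dsjob -run -param RUN_DATE=2024-01-31 dev_project load_customers
```

On a machine where dsjob is installed and on the path, the same list could be handed to `subprocess.run`.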
7) When we are extracting flat files, what are the basic required validations?
The following are some common validations:
a) Check for blank lines and remove them.
b) Check the number of columns in each row of the file.
c) If the flat file has a trailer line containing additional information such as the total number of records, cross-check that the record count given in the trailer matches the actual number of records.
d) Check whether a column contains a blank value when it is expected to have a value.
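The four checks above can be sketched as a single pass over the file's lines. This is a minimal example under stated assumptions: the function `validate_flat_file` is hypothetical, the file is comma-delimited, and the trailer is assumed to be the last non-blank line in the form `TRAILER,<count>`. Real files will use their own layout.

```python
def validate_flat_file(lines, expected_columns, delimiter=",", required=()):
    """Return (rows, errors) after applying checks a) to d)."""
    errors = []
    # a) Drop blank lines.
    non_blank = [line.rstrip("\n") for line in lines if line.strip()]
    if not non_blank:
        return [], ["file is empty"]
    *body, trailer = non_blank
    rows = []
    # Row numbers below count non-blank data lines, starting at 1.
    for number, line in enumerate(body, start=1):
        fields = line.split(delimiter)
        # b) Every row must have the expected number of columns.
        if len(fields) != expected_columns:
            errors.append(f"row {number}: expected {expected_columns} columns, got {len(fields)}")
            continue
        # d) Columns listed in `required` must not be blank.
        for index in required:
            if not fields[index].strip():
                errors.append(f"row {number}: column {index} is blank")
        rows.append(fields)
    # c) Cross-check the record count declared in the assumed trailer layout.
    parts = trailer.split(delimiter)
    if len(parts) == 2 and parts[0] == "TRAILER" and parts[1].isdigit():
        if int(parts[1]) != len(body):
            errors.append(f"trailer declares {parts[1]} records, found {len(body)}")
    else:
        errors.append("trailer line is missing or malformed")
    return rows, errors

sample_lines = ["1,Ann,10\n", "\n", "2,,20\n", "3,Bob\n", "TRAILER,3\n"]
good_rows, problems = validate_flat_file(sample_lines, expected_columns=3, required=(1,))
print(good_rows)
print(problems)
```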
8) How do you do usage analysis in DataStage?
1. To find out whether a job is part of a sequence, right-click the job in the Manager and select Usage Analysis. It shows all of the job's dependents.
2. To find how many jobs are using a particular table.
3. To find how many jobs are using a particular routine.
In the same way, you can find all the dependents of a particular object.
The results are nested: you can move forward and backward through them and see all the dependents.
9) Types of parallel processing?
Parallel processing is broadly classified into two types:
a) SMP - Symmetric Multiprocessing.
b) MPP - Massively Parallel Processing.
10) Do you know about MetaStage?
MetaStage is used to manage metadata, which is very useful later on for data lineage and data analysis. Metadata defines the type of data we are handling. These data definitions are stored in a repository and can be accessed through MetaStage.
11) Difference between a hash file and a sequential file?
Searching for a record is very fast in a hash file because the hash key gives the address of the record directly. In a sequential file the search can only be sequential, going through the file record by record. In addition, duplicate records can be removed in a hash file based on the hash key, which is not possible in a sequential file.
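The difference in access pattern can be sketched with a Python list standing in for the sequential file and a dictionary standing in for the hash file. This is an analogy only, with hypothetical names (`find_sequential`, `build_hashed`, the `id` key); in this sketch a later record with the same key replaces the earlier one, which is how the duplicates disappear.

```python
def find_sequential(records, key):
    """Scan record by record until the key matches, as with a sequential file."""
    for record in records:
        if record["id"] == key:
            return record
    return None

def build_hashed(records):
    """Index records by key; a later record with the same key replaces the earlier one."""
    return {record["id"]: record for record in records}

records = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
    {"id": 1, "name": "duplicate of 1"},
]
hashed = build_hashed(records)
print(find_sequential(records, 2))  # found only after scanning the earlier records
print(hashed[2])                    # found directly through the key
print(len(hashed))                  # duplicate keys have been collapsed
```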
12) If I add a new environment variable in Windows, how can I access it in DataStage?
You can view all the environment variables in the Designer under Job Properties. From Job Properties you can also add environment variables to the job and access them.
13) What is the difference between the Lookup File stage and the Lookup stage?
The Lookup stage performs the lookup itself, matching source data against a reference data set.
The Lookup File stage, by contrast, creates the reference data set that the Lookup stage then uses to perform the lookup against the source data.
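The division of labour between the two stages can be sketched as two functions: one prepares the keyed reference data once, the other probes it for every source row. This is a conceptual Python model, not stage configuration; the function names, the `cust_id` key, and the choice to collect unmatched rows in a reject list are assumptions for the example.

```python
def build_reference(reference_rows, key):
    """Role of the stage that creates the reference data: index it by key once."""
    return {row[key]: row for row in reference_rows}

def lookup(source_rows, reference, key):
    """Role of the Lookup stage: enrich each source row from the reference data."""
    matched, rejected = [], []
    for row in source_rows:
        ref = reference.get(row[key])
        if ref is None:
            rejected.append(row)  # no reference row for this key
        else:
            # Source values win if both rows carry the same column name.
            matched.append({**ref, **row})
    return matched, rejected

reference = build_reference(
    [{"cust_id": 1, "region": "EU"}, {"cust_id": 2, "region": "US"}], "cust_id"
)
orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 3, "amount": 20}]
matched_rows, rejected_rows = lookup(orders, reference, "cust_id")
print(matched_rows)
print(rejected_rows)
```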
14) What is the difference between symmetric multiprocessing and massively parallel processing?
Symmetric Multiprocessing (SMP) - Some hardware resources may be shared between processors. The processors communicate via shared memory and run under a single operating system.
Cluster or Massively Parallel Processing (MPP) - Known as shared nothing, because each processor has exclusive access to its hardware resources. Cluster systems can be physically dispersed. The processors have their own operating systems and communicate via a high-speed network.