(转)informatica 面试题大全
1 What is the difference between a data warehouse and a data mart?
Ø Dataware house:
It is a collection of data marts. Represents historical data. a data warehouse is a relational database which is specially designed for analysis purpose rather then for transactional purpose.
Ø Data mart:
It is a sub set of data ware housing.It can provide the data to analyze query reporting
& analysis. a datmart is subject oriented database which gives the data about each and every individual department in an organisation.
2 how can one connect two fact tables ? is it possible ? how?
This confirm dimenstion methodology.If a dimension table is connected to more then one Fact table is called confirm dimension.
Fact Tables are connected by confirmed dimensions, Fact tables cannot be connected directly, so means of dimension we can connect
3 suppose data are coming from different locations and those data will not change . is there any need to use surrogate key ?
Yes, We should use the surrogate key, here we are getting data from different locations means every one have one primary key, while transforming the data into target that time more than two key not in use so if you use surrogate key it will identified the duplicate fields in dimensional table.
4 what is the difference between aggregate table and fact table ? how do you load these two tables ?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables or aggregated fact. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
5 What Oracle features can be used to optimize my Warehouse system?
Partition table, bitmap index, sequence ,table function ,sql loader ,function like cube ,roll_up etc.
6 When should you use a STAR and when a SNOW-FLAKE schema?
The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database .The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.Snowflake schema is nothing but one dimension table will be connected to another dimension table and so on.
Snowflake
? If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking may be appropriate.
? A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.
? A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
? Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snow flaking would typically be required to permit simple query tools such as Cognos
Power play to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed.
Star
The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts.The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding. Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF). If you want to
normalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
7 When should one use an MD-database (multi-dimensionaldatabase) and not a relational one?
1 Because More than one dimensions can be shareble for Other Department
2 The Physical Load will be less.
3 Less Complexity of Fact
8 What is the difference between an ODS and a W/H?
An ODS is an environment that pulls together, validates, cleanses and integrates data from disparate source application systems. This becomes the foundation for providing the end-user community with an integrated view of enterprise data to enable users anywhere in the organization to access information for strategic and/or tactical decision support, day-to-day operations and management reporting.
The defination of Data Warehouse is as follows.
? Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
? Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
? Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future reporting;
? Integrated, meaning that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.
Difference
Ods Transactions similar to those of an Online Transaction Processing System
Data Warehouse Queries process larger volumes of data
Ods Contains current and near current data
Data Warehouse Contains historical data.
Ods Typically detailed data only, often resulting in very large data volumes
Data Warehouse Contains summarised and detailed data,generally smaller in size than on ODS
Ods Real-time and near real-time data loads
Data Warehouse Typically batch data loads
Ods Generally modeled to support rapid data update
Data Warehouse Generally dimensionally modeled and tunes to optimise query performance
Ods Updated at the data field leve
Data Warehouse Data is appended, not updated
Ods Used for detailed decision making and operational reporting
Data Warehouse Used for ling-term decision making and management reporting
Ods Knowledge workers (customer service representatives, line managers)
Data Warehouse Strategic audience (executives, business unit management)
9 What is the difference between a W/H and an OLTP application?
Warehouse is used for high level data analysis purpose .It is used for predictions, time series analysis, financial analysis , what -if simulations etc. Basically it is used for better decision making.
OLTP is NOT used for analysis purpose.It is used for transaction and data processing.
Its basically used for storing day-to-day transactions that take place in an organisation.
The main focus of OLTP is easy and fast inputing of data, while the main focus in data warehouse is easy retrieval of data.
OLTP doesnt store historical data.(this is the reason why it cant be used for analysis)
DW stores historical data
10 What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
ROLAP ROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database(RDBMS).
MOLAP MOLAP(Multidimensional OLAP), provides the analysisof data stored in a multi-dimensional data cube.
HOLAP HOLAP(Hybrid OLAP) a combination of both ROLAP and MOLAP can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database(RDBMS).
DOLAP DOLAP(Desktop OLAP or Database OLAP)provide multidimensional analysis locally in the client machine on the data collected from relational or multidimensional database servers.
11 what are the types of dimension tables
1.Conformed Dimension. 2.Junk Dimension. 3.Degenerated Dimension.
4.Slowly changing Dimensions.
12 What is a Data Cube?
Data cube is the logical representation of multidimensional data .The edge of the cube contains dimentions and the body of the cube contains datas.
13 How does data mining and data warehousing work together?
data warehousing is used to store the historical data.by using dwh bsiness users can analize thier business.data mining is used to predict the future .dwh will act as the source for the data mining
14 What is Bulk Insert?
In informatica ..bulk insert or bulk load does 2 things :-
1) Ignores the commit interval specified at the session level.
2) Do not create a database session log file.
So, advantage is it's very fast as no entry goes into log.
Disadvantage is session cannot be rolled back as no entry exists in the log file .
15 What is surrogate key?
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate
(2):Surrogate (1) This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible by the user or application.
Surrogate (2) This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.
16 What is a diff between joiner and lookup transformation?
Joiner will join the two different data sources based on a join condition ,and pass only the rows which satisfy that condition.discards the remaining rows.Joiner transformation supports 4 types of joins at Informatica level Normal .Master Outer.Detail Outer.Full Outer.LookUp Transformation
Lookup transformation basically for Reference ,based on the.lookup condition .when u want some data based on target data ,will take lookup on that particular table and retrieve the corresponding fields from that table.we can override the lookup transformation using the SQL
query.
17 What is the difference between View and Materialized View ?
A view has a logical existence but a materialized view has a physical existence .Moreover a materialized view can be indexed , analisied and so on....that is all the things that we can do with a table can also be done with a materialized view.
18 How could we generate the sequence of key values without using sequence generator transformation in the target ?
Do a lookup on the Target table with an Lookup SQl Override Select MAX(FIELD_NAME), field 1 , field3 from target group by field1, field2...In the Expression increment the Max values of the field
which you just got from the lookup by 1.Here MAX_FIELDNAME) is the Max value of the field you want to generate the sequence of..
19 why we need to use unconnected transformation?
Unconencted is used when ever u want to call the same transformation several times and u have one return port.We use unconnected transformation to use multiple number of tables or views without physically taking the entity into mapping.This kind of transformation is also helpful when single return port is required.Use dynamic cache if u want to update the case while updating the target table itself and static is untouched with the cache.
20 Whats the difference between $,
$
$ - These are the system variables like $Bad file,$input file, $output file, $DB connection
$$ - Can any one tell me the scenario with example for user defined variables
$SessStartTime
$SessStartTime returns the session start time as a string value. The format of the string depends on the database you are using.
21 What is Factless fact table ???
A Fact table without measures(numeric data) for a column is called Factless Fact table.
FACT LESS FACT TABLES ARE USED TO CAPTURE DATE TRANSACTION EVENTS
22 explain the scenario for bulk loading and the normal loading option in Informatica Work flow manager ???
Normal: In this case server manager allocates the
resources(Buffers) as per the parameter settings. It creates the log files in database.
Bulk: In this case server manager allocates maximum resources(Buffers) available irrespective of the parameter
settings. It will not create any log files in database. In first case data loading process will be time taking process but other applications are not affected. While in bulk data loading will be much faster but other application are affected.
Normal Load: It loads the record one by one and writes log each file.It will take more time to complete.
Bulk Load: Load the number of records at a time ,It wont fallow ant log files or trace levels,It takes less time .
23 Why is meant by direct and indirect loading options in sessions?
we use file type direct when we are loading single file into target. we use Indirect when we want to load multiple files through single session in the mapping
24 What is the method of loading 5 flat files of having same structure to a single target and which transformations I can use?
This can be handled by using the file list in informatica. If we have 5 files in different locations on the server and we need to load in to single target table.In session properties we need to change the file type as Indirect.am taking a notepad and giving following paths and file names in this notepad and saving this notepad as emp_source.txt in the directory /ftp_data/webrep/
/ftp_data/webrep/SrcFiles/abc.txt /ftp_data/webrep/bcd.txt
/ftp_data/webrep/srcfilesforsessions/xyz.txt /ftp_data/webrep/SrcFiles/uvw.txt
/ftp_data/webrep/pqr.txt
In session properties i give /ftp_data/webrep/ in the directory path and file name as emp_source.txt and file type as Indirect.
25 On a day, I load 10 rows in my target and on next day if Iget 10 more rows to be added to my target out of which 5 are updated rows how can I send them to target? How can I insert and update the record?
the best way to do this use the slowy changing dimension inthe mappings-->wizzard-->slowly changing dimension-->type1 here u need to select the source and target tables.take a look t/f and a update strategy t/f. basing on look up , if the record exits in target then reject it ,if not exit insert it,and if the record exist but it is changed then update it
26 In real time scenario where can we use mapping parameters and variables?
Before using mapping parameters and mapping variables weshould declare these things in mapping tab of mapping
designer.
A mapping parameter cannot change untill the session has completed unless a mapping variable can be changed in between the session.
Example:::
if we declare mapping parameter we can use that parameter untill completing the session,but if we declare mapping variable we can change in between sessions.Use mapping variable in Transcation Control Transformation......
27 By using Filter Transformation,How to pass rows that doesnot satisfy the condition(discarded rows) to another target?
Connect the ports of the filter transformation to the second target table and enable the 'FORWARD REJECTED ROWS' in the properties of the filter transformation. The rejected rows will be forwarded to this table.Well You can Use Router iif you need rejected rows along with satisfied rows otherwis you just give the condition for filter tx as you want it in your target table.
28 A table contains some null values .how to get ( Not Applicable) in place of that null value in target ?
with the help of ISNULL() function of the Informatica In the column properties sheet, write N/A in the Default value text box for the particular column
29 WHAT IS FACT TABLES?
Fact table is the primary table in the dimensional modeling . the numerical performance of measures of the business stored in fact table mostly used facts ara numeric and additive not every numeric is a fact ,but a numeric wich ara of type key performance indicator is called fact
30 WHAT IS UPDATE OVERRIDE . DIFFERENCE BETWEEN SQL OVERRIDE AND UPDATE OVERRIDE ?
Update Override it is an option available in TARGET instance .By defalut Target table is updated based on Primary key values .To update the Target table on non primary key values u can generate the default Querey and override the Querey according to the requiremnet.Suppose for example u want to update the record in target table When a column value='AAA' then u can include this condition in where clause of default Querey.Coming to SQL override it is an option available in Source Qualifier and Lookup transafornmation where u can include joins ,filters,Group by and order by.
31 In a mapping i have three dimensions. If i want to pass a same surrogate key value to all the three dimensions by using one sequence generator is possible?If the mapping is containing single flow? And in the same case if the mapping is contaning 3 flows for the three dimensions then by using one sequence generator can we populate surrogate key (same value) to all the three dimensions?
Use the Sequence and Expresion transfermations.first genarate the surrogate with Seq trans,then send values to exp trans,connect the exp trans o/p ports to 3 dimentions. First Seq generate the surrogate key like 1,2,3,4,5 we wil pass this column to next tran(Exp) from there we will connect o/p port to dimentions.so '1' wil go all dimentions,then '2' wil go then '3' .....
32 what r the transformations that r not involved in mapplet?
1.Normalizer transformations
2.COBOL sources
3.XML Source Qualifier transformations
4.XML sources
5.Target definitions
6.Other mapplets
7.Pre- and post- session stored procedures
8.sequence generator
33 what is data driven?
data driven is the instruction fed to informatica server
wheather to insert/update/delete row whenever using update
strategy transformation
34 what are the transformations that are used in data cleansing ? and how data cleansing takes place ?
Expression tranformation is used in data cleansing.If ur target table consist of not null column and source table consisting null columns so assign some value in expression tr. and then pass data to target.
35 TWO FLAT FILES ARE THERE , EACH HAVING NO MATCHING COLUMNS . HOW CAN U JOIN THESE TWO USING JOINER TRANSFORMATION ?
This can be done by passing all ports to an expression transformation and then creating a output port say ID=1 in both the expression transformation of each file and then join it using a joiner on ID,hope this helps..
36 HOW TO GET THE LATEST DATA IN SCD ?
IN THREE WAYS WE CAN GET THE LATEST RECORD IN SCD ...
1)IN "SCD TYPE 2 TIME STAMP" FOR LATEST RECORD THE "END DATE" FIELD WILL BE BLANK. THAT MEANS IT IS THE NEW ROW .
2)IN "SCD TYPE 2 FLAG " THE FLAG NUMBER OF THE NEW ROW WILL BE ONE .
3) IN " SCD TYPE 2 VERSION " THE LATEST RECORD IS HAVING THE MAXIMUM VERSION NUMBER
37 what is the function of 'F10' informatica ?
F10 and F5 are used in debugging process By pressing F10, the process will move to the next
transformation from the current transformation and the current data can be seen in the bottom panel of the window.. whereas F5 will process the full data at a stretch..in case of F5, u can see the data in the targets at the end of the process but cannot see intermediate transformation values.
38 two types of data are there . one is mainframe and the other is ascii format . in informatica how can you get both the data in a single format in ascii .
39 WHAT IS UPDATE OVERRIDE . DIFFERENCE BETWEEN SQL OVERRIDE AND UPDATE OVERRIDE ?
Coming to SQL override it is an option available in Source Qualifier and Lookup transafornmation where u can include joins ,filters,Group by and order by.Update Override it is an option available in TARGETinstance. By defalut Target table is updated based on Primary key values.To update the Target table on non primary key values u can generate the default Querey and override the Querey according to the requiremnet.Suppose for example u want to update the record in target table When a column value='AAA' then u can include this condition in where clause of default Querey.
40 WHAT IS THE NAME OF THAT PORT IN DYNAMIC CACHE WHICH IS USED FOR INSERT , UPDATE OPRATION ?
New lookup now
41 what are the reusable tasks in informatica ?
command task
session task
email task
42 how the server recognises , if the session fails after loading the 100 records in to the target Based on commit intervel session commits those many records
into target. suppose if commit intervel is 1000 if session fails after 100 records it won't insert not a single record into target.
43 How do you take care of security using a repository manager
REPOSITORY PRIVILAGES
FOLDER PERMISSION (OWNERS,GROUPS,USERS)
LOCKS (READ,WRITE,EXECUTE.FETCH ,SAVE)
44 if the session fails after 100 records agian we have to starts the session or we go for recovery session
informatcia server has 3 methods to recovering the sessions.
1)Run the session again if the Informatica Server has not issued a commit.
2)Truncate the target tables and run the session again if the session is not recoverable.
3)Consider performing recovery if the Informatica Server has issued at least one commit.Use performing recovery to load the records from where the session fails.
45 what r the values tht r passed between informatics server and stored procedure?
There are 3 types of data passing between informatica server and stored proceduer these are
Input/Output parametors: Stored procedure it receive the inputs and porvied the outputs.
Return Value:Ever data base to provied return value after processing of stored procedure.
Status code: It is used for error handling.
46 why do u use shortcuts in informatica.?
Shortcut is a concept of reusability.If there is a mapping that can be reused across several folders, create it in one folder and use shortcuts of it in other folders. Thus, if you have to make change, you can do it in main mapping which reflects in shortcut mappings automatically.
47 scd methodology?
Type1: No historical datas will be available, when changes are made the old data will be deleted and the new data will be insrted.
Type2: Flag:The old data will be denoted as false and the new data will be denoted as true.
Version:The changes made will be numbered as 0,1,2...so on. Date:The changes along with the date in which they are made are clearly mentioned.
Type3:The latest change which is made is alone available.
48 What is the filename which you need to configure in UNIX while installing infromatica?
in informatica 7, under $PMRootDir there is one utility (script) called pmconfig exist, through it we can configure the inforamtica
49 what is confirmed dimension?
Conformed dimension is a dimension which is connected to or shared by more than one fact table.Eg:A business which takes care of both sales and orders of products then product dimension becomes a conformed dimension for both sales fact and order fact
50 surrogate keys usage in Oracle and Informatica?
surrogate key is one type of key which is used to maintain history .it is used in slowly changing dimension (scd) If u have mulitiple records then for mentain this records we need to generate surrogate key in informatica
51 What happens if you increase commit intervals and also decrease commit Explain grouped cross tab?
if you have increased your commit interval to ~25000 rows the session will run faster but when your session fails at 24000th record you will not have any data in your target.When you decrease your commit interval to ~10000 rows yoursession will be slow when compared to previous but if the session fails at 24000th record you will lose only 4000 records.if commit interval is set to high value, performance will be high. if commit is given for every 1000 rows say for eg,it will affect the performance badly
52 What is an MDDB? What is the difference between MDDBs and RDBMSs?
MDDB stands for Multi Dimensional Database
MDDB: In MDDB, it views data in mutidimensional
(Perspective)i.e.,through various dimensions at a time with the help of cubes developed using dimensions and stores data in multidimensional i.e., stores data in power cubes. In power cubes, each axis is a dimension and each member of dimension is column.In MDDB, at a glance we can see the dimensions and data present in the dimensions
RDDB: In RDDB, it views data in two dimensional and stores data in two dimensional i.e., in rows and columns in a table .In RDDB, we can just see the rows and columns, but only after issuing select over that u can see the data.