informatica faq-14
Go To Data WareHousing Faqs Home
FACTLESS FACT TABLE
Fact tables don't have any facts ($ amounts) at all! They may consist of nothing but keys. These are called factless fact tables. The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. One good example is: track student attendance at a college. Imagine that you have a modern student tracking system that detects each student attendance event each day. With the heightened powers of
Dimensional thinking that you have developed over the past few months, you can easily list the dimensions surrounding the student attendance event.
DEGENRATE DIMENSIONS
In transaction-oriented fact tables, treat the operational control numbers (such as the purchase order or invoice number) as degenerate dimensions. They reside as dimension keys on the fact table, but
don't join to a corresponding dimension table.
Teams are sometimes tempted to create a dimension table with information from the operational header record, such as the transaction number, transaction date, transaction type, or transaction terms. In this case, you'd end up with a dimension table that has nearly as many rows as the fact table. A dimension table growing at nearly the same pace as the fact table is a warning sign that a degenerate dimension
may be lurking within it.
Much of my work involves designing and reviewing dimensional data models, and an interesting issue that often comes up is how to deal with data items such as invoice number, order number and so on, that are not strictly facts - you're not going to want to add them up, or average them, or perform any other maths on them - but they don't seem to fit into existing dimensions.
Ralph Kimball coined the term 'Degenerate Dimensions' for these data items, as they perform much the same function as dimensions: they sit in the fact table and allow you to limit down or 'slice and dice' your fact table measures, but they aren't foreign key links through to dimension tables, as all the information you want - the invoice number, or the order number -is contained in the degenerate dimension column itself. Degenerate dimensions are useful as they tie the transactions, or events, in the fact table back to real-life items - invoices, orders and so on – and they can be a quick way to group together similar
transactions for further analysis.
The key here is not to go overboard and make these degenerate dimensions into full dimension tables – for example, an Invoice dimensions - as in all likelihood this dimension table will grow at the same rate as your fact table. If there is other interesting information to go with the invoice - for example, who the customer was, what products were ordered - this is better placed in specific dimensions for customers and products where it can be stored as a kind of 'master
copy', rather than storing it alongside each order in a ballooning Invoice dimension.
The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on. Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the more dimensions you add to the database.
Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst allowing you to keep useful items in the fact table that help us tie the data warehouse back to the original source systems.
Parameter files, mapping variables when do we use them and why, how do they affect the performance.
Mapping parameter represents a constant value that U can define before running a session. A mapping parameter retains the same value throughout the entire session. When u use the mapping parameter, U declare and use the parameter in a mapping or Mapplet. Then define the value of parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of mapping variable to the repository at the end of session run and uses that value next time U run the session.
Parameter files are used @ session level--
Syntax for parameter files is:
[folder_name.session_name]
parameter_name=value
variable_name=value
Enter this in any editor and give the path in properties.
( Click the Properties tab in session.
Enter the parameter directory and name in the Parameter Filename field. )
We use them generally whenever we want to change dates/pass specific date values
Why we use them is because hard coding the values is difficult for maintenance--having in a txt editor is easy to maintain /change
As far as I can think They don’t effect performance much--(but depends on how many records are there in ur source and what field ur passing...)
what factors do we need to consider/take care of when we migrate our mappings from development to testing... something in relation to connectivity objects.
Lets say we got 3 repositories DEV, QA, PROD.For migration from Dev to Qa Repository, 1st the DBA's will create all the required objects (tables, views...)in the respective schemas(1) in the QA database. We then move (2)(just drag and drop) all the required mappings in designer to the specified folder (3) in QA repository. Validate mapping in Designer. Then we move (just drag and drop again) all the tasks (wf’s) in the workflow manager from dev to qa.(4).Validate WF again.
This is the procedure followed in most of the projects. Also there is 1 more procedure given in guide to do it directly --check that
1)all the tables in a project will be in different schemas according to their functionality
2)You might need to follow the order of migration. First the Shared
Folder then the Folder which is using shared folder objects using shortcuts.
3)In a project we'll have all source to stg mappings in 1 folder and actual maps in 1 more folder and so on......so we have to take care when moving from 1 rep to other. we have move from folder A mappings in dev to folder A in QA
4)After moving WF's we got to change connections to point to QA database
how many mapplets do u on an average create in a project.
Depends---Usually say around 5.we got 2 in our project
What do u need to take care of when we r upgrading from one version to another( what changes do we need to do in our mappings).
None in our mappings except in certain instances. These are the steps for upgradation
PROCESS FLOW (STEPS FOR INFORMATICA UPGRADE):
1) Prepare the Repository
2) Create a copy of the Repository
3) Installation and Configuration of PC7.1.2 Components (client and Repository server)
4) Upgrade the Repository
5) Install and Configure the PC Server
6) Register the PC Server with the Repository
7) Test the Upgradation/Installation
will there be any difference when we r extracting from /loading data to SQL server or DB2 or flat files?
Nope, SQL sever and DB2 is just like oracle table import. For flat files its a little different. Loading is same
where do we use unix shell scripts, and what do they contain
Used for scheduling. talk abt pmcmd
how do we collect performance details of a session
The performance details provide counters that help you understand the session and mapping efficiency. Just check the tab in session.
"You create performance details by selecting Collect Performance Data in the session properties before running the session. By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides session-specific details that can help tune the following:
Buffer block size
Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
Lookup transformations
Before using performance details to improve session performance you must do the following:
Enable monitoring
Increase Load Manager shared memory
Understand performance counters
"
Say its only selected in testing. my experience it slows down the session a lot
Go To Data WareHousing Faqs Home
FACTLESS FACT TABLE
Fact tables don't have any facts ($ amounts) at all! They may consist of nothing but keys. These are called factless fact tables. The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. One good example is: track student attendance at a college. Imagine that you have a modern student tracking system that detects each student attendance event each day. With the heightened powers of
Dimensional thinking that you have developed over the past few months, you can easily list the dimensions surrounding the student attendance event.
DEGENRATE DIMENSIONS
In transaction-oriented fact tables, treat the operational control numbers (such as the purchase order or invoice number) as degenerate dimensions. They reside as dimension keys on the fact table, but
don't join to a corresponding dimension table.
Teams are sometimes tempted to create a dimension table with information from the operational header record, such as the transaction number, transaction date, transaction type, or transaction terms. In this case, you'd end up with a dimension table that has nearly as many rows as the fact table. A dimension table growing at nearly the same pace as the fact table is a warning sign that a degenerate dimension
may be lurking within it.
Much of my work involves designing and reviewing dimensional data models, and an interesting issue that often comes up is how to deal with data items such as invoice number, order number and so on, that are not strictly facts - you're not going to want to add them up, or average them, or perform any other maths on them - but they don't seem to fit into existing dimensions.
Ralph Kimball coined the term 'Degenerate Dimensions' for these data items, as they perform much the same function as dimensions: they sit in the fact table and allow you to limit down or 'slice and dice' your fact table measures, but they aren't foreign key links through to dimension tables, as all the information you want - the invoice number, or the order number -is contained in the degenerate dimension column itself. Degenerate dimensions are useful as they tie the transactions, or events, in the fact table back to real-life items - invoices, orders and so on – and they can be a quick way to group together similar
transactions for further analysis.
The key here is not to go overboard and make these degenerate dimensions into full dimension tables – for example, an Invoice dimensions - as in all likelihood this dimension table will grow at the same rate as your fact table. If there is other interesting information to go with the invoice - for example, who the customer was, what products were ordered - this is better placed in specific dimensions for customers and products where it can be stored as a kind of 'master
copy', rather than storing it alongside each order in a ballooning Invoice dimension.
The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on. Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the more dimensions you add to the database.
Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst allowing you to keep useful items in the fact table that help us tie the data warehouse back to the original source systems.
Parameter files, mapping variables when do we use them and why, how do they affect the performance.
Mapping parameter represents a constant value that U can define before running a session. A mapping parameter retains the same value throughout the entire session. When u use the mapping parameter, U declare and use the parameter in a mapping or Mapplet. Then define the value of parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of mapping variable to the repository at the end of session run and uses that value next time U run the session.
Parameter files are used @ session level--
Syntax for parameter files is:
[folder_name.session_name]
parameter_name=value
variable_name=value
Enter this in any editor and give the path in properties.
( Click the Properties tab in session.
Enter the parameter directory and name in the Parameter Filename field. )
We use them generally whenever we want to change dates/pass specific date values
Why we use them is because hard coding the values is difficult for maintenance--having in a txt editor is easy to maintain /change
As far as I can think They don’t effect performance much--(but depends on how many records are there in ur source and what field ur passing...)
what factors do we need to consider/take care of when we migrate our mappings from development to testing... something in relation to connectivity objects.
Lets say we got 3 repositories DEV, QA, PROD.For migration from Dev to Qa Repository, 1st the DBA's will create all the required objects (tables, views...)in the respective schemas(1) in the QA database. We then move (2)(just drag and drop) all the required mappings in designer to the specified folder (3) in QA repository. Validate mapping in Designer. Then we move (just drag and drop again) all the tasks (wf’s) in the workflow manager from dev to qa.(4).Validate WF again.
This is the procedure followed in most of the projects. Also there is 1 more procedure given in guide to do it directly --check that
1)all the tables in a project will be in different schemas according to their functionality
2)You might need to follow the order of migration. First the Shared
Folder then the Folder which is using shared folder objects using shortcuts.
3)In a project we'll have all source to stg mappings in 1 folder and actual maps in 1 more folder and so on......so we have to take care when moving from 1 rep to other. we have move from folder A mappings in dev to folder A in QA
4)After moving WF's we got to change connections to point to QA database
how many mapplets do u on an average create in a project.
Depends---Usually say around 5.we got 2 in our project
What do u need to take care of when we r upgrading from one version to another( what changes do we need to do in our mappings).
None in our mappings except in certain instances. These are the steps for upgradation
PROCESS FLOW (STEPS FOR INFORMATICA UPGRADE):
1) Prepare the Repository
2) Create a copy of the Repository
3) Installation and Configuration of PC7.1.2 Components (client and Repository server)
4) Upgrade the Repository
5) Install and Configure the PC Server
6) Register the PC Server with the Repository
7) Test the Upgradation/Installation
will there be any difference when we r extracting from /loading data to SQL server or DB2 or flat files?
Nope, SQL sever and DB2 is just like oracle table import. For flat files its a little different. Loading is same
where do we use unix shell scripts, and what do they contain
Used for scheduling. talk abt pmcmd
how do we collect performance details of a session
The performance details provide counters that help you understand the session and mapping efficiency. Just check the tab in session.
"You create performance details by selecting Collect Performance Data in the session properties before running the session. By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides session-specific details that can help tune the following:
Buffer block size
Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
Lookup transformations
Before using performance details to improve session performance you must do the following:
Enable monitoring
Increase Load Manager shared memory
Understand performance counters
"
Say its only selected in testing. my experience it slows down the session a lot
Go To Data WareHousing Faqs Home

<< Home