data warehousing -1
What is a data-warehouse?
A data warehouse is subject-oriented, integrated, time-variant and non-volatile [data] collection in support of management decision making processes. (OR) A data warehouse, is a collection of data gathered from one or more data repositories to create a new, central database.
Storage of large volumes of data, Historical data, Load and save - no updates
Reporting system, Query and analysis, Trends and forecasting
What are Data Marts?
Data mart is restricted to a single business process/ group focusing more on group specific analysis which can be derived from ore generic EDW.
Union of data marts equal data warehouse
What is ER Diagram?
Entity – Relational diagram. Used to represent the OLTP systems. Highly normalized depicting the relations between the entities.
The relations could be (a) one to one (b) One to many (crowfoot style) (c) Many to many
What is a Star Schema?
Star schema: A modeling paradigm that has fact table in the middle connected to a number of dimensions tables around it radially – The dimension tables are highly de-normalized which minimizes number of joins required and gives good query performance.
What is Dimensional Modelling?
Data Warehousing modeling paradigm in which the entities and relations are remodeled to illustrate business entities and user friendliness.
The normalized structures of ER model are de-normalized and put in STAR / SNOWFLAKE schema.
What Snow Flake Schema?
Snowflake structure: Snowflake is a star schema with normalized dimensions.
What is Data cleaning?
filling in missing values, smoothing noisy data, identifying & removing outliers, correcting inconsistencies, etc.;
What are the Different methods of loading Dimension tables?
Full load & Incremental load.
What are Aggregate tables?
After fact tables are built, any necessary aggregate fact tables must be built. Aggregate tables are structured to define "totals" of data stored in granular fact tables. This pre-summarization allows for faster extracts from the warehouse, avoiding costly repeats of "sum/group by" SQL requests. Aggregate tables may be built at the staging level or may be built as a post-load process in the warehouse DBMS itself.
What are the Differences between OLTP and OLAP?
OLTP OLAP
Functional: day to day operations || Decision support
Db design: application oriented || subject oriented
Data : Current upto date || Historical data
Detailed, flat relational || Summarized, Isolated
Unit of work: Short,simple, transaction || Complex query
Transaction oriented: quick response || Analysis Oriented: User friendly, robust
What is ETL?
Processes of Extracting, Transforming (or Transporting) and Loading (ETL) data from source systems into the data warehouse (or)
Extract Transform and Load – a set of database utilities used to extract information from one database, transform it and load it into a second database. These tools are particularly useful to aggregate data from different database suppliers, e.g., Oracle to Sybase. into a data warehouse
What are the various ETL tools in the Market?
Data Stage, Data Junction, Abinitio, Informatica, Cognos, OWB
What are the various Reporting tools in the Market?
Seagate Crystal Reports, Business Objects, Microstrategy, Cognos
What is Fact table?
A table in a star schema that contains facts. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. (OR)
The tables which are extracted from heterogeneous sources and used in the Data Wareahouse
What is a dimension table?
Dimension tables describe the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
What is a lookup table?
Another name for Dimension table.
What is a general purpose scheduling tool? Name some of them?
To schedule jobs. Operating system can be used for scheduling.
Tools :
What are modeling tools available in the Market? Name some of them?
Visio-based database modeling component, ERWin
What is real time data-warehousing?
An enterprise-wide implementation that replicates data from the publication tables on different servers/platforms to a single subscription table with minimum possible time lag. This implementation effectively consolidates data from multiple sources with least time lag, so that business can see the real-time reports, as in OLTP.
What is data mining?
Data mining is about the discovery of knowledge, rules, trends, or patterns in large quantities of data.
The process of finding hidden patterns and relationships in data. For instance, a consumer goods company may track 200 variables about each consumer. There are scores of possible relationships among the 200 variables. Data mining tools will identify the significant relationships
What is Normalization? First Normal Form, Second Normal Form , Third Normal Form?
Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The condition of the data at completion of each step is described as a "normal form."
First Normal:
Second Normal:
Third Normal:
What is ODS?
The form that data warehouse takes in the operational environment. Operational data stores can be updated, do provide rapid and consistent time, and contain only a limited amount of historical data.
What type of Indexing mechanism do we need to use for a typical datawarehouse?
Bitmap, Btree
A data warehouse is subject-oriented, integrated, time-variant and non-volatile [data] collection in support of management decision making processes. (OR) A data warehouse, is a collection of data gathered from one or more data repositories to create a new, central database.
Storage of large volumes of data, Historical data, Load and save - no updates
Reporting system, Query and analysis, Trends and forecasting
What are Data Marts?
Data mart is restricted to a single business process/ group focusing more on group specific analysis which can be derived from ore generic EDW.
Union of data marts equal data warehouse
What is ER Diagram?
Entity – Relational diagram. Used to represent the OLTP systems. Highly normalized depicting the relations between the entities.
The relations could be (a) one to one (b) One to many (crowfoot style) (c) Many to many
What is a Star Schema?
Star schema: A modeling paradigm that has fact table in the middle connected to a number of dimensions tables around it radially – The dimension tables are highly de-normalized which minimizes number of joins required and gives good query performance.
What is Dimensional Modelling?
Data Warehousing modeling paradigm in which the entities and relations are remodeled to illustrate business entities and user friendliness.
The normalized structures of ER model are de-normalized and put in STAR / SNOWFLAKE schema.
What Snow Flake Schema?
Snowflake structure: Snowflake is a star schema with normalized dimensions.
What is Data cleaning?
filling in missing values, smoothing noisy data, identifying & removing outliers, correcting inconsistencies, etc.;
What are the Different methods of loading Dimension tables?
Full load & Incremental load.
What are Aggregate tables?
After fact tables are built, any necessary aggregate fact tables must be built. Aggregate tables are structured to define "totals" of data stored in granular fact tables. This pre-summarization allows for faster extracts from the warehouse, avoiding costly repeats of "sum/group by" SQL requests. Aggregate tables may be built at the staging level or may be built as a post-load process in the warehouse DBMS itself.
What are the Differences between OLTP and OLAP?
OLTP OLAP
Functional: day to day operations || Decision support
Db design: application oriented || subject oriented
Data : Current upto date || Historical data
Detailed, flat relational || Summarized, Isolated
Unit of work: Short,simple, transaction || Complex query
Transaction oriented: quick response || Analysis Oriented: User friendly, robust
What is ETL?
Processes of Extracting, Transforming (or Transporting) and Loading (ETL) data from source systems into the data warehouse (or)
Extract Transform and Load – a set of database utilities used to extract information from one database, transform it and load it into a second database. These tools are particularly useful to aggregate data from different database suppliers, e.g., Oracle to Sybase. into a data warehouse
What are the various ETL tools in the Market?
Data Stage, Data Junction, Abinitio, Informatica, Cognos, OWB
What are the various Reporting tools in the Market?
Seagate Crystal Reports, Business Objects, Microstrategy, Cognos
What is Fact table?
A table in a star schema that contains facts. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. (OR)
The tables which are extracted from heterogeneous sources and used in the Data Wareahouse
What is a dimension table?
Dimension tables describe the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
What is a lookup table?
Another name for Dimension table.
What is a general purpose scheduling tool? Name some of them?
To schedule jobs. Operating system can be used for scheduling.
Tools :
What are modeling tools available in the Market? Name some of them?
Visio-based database modeling component, ERWin
What is real time data-warehousing?
An enterprise-wide implementation that replicates data from the publication tables on different servers/platforms to a single subscription table with minimum possible time lag. This implementation effectively consolidates data from multiple sources with least time lag, so that business can see the real-time reports, as in OLTP.
What is data mining?
Data mining is about the discovery of knowledge, rules, trends, or patterns in large quantities of data.
The process of finding hidden patterns and relationships in data. For instance, a consumer goods company may track 200 variables about each consumer. There are scores of possible relationships among the 200 variables. Data mining tools will identify the significant relationships
What is Normalization? First Normal Form, Second Normal Form , Third Normal Form?
Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The condition of the data at completion of each step is described as a "normal form."
First Normal:
Second Normal:
Third Normal:
What is ODS?
The form that data warehouse takes in the operational environment. Operational data stores can be updated, do provide rapid and consistent time, and contain only a limited amount of historical data.
What type of Indexing mechanism do we need to use for a typical datawarehouse?
Bitmap, Btree

<< Home