Thursday, November 24, 2011

datawarehouse interview questions

  • What is Data Warehousing?
A data warehouse is the main repository of an organization’s historical data, its corporate memory. It contains the raw material for management’s decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining

A data warehouse is a electronical storage of an Organization's historical data for the purpose of analysis and reporting


  • What are fundamental stages of Data Warehousing?
Offline Operational Databases – Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system’s performance.
Offline Data Warehouse – Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
Real Time Data Warehouse – Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
Integrated Data Warehouse – Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

  • What is the difference between OLTP and OLAP?
OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.
OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations


OLTP is abbreviation of On-Line Transaction Processing. This system is an application that modifies data the instance it receives and has a large number of concurrent users
OLAP is abbreviation of Online Analytical Processing. This system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes
Data Source
OLTP: Operational data is from original data source of the data
OLAP: Consolidation data is from various source.

Process Goal
OLTP: Snapshot of business processes which does fundamental business tasks
OLAP: Multi-dimensional views of business activities of planning and decision making

Queries and Process Scripts
OLTP: Simple quick running queries run by users.
OLAP: Complex long running queries by system to update the aggregated data.

Database Design
OLTP: Normalized small database.This adopts entity relationship(ER) model and an application-oriented database design.
OLAP: De-normalized large database. This adopts star, snowflake or fact constellation mode of subject-oriented database design.

OLTP vs. OLAP
We can divide IT systems into transactional (OLTP) and analytical (OLAP).
In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE).
number of transactions per second is an effectiveness measure

OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions. Queries are often very complex and involve aggregations
a response time is an effectiveness measure
http://datawarehouse4u.info/OLTP-vs-OLAP.html



Key Differences Between OLTP and OLAP

    The point that distinguishes OLTP and OLAP is that OLTP is an online transaction system whereas, OLAP is an online data retrieval and analysis system.
    Online transactional data becomes the source of data for OLTP. However, the different OLTPs database becomes the source of data for OLAP.
    OLTP’s main operations are insert, update and delete whereas, OLAP’s main operation is to extract multi dimensional data for analysis.
    OLTP has short but frequent transactions whereas, OLAP has long and less frequent transaction.
    Processing time for the OLAP’s transaction is more as compared to OLTP.
    OLAPs queries are more complex with respect OLTPs.
    The tables in OLTP database must be normalized (3NF) whereas, the tables in OLAP database may not be normalized.
    As OLTPs frequently executes transactions in database, in case any transaction fails in middle it may harm data’s integrity and hence it must take care of data integrity. While in OLAP the transaction is less frequent hence, it does not bother much about data integrity.


https://techdifferences.com/difference-between-oltp-and-olap.html
  • What is Data Mining?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
  1. What is data mart?

Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketting etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse

  • What is ER model?

ER model is entity-relationship model which is designed with a goal of normalizing the data.


  • What is Dimensional Modeling?
Dimensional data model concept involves two types of tables and it is different from the 3rd normal form. This concepts uses Facts table which contains the measurements of the business and Dimension table which contains the context(dimension of calculation) of the measurements.

  • What is Fact table?
Fact table contains measurements of business processes also fact table contains the foreign keys for the dimension tables. For example, if your business process is paper production” then “average production of paper by one machine” or “weekly production of paper” would be considered as measurement of business process.

  • What is Dimension table?
Dimensional table contains textual attributes of measurements stored in the facts tables. Dimensional table is a collection of hierarchies, categories and logic which can be used for user to traverse in hierarchy nodes

  • What is the difference between etl tool and olap tools?

ETL tools are used to extract transformation and loading the data into data warehouse / data mart
OLAP tools are used to create cubes/reports for business analysis from data warehouse / data mart


  • What is the difference between the terms "business intelligence" and "data warehousing"?


So data warehousing is the foundation that business intelligence is built upon.

People were performing data warehousing (DW) before it had a name.
The term "data warehousing" stemmed from the terms "decision support" and "management reporting" many years ago.
Business intelligence (BI) sought to encapsulate more processes that included data warehousing.
If you notice, many vendors use the term BI to describe their services; to show that they provide more services than just data warehousing.

http://www.information-management.com/news/7260-1.html