How Data Warehouses Power Business Intelligence

What is a Data Warehouse?

What is a Data Warehouse?

A data warehouse is a storage system optimized for storing structured data to perform the high-speed SQL queries needed to deliver timely business intelligence (BI). From processing high-speed transactions to predictive analytics, data warehouses have a decades-long history as the de facto storage standard used by enterprises to power their BI.

The Benefits of Data Warehouses

Benefits of data warehouses include:

  • Consolidation of structured data from multiple disparate sources 
  • Fast analytical queries from relational databases
  • A dedicated storage solution for cheaper queries and quicker reporting

Test Drive FlashBlade

Experience a self-service instance of Pure1® to manage Pure FlashBlade™, the industry's most advanced solution delivering native scale-out file and object storage.

How Data Warehouses Work

The logistics of collecting data from different parts of your business to extract useful information can scale in complexity as your business grows. Data warehouses can give your business a reliable way to consolidate that information into a single database and data model to allow  analysts to run their queries. 

Here’s how it works:

  1. Extract: Collect raw data from the disparate sources across your organization (e.g., ERP, CRM, sales, marketing) into staging databases.
  2. Transform: Data from the staging layer is transferred into an integration layer, where data is combined and transformed into an Operational Data Store (ODS).
  3. Load: Data is moved from the integration layer into the data warehouse by defining the schema your analysts wish to use for their SQL queries before writing them into a relational database (schema on write). 

The database you interact with in a data warehouse is relational, meaning data is structured—stored in tables consisting of columns and rows. These tables are organized by schema that were defined  during the write. 

When the transformation step is handled by an ODS that is external to the data warehouse, it’s called ETL (Extract, Transform, Load). When the data warehouse handles the transformations internally, it’s called ELT (Extract, Load, Transform). Whether you use ETL or ELT, data warehouses require structured data, and schema on write, to work with relational databases.

What are Data Warehouses Used for?

Common applications of data warehouses include:

  • Online Transaction Processing (OLTP): A data warehouse can be optimized for data integrity and fast queries to handle a large volume of short data transactions. An example are transactions that occur on a high-frequency trading platform. 
  • Online analytical processing (OLAP): You can optimize a data warehouse for faster complex queries for a relatively lower volume of transactions. This is basically what an analyst uses to generate BI reports.
  • Predictive analytics: An OLAP system can be optimized to forecast future events and generate “what if” scenarios for your business, often with the help of machine learning algorithms.

Because data warehouses are schema on write, it’s important to know what type of queries you wish to perform before adding schema to a data warehouse. To manage the complexity of disparate data sources, a data warehouse may be segmented into data marts to dedicate hardware and software resources to specific business functions like CRM.

Data Warehouse vs. Data Lake vs. Data Hub

While these three concepts may sound interchangeable, it’s important to understand their differences:

  • Data warehouse: A single repository for integrating and storing structured data pulled from multiple unstructured data sources across your organization.
  • Data lake: A single unrefined repository of all the structured and unstructured raw data sources within an organization (including data warehouses). Data must still be processed to extract BI insights. 
  • Data hub: A single interface that consolidates all data—both structured and unstructured—into a central accessible data layer. It differs from a data warehouse in that it can also handle operational data and it differs from a data lake by possessing the ability to serve data in multiple formats. 

Data hubs provide the data governance needed to streamline data sharing between a diverse collection of endpoints. In this way, data hubs consolidate data lakes and data warehouses into a single access layer. Data processing is abstracted away behind the data hub, giving your organization a centralized place to extract BI insights.

Why Choose Pure Storage for Your Data Warehouse Needs?

If you need to add a new OLAP or OLTP pipeline to your existing data warehouse infrastructure, it may be time to consider investing in a more Modern Data Experience™ with Pure Storage’s all-flash storage solutions. 

As the industry’s first data hub, Pure Storage® FlashBlade® can not only handle the analytics and reporting workloads of a data warehouse but also deliver on the essential qualities of a data hub:

  • Seamless data sharing across all your data endpoints
  • Unified file and object storage
  • The ability to handle operational data in real time
  • Natively architected to scale out
  • Engineered to deliver multidimensional performance for any type of data
  • Massively parallel from software to hardware
800-379-7873 +44 20 3870 2633 +43 720882474 +32 (0) 7 84 80 560 +33 9 75 18 86 78 +49 89 12089 253 +353 1 485 4307 +39 02 9475 9422 +31 (0) 20 201 49 65 +46-101 38 93 22 +45 2856 6610 +47 2195 4481 +351 210 006 108 +966112118066 +27 87551 7857 +34 51 889 8963 +41 31 52 80 624 +90 850 390 21 64 +971 4 5513176 +7 916 716 7308 +65 3158 0960 +603 2298 7123 +66 (0) 2624 0641 +84 43267 3630 +62 21235 84628 +852 3750 7835 +82 2 6001-3330 +886 2 8729 2111 +61 1800 983 289 +64 21 536 736 +55 11 2655-7370 +52 55 9171-1375 +56 2 2368-4581 +57 1 383-2387