What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. Traditional data still accounts for the majority of the world’s data. Businesses can use traditional data for tracking sales or managing customer relations or workflows. Traditional data is often easier to manipulate than big data and can be managed with conventional data processing software. However, it generally provides less sophisticated insights and more limited benefits than big data.

What Is Big Data?

Big data can refer both to a large, complex data set and to the methods used to process this type of data. Big data has four main characteristics, often known as “the four Vs”:

Volume: Big data is, as the name suggests, big. Although size is not its only distinguishing feature, a big data set is typically very high in volume.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently of better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly affect the quality of big data.

The Differences between Big Data and Traditional Data

Several characteristics are used to distinguish between big data and traditional data. These include:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data

Size

Traditional data sets tend to be measured in gigabytes and terabytes. As a result, their size can allow for centralized storage, even on one server.

Big data sets are far larger and continue to grow. They are usually measured in petabytes, exabytes, or zettabytes. The increasingly large size of big data sets is one of the main drivers behind the demand for more modern, high-capacity, cloud-based data storage solutions.

Organization

Traditional data is normally structured data that’s organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out their relationship and manipulate the data accordingly. Traditional relational databases, such as Oracle Database, MySQL, and other SQL-based systems, use a fixed schema that is static and preconfigured.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data, an approach often called schema-on-read. Modern non-relational or NoSQL databases like Cassandra and MongoDB are well suited to semi-structured and unstructured data because they store data as wide-column rows or documents rather than in fixed relational tables.

Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of its scale and complexity, big data generally cannot be managed centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.

Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include information scraped from social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data does not fit neatly into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.

Analysis

Traditional data analysis occurs in discrete steps: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.

For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, the business can set aside questions regarding how the training program has impacted any particular aspect of its operations. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.


A Beginner’s Guide to Big Data

Big Data vs. Traditional Data

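Big data offers businesses enormous opportunities, including deeper insights into customer behavior, more accurate predictions of market trends, and greater efficiency across the business.

The amount of data that people and businesses generate grows every year. According to an IDC report, just 1.2 zettabytes (1.2 trillion gigabytes) of new data were created worldwide in 2010. That figure could grow to 175 zettabytes (175 trillion gigabytes) or more by 2025.¹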

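As this vast resource is put to use in predictive analytics and data mining, the big data market is expected to grow as well. According to Statista research, the big data market is forecast to grow from $169 billion to $274 billion between 2018 and 2027.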

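But how do big data and traditional data differ? And how do those differences affect today’s data storage, processing methods, and analysis techniques? This guide explains the different roles each type of data plays, while emphasizing the importance of a strategy for succeeding with both.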

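What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. Traditional data still accounts for the majority of the world’s data.

Businesses use traditional data for tracking sales and for managing customer relations and workflows. Traditional data is often easier to manipulate and can be managed with conventional data processing software. However, it generally provides less sophisticated insights and more limited benefits than big data.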

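What Is Big Data?

Big data can refer both to a large, complex data set and to the methods used to process this type of data. Big data has four main characteristics, often known as “the four Vs”:

Volume: Big data is, as the name suggests, big. Although size is not its only distinguishing feature, a big data set is typically very high in volume.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently of better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly affect the quality of big data.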

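The Differences between Big Data and Traditional Data

Several characteristics are used to distinguish between big data and traditional data. These include:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data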

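Size

Traditional data sets tend to be measured in gigabytes (GB) and terabytes (TB). As a result, they can be stored centrally, even on a single server.

Big data sets are far larger and continue to grow. They are usually measured in petabytes (PB), exabytes (EB), or zettabytes (ZB). The increasingly large size of big data sets is one of the main drivers behind the demand for modern, high-capacity, cloud-based data storage solutions.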
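To make the difference in scale concrete, the following minimal sketch converts between storage units. It assumes decimal (SI) units, where each step is a factor of 1,000, and the helper name `convert` is hypothetical.

```python
# Decimal (SI) storage units, each 1,000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a size between two decimal storage units."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000.0 ** steps


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))  # 1000.0
    print(convert(1, "ZB", "GB"))  # 1000000000000.0
```

A single petabyte is already a thousand terabytes, and one zettabyte is a trillion gigabytes, which is why data at this scale rarely fits on one server.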

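Organization

Traditional data is normally structured data that’s organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out their relationships and manipulate the data accordingly. Traditional relational databases, such as Oracle Database, MySQL, and other SQL-based systems, use a fixed schema that is static and predefined.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data, an approach often called schema-on-read. Modern non-relational (NoSQL) databases like Cassandra and MongoDB are well suited to semi-structured and unstructured data because they store data as wide-column rows or documents rather than in fixed relational tables.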
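The contrast between a fixed schema and a schema applied at read time can be sketched as follows. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for a relational database and plain JSON strings as a stand-in for raw storage. The table, field names, and records are all hypothetical.

```python
import json
import sqlite3

# Fixed schema: the table structure is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", ("acme", 120.0))
fixed_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Schema-on-read: raw records are stored as-is; structure is applied on access.
raw_records = [
    '{"customer": "acme", "amount": 120.0}',
    '{"customer": "globex", "amount": "75.5", "channel": "mobile"}',
    '{"device": "sensor-7", "reading": 21.4}',
]


def read_sales(lines):
    """Apply a sales view to raw JSON lines, skipping records that do not fit."""
    for line in lines:
        record = json.loads(line)
        if "customer" in record and "amount" in record:
            yield record["customer"], float(record["amount"])


dynamic_total = sum(amount for _, amount in read_sales(raw_records))
print(fixed_total, dynamic_total)
```

In the first half, a record that does not match the table definition cannot be inserted at all. In the second half, mismatched records are stored anyway and the reader decides how to interpret them, which is more flexible but moves validation work to read time.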

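Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of its scale and complexity, big data generally cannot be managed centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.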
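The two properties described here, tolerance of node failure and horizontal scaling, can be illustrated with a toy in-memory cluster. This is a minimal sketch under simplifying assumptions, not how any particular product works: the class `ToyCluster` and its methods are hypothetical, keys are placed by a simple hash modulo the node count, and adding a node redistributes all data. Production systems typically use more refined placement schemes, such as consistent hashing, to limit data movement.

```python
import hashlib


class ToyCluster:
    """A toy cluster of co-equal nodes; each key is stored on `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}  # node name -> key/value store
        self.down = set()  # names of failed nodes
        self.replicas = replicas

    def _owners(self, key):
        """Pick the nodes responsible for a key from a stable hash of the key."""
        names = sorted(self.nodes)
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(names)
        count = min(self.replicas, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        """Write the value to every owner node that is currently up."""
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        """Read from the first owner node that is up and holds the key."""
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        """Simulate the failure of one node."""
        self.down.add(name)

    def add_node(self, name):
        """Scale out: add a node, then redistribute data across the larger cluster."""
        items = {}
        for store in self.nodes.values():
            items.update(store)
            store.clear()
        self.nodes[name] = {}
        for key, value in items.items():
            self.put(key, value)


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b", "node-c"])
    cluster.put("order:1001", {"amount": 120.0})
    cluster.fail("node-a")            # one node fails
    print(cluster.get("order:1001"))  # served by a surviving replica
    cluster.add_node("node-d")        # scale out with one more node
    print(cluster.get("order:1001"))
```

Because every key lives on two different nodes, losing any single node leaves at least one copy readable, and capacity grows by adding nodes rather than by replacing one server with a larger one.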

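Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include information gathered from social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data does not fit neatly into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.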

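Analysis

Traditional data analysis occurs in discrete steps: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.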
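The difference between analyzing after the event and analyzing as data arrives can be sketched with a simple average. This is a minimal illustration with made-up readings. The names `batch_average` and `RunningAverage` are hypothetical, and real streaming systems involve far more than a running mean.

```python
from statistics import mean


def batch_average(readings):
    """After-the-event style: analyze the full data set once it is complete."""
    return mean(readings)


class RunningAverage:
    """Real-time style: update the result as each new value arrives."""

    def __init__(self):
        self.count = 0
        self.average = 0.0

    def update(self, value):
        """Fold one new value into the average without storing past values."""
        self.count += 1
        self.average += (value - self.average) / self.count
        return self.average


if __name__ == "__main__":
    readings = [10.0, 12.0, 11.0, 15.0]  # illustrative values only
    print(batch_average(readings))       # available only after all data is in

    stream = RunningAverage()
    for value in readings:
        print(stream.update(value))      # an up-to-date answer after every event
```

Both approaches reach the same final figure, but the running version has a usable answer at every point in the stream and never needs to hold the whole data set in memory.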

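For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, by contrast, the business does not need to decide in advance which aspect of its operations the training program may have affected. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.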
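The two approaches in this example can be sketched in a few lines. Every figure and metric name below is hypothetical and chosen only for illustration. The comparison is a plain before-and-after difference, so it shows which metrics moved, not whether the training caused the change.

```python
# Hypothetical before/after averages, for illustration only (not real data).
BEFORE = {"sales": 200.0, "support_tickets_resolved": 80.0, "press_mentions": 10.0}
AFTER = {"sales": 230.0, "support_tickets_resolved": 100.0, "press_mentions": 10.0}


def relative_change(before, after, metric):
    """Relative change of one metric between the two periods."""
    return (after[metric] - before[metric]) / before[metric]


def impacted_metrics(before, after, threshold=0.05):
    """Scan every collected metric and keep those that moved by at least `threshold`."""
    changes = {m: relative_change(before, after, m) for m in before}
    return {m: c for m, c in changes.items() if abs(c) >= threshold}


if __name__ == "__main__":
    # Traditional approach: pick one metric in advance and compare it.
    print(relative_change(BEFORE, AFTER, "sales"))

    # Big data approach: look across all metrics and see which ones changed.
    print(impacted_metrics(BEFORE, AFTER))
```

The first call answers only the question that was asked in advance. The second surfaces every area that changed, including ones the business may not have thought to examine.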

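Big Data and Traditional Data: Key Considerations for the Future

Big data and traditional data serve different purposes, but they are related. Big data is often assumed to offer greater benefits, but it isn’t suited to (or necessary for) every situation. Big data has the following characteristics: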

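Deeper analysis of market trends and consumer behavior: Traditional data analysis is too narrow and restrictive to deliver the actionable insights that big data can provide.
Faster insights: Organizations that use big data can learn from their data in real time. In big data analytics, this provides a competitive advantage.
Greater efficiency: In an increasingly digital society, people and businesses generate enormous volumes of data every day, almost minute by minute. Big data makes it possible to harness this data and interpret it in practical ways.
Requires more preparation: To take advantage of these benefits, organizations need to prepare for big data with new security protocols, configuration steps, and increased processing power.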

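The rise of big data does not mean traditional data will disappear. Traditional data has the following characteristics:

Easier to secure: Traditional data is well suited to sensitive data sets and those containing personal information. Because traditional data sets are smaller, they don’t require a distributed architecture and are less likely to need third-party storage.
Can be processed using conventional data processing software and a normal system configuration: Processing big data generally requires a more advanced configuration, which can add unnecessary resource usage and cost when traditional data methods would be sufficient.
Easier to manipulate and interpret: Because traditional data is simple and relational, it can be processed with ordinary functions and can sometimes be handled without specialist expertise.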

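Ultimately, it isn’t a question of choosing between big data and traditional data. As more businesses generate large, unstructured data sets, they will need the right tools to handle them. Understanding how to use and support both models is essential to updating your strategy with the future of big data in mind.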