What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. It still accounts for the majority of the world’s data. Businesses use traditional data to track sales and to manage customer relationships and workflows. Traditional data is often easier to manipulate and can be managed with conventional data processing software. However, it generally provides less sophisticated insights and more limited benefits than big data.

What Is Big Data?

Big data refers both to large, complex data sets and to the methods used to process this type of data. It has four main characteristics, often known as “the four Vs”:

Volume: Big data is, as the name suggests, big. Size is not its only distinguishing feature, but big data sets typically contain a very large number of records.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly impact the quality of big data.

The Differences between Big Data and Traditional Data

Several characteristics are used to distinguish between big data and traditional data. These include:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data

Size

Traditional data sets tend to be measured in gigabytes and terabytes. As a result, their size can allow for centralized storage, even on one server.

Big data is distinguished not only by the size of the data set but also by the sheer number of records it contains. Big data is usually measured in petabytes, exabytes, or zettabytes. The increasingly large size of big data sets is one of the main drivers behind the demand for more modern, high-capacity, cloud-based data storage solutions.

Organization

Traditional data is normally structured data that’s organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out how they relate to one another and manipulate the data accordingly. Traditional relational (SQL) databases, such as Oracle Database and MySQL, use a fixed schema that is static and preconfigured.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data, an approach often called schema-on-read. Modern non-relational (NoSQL) databases like Cassandra and MongoDB are often used for this kind of data because they store it as wide-column rows or documents rather than in fixed relational tables.

Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of its scale and complexity, big data generally can’t be managed centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.

Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include information scraped from social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data doesn’t fit naturally into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.

Analysis

Traditional data analysis occurs incrementally: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.

For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, the business can set aside the question of how the training program has impacted any particular aspect of its operations. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.

Big Data vs. Traditional Data

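Big data offers businesses enormous opportunities, including more significant insights into customer behavior, more accurate predictions of market activity, and greater efficiency across the business.

The volume of data generated by people and businesses grows every year. According to an IDC report, only 1.2 zettabytes (1.2 trillion gigabytes) of new data were created worldwide in 2010. By 2025, that figure could grow to 175 zettabytes (175 trillion gigabytes) or more.¹

As businesses put this abundant resource to work in predictive analytics and data mining, the big data market is also expected to grow. According to Statista, the big data market is forecast to grow from $169 billion in 2018 to $274 billion in 2027, an increase of roughly 60 percent.

But how do big data and traditional data differ? And how do those differences affect today’s data storage, processing methods, and analysis techniques? The sections below explain the purpose of each type of data and why a strategy for making successful use of both matters.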

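What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. It still accounts for the majority of the world’s data.

Businesses use traditional data to track sales and to manage customer relationships and workflows. Traditional data is often easier to manipulate and can be managed with conventional data processing software. However, it generally provides less sophisticated insights and more limited benefits than big data.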

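What Is Big Data?

Big data refers both to large, complex data sets and to the methods used to process this type of data. It has four main characteristics, often known as “the four Vs”:

Volume: Big data is, as the name suggests, big. Beyond the size of the data set, it is characterized by a very large number of records.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly impact the quality of big data.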

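The Differences between Big Data and Traditional Data

Big data and traditional data can be distinguished by the following characteristics:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data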

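Size

Traditional data sets are usually measured in gigabytes (GB) and terabytes (TB). At that size, they can be stored centrally, even on a single server.

Big data is distinguished not only by the size of the data set but also by the sheer number of records it contains. Big data is usually measured in petabytes (PB), exabytes (EB), or zettabytes (ZB). The increasingly large size of big data sets is one of the main drivers behind the demand for more modern, high-capacity, cloud-based data storage solutions.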
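
To make these units concrete, the following is a minimal Python sketch that converts between them. It assumes decimal (SI) units, where each step is a factor of 1,000 rather than 1,024; the `convert` helper is a hypothetical name introduced only for illustration.

```python
# Decimal (SI) storage units, each 1,000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a size between decimal storage units (hypothetical helper)."""
    exponent = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000.0 ** exponent


# How many gigabytes are in one zettabyte, and terabytes in one petabyte?
print(convert(1, "ZB", "GB"))
print(convert(1, "PB", "TB"))
```

Under this assumption, one zettabyte works out to a trillion gigabytes, which is the scale at which single-server storage stops being a realistic option.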

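Organization

Traditional data is normally structured data organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out how they relate to one another and manipulate the data accordingly. Traditional relational (SQL) databases, such as Oracle Database and MySQL, use a fixed schema that is static and preconfigured.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data, an approach often called schema-on-read. Modern non-relational (NoSQL) databases like Cassandra and MongoDB are often used for this kind of data because they store it as wide-column rows or documents rather than in fixed relational tables.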
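
The contrast between a fixed schema and a dynamic schema can be sketched with Python’s standard library. This is an illustration under stated assumptions, not how any particular product works: SQLite stands in for a traditional relational database, a list of JSON strings stands in for raw storage, and the table, field names, and `read_amounts` helper are all hypothetical.

```python
import json
import sqlite3

# Fixed schema: the table layout is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", ("Acme", 120.0))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Dynamic schema: raw records are stored as-is, with differing fields and types.
raw_records = [
    '{"customer": "Acme", "amount": 120.0}',
    '{"customer": "Globex", "amount": "80", "channel": "mobile"}',
    '{"user": "initech", "clicks": 14}',
]


def read_amounts(lines):
    """Apply a schema at read time: keep records with an amount, coerced to float."""
    for line in lines:
        record = json.loads(line)
        if "amount" in record:
            yield float(record["amount"])


raw_total = sum(read_amounts(raw_records))
print(total, raw_total)
```

In the first half, a record that doesn’t fit the declared columns is rejected at write time. In the second half, anything can be stored, and the reader decides which fields matter and how to interpret them.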

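Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of its scale and complexity, big data generally can’t be managed centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.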
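
One common way a distributed system keeps working after a node failure is to store each item on more than one node. The toy Python sketch below illustrates that idea only; the article does not specify a mechanism, so the replication scheme, the `Cluster` class, and the node names are all assumptions made for illustration. Real systems add consistent hashing, rebalancing, and failure detection on top of this.

```python
import hashlib


class Cluster:
    """Toy cluster: each key is stored on `replicas` nodes chosen by hashing the key."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}  # node name -> local store
        self.down = set()  # names of nodes currently marked as failed
        self.replicas = replicas

    def _owners(self, key):
        # Pick a starting node from the key's hash, then take the next neighbors.
        names = sorted(self.nodes)
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every replica that owns the key.
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Read from the first owner that is still up.
        for name in self._owners(key):
            if name not in self.down:
                return self.nodes[name][key]
        raise RuntimeError("all replicas are down")


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("order-42", {"amount": 120.0})

# Simulate the failure of the first node that holds the key.
cluster.down.add(cluster._owners("order-42")[0])
print(cluster.get("order-42"))
```

Because the key lives on two of the three nodes, the read still succeeds after one of them is marked as failed. No single node acts as the central server here, which is the sense in which the nodes are co-equal.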

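Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include information gathered from social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data doesn’t fit naturally into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.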

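Analysis

Traditional data analysis occurs incrementally: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.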
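
The difference between analyzing after the event and analyzing as data arrives can be sketched in a few lines of Python. The sale amounts and the `RunningAverage` class below are hypothetical, chosen only to show the two styles side by side.

```python
from statistics import mean

# Hypothetical sale amounts, in the order they arrive.
events = [120.0, 80.0, 100.0, 140.0]

# After-the-fact analysis: wait until all data is collected, then compute once.
batch_average = mean(events)


class RunningAverage:
    """Real-time analysis: update the result as each event arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


running = RunningAverage()
live_averages = [running.update(amount) for amount in events]
print(batch_average, live_averages)
```

Both approaches end at the same final value, but the running version has an up-to-date answer after every event instead of only once the collection period is over.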

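For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, by contrast, the business can set aside the question of how the training program has impacted any particular aspect of its operations. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.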
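
The two approaches in this example can be contrasted in a short Python sketch. All metric names, values, and the 10 percent threshold are hypothetical, invented purely to illustrate the difference between measuring one metric chosen in advance and scanning many metrics for change.

```python
# Hypothetical weekly averages before and after the training program.
before = {"sales": 200.0, "support_tickets_resolved": 50.0, "press_mentions": 10.0}
after = {"sales": 230.0, "support_tickets_resolved": 65.0, "press_mentions": 10.0}


def relative_change(old, new):
    """Fractional change from old to new (0.15 means a 15 percent increase)."""
    return (new - old) / old


# Traditional approach: choose one metric in advance and measure it.
sales_change = relative_change(before["sales"], after["sales"])

# Broader approach: scan every metric and report those that moved past a threshold.
THRESHOLD = 0.10
impacted = sorted(
    name
    for name in before
    if abs(relative_change(before[name], after[name])) > THRESHOLD
)
print(sales_change, impacted)
```

The scan surfaces an affected area that the single-metric comparison never looked at. Neither version controls for extraneous factors, so a real analysis would still need to rule those out before attributing the change to the training.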

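Big Data vs. Traditional Data: Important Considerations for the Future

Big data and traditional data serve different but related purposes. Although big data may seem to offer greater potential benefits, it isn’t suitable (or necessary) in every circumstance. Big data:

Enables deeper analysis of market trends and consumer behavior. Traditional data analysis is too narrow and constrained to deliver the actionable insights that big data can provide.
Delivers insights faster. Organizations that use big data can learn from their data in real time, which can give them a competitive advantage.
Is more efficient. In an increasingly digital society, people and businesses generate enormous amounts of data every day, almost minute by minute. Big data makes it possible to harness this data and interpret it in practical ways.
Requires advanced preparation. To realize these benefits, organizations need to prepare for big data with new security protocols, configuration procedures, and increased processing power.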

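The rise of big data doesn’t mean traditional data is going away. Traditional data:

Is easier to secure, which makes it well suited to sensitive data sets and personal information. Because traditional data is smaller, it doesn’t require a distributed architecture and is less likely to need third-party storage.
Can be processed with conventional data processing software and a standard system configuration. Processing big data generally requires a more advanced setup, which can needlessly increase resource usage and costs when traditional methods would suffice.
Is easier to manipulate and interpret. Because traditional data is simple and relational, it can be processed with ordinary functions and is accessible even to non-experts.

Ultimately, it’s not a question of choosing between big data and traditional data. As more businesses generate large, unstructured data sets, they will need the right tools to handle them. Understanding how to use and support both models is a necessary part of updating your strategy for a big data future.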