What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. Traditional data still accounts for the majority of the world’s data. Businesses can use traditional data to track sales or to manage customer relations and workflows. Traditional data is often easier to manipulate and can be managed with conventional data processing software. However, it generally provides less sophisticated insights and more limited benefits than big data.

What Is Big Data?

Big data can refer both to a large and complex data set and to the methods used to process this type of data. Big data has four main characteristics, often known as “the four Vs”:

Volume: Big data is...big. Size isn’t the only thing that sets big data apart, but it is typically very high in volume.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly impact the quality of big data.

The Differences between Big Data and Traditional Data

Several characteristics are used to distinguish between big data and traditional data. These include:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data

Size

Traditional data sets tend to be measured in gigabytes and terabytes. As a result, their size can allow for centralized storage, even on one server.

Big data sets are far larger. They are usually measured in petabytes, exabytes, or zettabytes. The increasingly large size of big data sets is one of the main drivers behind the demand for more modern, high-capacity, cloud-based data storage solutions.

Organization

Traditional data is normally structured data that’s organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out their relationship and manipulate the data accordingly. SQL-based relational databases, such as Oracle Database and MySQL, use a fixed schema that is static and preconfigured.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data. Modern non-relational (NoSQL) databases like Cassandra and MongoDB are well suited to semi-structured and unstructured data because their data models are more flexible than fixed relational tables.

Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of the scale and complexity of big data, it generally isn’t practical to manage it centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.

Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include information scraped from social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data doesn’t fit neatly into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.

Analysis

Traditional data analysis occurs incrementally: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.

For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, the business can set aside questions regarding how the training program has impacted any particular aspect of its operations. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.

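A Beginner’s Guide to Big Data

Big Data vs. Traditional Data

Big data offers businesses enormous opportunities, including important insights into customer behavior, more accurate predictions of market activity, and improved overall efficiency.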

People and businesses are generating more data every year. According to an IDC report, only 1.2 zettabytes (1.2 trillion gigabytes) of new data were created worldwide in 2010. By 2025, that figure may grow to 175 zettabytes (175 trillion gigabytes) or more¹.

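As businesses tap into this growing resource through predictive analytics and data mining, the big data market will grow as well. A Statista report projects that the big data market will grow from $169 billion to $274 billion between 2018 and 2027, an increase of more than 60%.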

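But what are the biggest differences between big data and traditional data? And how do they affect current data storage, processing, and analytics technologies? This guide explains what each type of data is used for and why a strategy for success should draw on both.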

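What Is Traditional Data?

Traditional data is the structured, relational data that organizations have been storing and processing for decades. Traditional data still accounts for the majority of the world’s data.

Businesses can use traditional data to track sales or to manage customer relations and workflows. Traditional data is often easier to manipulate and can be managed with conventional data processing software. However, it generally provides more limited insights and benefits than big data.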

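What Is Big Data?

Big data can refer both to a large or complex data set and to the methods used to process this type of data. Big data has four main characteristics, often known as “the four Vs”:

Volume: Big data is...big. Size isn’t the only thing that sets big data apart, but it is typically very high in volume.
Variety: A big data set typically contains structured, semi-structured, and unstructured data.
Velocity: Big data is generated quickly and is often processed in real time.
Veracity: Big data isn’t inherently better quality than traditional data, but its veracity (accuracy) is extremely important. Anomalies, biases, and noise can significantly impact the quality of big data.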
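
To make the veracity point concrete, the following is a minimal sketch of one common way to flag anomalous values before they skew an analysis. It uses the modified z-score based on the median absolute deviation, which is a technique chosen here for illustration and not something the article prescribes. The function name `flag_anomalies`, the 3.5 cutoff, and the sample sensor readings are all assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a commonly used rule of
    thumb, treated here as an assumption.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values sit at (or symmetrically around) the median: nothing to flag.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical sensor readings with one obviously noisy value.
readings = [20.1, 19.8, 20.3, 20.0, 95.0, 19.9, 20.2]
suspect_indices = flag_anomalies(readings)
print(suspect_indices)
```

A median-based score is used in this sketch because a single extreme value shifts the median far less than it shifts the mean, so the outlier does not mask itself.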

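The Differences between Big Data and Traditional Data

Several characteristics are used to distinguish between big data and traditional data. These include:

The size of the data
How the data is organized
The architecture required to manage the data
The sources from which the data derives
The methods used to analyze the data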

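Size

Traditional data sets tend to be measured in gigabytes and terabytes. As a result, their size can allow for centralized storage, even on one server.

Big data sets are far larger. They are usually measured in petabytes, exabytes, or zettabytes. The increasingly large size of big data sets is one of the main drivers behind the demand for more modern, high-capacity, cloud-based data storage solutions.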
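
The gap between these units is easier to appreciate with some arithmetic. The sketch below converts between decimal (SI) storage units, where each step up is a factor of 1,000. The helper name `convert` and the 10 TB server capacity are assumptions made for illustration.

```python
# Decimal (SI) storage units expressed as powers of ten, in bytes.
EXPONENTS = {"GB": 9, "TB": 12, "PB": 15, "EB": 18, "ZB": 21}


def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal storage units."""
    return amount * 10.0 ** (EXPONENTS[from_unit] - EXPONENTS[to_unit])


# 175 zettabytes expressed in gigabytes (175 trillion GB).
gigabytes_2025 = convert(175, "ZB", "GB")

# How many hypothetical 10 TB servers would be needed to hold 1 PB?
servers_for_one_petabyte = convert(1, "PB", "TB") / 10

print(f"{gigabytes_2025:.3e} GB")
print(f"{servers_for_one_petabyte:.0f} servers")
```

Even one petabyte already spans many servers under this assumed capacity, which is why data at this scale tends to push organizations away from single-server storage.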

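Organization

Traditional data is normally structured data that’s organized in records, files, and tables. Fields in traditional data sets are relational, so it’s possible to work out their relationship and manipulate the data accordingly. SQL-based relational databases, such as Oracle Database and MySQL, use a fixed schema that is static and preconfigured.

Big data uses a dynamic schema. In storage, big data is raw and unstructured. When big data is accessed, the dynamic schema is applied to the raw data. Modern non-relational (NoSQL) databases like Cassandra and MongoDB are well suited to semi-structured and unstructured data because their data models are more flexible than fixed relational tables.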
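
The difference between a fixed schema and a schema applied at read time can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module as a stand-in for a relational database and a list of raw JSON strings as a stand-in for unstructured storage. The table name, field names, and sample records are assumptions made for illustration only.

```python
import json
import sqlite3

# Fixed schema: the structure is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", ("Acme", 120.0))

# A record with a field the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO sales (customer, amount, channel) VALUES (?, ?, ?)",
        ("Globex", 75.5, "web"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Dynamic schema: raw records are stored as-is, whatever their shape.
raw_events = [
    '{"customer": "Acme", "amount": 120.0}',
    '{"customer": "Globex", "amount": 75.5, "channel": "web"}',
    '{"user": "initech", "clicks": 14}',
]


def read_sales(lines):
    """Apply a schema at read time, keeping only records that fit it."""
    for line in lines:
        record = json.loads(line)
        if "customer" in record and "amount" in record:
            yield {
                "customer": record["customer"],
                "amount": float(record["amount"]),
                "channel": record.get("channel", "unknown"),
            }


sales = list(read_sales(raw_events))
print(rejected, len(sales))
```

In the fixed-schema half, adding the `channel` field would require changing the table first. In the read-time half, the raw store accepts every record and the reader decides which fields matter for a given question.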

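Architecture

Traditional data is typically managed using a centralized architecture, which can be more cost-effective and secure for smaller, structured data sets.

In general, a centralized system consists of one or more client nodes (e.g., computers or mobile devices) connected to a central node (e.g., a server). The central server controls the network and monitors its security.

Because of the scale and complexity of big data, it generally isn’t practical to manage it centrally. It requires a distributed architecture.

Distributed systems link multiple servers or computers over a network, operating as co-equal nodes. The architecture can scale horizontally (scale “out”) and will continue functioning even if an individual node fails. Distributed systems can leverage commodity hardware to reduce costs.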
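
As a rough illustration of how co-equal nodes can keep serving data when one of them fails, here is a toy in-memory sketch that spreads keys across nodes by hash and keeps each key on two nodes. The `Cluster` class, the node names, and the replication factor of two are hypothetical. Production systems use more elaborate placement and recovery schemes than this.

```python
import hashlib


class Cluster:
    """Toy key-value store spread across equal nodes, with replication."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}
        self.alive = set(node_names)
        self.replicas = min(replicas, len(node_names))

    def owners(self, key):
        """Pick the nodes responsible for a key from a hash of the key."""
        names = sorted(self.nodes)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every live owner so a copy survives a single failure.
        for name in self.owners(key):
            if name in self.alive:
                self.nodes[name][key] = value

    def get(self, key):
        # Read from the first live owner that holds the key.
        for name in self.owners(key):
            if name in self.alive and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        """Simulate a node going offline."""
        self.alive.discard(name)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("order-1001", {"total": 42})

# Take down the first node responsible for the key, then read it back.
primary = cluster.owners("order-1001")[0]
cluster.fail(primary)
value_after_failure = cluster.get("order-1001")
print(value_after_failure)
```

The simple modulo placement used here would move most keys whenever a node is added, so it only illustrates the fault-tolerance idea and not a realistic way to scale out.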

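Sources

Traditional data typically derives from enterprise resource planning (ERP), customer relationship management (CRM), online transactions, and other enterprise-level data.

Big data derives from a broader range of enterprise and non-enterprise-level data, which can include social media, device and sensor data, and audiovisual data. These source types are dynamic, evolving, and growing every day.

Unstructured data sources can also include text, video, image, and audio files. This type of data doesn’t fit neatly into the columns and rows of traditional databases. Because an increasingly significant amount of data is unstructured and comes from multiple sources, big data analysis methods are required to extract value from it.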

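Analysis

Traditional data analysis occurs incrementally: An event occurs, data is generated, and the analysis of this data takes place after the event. Traditional data analysis can help businesses understand the impacts of given strategies or changes on a limited range of metrics over a specific period.

Big data analysis can occur in real time. Because big data is generated on a second-by-second basis, analysis can occur as data is being collected. Big data analysis offers businesses a more dynamic and holistic understanding of their needs and strategies.

For example, suppose a business has invested in a training program for its staff and wants to measure its impact.

Under a traditional model of data analysis, the business might set out to determine the impact of the training program on a particular area of its operations, such as sales. The business notes the sales volume before and after the training and excludes any extraneous factors. It can, in theory, see how much sales have increased as a result of the training.

Under a big data model of analysis, the business can set aside questions regarding how the training program has impacted any particular aspect of its operations. Instead, by analyzing a mass of data collected in real time across the whole business, it can identify the specific areas that have been impacted, such as sales, customer service, public relations, and more.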
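
The two styles of analysis in the training example can be sketched side by side. The first half compares one metric before and after the event. The second half updates a running average for every metric as each event arrives. All figures, metric names, and the `RunningMean` helper are invented for illustration, and a real streaming pipeline would involve far more than this.

```python
from statistics import mean

# Traditional model: compare a single metric before and after the training.
sales_before = [100, 110, 95, 105]
sales_after = [120, 125, 118, 130]
uplift = mean(sales_after) / mean(sales_before) - 1
print(f"Sales uplift after training: {uplift:.1%}")


class RunningMean:
    """Average that is updated one observation at a time."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


def track(events):
    """Consume (metric, value) events and keep a live average per metric."""
    metrics = {}
    for metric, value in events:
        metrics.setdefault(metric, RunningMean()).update(value)
    return {name: running.value for name, running in metrics.items()}


# Big data model: events from across the business arrive interleaved.
stream = [
    ("sales", 120),
    ("support_tickets", 14),
    ("sales", 125),
    ("support_tickets", 9),
    ("sales", 118),
]
live = track(stream)
print(live)
```

The incremental update never needs the full history in memory, which is what makes it possible to analyze data while it is still being collected.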

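Big Data vs. Traditional Data: Important Considerations for the Future

Big data and traditional data are different but related. Although big data may appear to offer greater potential benefits, it isn’t suitable or necessary in every situation. Big data:

Can provide deeper analysis of market trends and consumer behavior. Traditional data analysis may be too narrow or limited to deliver the insights big data can provide.
Delivers insights faster. Organizations can learn from big data in real time, which can give businesses a competitive advantage.
Is more efficient. As digitization accelerates, society and businesses generate enormous amounts of data every day, even every minute. Big data analytics can draw meaningful insights from this data.
Requires advance preparation. To take advantage of these benefits, organizations must prepare for big data by putting new security protocols and configuration steps in place and by increasing their available processing capacity.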

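The rise of big data doesn’t mean the end of traditional data. Traditional data:

Is easier to secure, which makes it well suited to highly sensitive, personal, or confidential data sets. Because traditional data sets are smaller, they don’t require a distributed architecture and are less likely to need third-party storage.
Can be processed with conventional data processing software and standard system configurations. Processing big data typically requires higher-specification configurations. When traditional data methods are sufficient, that extra resource use and cost is unnecessary.
Is easier to manipulate and interpret. Because traditional data is simpler and relational in nature, it can be processed with common functions and may even be usable by non-experts.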

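Ultimately, it isn’t a matter of choosing between big data and traditional data. As more and more businesses generate large, unstructured data sets, having the right tools in place is important. And understanding how to use and support both models is essential to building a strategy that prepares for the future of big data.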