Data hygiene is the practice of ensuring that all structured or unstructured data inside databases or file shares is “clean,” meaning it’s accurate, up to date, and error-free. Data hygiene is also referred to as “data cleanliness” and “data quality.”
In general, poor data quality comes from:
Data hygiene drives security, productivity, regulatory and compliance adherence, and efficiency. It does this by ensuring your applications and business processes are only using data that’s clean, correct, and relevant—and that includes removing sensitive personal data that’s no longer needed. Without good data practices, you’ll be following clues and bread crumbs to dead ends and bad decisions.
Here are some examples of issues that poor-quality data can create in organisations.
Sales and Marketing
A study by DiscoverOrg found that sales and marketing departments lose approximately 550 hours and as much as $32,000 per sales rep from using bad data.
In marketing, bad data can lead to overspending. It can also annoy or even drive away prospects if they receive the same content more than once because of data duplication (e.g., two records for the same person whose name is spelled slightly differently within the same database).
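Near-duplicates like these are usually caught with fuzzy matching rather than exact comparison. The minimal sketch below uses Python's standard-library `difflib.SequenceMatcher` to flag pairs of contact names that probably refer to the same person. The 0.85 threshold, the helper names, and the sample contacts are illustrative assumptions, not a recommended configuration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case and collapse runs of whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Compare every pair of names and return those at or above the (assumed) threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical contact list containing two near-duplicate pairs.
contacts = ["Jon Smith", "John Smith", "Jane Doe", "jane  doe", "Alan Turing"]
for a, b, score in likely_duplicates(contacts):
    print(f"{a!r} ~ {b!r} (similarity {score})")
```

Comparing every pair grows quadratically with the number of records, so larger datasets typically group records by a shared key (such as postcode) first and only compare within each group.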
In online sales, poor data hygiene could lead you to try to sell the wrong product to the wrong client if you lack accurate data about your products and target audiences.
Finance
In financial reporting, bad data can give you different answers to the same question due to data inconsistency, leading to inaccurate and misleading financial reports. These reports could potentially give you either a false sense of financial security or an alarming sense of financial insecurity.
Supply Chain
Bad data can also wreak havoc on supply chains: automating processes is very hard when the decisions those processes make are based on unreliable location information.
Overall Corporate Goals
On the corporate level, data quality issues can significantly impact your ability to meet your long-term goals. They can cause:
As important as good data hygiene is, many companies struggle to maintain the quality of their data. According to one study published by the Harvard Business Review, on average, 47% of newly created data records have at least one critical (e.g., work-impacting) error and only 3% of data quality scores were rated “acceptable” using the loosest-possible standard.
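A figure such as “47% of new records have at least one critical error” is simply the share of records that fail at least one rule the business treats as critical. The minimal sketch below computes that share. The three rules in `CRITICAL_RULES` and the sample records are hypothetical; deciding which fields are critical is the real work in practice.

```python
# Hypothetical critical-field rules: each returns True when the field is acceptable.
CRITICAL_RULES = {
    "email": lambda r: "@" in (r.get("email") or ""),
    "country": lambda r: bool(r.get("country")),
    "amount": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
}


def has_critical_error(record: dict) -> bool:
    """A record fails if any critical rule fails."""
    return not all(rule(record) for rule in CRITICAL_RULES.values())


def error_rate(records: list[dict]) -> float:
    """Percentage of records with at least one critical error."""
    if not records:
        return 0.0
    failed = sum(has_critical_error(r) for r in records)
    return 100 * failed / len(records)


records = [
    {"email": "a@example.com", "country": "US", "amount": 10},
    {"email": "b.example.com", "country": "US", "amount": 5},  # malformed email
    {"email": "c@example.com", "country": "", "amount": 3},    # missing country
    {"email": "d@example.com", "country": "DE", "amount": 7},
]
print(f"{error_rate(records):.0f}% of records have at least one critical error")
```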
Various factors can make it challenging to optimise your data hygiene. These include:
Although data quality standards are still maturing, there are certain established data hygiene best practices you can adopt right now to ensure your data quality is—and stays—high.
Best practices include:
Data auditing is key to maintaining good data hygiene and typically the first step in any data cleansing process. Before taking any action, you need to assess the quality of your data and establish a realistic baseline of your company’s data hygiene. A typical data audit involves taking a close look at your IT infrastructure and processes to see where your data lives, how it’s used, and how often it’s updated.
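Part of establishing a baseline can be automated by profiling how complete and how current the records are. The sketch below covers only that part of an audit (not mapping where data lives or how it is used). The field names, the one-year staleness threshold, and the sample table are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def profile(records: list[dict], fields: list[str], updated_field: str,
            stale_after: timedelta, now: datetime) -> dict:
    """Baseline completeness (per field) and freshness for a set of records."""
    total = len(records)
    if total == 0:
        return {"records": 0, "missing_pct": {f: 0.0 for f in fields}, "stale_pct": 0.0}
    # A field counts as missing when it is absent, None, or an empty string.
    missing = {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}
    # A record is stale if it has no timestamp or was last updated too long ago.
    stale = sum(
        1 for r in records
        if r.get(updated_field) is None or now - r[updated_field] > stale_after
    )
    return {
        "records": total,
        "missing_pct": {f: round(100 * n / total, 1) for f, n in missing.items()},
        "stale_pct": round(100 * stale / total, 1),
    }


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
customers = [  # hypothetical customer table
    {"name": "Acme", "email": "ops@acme.example", "updated_at": now - timedelta(days=30)},
    {"name": "Globex", "email": "", "updated_at": now - timedelta(days=800)},
    {"name": "Initech", "email": None, "updated_at": None},
    {"name": "", "email": "hi@umbrella.example", "updated_at": now - timedelta(days=10)},
]
print(profile(customers, ["name", "email"], "updated_at", timedelta(days=365), now))
```

Re-running the same profile after each cleansing pass gives a simple way to check whether quality is actually improving against the baseline.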
It’s critical to define policies regarding what data is collected and why, especially if the data comes from consumers. This includes solidifying data retention and removal policies. Retention schedules dictate how long data is stored on a system before being purged. Hygiene means knowing what data you’re storing, why, where, and when it needs to be purged.
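A retention schedule can be expressed as a mapping from data category to maximum age. The minimal sketch below applies such a schedule to a batch of records; the categories, periods, and records are hypothetical, not legal or regulatory guidance. Records whose category has no schedule are set aside for review instead of being kept or deleted by guesswork, which reflects the point that hygiene means knowing what you store.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: data category -> maximum age before purge.
RETENTION = {
    "web_logs": timedelta(days=90),
    "marketing_contacts": timedelta(days=2 * 365),
    "invoices": timedelta(days=7 * 365),
}


def apply_retention(records: list[dict], today: date):
    """Split records into (keep, purge, unclassified) using RETENTION."""
    keep, purge, unclassified = [], [], []
    for rec in records:
        limit = RETENTION.get(rec["category"])
        if limit is None:
            # No schedule for this category: flag it for review rather than guess.
            unclassified.append(rec)
        elif today - rec["created"] > limit:
            purge.append(rec)
        else:
            keep.append(rec)
    return keep, purge, unclassified


records = [
    {"id": 1, "category": "web_logs", "created": date(2024, 1, 1)},
    {"id": 2, "category": "web_logs", "created": date(2024, 5, 20)},
    {"id": 3, "category": "invoices", "created": date(2019, 1, 1)},
    {"id": 4, "category": "scratch_exports", "created": date(2023, 1, 1)},
]
keep, purge, unclassified = apply_retention(records, today=date(2024, 6, 1))
print("keep:", [r["id"] for r in keep])
print("purge:", [r["id"] for r in purge])
print("needs a policy:", [r["id"] for r in unclassified])
```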
Data governance is the collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organisation to achieve its goals. Data governance defines who can take what action, upon what data, in what situations, and using which methods. Good data governance is essential for ensuring high data quality across an organisation.
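The “who can take what action, upon what data, in what situations, and using which methods” framing maps naturally onto a default-deny permission check. The sketch below encodes each of those five dimensions as a field; the roles, datasets, contexts, and methods are hypothetical examples, and real governance programmes also cover standards, metrics, and ownership that a lookup like this does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    role: str      # who
    action: str    # what action
    dataset: str   # upon what data
    context: str   # in what situation
    method: str    # using which method


# Hypothetical policy; anything not listed is denied (default deny).
POLICY = {
    Permission("analyst", "read", "sales_pipeline", "quarterly_reporting", "bi_tool"),
    Permission("data_steward", "update", "customer_master", "error_correction", "admin_console"),
    Permission("retention_job", "delete", "customer_master", "retention_purge", "batch_job"),
}


def is_allowed(role: str, action: str, dataset: str, context: str, method: str) -> bool:
    """Allow a request only if it exactly matches an entry in POLICY."""
    return Permission(role, action, dataset, context, method) in POLICY


print(is_allowed("analyst", "read", "sales_pipeline", "quarterly_reporting", "bi_tool"))  # True
print(is_allowed("analyst", "delete", "sales_pipeline", "quarterly_reporting", "bi_tool"))  # False
```

Making the dataclass frozen makes instances hashable, so the policy can be a plain set and each check is a single membership test.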
Finally, good data hygiene comes from automating your data quality-related processes. This primarily means automatically updating your data as frequently as possible to ensure it’s always up to date and correct. Data cleansing systems can sift through masses of data and use algorithms to detect anomalies and identify outliers resulting from human error. They can also scrub your databases for duplicate records.
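One common statistical test for outliers caused by data-entry mistakes is the modified z-score, which uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation. The sketch below is a generic illustration of that technique, not a description of how any particular data cleansing system works; the 3.5 cutoff is a widely used rule of thumb, and the sample quantities are hypothetical.

```python
from statistics import median


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indexes of values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median-based statistics are not pulled around by the
    very outliers being searched for, unlike the mean and standard deviation.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values equal the median, so the score is undefined; flag nothing.
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical hand-keyed order quantities; 150 looks like a typo for 15.
quantities = [12, 15, 14, 13, 150, 16, 14]
print(flag_outliers(quantities))  # [4]
```

Flagged values are candidates for review rather than automatic deletion, since a genuine but unusual value looks the same to the test as a typo.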
There are several attributes that comprise data quality. High-quality data is:
If your data meets all of these criteria, you, your systems, and your applications will be working with the best possible information to drive better customer service, better customer experience, and better business outcomes.
Data deduplication, also known as dedupe, is the process of eliminating duplicate copies of data within a storage volume or across the entire storage system (cross-volume dedupe). It uses pattern recognition to identify redundant data and replace it with references to a single saved copy. With Purity Reduce, Pure Storage uses five different data-reduction technologies to save space in all-flash arrays.
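The general idea of replacing redundant data with references can be illustrated with fixed-size block hashing: each block is fingerprinted, only the first copy of each fingerprint is stored, and the original stream becomes a list of fingerprints. This is a simplified, generic sketch and makes no claim about how Purity Reduce works; the 4 KiB block size is an assumption, and production systems often use variable-size chunking and compare bytes on a fingerprint match to guard against hash collisions.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, store each unique block once,
    and describe the original stream as a list of references (SHA-256 hex digests)."""
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of each block
        refs.append(digest)
    return store, refs


def restore(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Rebuild the original bytes by following the references in order."""
    return b"".join(store[d] for d in refs)


# Three identical 4 KiB blocks followed by one different block.
data = b"A" * 4096 * 3 + b"B" * 4096
store, refs = dedupe_blocks(data)
print(f"{len(refs)} blocks referenced, {len(store)} stored")  # 4 blocks referenced, 2 stored
assert restore(store, refs) == data
```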
Pure Storage, Inc.
2555 Augustine Dr.
Santa Clara, CA 95054
800-379-7873 (general info)