Secure Your Data In Hadoop – Follow These Five Steps

May 27, 2016 | Big Data, Hadoop, Technology

Read Time: 10:00

What is the biggest concern for data professionals? Ask this question and you will get a near-unanimous reply: data security. There are numerous ways of securing your data; here are five steps that help a data professional secure data in a Hadoop environment.

Audit And Understand Your Hadoop Data

First things first: take an inventory of the data that you wish to store in your Hadoop environment. Knowing what is going in lets you understand and rank the sensitivity of that data. On the face of it, this might look like a daunting task, but skipping it leaves your data exposed to attackers who can take whatever they find and sort through it at their leisure. If attackers are willing to invest time in finding out what you have, you should invest at least as much in knowing it yourself and protecting it.
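As a minimal sketch of such an inventory, the Python below scans sample records and ranks each column by the most sensitive pattern that all of its values match. The pattern set, the rank numbers, and the field names are assumptions made for illustration; a real audit would use your own data dictionary and classification policy.

```python
import re

# Hypothetical patterns and sensitivity ranks (higher = more sensitive).
PATTERNS = {
    "email": (re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), 2),
    "card_number": (re.compile(r"^\d{13,19}$"), 3),
    "date": (re.compile(r"^\d{4}-\d{2}-\d{2}$"), 1),
}


def classify_column(values):
    """Return (label, rank) of the highest-ranked pattern matching every non-empty value."""
    non_empty = [v for v in values if v]
    best = ("unclassified", 0)
    if not non_empty:
        return best
    for label, (pattern, rank) in PATTERNS.items():
        if all(pattern.match(v) for v in non_empty) and rank > best[1]:
            best = (label, rank)
    return best


def inventory(rows):
    """Build {column: (label, rank)} from a list of dict records."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return {name: classify_column(vals) for name, vals in columns.items()}


# Made-up sample records standing in for data headed into the cluster.
sample = [
    {"name": "A. Example", "email": "a@example.com", "dob": "1980-01-31", "card": "4111111111111111"},
    {"name": "B. Example", "email": "b@example.org", "dob": "1975-12-02", "card": "5500000000000004"},
]

report = inventory(sample)
# Print the most sensitive columns first.
for column, (label, rank) in sorted(report.items(), key=lambda kv: -kv[1][1]):
    print(f"{column}: {label} (rank {rank})")
```

Columns that come back as "unclassified" are not necessarily harmless; they are simply the ones a person still needs to review.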

Perform Threat Modelling On Sensitive Data

Threat modelling has a simple goal: to identify the potential vulnerabilities of at-risk data and to understand how that data could be used against you if it were stolen. Some cases are obvious; it is well known, for example, that personally identifiable information always has a high black-market value. Measuring data vulnerability is not always so simple, though. Your date of birth might not seem like sensitive information on its own, but combined with an area code it gives criminals a lot to work with. You should understand how different types of data can be combined and misused to harm someone.
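One way to make the "combination" risk concrete is to measure how many records become unique once fields are combined. The sketch below is an illustration under assumed field names and made-up records, not a complete threat model.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Made-up records; "dob" and "area_code" are hypothetical field names.
records = [
    {"dob": "1980-01-31", "area_code": "212"},
    {"dob": "1980-01-31", "area_code": "415"},
    {"dob": "1975-12-02", "area_code": "212"},
    {"dob": "1975-12-02", "area_code": "212"},
]

print(unique_fraction(records, ["dob"]))               # 0.0
print(unique_fraction(records, ["area_code"]))         # 0.25
print(unique_fraction(records, ["dob", "area_code"]))  # 0.5
```

In this toy dataset no one is identifiable by date of birth alone, yet half of the records are pinned down once the area code is added.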

Identify The Business-Critical Values Within Sensitive Data

It makes no sense to secure data if the security measures neutralize its business value. As a data professional, you need to understand which characteristics of the data are critical for downstream business processes. Take a look at your credit card: certain digits on it identify the issuing bank, whereas the other digits are only useful for processing transactions. Only by recognizing which digits you need to retain can you decide whether to use data masking or encryption techniques that still make re-identification possible.
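A minimal sketch of the credit card case: keep the leading digits that identify the issuer and mask the rest. The default of six retained digits is an assumption (issuer prefixes have traditionally been six digits, and newer ones can be longer), and the function name is hypothetical. Note that masking like this is one-way; it does not allow re-identification.

```python
def mask_card_number(card_number, keep_prefix=6, mask_char="*"):
    """Keep the issuer-identifying prefix and mask every remaining digit."""
    # Strip the separators people commonly type into card numbers.
    digits = card_number.replace(" ", "").replace("-", "")
    if not digits.isdigit() or len(digits) <= keep_prefix:
        raise ValueError("expected a card number longer than the retained prefix")
    return digits[:keep_prefix] + mask_char * (len(digits) - keep_prefix)


print(mask_card_number("4111 1111 1111 1111"))  # 411111**********
```

Downstream jobs that only need to group transactions by issuing bank can run on the masked value; anything that needs the full number back calls for a reversible technique instead.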

Apply Tokenization And Format-Preserving Encryption On Data As It Is Ingested

For any data that requires re-identification, a data professional has to choose between these two techniques. Though there are various ways of obscuring data, tokenization and format-preserving encryption are particularly well suited to Hadoop because they avoid collisions, which would otherwise get in the way of data analysis. Each technique has its own use cases, and you should expect to use both, depending on the characteristics of the data to be masked. Format-preserving technologies allow most analytics to be performed on the de-identified data, thus securing data-in-motion as well as data-in-use.
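To illustrate the tokenization half of this step, here is a toy, in-memory token vault. It is a sketch under loud assumptions: the class and method names are hypothetical, it only handles digit strings, and a real deployment would use a hardened tokenization service or a standardized format-preserving encryption mode rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy vault mapping digit strings to random digit tokens of the same length."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value):
        if not value.isdigit():
            raise ValueError("this sketch only handles digit strings")
        # Same input always yields the same token, so joins and group-bys still work.
        if value in self._to_token:
            return self._to_token[value]
        # Redraw until the token is unused, so two values never share a token.
        # (For very short inputs the token space can run out; fine for a sketch.)
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._to_value and token != value:
                break
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def detokenize(self, token):
        """Re-identify a token; only callers with vault access can do this."""
        return self._to_value[token]


vault = TokenVault()
token = vault.tokenize("123456789")
print(len(token), token.isdigit())             # 9 True
print(vault.tokenize("123456789") == token)    # True
print(vault.detokenize(token) == "123456789")  # True
```

Because the token keeps the length and character set of the original, schemas and most analytics on the ingested data do not need to change.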

Provide Data-At-Rest Encryption Throughout The Hadoop Cluster

Hadoop replicates data as soon as it enters the environment, which makes it difficult to trace where the data has gone. Data-at-rest encryption comes in handy when hard drives age out of the system and need to be replaced. Encrypting data-at-rest puts an end to your worries about what could be found on a scrapped drive once it leaves your control. Of all the steps above, this one is the most likely to be overlooked, as it is not always offered as a standard feature by Hadoop vendors.
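Where your distribution includes HDFS transparent encryption (Apache Hadoop 2.6 and later do), data-at-rest encryption is set up through "encryption zones". The sketch below only builds the command lines an administrator would typically run; it does not execute them. The key name and path are hypothetical, and the commands assume a configured Hadoop Key Management Server and appropriate privileges.

```python
def encryption_zone_commands(key_name, zone_path):
    """Return the argv lists that would create a key and an HDFS encryption zone."""
    if not zone_path.startswith("/"):
        raise ValueError("zone_path must be an absolute HDFS path")
    return [
        # Create the encryption key in the configured key provider.
        ["hadoop", "key", "create", key_name],
        # An encryption zone is created on an existing, empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Turn the directory into an encryption zone that uses the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


# "demo_key" and "/secure/landing" are placeholder values for illustration.
commands = encryption_zone_commands("demo_key", "/secure/landing")
for argv in commands:
    print(" ".join(argv))
```

Files written under the zone are then encrypted before they reach the DataNode disks, so a drive pulled from the cluster holds only ciphertext for that data.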

There is a great deal to learn about Big Data and Hadoop, and just as much that you can do in these environments. Big Data and Hadoop are two of the most in-demand skills these days and present great opportunities. At Cognixia, we offer specialised training programs in Hadoop Administration as well as Hadoop Development. These programs are designed to teach you the nuances of Hadoop and acquaint you with its environment.
