Businesses are experiencing exponential growth in data as more devices are deployed at the edge and business processes become increasingly digital, causing their data repositories to reach capacity. For Intelligent Enterprises to fully reap the benefits of software intelligence and embrace a collaborative workforce model of humans and machines (what Accenture calls Workforce Reimagined), it will be critical to securely process and protect big data. For instance, evaluating and optimizing the performance of human and machine interactions as they work side by side, and "teaching" machines to evolve as tasks change, will all depend on big data analytics. While big data presents a multitude of business opportunities to generate insights and guide actions, it also raises substantive privacy concerns. As part of a strategy to strengthen cyber laws, the US President recently announced a privacy plan for big data, which includes policy recommendations and pending draft legislation to protect consumers' privacy.10 Yet despite new compliance requirements, big data breaches are on the rise. Businesses are finding it harder to secure big data, especially as traditional database management systems cannot scale to handle the data volume, acquisition velocity, or data variety, often referred to as the three Vs.
The volume challenge

Few businesses have mastered the concepts and techniques of effective data protection. To deal with the volume, computations on big data are processed in parallel, often using MapReduce-like frameworks, in which distributed mappers independently process local data during the Map operation before reducers process each group of intermediate output in parallel. Hadoop, the open-source implementation of Google's MapReduce programming model, was originally created to store and process public website links; security and privacy were an afterthought. Because security is not inherent in the platform, it is difficult to retrofit security onto mappers that perform data analytics. To secure computations in these distributed frameworks, businesses must also ensure that the data is protected against potentially compromised mappers.

The variety challenge

Big data is composed of a variety of data elements, which makes it subject to different regulatory and compliance requirements. For example, an insurance company that collects medical records and financial information about its customers may have to build different data stores for each type of data.11 Because different stakeholders require access to different subsets of the data, businesses must use encryption solutions that enable fine-grained access and operations on the data. Today, many organizations still deal with the big data challenge by creating a data lake, a huge repository of raw data in its native format. Such organizations should revisit their data storage practices, segregating data based on sensitivity level and compliance requirements and then applying the proper security controls.

The velocity challenge

Businesses do not always know the sensitivity levels of big data in advance because it is collected in real time (streaming data) or near real time.
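The Map and Reduce phases described under the volume challenge above can be sketched in a few lines of Python. This is an illustrative word-count sketch only, not Hadoop code: in a real cluster the mappers run in parallel on distributed nodes, and the shuffle is performed by the framework.

```python
from collections import defaultdict

def map_chunk(chunk):
    """Mapper: independently process one local data chunk,
    emitting intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped):
    """Group intermediate pairs by key, as the framework
    does between the Map and Reduce phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Reducer: aggregate all values for one key."""
    return key, sum(values)

# Each chunk stands in for the local data held by one mapper node.
chunks = ["big data needs security", "big data big insights"]
mapped = [map_chunk(c) for c in chunks]   # run in parallel in Hadoop
grouped = shuffle(mapped)
counts = dict(reduce_group(k, v) for k, v in grouped.items())
# counts["big"] == 3
```

Note that nothing in this flow authenticates the mappers or validates their intermediate output, which illustrates the point above: a compromised mapper can corrupt or leak results, so the data itself must be secured.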
Some data items may not look sensitive on their own, but could reveal private details when combined with other pieces of information; in the aggregate, the data might form a comprehensive picture that requires protection. To manage data velocity, businesses should perform data sensitivity analysis more frequently, and apply the right security policy and access controls while the data is fresh.

Secure big data processing platforms

As organizations build big data repositories and apply big data analytics, various types of data are mixed together, such as business performance metrics and sensor information. When that data is combined, it becomes a target. To ensure that only the proper people and algorithms have access, it is vital to secure big data platforms and monitor access through a combination of security controls. Fortunately, more security features are moving into big data platforms. Hadoop now offers Kerberos-based authentication, which can also be integrated with LDAP and Active Directory for security policy enforcement. Zettaset's sHadoop was designed to mitigate Hadoop's known architectural and input validation issues, and to improve user-role audit tracking and user-level security for Hadoop. sHadoop also gives administrators the ability to establish and store a baseline security policy for all users, against which the current security policy can be compared. Finally, sHadoop offers encryption for data at rest and for data in motion as it is transmitted between Hadoop nodes.

Another option for big data protection is Gazzang (acquired by Cloudera in 2014), which offers a product for end-to-end encryption of data stored and processed in Hadoop environments, data coming from ingestion tools such as Apache Sqoop, metadata, and configuration information about a Hadoop cluster. Cloudera is also partnering with Intel on a chip-level encryption initiative called Project Rhino.12

Embed security into data

Most businesses choose to build their big data environment in the cloud, where all-or-nothing retrieval policies for encrypted data may push them to store data unencrypted. In these situations, businesses should consider attribute-based encryption to protect sensitive data while enabling fine-grained access controls. With this technique, the attributes of a user's secret key are mathematically incorporated into the key itself: when the user attempts to decrypt a file, decryption succeeds only if those attributes satisfy the file's access policy, and the cloud never learns the individual file access policies.

Sqrrl Enterprise, another big data platform, takes a data-centric security approach: data is embedded with security information that determines access and governance. Fine-grained access control is enforced at the cell level by evaluating a set of visibility labels embedded within the data each time a user attempts an operation on that data. Even search indexes, which could otherwise be a source of data leakage, are secured through term-level security, ensuring that indexing respects the security policies of the underlying data elements.
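The cell-level, label-based access control described above can be sketched as follows. The label syntax, field names, and helper functions here are hypothetical simplifications, not Sqrrl's or Accumulo's actual API; real visibility expressions also support nesting and combined boolean operators.

```python
def visible(label_expr, user_auths):
    """Return True if a user's authorizations satisfy a cell's visibility
    label. In this simplified syntax, tokens joined by '&' are all required,
    while tokens joined by '|' require any one match."""
    if "&" in label_expr:
        return all(tok in user_auths for tok in label_expr.split("&"))
    return any(tok in user_auths for tok in label_expr.split("|"))

# Each cell carries its own visibility label, embedded with the data itself.
cells = [
    {"row": "patient-17", "col": "diagnosis", "vis": "medical&analyst", "value": "redacted"},
    {"row": "patient-17", "col": "zipcode",   "vis": "medical|billing", "value": "02139"},
]

def scan(cells, user_auths):
    """Filter a scan so only cells the user's authorizations allow are returned;
    the check runs on every access, not once at login."""
    return [c for c in cells if visible(c["vis"], user_auths)]

billing_view = scan(cells, {"billing"})  # only the zipcode cell passes
```

Because the label travels with each cell rather than living in a separate access-control list, the same policy follows the data into queries, aggregations, and (as the text notes for term-level security) even the indexes built over it.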
The platform is built on top of Accumulo, a distributed, hybrid column-oriented, key-value data store originally developed by the National Security Agency and later submitted to the Apache Software Foundation.

Conclusion

Hadoop and other big data platforms are helping businesses analyze data and derive insights in entirely new ways. To tap the full benefit, however, businesses must strengthen security measures to protect their information assets and reduce risk. Accenture recommends that businesses apply the basic principles of information security to big data platforms, but progressively narrow the perimeter around enterprise data. Taking a data-centric security approach opens the door to processing big data analytics and producing even bigger insights for digital business strategies.