
The Data Shell Game: The Best Way to Protect Corporate and Institutional Data in the Cloud

Posted August 26, 2014 | Leadership | Technology | Amplify

-- Muhammad Ali

The headlines on data security of late have been chilling. Consider the case of Target Corporation. After a massive 2013 holiday season data breach exposed 70 million customer records and 40 million credit card numbers, consumer outrage -- and the falling sales that resulted from it -- forced Target CEO Gregg Steinhafel to resign this past May.1 But the Target case is not an isolated one. In June 2014, P.F. Chang's CEO Rick Federico had to admit that the restaurant chain's database had been compromised and that hackers had gained access to thousands of its customers' credit and debit card records. The hackers then put this information up for sale in the same seedy underground of the Web where, just months before, data on those Target customers had been bought and sold like eBay collectibles.2

Furthermore, what these data pirates want is not always financial -- and not only companies find themselves vulnerable. In February 2014, the leadership of a network of Texas healthcare facilities, the St. Joseph Health System, revealed a massive network intrusion that had taken place over three days in December 2013. This security breach compromised the medical and personal records of over 400,000 past and current patients in Central Texas and allowed the hackers to gain access to the facilities' employee records.3 And this summer, the New York Times detailed how Chinese hackers broke into databases of the US government's Office of Personnel Management, gaining access to the records of tens of thousands of federal employees who had applied for top-secret security clearances.4 This follows a well-publicized case in early 2013 in which digital intruders gained access to personal data on employees and contractors of the US Department of Energy.5 In addition, federal officials are still scrambling to deal with the mounting damage done -- both in actuality and in reputation -- to the US government from the Edward Snowden case just a year ago.6

Of course, data security is not just a US concern; increasingly, it is a global problem with staggering consequences for organizations and their customers worldwide. In 2013, two huge data breaches took place in Japan. First, in May of that year, Yahoo! Japan reported that 22 million usernames had been stolen by hackers in an attack that was only stopped from becoming worse when the company -- literally -- pulled the plug on its servers.7 Within two months, Nintendo reported that its Club Nintendo site had been the subject of over 15 million fraudulent login attempts that may have exposed personal data of over 4 million of its club members.8 More recently, one of France's leading telecom firms, Orange France, reported two separate data breaches -- one in April 2014 and another in May (after it had implemented new security measures) -- in which hackers gained access to data on more than a million of its customers. Orange France had to warn its mobile clients that they may be vulnerable to "phishing" scams, as hackers had retrieved extensive information on them, including their names, email addresses, mobile and landline phone numbers, dates of birth, and more.9

Taken together, such attacks on the data of companies, organizations, and even governments exact a high cost around the world. Furthermore, efforts undertaken both to prevent unauthorized data access and to help firms recover from the aftereffects of such data breaches take a toll on the global economy as a whole.

The 2014 edition of the Ponemon Institute's annual report on the impact of data breach incidents documents the staggering costs of such data intrusions.10 The report examined data relating to unauthorized data access -- stemming from a "malicious or criminal attack, system glitch, or human error" -- in the following leading economies:

  • Australia
  • Brazil
  • France
  • Germany
  • Italy
  • India
  • Japan
  • Saudi Arabia and United Arab Emirates (reported together as a single Arabian region)
  • United Kingdom
  • United States

The institute's research uncovered some startling -- and scary -- statistics on the frequency and cost of such unauthorized data access across surveyed organizations in the 10 countries, all of which had experienced a significant data breach within the past year:11

  • The average total cost of a data breach for all firms was US $3.5 million. This represents a 15% increase in the average damage assessment over 2013. However, for US companies, the economic toll was far higher, reaching $5.9 million per incident.

  • The average cost per compromised record (meaning information that can identify an actual person) rose by 9% in a single year, climbing from $136 in 2013 to $145 in 2014. And again, US firms experienced the highest cost, with each lost or stolen record costing an average of $201. Data breaches in healthcare had the highest cost per compromised record ($359), with retail ($105) and the public sector ($100) having the lowest-cost impacts -- but also some of the greatest vulnerabilities. Furthermore, the per-record cost for compromised data was far higher globally from instances stemming from a malicious and/or criminal attack than those attributable to internal data breaches, including system glitches and employee mistakes.

  • The costs of the data breach to the targeted organization varied greatly, ranging from just over $100,000 to over $23 million per incident. Not surprisingly, the larger the data breach -- in terms of the number of records accessed -- the higher the recovery cost to the company, agency, or institution.

  • The costs to organizations to "fix" a data breach are rapidly rising around the globe as such occurrences unfortunately become more common. These include not just internal costs (arising from such activities as "forensic and investigative activities, assessment and audit services, crisis team management and communications to executive management and board of directors," etc.), but external costs as well. The latter customer recovery costs include both direct costs (from such activities as "engaging forensic experts, hiring a law firm or offering victims identity protection services," etc.) and indirect costs (from activities such as "the use of existing employees to help in the data breach notification efforts or in the investigation of the incident, the loss of goodwill, and customer churn"). Moreover, organizations have to take into account the postincident response costs (including legal fees, settlements, and payments to victims, along with marketing costs in response to the incident) and the opportunity costs incurred when dealing with such occurrences rather than engaging in other opportunities.

  • At present, organizations around the world stand an approximate 1 in 5 chance (22%) of experiencing a material data breach -- one involving more than 10,000 records in a single incident -- in a calendar year, with Indian and Brazilian firms having the highest estimated vulnerability (30%).

All of this is even more noteworthy for the fact that, in their methodology, the Ponemon Institute researchers specifically excluded catastrophic data breach incidents of over 100,000 lost or stolen records so as to not skew the overall results.12 Thus, major incidents such as the Target and P.F. Chang's cases do not even factor into these already scary statistics.

As those figures glaringly show, despite the many billions of dollars in expenditures and millions of people being impacted by data breaches around the world, what we have today is a very expensive -- and seemingly unwinnable -- game of cat and mouse being played by hackers with the companies and agencies they target. And let's face it, the cats (whether they be lone individuals or, as is more common today, organized or even state-sponsored groups of professional hackers) are winning handily. Thus it's time for a paradigm shift in data security -- a new approach that will change the game entirely. It's time for the Data Shell Game. In the remainder of this article, I will explain how this new form of data security works and what implementing this new way of managing and protecting data will mean for IT and for organizations as a whole.

WHAT ARE WE TRYING TO PROTECT?

From the time human beings started painting images on cave walls, information has always been stored and transmitted in neat little bundles. From cave paintings, to rock carvings, to clay tablets, to papyrus scrolls, to books, to modern digital libraries containing billions of electronic files, that is the way it has always been done. The problem is, it is no longer expedient to handle information that way. Storing and transmitting information in neat little bundles (i.e., complete data sets) is very conventional and psychologically comforting, but it makes information vulnerable to cyber attack.

Traditional data security measures are primarily focused on (1) preventing hackers from accessing data, (2) detecting access should intrusion occur, and (3) mitigating potential damages resulting from intrusion. Traditional security measures protect data by hardening transmission networks and data centers to become intrusion-proof data bunkers. In brief, conventional cyber security wisdom relies on the dream of the "impregnable data fortress" to protect information that is stored as complete records in these data sets.

The truth is, even with all the tens of billions of dollars that are spent every year on cyber security worldwide, we cannot stop an army of sufficiently skilled, well-provisioned, highly motivated hackers from reading, stealing, or destroying these complete data sets -- whether they be complete credit card account profiles, bank account information, or medical records. All that our best cyber security practices are able to do is temporarily keep the hacker-hordes at bay -- until now, that is.

CHANGING THE GAME

In the world of cryptography, secret sharing is much more than what teenage boys and girls might do in school hallways. Rather, the concept of secret sharing as a cryptographic method was introduced independently in 1979 by both American professor George Blakley13 and Israeli professor and cryptographer Adi Shamir.14 The fundamental premise of secret sharing is that a sensitive piece of information -- a secret -- can be divided into parts, with each participant given a unique part, where some or all of the parts are needed in order to reconstruct the secret. While such methods can work well when it comes to, say, dividing missile launch codes or safe combinations among two, three, or four people, dealing with thousands, millions, and even billions of bits of information can be a far, far more complex proposition.
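To make the threshold idea concrete, here is a minimal sketch of a Shamir-style scheme in Python, in which any k of n shares rebuild a secret. The function names, the choice of prime modulus, and the representation of the secret as a single integer are assumptions made for illustration; this is a teaching sketch, not production cryptography.

```python
import secrets

# Assumption: a fixed prime modulus larger than any secret we split.
PRIME = 2**127 - 1  # a Mersenne prime


def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them reconstruct it."""
    if not (1 <= k <= n < PRIME and 0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split_secret(123456789, 3, 5)` yields five shares, and passing any three of them to `recover_secret` returns the original value. Each share here is as large as the secret itself, which is one reason the approach becomes unwieldy for bulk data.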

Divide and Conquer

In 1989, Michael O. Rabin, then a computer science professor at both Harvard University and Hebrew University of Jerusalem, built upon the work of Shamir and Blakley. Rabin invented a mathematical method -- the information dispersal algorithm (IDA) -- for dispersing information in data sets.15 With an IDA in place, every character, digit, symbol, or other primary data structure of a sensitive piece of data -- whether that be a file, an account, an image, a record, etc. -- gets reduced to "data primitives."

The scientific principle of data dispersion, properly applied, is the best way to protect corporate, government, and institutional data in cloud-based communication networks. Technology developed by BitSpray Corporation, based in Canton, Mississippi, USA, accomplishes this by mathematically deconstructing primary data into sets of data primitives and dispersing them to a number of share volumes (subsets of the original data) by means of an IDA. The IDA is based on the concept of secret sharing, applied in this case to data. Shamir expressed this complex method quite simply, stating that the cryptological algorithm allows one to "divide data D into n pieces in such a way that D is easily reconstructable from any k pieces, but even complete knowledge of k-1 pieces reveals absolutely no information about D."16 Consequently, no complete data set exists anywhere. Using this dispersion technology, the resulting share volumes generated by the IDA contain incomplete subsets of the original data that are distributed throughout a group of globally distributed data storage locations.17
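The dispersal step can be sketched in the spirit of Rabin's published IDA: data is cut into blocks of k bytes, and each of n shares stores one combined value per block, so that any k shares suffice to solve for the original bytes. The field GF(257), the Vandermonde-style coefficients, and the function names below are assumptions chosen for brevity. The sketch illustrates only dispersal and reassembly; it does not represent BitSpray's implementation or model the confidentiality techniques described in this article.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def disperse(data: bytes, k: int, n: int) -> list[tuple[int, list[int]]]:
    """Return n shares (x, values); each holds about len(data)/k elements."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    shares = [(x, []) for x in range(1, n + 1)]
    for start in range(0, len(padded), k):
        block = padded[start:start + k]
        for x, values in shares:
            # Row (1, x, x^2, ...) of a Vandermonde matrix times the block.
            values.append(sum(b * pow(x, j, P) for j, b in enumerate(block)) % P)
    return shares


def _solve(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * v = rhs over GF(P) by Gauss-Jordan elimination."""
    k = len(rhs)
    rows = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def reassemble(shares: list[tuple[int, list[int]]], k: int, length: int) -> bytes:
    """Rebuild the original `length` bytes from any k distinct shares."""
    if len(shares) < k:
        raise ValueError("not enough shares to reassemble")
    chosen = shares[:k]
    matrix = [[pow(x, j, P) for j in range(k)] for x, _ in chosen]
    out = bytearray()
    for idx in range(len(chosen[0][1])):
        out.extend(_solve(matrix, [vals[idx] for _, vals in chosen]))
    return bytes(out[:length])
```

In this sketch the caller must remember the splitting ratio k and the original length. Without k, the share values cannot even be grouped into the right system of equations.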

The Data Shell Game is based on the simple premise that attackers can't access what they can't find, nor can they steal what isn't there. At present, when cyber attackers penetrate a data center and steal a file or group of files, they take away a complete data set of intact information. The data may be encrypted, but the purloined files are nonetheless complete data sets. Dispersed files are not. Since IDA-dispersed share volumes only contain partial data sets, and no single data storage location contains the minimum number of share volumes to reassemble a complete data set, hackers are forced to penetrate multiple geographically separated data centers in order to steal enough share volumes to reassemble a complete data set. That becomes a daunting task, as the Data Shell Game makes it a near-impossibility to find additional share volumes, since share volume files cannot be sorted or discovered by their filename, size, metadata, or attributes. And even if a hacker were to find all of the share volumes for the originally targeted data set, these items would be of no use in that they are scrambled, partial sets of the original data. The hacker would have no way to unscramble, let alone reassemble, the file or data set she was after.

This form of Advanced Data Dispersion Technology (ADDT) does not dismiss or downplay the importance of creating impregnable data fortresses, either from a physical or a networking perspective. Rather, the development and use of ADDT merely acknowledges the growing awareness that hackers will find a way to penetrate even the most elaborate cyber defense systems. What differentiates ADDT from conventional data protection systems is that it does not protect networks or data centers; instead it empowers data to protect itself. ADDT should be viewed as the last line of defense. After hackers have penetrated a data center's concrete and Kevlar walls; cut through all the firewalls; defeated IPSec, TLS, and a myriad of other encryption schemes; and are standing in the Queen's Chamber, where the crown jewels are stored, all they find are files full of "bit soup" -- incomplete subsets composed of data primitives. Metaphorically, if all the characters, digits, symbols, and graphics on a typewritten page were peas that were boiled down into pea soup, and the soup were ladled out to, for example, six geographically separated vaults, what would it take to find, steal, and reassemble the original bag of peas?

The Data Shell Game thus defines a standard for obfuscating data in a globally dispersed network topology whereby no complete data set ever travels over a single network path, nor is a complete data set ever stored in a single storage location. The major difference between ADDT networks and conventional networks is that ADDT networks are dispersed networks that are configured in such a manner that the network itself becomes a computer-to-computer, cryptographic, communication network. These networks make it a near-impossibility to ever suffer catastrophic loss of, or experience unauthorized access to, ADDT-protected information. In addition, the enormity of a network becomes a security asset due to the power of the Data Shell Game.

Self-Protecting Share Volumes

If -- and when -- hackers break into a data center and steal an ADDT-protected share volume, which is the principal form of ADDT-protected data, they are forced to break into additional data centers to find the missing pieces for the files they stole. The Data Shell Game makes this feat extremely unlikely, because there is no way to identify or sort one share volume from another. Consider the following obstacles:

  • The filenames are extremely long and contain no information whatsoever that would reveal the original filename, the filenames of any related share volumes, or their storage locations.

  • All share volumes from an original file are of different sizes.

  • The file metadata (when a file was created, modified, or accessed) for all share files is set to a past date, such as 4 July 1776, and that common date is controlled so it never changes.

  • File attributes of all share volumes are set to be the same: Encrypted, Archive.

  • There is no way to determine what the splitting ratio was. Without knowing that, it is impossible to reassemble the original.

  • Each share volume from an original can be encrypted with a different algorithm.

  • Each share volume from an original can be encrypted with a different key.

  • Each metadata package contained in the share volume can be encrypted with a different encryption algorithm and key than that used to protect the share volume itself.

  • Finally, it is impossible to crack an ADDT-protected share volume without first reverse-engineering the underlying software, which contains unpublished proprietary security algorithms.
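Several of the obstacles above (uninformative filenames, differing sizes, and a common date) can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than the ADDT implementation: the file layout, the random padding used to vary sizes, the 128-character hex names, and the function names are all assumptions. Random padding makes sizes likely to differ but does not guarantee it. Because 4 July 1776 predates the Unix epoch and many filesystems cannot store it, the sketch substitutes 1 January 2000 as the common date.

```python
import os
import secrets
from datetime import datetime, timezone

# Assumption: one fixed timestamp stamped on every share file.
COMMON_TIME = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()


def write_share(directory: str, payload: bytes, max_padding: int = 4096) -> str:
    """Store one share under a random name, with padding and a fixed date."""
    # 128 hex characters carrying no trace of the original filename.
    name = secrets.token_hex(64)
    path = os.path.join(directory, name)
    padding = secrets.token_bytes(secrets.randbelow(max_padding + 1))
    with open(path, "wb") as handle:
        # Hypothetical layout: 8-byte payload length, payload, random padding.
        handle.write(len(payload).to_bytes(8, "big") + payload + padding)
    # Overwrite both the access and modification times with the common date.
    os.utime(path, (COMMON_TIME, COMMON_TIME))
    return path


def read_share(path: str) -> bytes:
    """Read a share back, discarding the random padding."""
    with open(path, "rb") as handle:
        blob = handle.read()
    size = int.from_bytes(blob[:8], "big")
    return blob[8:8 + size]
```

Note that reading a file may update its access time on some systems, so a real deployment would need to reset the timestamps after every read, or destroy the shares on access as the article goes on to describe.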

While each of these points is important in the data protection methodology, the combination of them is lethal -- to hackers, that is. Some of these elements do merit further explanation as to why they are so vital to confounding hackers. The first of these relates to the date. That common date for all share files works to protect them -- and confuse data intruders. If all share files have the same unvarying date, there is no way for hackers to lurk inside a data center and determine which files are more actively accessed or modified (an indicator of a file's relative importance). Files appear in the data center when they are first created, but all share files accessed by the IDA are automatically zeroized and destroyed when accessed. Consequently, when an original file is reassembled, all its share files disappear; when the original is resaved, the share files reappear (maybe in the same data center and maybe not) wearing completely different filenames with unchanged dates and attributes. The practice of using common file attributes -- that is, setting them to a status of "Encrypted" (meaning that the files show that they are encrypted to protect data from unauthorized access) and "Archive" (meaning that the files show they are a backup of an original file, with no original file to be found) -- serves a similar purpose. Such elements help to confound anyone seeking to make sense out of the share volumes, making it truly appear as a jigsaw puzzle with all the same pieces, when what a hacker needs is distinguishing information that is not to be found in the puzzle pieces.

ADDT is principally based on exploiting the "missing information effect" inherent in the operation of all IDAs. The technology further builds on that by employing a variety of proprietary techniques, comprising the corpus of Data Shell Game technology, to provide data-level security. When hackers can no longer access complete data sets from a single source, such as a storage location (data-at-rest) or a network transmission path (data-in-motion), they are forced to hack into additional locations to try to find the missing pieces. The enormity of today's communication networks, combined with the obfuscation power of the Data Shell Game, makes finding additional pieces well-nigh impossible. So data complexity and the cloud only serve to make the ADDT method a more secure way of protecting all forms of information today.

IMPLICATIONS FOR IT MANAGEMENT AND ORGANIZATIONS

The advent of the ADDT method of protecting data has significant implications for private sector firms and government organizations alike as they face up to the massive challenge of protecting their data in the age of cloud computing. On a practical level for users, the use of ADDT eases their day-to-day work. This is because of the following characteristics of the technology:

  • Users do not need to know anything about the resources protecting their sensitive information; the protection ADDT provides all happens in the background, beneath their level of awareness.

  • ADDT is especially useful because it is no longer necessary to periodically back up information, since every time a file is saved, new share volumes (each bearing a new name) are sprayed to all data locations.

  • If desired (or required by law), share volumes can be conventionally backed up to preserve archival snapshots at any time.

  • There is no longer a need to determine which file is the most current original file, since there is only one file comprising multiple subsets. Since there is only one file, it is -- by definition -- the latest.

Finally, even with the emphasis on multiple locations for housing share volumes, the actual data storage utilization -- and cost -- of the ADDT system is considerably less than what is required with conventional data storage and security methods.
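One way to see where a storage saving could come from is to compare a Rabin-style k-of-n dispersal, where each share is roughly 1/k the size of the original, with keeping full replicas. The short calculation below is a sketch under that assumption; the specific ratio of six shares with any four sufficient is hypothetical, and padding, metadata, and encryption overhead are ignored.

```python
from fractions import Fraction


def dispersal_overhead(k: int, n: int) -> Fraction:
    """Stored bytes per original byte when any k of n shares rebuild the file."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return Fraction(n, k)


def replication_overhead(copies: int) -> Fraction:
    """Stored bytes per original byte when keeping full copies."""
    return Fraction(copies)


# Hypothetical comparison: both layouts survive the loss of two locations.
print(float(dispersal_overhead(4, 6)))   # 1.5
print(float(replication_overhead(3)))    # 3.0
```

Under these assumptions, dispersal stores half as many bytes as triple replication while tolerating the same number of lost storage locations.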

There is also a "bigger picture" element to all of this -- and one that is quite positive for all of us. ADDT, employing the patented Data Shell Game approach, provides a totally new way of safeguarding data. Yet this is not simply a version 2.0, 3.0, or even 40.0 approach to data security. It is indeed a paradigm shift -- a disruptive innovation. It reinvents how we approach data protection, and this is ever more critical with the move to putting more and more computing services and data in the ephemeral cloud. If we can defeat hackers by not just coming up with a better mousetrap, but by changing the very nature of the game, this will produce a dividend that companies can use for good.

Data protection and computer security costs will not fall to zero with ADDT. However, with more productive security spending, IT spending itself can be made more productive, with more emphasis on features, accessibility, collaboration, and connections. We may well find that the advent of ADDT will enable more innovative companies to thrive with a newfound sense of security in their operations, while consumers -- with fewer Target- and P.F. Chang's-like stories in the front of their minds -- will be more willing to engage in meaningful collaboration with companies to take advantage of the mass-customization of offerings available to customers everywhere today. And this all begins with a new perspective on how we store, share, and protect data going forward.

ENDNOTES

1 Rupp, Lindsey, and Lauren Coleman-Lochner. "Target CEO Steinhafel to Step Down Following Data Breach." Bloomberg News, 5 May 2014.

2 Hellmich, Nanci. "P.F. Chang's Confirms Breach in Credit Card Data." USA Today, 13 June 2014.

3 Carr, David F. "Texas Hospital Discloses Huge Breach." InformationWeek Healthcare, 5 February 2014.

4 Schmidt, Michael S., David E. Sanger, and Nicole Perlroth. "Chinese Hackers Pursue Key Data on US Workers." The New York Times, 9 July 2014.

5 Kirk, Jeremy. "US Department of Energy Hack Disclosed Employee Information." Computerworld, 4 February 2013.

6 Sledge, Matt. "One Year After Edward Snowden's Leaks, Government Claims of Damage Leave Public in Dark." The Huffington Post, 5 June 2014.

7 Schwartz, Mathew J. "Yahoo Japan Data Breach: 22M Accounts Exposed." Dark Reading, 20 May 2013.

8 Phneah, Ellyne. "Club Nintendo Site Hacked, Customer Data Exposed." ZDNet, 8 July 2013.

9 Pauli, Darren. "Orange France Hacked AGAIN, 1.3 Million Victims Seeing Red." The Register, 8 May 2014.

10 "2014 Cost of Data Breach Study: Global Analysis." Ponemon Institute, May 2014.

11 Ponemon Institute (see 10).

12 Ponemon Institute (see 10).

13 Blakley, George R. "Safeguarding Cryptographic Keys." Proceedings of the 1979 AFIPS National Computer Conference, Vol. 48. AFIPS Press, 1979.

14 Shamir, Adi. "How to Share a Secret." Communications of the ACM, Vol. 22, No. 11, November 1979.

15 Rabin, Michael O. "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance." Journal of the ACM, Vol. 36, No. 2, April 1989.

16 Shamir (see 14).

17 Runkis, Walt, Donald E. Martin, and Christopher D. Watkins. "Secure Storage and Accelerated Transmission of Information over Communication Networks." US Patent Number 8,700,890. US Patent and Trademark Office, 15 April 2014.

 

About The Author
David Wyld
David C. Wyld currently serves as the Robert Maurin Professor of Management at Southeastern Louisiana University in Hammond, Louisiana, USA. He is the Director of the College of Business's Strategic e-Commerce/e-Government Initiative, the Founding Editor of the International Journal of Managing Information Technology, and a frequent contributor to both academic journals and trade publications. He has established himself as one of the leading…