“Dark data” is the data we routinely store during day-to-day business activities, but which we fail to put to any use. And it’s a double-edged sword.
On the one hand, we now have the computing power to analyse and use this data for commercially valuable purposes. The problem in the past has been that this information was unstructured – either plain text or disorganised, without categorisation or metadata. Or it was kept in disparate repositories: our emails, financial statements and employee data, for example, are usually spread across different agencies and databases. But we now have the machine learning capacity to bring these data sources together with enough accuracy to be useful. By interrogating the combined data effectively, companies can engage better with customers and even predict their behaviour or anticipate demand.
On the other hand, dark data is a hacker’s dream. Much dark data is the ever-increasing record of transactions and activities kept by companies for audit and compliance purposes. With some irony, this is just the sort of data that hackers can use. In the Sony hack in 2014, for example, over 100TB of data was stolen, including embarrassing private emails and 47,000 unique social security numbers. Sony set aside $15m to cover ongoing claims for damages, and that doesn’t account for the further cost in reputational damage, both to the company and, in this case, some of its beleaguered senior executives.
More important is the fact that the stolen data may be shared on the black market and stored for later use. In April 2016, thousands of UK families received spurious emails which included their correct home addresses, obtained from one or more dark data hacks. With one further click, the unsuspecting victim was subjected to a vicious piece of ransomware called “Maktub Locker”, which charged over £500 to restore the user’s hard drive. The availability of home addresses through hacked databases turns this scam from an unlikely long shot into a highly profitable business.
But let’s stay positive. There’s only going to be more dark data created, and it’s going to present an ever more granular digital picture of our lives and activities. When our fridge orders more milk by itself, and we travel to work by a route designed for us in real time each morning, paid for with tickets that are purchased automatically, it’s easy to see that our lives will be monitored in increasing detail. Apps associated with the Internet of Things are expected to create 507ZB of data by 2019 (a zettabyte is one trillion gigabytes).
Clearly, there’s serious competitive advantage to be had by those organisations prepared to put this information to use. In manufacturing, for example, a factory will be able to optimise output and minimise supply chain costs by analysing machine logs, product telematics, social media and equipment sensors. Even weather reports can be corralled and used to anticipate demand.
And in marketing, the precision enjoyed by brands online will be extended to much more of the human experience. Today, brand managers have a good idea of who has seen, or clicked on, an advertisement on the web. Tomorrow, they will know which trains are the most effective ones in which to put a poster, exactly who has seen those posters, and what sentiments those people may have, based on their financial activities and taste in TV. This is, of course, either engaging or petrifying.
Our future is one in which every event is also a digital data point which can be measured, stored and analysed. Smart businesses will put these data points together to yield even more useful information. And smart hackers will use the same information for less savoury ends.