The Data Dilemma: Part 2

As I search around YouTube for movies, presentations, etc., I begin to realize that with a bit of judicious use of Google, I can find nearly anything. That is, anything about businesses, personnel, corporate structures, and even personal information. Yes, even without spending money, there is data available about every conceivable part of your corporate and personal life. After spending a few dollars, there is quite a bit more. With all this data, is there really any privacy? We seem to think our data is private. Why? This is the next part of the data dilemma.

In the past, it was difficult, but not impossible, to find data about any organization. Most of this data took the form of public or government records. There were records at the town, county, and state levels. There were also records at the federal level. We may not have known what organizations or people were buying, but there was a good chance of learning just about anything else we wanted to know. Now, however, we can find out much more. Data is everywhere. It is possible to find out what was shipped to an office, when it was received, and from where.
As humans, we share too much. During WWII, the phrase was “loose lips sink ships.” And yet, we continue to overshare. Social media is the new place to chat about things better left unwritten, or even unsaid. Data is the new battleground. Many companies want as much of it as possible. They do not want any specific data; they want data in general. They wish to use the data to refine artificial intelligence and create thinking systems: systems that will know your every desire simply by what you do online. This may sound farfetched, but it is probably closer than we know.
We know where people sit within companies, we know when they will be around, and we know even more about organizations than we ever did before. The phone books within older corporations used to be considered intellectual property. We no longer need them; email addresses, pictures, and phone numbers are readily available for nearly everyone anyone might wish to contact. This is the data used by those who run phishing campaigns or deliver malware.
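To see how little effort this kind of harvesting takes, here is a minimal sketch that pulls email addresses out of text you already publish, such as the copy of your own "About us" page. The function name `find_email_addresses`, the sample text, and the deliberately simple pattern are assumptions made for illustration; the pattern will miss some valid addresses and is not a full address validator.

```python
import re

# Deliberately simple pattern (an assumption for illustration): it catches
# common address shapes and is not a complete email address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(page_text):
    """Return the unique email addresses found in page_text, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(page_text)})


if __name__ == "__main__":
    # Hypothetical page copy; replace it with text from a site you own.
    sample = (
        "Contact Jane Doe at jane.doe@example.com or our sales desk at "
        "sales@example.com. Press inquiries: Jane.Doe@example.com"
    )
    for address in find_email_addresses(sample):
        print(address)
```

Run against your own pages, a list like this is a quick inventory of what a phisher could collect without ever touching your network.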
The real issue with our data is that it is already out there. The Internet has made it easier to gather. Easier to find. Easier to use. With the implementation of APIs, services that provide the data make it easy to mine. Even those who think they make it harder succeed only against people browsing by hand, not against automation and orchestration tools. The data becomes wide open. Since all the data I am talking about is public, it is still there for the taking. Corporations use that data. Criminals use that data.
This is the data dilemma. Data that organizations can use is also the data that criminals want to use. Most of that data is public. The data held by the new breed of business that creates data can also be mined. How do we defend against this? It is nearly impossible. We need to be vigilant, as the data is not going away.
In fact, data is increasing just from government regulation alone. Each new form gets scanned in somehow. This is an ongoing issue, but one we cannot address at the moment. Think about how your business is registered, how it leases property or buildings, how it sends and receives shipments, and how it pays taxes. All of this information ends up in public records, all searchable in some form. Google itself does an admirable job of collecting all this information in one place. Businesses themselves overshare information. Nearly every corporate website carries quite a bit of personnel and other information.
We are in the realm of data sharing, yet we overshare, and that data is what criminals like to see. That is the dilemma. If we can see it, so can the bad guys. We, as technologists, should know what we share, know the risks, and know how what we share can be used negatively. Google hacking, Shodan hacking, and other data-gathering techniques are essential to finding those negatives: negatives that we can turn into positives. That is our dilemma. Knowledge is power.
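As one hedged illustration of Google hacking turned on yourself, the sketch below builds a handful of search queries that restrict results to a domain you own. The operators `site:`, `filetype:`, `intitle:`, and `inurl:` are standard Google search operators; the template list and the function names `build_audit_queries` and `to_search_url` are assumptions chosen for this example, not a complete audit checklist. The code only builds strings and sends nothing over the network.

```python
from urllib.parse import quote_plus

# Hypothetical starter templates; extend them with whatever your
# organization considers sensitive.
DORK_TEMPLATES = [
    "site:{domain} filetype:pdf",
    "site:{domain} filetype:xlsx",
    'site:{domain} intitle:"index of"',
    'site:{domain} "confidential"',
    "site:{domain} inurl:login",
]


def build_audit_queries(domain):
    """Return search queries scoped to a single domain you own."""
    domain = domain.strip().lower()
    if not domain or " " in domain:
        raise ValueError("expected a bare domain such as example.com")
    return [template.format(domain=domain) for template in DORK_TEMPLATES]


def to_search_url(query):
    """Encode a query as a Google search URL to open by hand in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for query in build_audit_queries("example.com"):
        print(query)
        print("  " + to_search_url(query))
```

Opening the printed URLs by hand keeps the exercise manual and small. Point it only at domains you are responsible for, and treat anything surprising in the results as something the bad guys can see too.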
How do you gain knowledge of your organization?