A few weeks ago I threw out some ideas on how we can start to work towards better data governance. I’m going to take each section of those ideas in turn and explore it further. Maybe we can get a few nuggets of mapping goodness to fall out.
I had started down a path where my analogy was planting a garden and the first section was labeled: Understanding our Plot of Land.
If you were to go to the intertubes and look up data mapping, you’d get all of these articles explaining that all you have to do is start listing the data that you hold. Managers read this and instruct their IT department to do just that, and here is where the trouble lies. It is not always so easy to tell what data is held. You first need to list what systems (or software) you are running throughout your business.
The reason for this is that systems capture information without people knowing it. Or a better way to put that would be that only a handful of people know exactly what systems capture which details. And sometimes those people do not work for your organisation.
So an easier starting point, and one which some organisations have begun to explore, is listing out the systems throughout the organisation with basic details: Vendor, Name, Description, Business Owner and ICT Contact. If you were feeling adventurous you might also include: whether the system contains personal info, network location, vendor contact details, a link to a user manual, whether the data is accessible, the approximate volume, velocity and veracity of the data, retention period, and deletion method.
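To make that list a bit more concrete, here is a minimal sketch of what one row of such an inventory could look like in Python. The class name `SystemRecord`, the helper functions and the sample entry are all made up for illustration, and only a handful of the optional details are included. A spreadsheet would do the same job; the point is simply that every system gets the same set of columns.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class SystemRecord:
    # Basic details from the list above
    vendor: str
    name: str
    description: str
    business_owner: str
    ict_contact: str
    # A few of the "adventurous" details; None means "not yet known"
    contains_personal_info: Optional[bool] = None
    network_location: Optional[str] = None
    retention_period: Optional[str] = None
    deletion_method: Optional[str] = None


def missing_details(record):
    """Return the names of the fields nobody has filled in yet."""
    return [name for name, value in asdict(record).items() if value is None]


def to_csv(records):
    """Write the inventory out as CSV text, one row per system."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SystemRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical example entry, not a real system
inventory = [
    SystemRecord(
        vendor="Example Vendor",
        name="Payroll",
        description="Staff pay runs",
        business_owner="Finance Manager",
        ict_contact="ICT Service Desk",
        contains_personal_info=True,
    ),
]

print(missing_details(inventory[0]))
print(to_csv(inventory))
```

Leaving unknown details as `None` rather than guessing gives you a ready-made to-do list: `missing_details` tells you which questions still need to go to the vendor or the business owner.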
This piece of work gives an overview of the entire organisation’s systems. But care must be taken to remember that it’s not just the tools and large databases that need to be captured. Which email service is being used? These emails will contain a significant amount of data (just think of all the information and documents you keep in your email). Are web servers set up in-house? These generally keep IP addresses, if not in the general logs then definitely in the error logs.
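If you want to check the web server claim for yourself, a rough sketch like the one below will pull anything that looks like an IPv4 address out of a chunk of log text. The two sample log lines are invented (they use address ranges reserved for documentation), the function name is my own, and the pattern is deliberately simple: it does not check that each number is 255 or less, and it ignores IPv6 entirely.

```python
import re

# Matches four groups of 1-3 digits separated by dots, e.g. 203.0.113.7
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ip_addresses(log_text):
    """Return the distinct IPv4-looking strings found in the log text."""
    return sorted(set(IPV4_PATTERN.findall(log_text)))


# Invented sample lines: one access-log style, one error-log style
sample_log = """\
203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
[error] client 198.51.100.23 requested a missing file: /old-page
"""

print(find_ip_addresses(sample_log))
```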
Mapping Data behind Systems
Once there is a master list of systems in use within the organisation, we can start to work on the actual data that sits behind them. This is the slow and painstaking work that involves calling up vendors. Do we have access to the information behind the system? If so, what format does it come in? Is there an API? Can we build a data structure for everything it captures, not just the fields that we see on the screen?
Much like web servers capturing IP addresses, software that has security controls will maintain personal information: who accessed it, for how long, and sometimes what they did. An extremely common example is the Windows operating system. Log data behind the operating system can tell us a lot about who is (or how many are) running the device, when the device is used, what it is used for, and with a bit of ingenuity we can start to figure out why the device is being used. Ever wonder why your phone seems to know when you’re traveling to or from work? Perhaps you have alerts about the bus you normally take or suggestions for a takeaway? That is Microsoft, Apple or Google throwing their weight behind deciphering these logs plus other information to give you the most tailored service they can. Creepy or not.
Mapping those Hard to Reach Digital Data Stores
We all remember, or if we’re a bit naughty still use, shared drives and USB drives, or perhaps we rely on our email to store more than it really should. Dropbox? Google Drive? Or perhaps OneDrive? All of these hidey holes hold our data and, regardless of our initial intention, tend to get fairly messy fairly quickly. Have you ever found a well-kept shared drive? If you have an example and can share it, we’d all love to see the screenshots!
We know the data behind the systems, but these are the last bastions of crud that live in our networks.
Mapping those Even Harder to Reach Analogue Data Stores
If that last step was difficult, this is going to hurt a bit. While it’s easy to slip into the feeling that data only exists digitally, our heads remind us about hardcopy HR records, surveys, visitor books, forms that are part of a process, delivery slips etc. We need to go all Indiana Jones and seek out those filing cabinets. Where are they? What is the access like? And who has access?
Time for a Rest
Moving through these four steps allows us to build a very real picture of data within our organisation. If you wanted to be thorough you could start to map out what are called data flows. These are not just a static picture of what is where, but of how information flows between these systems. Maybe we’ll get there one day, but I’d expect the four steps above are already giving us some work to carry out.
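For the curious, here is a small sketch of how data flows could be written down once the system list exists. Each flow is just "this system passes this data to that system". The system names and flows are invented for illustration, and `build_flow_map` and `downstream` are my own hypothetical helpers, not part of any governance tool.

```python
from collections import defaultdict

# Each flow: (source system, destination system, what is passed along)
# Invented examples only
flows = [
    ("Web form", "CRM", "contact details"),
    ("CRM", "Email service", "email addresses"),
    ("CRM", "Shared drive", "exported reports"),
]


def build_flow_map(flows):
    """Group the flows by the system they start from."""
    outgoing = defaultdict(list)
    for source, destination, data in flows:
        outgoing[source].append((destination, data))
    return outgoing


def downstream(flow_map, start):
    """Return every system that data entering `start` could end up in."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for destination, _ in flow_map.get(current, []):
            if destination not in seen:
                seen.add(destination)
                stack.append(destination)
    return seen


flow_map = build_flow_map(flows)
print(downstream(flow_map, "Web form"))
```

Even a list this small answers a useful question: if someone fills in the web form, which other systems might end up holding their details?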