In exploring this topic broadly across industries and solutions, it has become increasingly clear that we all speak a different language, which makes it difficult to understand the topic and to build a comprehensive and connected strategy. Now some folks might ask, why do I need a connected strategy? Well, for the obvious reason: we are not silos, our lives are not silos, the data we generate is not siloed, and therefore people analytics cannot be siloed. For example, healthcare analytics is not just about analyzing clinical data; it also needs to combine lifestyle, consumer, and even social data. Whether I am building a solution that spans industries, or I just want to be able to combine data generated from different contexts, I need to have a connected understanding of the data.
And a connected strategy isn’t just about having a consistent understanding of data across industries and solutions, but also across different data collection mechanisms. For example, if I am looking to incorporate behavioral insights into my solution, there are many data collection mechanisms that may contribute. I may need to collect clicks, monitor views, gather searches, capture interactions, monitor transactions, analyze content, or simply ask questions, all of which combine into a picture of an individual and their digital behaviours.
For all these reasons, the first step in building a strategy is to have a consistent way of describing the data. So what is personal data? And what concepts do I need to consider as I explore its value?
Below are some of the concepts around which I believe we need to create consistency. These are not legal definitions, but rather represent my personal perspective, which influences how I build people analytics systems. Since it’s critical that we integrate data protection regulation into our mental model, I have tried to use similar language to that of the new European General Data Protection Regulation (GDPR), but that’s as far as the connection goes. I should also note that, being in Europe, my perspective is quite conservative and may appear more restrictive than some consider necessary. Only time will tell which direction the market moves and whether we see a significant shift in the balance of power between the data subject and the data controller.
Data Subject: A natural person who could be identified, directly or indirectly, by the data. This means that even if your data doesn’t contain direct identifiers, such as an email address or user ID, it could still be considered to identify a data subject. For example, if you have data about a person who (a) works for IBM, (b) is based in Ireland, and (c) is an analytics strategist, you have effectively identified Marie Wallace. While taking indirect data into consideration is completely reasonable, since analytics can very easily reverse engineer de-identification, it does create challenges when joining datasets and demonstrates why understanding the uniqueness of result sets is so important. It’s a bit of a chicken-and-egg situation in that we need personal analytics to secure personal analytics and ensure compliance.
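To make the point about uniqueness of result sets a little more concrete, here is a minimal sketch of how one might flag combinations of indirect attributes that are shared by too few people. The records, the field names, the `rare_combinations` helper, and the threshold `k` are all hypothetical, chosen for illustration only; this is a simple group-size check in the spirit of k-anonymity, not a complete re-identification risk assessment.

```python
from collections import Counter

# Hypothetical records with no direct identifiers (no name, email, or user ID).
records = [
    {"employer": "IBM", "country": "Ireland", "role": "analytics strategist"},
    {"employer": "IBM", "country": "Ireland", "role": "software engineer"},
    {"employer": "IBM", "country": "Ireland", "role": "software engineer"},
    {"employer": "IBM", "country": "France", "role": "software engineer"},
]

# Assumed set of indirect (quasi-identifying) attributes to check together.
QUASI_IDENTIFIERS = ("employer", "country", "role")


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def rare_combinations(rows, keys, k=2):
    """Return combinations shared by fewer than k rows.

    A combination held by a single row effectively points at one person,
    even though no direct identifier is present.
    """
    return [combo for combo, n in group_sizes(rows, keys).items() if n < k]


risky = rare_combinations(records, QUASI_IDENTIFIERS)
for combo in risky:
    print("Potentially identifying:", combo)
```

Running the same check again after joining two datasets is one way to see how quickly a harmless-looking result set can become identifying.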
Personal Data (PII): Any information relating to the data subject, which I tend to break down into five data types. For those interested in GDPR, I’ve suggested how I believe my perspective loosely maps to the GDPR classification. However, since I’m not a GDPR expert, it is a bit of a guess and shouldn’t be taken as gospel.
- Information about the Individual: Their name, address, contact details, medical records, employment records, skills, tastes, demographics, etc. This might loosely map to GDPR classification for Identities, Characteristics, Capabilities, and Assets.
- Information about what they do: The files they access, websites they visit, buttons they click, search queries they issue, products they buy, things they say, meetings they attend, places they visit, etc. This might loosely map to GDPR classification for Locations and Habits.
- Information about their connections to others: Their relationship/activities with family, friends, employers, colleagues, etc. This might loosely map to GDPR classification for Networks.
- Information about what they can do: Data that describes their access, authentication, preferences, and rights. This might loosely map to GDPR classification for Technical.
- Information about what others can do with their data: Data that describes their privacy, consent, access, and security settings. This type of data is not explicitly called out under the GDPR classification, but I’m assuming it loosely maps to GDPR classification Technical.
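For anyone who prefers to see the breakdown as a data model, the five data types above could be sketched roughly as follows. The class names, field names, and the `group_by_type` helper are hypothetical, and the mapping table simply restates my loose guesses from the list; it is not a legal classification.

```python
from dataclasses import dataclass
from enum import Enum


class PersonalDataType(Enum):
    """The five data types from the list above (names are illustrative)."""
    ABOUT_INDIVIDUAL = "information about the individual"
    ACTIVITY = "information about what they do"
    CONNECTIONS = "information about their connections to others"
    CAPABILITIES = "information about what they can do"
    DATA_RIGHTS = "information about what others can do with their data"


# Loose mapping restated from the list above; a guess, not gospel.
LOOSE_GDPR_MAPPING = {
    PersonalDataType.ABOUT_INDIVIDUAL: ("Identities", "Characteristics", "Capabilities", "Assets"),
    PersonalDataType.ACTIVITY: ("Locations", "Habits"),
    PersonalDataType.CONNECTIONS: ("Networks",),
    PersonalDataType.CAPABILITIES: ("Technical",),
    PersonalDataType.DATA_RIGHTS: ("Technical",),
}


@dataclass(frozen=True)
class PersonalDataItem:
    subject_id: str              # hypothetical pseudonymous key for the data subject
    data_type: PersonalDataType  # which of the five types this item belongs to
    name: str                    # e.g. "search_query" or "manager"
    value: object


def group_by_type(items):
    """Group items by data type so each type can be governed separately."""
    grouped = {data_type: [] for data_type in PersonalDataType}
    for item in items:
        grouped[item.data_type].append(item)
    return grouped


items = [
    PersonalDataItem("subject-1", PersonalDataType.ABOUT_INDIVIDUAL, "skill", "text analytics"),
    PersonalDataItem("subject-1", PersonalDataType.ACTIVITY, "search_query", "gdpr"),
    PersonalDataItem("subject-1", PersonalDataType.DATA_RIGHTS, "marketing_consent", False),
]
grouped = group_by_type(items)
```

Tagging every item with its type up front makes it easier to apply different retention or access rules per type later on.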
Sensitive Personal Data (SPI): This is personal data that is afforded extra protection. GDPR Article 9(1) describes it as data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership, together with genetic data and data concerning health or sex life. Data on criminal convictions or related security measures is similarly protected, although the adopted regulation treats it separately in Article 10.
Profile Service(s): When I think about building a people analytics system, at the heart of the system is a service through which personal data is made accessible. It may be centralized, decentralized, or dispersed on a functional or geographical basis. This appears to map to the term filing system in GDPR. This definition is broad and has been challenging for folks who still think of profiles as “Information about the Individual” (the first data type above); however, as we look to consume, consolidate, and analyze a broader variety of personal data, we need to think more broadly.
Processing: Any operation or set of operations which is performed upon personal data.
Data Controller: The entity that determines the purposes, conditions and means of the processing of personal data.
Data Processor: The entity that processes data on behalf of the data controller.
Recipient: An entity to which the personal data are disclosed.
Consent: A freely given, specific, informed, and explicit indication by which the data subject, by statement or action, signifies agreement to the processing of their personal data. Consent is not the only legal basis for processing: the Legitimate Interests of a controller may also provide one, provided they are not overridden by the interests or fundamental rights and freedoms of the data subject.
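As a rough illustration of how the controller, consent, and legitimate-interests concepts might fit together in a system, here is a minimal sketch of a check that runs before any processing. Everything in it is hypothetical (the class names, the per-purpose consent record, the `legal_basis` method), and the set of "legitimate purposes" stands in for a balancing assessment that in reality a person, not code, would have to make.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LegalBasis(Enum):
    CONSENT = "consent"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass(frozen=True)
class ConsentRecord:
    subject_id: str   # hypothetical key for the data subject
    purpose: str      # consent is specific, so it is recorded per purpose
    granted: bool


@dataclass
class Controller:
    name: str
    consents: list = field(default_factory=list)
    # Purposes the controller has assessed (outside this code) as legitimate
    # interests not overridden by the data subject's rights and freedoms.
    legitimate_purposes: set = field(default_factory=set)

    def legal_basis(self, subject_id: str, purpose: str) -> Optional[LegalBasis]:
        """Return the basis for processing, or None if there is none."""
        for record in self.consents:
            if (record.subject_id, record.purpose) == (subject_id, purpose) and record.granted:
                return LegalBasis.CONSENT
        if purpose in self.legitimate_purposes:
            return LegalBasis.LEGITIMATE_INTERESTS
        return None


controller = Controller(
    name="Example Corp",
    consents=[ConsentRecord("subject-1", "skills_matching", granted=True)],
    legitimate_purposes={"fraud_prevention"},
)

for purpose in ("skills_matching", "fraud_prevention", "ad_targeting"):
    basis = controller.legal_basis("subject-1", purpose)
    print(purpose, "->", basis.value if basis else "no legal basis, do not process")
```

Returning `None` rather than defaulting to some basis keeps the conservative stance: if no basis can be shown, the processing should not happen.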
OK, that’s the most boring part of the series completed, and I hope the next blog post will be a bit more interesting, and much less pedantic :-)