Challenge 1: Graph Modelling & Construction

This is the 1st of 3 challenges that I raised in my post “A semantic perspective on the challenges of analyzing human networks”.

Anyone who has ever built an analytics solution will attest that it tends to be 80% brawn and 20% brains. By that I mean that we spend the majority of our time manhandling data: collecting, cleansing, verifying, mapping, and then massaging it into the shape we need for analysis. At the best of times this can be a total pain in the neck, but it’s particularly painful when building human networks. There are many reasons for this, not least the fact that the data required to build these networks tends to be scattered across many lines of business, all with different vocabularies and schemas; located within various collaboration, communication, and social systems; buried inside business applications; spanning intranet and internet; and generally comprising a mix of structured and unstructured data. However, the biggest modelling challenge is that we tend to look at the data quite differently from the applications that created it in the first place.

In the world of human networks the person is the center of our universe, whereas with most other applications it’s all about the business object (piece of content, code, product, …) or event (sales transaction, code check-in, content creation), with the person as a secondary piece of meta-data. For example, a content repository might have a record for each piece of content with a person referenced as a UUID in the Author field. In our world this would translate into two nodes (Person and Content) with an edge (Create) in between, and an extensible list of properties on the nodes and edges (such as date, time, source, frequency, duration, location, …). Now while this may seem like a trivial transformation, it can be tricky for several reasons.
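To make the transformation concrete, here is a minimal Python sketch of turning one content-centric record into the person-centric Person/Create/Content structure described above. The record layout (`content_id`, `author`, `created`), the `Node` and `Edge` classes, and the `record_to_graph` helper are all assumptions made up for illustration; a real repository will have its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                                   # e.g. "Person" or "Content"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                  # id of the Person node
    target: str                                  # id of the Content node
    label: str                                   # e.g. "Create"
    properties: dict = field(default_factory=dict)


def record_to_graph(record: dict, source_system: str):
    """Turn one content record into (Person node, Content node, Create edge)."""
    # The Author field only holds a UUID; here it becomes a first-class node.
    person = Node(id=record["author"], label="Person")
    content = Node(id=record["content_id"], label="Content")
    # Properties that lived on the record move onto the edge.
    created = Edge(
        source=person.id,
        target=content.id,
        label="Create",
        properties={"date": record.get("created"), "source": source_system},
    )
    return person, content, created


# Hypothetical record, shaped the way a content repository might store it.
record = {
    "content_id": "doc-42",
    "author": "123e4567-e89b-12d3-a456-426614174000",
    "created": "2014-03-01",
}
person, content, created = record_to_graph(record, "content-repo")
```

Note that the code has to decide that the Author field means a Person who created the Content; nothing in the record itself says so.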

Firstly, we frequently run into issues of implicit semantics that we need to make explicit in our graph. In this example it’s implicitly understood that “Author is a Person”, which implies that “Person created Content”, or at least created the initial file. Secondly, once we create these new structures we tend to run into additional mapping challenges which often require us to conflate, split, or infer new mappings, and generate new meta-data. For example, how do we represent co-authors? In a business-object-centric world only one person can physically create the file, so co-authors are frequently represented as Editors. Do we model co-authors as editors? But what does that say about their contribution? Does the Create edge imply a stronger relationship to the content than the Edit edge? Maybe representing everyone through the Create edge and using a Contribution property is a better approach, where Contribution is measured by frequency of edits or size of deltas, or even type of reaction (comments on their edits).
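As a rough sketch of the last option, the snippet below gives every contributor a Create edge and attaches a Contribution property to it. The event format (person, content, delta size), the `contribution_edges` helper, and the choice to measure Contribution as each person's share of the total delta size are assumptions for illustration only; frequency of edits or reactions would be equally valid measures.

```python
from collections import defaultdict


def contribution_edges(edit_events):
    """Build one Create edge per (person, content) pair with a Contribution share.

    edit_events: iterable of (person_id, content_id, delta_size) tuples.
    """
    size_by_pair = defaultdict(int)      # total delta size per (person, content)
    edits_by_pair = defaultdict(int)     # number of edits per (person, content)
    size_by_content = defaultdict(int)   # total delta size per content item

    for person_id, content_id, delta_size in edit_events:
        size_by_pair[(person_id, content_id)] += delta_size
        edits_by_pair[(person_id, content_id)] += 1
        size_by_content[content_id] += delta_size

    edges = []
    for (person_id, content_id), size in size_by_pair.items():
        total = size_by_content[content_id]
        edges.append({
            "source": person_id,
            "target": content_id,
            "label": "Create",
            # Share of all changes to this content made by this person.
            "contribution": size / total if total else 0.0,
            "edit_count": edits_by_pair[(person_id, content_id)],
        })
    return edges


# Hypothetical edit history for a single document.
events = [
    ("alice", "doc-42", 300),
    ("bob", "doc-42", 100),
    ("alice", "doc-42", 100),
]
edges = contribution_edges(events)
```

With this history, alice accounts for 400 of the 500 changed units and bob for the remaining 100, so the original "creator" and the "editor" end up on the same kind of edge and differ only in the weight of their contribution.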

This is a very simple example, but as you can see it can get messy very quickly, and our world is littered with much more complex mapping challenges.

What we need are not just tools that make this mapping easier and that allow common semantics to be described (semantics that can support a wide range of analytics), but tools that make it easy to share these models between projects. What I’ve found on my travels is that designing models for human networks is a very specialized job. It requires people who understand the applications that generate the source data (to provide context) and the applications/scenarios that will use (query/analyze) the transformed data/network. It also requires (and this is generally the hardest to find in the business) people who understand people: social scientists who are able to inject the humanity into the human network. Data is sterile and doesn’t always tell us what we think it tells us; this is where the human dimension within modelling becomes so important.

We can’t afford to have every project team that wants to use a specific type of human network create it from scratch each time. The networks can be distinctly different (social network, e-mail network, telecoms network, content network), but they are fundamentally similar from a real-world perspective. A piece of content is a piece of content irrespective of whether it’s a blog post, a forum reply, or an e-mail; however, what these pieces of content tell us about the people interacting with them is very different.

We need a mechanism that allows these models to be built, shared, customized, and extended collaboratively, so that the social scientists, data scientists, application developers, and business users can all contribute their perspective. This is why we need semantic modelling and tools that allow collaborative development of human network models. Once these models & transformations have been defined we still need to create the optimal graph structure, and make it easy for applications to use the graph and for data to be ingested into it; a topic I will cover next under “Challenge 2: Opening a Graph for Business”.
