Beyond Sentiment @ Semantic Tech & Biz Conference in London

Earlier this week I spent two days at the Semantic Tech & Business Conference in London, organized by our friends at semanticweb.com. It was a really interesting couple of days where developers, researchers, and business users got to share their work. There are several good synopses of the event out on the blogosphere, so I am not planning to summarize the event here, but I did want to share my thoughts and perspectives on the panel discussion I was involved in, entitled “Beyond Sentiment”, with Tom Reamy (KAPS Group), Marcello Pellacani (Expert System) and Fabio Lazzarini (CRIBIS D&B).

We were each asked to share our personal perspectives on this space to get the conversation going, which worked really well: there was lively discussion and no shortage of questions. The mix of the panel was also well thought out and gave us a variety of perspectives, with Tom’s strong NLP/content analytics background, Marcello’s deep semantic perspective, Fabio’s applied line-of-business background, and myself talking about my pet subject, “socio-semantics”. I couldn’t have been happier :-) { Ok, I confess… I am a total nerd. }

So what does a socio-semantic network mean to me, and why do I think it’s such an important concept? First, some quick background, which I hope will explain why I feel so passionately about this space. I joined IBM in 2001 to head up a new team tasked with building IBM’s first UIMA-based Natural Language Processing solution, IBM LanguageWare. In 2006 I was fortunate enough to have the opportunity to join an EU FP6 project called NEPOMUK, where I was first introduced to semantic technologies. And over the last couple of years I’ve been working on social analytics. So when I think about “social networks”, I see them much as I see semantic networks (ontologies, linked data, RDF, …): where semantic networks have helped me better understand content by adding context, the social network just layers on person-centric context.

Why is this so important? When we look at analysis of social media, I believe content analytics has hit a wall. Social media is an extremely noisy medium providing very short snippets of text with minimal context, stuffed with ambiguities, analogies, sarcasm, irony, slang, etc., and there is only so much that we can do with the content snippets themselves. In order to improve our analysis to an acceptable level we need to increase context beyond the text and inject more background knowledge into the analysis process: background knowledge that I believe can be effectively described and applied using semantic technologies.

To give a specific example: earlier this year I built (and showcased at Lotusphere) a prototype designed to identify what people are interested in, and then dynamically filter content (in this case Twitter) to best match this dynamic interest profile. When looking at the filtering part, it was clear early in the process that I couldn’t rely on content analytics alone, but needed to integrate the socio-semantic graph into my analysis process. This gave me three things:

  • Improved Content Analytics: Using the graph I was able to (per conversation) (1) disambiguate topics, (2) identify primary & secondary topics, (3) rank conversations according to the Semantic Value.
  • Influence Scores: Using the graph of social interactions I was able to calculate the influence of a specific tweeter against a set of topics.
  • Filtered TweetStream: By combining the Semantic Value and the Influence Score I was able to generate a more valuable stream of content for users.
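
As a rough illustration of that last step, here is a minimal sketch of how the two scores might be combined. It is not the prototype’s actual code: the `Tweet` class, the `filter_stream` function, the weighted sum and the threshold are all hypothetical, and both scores are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str
    semantic_value: float  # assumed: 0..1, produced by the content analytics step


def filter_stream(tweets, influence, weight=0.5, threshold=0.5):
    """Return tweets whose combined score clears the threshold, best first.

    `influence` maps an author to an assumed 0..1 influence score for the
    topics the reader cares about; unknown authors default to 0.
    """
    scored = []
    for tweet in tweets:
        author_influence = influence.get(tweet.author, 0.0)
        # Hypothetical combination: a simple weighted sum of the two scores.
        score = weight * tweet.semantic_value + (1 - weight) * author_influence
        if score >= threshold:
            scored.append((score, tweet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored]
```

The weighted sum is only the simplest choice; a product, or a per-topic weight, would fit the same interface.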

So what does all this have to do with the topic of the panel discussion? Well, I believe this hybrid approach (combining traditional content analytics with socio-semantic graphs) is critically important to move sentiment to the next level, and specifically to move from Sentiment to Opinion. Frequently when people talk about sentiment analysis, they are not really talking about whether “John is positive or negative about Product X” but rather whether “a snippet of content is”. And without being too facetious (OK, maybe a little), content doesn’t really have an opinion; it only contains indicators that cannot be measured in isolation, but that map back to the person who created the content and collectively represent an aspect of that person’s opinion.

In order to really ascertain sentiment / opinion we need to integrate the person (their socio-semantic network or social profile) into the analysis process so we can make accurate, informed decisions. For example:

  • I am a huge fan of Android, and not so much of Apple. Therefore, anything “negative” I say about Apple has to be couched in those terms.
  • I am also a bit of a drama queen when it comes to giving feedback — things Rock! or Suck! but rarely in between — which means my feedback needs to be normalized.
  • I also like to use car analogies (God knows why) and irony as a way to make a point, which makes my content somewhat ambiguous.

All these subtleties of my personality (my affiliations, agendas, interests, activities) are coded (or could be) in my socio-semantic network, available to be used by content analytics algorithms. So in a nutshell, I see socio-semantic networks as a way for us to fill some of the gaps within the text to help us deliver more informed and accurate analysis.
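
To make the “drama queen” point concrete, here is a minimal sketch of one way feedback could be normalized per person. It assumes each author has a history of raw sentiment scores on some common scale; the function name and the choice of a z-score are mine for illustration, not a description of any particular product.

```python
from statistics import mean, pstdev


def normalize_sentiment(raw_score, author_history):
    """Express a raw sentiment score relative to the author's usual range.

    `author_history` is an assumed, non-empty list of that author's past
    raw scores. The result is a z-score: how many standard deviations the
    new score sits from the author's own average.
    """
    baseline = mean(author_history)
    spread = pstdev(author_history)
    if spread == 0:
        # The author always scores the same, so report the plain offset.
        return raw_score - baseline
    return (raw_score - baseline) / spread
```

With this, a “Rocks!” from someone who swings between extremes counts for less than the same score from someone who is usually measured.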

Some questions…

Is a single influence, sentiment, opinion, or expertise score enough for a person? This question hits exactly on one of my pet peeves, which I blogged about some months ago, so rather than repeat myself here I will just link to the blog posts: We don’t need Influence Scores, we need flexible People Recommendation Systems! and Do we need a Social Rating Agency? The S&P, Moody, & Fitch for people.

What are some of the big challenges that need to be resolved? There is no shortage of challenges, from Information Overload & Social Silos to Privacy & Security, but for now I am going to respond to Scale. Today you can’t talk about analytics without talking about BigData; however, I don’t see this as the single solution to scale. In much the same way as corporate archiving strategies are getting more and more sophisticated and using content analytics to better ascertain what should be kept and what dumped, we will need to do the same in the social media analytics space. But that’s only part of it, and what we really need to do is perhaps somewhat unintuitive. We need to increase volumes (over-collect) in some cases and decrease volumes (aggressively dump) in others. For example:

  • Re: Over-Collecting
    In this case we need to grab more than the snippets that immediately relate to my company, iteratively expanding our net, which does mean significantly increasing content volumes. But imagine listening to a conversation and only hearing every 5th sentence: that is what analyzing isolated snippets amounts to.

  • Re: Dumping
    In this case we need to leverage streaming analytics to help us filter out the unnecessary content. Even if we had infinite storage and processing power, indiscriminately ingesting everything doesn’t serve us well, for 3 reasons. Firstly, we end up storing a huge amount of worthless content => increases our hardware, electricity, and maintenance costs. Secondly, this unnecessary content gets pulled into our subsequent deep analysis => increases our CPU and memory needs. Thirdly, the noise may end up distorting our analysis results.
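
Both ideas can be sketched in a few lines. The generator below is a hypothetical illustration, not a real streaming analytics product: it assumes each post is a dict with `conversation` and `text` keys, and that the caller supplies an `is_relevant` test. It over-collects by keeping every later post in a conversation once one post has matched, and dumps everything else before it reaches storage.

```python
def triage(posts, is_relevant):
    """Yield posts worth keeping; everything else is dropped on the way in.

    Over-collect: once any post in a conversation is relevant, keep the
    rest of that conversation too, even if later posts never match.
    Dump: posts in conversations that have not matched are never stored.
    """
    tracked = set()  # conversation ids that have matched at least once
    for post in posts:
        if is_relevant(post["text"]):
            tracked.add(post["conversation"])
        if post["conversation"] in tracked:
            yield post
```

One limitation of this simple version: posts that arrive in a conversation before its first relevant post are lost, so a real pipeline would probably want a short buffer.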

Where do I see this progressing in the next year?

  • If we look to the announcements from the F8 conference last week, it’s clear that Facebook are going to be big players in the socio-semantic space through their platform story around OpenGraph. We will see a huge variety of socio-semantic apps and data being generated on the Facebook platform over the coming year. I guess my only concern with all this goodness is the openness issue. Will Facebook become the single source of all truth about people? And if yes, do I feel comfortable with this?

  • Google Plus will now be under pressure to step up to the plate as a social platform, but the jury is still out on how that will evolve.

  • Companies like Klout, Peerindex, and others, with rapidly evolving socio-semantic networks built for their influence analysis, will become more important in the area of sentiment and opinion. Not just for contributing graph data that can help refine the analysis, but also in defining engagement strategies. Perhaps they may even get into the sentiment space themselves, as they could be well positioned to do so.

  • And clearly companies like IBM will continue to focus on evolving both the population and management of enterprise social profiles, and the role that they can play in improving business outcomes, ranging from collaboration to business intelligence, and everything in between.

10 Comments to “Beyond Sentiment @ Semantic Tech & Biz Conference in London”

  1. Thou art the single source of truth about your being :-)

  2. >So in a nutshell, I see socio-semantic networks as a way for us to fill some of the gaps within the text to help us deliver more informed and accurate analysis.

    Spot on and a terrific post !

  3. Very interesting post Marie! Thanks for sharing! There is surely a very interesting but untapped dynamic among the People, Content and Network dimensions, which are the foundation of a social network, and it can be understood well by studying them together. I’m also exploring this direction in my PhD work, please see the following for more details: http://archive.knoesis.org/research/semweb/projects/socialmedia/

  4. I fully agree that some “sentiment” approaches currently out there are too coarse-grained to provide actionable value, but I’m not so sure (yet) which domains really benefit from building up a “social graph” to interpret the findings. Yes, there are fanboys out there (esp. around apple/android.-) whose sentiment you may want to take with a grain of salt – but then again, is everybody reading their posts aware of that, or will too many people take this at face value, so you end up treating this sentiment as importantly as a negative sentiment from one of your advocates? And there are other domains (lower-cost CPG, for example), where you have very sparse interactions, making it harder to enhance your sentiment analysis with a graph.
    I’m fully with you when you say the social graph has an essential role around topic-based influencers, which help to “decrease volumes”, as you rightfully put it.
    To me, currently, extracting user behavior (also by building up a social profile) beats both sentiment and graphs in terms of actionable information – but we’ll see what 2012 brings.-)

  5. Fantastic post. I am working on a similar idea, and we recently wrote a workshop paper which will be in the proceedings at SPIM 2011 (ISWC 2011). Please have a look at http://wiki.knoesis.org/index.php/Personalized_Filtering_of_the_Twitter_Stream
    Adding sentiments to the interests of the users is on our agenda.
