TechWeb

Expert Analysis: A Case for Socialization of Data

Jul 26, 2010 (11:07 AM EDT)

Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=226200273


Just about every organization with customers (and prospects, competitors, and stakeholders) who use social media realizes by now that online social sources can be mined for significant enterprise business value. People post facts (true or not), experiences, and opinions -- to blogs, Facebook, Twitter, YouTube, you name it -- related to brands, product and service quality, and competitive position. Yet the state of the art in social-media analytics is siloed and unsophisticated.

Industry needs a socialization of data that would bring social data into BI and enterprise-analytics platforms. This would drive social-media strategy in coordination with other enterprise channels, incorporating social-media insights into everyday operations. We're talking comprehensive analytics -- total enterprise awareness -- for fully informed business decision making. The goal is not unlike that of DARPA's abandoned Total Information Awareness program (living on at an archival site), albeit built for business.

As I've written in earlier articles and blogs, true 360-degree views must encompass enterprise feedback, social media, and location intelligence. Turn the equation around: comprehensive social-media strategies will rely on the full set of enterprise data sources, including operational and transactional systems, to the greatest feasible extent.

Social Data

Let's define social data as data extracted from social media. I see three principal elements:

  1. People, organizations: connectors
  2. Connections
  3. Messages
These elements constitute a social graph, a network of nodes (connectors) linked by edges (connections), plus content. By message, I mean essentially the information transmitted when a connection is used: a blog posting or comment, an article, a Facebook status update or posted item or even a poke, and a phone-text or e-mail message; also, actually, phone conversations and in-person encounters. To use the metaphor of a letter, a message has an envelope with From and To addresses and, sometimes, markings that indicate Fragile or Special Delivery or Delivery Receipt Requested. This envelope information is distinct from the message content, which may even be encrypted and inaccessible to anyone but the recipient.
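The three elements, and the envelope/content split, can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names, the example connectors ("Alice", "Acme Hotels"), and the sample message text are hypothetical and do not come from any particular social platform or tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Connector:
    """A node in the social graph: a person or an organization."""
    name: str
    kind: str  # "person" or "organization"


@dataclass(frozen=True)
class Connection:
    """An edge linking one connector to another."""
    source: Connector
    target: Connector


@dataclass(frozen=True)
class Envelope:
    """From/To addresses and markings, kept apart from the content."""
    sender: Connector
    recipient: Connector
    sent_at: datetime
    markings: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Message:
    envelope: Envelope
    content: Optional[str]  # None when encrypted or otherwise inaccessible


# Hypothetical example data.
alice = Connector("Alice", "person")
acme = Connector("Acme Hotels", "organization")
follows = Connection(alice, acme)

complaint = Message(
    Envelope(alice, acme, datetime(2010, 7, 26, 11, 7),
             ("Delivery Receipt Requested",)),
    "The check-in wait was far too long.",
)
# The envelope is still usable even when the content cannot be read.
sealed = Message(Envelope(alice, acme, datetime(2010, 7, 26, 11, 8)), None)
```

Keeping the envelope as its own type means message flow can be analyzed from metadata alone, which matters when the content is unavailable.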

There's no good reason, in this set-up modeling exercise, to distinguish electronic from real-world social networks. For that matter, when someone concludes a transaction online or in a store -- makes an inquiry or a purchase or a service request -- how is that so different from sending a message over what's conventionally seen as a social network?

This non-difference is the basis for my advocacy of integrating social-media-sourced data with data from corporate transactional and operational systems. We (in the business world) have, for years, sought that elusive "360 degree customer view": a portrait of the customer that assimilates information drawn from every customer touchpoint. A coherent, coordinated, complete view lets us better serve and profit from customers, right? Think of social platforms as additional customer touchpoints.




Message Flow

So social data consists of connectors, connections and messages.

Social-network analysis is the study of the social graph and the message flow. I'll come back to that later.

You can track message flow using metadata that describes the sender, recipient, and routing; creation and receipt timestamps; the software used on each end; the subject specified by the sender, and so on. Think of e-mail headers or a (more limited) call-data record. This information is generated in the course of operations of social platforms whether Twitter, Facebook, or, given an expanded definition of “social,” e-mail or telecom systems.

Note that connections are not necessarily reciprocal. I, @SethGrimes, follow @CAnalytics (the Smart Content conference, actually myself in a different guise) on Twitter but @CAnalytics does not follow me. Note also that each message goes from one connector to another along an edge (via a connection), that is, in a single direction. And note that many messages fall on figurative deaf ears: E-mail and tweets are often unread; I never do view the video or photos you share via Facebook because, sorry, I don't know anyone in your college-reunion photos, or I regret having accepted your friend request but unfriending you would be rude.
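Non-reciprocal connections are naturally modeled as directed edges. A minimal sketch, assuming connections are stored as (follower, followee) pairs: the @SethGrimes to @CAnalytics follow is the one described above, while the @alice and @bob handles and the function names are hypothetical.

```python
# Directed connections as (follower, followee) pairs.
connections = {
    ("@SethGrimes", "@CAnalytics"),  # one-way follow described in the text
    ("@alice", "@bob"),              # hypothetical mutual follow
    ("@bob", "@alice"),
}


def is_reciprocal(a, b, edges):
    """True when a follows b and b follows a."""
    return (a, b) in edges and (b, a) in edges


def one_way(edges):
    """Return the connections that are not returned by the other side."""
    return sorted((a, b) for (a, b) in edges if (b, a) not in edges)


unreturned = one_way(connections)
```

Here `unreturned` holds only the @SethGrimes to @CAnalytics edge, since the hypothetical @alice/@bob pair follow each other.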

Time and Place

Everything related to social-network analysis beyond connectors, connections, and message-flow data is an attribute, examples being the identity and profile of each social-network member. Two attributes are special, worth calling out. Extending my list of three principal elements are:

  1. Temporality (time)
  2. Location
Temporality is important because connections are made and dropped, message transmission isn't instantaneous, and the multiple messages exchanged among network members in the course of conversations make complete sense only if their sequence is understood.
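As a small illustration of why sequence matters, the sketch below reorders messages that were harvested out of order. The handles, timestamps, and message text are hypothetical, and the sort uses a single sent-time field; a fuller treatment would also track receipt time and the dates on which connections were made and dropped.

```python
from datetime import datetime

# Hypothetical messages, listed in the order they happened to be harvested.
messages = [
    {"sender": "@bob", "sent_at": datetime(2010, 7, 26, 11, 9),
     "text": "Which hotel?"},
    {"sender": "@alice", "sent_at": datetime(2010, 7, 26, 11, 7),
     "text": "Long check-in wait again."},
    {"sender": "@alice", "sent_at": datetime(2010, 7, 26, 11, 12),
     "text": "The one downtown."},
]

# Sorting by timestamp restores the conversational sequence.
thread = sorted(messages, key=lambda m: m["sent_at"])
speakers = [m["sender"] for m in thread]
```

Read in harvested order, the question appears before the complaint that prompted it; read in `thread` order, the exchange makes sense.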

Two sorts of location are important: social platform and physical-world location.

Social-platform location is important because a given social-network member may have a presence on multiple social-networking platforms. Restating: Social networks span multiple platforms. Same person, multiple faces, not all visible to any given observer. I, for instance, have two personal Twitter accounts as well as LinkedIn, Facebook, YouTube, and other accounts; I blog for three different sites; I have several e-mail accounts. The union of my friends/links/followers-followees/RSS feeds, etc., that are distinctive to each platform, with my extended-social interactions, is my social network. Updates from my work Twitter account flow automatically to my LinkedIn account and to a box on my company Web site. Updates from my friends & family Twitter account are posted as Facebook status updates. I post links to my blog and other articles to Twitter. My @SethGrimes followers are, in effect, blog subscribers.

Practical solutions are possible only when we tackle tractable problems, but let's put cross-platform analyses, not currently possible with any social-network analysis tools I know of, on our to-do list.

Physical-world location is important because so many of our information needs -- our immediate interests and intent -- are linked to geography. We seek situational intelligence, related to the where it's at of what we're doing and what we seek to do. Physical location is provided by GPS-enabled devices, by the user’s network (IP) address, and by locations as published via too-much-information services such as FourSquare. There are many crossing points among electronic platforms and the physical world -- ranging from mobile-device apps that annotate a street image with local shopping and entertainment to the by-now old-fashioned-seeming concept of tweet-ups -- really far too many to name.

Content Analysis

With all due respect to Marshall McLuhan, the message is the message, and content is still king. If we want to automate message-data harvesting -- we should, given the huge volume and variety of business-relevant messages -- we need to automate content analysis. Text analytics to the rescue -- technology that applies statistical and linguistic methods to model natural language and extract salient, business-relevant information. Load extracted information to a data warehouse or analytical database and you have, in one place, the raw material for integrated analysis, the complete awareness that can drive social-media strategy and CRM and other outward-facing operations in a coordinated manner.
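To make the extract-then-load idea concrete, here is a deliberately simplistic sketch. It swaps in a plain keyword lookup for the statistical and linguistic methods that real text analytics relies on, so treat it as a stand-in rather than a workable analyzer. The word lists, topic labels, and the `analyze` function are all hypothetical.

```python
import re

# Hypothetical lexicons; a real system would model language, not match keywords.
NEGATIVE = {"dirty", "long", "rude", "broken"}
POSITIVE = {"clean", "friendly", "quick", "great"}
TOPICS = {"room": "housekeeping", "check-in": "front desk", "wait": "front desk"}


def analyze(text):
    """Turn free text into a small structured row suitable for loading."""
    # Lowercase words, keeping hyphenated terms such as "check-in" intact.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    topics = sorted({TOPICS[t] for t in tokens if t in TOPICS})
    return {"score": score, "topics": topics}


row = analyze("Dirty room and a long check-in wait, but friendly staff.")
```

The resulting `row` is the kind of structured record that could sit in an analytical database alongside transactional data.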




Data Integration

Accurate data fusion is hard, particularly given the diversity of social-enterprise sources. Data from social platforms is typically anonymous, and even when not, it is rarely specified and managed in a manner likely to win the approval of staff responsible for corporate data governance. Data integration will rely on semantics, on automated methods that classify data both according to standardized definitions and in conformance with enterprise master-data management (MDM) programs.

Data integration will allow organizations to weigh customer lifetime value -- the pattern of past interactions and the projected profitability of expected future sales -- and factors such as churn likelihood. This might play a role in crafting both a personalized response to a customer's tweets and, using customer profiles, better designed and targeted marketing. (But even before that, wouldn't it help to know whether someone who posted is a current or past customer?) It will allow a hotel property to jump from a TripAdvisor review that complained of a dirty room or a long check-in wait to problem correction by identifying the responsible housekeeping staff or registration-desk manager. It will aid in linking off-line outcomes -- store visits, sales, returns -- to online communications, allowing organizations to discern and respond to intent signals in forum postings and status updates.
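The "is this poster a customer?" question can be sketched as a simple enrichment step. This assumes the hard part, resolving a social handle to a customer ID, has already been done and stored in a lookup table. All records, handles, IDs, and field names below are hypothetical.

```python
# Hypothetical customer master data keyed by customer ID.
customers = {
    "C001": {"name": "Alice Example", "status": "current", "lifetime_value": 4200.0},
    "C002": {"name": "Bob Example", "status": "past", "lifetime_value": 310.0},
}

# Hypothetical result of identity resolution: social handle -> customer ID.
handle_to_customer = {"@alice": "C001", "@bob": "C002"}

posts = [
    {"handle": "@alice", "text": "Dirty room, long check-in wait."},
    {"handle": "@stranger", "text": "Thinking about booking next month."},
]


def enrich(post):
    """Attach customer status and lifetime value to a social post, if known."""
    customer_id = handle_to_customer.get(post["handle"])
    record = customers.get(customer_id, {})
    return {
        **post,
        "customer_id": customer_id,
        "status": record.get("status", "unknown"),
        "lifetime_value": record.get("lifetime_value"),
    }


enriched = [enrich(p) for p in posts]
```

Posts from unmatched handles are kept with an "unknown" status rather than dropped, since a prospect's intent signal may matter as much as a customer's complaint.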

A Socialization of Data Manifesto

Lest anyone detect either totalitarian tendencies or a red tide in the language describing 21st century data (further examples include the concept of data liberation as embraced by a number of governments and corporations), one should observe that social data is not necessarily free data or open data nor in any real sense our own and private. Sure, Facebook proposes a social graph that opens the platform to external use, but neither they nor other social-platform providers cede control nor, where not forced by privacy and data-protection mandates, the right to set the data rules. Set the data rules, these providers do.

Lymbix is a good, very recent example: an e-mail "tone checker" meant to save us from intemperately harming ourselves. Lymbix specifies terms of service that include, "When you view an email in your Outlook inbox, we will automatically send that message sender's email address to the API (Application Programming Interface) of partner companies." Further, "As we develop our business, we may buy or sell assets or business offerings. Customer, e-mail, and visitor information is generally one of the transferred business assets in these types of transactions."

Similarly, individuals' business interactions with and within corporations certainly are neither free nor open (and should typically not be).

Here I'm describing the customer data world as it is, rather than as it perhaps should be. Social platforms are a form of commons and social-platform data has become, essentially, a community good. It is hard to argue with the forces that have allowed this to happen; the fact is that people readily accept rights-greedy terms of service, and our posting behaviors even suggest an outright disdain for privacy concerns. There are benefits, and there are real and potential costs.

Our world, the place where we work, learn, play, and buy, has become a data world. The socialization of data is about interconnections and interactions and the data they produce, spanning platforms, with an erasure of on-line/off-line boundaries, creating a complete picture that will reframe business and personal decision making.