Misguided Notions: A Study of Value Creation in Real-Time Search
This past week has seen a ton of hype claiming that real-time search is different: that the content creation and behaviors expressed and implied are fundamentally new and exclusive to places like Twitter, and that mining this data has incredible potential. Having spent a decade in search and the last four years working with dynamic content, APIs and what is now being called the real-time web, I want to add some much-needed perspective.
Getting Up to Speed
The real-time web did not start with social updates. Representational State Transfer, or REST, has been a key component of social media from the start of blogging to sites like Flickr and now to the cloud with services like Amazon S3. Cross-platform synchronization (e.g. mobile, XMPP) is also not new. Many of these web services have had this ability for some time. Certainly there is a slow boil with the opening of the structured web of data, but technically there is nothing new here.
The “real-time” threat to Google has also been mentioned more than once. As far as search goes, nothing happening on the web outruns Google’s ability to index it in real time (milliseconds). I have personally seen my blog posts indexed immediately and presented on SERPs. The idea that Twitter has some magical real-time stack that Google should be concerned about is quite simply laughable.
While I’m on the subject of GOOG, an important real-time aspect of Google that has gone overlooked in all the chatter is its advertising platform. AdWords is a dynamic marketplace with numerous synchronous and asynchronous real-time rules. Last fall Google moved to real-time quality score calculations, meaning these computations take place at the time of the query, based on the query. Let’s be clear: Google has a real-time advertising optimization platform for content that works insanely well. It is to date the ultimate achievement in real-time web and search. Nothing comes close. Positioning Google as a laggard is naive. In fact, if you want to know the real web of sentiment you might be best off looking at keywords and bid history.
Now that we have put aside the technical and competitive aspects, the question remains: what is the value of real-time search? The answer must be looked for not from the perspective of content creation (new content is always being added to the web) but in how and why people search. Does “real-time” access to sentiment and opinion provide increased relevancy or new relevancy, or change behavior in a way that’s different or more beneficial? Keep in mind, the numbers below are not about how many people will search for real-time data; they are about how many queries conceivably could benefit.
My guide through searcher behavior is the landmark 2004 research “Understanding User Goals in Web Search” by Daniel Rose and Danny Levinson, which built on the seminal work “A Taxonomy of Web Search” by Andrei Broder. Broder came up with the original trichotomy of web search “types” (navigational, informational, and transactional) that Rose and Levinson expanded upon. To this day their work remains search’s de facto query classification system.
To get an idea of the representative percentage of queries in each category that would benefit from a real-time data contribution, I incorporated my own research on query volumes in both AdWords and Google Trends, comparing about 100 queries with real-time relevance (e.g. “what’s on tv right now”) against those without. Lastly, I drew on my years of experience in search data and behavior to extrapolate the results. Also, the benefit number assumes result sets that simply don’t exist yet, and as such these percentages are forward-looking.
Was this scientific? No. Do I think the numbers are pretty accurate? Yes.
Key: Query Type (Overall query %)* Real-Time Data Benefit %
Informational Queries: My goal is to learn something by reading or viewing (61%) 14%
· Directed: I want to learn something in particular about my topic (7%) 2%
· Undirected: I want to learn anything/everything about my topic (22%) 5%
· Advice: I want to get advice, ideas, suggestions, or instructions (5%) 2%
· Locate: My goal is to find out whether/where some real world service or product can be obtained (24%) 5%
· List: My goal is to get a list of plausible suggested web sites each of which might be candidates for helping me achieve some underlying, unspecified goal (2%) <0.5%
Resource Queries: My goal is to obtain a resource (not information) available on the web (25%) 5%
· Download: My goal is to download a resource that must be on my computer or other device to be useful (5%) <0.5%
· Entertain: My goal is to be entertained simply by viewing items available on the result page (6%) 2%
· Interact: My goal is to interact with a resource using another service I find on the web (6%) 2%
· Obtain: My goal is to obtain a resource that does not require a computer to use. I'm not obtaining it to learn some information, but because I want to use the resource itself (8%) <0.5%
Navigational Queries: My goal is to go to specific known website that I already have in mind. The only reason I'm searching is that it's more convenient than typing the URL, or perhaps I don't know the URL. (14%) <0.5%
Overall, I feel the query numbers for real-time benefit (about 19% of all queries) are optimistic. I have made some very large assumptions, both about the ability to index and query real-time data in a manner that is useful and about how much people will want or need to query this data once they know it is available. Also, I did not want to discount any category as useless, even though it is hard to see at present how navigational or resource > obtain queries stand to benefit from real-time data. In every instance I gave the benefit of the doubt to real-time.
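For readers who want to check the arithmetic behind the "about 19%" figure, here is a minimal sketch that tallies the table. The numbers are copied from the list above; the names (TOP_LEVEL, SUBCATEGORIES, benefit_bounds, overall_share) are made up for illustration, and treating each "<0.5%" entry as a range from 0 to 0.5 is my assumption, not something the table states.

```python
# Shares copied from the table above, as (overall %, real-time benefit %).
# A benefit of None stands for "<0.5%" and is treated as a 0 to 0.5 range
# (an assumption made for this sketch).
TOP_LEVEL = {
    "Informational": (61, 14),
    "Resource": (25, 5),
    "Navigational": (14, None),
}

SUBCATEGORIES = {
    "Informational": {
        "Directed": (7, 2),
        "Undirected": (22, 5),
        "Advice": (5, 2),
        "Locate": (24, 5),
        "List": (2, None),
    },
    "Resource": {
        "Download": (5, None),
        "Entertain": (6, 2),
        "Interact": (6, 2),
        "Obtain": (8, None),
    },
}


def benefit_bounds(rows):
    """Return (low, high) total benefit, counting each "<0.5%" as 0 to 0.5."""
    low = sum(benefit for _, benefit in rows if benefit is not None)
    high = low + 0.5 * sum(1 for _, benefit in rows if benefit is None)
    return low, high


def overall_share(rows):
    """Return the summed overall query share for a group of rows."""
    return sum(share for share, _ in rows)


if __name__ == "__main__":
    low, high = benefit_bounds(TOP_LEVEL.values())
    print(f"All queries: {overall_share(TOP_LEVEL.values())}% of volume, "
          f"real-time benefit between {low}% and {high}%")
    for name, rows in SUBCATEGORIES.items():
        low, high = benefit_bounds(rows.values())
        print(f"{name} subcategories: {overall_share(rows.values())}% of volume, "
              f"real-time benefit between {low}% and {high}%")
```

The three top-level categories account for 100% of query volume and a real-time benefit of 19% to 19.5%, which matches the "about 19%" above. The informational subcategories add up to 60% rather than the stated 61%, presumably a rounding difference in the underlying figures.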
The technology to present real-time data in a way that is helpful to queries has not yet emerged. Even so, while real-time updates might be helpful for a small percentage of queries, they are not even close to being helpful for most queries in any one category. That’s the biggest problem for real-time search.
Not surprisingly, the largest percentage belongs to informational queries. If Twitter search can become anything it would be more a discovery engine than a search engine -- more Craigslist, Wikipedia or Yelp than Google. It is after all a publishing and communications platform. Put another way, people search on the NY Times for opinions, sentiment and news of the day but that does not make the NY Times a search engine.
The underlying value of temporal content correlates to the benefit it provides to the searcher at that moment of attention. Those ‘real-time’ moments are fleeting, and once they are gone the value of the content disappears with them. Thus to have substantive value you need millions of fleeting moments, all the time, that can best be helped by understanding what is happening right now. That’s an interesting idea but it is simply not the way people search. Given the way people actually search, the opposite holds true: the greatest value rests in content that retains usefulness or importance the longest. In fact, that’s one idea that transcends search. Though with the vigor that Google is scanning books, maybe not for long.
Also, for search to work properly there must be a level of authority associated with the result set. I just don’t see how to filter through this noise in real time. Even trying to do so begins to destroy the value of an open real-time system where the benefit of "right now" matters more than "who."
It’s great to imagine what can be possible with the web but we’re not going to build anything that changes human nature, only stuff that amplifies it. Certainly Twitter does that incredibly well, just not in a way that benefits most searchers.