Search Terms are Expressions of Interest

Limn

One of the things I like about blogging is the opportunity to see which Google search terms bring people to my page. I like going back to the referring address to see what search term they used and where my blog ended up on the search results page.

We tend to think of Google as an output mechanism: we enter a search term and get output, and the search results are the valuable thing. However, I think the input stream of search terms is equally interesting, because it is a stream of real-time expressions of interest. People all over the world are providing a rich stream of data that collectively says what people are interested in at that moment.

Google seems to think so too: it now makes search terms available in aggregate as “Trends” and correlates search-term usage with events that may have been either a cause or an effect. Take a look at the Trends page and try a search term like “war with Iran” to see what it looks like.

Unfortunately, Google’s algorithms for this service seem to require a fairly high threshold of activity to permit the kind of statistical sampling they do, so search terms like SOSCOE, NCES, TBMCS, DoD Open Source, or other things I might be interested in generally don’t meet the minimum. NECC does, but only because there is another NECC outside the DoD space.

What I wish I could do is subscribe to search terms and receive an event each time one is used (or aggregated events if it is used a lot). This would be interesting for two reasons.

First, it would remove the threshold that Google’s current algorithms impose and would let me look at the data any way I want. For example, is use of “DoD Open Source” as a search term growing over time?
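No such per-query feed is assumed to exist, so what follows is only a minimal Python sketch under hypothetical assumptions. A `SearchEvent` record stands in for each subscribed event, `weekly_counts` aggregates the raw events into seven-day buckets (keeping quiet weeks as zero), and `trend_slope` fits a least-squares line to answer the “is it growing?” question. All of these names are illustrative, not any real Google API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical event record: one use of a search term. No real
    # per-query feed is assumed; this only illustrates the idea.
    term: str
    timestamp: datetime


def weekly_counts(events, term):
    """Aggregate events for one term into counts per seven-day bucket.

    Every bucket between the first and last event is included, so quiet
    weeks appear as 0 instead of silently disappearing.
    """
    counts = Counter()
    for event in events:
        if event.term.lower() == term.lower():
            counts[event.timestamp.date().toordinal() // 7] += 1
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [counts[week] for week in range(first, last + 1)]


def trend_slope(series):
    """Least-squares slope of weekly counts; > 0 suggests growing interest."""
    n = len(series)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Made-up events: 1, 2, then 3 uses in three consecutive weeks.
start = datetime(2007, 9, 3)
events = [SearchEvent("DoD Open Source", start + timedelta(weeks=w))
          for w, uses in enumerate([1, 2, 3]) for _ in range(uses)]
events.append(SearchEvent("Cyber Command", start))
series = weekly_counts(events, "dod open source")
print(series, trend_slope(series))  # [1, 2, 3] 1.0
```

Bucketing on the raw event stream, rather than on a pre-sampled aggregate, is what removes the minimum-activity threshold: even a term used once a week still produces a series.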

Second, and perhaps more interesting, would be the ability to use these near-real-time expressions of interest as causal signals in investment models based on complex event processing. Today many investment firms use web crawlers to essentially automate the reading of the news, and then attempt, through complex models, to correlate the release of news stories with their effect on various investments. Streams of search terms as “expressions of interest” could be the closer-to-real-time, event-driven equivalent. For example, a regional increase in searches for “hybrid car” might be a leading indicator of increased sales at Toyota (or it might be a lagging indicator of increased sales last month; there would be a lot to test). By comparing the event stream with the same terms as they appear in the crawled news, it might be possible to determine how much of the event stream leads the news and how much lags it. Which is the cause and which is the effect?
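A crude first test for the leading-vs-lagging question is lagged cross-correlation: shift one series against the other and see at which offset they line up best. The minimal Python sketch below assumes two hypothetical, equal-length weekly series (search counts and sales) filled with made-up numbers; the function names are illustrative only. A strong correlation at a positive lag suggests, but does not prove, that interest leads the outcome.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    if len(xs) < 2:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lagged_correlation(interest, outcome, lag):
    """Correlate interest[t] with outcome[t + lag].

    lag > 0 tests whether interest precedes the outcome (leading indicator);
    lag < 0 tests whether it follows the outcome (lagging indicator).
    """
    if len(interest) != len(outcome):
        raise ValueError("series must cover the same periods")
    n = len(interest)
    if lag >= 0:
        return pearson(interest[: n - lag], outcome[lag:])
    return pearson(interest[-lag:], outcome[: n + lag])


def best_lag(interest, outcome, max_lag):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: lagged_correlation(interest, outcome, lag))


# Made-up weekly numbers: "sales" simply repeat "searches" two weeks later.
searches = [1, 5, 2, 8, 3, 9, 4, 7]
sales = [0, 0, 1, 5, 2, 8, 3, 9]
print(best_lag(searches, sales, max_lag=3))  # 2
```

A positive best lag only says that searches line up with later sales; a common driver such as news coverage could explain both, which is exactly the kind of thing that would need testing.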

If it can be proven that expression-of-interest event streams have value as leading indicators, then it seems only a matter of time before Google and other search engines productize the event stream.

Returning to the DoD space for a moment, what got me thinking about this today was the number of recent referral searches I’ve gotten for “Cyber Command.” There has been a lot in the news about the new Cyber Command lately, and that is probably driving much of the interest (a lagging rather than leading indicator), but it would still be really interesting to see where the searches are originating. Who is expressing the interest?

Comments

  1. James Lorenzen - October 1, 2007 @ 11:41 pm

    Not sure if it’s what you are looking for, but I can see things like search terms using Google Webmaster and statcounter.com. Here is my blog’s keyword analysis: http://my9.statcounter.com/project/standard/keyword.php?project_id=2821167.
    With Google Webmaster I can see when Google last indexed my site, top search queries, and lots more.

  2. Mari - October 2, 2007 @ 10:44 pm

    In a sense, this is what Google ads do. If you’ve paid to have your ad show up when certain terms are searched, you’re able to see which terms are popular and which are not. Since I haven’t explored Google ads personally, though, I have to wonder how much of that information Google provides to advertisers up front to help them decide where to spend their money.

  3. Jim S - October 2, 2007 @ 10:58 pm

    Google is certainly using the popularity of search terms in time delayed aggregate for both their trends product, and more fundamentally, for their ad market.

    What I’m suggesting is that the near-real-time event stream itself might be interesting to support these kinds of applications.

  4. Kit Plummer - October 2, 2007 @ 11:28 pm

    Hey Jim. I do something of the “aggregate” sort with del.icio.us. It is possible to get a feed off any tag. For example, I’m currently interested in OSGi so the “osgi” tag is of interest. This is similar to what you were thinking with Google, but in this case I’m only really getting what people have found interesting enough to “bookmark”. Kinda cool. Then I use Google’s Reader and read myself to sleep from my iPhone. ; }
