October 1, 2007 by jimstogdill
Search Terms are Expressions of Interest
One of the things I like about blogging is having the opportunity to see what search terms on Google get people to my page. I like going back to the referring address to see what search term they used and where on the search results page my blog ended up.
We tend to think of Google as an output mechanism. We enter a search term and get the output; the search results are the valuable thing. However, I think the input stream of search terms is equally interesting, because it is a stream of real-time expressions of interest. People all over the world are providing a rich stream of data that collectively says what they are interested in at any given moment.
Google seems to agree. It is now making search terms available to us in aggregate as "Trends" and correlating search-term usage with news events that may have been either a cause or an effect. Take a look at the Trends page and try a search term like "war with Iran" to see what it looks like.
Unfortunately, Google's algorithms for this service seem to require a fairly high volume of activity to support the kind of statistical sampling they do, so search terms like SOSCOE, NCES, TBMCS, DoD Open Source, or other things I might be interested in generally don't meet the minimum. NECC does, but that's because there is another NECC outside the DoD space.
What I wish I could do is subscribe to search terms and receive an event each time one is used (or aggregated events if it is used a lot). This would be interesting for two reasons.
First, it would remove the minimum-activity threshold that the current algorithms impose and would let me look at the data any way I want. For example, is the use of "DoD Open Source" as a search term growing over time?
Second, and perhaps more interesting, would be the ability to use these near-real-time expressions of interest as causal signals in investment models based on complex event processing. Today many investment firms use web crawlers to essentially automate the reading of the news. They then attempt, through complex models, to correlate the release of news stories with their effect on various investments. Search-term "expression of interest" streams could be the closer-to-real-time, event-driven equivalent. For example, a regional increase in the search term "hybrid car" might be a leading indicator of increased sales at Toyota (or it might be a lagging indicator of increased sales last month; there would be a lot to test). By comparing the event stream with the same terms as they appear in crawled news stories, it might be possible to determine how much of the event stream leads the news and how much lags it. Which is the cause and which is the effect?
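One simple way to start testing the leading-versus-lagging question is lagged cross-correlation: correlate search counts in period t with sales (or news counts) in period t + k for a range of lags k, and see which lag lines up best. A positive best lag would suggest searches lead; a negative one would suggest they trail. The sketch below is a hypothetical illustration, not anything Google offers: the function names and the toy series are invented, and it assumes two equal-length series of per-period counts. A strong correlation at some lag is a hint worth testing further, not proof of cause and effect.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def lagged_correlation(searches, sales, lag):
    """Correlate searches[t] with sales[t + lag].

    lag > 0: searches would be a leading indicator of sales.
    lag < 0: searches would be trailing (lagging) sales.
    Assumes both series have the same length.
    """
    n = len(searches)
    if lag >= 0:
        xs, ys = searches[:n - lag], sales[lag:]
    else:
        xs, ys = searches[-lag:], sales[:n + lag]
    return pearson(xs, ys)


def best_lag(searches, sales, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(searches, sales, k))


# Invented toy data: weekly counts where sales spikes trail search spikes by 2 weeks.
searches = [0, 0, 5, 0, 0, 0, 8, 0, 0, 0]
sales = [0, 0, 0, 0, 5, 0, 0, 0, 8, 0]

print(best_lag(searches, sales, 3))  # prints 2
```

Real search and sales data would be far noisier than this, and trends or seasonality in both series can produce spurious correlations, so in practice one would want to detrend the series before comparing lags.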
If it can be shown that expression-of-interest event streams have value as leading rather than lagging indicators, then it seems only a matter of time before Google and other search engines productize the event stream.
Returning to the DoD space for a moment, what got me thinking about this today was the number of recent referral searches I've gotten for "Cyber Command." There has been a lot in the news about the new Cyber Command lately, and that is probably driving much of the interest (a lagging rather than leading indicator), but it would still be really interesting to see where the searches are originating. Who is expressing the interest?