We Will Survive the Linguistic Shift

Keywords or questions: is the difference really that remarkable?

Keep this in mind: anyone familiar with the game of Jeopardy knows that an integral part of getting the answer right is the ability to quickly frame it as a question…

A couple of sources I’ve read recently have tried to argue against any seamless implementation of a semantic or natural language web. The argument? People are trained to keyword search. I struggle to see the real root of their beef. Are they saying we won’t make the shift from phrases to questions? That’s the bare-bones difference between a keyword (Google-type) search query and a natural language (Hakia, Powerset) search query, the query being the actual thing you type into the search field: the “prompters” intended to stir up the information you’re looking for.

Okay, I know that journalists and pundits are in the biz of creating talk, the more controversy the better, but an argument like this, that we’re already patterned to work in a keyword-inspired environment, just seems lame to me. It also relegates our species to the level of cavemen. Remember the Jeopardy rules…

Steven Pinker, in his brand new book, The Stuff of Thought, argues that even as children we are instinctively driven to emulate language patterns, from simple to complex, well before we have any formal instruction. Linguistic pattern as a right of belonging, as part of our human-ness, is elementary; it is fundamental in our collective DNA.

So, NO, we’ll catch on quickly if and when a natural language engine rises to the top. AND if our businesses rest on it, you better believe the learning curve will be short-lived. I mean, look at AdWords. When it first launched, only the most maverick web marketers and SEOs took to it and were able to instantly rope and tie it. But now, just a few years later (and, yes, many millions of businesses make their money from it), it’s an absolutely essential component of any savvy business strategy. If people can learn how to build a marketing strategy with AdWords, how to navigate and sell everything on eBay, and how to shop for anything online, then casting a search query in natural language, framed as a question, seems quite… um, natural.


Who is Cyc?

Some search engines vie for fairy dust, others just ante up the goods.

Semantic search engines vie to harness the same fairy dust as did Google–once upon a time. But charismatic, enigmatic, and dismissive geeky upstarts that make billions upon billions of dollars of course earn as many foes as they do dough. My point is that Google’s limelight is still enviable and new search engines like the mysterious Powerset and Hakia are in line.

But outside the West Coast search celebrity circle, there are other semantic web forces to be reckoned with that have been chipping away at the natural language lexicon for years.

Cyc, “the world’s largest and most complete general knowledge base and commonsense reasoning engine,” is already a working product for corporate and industrial users. I’ve just happened on it and am still trying to digest the literature, but it reminds me of a mini IBM WebFountain: without the supercomputing gusto, but a powerful engine nevertheless, one that has already “learned” <…..THIS much……>

“What does Cyc know?” According to founder and developer, Doug Lenat, Cyc is able to negotiate this question: “Which American city would be most vulnerable to an anthrax attack during summer?”

Where can I get my very own Cyc?

Like most open source systems designed for industrial applications, Cyc is not nearly as consumer-friendly as a Powerset or Hakia. There is no intuitive, slick little interface. Instead the Cyc main page smacks of the magic language of software developers and Unix users and computer wizards. This is middleware land, Middle Earth, Hobbit-ville. Which makes it all the more enigmatic–I want one.

Powerset Tease


Powerset is playing its launch very safe, in measured doses, just a little bit at a time. I just read their spiel on Power Mouse and Use Cases. The Use Cases clearly show the versatility of language nuance that’s being built into their engine.

The Question Asked

What Does Hakia Have that Google Does Not and What Does Google Have that Hakia Does Not?

Google’s Basics of Search tips say that words like who, what, and how are summarily dropped from Google search queries, simply because this is how keyword-centric engines operate. I now know this is occasionally the reason for the weak and untargeted results that send me clicking over to Hakia.com to give the semantic search engine a test drive. But maybe this is exactly what I’ll do for the rest of my search years. There are instances in which Google returns more satisfying results and vice versa, so which search engine is better? Maybe it’s exactly in the question asked.

Who, what, how, and why questions clearly get more specific results with Hakia, but not always enough to fully answer my question, and some searches have still left me with nothing.

My simplistic question posed to both, “who defines a minority student?”, clearly illustrates the divergent results. Google is able to return a couple of results that happen to directly reflect my query with the phrase “define minority students,” but without any direct association with a “who.”

Hakia, on the other hand, returns results that are highlighted, too, but specifically associated with the “who” part of the query.
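For the curious, here is a minimal sketch of why the “who” can vanish from a keyword-style query. It assumes a tiny, made-up stop-word list (not Google’s actual list) and two hypothetical helper functions, `keyword_terms` and `question_word`; neither engine’s real processing is shown here.

```python
from typing import List, Optional

# Hypothetical stop-word list, invented for illustration only.
STOP_WORDS = {"who", "what", "how", "why", "a", "an", "the"}
QUESTION_WORDS = {"who", "what", "how", "why"}


def keyword_terms(query: str) -> List[str]:
    """Reduce a query to the terms a keyword-style engine might keep."""
    words = query.lower().strip("?!. ").split()
    return [w for w in words if w not in STOP_WORDS]


def question_word(query: str) -> Optional[str]:
    """Return the leading question word, if the query starts with one."""
    words = query.lower().split()
    first = words[0] if words else None
    return first if first in QUESTION_WORDS else None


query = "who defines a minority student?"
print(keyword_terms(query))   # ['defines', 'minority', 'student']
print(question_word(query))   # who
```

Once the stop words are gone, what is left is close to the phrase “define minority students,” which fits the kind of match Google returned; an engine that holds on to the question word has one more clue about what kind of answer is wanted.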

Proprietary Processes Hakia Boasts

A deeper dip into Hakia reveals a bit of the proprietary processes on which this search engine is built:

OntoSem, or Ontological Semantic parser, is “a linguistic theory of meaning in natural language.” OntoSem maintains a highly developed “language-independent ontology of thousands of interrelated concepts; an ontology-based English lexicon of 100,000 word senses, and counting (plus, the lexicons for several other languages under construction); and an ontological parser which ‘translates’ every sentence of the text into its text meaning representation, approximating the complete understanding of the sentence by the native speaker.”

QDEX, or Query Detection and Extraction, does a thorough “decomposition” of the WWW before any search queries are posed, and stores all the possible queries, waiting for a user to ask some semantic twist of its data. “The critical point in QDEX system is to be able to decompose sentences into a handful of meaningful sequences without getting lost in the combinatory explosion space.” QDEX interfaces with OntoSem in the miasma of semantic meaning: OntoSem is able to determine which of the billions of semantic options are most meaningful and worthy of indexing.
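To get a feel for the “decompose first, answer later” idea, here is a rough sketch and nothing more. It is not Hakia’s QDEX: it simply takes every short run of consecutive words in a sentence and indexes it ahead of time. The function names `sequences` and `build_index`, the `max_len` cap, and the two sample sentences are all assumptions made for illustration. The real system reportedly relies on OntoSem to keep only the meaningful sequences, which this sketch does not attempt.

```python
from collections import defaultdict


def sequences(sentence, max_len=3):
    """Yield every run of 1..max_len consecutive words in the sentence."""
    words = sentence.lower().rstrip(".?!").split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(sentences, max_len=3):
    """Map each word sequence to the sentences containing it, before any query arrives."""
    index = defaultdict(set)
    for s in sentences:
        for seq in sequences(s, max_len):
            index[seq].add(s)
    return index


# Sample sentences invented for this sketch.
docs = [
    "Cyc is a commonsense reasoning engine.",
    "Hakia is a semantic search engine.",
]
index = build_index(docs)
print(sorted(index["search engine"]))  # ['Hakia is a semantic search engine.']
print(len(index["is a"]))              # 2
```

Even with a cap of three words, the number of stored sequences grows quickly with the amount of text, which gives a small taste of the “combinatory explosion” the quote warns about.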


Semantic Rank may sound similar to Google’s PageRank, but the similarity stops at the name. While Google is very good at determining the authority of a webpage (which may not indicate relevancy) based on linking strategies, Hakia and other semantic search engines have no such algorithmic variables. Semantic Rank instead ranks results by pure meaning, “based on advanced sentence analysis and concept match between the query and the best sentence of each paragraph.”
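The “best sentence of each paragraph” idea can be sketched in a few lines. This is only a toy under loud assumptions: plain word overlap stands in for Hakia’s “concept match,” and the function names (`sentence_score`, `best_sentence`, `rank`) and sample paragraphs are invented for illustration. It says nothing about how Semantic Rank is actually implemented.

```python
import re


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentence_score(query, sentence):
    """Share of query words found in the sentence (a crude stand-in for concept match)."""
    q = tokens(query)
    return len(q & tokens(sentence)) / len(q) if q else 0.0


def best_sentence(query, paragraph):
    """Pick the sentence in the paragraph that best matches the query."""
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", paragraph) if s.strip()]
    return max(sentences, key=lambda s: sentence_score(query, s), default="")


def rank(query, paragraphs):
    """Order paragraphs by the score of their single best sentence, highest first."""
    scored = [(sentence_score(query, best_sentence(query, p)), p) for p in paragraphs]
    return [p for _, p in sorted(scored, key=lambda x: x[0], reverse=True)]


# Sample paragraphs invented for this sketch.
paragraphs = [
    "Cyc is a commonsense reasoning engine. It is aimed at corporate users.",
    "Hakia is a semantic search engine. It ranks results by meaning.",
]
query = "which engine ranks results by meaning"
print(best_sentence(query, paragraphs[1]))  # It ranks results by meaning.
print(rank(query, paragraphs)[0])           # the Hakia paragraph comes first
```

Notice that no link counts appear anywhere: each paragraph is judged by how well its single best sentence matches the question, which is the contrast with link-based authority drawn above.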
