Phrase Based Optimization: A Second Look

The following guest post was written by David Harry of

Hello folks…. Steve decided to keep playin’ Hooky so the loonies are still running the asylum it would seem… we will return you to your regular programming shortly;

Is your SEO world dull? Just doing that same ‘ol thing time and time again? Are you looking for something new and interesting to sink your teeth into? Well my friend.. Step right up and we'll make what was old new again!!

Ok, that’s my best carnival barker routine… needs work, certainly not my forte. I do though, want to take a little trip to a far away place that I once wrote about back at the beginning of this year. That locale of delight and wonder is the land of Phrase Based Optimization.

Stop Leaning on the Keyword

The world of SEO is dominated by the term ‘keyword.’ It has been ingrained in the lexicon for all time and I used to be hard pressed to get away from it. In truth it is ‘phrases’ that we should be considering far more than mere singular words. Recent data suggests that single-word search terms account for only 14% of all searches and that 2 and 3 word phrases are more popular at roughly 32% and 27% respectively. The split was skewed even further in earlier research on the European market.

Obviously we can argue that combinations of (key) words create a ‘phrase,’ but it is not that simple. More often than not a phrase is a combination of words that seeks to define a concept, not simply a set of singular ideas. Identifying these concepts, and more importantly how a search engine seeks to understand them, is worth revisiting once in a while.

The main thing to consider when looking at this from an SEO perspective is that ‘phrases’ impart a concept. As such, there is always more than one way to describe something and many ways that seemingly unrelated items (in a Boolean world) are cousins in a phrase based one. As phrases on a given web page are digested they can be sorted into ‘semantically meaningful groups of phrasings’ which teach the search engine new phrases that are related in any given space. In the end there is a library of related phrases/terms to any given topic – a theme so to speak.

How Phrase Based Optimization works

Now let’s consider value; in each category and its subsequent sub-categories there is an occurrence rate. That is, for a certain phrase, how many other documents use it and how often. This same process can be used to identify other ‘common’ terms that the search engine would ‘expect’ to see. This is much different than a simple ‘keyword density’ approach as there would be not only a ‘phrase density’ expectation but a ‘related phrase occurrence’ factor as well. Repeating the same term over and over in a high density, with no expected related phrases, would be a futile effort.

The ‘related phrase information’ is stored for a given phrase and when accessed it looks for a count of known related phrases. Documents that score highest on related phrases move higher up the weighting ladder. Through this a page can be given strength on a variety of related phrases concerning a single or multiple concepts or ‘topics.’
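To make the ‘related phrase occurrence’ idea a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any search engine actually does it: the related-phrase list, the two sample pages and the function name `related_phrase_score` are all made up for the example, and plain substring matching stands in for real phrase detection.

```python
def related_phrase_score(text, related_phrases):
    """Count how many known related phrases appear at least once in the text."""
    lowered = text.lower()
    return sum(1 for phrase in related_phrases if phrase in lowered)


# Hypothetical related-phrase list stored for the phrase "wicker basket".
RELATED = ["picnic", "rattan", "hand woven", "storage"]

# A page that only repeats the target term, and a page written around the theme.
stuffed = "Wicker basket. Wicker basket. Buy a wicker basket today."
themed = "This hand woven wicker basket is made of rattan and is ideal for a picnic."

docs = {"stuffed": stuffed, "themed": themed}

# Pages with more related phrases move up the ladder.
ranked = sorted(
    docs,
    key=lambda name: related_phrase_score(docs[name], RELATED),
    reverse=True,
)
print(ranked)
```

The keyword-stuffed page has the higher density for the target term, yet it scores zero on related phrases and ends up below the themed page.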

Building the result sets

What is important to understand is that this methodology would merely be another layer in presenting the ultimate results page to the end user. It is neither a stand alone concept nor a magic bullet. So let’s have a look under the hood and see what makes it tick.

The identification would begin as such during the indexation process:

  • Collect possible and good phrases,* along with frequency and co-occurrence statistics of the phrases
  • Classify possible phrases to either good or bad phrases based on frequency statistics
  • Prune good phrase list based on a predictive measure derived from the co-occurrence statistics

*A ‘good phrase’ is one that ‘appears in a minimum number of documents, and appears a minimum number of instances in the document collection.’
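The first two steps above can be sketched roughly as follows. This is a toy version under stated assumptions: candidate phrases are simply every one and two word sequence on a page, the thresholds `min_docs` and `min_instances` are invented values, and the function names are hypothetical. The pruning step is left out because the predictive measure is not spelled out here; the sketch only gathers the co-occurrence counts it would need.

```python
from collections import Counter
from itertools import combinations


def candidate_phrases(text, max_words=2):
    """Yield every run of 1..max_words consecutive words as a possible phrase."""
    words = text.lower().split()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def index_phrases(docs, min_docs=2, min_instances=2):
    doc_freq = Counter()       # number of documents containing the phrase
    instances = Counter()      # total occurrences across the collection
    co_occurrence = Counter()  # number of documents containing both phrases of a pair
    for text in docs:
        counts = Counter(candidate_phrases(text))
        instances.update(counts)
        doc_freq.update(counts.keys())
        co_occurrence.update(combinations(sorted(counts), 2))
    # Classify: a phrase is 'good' if it clears both (assumed) thresholds.
    good = {
        p for p in doc_freq
        if doc_freq[p] >= min_docs and instances[p] >= min_instances
    }
    return good, doc_freq, instances, co_occurrence


# Made-up document collection for illustration.
DOCS = [
    "wicker basket for picnic lunches",
    "hand woven wicker basket",
    "picnic in the park",
]

good, doc_freq, instances, co_occurrence = index_phrases(DOCS)
print(sorted(good))
```

With these three tiny documents only ‘wicker’, ‘basket’, ‘wicker basket’ and ‘picnic’ clear the thresholds; everything else appears in a single document and is treated as a bad phrase.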

When a document (web page) is analyzed, collections of ‘good phrases’ can be stored in relation to it; this score and defined ‘phrases’ may be accessed later in the retrieval process. Now let’s look at the ranking process.

Our happy surfer comes along looking for a ‘wicker basket’:

  • Receive query or previous ranking information
  • Create related phrases list
  • Retrieve document set based on ‘good phrase’ thresholds
  • Compare results for phrase thresholds on the result set
  • Re-rank result documents for next phase

Let’s say we have a 2 phrase search such as ‘NHL Hockey, Stanley Cup’:

  • Searches for documents relating to ‘NHL Hockey’
  • Secondary processing of those looking for ‘Stanley Cup’
  • Compare results for phrase thresholds on the result set
  • Rank documents for next phase
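As a rough illustration of the two phrase walk-through above, here is a small Python sketch. Everything in it is an assumption made for the example: the sample pages, the related-phrase list, the `min_hits` threshold and the function names. It filters on the first phrase, then on the second, and re-ranks what is left by how many related phrases each page contains.

```python
def phrase_count(text, phrase):
    """Count case-insensitive occurrences of a phrase in the text."""
    return text.lower().count(phrase.lower())


def rank_for_phrases(docs, primary, secondary, related, min_hits=1):
    # Phase 1: documents that mention the primary phrase often enough.
    candidates = {
        name: text for name, text in docs.items()
        if phrase_count(text, primary) >= min_hits
    }
    # Phase 2: of those, keep the ones that also mention the secondary phrase.
    candidates = {
        name: text for name, text in candidates.items()
        if phrase_count(text, secondary) >= min_hits
    }

    # Re-rank by the number of distinct related phrases present on the page.
    def score(name):
        return sum(1 for p in related if phrase_count(candidates[name], p) > 0)

    return sorted(candidates, key=score, reverse=True)


# Made-up pages and related phrases for illustration.
DOCS = {
    "recap": "NHL hockey recap: the Stanley Cup final went to overtime in the playoffs.",
    "rules": "NHL hockey rules explained for new fans.",
    "history": "Stanley Cup history in NHL hockey.",
}
RELATED = ["playoffs", "overtime", "final"]

results = rank_for_phrases(DOCS, "NHL Hockey", "Stanley Cup", RELATED)
print(results)
```

The ‘rules’ page drops out in the second pass because it never mentions the Stanley Cup, and ‘recap’ lands above ‘history’ because it carries more of the related phrases.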

With a ranking process as such, one would be well served in understanding how to create a ‘theme’ surrounding a given target page for an SEO program. This is not limited to your ‘on-site’ efforts alone as we will see in the next section.

How to leverage Phrase Based Optimization

On site: much of utilizing PaIR (Phrase Based Indexing and Retrieval) is understanding the theme of the term you are targeting. There are a number of places where we can strengthen both the individual (page level) theme and the overall site theme:

  • Domain name
  • Site structure and page naming conventions
  • Content creation
  • Outbound link strategies

By utilizing not only the target term but related terms as well, we can begin to build a theme around the core target. The obvious one on the list is the actual page content and assigning a strong sense of relevance within the content. For most in the business the domain name is equally obvious. What is less obvious is that we can utilize the page naming conventions (and content therein) to create a stronger chain.


Along this line we can also create focussed content on each step in the chain. What’s important to remember is that each step is an opportunity to tie in more ‘relevance’ through related phrases within that category or theme. The same can be said of outbound links on your target page, in that they can also be used to target related phrases on authority sites which themselves are scored on relevance from a phrase score perspective.

Much of the legwork can be done during the keyword/phrase research and targeting phase. There are many long-tail opportunities which now can have a secondary role in strengthening the core money terms. By creating sub-lists of semantically related phrases we can better target the eventual content on a web page.

Off-site: inbound links enter into the PaIR world in the sense that they can be valued depending upon:

  • Link Text
  • Page content ‘good phrase’ score
  • Page Title relevance
  • Site theme

Once again, this is not related to PageRank, simply a standalone metric that can be used to weight in-bound links beyond the PR concepts we all know and love. Further developing your link profile by considering relevance strengthening with such concepts can only help in the long run. This is all that needs to be taken away from it.

What’s the Point?

What should be taken away from our little adventure is that there are many other ways to look at what we are doing. The search engineers are always looking to evolve and so must we. The concepts of Phrase Based Indexing and Retrieval are not an end-game scenario, merely another angle from which to observe this beast. I stayed away from trying to teach a ‘how to’ and opted for a virtual tour through ideas which may give birth to more ideas.

I have been living and working with these ideas for nearly a year now and have had a great deal of experience in playing with them… Is it the basis of my SEO programs? Naw… just a passing interest; another tool in the toolbox. Of late I have been more obsessed with User Performance Metrics and Personalized Search. There are always new things to get one interested in SEO… never get bored.

“By becoming attached to names and forms, not realizing that they have no more basis than the activities of the mind itself, error rises and the way to emancipation is blocked.”
— Buddha

Thanks for letting me drop by…. Until next time; Play Safe.



  1. Thanks for the post Dave. I agree completely with putting the focus on a phrase based theme as opposed to several specific keywords. A phrase based approach lends itself more to natural writing, which is good for both people and search engines.

    How do you begin the process of research? My own thought is the research would start out as one might expect by brainstorming phrases and then grouping those phrases together into themes. Brainstorming would be followed by making use of keyword tools to further build your phrases and themes. Rinse and repeat.

    I’m curious how you deal with multiple pages of content that are part of the same or similar themes. Take for example this post. It would fit with a general theme of keyword research for seo and more specifically phrase based keyword research. So might others on one site. Do you have a way to decide what subset of a theme to use for optimization on a specific page when more than one page fits into the same subset of a given theme?

    I know I’d read the how to post if you decided to write one or have one written.

  2. Sounds like a plan.. a little ‘how to’ session. I find that the process of the term research produces many of the verticals. Places such as NicheBot produce poorly termed ‘LSI Lists’ of related terms which can be the basis for that. Another little trick is to see what Google finds related by using the ‘*’ before or after a core term. This will show what Google feels is relevant or related to the term. As for structuring, it is best to plan ahead in creating a ‘relevance chain’ via site structure. If that isn’t readily available, interlinking related items within the pages also helps.

    I will put together some specific ideas later in the week. See what we can come up with.

    Thanks for letting me visit with yer peeps… I hope you are well rested now bro.


  3. Thanks for the ‘*’ trick. That’s a new one to me. I’ve used the AdWords keyword tool to see what Google considers a page to be about to find related phrases, but I always like new tricks.

    So you like NicheBot even though the ‘LSI Lists’ are poor?

    I’m in favor of building site structure around keyword themes. I didn’t do it very well here, but am hoping to change that in a coming redesign. I have done it for clients with positive results.

    Looking forward to what you come up with for a how to. Feel free to guest post here or just let me know when you publish the post so I can point readers your way.

    I’m mostly rested and started to get back into the swing of things today. I’m still looking forward to the weekend though.

  4. I have marked it on my list of ‘things to write about’. I have one more in the set of four in the Personalized Search rant-a-long… I should be back to this soon :0)

    I don’t mind the NicheBot stuff, it’s more the term ‘LSI’ that gets me heeby jeeby. I prefer them being lists of ‘semantically related’ words :0) Otherwise you get folks huffing about Google and LSI which gives me shivers…. ugh…

    I shall be back!

