Extended Boolean: Proximity and Weighting
Most sourcing, recruiting, and staffing professionals are familiar with the “standard” Boolean operators of AND, OR, and NOT. However, I have found that few are familiar with “extended” Boolean functionality, such as proximity (or adjacency) and term weighting.
Beyond Basic Boolean
Extended Boolean offers sourcers and recruiters significantly more control, power, and precision when executing searches, and in the hands of an expert it can enable semantic search. Semantic search uses the science of meaning in language to produce highly relevant search results, rather than leaving the user to sort through a list of loosely related keyword matches.
Relevance is Key
Ultimately, any sourcing or recruiting professional knows that what’s most critical in running Boolean searches on the Internet, a job board, or in an internal resume database, is getting relevant results. According to Wikipedia, “relevance” denotes how well a retrieved set of documents (or a single document) meets the information need of the user.
For sourcing and recruiting, relevant results are typically defined as resumes or profiles of (or information about) potential candidates whose experience and capabilities closely match the hiring profile or job opening that the sourcer or recruiter is trying to find candidates for.
I’d argue that the value of any source of information (resume database, the Internet, etc.) lies less in the information contained within, and more in the ability of a user to extract out precisely and completely what the user needs – finding and retrieving any and all appropriately qualified candidates. Information has no value to you if you are unable to find it and take action on it.
So how can extended Boolean help sourcers and recruiters find more relevant results? Let’s take a look at term weighting first.
Variable Term Weighting
Talented sourcers and recruiters know that not all terms are equally important in a query; in most searches, certain terms matter more than others. Standard Boolean queries, however, weight all search terms equally. Unfortunately, many search engines and database search interfaces simply assign relevance to results by the number of search term “hits” in each document, and in most cases the raw frequency of search terms does not correlate with relevance. This is where the derisive description “buzzword matching” comes from: it implies that there is little skill involved in running Boolean searches that merely count matched keywords.
Using an information technology hiring profile as an example: if a sourcer were looking for candidates with significant experience administering Windows servers and Exchange email servers, they might create a simple Boolean query such as this: Windows AND Exchange AND server* AND admin*. That search is highly likely to rank Windows systems administrators who mention Windows many times in their resume/profile, and happen to mention Exchange once or twice, as highly relevant, simply because of the number of “hits” for Windows, which is by nature a very common term in resumes. The sourcer is then left to sort through a large volume of results to find the candidates who have actually been primarily responsible for administering Exchange servers as well as Windows servers.
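To make the “buzzword matching” problem concrete, here is a minimal Python sketch of hit-count ranking. It is only an illustration: the two sample resumes, the tokenizer, and the function names are all made up for this example, and real search engines score results in more sophisticated ways.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_hits(tokens, term):
    """Count tokens matching a term; a trailing * matches any word with that prefix."""
    term = term.lower()
    if term.endswith("*"):
        return sum(1 for t in tokens if t.startswith(term[:-1]))
    return tokens.count(term)

def hit_count_score(text, terms):
    """Total number of hits, or 0 if any AND-ed term is missing from the text."""
    tokens = tokenize(text)
    counts = [count_hits(tokens, term) for term in terms]
    return sum(counts) if all(counts) else 0

# Hypothetical resume snippets, invented for illustration.
resumes = {
    "windows_admin": (
        "Windows server administrator. Administered Windows 2003 and "
        "Windows 2000 servers. Windows patching. Some Exchange exposure."
    ),
    "exchange_admin": (
        "Exchange administrator. Administered Exchange 2003 servers on Windows."
    ),
}

terms = ["Windows", "Exchange", "server*", "admin*"]
scores = {name: hit_count_score(text, terms) for name, text in resumes.items()}
ranked = sorted(resumes, key=lambda name: scores[name], reverse=True)
```

With these made-up snippets, the Windows-heavy resume scores 9 hits and the Exchange-focused resume scores 6, so the candidate the sourcer actually wants lands second.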
Search engines that offer users the ability to assign different weights to each search term enable sourcers and recruiters to move beyond simple buzzword matching and take control of the relevance of the results. Essentially, with variable term weighting you can assign a number value to words to increase their weight when ranking retrieved documents. This does not change the TOTAL number of results, only the ORDER of the results.
Using the same example as above, a sourcer using a search engine that supports variable term weighting could create a Boolean search string such as this: Windows AND Exchange:30 AND server* AND admin*. That Boolean query will pull the same number of results as the first search that had no term weighting. However, it will sort and rank the results heavily favoring resumes/profiles that mention Exchange more often in relation to the other search terms, increasing the likelihood that the sourcer can quickly identify candidates who have been responsible for administering and supporting Exchange servers. By employing variable term weighting, the sourcer has increased the relevance of the results.
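Here is a rough Python sketch of how a term:weight query could reorder results without changing which documents match. It assumes a simple “weight times hit count” score, which is my simplification and not how any particular search engine is documented to work; the sample resumes and all function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_hits(tokens, term):
    """Count tokens matching a term; a trailing * matches any word with that prefix."""
    term = term.lower()
    if term.endswith("*"):
        return sum(1 for t in tokens if t.startswith(term[:-1]))
    return tokens.count(term)

def parse_weighted_query(query):
    """Split an AND-only query into (term, weight) pairs; the default weight is 1."""
    parts = re.split(r"\s+AND\s+", query.strip(), flags=re.IGNORECASE)
    weighted_terms = []
    for part in parts:
        term, _, weight = part.partition(":")
        weighted_terms.append((term, int(weight) if weight else 1))
    return weighted_terms

def weighted_score(text, weighted_terms):
    """Sum of weight * hits per term, or 0 if any AND-ed term is missing."""
    tokens = tokenize(text)
    total = 0
    for term, weight in weighted_terms:
        hits = count_hits(tokens, term)
        if hits == 0:
            return 0
        total += weight * hits
    return total

def rank(resumes, query):
    """Return the names of matching resumes, highest score first."""
    weighted_terms = parse_weighted_query(query)
    scores = {name: weighted_score(text, weighted_terms) for name, text in resumes.items()}
    matched = [name for name in resumes if scores[name] > 0]
    return sorted(matched, key=lambda name: scores[name], reverse=True)

# Hypothetical resume snippets, invented for illustration.
resumes = {
    "windows_admin": (
        "Windows server administrator. Administered Windows 2003 and "
        "Windows 2000 servers. Windows patching. Some Exchange exposure."
    ),
    "exchange_admin": (
        "Exchange administrator. Administered Exchange 2003 servers on Windows."
    ),
}

plain = rank(resumes, "Windows AND Exchange AND server* AND admin*")
weighted = rank(resumes, "Windows AND Exchange:30 AND server* AND admin*")
```

Both queries match the same two resumes, but the weighted query moves the Exchange-focused resume to the top, which is the behavior described above: same TOTAL, different ORDER.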
Now, let’s take a look at proximity functionality:
Proximity
Proximity search functionality enables a user to search for specific terms that are mentioned close to other specific terms. An adept sourcer or recruiter knows that documents with the word “computer” mentioned close to the word “science” will often have a different meaning and relevance than documents that simply mention the words “computer” and “science” anywhere within them.
There are three main types of proximity searching: fixed proximity, configurable (variable) proximity, and adjacency. For the purposes of this post, I will focus only on fixed and configurable proximity.
Fixed Proximity
Fixed proximity is most commonly represented by the NEAR operator. The search engines that recognize and support the NEAR operator typically define NEAR as within a fixed maximum distance, usually somewhere between 10 and 16 words (specific search engines differ, so check their documentation).
Using the example of a Windows and Exchange administrator, a sourcer could craft this search using the NEAR operator: Windows AND Exchange NEAR admin* AND server*. That search will ONLY return resumes/profiles that mention Exchange within the engine’s NEAR range (up to 16 words in the widest case) of any word starting with the root of admin (administrator, administration, administer, administered, etc.). Requiring that Exchange be mentioned in close proximity to admin* dramatically improves the relevance of the search results, typically returning candidates who have a title using both terms and/or who talk about being responsible for Exchange administration. Here are examples of the kind of resume text such a search matches:
- Managed & administered more than 300 Exchange Servers
- Provisioned & administer multiple Exchange 5.5/2003 servers
- Not only are there administration duties for Exchange and Blackberry…
- Exchange/RightFax administrator
- Installing, Configuring, and Administering Microsoft Exchange 2000 Server
- Administer a Microsoft Exchange 2003/2007 environment
- 8+ years of expertise as a System Administrator in Windows 2003 family, Windows 2000 family, MS Exchange 5.5, MS Exchange 2000, and Exchange 2003
- I am proficient with the following skills; planning, installation and administration of Windows Active Directory, Windows Servers, Exchange Server
- Windows Server Support, Active Directory,Exchange Server 2000, 2003 administration and Blackberry Server administration
- Administer Exchange 2003 Server for corporate email
As you can see, being able to control the proximity of specific search terms essentially increases the likelihood of returning results of candidates who have had administrative responsibility for Exchange servers, effectively increasing the relevance of the results.
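For readers who like to see the mechanics, here is a minimal Python sketch of a fixed-proximity check. The 16-word window is an assumption taken from the upper end of the range mentioned above, word distance is counted as the difference in token positions (engines may count differently), and the function names are made up for this illustration.

```python
import re

NEAR_WINDOW = 16  # assumed fixed range; actual engines differ

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, term):
    """Token positions matching a term; a trailing * matches any word with that prefix."""
    term = term.lower()
    if term.endswith("*"):
        return [i for i, t in enumerate(tokens) if t.startswith(term[:-1])]
    return [i for i, t in enumerate(tokens) if t == term]

def near(text, left, right, window=NEAR_WINDOW):
    """True if some occurrence of left is within `window` words of right, in either order."""
    tokens = tokenize(text)
    left_positions = positions(tokens, left)
    right_positions = positions(tokens, right)
    return any(abs(i - j) <= window for i in left_positions for j in right_positions)

# Two of the sample lines from the list above.
close_1 = "Managed & administered more than 300 Exchange Servers"
close_2 = "Exchange/RightFax administrator"

# A made-up document where the two terms are 25 words apart.
far = "Administrator of Windows servers. " + "filler " * 20 + "Used Exchange for email."

results = [near(text, "Exchange", "admin*") for text in (close_1, close_2, far)]
```

The two sample lines pass the check, while the made-up document does not, even though it contains both terms.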
Fun fact:
- Did you know that Monster and Exalead support the NEAR operator?
Configurable Proximity
A search engine that supports configurable proximity affords users the ability to precisely control the distance between specific search terms. This can produce even more relevant results than the NEAR operator, because the NEAR operator’s maximum range of 10 to 16 words can allow some non-relevant results to be returned. The farther apart two words are mentioned, the less likely it is that they are semantically related. In fact, at 10 to 16 words apart, each could be mentioned in a separate bullet point or sentence on a resume and be completely unrelated.
However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms. Although search engines vary in their exact syntax, here is an example of the Windows and Exchange admin search using configurable proximity: Windows AND Exchange w/5 admin* AND server*. That search can ONLY return resumes or profiles that mention Exchange within 5 words of any word starting with the root of admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words will dramatically increase the relevance of the search results, because two search terms mentioned at such close range are more likely to be in the same bullet point or sentence and thus more likely to be semantically related. Essentially, this search will only return results of people who specifically mention something about being responsible for administering Exchange at least once in their resume. By employing this kind of search, a sourcer is actually performing a semantic search, as they are looking specifically for people who talk about having a particular responsibility, not just for documents that contain words.
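The difference between a 16-word range and a 5-word range can be shown with a small Python sketch that measures the closest distance between two terms. As before, this is an illustration under my own assumptions: distance is the difference in token positions, the first sample text is invented, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, term):
    """Token positions matching a term; a trailing * matches any word with that prefix."""
    term = term.lower()
    if term.endswith("*"):
        return [i for i, t in enumerate(tokens) if t.startswith(term[:-1])]
    return [i for i, t in enumerate(tokens) if t == term]

def min_distance(text, left, right):
    """Smallest word distance between the two terms, or None if either is absent."""
    tokens = tokenize(text)
    left_positions = positions(tokens, left)
    right_positions = positions(tokens, right)
    if not left_positions or not right_positions:
        return None
    return min(abs(i - j) for i in left_positions for j in right_positions)

def within(text, left, right, max_words):
    """Configurable proximity: True if the terms occur within max_words of each other."""
    distance = min_distance(text, left, right)
    return distance is not None and distance <= max_words

# Made-up text: admin* and Exchange sit in two separate, unrelated sentences.
separate_bullets = (
    "Administered Windows 2003 servers in a large data center. "
    "Migrated user mailboxes to Exchange 2003."
)
# One of the sample lines from the NEAR example earlier in the post.
same_bullet = "Administer Exchange 2003 Server for corporate email"

gap_separate = min_distance(separate_bullets, "Exchange", "admin*")
gap_same = min_distance(same_bullet, "Exchange", "admin*")
```

In the made-up text the two terms are 13 words apart, so a 16-word NEAR range would accept it while w/5 would reject it. In the sample line they are 1 word apart and pass either way.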
Fun facts:
- Did you know that Exalead supports configurable proximity searching?
- Did you know that you can integrate a free, open source search engine that supports configurable proximity and variable term weighting into your ATS or resume database? Check out Lucene.
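If you do try Lucene, note that its classic query parser uses different notation from the examples in this post: a term is boosted with ^ (for example Exchange^30), and proximity is written as a quoted phrase followed by ~ and a distance. The Python sketch below only builds query strings in that notation; the helper names are made up, and it does not call Lucene itself. How boosts affect ranking and how distance is counted depend on the Lucene version and analyzer, so treat this as a starting point and check the Lucene documentation.

```python
def boosted_term(term, boost=None):
    """Format a term for Lucene's classic query syntax, with an optional ^boost."""
    return f"{term}^{boost}" if boost is not None else term

def proximity_phrase(words, max_distance):
    """Format a Lucene proximity search: a quoted phrase followed by ~distance."""
    return '"' + " ".join(words) + f'"~{max_distance}'

# Term weighting: the Lucene-style counterpart of "Exchange:30" in this post.
# Boolean operators are written in uppercase in this syntax.
weighted_query = " AND ".join([
    boosted_term("Windows"),
    boosted_term("Exchange", 30),
    boosted_term("server*"),
    boosted_term("admin*"),
])

# Proximity: the Lucene-style counterpart of "Exchange w/5 admin*".
# A full word is spelled out here on the assumption that the classic parser
# does not expand wildcards inside a quoted phrase.
proximity_query = proximity_phrase(["exchange", "administrator"], 5)
```

This produces the strings Windows AND Exchange^30 AND server* AND admin* and "exchange administrator"~5.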
Conclusion
Hopefully you can see how being able to control the proximity of two search terms can yield results that are FAR more relevant than results that simply mention the two terms anywhere in a document. This is the critical difference between semantic similarity and lexical similarity between a search and its results.
There are countless ways you can apply extended Boolean functionality such as variable term weighting and proximity searching to nearly any industry/hiring profile to create searches that return highly relevant results, more relevant than those that can be achieved with standard Boolean logic. Using a search engine that supports both configurable proximity and variable term weighting can empower sourcers and recruiters to quickly find large volumes of highly relevant results, increasing productivity and achieving just-in-time (JIT) talent identification and acquisition.

Saw this blog today and thought you might be able to help. I’m new to Boolean logic and tasked with finding software engineers on Facebook from top schools. Can you blog about creating strings for such a search?
Better yet, could you reply with example Boolean strings for finding software engineers on Facebook who are from Top 10 schools (Stanford, Berkeley, MIT, CMU, etc.) and live in Silicon Valley?
Thanks, Amber
headhunt AT ymail
“Did you know that AltaVista supports configurable proximity searching?”
How?? I cut and pasted your suggested search in AltaVista and the “w/5” function didn’t work at all – it gave me results that had the text “w/5” instead. Switching to “NEAR/5” also didn’t work.
What’s the correct format for configuring proximity in AltaVista?
Cyclocross,
Oops – you got me on that one – that was supposed to read “Did you know that Exalead supports configurable proximity searching?” I’ve since changed it – thank you for pointing out this error.
AltaVista used to support configurable proximity searching (http://www.searchengineshowdown.com/features/av/review.html), but since Yahoo took over, they no longer support most of the advanced search features AltaVista once boasted.
Which search engine supports variable term weighting?
In need of a Boolean search string for Sr Software Engineers who have experience with Java, C++ and/or Objective-C and will be developing software applications for mobile devices in Seattle – can you help?