Semantic Search using the NEAR Boolean Operator
This post will cover graphic examples of how to achieve semantic search using the NEAR Boolean operator on Monster and on the Internet via Exalead using Accounting and Information Technology hiring profiles.
First, if you have not done so already, be sure to read these two posts that thoroughly explain the concepts of user-defined semantic search for sourcing and recruiting: Semantic Search for Sourcers and Recruiters, and Semantic Search for Sourcers and Recruiters Round 2.
Second, if you’re not already familiar with the NEAR operator, I highly recommend you read about it in this post: Extended Boolean – Proximity and Weighting (look for it in the middle of the post under “Fixed Proximity”) before proceeding any further.
The NEAR operator on Monster
Monster is the only major job board resume database that recognizes and supports the NEAR Boolean operator. According to Monster’s documentation, the NEAR operator has a maximum proximity of 10 words. For example, the search string software NEAR programmer returns ONLY those resumes that have software and programmer within 10 words of each other.
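To make the idea of a 10-word window concrete, here is a minimal Python sketch of a fixed-proximity check. It is only an illustration, not Monster's actual implementation: the function names are made up, the two sample texts are invented, and counting distance as the difference between word positions is my assumption about how "within 10 words" is measured.

```python
import re

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def near(text, first, second, max_distance=10):
    """True if `first` and `second` occur within max_distance words of each other.

    Assumption: distance is the difference between the two word positions.
    """
    tokens = words(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)

# Invented sample texts, not real resumes.
close = "Senior programmer building accounting software for a regional bank"
far = "Programmer with a long career. " + "filler " * 20 + "Also sold software."

print(near(close, "software", "programmer"))  # True: the words are 3 positions apart
print(near(far, "software", "programmer"))    # False: the words are 27 positions apart
```

Both sample texts contain both words, so a plain AND search would return both. Only the first one passes the proximity check.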
Let’s look at a couple of resume snippets that I used from Monster when I wrote a post on Boolean Search Strings for a Sales Tax Manager. What we’re going to do is go beyond the individual words themselves and look at how the candidates wrote sentences describing specifically what they’ve been responsible for doing. Then we will use the NEAR operator to create Boolean search strings that go beyond simply matching the words themselves and attempt to get at the meaning implied by the words by targeting sentences that describe responsibilities.
Notice in both resume snippets, we see verbs like prepared, supervised, managed in close proximity to words like tax, statements, returns, workpapers, schedules, month and quarter end, personnel, staff, etc.
Let’s use the NEAR operator to specifically target sentences where candidates are talking about the types of responsibilities we need them to have had experience with:
prepar* NEAR (tax* or statement* or return*) and (manag* or supervis*) NEAR (personnel or accountants or staff) and tax NEAR (manager or supervisor) and (state or local or federal) and (sales or use)
Breaking down that search – here is exactly what’s going on with the use of the NEAR operators:
#1 prepar* NEAR (tax* or statement* or return*)
This will require the results to have ANY word starting with the root of tax, statement, or return within 10 words of any word beginning with the root of prepar*
Semantic analysis:
This aspect of the search will be highly likely to return results of resumes that have sentences mentioning responsibilities such as being responsible for the preparation of tax returns and statements
#2 (manag* or supervis*) NEAR (personnel or accountants or staff)
This will require the results to have ANY mention of any word starting with the root of manag or supervis within 10 words of personnel, staff, or accountants
Semantic analysis:
This aspect of the search will be highly likely to return results of resumes that have sentences mentioning responsibilities such as managing and supervising personnel, staff, or accountants
#3 tax NEAR (manager or supervisor)
This will require ANY mention of the word tax to be within 10 words of the words manager and/or supervisor
Semantic analysis:
This aspect of the search will be highly likely to return results of resumes that have titles of tax manager or tax supervisor, as well as resumes with sentences mentioning responsibilities such as managing and supervising tax-related work and/or personnel
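The three NEAR clauses broken down above can be sketched in Python to show how root-word wildcards and OR groups combine with the 10-word window. This is a rough model under stated assumptions, not how Monster evaluates a search: the helper names are hypothetical, the sample text is invented, distance is assumed to be the difference in word positions, and the plain AND clauses of the full string (state/local/federal, sales/use) are left out.

```python
import re
from fnmatch import fnmatchcase

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, patterns):
    """Indexes of tokens matching any pattern; a trailing * matches any ending."""
    return [i for i, token in enumerate(tokens)
            if any(fnmatchcase(token, p.lower()) for p in patterns)]

def near_any(tokens, left, right, max_distance=10):
    """True if any left-group match is within max_distance words of a right-group match."""
    left_hits = positions(tokens, left)
    right_hits = positions(tokens, right)
    return any(abs(i - j) <= max_distance for i in left_hits for j in right_hits)

def near_clauses(text):
    """Evaluate only the three NEAR clauses of the sales tax manager string."""
    tokens = tokenize(text)
    return {
        "#1": near_any(tokens, ["prepar*"], ["tax*", "statement*", "return*"]),
        "#2": near_any(tokens, ["manag*", "supervis*"],
                       ["personnel", "accountants", "staff"]),
        "#3": near_any(tokens, ["tax"], ["manager", "supervisor"]),
    }

# Invented sample text, not a real resume from the search results.
sample = ("Section Manager - Income Taxes. Prepared state and local tax returns "
          "and supervised a staff of four accountants.")
print(near_clauses(sample))
```

In this sample the title itself never contains the exact phrase "tax manager", yet clause #3 still matches because the word tax appears a few words after manager.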
Here are a few examples of the results returned by the above search that clearly demonstrate the NEAR operator working hard and effectively:
If you’re very observant, you’ll see that the second snippet has a title of “Section Manager – Income Taxes.” The reason I highlight this is that a traditional title search for titles such as “Tax Manager” or “Manager of Tax” could not return that result – a nice example of a candidate that most sourcers and recruiters would not be able to find based on a common title search approach, and proof that hidden talent pools really do exist.
However, using the NEAR operator gives us a handy alternative to specific title searching, and by design, allows us to find candidates with not-as-common titles such as “Section Manager – Income Taxes.” Cool.
Now, while the first 50 results of that search had their fair share of false positives – such as candidates who are now controllers, CFOs, and such, but were at some point in their careers responsible for managing tax reporting and personnel – I want to point something out:
Using the NEAR operator, that search pulled 10 direct hits among the first 50 results that had titles of Tax Manager, Supervisor, or Director.
Then, out of curiosity, I took out the NEAR operators and simply replaced them with ANDs:
prepar* and (tax* or statement* or return*) and (manag* or supervis*) and (personnel or accountants or staff) and tax and (manager or supervisor) and (state or local or federal) and (sales or use)
Only 4 out of the first 50 results happened to be direct hits of candidates who had titles of Tax Manager, Supervisor, or Director. So we were able to more than DOUBLE our highly relevant matches among the first 50 results alone by employing the NEAR operator. Sweet.
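For anyone who wants to check the arithmetic, here is the comparison as a tiny Python calculation. The counts come straight from the two searches above; the function name is just an illustrative label.

```python
def hit_rate(direct_hits, results_reviewed):
    """Share of reviewed results that were direct hits."""
    return direct_hits / results_reviewed

near_rate = hit_rate(10, 50)  # search using NEAR
and_rate = hit_rate(4, 50)    # same search with NEAR replaced by AND

print(f"NEAR: {near_rate:.0%}, AND: {and_rate:.0%}, "
      f"ratio: {near_rate / and_rate:.1f}x")
# NEAR: 20%, AND: 8%, ratio: 2.5x
```

Keep in mind this is one search and only the first 50 results, so treat the 2.5x figure as an observation from this example rather than a general rule.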
Now let’s show some love to Information Technology sourcers and recruiters
Let’s say you’re looking for software engineers that, among other things, have deep Java experience as well as specific experience designing portals. This one is easy.
Java and (design* or develop*) and portal* NEAR (design* or develop*)
The double mention of (design* or develop*) is not redundant. The first mention finds any word starting with the root design or develop anywhere in the resume. The second mention works in conjunction with the NEAR operator, and REQUIRES every result to also mention portal or portals within 10 words of any word starting with the root design or develop.
Semantic analysis:
Using the NEAR operator to ensure that any mention of the words portal or portals is within 10 words of design* or develop* increases the probability that they are mentioned in the same sentence – and if they are mentioned in the same sentence – it’s highly likely that the person is talking about being responsible for designing/developing portals. Which is exactly what we’re looking for.
This is specifically different from just throwing all of the words together and HOPING we get some people who have been responsible for portal design/development. Hope is not a strategy. So we’re using the NEAR operator to target people who ARE responsible for portal design and development, by the design of our Boolean search string.
Here are 3 resume snippets that demonstrate the NEAR operator working its magic:
As you can see, using the NEAR operator worked quite nicely – we targeted the semantics of talking about being responsible for designing/developing portals – and we got it.
The NEAR Operator on Exalead
I realize that not everyone has access to Monster, so let’s take a look at an Internet search engine that everyone DOES have access to – Exalead.
Exalead is the only decent-sized Internet search engine to support proximity searching in the form of the NEAR operator. I say “decent-sized” because it’s not a major search engine in my opinion – mostly because it does not appear to index nearly as many pages as Google, Yahoo, Live, or Ask, and this is especially and painfully evident when you do back-to-back searches using the site: command to x-ray into LinkedIn. More on that in another post (here’s a quickie – feel free to run just site:linkedin.com on both Exalead and Google and you will find 10X the total results on Google).
Let’s take a swing at the Java/portal developer search we used above on Monster and point it towards the Internet via Exalead.
(intitle:resume OR inurl:resume) AND java AND (design* OR develop*) NEAR portal* AND NOT job*
Before we look at the results, be aware that unlike Monster, Exalead’s NEAR distance is 16 words – which is getting a little “out there” in terms of proximity. Also – see how I was able to actually use the asterisk for root word/stemming? Man that feels good to be able to do that on the Internet. Anyone from Google reading this?
Pretty nice results, right? If you check the results, all of them mention portal or portals within 16 words of design*/develop*, in most cases resulting in sentences where the candidate specifically talks about being responsible for/performing portal design/development. Which is exactly what we’re looking for – NOT just people who happen to mention those words somewhere in their resume. We’ve leveraged semantics in our search approach rather than resorting to the “buzzword bingo” game.
But wait – there’s more! Exalead also supports configurable proximity. So if 16 words is too wide a gap for you and leads to less relevant results, we can tighten that range.
For example, let’s limit the distance between any mention of design*/develop* and portal* to a maximum of 5 words.
(intitle:resume OR inurl:resume) AND java AND (design* OR develop*) NEAR/5 portal* AND NOT job*
We’ve managed to cut the total number of results down significantly, and we’ve also increased the relevance while reducing false positives. Look at how tight those results are! It’s because every single result HAS to mention design*/develop* within 5 words or less of portal*. We are leveraging semantics heavily here because most mentions of those words in such close proximity are in fact referencing responsibility – we’ve blown past word matching to nail people talking about doing what we need them to have experience with! Am I the only person excited about this?
However, as exciting as this is and as tight as those results are, we have to be cognizant of the fact that there ARE relevant results we just eliminated. Yup – anyone who mentioned design*/develop* at a range of 6-16 words from portal* was wiped away and we did not see them. Which is okay, as long as you are aware of this and know how to go back and get them.
As I’ve discussed in many posts, I think the best approach to secondary sourcing is to start tight and highly focused on the most relevant results rather than starting broad and beginning a search by sifting through tons of false positives. In other words, if I were target shooting – I would start with a sniper rifle and try to hit the bullseye, rather than start with a shotgun and just be happy to hit the target.
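One simple way to go back and get the results that a tight window leaves out is to rerun the same string at progressively wider distances. Here is a small Python sketch that generates those Exalead strings. The function and its parameters are hypothetical conveniences of mine; it only assembles the query text used in this post and does not call Exalead.

```python
def exalead_resume_query(skill, left_terms, right_term, distance=None, exclude="job*"):
    """Build the resume search string from this post.

    distance=None uses plain NEAR (Exalead's default window);
    an integer uses the configurable NEAR/x form.
    """
    near = "NEAR" if distance is None else f"NEAR/{distance}"
    left = "(" + " OR ".join(left_terms) + ")"
    return (f"(intitle:resume OR inurl:resume) AND {skill} AND "
            f"{left} {near} {right_term} AND NOT {exclude}")

# Start tight, then widen: 5 words, 8 words, then the default window.
for distance in (5, 8, None):
    print(exalead_resume_query("java", ["design*", "develop*"], "portal*", distance))
```

Each wider pass will also repeat the results of the tighter one, so expect some overlap when you review them.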
Applying NEAR to LinkedIn
Let’s use the site: command to use Exalead to x-ray into LinkedIn and exploit Exalead’s configurable proximity search to find Java developers who have been responsible for designing/developing portals.
site:linkedin.com AND java AND (design* OR develop*) NEAR/8 portal* AND (inurl:pub OR inurl:in) -intitle:directory
Yup, NEAR works there as well. Every result contains some mention of design*/develop* within 8 words of portal or portals.
Also – did you notice how I did not use the -/minus sign coupled with 4 to 6 things I was trying to avoid like jobs, answers, and such? I have found that using (inurl:pub OR inurl:in) is a little cleaner – it simply targets public profiles and all I need to NOT out is intitle:directory in most cases.
Conclusion
Being able to control how close words are mentioned to each other via the NEAR operator enables us to achieve semantic search – tapping into sentence structure and the power of meaning in language. Instead of throwing a bunch of words together and having to sift through large volumes of irrelevant and false positive results, we can attempt to harness semantics to find people based on what they have experience DOING, not just based on what words they happen to include somewhere in their resume.
Kudos to Monster for being the only major online job board to support the NEAR operator, and props to Exalead for not only supporting NEAR, but going a step further and supporting configurable proximity via NEAR/x.
Can’t get enough of semantic search for sourcing and recruiting? My next post will cover how to achieve semantic search for sourcing and recruiting without using any proximity operators.
Stay tuned!
If you enjoyed this post, please consider leaving a comment or subscribing to the feed to get future articles delivered to your feed reader.

According to my Information Retrieval (IR) knowledge, Google uses an LSI classification method based on the vector space model to classify documents, so semantic search is possible.