Achieving Semantic Search without Proximity Operators
How sourcers and recruiters can achieve semantic search WITHOUT using Boolean proximity search operators such as NEAR
If you’ve read these 3 posts: Semantic Search for Sourcers and Recruiters, Semantic Search for Sourcers and Recruiters Round 2, and Semantic Search using the NEAR Operator, you already know I am a fan of leveraging semantic search for sourcing and recruiting. I believe that it is critical to go beyond basic buzzword matching when creating Boolean search strings and attempt to tap into semantics – searching for words in specific contexts to find people based on what they DO, not just the buzzwords they happen to mention in their blog posts, social media profiles, or resumes.
So far, I’ve only discussed how to achieve semantic search using Boolean proximity operators, such as NEAR.
True user-defined semantic search is best achieved through proximity operators such as NEAR and w/x, mainly because they let you control sentence structure and search for specific responsibilities and experience rather than keywords alone.
However, although everyone has access to an Internet search engine that supports the Boolean NEAR proximity operator (Exalead), not everyone has access to Applicant Tracking Systems such as Bullhorn or online job board resume databases like Monster that support proximity searching. So I am dedicating this post to explaining how sourcers and recruiters can achieve semantic search without using any proximity operators.
Traditional Keyword Search
When most sourcers and recruiters create Boolean search strings, they tend to focus solely on titles, skillsets, and technology buzzwords, like accountant, software engineer, SAP, project manager, FASB, .Net, Java, SQL, etc.
However, while this is a commonly accepted practice, searching for words like those returns many false positive results by design. A false positive result is a result that matches the search terms, but does not meet the information needs of the user. In other words, a false positive is not a relevant result.
If you’re searching the Internet, creating Boolean search strings looking only for technologies, skillsets and/or titles doesn’t guarantee anything other than results that mention those words – not necessarily PEOPLE, which if you are a sourcer or a recruiter I’m assuming you’re targeting.
If you’re searching LinkedIn, your internal resume database/ATS, or an online job board resume database and you get results of people who mention those titles and/or keywords you’re searching for – you’re certainly not guaranteed that those people actually have experience doing what you need them to.
Just because someone mentions Java, J2EE, and Weblogic in their blog, profile, or resume, it does not mean that they have the required experience with those technologies that you need. The same applies for results that mention “accountant,” “project manager,” or any other title or technical term.
Sometimes the results you get from “traditional” keyword searching are of people who mention the terms you’re looking for, but you can’t tell if they are remotely qualified for your needs – so you can contact them to find out whether they are, and at least network with them if they aren’t. Other times the results you get from “traditional” keyword searching are pure junk – results that mention the words you’re looking for, but the person is obviously not qualified for anything you might be looking for now, or in the future.
While most sourcers and recruiters accept having to slog through false positives and irrelevant results as inevitable and “just the way it is,” I prefer to ask if there is anything we can do about it.
Semantic Search
A more effective way to leverage Boolean search strings for talent identification is to include functional, or responsibility-related, terms in addition to titles and skills. Functional terms can help you target and highlight what the people you’re targeting actually DO, giving you insight into their experience, capability and level of responsibility – which is what your hiring manager really cares about. Hiring managers don’t (or shouldn’t!!!) care about titles or buzzwords – what’s really going to come out in the interview is whether or not the candidate has been responsible for the types of things that the manager needs them to have been responsible for in order to perform the job they are being considered for. Right?
Here are a few examples of functional/responsibility-related terms you could add to your Boolean search strings in an attempt to search beyond basic skill, title, and technology keywords:
Administer, design, develop, implement, integrate, configure, manage, reconcile, audit, schedule, etc.
Remember that Google auto-stems every search term you enter unless you put it in quotes or precede the search term with a +, so adding any one of those words, as appropriate, should net you all of the word variants. You could also try adding the tilde (~) to functional terms and it will look for synonyms. For example: ~Administer, ~design, ~develop, etc.
On Exalead and the major job boards, you can leverage the asterisk for root word/stemming. For example: Admin*, design*, develop*, etc.
Coupling appropriate functional/responsibility search terms with traditional title and/or technology terms is an attempt to perform imprecise semantic search. I say “imprecise” because without using proximity search operators, you cannot precisely control whether or not someone mentions Java close to the words developing, developed, developer, develop, design, etc. If responsibility terms are not mentioned close to technical/skill terms, the probability isn’t as high that there is any semantic relation between the two. We won’t be able to eliminate the false positive results of people who simply mention our search terms somewhere, but don’t have enough of, any, or the right type of experience.
However, adding functional/responsibility-related search terms does have several benefits:
- Many Internet search engines automatically favor search terms that are mentioned in close proximity within results, so some search engines may return and rank highly results that do in fact happen to mention functional/responsibility-related terms close to technology/skill terms, and successfully yield results that have a high semantic similarity to the intent of the search.
- Simply adding more search terms such as functional/responsibility-related words to your Boolean search strings gives search engines more words to search for and determine ranking and relevance. Results will have their relevance ranked by responsibility related terms in addition to standard buzzwords such as titles and technologies.
- With functional/responsibility-related terms added to your Boolean search strings, simply having those terms in the results allows you to scan the results quickly to not just look for buzzwords, but also for words describing what the people DO and have DONE.
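The scanning described in the last point can be roughed out in a few lines of Python. This is only a sketch, assuming you already have the text of a result in hand (for example, a resume you have opened and copied). The function names, the sample resume snippets, and the 5-word window are all made up for illustration; it simply checks whether a word starting with a functional root shows up within a few words of a skill term, which is roughly what a proximity operator would have done for you.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near_hits(text, skill, functional_roots, window=5):
    """Return (functional_word, distance) pairs for every word that starts
    with one of functional_roots and sits within `window` words of `skill`."""
    words = tokenize(text)
    skill = skill.lower()
    roots = tuple(root.lower() for root in functional_roots)
    skill_positions = [i for i, word in enumerate(words) if word == skill]
    hits = []
    for i, word in enumerate(words):
        if word.startswith(roots):
            for position in skill_positions:
                distance = abs(i - position)
                if distance <= window:
                    hits.append((word, distance))
    return hits

# Hypothetical snippets: one describes doing something with Java, one just mentions it.
resume_a = "Designed and developed portal applications in Java for a retail client."
resume_b = "Managed the office coffee budget. Hobbies include travel to Java and Bali."

print(near_hits(resume_a, "Java", ["design", "develop"]))  # [('developed', 4)]
print(near_hits(resume_b, "Java", ["design", "develop"]))  # []
```

A result with no hits is not necessarily a bad one, and a hit is not proof of experience. Treat this as a quick way to decide which results to read first.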
Search Examples:
On Google:
(intitle:resume | inurl:resume) Java J2EE (SOA | SOAP | service) (~design | ~develop) -~job -~jobs
(intitle:resume | inurl:resume) UNIX (“system” | “systems”) ~administer Linux “Red Hat” (“server” | “servers”) ~design -~job -~jobs
(intitle:resume | inurl:resume) “accountant” reconcile (bank | financial) statement -~job -~jobs -sample
On resume databases and ATSs that support the asterisk for stemming:
java AND J2EE AND (SOA* OR servic*) AND (design* OR develop*)
UNIX AND system* AND admin* AND Linux AND “Red Hat” AND server* AND design*
“Accountant” AND reconcil* AND (bank* OR financial) AND statement*
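If you build strings like these often, a small helper can keep the pattern consistent: skill/title terms ANDed together, plus one OR group of stemmed functional terms. The Python below is a minimal sketch of that idea, not a feature of any job board or ATS. The function names are hypothetical, and it assumes the target system accepts AND, OR, parentheses, straight double quotes for phrases, and a trailing asterisk for stemming.

```python
def or_group(terms):
    """Join alternatives into a parenthesized OR group; a single term stays bare."""
    terms = list(terms)
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"

def stem(root):
    """Append the asterisk used for root word/stemming searches."""
    return root + "*"

def quote_if_phrase(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term

def build_query(skill_groups, functional_roots):
    """AND together each group of skill/title alternatives, then AND on
    an OR group of stemmed functional/responsibility-related terms."""
    parts = [or_group(quote_if_phrase(term) for term in group) for group in skill_groups]
    parts.append(or_group(stem(root) for root in functional_roots))
    return " AND ".join(parts)

# Rebuilds the first example above from its pieces.
print(build_query([["java"], ["J2EE"], ["SOA*", "servic*"]], ["design", "develop"]))
# java AND J2EE AND (SOA* OR servic*) AND (design* OR develop*)
```

Keeping the functional roots as a separate list makes it easy to swap in a different set (reconcil, audit, schedule, and so on) for a different hiring profile without touching the skill terms.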
Conclusion
While most sourcers and recruiters simply accept the fact that it’s “normal” to get a large number of false positive/irrelevant results from Boolean search strings, there are many things you can do to decrease false positives and increase the relevance of your results.
Throwing titles, skillset and technology terms into your Boolean search strings is like playing “buzzword bingo” - results are only required to have the words you searched for in them, and just because a document mentions a specific title, Java, Oracle, SQL, UNIX, FASB, SOX, PHP, or any title/skillset/technology term that you might be looking for, it doesn’t MEAN anything other than the words are present.
Semantics is the study of meaning. The presence of search terms like Java or SOX in a document does not have any intrinsic meaning in and of itself – most meaning comes from context. Now, if those words are mentioned in the context of other words, say – being responsible for doing specific things with them (e.g., designing portal applications in Java, or performing SOX audits), there IS meaning.
If you consciously decide to add functional/responsibility-related words to your searches (such as perform or design), in conjunction with titles, skillset and technology terms, you increase the probability that you can, by design, return results that are more relevant to you – results of people who don’t just mention the search terms, but have been responsible for DOING the kinds of things you need them to have experience with.
Even if you are focused solely on name generation, and certainly if you are sourcing and/or recruiting for specific positions/hiring profiles, you ARE ultimately looking for people with specific experience/capability. Leverage semantics and use functional/responsibility-related terms in your Boolean search strings in conjunction with more traditional keywords to increase the relevance of your searches and find the people that have the experience you need.
By the way – if you can tweak your searches and find more of the right people more quickly – you’ve just increased your productivity. But that’s a whole ‘nother post.

There is an alternative way to do semantic searching. We have a demonstration of this at http://www.truevert.com. This one is a green search engine. It has been trained on a set of green documents to understand what words mean in a green context. It could just as well be trained on a set of job requirements and resumes to understand what words mean in that context. Then when you search for Java, it will understand that you want the programming language, not the island or the coffee.
On Truevert, when you search for CFL, it gives you pages about compact fluorescent light bulbs, not the Canadian Football League. If you search for meat, it gives pages about organic meat, and the impact of raising meat on the environment.
We are licensing this technology for use in other verticals.
No tags, taxonomies, ontologies, or thesauri required.
Thanks,
Herb
Love your articles… I tried your semantic search on Exalead and it worked fabulously. However, I do not know how to start one from scratch. Do you have an entry that goes over the basics of setting up a semantic search?
Heather,
Thank you for reading and for the compliment – I aim to please!
I will plan to write a post that will cover the basics of setting up semantic searches. If you don’t see it soon enough, feel free to give me a friendly “nudge” to get moving on it. But it is in the works – thanks for the feedback and suggestion!
Good stuff as usual!