<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Boolean Black Belt &#187; Extended Boolean</title>
	<atom:link href="http://www.booleanblackbelt.com/category/extended-boolean/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.booleanblackbelt.com</link>
	<description>Leveraging social networks, resume databases, and the Internet for sourcing and recruiting</description>
	<lastBuildDate>Tue, 27 Jul 2010 14:00:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.3</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SourceCon 2010: Resume Sourcing and Matching &#8211; AI vs. Humans</title>
		<link>http://www.booleanblackbelt.com/2010/03/sourcecon-2010-resume-sourcing-and-matching-ai-vs-humans/</link>
		<comments>http://www.booleanblackbelt.com/2010/03/sourcecon-2010-resume-sourcing-and-matching-ai-vs-humans/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 18:15:34 +0000</pubDate>
		<dc:creator>Boolean Black Belt</dc:creator>
				<category><![CDATA[Artificial Intelligence Matching]]></category>
		<category><![CDATA[Boolean]]></category>
		<category><![CDATA[Extended Boolean]]></category>
		<category><![CDATA[Hidden Talent Pools]]></category>
		<category><![CDATA[Human Capital Data]]></category>
		<category><![CDATA[Proximity Searching]]></category>
		<category><![CDATA[Recruiting Technology]]></category>
		<category><![CDATA[Search Process]]></category>
		<category><![CDATA[Semantic Search]]></category>
		<category><![CDATA[SourceCon]]></category>
		<category><![CDATA[Sourcing Automation]]></category>
		<category><![CDATA[Talent Mining]]></category>
		<category><![CDATA[2010]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Glen Cathey]]></category>
		<category><![CDATA[Intelligent Search and Match]]></category>
		<category><![CDATA[Keynote]]></category>
		<category><![CDATA[Recruiting]]></category>
		<category><![CDATA[Resume Matching]]></category>
		<category><![CDATA[Resume Sourcing]]></category>
		<category><![CDATA[Sourcing]]></category>
		<category><![CDATA[SoureceCon]]></category>

		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=5093</guid>
		<description><![CDATA[Here is the expanded slide deck from my SourceCon 2010 Keynote: Resume Sourcing and Matching &#8211; Artificial Intelligence vs. Human Cognition. It contains all of the talking points as text so you are not left guessing as to what I spoke to during the live presentation.  
You&#8217;ll learn about the intrinsic and often overlooked challenges [...]]]></description>
			<content:encoded><![CDATA[<p>Here is the expanded slide deck from my SourceCon 2010 Keynote: Resume Sourcing and Matching &#8211; Artificial Intelligence vs. Human Cognition. It contains all of the talking points as text so you are not left guessing as to what I spoke to during the live presentation. <img src='http://www.booleanblackbelt.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>You&#8217;ll learn about the intrinsic and often overlooked challenges associated with sourcing resumes (it&#8217;s deceptively complex), what artificially intelligent semantic search and match applications claim to do and how they actually work, the limits of artificial intelligence, what people can do that semantic search applications cannot, the 5 levels of semantic search,  the 5 levels of talent mining, and what I think is the ideal candidate sourcing solution.</p>
<div id="__ss_3447353" style="width: 425px;"><strong style="display:block;margin:12px 0 4px"><a title="SourceCon 2010: Resume Sourcing and Matching: Artificial Intelligence vs. Human Cognition" href="http://www.slideshare.net/glencathey/sourcecon-2010-resume-sourcing-and-matching-artificial-intelligence-vs-human-cognition-3447353">SourceCon 2010: Resume Sourcing and Matching: Artificial Intelligence vs. Human Cognition</a></strong><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sourceconpresentationfullv5forslideshare-100316124352-phpapp01&amp;rel=0&amp;stripped_title=sourcecon-2010-resume-sourcing-and-matching-artificial-intelligence-vs-human-cognition-3447353" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sourceconpresentationfullv5forslideshare-100316124352-phpapp01&amp;rel=0&amp;stripped_title=sourcecon-2010-resume-sourcing-and-matching-artificial-intelligence-vs-human-cognition-3447353" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/glencathey">Glen Cathey</a>.</div>
<div style="padding:5px 0 12px">Additionally, you can view the video from the SourceCon event <a class="wp-caption-dd" title="Video of SourceCon 2010 Keynote: Resume Sourcing and Matching - Artificial Intelligence vs. Human Cognition" href="http://www.sourcecon.com/2010/session-descriptions/#session-85" target="_self">here</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.booleanblackbelt.com/2010/03/sourcecon-2010-resume-sourcing-and-matching-ai-vs-humans/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>LinkedIn Search: What it COULD and SHOULD be</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/</link>
		<comments>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 12:00:18 +0000</pubDate>
		<dc:creator>Boolean Black Belt</dc:creator>
				<category><![CDATA[Extended Boolean]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[Boosting]]></category>
		<category><![CDATA[LinkedIn Search]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Proximity]]></category>
		<category><![CDATA[Weighting]]></category>

		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143</guid>
		<description><![CDATA[Did you know that LinkedIn currently has the ability to deliver incredibly powerful search functionality to its users - WELL beyond what we all have access to now?  What am I talking about? 
I&#8217;m excited to tell you, but quite honestly, I actually can&#8217;t believe it&#8217;s taken me this long to put 2 and 2 together. Have you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.booleanblackbelt.com/wp-content/uploads/2009/07/linkedinpoweredbylucene2.png"><img class="alignright size-full wp-image-3188" title="linkedinpoweredbylucene2" src="http://www.booleanblackbelt.com/wp-content/uploads/2009/07/linkedinpoweredbylucene2.png" alt="" width="134" height="90" /></a>Did you know that LinkedIn currently has the ability to deliver incredibly powerful search functionality to its users - WELL beyond what we all have access to now?  What am I talking about? </p>
<p>I&#8217;m excited to tell you, but quite honestly, I actually can&#8217;t believe it&#8217;s taken me this long to put 2 and 2 together. Have you ever <strong><em>really</em></strong> watched the video clip below that you can find on  <a class="wp-caption-dd" title="Video of LinkedIn's Next Gen Search Functionality" href="http://learn.linkedin.com/linkedin-search/#advanced_people_search" target="_self">LinkedIn&#8217;s Learning Center</a>  as well as on YouTube? </p>
<p>If you ignore the information regarding the new features and pay close attention to the video, you can hear Esteban talk about how LinkedIn is always on the lookout for talented <a class="wp-caption-dd" title="Lucene" href="http://lucene.apache.org/java/docs/" target="_self">Lucene</a> Open Source engineers and watch him search for them. Lucene is an open source text search engine that I&#8217;ve written about in multiple posts for its advanced search functionality, including <a class="wp-caption-dd" title="Extended Boolean: Proximity and Weighting" href="http://www.booleanblackbelt.com/2008/11/extended-boolean-proximity-and-weighting/" target="_self">extended Boolean</a>.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="445" height="364" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/U_mAJ-Jg534&amp;hl=en&amp;fs=1&amp;rel=0&amp;border=1" /><embed type="application/x-shockwave-flash" width="445" height="364" src="http://www.youtube.com/v/U_mAJ-Jg534&amp;hl=en&amp;fs=1&amp;rel=0&amp;border=1" allowfullscreen="true" allowscriptaccess="always"></embed></object></p>
<h3>LinkedIn uses Lucene as their Text Search Engine</h3>
<p>When I first watched the video, I never gave the Lucene stuff a second thought because LinkedIn doesn&#8217;t actually offer any of Lucene&#8217;s truly advanced search functionality &#8211; LinkedIn doesn&#8217;t even support root-word/wildcard searching, let alone extended Boolean search. I figured if they were already using Lucene for their text search engine they would offer all of Lucene&#8217;s search functionality, which they don&#8217;t.</p>
<p>Then I watched the video again the other day (not exactly sure why) and it made me curious. Had they already implemented Lucene, or were they looking to do so? I did some research to see if I could confirm a link between LinkedIn and Lucene (pun intended).  Although <a class="wp-caption-dd" title="TechCrunch fail" href="http://www.techcrunch.com/2008/11/24/linkedin-launches-streamlined-people-search/" target="_self">TechCrunch reported that LinkedIn upgraded its people search</a>, they failed to mention the technology behind the upgrade. I was then able to dig up <a class="wp-caption-dd" title="CNet article confirming Lucene as LinkedIn's text search engine" href="http://news.cnet.com/8301-13505_3-10107745-16.html" target="_self">an article that verified that LinkedIn had implemented Lucene as their text search engine</a>.</p>
<h3>So What Can LinkedIn Do With Lucene?</h3>
<p>I&#8217;m glad you asked &#8211; be prepared to be amazed! <span id="more-3143"></span></p>
<h4>Wildcard Searches</h4>
<p>Lucene supports single and multiple character wildcard searches within single terms. That means you could search for the term develop* and LinkedIn would return results of people who mention every word that begins with the root of &#8220;develop:&#8221; develop, developed, developing, developer, develops, etc. That would mean no more having to type out long OR statements where you have to think about all of the different ways a particular term can be written.</p>
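<p>To make the idea concrete, here is a minimal Python sketch of what a wildcard does conceptually: it expands one pattern into every matching word so you don&#8217;t have to type the OR statement yourself. This is not Lucene&#8217;s implementation; the function name and the word list are hypothetical, chosen only for illustration.</p>

```python
import re

def expand_wildcard(pattern, vocabulary):
    """Return the vocabulary words matched by a pattern using * and ? wildcards."""
    # Translate the wildcard pattern into a regular expression:
    # * matches any run of characters, ? matches exactly one character.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern.lower()
    )
    return [word for word in vocabulary if re.fullmatch(regex, word.lower())]

# Hypothetical vocabulary drawn from a handful of profiles.
words = ["develop", "developed", "developing", "developer", "develops", "devops", "design"]
matches = expand_wildcard("develop*", words)

# The OR statement you would otherwise have to type by hand.
print(" OR ".join(matches))
```

<p>Note that &#8220;devops&#8221; is left out because it does not begin with the root &#8220;develop&#8221;.</p>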
<h4>Proximity Search </h4>
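<p>Lucene supports configurable proximity search &#8211; or the ability to find words that are within a specific distance from each other (3 words, 8 words, your choice). For example, if you wanted to find people who mention that they have experience configuring routers, you can use Lucene&#8217;s proximity search functionality via the tilde symbol (~) to target phrases where some mention of config* is made within 5 words of router or routers. (One caveat: Lucene&#8217;s query syntax documentation describes wildcards as working within single terms rather than within quoted phrases, so treat the string below as shorthand for the intent; a stock Lucene parser may need the word variants spelled out.)  </p>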
<p>&#8220;config* rout*&#8221;~5</p>
<p>This functionality is HUGE, as it allows sourcers and recruiters to drastically increase the relevance of search results by targeting people based on their responsibilities rather than basic keyword search (aka buzzword bingo). Without forcing some variant of the word &#8220;configure&#8221; to be within 5 words of &#8220;router&#8221; or &#8220;routers,&#8221; you can just as easily return results of people who do not mention that they have been specifically responsible for configuring routers &#8211; you could end up finding people who mention that they&#8217;ve configured other things (e.g. servers), and who make 1 mention of the word &#8220;router&#8221; in their skill summary because they have a router at home (but no paid professional experience). That would be what I call a false positive hit. The result mentioned the search terms, but it did not match the <strong><em>intent</em></strong> of my search &#8211; which is to find people who have been responsible for configuring routers.</p>
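<p>The difference between a proximity match and a false positive can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene computes proximity: it assumes distance is just the difference in word positions, and the function name and both sample profiles are made up.</p>

```python
import re

def within_distance(text, prefix_a, prefix_b, max_gap):
    """True if a word starting with prefix_a appears within max_gap words
    of a word starting with prefix_b, in either order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w.startswith(prefix_a)]
    pos_b = [i for i, w in enumerate(words) if w.startswith(prefix_b)]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

# Hypothetical profile snippets.
hit = "Responsible for configuring and troubleshooting Cisco routers and switches."
miss = ("Configured Windows servers for a 200-user office. "
        "Skills: TCP/IP, DNS, DHCP, Active Directory, home router setup.")

print(within_distance(hit, "config", "rout", 5))
print(within_distance(miss, "config", "rout", 5))
```

<p>Both snippets contain a form of &#8220;configure&#8221; and a form of &#8220;router,&#8221; so a plain keyword search returns both. Only the first keeps them within 5 words of each other, so only the first survives the proximity check.</p>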
<p>When I talk about targeting people based on their responsibilities, I mean searching for responsibility verbs (administer, manage, develop, design, configure, file, reconcile, audit, etc.) mentioned in close proximity (in the same sentence) to skill/technology nouns (oracle, statements, servers, projects, reports, Microsoft Dynamics, SAP, etc.). Being able to control how close words like those are in proximity to each other &#8211; down to the sentence level &#8211; allows sourcers and recruiters to perform semantic search (aka, natural language search). Essentially, you are able to find people based on what they DO, not just the words they happen to mention in their profile. </p>
<p>If you&#8217;re new to the concept of semantic search, I strongly suggest you read these articles (<a class="wp-caption-dd" title="Semantic Search for Sourcers and Recruiters 1" href="http://www.booleanblackbelt.com/2008/12/semantic-search-for-sourcers-and-recruiters/" target="_self">Semantic Search 1</a>, <a class="wp-caption-dd" title="Semantic Search for Sourcers and Recruiters 2" href="http://www.booleanblackbelt.com/2008/12/semantic-search-for-sourcers-and-recruiters-round-2/" target="_self">Semantic Search 2</a>, <a class="wp-caption-dd" title="Semantic Search using the NEAR Boolean Operator" href="http://www.booleanblackbelt.com/2009/01/semantic-search-using-the-near-boolean-operator/" target="_self">Semantic Search with Proximity</a>, <a class="wp-caption-dd" title="Semantic Search can be acheived without proximity operators" href="http://www.booleanblackbelt.com/2009/01/achieving-semantic-search-without-proximity-operators/" target="_self">Semantic Search without Proximity</a>) that will thoroughly explain the concept as well as show you how you can currently leverage proximity search to your advantage on Monster and <a class="wp-caption-dd" title="Exalead Internet search engine" href="http://www.exalead.com/search/" target="_self">Exalead</a>.</p>
<h4>Variable Term Weighting </h4>
<p>Here&#8217;s the other biggie &#8211; Lucene allows you to control the relevance weighting of your search terms. Lucene calls it &#8220;boosting.&#8221; In other words &#8211; you can tell Lucene that specific terms in your search string are more important/relevant to you than others. That&#8217;s right &#8211; instead of the search engine taking all of your search terms and &#8220;deciding&#8221; which results are the most relevant, YOU control the search relevance based on which terms you think are more critical and match the intent of what you&#8217;re specifically looking for.</p>
<p>To boost a term with Lucene you can use the caret (^) symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be, so boosting allows you to control the relevance of your results by boosting specific terms.</p>
<p>For example, if you are searching for the following terms: Unix, Windows, Citrix, VMware, storage, and you really needed people who had significant Citrix experience, you can boost that term with the ^ symbol:</p>
<p>Unix AND Windows AND Citrix^5 AND VMware AND storage</p>
<p>This will make profiles with more mentions of the term Citrix appear more relevant and thus be higher in the search results ranking.  This is important, because people who have a lot of experience with Citrix (in terms of specific responsibilities and/or multiple positions in their career history in which they use Citrix) will likely have multiple mentions of Citrix in their profile. Boosting Citrix will result in bubbling all of the profiles with many mentions of Citrix to the top of the results.</p>
<p>This is especially critical because without the ability to &#8220;tell&#8221; the search engine which specific terms are actually most relevant to you, the search engine makes its own &#8220;decision&#8221; as to what&#8217;s relevant. And in the case of my example &#8211; the search engine may see profiles that mention the word Windows 20 times as highly relevant, even if they only mention Citrix once &#8211; which isn&#8217;t likely to actually be someone who matches my need of a strong Citrix professional.</p>
<p>In addition to boosting single terms, you can also boost phrases. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).</p>
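<p>Here is a toy Python sketch of why boosting changes the ranking. It is deliberately much simpler than Lucene&#8217;s real relevance scoring: it assumes a score is just each term&#8217;s mention count multiplied by its boost factor. The function names and the two sample profiles are hypothetical.</p>

```python
import re
from collections import Counter

def score(profile_text, boosts):
    """Sum each term's mention count multiplied by its boost factor."""
    counts = Counter(re.findall(r"[a-z0-9]+", profile_text.lower()))
    return sum(counts[term.lower()] * boost for term, boost in boosts.items())

def rank(profiles, boosts):
    """Return profile names ordered from highest to lowest score."""
    return sorted(profiles, key=lambda name: score(profiles[name], boosts), reverse=True)

# Hypothetical profiles: one Windows-heavy, one Citrix-heavy.
profiles = {
    "windows_heavy": "Windows admin. Windows Server, Windows desktop, Windows patching. "
                     "Some Citrix, Unix, VMware, storage.",
    "citrix_heavy": "Citrix architect. Citrix XenApp, Citrix farm design on Windows. "
                    "Unix, VMware, storage.",
}

flat = {"Unix": 1, "Windows": 1, "Citrix": 1, "VMware": 1, "storage": 1}
boosted = dict(flat, Citrix=5)  # the equivalent of Citrix^5

print(rank(profiles, flat))     # every term weighted equally
print(rank(profiles, boosted))  # Citrix counts five times as much
```

<p>With equal weights the Windows-heavy profile wins on sheer keyword count. Once Citrix is boosted, the Citrix-heavy profile moves to the top, which is exactly the kind of control I&#8217;m talking about.</p>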
<h4>More Lucene Search Functionality</h4>
<p>Lucene also supports fuzzy searching (finding matches of misspellings and similar words) based on the <a class="wp-caption-dd" title="What the heck is the Levenshtein Distance?" href="http://en.wikipedia.org/wiki/Levenshtein_distance" target="_self">Levenshtein Distance</a>, and range searches (similar to Google&#8217;s <a class="wp-caption-dd" title="Range searching on Google" href="http://www.googleguide.com/number_range.html" target="_self">numrange search</a>). To learn more, <a class="wp-caption-dd" title="Lucene's search functionality" href="http://lucene.apache.org/java/2_4_1/queryparsersyntax.html" target="_self">here is a page that lists all of Lucene&#8217;s search functionality</a>. </p>
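<p>If you are curious what the Levenshtein Distance actually measures, here is a short Python sketch of the textbook calculation: the minimum number of single-character insertions, deletions, or substitutions needed to turn one word into another. This illustrates the metric only; it is not Lucene&#8217;s code, and the function name is my own. In Lucene&#8217;s query syntax a fuzzy search is written by adding a tilde to the end of a single term (e.g. roam~).</p>

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("roam", "foam"))
print(levenshtein("roam", "roams"))
print(levenshtein("citrix", "citirx"))
```

<p>The smaller the distance, the more similar the two words, which is how a fuzzy search can catch misspellings such as &#8220;citirx&#8221; for &#8220;citrix&#8221;.</p>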
<h3>Conclusion</h3>
<p>Now that you know that LinkedIn uses Lucene as their text search engine and you&#8217;ve seen all of the powerful search functionality Lucene has to offer &#8211; wouldn&#8217;t you like to be able to use wildcard searching, proximity search, term weighting, and fuzzy search when searching LinkedIn? I know I would! I currently have access to an ATS that uses a text search engine similar to Lucene that supports configurable proximity and variable term weighting, and I can tell you that these features make a HUGE difference in the relevance of search results.</p>
<p>I&#8217;m still trying to figure out why LinkedIn doesn&#8217;t offer users all of Lucene&#8217;s search functionality as they&#8217;ve been using Lucene as their text search engine for at least 7 months now. </p>
<p>I&#8217;ve tried to communicate my search improvement suggestions to LinkedIn a couple of different ways. In June I sent a message to <a class="wp-caption-dd" title="Esteban Kozak" href="http://www.linkedin.com/in/estebankozak" target="_self">Esteban Kozak</a> - Senior Product Manager overseeing search at LinkedIn - via LinkedIn (of course) that detailed all of my suggestions for improving LinkedIn&#8217;s search functionality, including wildcard search, proximity search, and term weighting &#8211; and I haven&#8217;t received a response. </p>
<p>I also caught <a class="wp-caption-dd" title="William Uranga on Twitter" href="http://twitter.com/williamU" target="_self">William Uranga</a> Tweeting from a LinkedIn customer advisory session last week, so I DM&#8217;d him and let him know I had a list of search recommendations and he kindly let me send them to him via email so he could share them during the session at LinkedIn. William wrote a post about his customer advisory session experience at LinkedIn &#8211; <a class="wp-caption-dd" title="What LinkedIn Always Knew by William Uranga" href="http://williamu.wordpress.com/2009/07/01/what-linkedin-always-knew/" target="_self">you can read it here</a>. </p>
<p>We can only hope that sometime in the near future LinkedIn taps into the awesome search power of Lucene, enabling users to take control of search relevance and tap into semantic search. I know I&#8217;ve got my fingers crossed!</p>
<h3>Update!</h3>
<p>Esteban Kozak replied to my message with this helpful response:</p>
<p>1- Prefix matching: we are currently evaluating the release of prefix matching for names in order to enable a quick way to navigate your contacts from the mobile application. Prefix matching for free text queries is very expensive because the query needs to be translated into a huge OR statement in the back end. There are better ways to solve this problem more elegantly. We are investigating alternative approaches like stemming, automatic expansion at query time and other techniques to ensure good recall.</p>
<p>2- Proximity search / Term weighting: These two are much easier to open up and will be available shortly.</p>
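<p>Esteban&#8217;s first point mentions stemming as an alternative to expanding a prefix into a huge OR statement. Here is a rough Python sketch of that idea, assuming a deliberately naive suffix-stripper rather than a real stemming algorithm; every name in it is hypothetical and it says nothing about how LinkedIn actually implements search. Each word is reduced to a root when profiles are indexed, and the search term is reduced the same way, so a single lookup finds all of the variants.</p>

```python
SUFFIXES = ("ing", "ed", "er", "s")  # checked in this order

def naive_stem(word):
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(profiles):
    """Map each stem to the set of profile ids that mention it."""
    index = {}
    for profile_id, text in profiles.items():
        for word in text.split():
            index.setdefault(naive_stem(word), set()).add(profile_id)
    return index

# Hypothetical profiles keyed by id.
profiles = {1: "developed trading systems", 2: "senior developer", 3: "designs routers"}
index = build_index(profiles)

# One lookup on the stem replaces: developed OR developer OR developing OR ...
print(sorted(index[naive_stem("developing")]))
```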
<p>Also &#8211; be sure not to miss LinkedIn Principal Search Engineer <a class="wp-caption-dd" title="Jake Mannix's LinkedIn profile" href="http://www.linkedin.com/in/jakemannix" target="_self">Jake Mannix&#8217;s</a> thorough and detailed comments below.</p>
<p>It appears we have much to look forward to with regard to LinkedIn search functionality!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Semantic Search for Sourcers and Recruiters</title>
		<link>http://www.booleanblackbelt.com/2008/12/semantic-search-for-sourcers-and-recruiters/</link>
		<comments>http://www.booleanblackbelt.com/2008/12/semantic-search-for-sourcers-and-recruiters/#comments</comments>
		<pubDate>Mon, 29 Dec 2008 17:12:08 +0000</pubDate>
		<dc:creator>Boolean Black Belt</dc:creator>
				<category><![CDATA[Extended Boolean]]></category>
		<category><![CDATA[Semantic Search]]></category>
		<category><![CDATA[Recruiting]]></category>
		<category><![CDATA[Sourcing]]></category>

		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=773</guid>
		<description><![CDATA[On Tuesday, December 23, 2008 I wrote a &#8220;Tutorial Tuesday&#8221; post on www.recruitingblogs.com titled, &#8220;What is Semantic Search?,&#8221; explaining the concepts of semantic search with regard to how it can be leveraged effectively by sourcers and recruiters. Now, I am not sure if the fact that it was posted the day before Christmas eve had anything to do [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.booleanblackbelt.com/wp-content/uploads/2008/12/search-google-paco-calvino1.jpg"><img class="alignright size-medium wp-image-838" title="search-google-paco-calvino1" src="http://www.booleanblackbelt.com/wp-content/uploads/2008/12/search-google-paco-calvino1-300x300.jpg" alt="" width="300" height="300" /></a>On Tuesday, December 23, 2008 I wrote a &#8220;Tutorial Tuesday&#8221; post on <a href="http://www.recruitingblogs.com">www.recruitingblogs.com</a> titled, &#8220;<a class="wp-caption-dd" title="RecruitingBlogs post on &quot;What is Semantic Search?&quot;" href="http://www.recruitingblogs.com/forum/topics/tutorial-tuesday-what-is" target="_blank">What is Semantic Search?,&#8221; </a>explaining the concepts of semantic search with regard to how it can be leveraged effectively by sourcers and recruiters. Now, I am not sure if the fact that it was posted the day before Christmas eve had anything to do with it, but I only received 2 comments on the post so I am not exactly sure how many people actually had the chance to read it.  As such - I am posting it here in its entirety because while semantic search is not well known or understood, it can be powerfully applied to sourcing and recruiting efforts.</p>
<p>When I talk about semantic search, I don&#8217;t mean the <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web </a>(which I still think is a LONG way off), I am referring to user-generated and defined semantic search. In other words, a sourcer or recruiter creating Boolean search strings that go beyond simply trying to match the words themselves and attempting to delve into the <em>meaning</em> implied by the words.</p>
<p>In linguistics, semantics refers to the study of meaning, as inherent at the levels of words, phrases, and sentences.</p>
<p>The vast majority of sourcers and recruiters create Boolean search strings that simply return a collection of words &#8211; words that do not have any associative meaning and that are not guaranteed to be relevant with regard to the intent of the search. Relevance can be defined as the extent to which a search result matches the information need based on the intent of the person executing the search. Highly &#8220;relevant&#8221; results = results that match exactly what the searcher is looking for.<span id="more-773"></span></p>
<p>Most sourcers and recruiters are actually trying to find people that have specific skills and experience. Just because certain words appear in a person&#8217;s resume or profile &#8211; it does not mean that the person has been primarily responsible for working with those words (typically skills, technologies, etc.). For example, a knowledgeable sourcer or recruiter knows that documents with the word &#8220;account&#8221; mentioned close to the word &#8220;executive&#8221; will often have a different meaning and relevance than documents that simply mention the words &#8220;account&#8221; and &#8220;executive&#8221; located anywhere within them.</p>
<p>This is the critical difference between the semantic similarity between a search and its results vs. the lexical similarity between a search and its results. In other words &#8211; when the search results match the intended MEANING of the search, there is a semantic similarity between the search and its results. When search results simply match the search terms but not the intended meaning of the search, there is a lexical similarity (the words match) between the search and its results.</p>
<p>Semantic search can best be achieved through the use of search interfaces and engines that support proximity searching. Proximity search functionality allows a sourcer or recruiter to control how close specific words are mentioned in relation to other words.</p>
<p>When you are able to control the proximity of words to each other, you can take advantage of linguistics and sentence structure to look for verbs mentioned in close proximity to nouns, which can imply taking action. If a resume mentions (configure OR configured OR configuration) &#8211; which are verbs &#8211; in close proximity to (router OR routers) &#8211; which are nouns &#8211; and within the same sentence, it is highly likely that the writer is talking about being responsible for configuring routers.</p>
<p>A sourcer or recruiter should not be satisfied to merely scan and read resumes of people who simply mention the words &#8220;configure&#8221; and &#8220;routers&#8221; somewhere in the resume &#8211; there are many people who can mention those words somewhere in their resume who have never been specifically responsible for configuring routers. The issue is that just because these words are found in a resume &#8211; the presence of the words themselves does not MEAN anything with regard to what the candidate has specifically been responsible for.</p>
<p>With the appropriate search interface/engine, sourcers and recruiters can craft semantic searches to find people who not only mention specific words such as &#8220;configure&#8221; and &#8220;routers&#8221;, but who have actually had experience configuring routers. Being able to control the proximity of words can enable recruiters to quickly get more results that are semantically relevant to what the recruiter is actually trying to find.</p>
<p>There are 3 main types of proximity searching &#8211; I will focus on what I think are the two most powerful &#8211; fixed proximity search and configurable proximity search.</p>
<p>Fixed proximity search functionality such as the &#8220;extended Boolean&#8221; NEAR operator enables users to search for words or phrases that are mentioned close to other specific words or phrases. The range of the NEAR operator is fixed, typically at 1-10 words.</p>
<p>Did you know that Monster supports the NEAR operator? Many people aren&#8217;t aware of this &#8211; but it&#8217;s the only major job board resume database that I am aware of to do so. Kudos to Monster! It is unfortunate that there are very few people who even know about the NEAR operator, and even fewer still who know how to utilize it to achieve semantic search.</p>
<p>Among Internet search engines &#8211; Google, Yahoo, Live, and Ask do not support proximity searching of any kind &#8211; only Exalead does, to my knowledge. As for Applicant Tracking Systems, I am aware that Bullhorn has integrated <a href="http://lucene.apache.org/java/docs/">Lucene</a>, a free and open source text search engine that supports configurable proximity, into their search interface.</p>
<p>Configurable proximity search goes one step further than fixed proximity, allowing a sourcer or recruiter to precisely control the maximum distance between specific search terms and to return even more relevant results than the NEAR operator. This is because the NEAR operator’s maximum range of 10 words can allow for some non-relevant results to be returned. The farther words are mentioned apart from each other, the less likely it is that they are semantically related. In fact, when two search terms are separated by 10 words, each could be mentioned in separate bullet points or sentences on a resume and be completely unrelated.</p>
<p>However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms. Although search engines supporting configurable proximity vary in their exact syntax, here is an example of a search looking for someone who has been responsible for administering Exchange servers: Windows AND Exchange w/5 admin* AND server*. That search can ONLY return results of resumes or profiles that mention Exchange within 5 words of any word starting with the root of admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words will dramatically increase the semantic similarity between the search&#8217;s intent and the search results, because mentioning those two search terms at such a close range makes it more likely that they appear in the same bullet point or sentence and are thus semantically related. Essentially, this search will only return results of people who specifically mention something about being responsible for administering Exchange in their resume.</p>
<p>Many sourcers and recruiters employing basic search tactics and strategies simply throw a bunch of keywords into a search and, as a result, end up reviewing large volumes of irrelevant results that merely match the search terms they entered (lexical match), hoping to &#8220;get lucky&#8221; and find the few relevant results buried among them. This is a huge time drain, inefficient, and low yield.</p>
<p>Experts at talent mining seek to craft Boolean search strings designed to reduce irrelevant &#8220;false positive&#8221; results, eliminating those of people who simply mention the words they are searching for somewhere in their resumes or profiles, and go beyond the simple lexical match to achieve semantic search &#8211; finding people whose experience and skills match the essence of their search.</p>
<p>If you don&#8217;t already take advantage of the power of semantic search to quickly find more relevant results when creating your Boolean search strings, now is the perfect time to set it as a resolution for 2009. Make it a goal to move beyond simple buzzword matching and create Boolean searches that target people more based on what they DO, rather than just the words they use in their resume.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.booleanblackbelt.com/2008/12/semantic-search-for-sourcers-and-recruiters/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Extended Boolean: Proximity and Weighting</title>
		<link>http://www.booleanblackbelt.com/2008/11/extended-boolean-proximity-and-weighting/</link>
		<comments>http://www.booleanblackbelt.com/2008/11/extended-boolean-proximity-and-weighting/#comments</comments>
		<pubDate>Mon, 10 Nov 2008 13:55:16 +0000</pubDate>
		<dc:creator>Boolean Black Belt</dc:creator>
				<category><![CDATA[Extended Boolean]]></category>
		<category><![CDATA[Semantic Search]]></category>
		<category><![CDATA[Boolean]]></category>
		<category><![CDATA[Boolean NEAR Operator]]></category>
		<category><![CDATA[Buzzword matching]]></category>
		<category><![CDATA[NEAR Operator]]></category>
		<category><![CDATA[Proximity]]></category>
		<category><![CDATA[Recruiting]]></category>
		<category><![CDATA[Relevance]]></category>
		<category><![CDATA[Sourcing]]></category>
		<category><![CDATA[Weighting]]></category>

		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=327</guid>
		<description><![CDATA[Most sourcing, recruiting, and staffing professionals are familiar with the “standard” Boolean operators of AND, OR, and NOT. However, I have found that few are familiar with “extended” Boolean functionality, such as proximity (or adjacency) and term weighting.
Beyond Basic Boolean
Extended Boolean offers sourcers and recruiters significantly more control, power and precision when executing searches, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.booleanblackbelt.com/wp-content/uploads/2008/11/boolean-word-scramble-by-kipbot.png"><img class="size-medium wp-image-344 alignright" title="boolean-word-scramble-by-kipbot" src="http://www.booleanblackbelt.com/wp-content/uploads/2008/11/boolean-word-scramble-by-kipbot-300x89.png" alt="" width="300" height="89" /></a>Most sourcing, recruiting, and staffing professionals are familiar with the “standard” Boolean operators of AND, OR, and NOT. However, I have found that few are familiar with “extended” Boolean functionality, such as proximity (or adjacency) and term weighting.</p>
<h3>Beyond Basic Boolean</h3>
<p>Extended Boolean offers sourcers and recruiters significantly more control, power, and precision when executing searches, and in the hands of an expert, extended Boolean can enable semantic search. Semantic search uses the science of meaning in language to produce highly relevant search results rather than have a user sort through a list of loosely related keyword results.</p>
<h3>Relevance is Key</h3>
<p>Ultimately, any sourcing or recruiting professional knows that what’s most critical in running Boolean searches on the Internet, a job board, or in an internal resume database, is getting relevant results. According to Wikipedia, “<a class="wp-caption-dd" title="Definition of relevance on Wikipedia" href="http://en.wikipedia.org/wiki/Relevance_(information_retrieval)" target="_blank">relevance</a>” denotes how well a retrieved set of documents (or a single document) meets the information need of the user.</p>
<p>For sourcing and recruiting, relevant results are typically defined as resumes or profiles of (or information about) potential candidates whose experience and capabilities closely match the hiring profile or job opening that the sourcer or recruiter is trying to find candidates for.</p>
<p>I’d argue that the value of any source of information (resume database, the Internet, etc.) lies less in the information contained within, and more in the ability of a user to extract precisely and completely what they need: finding and retrieving any and all appropriately qualified candidates. Information has no value to you if you are unable to find it and take action on it.</p>
<p>So how can extended Boolean help sourcers and recruiters find more relevant results? Let’s take a look at term weighting first. <span id="more-327"></span></p>
<h3>Variable Term Weighting</h3>
<p>Talented sourcers and recruiters know that not all terms are equally important in a query. In most queries and searches, certain search terms are more important than others. When running standard Boolean queries, all search terms are considered/weighted equally. Unfortunately, many search engines and database search interfaces simply assign relevance to results by the number of search term “hits” in each document. In most cases, the simple frequency of search terms does not correlate to relevant results. This is where the derisive description “buzzword matching” comes from, most often used to denote that there is little skill involved in running Boolean searches counting matched keywords.</p>
<p>Using an information technology hiring profile as an example: if a sourcer were looking for candidates who have significant experience administering Windows servers and Exchange email servers, they might create a simple Boolean query such as this: Windows AND Exchange AND server* AND admin*. On a search engine that ranks by hit count, that search is highly likely to rank Windows systems administrators who mention Windows many times in their resume/profile, and happen to mention Exchange once or twice, as highly relevant, simply because of the number of “hits” for Windows, which is by nature a very common term in resumes. This would leave the sourcer sorting through a large volume of results to find the candidates who have actually been primarily responsible for administering Exchange servers as well as Windows servers.</p>
<p>Search engines that offer users the ability to assign different weights to each search term enable sourcers and recruiters to move beyond simple buzzword matching and take control of the relevance of the results. Essentially, with variable term weighting you can assign a number value to words to increase their weight when ranking retrieved documents – which does not change the TOTAL number of results, but the ORDER of the results.</p>
<p>Using the same example as above, a sourcer using a search engine that supports variable term weighting could create a Boolean search string such as this: Windows AND Exchange:30 AND server* AND admin*. That Boolean query will pull the same number of results as the first search that had no term weighting. However, it will sort and rank the results heavily favoring resumes/profiles that mention Exchange more often in relation to the other search terms, increasing the likelihood that the sourcer can quickly identify candidates who have been responsible for administering and supporting Exchange servers. By employing variable term weighting, the sourcer has increased the relevance of the top-ranked results.</p>
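<p>To make the effect of the Exchange:30 weight concrete, here is a minimal Python sketch of hit-count ranking with and without term weights. Everything in it is an assumption made for illustration: the function names, the three sample resumes, and the scoring formula (hit count multiplied by weight) are hypothetical and are not taken from any particular search engine.</p>

```python
import re

def term_counts(text, terms):
    """Count case-insensitive hits per term; a trailing * matches any word ending."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = {}
    for term in terms:
        root = term.lower().rstrip("*")
        if term.endswith("*"):
            counts[term] = sum(1 for w in words if w.startswith(root))
        else:
            counts[term] = sum(1 for w in words if w == root)
    return counts

def score(text, weights):
    """Weighted hit count, or None when any term is missing (AND semantics)."""
    counts = term_counts(text, weights)
    if any(c == 0 for c in counts.values()):
        return None
    return sum(counts[t] * weights[t] for t in weights)

def rank(resumes, weights):
    """Names of matching resumes, highest score first."""
    scored = [(score(text, weights), name) for name, text in resumes.items()]
    matched = [(s, n) for s, n in scored if s is not None]
    return [n for s, n in sorted(matched, key=lambda p: (-p[0], p[1]))]

# Hypothetical resumes, invented for this sketch.
resumes = {
    "windows_admin": "Windows administrator. Windows 2003 server builds, "
                     "Windows patching, Windows scripting. Some Exchange exposure.",
    "exchange_admin": "Exchange administrator. Administered Exchange 2003 "
                      "servers on Windows.",
    "no_exchange": "Windows server administrator with Active Directory experience.",
}

unweighted = {"Windows": 1, "Exchange": 1, "server*": 1, "admin*": 1}
weighted = {"Windows": 1, "Exchange": 30, "server*": 1, "admin*": 1}

print(rank(resumes, unweighted))  # ['windows_admin', 'exchange_admin']
print(rank(resumes, weighted))    # ['exchange_admin', 'windows_admin']
```

<p>Both searches return the same two resumes. Only the order changes, which is the behavior described above.</p>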
<p>Now, let’s take a look at proximity functionality:</p>
<h3>Proximity</h3>
<p>Proximity search functionality enables a user to search for specific terms that are mentioned close to other specific terms. An adept sourcer or recruiter knows that documents with the word “computer” mentioned close to the word “science” will often have a different meaning and relevance than documents that simply mention the words “computer” and “science” anywhere within them.</p>
<p>There are 3 main types of proximity searching: fixed proximity, configurable (or variable) proximity, and adjacency. For the purposes of this post, I will focus only on fixed and configurable proximity.</p>
<h3>Fixed Proximity</h3>
<p>Fixed proximity is most commonly represented by the NEAR operator. Search engines that support the NEAR operator typically define NEAR as within a fixed maximum distance, usually somewhere between 10 and 16 words (specific search engines differ, so check their documentation).</p>
<p>Using the example of a Windows and Exchange administrator, a sourcer could craft this search using the NEAR operator: Windows AND Exchange NEAR admin* AND server*. That search will ONLY return resumes/profiles that mention Exchange within the engine’s NEAR distance (up to 16 words in this example) of any word starting with the root of admin (administrator, administration, administer, administered, etc.). Requiring that Exchange be mentioned in close proximity to admin* will dramatically improve the relevance of the search results, typically returning candidates who have a title using both terms and/or talk about being responsible for Exchange administration.</p>
<div>Here are some examples taken from actual resumes that demonstrate the variety of relevant results that can be retrieved with the above search:</div>
<ul>
<li>Managed &amp; administered more than 300 Exchange Servers</li>
<li>Provisioned &amp; administer multiple Exchange 5.5/2003 servers</li>
<li>Not only are there administration duties for Exchange and Blackberry&#8230;</li>
<li>Exchange/RightFax administrator</li>
<li>Installing, Configuring, and Administering Microsoft Exchange 2000 Server</li>
<li>Administer a Microsoft Exchange 2003/2007 environment</li>
<li>8+ years of expertise as a System Administrator in Windows 2003 family, Windows 2000 family, MS Exchange 5.5, MS Exchange 2000, and Exchange 2003</li>
<li>I am proficient with the following skills; planning, installation and administration of Windows Active Directory, Windows Servers, Exchange Server</li>
<li>Windows Server Support, Active Directory,Exchange Server 2000, 2003 administration and Blackberry Server administration</li>
<li>Administer Exchange 2003 Server for corporate email</li>
</ul>
<p>As you can see, being able to control the proximity of specific search terms essentially increases the likelihood of returning results of candidates who have had administrative responsibility for Exchange servers, effectively increasing the relevance of the results.</p>
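<p>As a rough illustration of how a NEAR match behaves, here is a minimal Python sketch. It assumes a 16-word window and a simple word tokenizer. The function names and the third sample text are hypothetical, and real search engines define NEAR in their own ways.</p>

```python
import re

NEAR_WINDOW = 16  # assumed maximum distance; real engines differ

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(words, term):
    """Indexes of words matching the term; a trailing * matches any word ending."""
    root = term.lower().rstrip("*")
    if term.endswith("*"):
        return [i for i, w in enumerate(words) if w.startswith(root)]
    return [i for i, w in enumerate(words) if w == root]

def near(text, term_a, term_b, window=NEAR_WINDOW):
    """True when the two terms occur within `window` words of each other, in any order."""
    words = tokenize(text)
    return any(
        abs(i - j) in range(1, window + 1)
        for i in positions(words, term_a)
        for j in positions(words, term_b)
    )

# Two lines taken from the resume examples above.
print(near("Exchange/RightFax administrator", "Exchange", "admin*"))
print(near("Administer a Microsoft Exchange 2003/2007 environment", "Exchange", "admin*"))

# Hypothetical text in which the two terms are 20 words apart and unrelated.
unrelated = ("Used Exchange email daily. Built and shipped reporting tools for "
             "the finance team across three regional offices over five years. "
             "Database administrator for Oracle.")
print(near(unrelated, "Exchange", "admin*"))
```

<p>The first two calls print True and the third prints False: a plain AND search would return all three documents, while NEAR drops the one where the terms are far apart.</p>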
<h3>Fun fact:</h3>
<ul>
<li> Did you know that Monster and Exalead support the NEAR operator?</li>
</ul>
<h3>Configurable Proximity</h3>
<p>A search engine that supports configurable proximity affords users the ability to precisely control the distance between specific search terms. This can produce even more relevant results than the NEAR operator, because the NEAR operator’s maximum range of 10 to 16 words can allow some non-relevant results to be returned. The farther apart words are mentioned, the less likely it is that they are semantically related. In fact, two terms that are 10 to 16 words apart could each be mentioned in separate bullet points or sentences on a resume and be completely unrelated.</p>
<p>However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms. Although search engines vary in their exact syntax, here is an example of the Windows and Exchange admin search using configurable proximity: Windows AND Exchange w/5 admin* AND server*. That search can ONLY return results of resumes or profiles that mention Exchange within 5 words of any word starting with the root of admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words will dramatically increase the relevance of the search results, because mentioning those two search terms at such a close range makes it more likely that they appear in the same bullet point or sentence and are thus semantically related. Essentially, this search will only return results of people who specifically mention something about being responsible for administering Exchange at least once in their resume. By employing this kind of search, a sourcer is actually performing a semantic search, as they are looking specifically for people who talk about having a particular responsibility, not just for documents that contain certain words.</p>
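<p>The difference between a wide window and a tight one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names min_distance and within are hypothetical, the tokenizer is deliberately simple, and the second sample text is invented to show two unrelated mentions.</p>

```python
import re

def min_distance(text, term_a, term_b):
    """Smallest word distance between the two terms, or None if either is absent."""
    words = re.findall(r"[a-z0-9]+", text.lower())

    def hits(term):
        # A trailing * matches any word that starts with the root.
        root = term.lower().rstrip("*")
        wildcard = term.endswith("*")
        return [i for i, w in enumerate(words)
                if (w.startswith(root) if wildcard else w == root)]

    gaps = [abs(i - j) for i in hits(term_a) for j in hits(term_b)]
    return min(gaps) if gaps else None

def within(text, term_a, term_b, max_words):
    """True when the terms occur no more than max_words apart, in any order."""
    distance = min_distance(text, term_a, term_b)
    return distance is not None and max_words >= distance

# A line from the resume examples above: the terms are adjacent.
close = "Administer Exchange 2003 Server for corporate email"

# Hypothetical resume text: Exchange and administrator sit in separate sentences.
far = ("Migrated mailboxes to Exchange 2003. Designed network topology "
       "for branch offices. Database administrator for Oracle.")

print(min_distance(close, "Exchange", "admin*"))  # 1
print(min_distance(far, "Exchange", "admin*"))    # 9
print(within(far, "Exchange", "admin*", 16))      # True
print(within(far, "Exchange", "admin*", 5))       # False
```

<p>With a 16-word window the unrelated text still matches. With a 5-word window it is excluded, while the resume line that actually describes Exchange administration matches either way.</p>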
<h3>Fun facts:</h3>
<ul>
<li>Did you know that <a class="wp-caption-dd" title="Exalead search" href="http://www.exalead.com/search" target="_blank">Exalead</a> supports configurable proximity searching?</li>
<li>Did you know that you can integrate a free, open source search engine that supports configurable proximity and variable term weighting into your ATS or resume database? Check out <a class="wp-caption" title="Lucene open source search engine" href="http://lucene.apache.org/java/docs/" target="_blank">Lucene</a>.</li>
</ul>
<h3>Conclusion</h3>
<p>Hopefully you can see how being able to control the proximity of two search terms can yield results that are FAR more relevant than results that simply mention the two terms anywhere in a document or form. This is the critical difference between semantic similarity and lexical similarity between a search and its results.</p>
<p>There are countless ways you can apply extended Boolean functionality such as variable term weighting and proximity searching to nearly any industry/hiring profile to create searches that return highly relevant results, more relevant than those that can be achieved with standard Boolean logic. Using a search engine that supports both configurable proximity and variable term weighting can empower sourcers and recruiters to quickly find large volumes of highly relevant results, increasing productivity and achieving just-in-time (JIT) talent identification and acquisition.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.booleanblackbelt.com/2008/11/extended-boolean-proximity-and-weighting/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>
