<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: LinkedIn Search: What it COULD and SHOULD be</title>
	<atom:link href="http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/</link>
	<description>Leveraging social networks, resume databases, and the Internet for sourcing and recruiting</description>
	<lastBuildDate>Sat, 13 Mar 2010 14:21:36 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.3</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: LinkedIn Search Results Sorting: Relevance or Keyword?</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-4499</link>
		<dc:creator>LinkedIn Search Results Sorting: Relevance or Keyword?</dc:creator>
		<pubDate>Mon, 26 Oct 2009 15:01:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-4499</guid>
		<description>[...] does seem to be a safe assumption, because a Principal Search Engineer at LinkedIn commented to this combination, although without going into specific detail as to *exactly* how LinkedIn determines what is [...]</description>
		<content:encoded><![CDATA[<p>[...] does seem to be a safe assumption, because a Principal Search Engineer at LinkedIn commented to this combination, although without going into specific detail as to *exactly* how LinkedIn determines what is [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Boolean Black Belt</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-4274</link>
		<dc:creator>Boolean Black Belt</dc:creator>
		<pubDate>Tue, 06 Oct 2009 00:14:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-4274</guid>
		<description>Jake,
It&#039;s been a short while since we exchanged comments and ideas regarding LinkedIn search - I was wondering when some of the new search functionality you had hinted at might finally be released?

Also, I have a question for you regarding some claims I have heard someone make about the ability to alter their search ranking in LinkedIn - I was hoping to point you towards a video and either debunk or confirm what this person is claiming.

Looking forward to your response.  Thanks!</description>
		<content:encoded><![CDATA[<p>Jake,<br />
It&#8217;s been a short while since we exchanged comments and ideas regarding LinkedIn search &#8211; I was wondering when some of the new search functionality you had hinted at might finally be released?</p>
<p>Also, I have a question for you regarding some claims I have heard someone make about the ability to alter their search ranking in LinkedIn &#8211; I was hoping to point you towards a video and either debunk or confirm what this person is claiming.</p>
<p>Looking forward to your response.  Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 090713 Techno Links &#124; johnsumser.com</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3958</link>
		<dc:creator>090713 Techno Links &#124; johnsumser.com</dc:creator>
		<pubDate>Mon, 13 Jul 2009 12:54:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3958</guid>
		<description>[...] LinkedIn Search: What it could be and should be Glen Cathey is the Boolean BlackBelt. In this piece, he does the sort of product development that only recruiters can do. Because Glen knows about Lucene, the search engine software behind LinkedIn search (as a recruiter of course), he can make key suggestions about functionality. Buried in the piece is a gem: &#8220;I also caught William Uranga Tweeting from a LinkedIn customer advisory session last week, so I DM’d him and let him know I had a list of search recommendations and he kindly let me send them to him via email so he could share them during the session at LinkedIn. William wrote a post about his customer advisory session experience at LinkedIn - you can read it here.&#8221; There is an amazing community of Recruiters who are helping to bootstrap the next generation of technology. [...]</description>
		<content:encoded><![CDATA[<p>[...] LinkedIn Search: What it could be and should be Glen Cathey is the Boolean BlackBelt. In this piece, he does the sort of product development that only recruiters can do. Because Glen knows about Lucene, the search engine software behind LinkedIn search (as a recruiter of course), he can make key suggestions about functionality. Buried in the piece is a gem: &#8220;I also caught William Uranga Tweeting from a LinkedIn customer advisory session last week, so I DM’d him and let him know I had a list of search recommendations and he kindly let me send them to him via email so he could share them during the session at LinkedIn. William wrote a post about his customer advisory session experience at LinkedIn &#8211; you can read it here.&#8221; There is an amazing community of Recruiters who are helping to bootstrap the next generation of technology. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Cardinal Rule of E-Sourcing &#124; Boolean Black Belt</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3956</link>
		<dc:creator>The Cardinal Rule of E-Sourcing &#124; Boolean Black Belt</dc:creator>
		<pubDate>Mon, 13 Jul 2009 12:02:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3956</guid>
		<description>[...] I was working on the LinkedIn Search: What it COULD and SHOULD be post, I noticed a couple of things in the video of Esteban Kozak searching for Lucene Open Source [...]</description>
		<content:encoded><![CDATA[<p>[...] I was working on the LinkedIn Search: What it COULD and SHOULD be post, I noticed a couple of things in the video of Esteban Kozak searching for Lucene Open Source [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jake Mannix</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3952</link>
		<dc:creator>Jake Mannix</dc:creator>
		<pubDate>Fri, 10 Jul 2009 18:16:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3952</guid>
		<description>Hi Glen,
  I&#039;m glad my comments could help you understand a little of what we&#039;re doing behind the scenes at LinkedIn.  I can&#039;t get into too many details about all of the &quot;Product&quot;-centric decisions on what gets implemented when, but I&#039;ll briefly try to address some of your technical questions:
  If we had 2 completely separate systems, one for doing strictly keyword-based relevance, and not being as up to date (i.e. not being realtime - taking only batch updates on a daily or weekly basis), and one for using the social-graph-incorporated relevance with realtime updates, with a 100-millisecond backend SLA, then it&#039;s certainly possible that we could allow much more resource intensive queries such as arbitrary wildcard and prefix queries on the former system.  But what would the expense be, in terms of development and maintenance, as well as hardware, to maintain both systems?  If you try to run both kinds of queries on the same exact system, you run into the problem of resource allocation: the small minority of users who query with very computationally expensive queries end up locking up searching resources for everyone else.  It&#039;s a delicate balancing act, and we have to weigh the costs to the many of exposing additional functionality to the few.  To give you an idea about the amount of CPU time I&#039;m talking about, wildcard queries can take anywhere from 10-100 times (or more!) longer to execute in Lucene than simple boolean queries (boosting doesn&#039;t really affect performance: we do boosting behind the scenes already, it&#039;s just not exposed in the UI), and the user may not know how slow it&#039;s going to be when executing it, because the latency is highly dependent on the number of terms the wildcard expands to, and in turn how many hits those terms generate.  Because we have a distributed system to serve the search requests (your query goes not to one index, but to roughly 10 at the same time, each with a subset of the userbase), 10-100 times the latency means one user could be hogging the resources of 10 CPU-cores for anywhere from 100ms to 10 seconds.  
Another way to put it: if we allow queries which are 100 times as expensive, if only 5% of our querying userbase takes advantage of this functionality, we would be using up 5x the *total load* on our search servers!  This being a popular public site under non-insignificant load, this is a serious concern.

  This is not to say that one can&#039;t do prefix/wildcard-based searching on a Lucene-backed search system with a large number of documents, in a performant way.  It&#039;s just that doing so while also serving a heavy load of textually-simpler (but also taking into account the multiple language preferences of our users, and as mentioned before, the social-graph component) more popular queries with low latency, in the same system is highly nontrivial, and having multiple systems serving the same data in different ways poses its own resourcing challenges.

  Allowing much more advanced control over query relevance is more a question of &quot;what Products LinkedIn should provide&quot;, and is not really my bailiwick, but also digs into the question of who LinkedIn builds products for: we obviously try to serve our Power Users, who do sourcing for a living, but LinkedIn is not just for them - it&#039;s for everyone who wants to take control of their career as if it were a small business, for hiring managers who aren&#039;t search-engine experts, for people looking to connect with former coworkers and clients.  The average user, while wanting the search system to take their &quot;intent&quot; into account, may not have the time or inclination to spend a lot of time learning query syntax or a new UI to plug in how much they care about each term in their query.  

  On the other hand, users in the past decade have been trained by Google to assume that the search engine will be &quot;smart enough&quot; to know what they mean without them being very specific.  Similarly, at LinkedIn, we do a lot of work with offline data mining to do things like figuring out that when a user searches for &quot;VP IBM&quot;, they&#039;re looking for someone with the *title* VP and the *company* IBM, even without specifying ccompany:IBM AND ctitle:VP (because a minuscule fraction of our userbase uses the query-field based syntax we expose that you are familiar with).  Similarly, since the typical user is looking for people who currently do that thing, instead of in the past, we dynamically turn VP IBM into (ccompany:IBM AND ctitle:VP)^current_boost OR (pcompany:IBM AND ptitle:VP)^past_boost OR (VP AND IBM)^body_boost, where current_boost &gt; past_boost &gt; body_boost are boost parameters we need to figure out based on how well it serves our users (of course, there&#039;s more going on here as well: when someone puts in &quot;Dell&quot; - are they looking for Michael Dell, the CEO, or are they looking for someone *at* Dell - we do some fancy magic to figure out the relative probabilities of both, if the user doesn&#039;t specify by using the company or name fields, and adjust the boosts accordingly: (lname:Dell^last_name_boost)^last_name_probability OR (ccompany:Dell^current_boost OR pcompany:Dell^past_boost)^company_probability and other things like this).

  But since you really want more control over the kinds of searches you can do on the site... well, I can&#039;t say anything now, but just wait a few weeks or so, you&#039;ll start to get a taste of some of the stuff our Search Team has been brewing for quite some time, and I hope you&#039;ll like it. :)</description>
		<content:encoded><![CDATA[<p>Hi Glen,<br />
  I&#8217;m glad my comments could help you understand a little of what we&#8217;re doing behind the scenes at LinkedIn.  I can&#8217;t get into too many details about all of the &#8220;Product&#8221;-centric decisions on what gets implemented when, but I&#8217;ll briefly try to address some of your technical questions:<br />
  If we had 2 completely separate systems, one for doing strictly keyword-based relevance, and not being as up to date (i.e. not being realtime &#8211; taking only batch updates on a daily or weekly basis), and one for using the social-graph-incorporated relevance with realtime updates, with a 100-millisecond backend SLA, then it&#8217;s certainly possible that we could allow much more resource intensive queries such as arbitrary wildcard and prefix queries on the former system.  But what would the expense be, in terms of development and maintenance, as well as hardware, to maintain both systems?  If you try to run both kinds of queries on the same exact system, you run into the problem of resource allocation: the small minority of users who query with very computationally expensive queries end up locking up searching resources for everyone else.  It&#8217;s a delicate balancing act, and we have to weigh the costs to the many of exposing additional functionality to the few.  To give you an idea about the amount of CPU time I&#8217;m talking about, wildcard queries can take anywhere from 10-100 times (or more!) longer to execute in Lucene than simple boolean queries (boosting doesn&#8217;t really affect performance: we do boosting behind the scenes already, it&#8217;s just not exposed in the UI), and the user may not know how slow it&#8217;s going to be when executing it, because the latency is highly dependent on the number of terms the wildcard expands to, and in turn how many hits those terms generate.  Because we have a distributed system to serve the search requests (your query goes not to one index, but to roughly 10 at the same time, each with a subset of the userbase), 10-100 times the latency means one user could be hogging the resources of 10 CPU-cores for anywhere from 100ms to 10 seconds.  
Another way to put it: if we allow queries which are 100 times as expensive, if only 5% of our querying userbase takes advantage of this functionality, we would be using up 5x the *total load* on our search servers!  This being a popular public site under non-insignificant load, this is a serious concern.</p>
<p>  This is not to say that one can&#8217;t do prefix/wildcard-based searching on a Lucene-backed search system with a large number of documents, in a performant way.  It&#8217;s just that doing so while also serving a heavy load of textually-simpler (but also taking into account the multiple language preferences of our users, and as mentioned before, the social-graph component) more popular queries with low latency, in the same system is highly nontrivial, and having multiple systems serving the same data in different ways poses its own resourcing challenges.</p>
<p>  Allowing much more advanced control over query relevance is more a question of &#8220;what Products LinkedIn should provide&#8221;, and is not really my bailiwick, but also digs into the question of who LinkedIn builds products for: we obviously try to serve our Power Users, who do sourcing for a living, but LinkedIn is not just for them &#8211; it&#8217;s for everyone who wants to take control of their career as if it were a small business, for hiring managers who aren&#8217;t search-engine experts, for people looking to connect with former coworkers and clients.  The average user, while wanting the search system to take their &#8220;intent&#8221; into account, may not have the time or inclination to spend a lot of time learning query syntax or a new UI to plug in how much they care about each term in their query.  </p>
<p>  On the other hand, users in the past decade have been trained by Google to assume that the search engine will be &#8220;smart enough&#8221; to know what they mean without them being very specific.  Similarly, at LinkedIn, we do a lot of work with offline data mining to do things like figuring out that when a user searches for &#8220;VP IBM&#8221;, they&#8217;re looking for someone with the *title* VP and the *company* IBM, even without specifying ccompany:IBM AND ctitle:VP (because a minuscule fraction of our userbase uses the query-field based syntax we expose that you are familiar with).  Similarly, since the typical user is looking for people who currently do that thing, instead of in the past, we dynamically turn VP IBM into (ccompany:IBM AND ctitle:VP)^current_boost OR (pcompany:IBM AND ptitle:VP)^past_boost OR (VP AND IBM)^body_boost, where current_boost &gt; past_boost &gt; body_boost are boost parameters we need to figure out based on how well it serves our users (of course, there&#8217;s more going on here as well: when someone puts in &#8220;Dell&#8221; &#8211; are they looking for Michael Dell, the CEO, or are they looking for someone *at* Dell &#8211; we do some fancy magic to figure out the relative probabilities of both, if the user doesn&#8217;t specify by using the company or name fields, and adjust the boosts accordingly: (lname:Dell^last_name_boost)^last_name_probability OR (ccompany:Dell^current_boost OR pcompany:Dell^past_boost)^company_probability and other things like this).</p>
<p>  But since you really want more control over the kinds of searches you can do on the site&#8230; well, I can&#8217;t say anything now, but just wait a few weeks or so, you&#8217;ll start to get a taste of some of the stuff our Search Team has been brewing for quite some time, and I hope you&#8217;ll like it. <img src='http://www.booleanblackbelt.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Boolean Black Belt</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3951</link>
		<dc:creator>Boolean Black Belt</dc:creator>
		<pubDate>Fri, 10 Jul 2009 02:12:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3951</guid>
		<description>Jake,
Thank you VERY much for responding to my post with your detailed comment. You&#039;ve essentially answered one of the questions I posed in my article, which was why LinkedIn does not allow users to use the wildcard, proximity, and term weighting/boosting search functionality that Lucene supports - that was very helpful. I also appreciated finding out that LinkedIn has always used Lucene - I&#039;d done some research online and I could not find anything that definitively linked LinkedIn and Lucene prior to November 2008.

I have had the privilege of working with software and database engineers responsible for an enterprise portal (involving search) and a 40+ TB data warehouse for a Fortune 50 company, so although I am clearly non-technical, I can appreciate (although certainly nowhere near your level of understanding) the challenges associated with searching profiles that are being updated in real time, and I am aware that prefix searching can be a significant drain on resources (although it can be implemented with extremely fast search execution). 

I do have a few questions for you if you have the time - I&#039;d love to tap your significant experience and insight on a couple of things:

#1 When you sort results by keyword, does LinkedIn search performance still suffer from the challenge of taking into consideration the searcher’s personal network in terms of determining relevance? If sorting results by keyword only is NOT encumbered by the searcher&#039;s connections when considering relevance, could it be less resource-intensive to offer more search functionality such as prefix and configurable proximity? If so, perhaps offering a truly separate keyword-only search and results sorting, unaffected by the user&#039;s network/connections, could offer users more advanced search functionality that could otherwise be too complex/resource dependent when sorting by what LinkedIn terms as &quot;relevance&quot; - which incorporates both keyword and connections. Thoughts?

#2 If not for the real-time updates/search aspect, does enabling wildcard/prefix search still pose as serious of a resource drain for searching LinkedIn? There are databases of well over 40M records where new records are being added every minute (but each record is typically not altered after being entered) that seem to effortlessly handle queries with 10+ prefix/wildcard terms. What in your opinion poses the biggest challenge for LinkedIn in offering prefix searching?

I&#039;m glad you find the idea of offering users the ability to boost specific search terms interesting, and I do agree that most people are unaware of this kind of search functionality. However, that fact alone should not determine whether or not such a feature is offered. In fact, being able to offer search term boosting could be a nice way to differentiate LinkedIn search, in that I am not aware of any social network, social media application, online resume database, or Internet search engine that offers this feature. Bragging rights, if you will (e.g., the Ferrari of people search at no additional cost). Plus, boosting terms is not rocket science - I am confident that many users will be able to easily take advantage of the ability to control the relevance of their own results based on what they feel are the most important search terms. 

Ultimately, information management is all about retrieval - data is worthless without the ability to find exactly what you need when you need it. Basic keyword search with Boolean logic (AND/OR/NOT) is an approach that only allows for basic retrieval, which is (IMO) intrinsically limited and imprecise when it comes to retrieving relevant results and is prone to a large percentage of false positive results. I believe that most people only know how to use basic keyword search because they have not been offered anything else. 

When users with premium access to LinkedIn can view 300, 500, 700, or 1000+ results, I&#039;m pretty confident most of them aren&#039;t really viewing past the first 100 - 200, as most people don&#039;t have the time to do so. As such, to truly provide value to your paying customers, it is critical for users to be able to take some control over the relevance of their search results so that the first 100 to 200 are in fact the most relevant to them - in other words, that the first 100 - 200 results are the ones that most closely match the INTENT of the user&#039;s search, not just the keywords. Search terms, in and of themselves, do not determine relevance - the search engine does - and the search engine does not and cannot &quot;know&quot; the intent of the user. I can tell you from personal experience that boosting (controlling specific term relevance weighting) and configurable proximity search can play a HUGE role in users being able to take true control over the relevance of their results.</description>
		<content:encoded><![CDATA[<p>Jake,<br />
Thank you VERY much for responding to my post with your detailed comment. You&#8217;ve essentially answered one of the questions I posed in my article, which was why LinkedIn does not allow users to use the wildcard, proximity, and term weighting/boosting search functionality that Lucene supports &#8211; that was very helpful. I also appreciated finding out that LinkedIn has always used Lucene &#8211; I&#8217;d done some research online and I could not find anything that definitively linked LinkedIn and Lucene prior to November 2008.</p>
<p>I have had the privilege of working with software and database engineers responsible for an enterprise portal (involving search) and a 40+ TB data warehouse for a Fortune 50 company, so although I am clearly non-technical, I can appreciate (although certainly nowhere near your level of understanding) the challenges associated with searching profiles that are being updated in real time, and I am aware that prefix searching can be a significant drain on resources (although it can be implemented with extremely fast search execution). </p>
<p>I do have a few questions for you if you have the time &#8211; I&#8217;d love to tap your significant experience and insight on a couple of things:</p>
<p>#1 When you sort results by keyword, does LinkedIn search performance still suffer from the challenge of taking into consideration the searcher’s personal network in terms of determining relevance? If sorting results by keyword only is NOT encumbered by the searcher&#8217;s connections when considering relevance, could it be less resource-intensive to offer more search functionality such as prefix and configurable proximity? If so, perhaps offering a truly separate keyword-only search and results sorting, unaffected by the user&#8217;s network/connections, could offer users more advanced search functionality that could otherwise be too complex/resource dependent when sorting by what LinkedIn terms as &#8220;relevance&#8221; &#8211; which incorporates both keyword and connections. Thoughts?</p>
<p>#2 If not for the real-time updates/search aspect, does enabling wildcard/prefix search still pose as serious of a resource drain for searching LinkedIn? There are databases of well over 40M records where new records are being added every minute (but each record is typically not altered after being entered) that seem to effortlessly handle queries with 10+ prefix/wildcard terms. What in your opinion poses the biggest challenge for LinkedIn in offering prefix searching?</p>
<p>I&#8217;m glad you find the idea of offering users the ability to boost specific search terms interesting, and I do agree that most people are unaware of this kind of search functionality. However, that fact alone should not determine whether or not such a feature is offered. In fact, being able to offer search term boosting could be a nice way to differentiate LinkedIn search, in that I am not aware of any social network, social media application, online resume database, or Internet search engine that offers this feature. Bragging rights, if you will (e.g., the Ferrari of people search at no additional cost). Plus, boosting terms is not rocket science &#8211; I am confident that many users will be able to easily take advantage of the ability to control the relevance of their own results based on what they feel are the most important search terms. </p>
<p>Ultimately, information management is all about retrieval &#8211; data is worthless without the ability to find exactly what you need when you need it. Basic keyword search with Boolean logic (AND/OR/NOT) is an approach that only allows for basic retrieval, which is (IMO) intrinsically limited and imprecise when it comes to retrieving relevant results and is prone to a large percentage of false positive results. I believe that most people only know how to use basic keyword search because they have not been offered anything else. </p>
<p>When users with premium access to LinkedIn can view 300, 500, 700, or 1000+ results, I&#8217;m pretty confident most of them aren&#8217;t really viewing past the first 100 &#8211; 200, as most people don&#8217;t have the time to do so. As such, to truly provide value to your paying customers, it is critical for users to be able to take some control over the relevance of their search results so that the first 100 to 200 are in fact the most relevant to them &#8211; in other words, that the first 100 &#8211; 200 results are the ones that most closely match the INTENT of the user&#8217;s search, not just the keywords. Search terms, in and of themselves, do not determine relevance &#8211; the search engine does &#8211; and the search engine does not and cannot &#8220;know&#8221; the intent of the user. I can tell you from personal experience that boosting (controlling specific term relevance weighting) and configurable proximity search can play a HUGE role in users being able to take true control over the relevance of their results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Boolean Black Belt</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3950</link>
		<dc:creator>Boolean Black Belt</dc:creator>
		<pubDate>Fri, 10 Jul 2009 00:33:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3950</guid>
		<description>Adam,
Thank you for reading my post and leaving your comment. I am definitely looking forward to the upcoming search enhancements!</description>
		<content:encoded><![CDATA[<p>Adam,<br />
Thank you for reading my post and leaving your comment. I am definitely looking forward to the upcoming search enhancements!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jake Mannix</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3948</link>
		<dc:creator>Jake Mannix</dc:creator>
		<pubDate>Thu, 09 Jul 2009 17:02:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3948</guid>
		<description>Glen,
  While I&#039;m not in a position to speak for the company, I can say, as one of the engineers responsible for search at LinkedIn, that we&#039;ve been using Lucene since the start of the company, not just the past 7 months, and in fact multiple engineers here have contributed code both back to the core Lucene library itself and to additional search libraries which extend and enhance Lucene (in particular: the Zoie realtime search package: http://zoie.googlecode.com was developed at LinkedIn, and released as an open-source extension of Lucene).
  As Glen Gutmacher notes above, full boolean functionality (as well as exact phrase matching) is available in LinkedIn search - try something like: &quot;venture capital&quot; AND (equity OR fund AND NOT hedge) 
  Briefly regarding some of Lucene&#039;s capabilities which we are *not* exposing currently (especially things like wildcard or fuzzy matching) - remember that we are providing search across a result set of more than forty million user profiles as they are being updated in real time, and the text alone is not the only component to the relevance: the searcher&#039;s personal view on the social graph plays a strong role, and every user has a different set of connections (and 2nd degree connections, etc) which informs this relevance component.  
  In short: there&#039;s a lot going on when you do just a simple search, and keeping the performance within desired latency specifications puts strong constraints on what kinds of queries we can perform as we continue to scale the site (imagine the amount of processing our servers would have to do to retrieve the results of the simple query: &quot;manag* team&quot;~10 ).  Performance is on the forefront of our minds when doing due diligence on whether to implement any given feature.
  Allowing users to provide their own boosting parameters is an interesting thought, but as this is not something typical search engine users are accustomed to (look at how few, statistically speaking, people even use boolean queries, either on LinkedIn, or with other search pages) I would be surprised if this would be a heavily used feature.</description>
		<content:encoded><![CDATA[<p>Glen,<br />
  While I&#8217;m not in a position to speak for the company, I can say, as one of the engineers responsible for search at LinkedIn, that we&#8217;ve been using Lucene since the start of the company, not just the past 7 months, and in fact multiple engineers here have contributed code both back to the core Lucene library itself and to additional search libraries which extend and enhance Lucene (in particular: the Zoie realtime search package: <a href="http://zoie.googlecode.com" rel="nofollow">http://zoie.googlecode.com</a> was developed at LinkedIn, and released as an open-source extension of Lucene).<br />
  As Glenn Gutmacher notes above, full boolean functionality (as well as exact phrase matching) is available in LinkedIn search &#8211; try something like: &#8220;venture capital&#8221; AND (equity OR fund AND NOT hedge)<br />
  Briefly regarding some of Lucene&#8217;s capabilities which we are *not* exposing currently (especially things like wildcard or fuzzy matching) &#8211; remember that we are providing search across a result set of more than forty million user profiles as they are being updated in real time, and the text alone is not the only component of the relevance: the searcher&#8217;s personal view on the social graph plays a strong role, and every user has a different set of connections (and 2nd degree connections, etc) which informs this relevance component.<br />
  In short: there&#8217;s a lot going on when you do just a simple search, and keeping the performance within desired latency specifications puts strong constraints on what kinds of queries we can perform as we continue to scale the site (imagine the amount of processing our servers would have to do to retrieve the results of the simple query: &#8220;manag* team&#8221;~10 ).  Performance is at the forefront of our minds when doing due diligence on whether to implement any given feature.<br />
  Allowing users to provide their own boosting parameters is an interesting thought, but as this is not something typical search engine users are accustomed to (look at how few people, statistically speaking, even use boolean queries, either on LinkedIn or with other search pages) I would be surprised if this would be a heavily used feature.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henry</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3947</link>
		<dc:creator>Henry</dc:creator>
		<pubDate>Thu, 09 Jul 2009 14:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3947</guid>
		<description>Here are more search tips right from LinkedIn:
http://www.linkedin.com/static?key=pop/pop_more_search</description>
		<content:encoded><![CDATA[<p>Here are more search tips right from LinkedIn:<br />
<a href="http://www.linkedin.com/static?key=pop/pop_more_search" rel="nofollow">http://www.linkedin.com/static?key=pop/pop_more_search</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Glenn Gutmacher</title>
		<link>http://www.booleanblackbelt.com/2009/07/linkedin-search-what-it-could-and-should-be/comment-page-1/#comment-3939</link>
		<dc:creator>Glenn Gutmacher</dc:creator>
		<pubDate>Wed, 08 Jul 2009 15:24:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.booleanblackbelt.com/?p=3143#comment-3939</guid>
		<description>@Glen - nice, and we’ll push for it with our LinkedIn contacts, too (Shally Steckerl was the first external vendor officially authorized by LinkedIn to do advanced LI training).

@Phil - that is simply not true. I just ran a simple search of the type you indicated this morning — chief AND (oncologist OR neurologist) — under LI Advanced People Search just to make sure I’d have distinct results and it definitely pulls up both expected result sets.</description>
		<content:encoded><![CDATA[<p>@Glen &#8211; nice, and we’ll push for it with our LinkedIn contacts, too (Shally Steckerl was the first external vendor officially authorized by LinkedIn to do advanced LI training).</p>
<p>@Phil &#8211; that is simply not true. I just ran a simple search of the type you indicated this morning — chief AND (oncologist OR neurologist) — under LI Advanced People Search just to make sure I’d have distinct results and it definitely pulls up both expected result sets.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
