Search Relevance and Search Quality Take Center Stage

Poor search quality results in a frustrating user experience. If you cannot show your customers targeted, personalized results that solve their problem, you will quickly lose them to the plethora of other options at their fingertips. In today’s connected world, users have more choices than ever, and if you cannot quickly and accurately give them what they need, your competitors will. Those disappointed users will probably never return to your service. Bad search relevance essentially translates to bad service.

Good search results, on the other hand, keep users on the product and get them hooked. They fall in love with what they see and discover, which translates into far better user experience, engagement, and loyalty. Improving relevance is a long-term investment: once you are able to serve better listings and recommendations to your customers, you will see an uptick in key business metrics and conversions. Surfacing the most appropriate, valuable, high-quality content at the right time is what drives results.

Introduction

Search relevance is the degree to which your application accurately and intuitively answers the questions your users ask. A relevant search experience learns over time and personalizes results based on deep understanding of the domain and of each user’s likes and preferences. Your search algorithms need to sort document results so that the content most relevant to a query is shown first. The results need to answer users’ questions and solve their problems.

Your goal should be to provide your users with the most comprehensive and relevant search results possible. If relevant data is available, you want to ensure it can be found quickly and easily. Essentially, you have to become very good at telling good matches from bad ones. You also want to quickly and accurately identify and eliminate incorrect data, spam, predatory offers, duplicate listings, and misleading information from your search results. With better data shown to users, you will see an increase in time spent on your product, repeat visits, and the number of searches performed.

You need to evaluate search against well-defined metrics and outcomes so you can scientifically and statistically measure the performance of your search optimization efforts. Every experiment that your product, engineering, data science, and machine learning teams undertake with changes in Elasticsearch, Solr, or Lucene should be tracked, monitored, measured, and analyzed closely against well-defined key performance indicators (KPIs). The search quality team needs to determine which changes translated into happier and more profitable customers, and then invest more in those initiatives. You need to improve your machine learning algorithms based on these measurements. If you are not measuring results and analyzing them, you are driving in the dark on gut feeling, which in most cases turns out to be wrong.
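
One widely used KPI for ranked results is Normalized Discounted Cumulative Gain (NDCG), which rewards placing highly relevant documents near the top of the list. Here is a minimal sketch in Python, assuming graded relevance labels (0–3) have already been collected for the top results of a query; the two variants and their grades are made up for illustration:

    import math

    def dcg(relevances):
        # Discounted Cumulative Gain: each grade is discounted by its rank.
        # rank is 0-based here, so the usual log2(position + 1) becomes log2(rank + 2).
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

    def ndcg(relevances, k=10):
        # Normalize by the DCG of the ideal (best possible) ordering of the same grades.
        ideal = dcg(sorted(relevances, reverse=True)[:k])
        return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

    # Hypothetical judged grades (3 = perfect, 0 = irrelevant) for the top five
    # results returned by two algorithm variants on the same query.
    variant_a = [3, 2, 3, 0, 1]
    variant_b = [1, 3, 2, 3, 0]
    print(f"A: {ndcg(variant_a):.3f}  B: {ndcg(variant_b):.3f}")

Averaging NDCG over a representative set of queries yields a single number you can track for every experiment.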

Process and Techniques

Improving search involves continuously iterating through a wide range of techniques and running A/B tests.

  • Build a feedback loop into your product and business strategy to get direct user feedback. Use search analytics to get a better sense of what users search for and what they do with the results.
  • Integrate domain and subject-matter experts from all departments into your design process to get their feedback on the relevance of results for real search queries.
  • Conduct usability studies to learn how users engage with different results and what appeals to them.
  • Use personas to break down your users into different buckets and see how you can serve their needs.
  • Define use cases and get a better understanding of how your product is helping users get things done and find the information they are looking for.
  • Your search algorithms need to get better over time at computing a relevance score for a given query and assigning it to every document in your database; a scoring sketch follows this list. Relevance scores take into consideration both content match and external factors such as popularity in terms of user clicks. Google’s PageRank, for instance, initially placed heavy weight on the number and authority of other sites linking to a document, and on the anchor text that indicated how others understood that document.
  • Normalize text from search queries and results using stemming techniques to allow fuzzy matching (see the normalization sketch after this list).
  • Weigh various fields differently based on importance and context, for example boosting matches in a title over matches in the body.
  • Analyze the position of words in a search query and in matching documents; phrase and proximity matches are strong relevance signals. A query sketch covering both field weights and word position follows this list.
  • Identify relevance between query and document text through deeper analysis of content, beyond simple term matching. Search engines expand the user’s terms to include acronyms, spelling variants, synonyms, and other strongly related terms (see the expansion sketch after this list).
  • Use Natural Language Processing to understand the grammatical structure of text in the query and in search results, allowing better matching, and model relationships between words to detect associations.
  • Analyze which results users viewed and how long they stayed on each page to gauge whether what was shown was useful and relevant; a dwell-time sketch follows this list. The trustworthiness and popularity of each document and content source are also important factors.
  • Evaluate how you can leverage semantic search, personalization, recommendation engines, voice search and visual search.
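
To make the scoring item above concrete, here is a minimal sketch that blends a content-match score (a crude TF-IDF-style term overlap, standing in for whatever Elasticsearch, Solr, or Lucene computes) with a popularity signal based on clicks. The corpus, click counts, and the 0.8/0.2 weights are illustrative assumptions, not tuned values:

    import math
    from collections import Counter

    # Tiny in-memory corpus; documents and click counts are made up.
    documents = {
        "doc1": {"text": "running shoes for trail running", "clicks": 420},
        "doc2": {"text": "dress shoes leather formal", "clicks": 1300},
        "doc3": {"text": "trail running backpack lightweight", "clicks": 95},
    }

    def text_match_score(query, text):
        # Crude TF-IDF-style overlap between query terms and document terms.
        doc_terms = Counter(text.split())
        n_docs = len(documents)
        score = 0.0
        for term in query.split():
            df = sum(1 for d in documents.values() if term in d["text"].split())
            if df:
                score += doc_terms[term] * math.log((n_docs + 1) / df)
        return score

    def relevance(query, doc, w_text=0.8, w_pop=0.2):
        # Blend content match with log-scaled popularity (user clicks).
        return w_text * text_match_score(query, doc["text"]) + w_pop * math.log1p(doc["clicks"])

    ranked = sorted(documents, key=lambda d: relevance("trail running shoes", documents[d]), reverse=True)
    print(ranked)  # doc1 first: best text match plus a healthy click count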
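
The normalization item can be sketched with a tiny suffix-stripping stemmer. A production system would use a real analyzer chain or a full stemmer such as NLTK’s PorterStemmer; the suffix rules below are deliberately simplified:

    def normalize(text):
        # Lowercase, strip punctuation, and crudely stem common English suffixes.
        suffixes = ("ing", "ed", "s")  # simplified; real stemmers use full rule sets
        tokens = "".join(c if c.isalnum() else " " for c in text.lower()).split()
        stemmed = []
        for token in tokens:
            for suffix in suffixes:
                if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                    token = token[: -len(suffix)]
                    if len(token) >= 2 and token[-1] == token[-2]:
                        token = token[:-1]  # collapse doubled consonant: "runn" -> "run"
                    break
            stemmed.append(token)
        return stemmed

    # "Running" and "runs" both normalize to "run", so they can fuzzy-match.
    print(normalize("Running shoes, she runs!"))  # ['run', 'shoe', 'she', 'run']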
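
Field weighting and word position are often expressed directly in the engine’s query language. The query sketch referenced above might look like this as an Elasticsearch request body; the index and field names are hypothetical:

    import json

    # Request body for Elasticsearch's _search endpoint (field names are hypothetical).
    query = {
        "query": {
            "bool": {
                # Require the terms somewhere, weighting title hits 3x over body hits.
                "must": {
                    "multi_match": {
                        "query": "trail running shoes",
                        "fields": ["title^3", "body"],
                    }
                },
                # Reward documents where the terms also appear close together and
                # in order in the title, a word-position signal.
                "should": {
                    "match_phrase": {
                        "title": {"query": "trail running shoes", "slop": 1}
                    }
                },
            }
        }
    }

    print(json.dumps(query, indent=2))
    # With the official Python client this would be sent along the lines of:
    #   es.search(index="products", body=query)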
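
Query expansion can start as simply as a curated synonym map consulted at query time. The map below is a hand-built assumption; larger systems mine these relationships from query logs or word embeddings:

    # Hand-built synonym/acronym map; real systems derive these from logs or embeddings.
    EXPANSIONS = {
        "tv": ["television"],
        "television": ["tv"],
        "couch": ["sofa"],
        "sofa": ["couch"],
        "nyc": ["new york"],
    }

    def expand_query(query):
        # Return the original terms plus any strongly related terms.
        terms = query.lower().split()
        expanded = list(terms)
        for term in terms:
            expanded.extend(EXPANSIONS.get(term, []))
        return expanded

    print(expand_query("cheap tv stand"))  # ['cheap', 'tv', 'stand', 'television']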
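
Click and dwell-time logs can be turned into a per-document usefulness signal, as in the dwell-time sketch below. It assumes each log record carries the clicked document and the seconds spent on its page; the ten-second threshold separating a bounce from an engaged read is an illustrative assumption:

    from collections import defaultdict

    # Hypothetical click log: (doc_id, seconds the user stayed on the page).
    click_log = [
        ("doc1", 45), ("doc1", 3), ("doc2", 120),
        ("doc1", 60), ("doc2", 2), ("doc3", 1),
    ]

    BOUNCE_THRESHOLD = 10  # illustrative: under 10s we assume the result wasn't useful

    def engagement_rates(log):
        # Fraction of clicks on each document that turned into an engaged read.
        clicks, engaged = defaultdict(int), defaultdict(int)
        for doc_id, dwell_seconds in log:
            clicks[doc_id] += 1
            if dwell_seconds >= BOUNCE_THRESHOLD:
                engaged[doc_id] += 1
        return {doc: engaged[doc] / clicks[doc] for doc in clicks}

    print(engagement_rates(click_log))  # doc1 ~0.67, doc2 0.5, doc3 0.0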

Humans-in-the-Loop and Training Data Sets

The way users type a search query is often fuzzy and not clearly aligned with their intent. This is why you need humans in the loop alongside machines: build high-quality training data sets by asking human judges to evaluate search results. This will improve the precision of your models over time, and relevance will automatically adapt to changes in user behavior.

You should build a process for asking human judges to score the quality of search results against a query. These scores will help you compare the performance of different search algorithms, understand how humans think about searching on your product, and identify specific areas for improvement.

The judgment process is subjective, and different people will make different choices. To address this, ask multiple people to assess the same results and average their scores. It is also critical to partner with a vendor whose human judges do not game the system by answering quickly and carelessly. There are several techniques for identifying bad judges. One common technique is to randomly inject questions for which you already know the correct relevance score, and then check that each judge answers most of them correctly. Another is to review the input from judges who consistently answer differently from other judges on the same tasks. Having this process in place will help you avoid poor-quality, misleading training data.
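
Both checks are straightforward to automate once judgments are collected. Here is a minimal sketch, assuming each judgment record carries a judge id, a task id, and a 0–3 relevance grade, and that some tasks are gold questions with a known correct answer; all ids and grades below are made up:

    from collections import defaultdict
    from statistics import mean

    # Hypothetical judgments: (judge_id, task_id, relevance grade 0-3).
    judgments = [
        ("judge_a", "t1", 3), ("judge_b", "t1", 2), ("judge_c", "t1", 3),
        ("judge_a", "t2", 0), ("judge_b", "t2", 1), ("judge_c", "t2", 3),
        ("judge_a", "gold1", 2), ("judge_b", "gold1", 2), ("judge_c", "gold1", 0),
    ]
    GOLD_ANSWERS = {"gold1": 2}  # injected questions with a known correct grade

    # Check 1: accuracy on the injected gold questions.
    gold_hits = defaultdict(list)
    for judge, task, grade in judgments:
        if task in GOLD_ANSWERS:
            gold_hits[judge].append(1 if grade == GOLD_ANSWERS[task] else 0)
    gold_accuracy = {judge: mean(hits) for judge, hits in gold_hits.items()}

    # Check 2: average disagreement with the other judges on the same tasks.
    by_task = defaultdict(list)
    for judge, task, grade in judgments:
        by_task[task].append((judge, grade))
    deviation = defaultdict(list)
    for entries in by_task.values():
        for judge, grade in entries:
            others = [g for j, g in entries if j != judge]
            if others:
                deviation[judge].append(abs(grade - mean(others)))
    avg_deviation = {judge: round(mean(d), 2) for judge, d in deviation.items()}

    print(gold_accuracy)   # judge_c misses the gold question
    print(avg_deviation)   # judge_c also deviates most from the consensus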

Conclusion

Today, data is growing at an unprecedented speed. With advances in artificial intelligence (AI), machine learning, natural language processing, and computing power, you have the tools and infrastructure to dramatically improve the quality of your search. Organizations should create a centralized Search Quality and Relevance team that is tightly integrated with product teams. It’s time for search to take center stage, to differentiate your product and build customer loyalty.