Search Engine Troubleshooting

 

The default configuration of the XperienCentral search engine provides a basic search implementation for generic websites. However, every website is different: content, content types, structure, visitors, tasks and other factors can completely change how the website visitor perceives the search engine. This topic explains to content owners, webmasters and developers how to measure, optimize and improve the quality of the search engine.

 



Measuring

Search Behavior

Before radically changing the search engine, you must know as much as possible about the visitors of your website and their search behavior. There are two main questions that have to be answered:

  1. What are people searching for?
  2. What do they expect?

Here are some steps to gain more insight into the search behavior of your visitors:

  1. Use a web analytics tool to generate reports of search queries. XperienCentral usually generates search URLs in which the search terms (keywords) are included. Most web analytics tools provide a way to search for specific URLs. Search for URLs containing the string &keyword= and generate reports for the last month and for the last year.
  2. Listen to your visitors. If you have the chance, then talk to your website visitors. If they complain about the search engine then call them or email them and ask the two main questions: “What were you looking for?” and “What did you expect?”. Also ask which queries they used.
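
If your analytics tool can export the raw search URLs, a short script can produce the query top 10. The following is a minimal sketch, assuming the search terms are in a URL parameter named keyword (as in the &keyword= string above). The sample URLs and the function name top_queries are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def top_queries(urls, param="keyword", limit=10):
    """Count the search terms found in the given query-string parameter."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # parse_qs decodes "+" and "%20" to spaces.
        for value in parse_qs(query).get(param, []):
            term = value.strip().lower()
            if term:
                counts[term] += 1
    return counts.most_common(limit)


# Hypothetical URLs, as they might appear in an analytics export.
sample_urls = [
    "https://www.example.com/web/search.htm?page=1&keyword=opening+hours",
    "https://www.example.com/web/search.htm?keyword=Opening%20hours",
    "https://www.example.com/web/search.htm?keyword=passport",
    "https://www.example.com/web/contact.htm",
]

for term, count in top_queries(sample_urls):
    print(f"{count:5d}  {term}")
```

Terms are lower-cased before counting so that "Opening hours" and "opening hours" end up in the same row of the report.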

Index Quality and Index Characteristics

Keeping your search index up to date is really important. This means not only indexing new documents as quickly as possible (at least within 24 hours), but also removing deleted documents from the index to avoid dead links. Check periodically whether the index is up to date, and consider fully re-indexing the site.
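
One way to run such a periodic check is to compare a list of indexed URLs with a list of currently published URLs. The sketch below assumes you can obtain both lists yourself, for example from an index inspection tool and from a sitemap. The function name compare_index and the sample paths are hypothetical.

```python
def compare_index(indexed_urls, published_urls):
    """Report URLs that should be removed from or added to the index."""
    indexed, published = set(indexed_urls), set(published_urls)
    return {
        # Indexed but no longer published: these produce dead links.
        "stale": sorted(indexed - published),
        # Published but not yet indexed: these cannot be found.
        "missing": sorted(published - indexed),
    }


report = compare_index(
    indexed_urls=["/home", "/news/2019-archive", "/contact"],
    published_urls=["/home", "/contact", "/news/new-service"],
)
print(report)
```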

Knowing what is actually indexed is also important, especially for the analysis and approach for tuning the search engine. Relevant questions are:

  1. Which fields are actually indexed and what information is stored in the fields? This is by far the most important question. One way to describe it is "garbage in = garbage out". In other words, if the structure, fields and/or contents of the indexed documents are incorrect, then the search index will never be of high quality. Use a good tool to analyze the index.
  2. What is the ratio between HTML documents, Word/PDF documents and other content? The reason to check this is that large amounts of Word documents can drastically lower the relevance of normal HTML pages.
  3. What is the average size and standard deviation in size of all these documents? Large documents could mean lower relevance.
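
Questions 2 and 3 can be answered with a few lines of code once you have a list of indexed documents with their type and size. This is a minimal sketch that assumes such a list is available as (content type, size in bytes) pairs. The function name index_profile and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def index_profile(documents):
    """Summarize (content_type, size_in_bytes) pairs taken from the index."""
    by_type = Counter(content_type for content_type, _ in documents)
    total = len(documents)
    sizes = [size for _, size in documents]
    return {
        # Fraction of all indexed documents per content type.
        "share_by_type": {t: n / total for t, n in by_type.items()},
        "mean_size": mean(sizes),
        # Population standard deviation, since the list is the whole index.
        "stdev_size": pstdev(sizes),
    }


# Hypothetical sample; a real list would come from your index analysis tool.
documents = [("html", 2000), ("html", 4000), ("pdf", 6000), ("word", 8000)]
profile = index_profile(documents)
print(profile)
```

A high share of Word/PDF documents or a large standard deviation in size is a signal to look more closely at which documents really belong in the index.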

 



Analysis

Once you have information about the queries that people use, it is time to sit down with some people who have in-depth knowledge of the content on the website. It’s important to realize that analyzing search queries and search behavior is not a technical process: it’s all about linking website content to the website visitors. Therefore the best way to analyze the results of your research is to sit down with several content owners, editors, domain experts or people in any other role within the organization who can assist in this process.

One way to do this is to organize a session where someone presents the top search queries and summarizes the feedback from search engine users. In this session an attempt could be made to link the top queries to the most relevant pages. Make sure that everyone is aware that it’s not only about what visitors want to find, but also about what the organization wants them to find. This is important not only for companies that want to sell products or services, but for any organization. It’s all about conversion, and conversion is not only about leading visitors to order the most expensive product, but also about answering questions from citizens (for governments), leading visitors to the self-service page, and so on. Write down the links between top queries and pages, plus other suggestions from the content experts.

In the same session try to get answers to fundamental questions such as:

  • Do visitors always want the most recent documents first, or is the relevance more important than the document date? Is this for all documents, or only some documents?
  • Do visitors expect a) answers, b) links or c) direct information? This is different from asking “do we want to provide our visitors with answers/links/direct information?”. Choosing a), for example, can have a lot of implications for your information structure and search engine, but if that is what visitors expect then this should be the goal.
  • Do we really need to include the 10,000 documents from our document management system in the search index? Does it bring anything extra, or does it just lower the relevance of our normal web pages?

 



Improvements

After this session it’s time for some homework again: find out why a certain top 10 query leads to page X and not to page Y, which is the best page according to the content experts. The Setup tool can be used to get more information about the relevance score for the queries.

There are two main reasons why a page is less relevant than another page:

  1. the number of words on a page combined with the index factor of the fields is lower
  2. the size of the page is larger, i.e. smaller pages tend to be more relevant than long documents

The first is by far the most important factor. Luckily there are several ways to get higher scores. First of all, make sure the page is indexed properly: check that the fields content, langid, keyword(s), summary, title and webid are all filled correctly. Use a tool to inspect the index (see chapter 5.1).

The index factor settings can lead to wrong results. The index factors for fields are specified in the properties.txt file. When not specified the default settings are:

factor.title=10
factor.description=5
factor.keyword=10
factor.location=1

These factors tell the search engine how important a field is. By default the title is 10 times more important than the location. If the ‘keyword’ field is filled with non-relevant values, or not filled at all, then it might be smart to clean up all keyword settings or set the keyword factor to a lower value (5 or 1). Tip: Don’t use the default meta keywords settings. Leave them empty.
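
To get a feeling for the effect of the factors, the following toy model weighs each occurrence of a search term by the factor of the field it occurs in. This is only an illustration under simplifying assumptions, not the actual scoring formula of the XperienCentral search engine. The function name toy_score and the sample pages are hypothetical, and fields without a factor are assumed to count once.

```python
# Default factors as listed above.
FACTORS = {"title": 10, "description": 5, "keyword": 10, "location": 1}


def toy_score(query, fields, factors=FACTORS):
    """Sum the factor-weighted occurrences of a single-word query per field."""
    term = query.lower()
    score = 0
    for name, text in fields.items():
        occurrences = text.lower().split().count(term)
        # Assumption: a field without a configured factor has weight 1.
        score += factors.get(name, 1) * occurrences
    return score


# Page A is not about insurance, but its keyword field is stuffed with the term.
page_a = {"title": "Contact", "keyword": "insurance insurance insurance"}
# Page B is the page the content experts want visitors to find.
page_b = {"title": "Car insurance", "description": "Compare car insurance"}

print(toy_score("insurance", page_a), toy_score("insurance", page_b))

# Lowering the keyword factor to 1 puts page B on top in this toy model.
lowered = dict(FACTORS, keyword=1)
print(toy_score("insurance", page_a, lowered), toy_score("insurance", page_b, lowered))
```

In this model the stuffed keyword field lets page A outrank page B with the default factors, which is the kind of effect that cleaning up the keyword field or lowering factor.keyword is meant to remove.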

The same goes for the title field. In most (default) presentations the title field is prefixed with the same value on every page, for example the company name. If it’s important that a search for the company name leads to a specific page, then this is almost impossible to achieve unless you remove the prefix from the title or lower the factor.title setting.

These checks help to improve the relevance of certain pages by analyzing what’s actually indexed and then removing unnecessary information or changing the index factors. After changing index factors or changing the presentation, you must fully re-index the website to get accurate results, because documents are not indexed in isolation but in relation to the other documents.

Improvement Suggestions

Besides carefully analyzing what is indexed and tuning all the fields and indexing factors, there are many other improvements that can be implemented. The best approach is to try to improve the search results first by analyzing the fields as mentioned in the previous paragraph. But luckily there are several other improvements that are known to work well to improve the search experience. Here is a top 5:

  1. Divide your website into several categories or other logical parts. For example, if you have a forum or document database on your website, then index these sources with additional metadata that can be used to create additional filters. This is explained in the ‘How-to’ chapter.
  2. Offer advanced search and filtering options, such as searching in site categories, document types, date ranges and sorting by date instead of relevance. Optionally create a simple search interface and an advanced search interface.
  3. Provide search tips to website visitors. Show example queries, preferably from the query top 10 of course. Explain how to use the advanced search options, if they are implemented.
  4. Remove pages or documents from the index. Removing documents lowers the total number of documents, and therefore the relevance of the remaining documents automatically increases. Usually there are groups or types of documents that may be nice to index, but are not relevant at all for the average user. This can be a bit tricky, so be careful not to remove too many documents.
  5. Implement a "best bets" search.
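
A "best bets" search shows pages chosen by editors above the regular results for specific queries, which fits well with the list of links between top queries and pages written down during the analysis session. The following is a minimal sketch of the idea and not an XperienCentral API. The mapping, the function name with_best_bets and the sample paths are hypothetical.

```python
# Editor-maintained mapping from a normalized query to the preferred pages.
BEST_BETS = {
    "opening hours": ["/contact/opening-hours"],
    "passport": ["/services/passport/apply"],
}


def with_best_bets(query, results, best_bets=BEST_BETS):
    """Put editor-chosen pages first, then the regular results without duplicates."""
    pinned = best_bets.get(query.strip().lower(), [])
    return pinned + [url for url in results if url not in pinned]


regular_results = ["/news/passport-fees", "/services/passport/apply"]
print(with_best_bets(" Passport ", regular_results))
```

Queries without a best bet simply return the regular results unchanged, so the mapping only has to cover the top queries.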

 

 

 
