
 

The default configuration of the XperienCentral search engine provides a basic search implementation for generic websites. However, every website is different: differences in content, content types, structure, visitors, tasks and more can completely change how visitors perceive the search engine. This topic explains to content owners, webmasters and developers how to measure, optimize and improve the quality of the search engine.

 


 


Measuring

Search Behavior

Before making radical changes to the search engine, you need to know as much as possible about the visitors of your website and their search behavior. Two main questions have to be answered:

  1. What are people searching for?
  2. What do they expect?

Here are some steps to gain more insight into the search behavior of your visitors:

  1. Use a web analytics tool to generate reports of search queries. XperienCentral usually generates search queries where the search terms (keywords) are included in the URL. Most web analytics tools provide a way to search for specific URLs. Search for URLs with the string ‘&keyword=’ and generate some reports: for the last month and for the last year.
    Example: in Google Analytics this can be done by navigating to ‘Content > Top Content’ and then entering ‘&keyword=’ in the ‘Search for URLs containing’ box.
  2. Listen to your visitors. If you have the chance, then talk to your website visitors. If they complain about the search engine then call them or email them and ask the two main questions: “What were you looking for?” and “What did you expect?”. Also ask which queries they used.
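
If your analytics tool can export the raw URLs, the search terms can also be counted directly. The following is a minimal sketch, assuming the search term is passed in a query parameter named 'keyword' as described above. The function name `top_queries` and the sample URLs are hypothetical and only serve as illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def top_queries(urls, param="keyword", n=10):
    """Count the search terms found in the given query parameter of each URL."""
    counts = Counter()
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        for term in params.get(param, []):
            # Normalize case and whitespace so variants are counted together
            term = term.strip().lower()
            if term:
                counts[term] += 1
    return counts.most_common(n)


# Hypothetical URLs, as they might be exported from a web analytics tool
sample_urls = [
    "https://www.example.com/search?id=12&keyword=opening+hours",
    "https://www.example.com/search?id=12&keyword=Opening+Hours",
    "https://www.example.com/search?id=12&keyword=contact",
    "https://www.example.com/about",
]

result = top_queries(sample_urls)
print(result)
```

Running this once on the URLs of the last month and once on those of the last year gives the two reports mentioned in step 1.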

Index Quality and Index Characteristics

Keeping your search index up to date is really important. This means not only indexing new documents as quickly as possible (at least within 24 hours), but also removing deleted documents to avoid dead links. Check periodically whether the index is up to date, and fully re-index the site if necessary.

Knowing what is actually indexed is also important, especially for the analysis and approach for tuning the search engine. Relevant questions are:

  1. Which fields are actually indexed and what information is stored in the fields? This is by far the most important question. One way to describe it is "garbage in = garbage out". In other words, if the structure, fields and/or contents of the indexed documents are incorrect, then the search index will never be of high quality. Use a good tool to analyze the index.
  2. What is the ratio between HTML documents, Word/PDF documents and other content? The reason to check this is that large amounts of Word documents can drastically lower the relevance of normal HTML pages.
  3. What is the average size and standard deviation in size of all these documents? Large documents could mean lower relevance.
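
Questions 2 and 3 can be answered with a few lines of code once you have a list of indexed documents with their type and size. The sketch below assumes such a list is available as (type, size in bytes) pairs, for example exported from an index inspection tool. The function name `index_profile` and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def index_profile(docs):
    """Summarize (doc_type, size_in_bytes) pairs: share per type, mean and standard deviation of size."""
    types = Counter(doc_type for doc_type, _ in docs)
    sizes = [size for _, size in docs]
    total = len(docs)
    return {
        "ratio": {doc_type: count / total for doc_type, count in types.items()},
        "mean_size": mean(sizes),
        # Population standard deviation, because the list covers the whole index
        "stdev_size": pstdev(sizes),
    }


# Hypothetical sample: two HTML pages, one PDF and one Word document
docs = [("html", 2000), ("html", 4000), ("pdf", 6000), ("word", 8000)]
profile = index_profile(docs)
print(profile)
```

A high share of Word/PDF documents or a large standard deviation in size is a signal to look more closely at how those documents affect the relevance of normal pages.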

 


 


Analysis

Once you have information about the queries that people use, it is time to sit down with some people who have in-depth knowledge of the content on the website. It’s important to realize that analyzing search queries and search behavior is not a technical process: it’s all about linking website content to the website visitors. Therefore the best way to analyze the results of your research is to sit down with several content owners, editors, domain experts or people in any other role within the organization who can assist in this process.

One way to do this is to organize a session where someone presents the top search queries and summarizes the feedback from search engine users. In this session an attempt could be made to link the top queries to the most relevant pages. Make sure that everyone is aware that it’s not only about what visitors want to find, but also what the organization wants them to find! This is not only important for companies who want to sell products or services, but for any organization. It’s all about conversion, and conversion is not only about leading visitors to ordering the most expensive product, but also about answering questions for citizens (for governments), finding the self-service page etc. Write down the links between top queries and pages, plus other suggestions from the content experts.

In the same session try to get answers to fundamental questions such as:

  • Do visitors always want the most recent documents first, or is the relevance more important than the document date? Is this for all documents, or only some documents?
  • Do visitors expect a) answers, b) links or c) direct information? This is different from asking “do we want to provide our visitors with answers/links/direct information?”. Choosing a), for example, can have a lot of implications for your information structure and search engine, but if that is what visitors expect then this should be the goal.
  • Do we really need to include the 10,000 documents from our document management system in the search index? Does it bring anything extra, or does it just lower the relevance of our normal web pages?

 


 


Improvements

After this session it’s time for some homework again, because then it’s important to find out why a certain top 10 query leads to page X and not to page Y, which is the best page according to the content experts. The Setup tool can be used to get more information about the relevance score for the queries.

There are two main reasons why a page is less relevant than another page:

  1. the number of words on a page combined with the index factor of the fields is lower
  2. the size of the page is larger, i.e. smaller pages tend to be more relevant than long documents

The first is by far the most important factor. Luckily there are several ways to get higher scores. First of all, make sure the page is indexed properly: check that the fields content, langid, keyword(s), summary, title and webid are all filled correctly. Use a tool to inspect the index (see chapter 5.1).

The index factor settings can lead to wrong results. The index factors for fields are specified in the properties.txt file. When not specified, the default settings are:

factor.title=10
factor.description=5
factor.keyword=10
factor.location=1

These factors tell the search engine how important a field is. By default the title is 10 times more important than the location. If the ‘keyword’ field is filled with non-relevant terms, or not filled at all, then it might be smart to clean up all keyword settings or set the keyword factor to a lower value (5 or 1). Tip: Don’t use ‘default meta keywords settings’ in Config > Web initiative configuration > [General]. Leave it empty.
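
To get a feeling for the effect of an index factor, consider the toy model below. It is not the scoring formula used by the XperienCentral search engine; it simply assumes that every occurrence of the search term in a field adds that field's factor to the score. The function `toy_score` and the two example pages are hypothetical.

```python
# Default index factors as listed above
DEFAULT_FACTORS = {"title": 10, "description": 5, "keyword": 10, "location": 1}


def toy_score(doc, term, factors=DEFAULT_FACTORS):
    """Toy model: sum of (occurrences of term in field) * (field factor). Illustration only."""
    term = term.lower()
    return sum(
        doc.get(field, "").lower().split().count(term) * factor
        for field, factor in factors.items()
    )


# page_x is the page about passports; page_y only mentions the term in a cluttered keyword field
page_x = {"title": "Acme passport renewal", "keyword": "acme passport"}
page_y = {"title": "Acme home", "keyword": "acme passport visa travel"}

# With the default factors the keyword field counts as heavily as the title
default_scores = (toy_score(page_x, "passport"), toy_score(page_y, "passport"))

# With factor.keyword lowered to 1, the title dominates the score
lowered = {**DEFAULT_FACTORS, "keyword": 1}
lowered_scores = (toy_score(page_x, "passport", lowered), toy_score(page_y, "passport", lowered))

print(default_scores, lowered_scores)
```

In this simplified model, lowering the keyword factor widens the gap between the page that has the term in its title and the page that only lists it as a keyword, which is the intended effect of the tip above.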

The same goes for the ‘title’ field. In most (default) presentations the title field is prefixed with the same value on every page, for example the company name. If it’s important that a search for the company name leads to a specific page, then this is almost impossible to achieve unless you remove the prefix from the title or lower the factor.title setting. For more information about using the properties.txt file see chapter 4.1.

These checks help to improve the relevance of certain pages by analyzing what is actually indexed and then removing unnecessary information or changing the index factor. After changing index factors or changing the presentation, a full re-index of the website is necessary to get accurate results, because documents are not scored in isolation but in relation to the other documents in the index.

 

 

 

 
