...

The XperienCentral search engine is based on the popular open source search engine Lucene. The Lucene core code is used for the search service, querying and indexing. A lot of additional functionality has been added to deal with XperienCentral-specific content, structure, security and architecture. For more information on Apache Lucene see http://lucene.apache.org/.

...

Retrieving the documents starts with retrieving the URLs of all documents and pages. XperienCentral provides links to all items that should be indexed on one page: the indexer page. This page contains references to:

  • Pages of all web initiatives
  • Media items created in the last 5 days
  • Documents (uploaded to a Download element or placed in the Object Manager)
  • Special content types (such as government papers)

The URL of the indexer page is configured in the properties.txt file (parameter: ‘metaUrl’) and is the default start page in the Setup tool. Note that the setting in the properties.txt file takes precedence.

...

A request for a document is more than a direct request for the document: additional meta information is requested as well. This extra meta information is provided by the indexer page, which is requested with an additional ‘document=’ parameter. For example, to index the homepage on a local XperienCentral installation, the crawler requests the URL http://localhost:8080/web/webmanager?id=39016&document=http%3A%2F%2F127.0.0.1%3A8080%2Fweb%2Fshow%2Fid%3D26111. The value of the document parameter is the URL-encoded URL of the homepage. When this URL is requested, a small XML result is returned which looks like this:

...
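The indexer request described above (the indexer page URL plus a URL-encoded document parameter) can be reproduced with a few lines of Python. This is only a sketch for checking such URLs by hand, not the crawler's own code; the helper name build_indexer_request is hypothetical, and the two URLs are the ones from the example.

```python
from urllib.parse import quote


def build_indexer_request(indexer_url: str, document_url: str) -> str:
    """Append the URL-encoded document URL as the 'document' parameter.

    Hypothetical helper for illustration; not part of XperienCentral.
    """
    # Use '&' when the indexer URL already carries a query string.
    separator = "&" if "?" in indexer_url else "?"
    # safe='' forces ':', '/' and '=' to be percent-encoded as well.
    return f"{indexer_url}{separator}document={quote(document_url, safe='')}"


request_url = build_indexer_request(
    "http://localhost:8080/web/webmanager?id=39016",
    "http://127.0.0.1:8080/web/show/id=26111",
)
print(request_url)
```

Passing safe='' matters here: by default quote leaves '/' unencoded, which would not match the encoded form shown in the example.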

  • Special fields from indexed PDF, MS Word and MS Excel documents. Possible fields include author, creator, encrypted, filesize and so forth.
  • All <meta> tag fields found in HTML pages. For example, when a page is indexed that contains the code <meta name="mySpecialField" content="mySpecialContent"/>, a new field named mySpecialField is created in the index. For this page it contains the value mySpecialContent; for all other documents it is empty.
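
To illustrate how <meta> tags map to field names and values, the following sketch collects them from an HTML page using Python's standard html.parser module. It is not the XperienCentral indexer itself; the names MetaFieldCollector and extract_meta_fields are hypothetical and only show the name-to-content mapping described above.

```python
from html.parser import HTMLParser


class MetaFieldCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name] = content


def extract_meta_fields(html: str) -> dict:
    """Return a mapping of meta field name to content for one page."""
    collector = MetaFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


page = '<html><head><meta name="mySpecialField" content="mySpecialContent"/></head></html>'
fields = extract_meta_fields(page)
print(fields)
```

A page without the tag simply yields no entry for that name, which corresponds to the field being empty for all other documents.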

...