...
The XperienCentral search engine is based on the popular open source search engine Lucene. The Lucene core code is used for the search service, querying and indexing. A lot of additional functionality has been added to deal with XperienCentral-specific content, structure, security and architecture. For more information on Apache Lucene see http://lucene.apache.org/.
...
Retrieving the documents starts with retrieving the URLs of all documents and pages. XperienCentral provides links to all items that should be indexed on one page: the indexer page. This page contains references to:
- Pages of all web initiatives
- Media items created in the last 5 days
- Documents (uploaded to a Download element or placed in the Object Manager)
- Special content types (such as government papers)
The URL of the indexer page is configured in the properties.txt file (parameter: ‘metaUrl’) and is the default start page in the Setup tool. Note that the setting in the properties.txt file takes precedence.
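As an illustration of this first crawl step, the sketch below collects the URLs referenced on an indexer page. It assumes that the indexer page is HTML and that its references are ordinary `<a href="...">` links, which this topic does not confirm. The names `LinkCollector` and `collect_links` are hypothetical and are not part of XperienCentral.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects link targets from <a href="..."> tags (assumed page format)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the indexer page.
            self.links.append(urljoin(self.base_url, href))


def collect_links(indexer_html, base_url):
    """Return the unique link URLs found in the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(indexer_html)
    parser.close()
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(parser.links))
```

A crawler built along these lines would download the page configured in ‘metaUrl’, pass its content to `collect_links`, and then request each returned URL in turn.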
...
A request for a document is not only a direct request for the document; additional meta information is requested as well. This extra meta information is provided by the indexer page when it is requested with an additional ‘document=’ parameter. For example, to index the homepage on a local XperienCentral installation, the crawler requests the URL `http://localhost:8080/web/webmanager?id=39016&document=http%3A%2F%2F127.0.0.1%3A8080%2Fweb%2Fshow%2Fid%3D26111`. The value of the ‘document’ parameter is the URL-encoded URL of the homepage. When this URL is requested, a small XML result is returned which looks like this:
...
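The meta information request described above can be reproduced with a few lines of Python. This is a minimal sketch, not the crawler's actual implementation: the helper name `build_meta_request` is hypothetical, and the URLs are the example values from this topic.

```python
from urllib.parse import quote


def build_meta_request(indexer_url, document_url):
    """Append the URL-encoded document URL as the 'document' parameter."""
    # safe="" makes quote() also encode ":", "/" and "=", as in the example URL.
    encoded = quote(document_url, safe="")
    # Use "&" when the indexer URL already has a query string, "?" otherwise.
    separator = "&" if "?" in indexer_url else "?"
    return f"{indexer_url}{separator}document={encoded}"


request_url = build_meta_request(
    "http://localhost:8080/web/webmanager?id=39016",
    "http://127.0.0.1:8080/web/show/id=26111",
)
print(request_url)
```

Encoding the whole document URL keeps its own `:`, `/` and `=` characters from being interpreted as part of the indexer page's query string.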
- Special fields from indexed PDF, MS Word and MS Excel documents. Possible fields include `author`, `creator`, `encrypted`, `filesize` and so forth.
- All `<meta>` tag fields found in HTML pages. For example, when a page is indexed that contains the code `<meta name="mySpecialField" content="mySpecialContent"/>`, a new field will be created in the index with the title `mySpecialField`. For this page it will contain the value `mySpecialContent`; for all other documents it will be empty.
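To illustrate how `<meta>` tags can be mapped to index fields, the sketch below extracts name/content pairs from an HTML page using Python's standard `html.parser` module. It is only an approximation of the behavior described above, not the XperienCentral indexer itself, and the names `MetaFieldParser` and `extract_meta_fields` are hypothetical.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs as field name/value."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name] = content


def extract_meta_fields(html):
    """Return a dict mapping each meta tag name to its content."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


page = '<html><head><meta name="mySpecialField" content="mySpecialContent"/></head></html>'
fields = extract_meta_fields(page)
print(fields.get("mySpecialField", ""))
```

For a document without this tag, `fields.get("mySpecialField", "")` yields an empty string, which mirrors the empty field value described for all other documents.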
...