...

When the search engine starts, some basic configuration parameters must be available. These settings are stored in the file properties.txt. Each line in properties.txt has the format [config parameter]=[config value]. The names of the configuration parameters are case sensitive. To add a comment, put a # at the start of the line. Additional explanation is given in the *** comments of the properties.txt file.
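
As an illustration of the format described above, the following minimal Python sketch reads lines of the form [config parameter]=[config value], skips lines that start with #, and keeps parameter names case sensitive. The function name parse_properties and the sample parameter names are hypothetical and do not come from the actual properties.txt. The handling of surrounding whitespace and of lines without an = sign is also an assumption, not documented behavior of the search engine.

```python
def parse_properties(text):
    """Parse 'name=value' lines into a dict; names stay case sensitive."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (a # in front of the line).
        if not line or line.startswith("#"):
            continue
        # Assumption: lines without '=' are ignored rather than rejected.
        if "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


# Hypothetical sample content; these parameter names are made up.
sample = """\
# *** example comment
example.param=some value
Example.Param=another value
"""

settings = parse_properties(sample)
```

Because names are case sensitive, example.param and Example.Param end up as two separate entries in the result.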



...

Task Configuration: crontab.txt

...

For more information and examples see Crontab. The fullindex and index commands have three arguments:

...

This configuration specifies that a local website is indexed with depth=1 at five minutes past midnight every day. At 2 AM the index is checked for all URLs (*), and URLs that no longer exist are removed.



...

Parser Configuration: parser.txt

...

The parser.txt file is read every minute, so the search service does not have to be restarted after the file is edited. Every document is matched top-down and from left to right, and it is sent to the parser of every line that matches. When no valid parser is found, the document is not indexed. The same applies to the special parser name '-', which also means that the document type is not indexed.



...

Credentials Configuration: credentials.xml

Even though the search engine indexes the website through the frontend, a basic form of authentication is required to retrieve the indexer page and the meta information of documents. This authentication is configured in the file credentials.xml. Besides basic authentication, credentials.xml can also contain advanced authentication for secure websites and documents.

...

Code Block
<credentials>
   <credential pattern="http://www.gxsoftware.com/web/show/.*" type="postform" username="gxsearch" password="Search987">
      <!-- indicate which input parameters in the login form correspond to the user and password -->
      <param name="userparam" value="f48305" />
      <param name="passwordparam" value="f48306" />
      <!-- the action URL that george needs to post the user/password to -->
      <param name="actionurl" value="http://www.gx.nl/web/formhandler?source=form" />
      <!-- include all input parameters in the form -->
      <formparam name="id" value="29347" />
      <formparam name="pageid" value="47952" />
      <formparam name="handle" value="form" />
      <formparam name="ff" value="47954" />
      <formparam name="form" value="48067" />
      <formparam name="formelement" value="47954" />
      <formparam name="originalurl" value="http://www.gx.nl/web/show/id=40945/cfe=47954/ff=47954" />
      <formparam name="errorurl" value="http://www.gx.nl/web/show/id=40945/cfe=47954/ff=47954/formerror=47954" />
      <formparam name="f48305" value="" />
      <formparam name="formpartcode" value="f48305" />
      <formparam name="f48306" value="" />
      <formparam name="formpartcode" value="f48306" />
   </credential>
   <!-- other credentials here -->
</credentials>
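
To make the structure of credentials.xml more concrete, the following Python sketch parses a shortened copy of the example and looks up the credential that applies to a given URL. It is only an illustration, not how the search engine itself reads the file. The helper names find_credential and describe are hypothetical, and matching the pattern attribute as a regular expression against the whole URL is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the example above, for illustration only.
CREDENTIALS_XML = """\
<credentials>
   <credential pattern="http://www.gxsoftware.com/web/show/.*" type="postform" username="gxsearch" password="Search987">
      <param name="userparam" value="f48305" />
      <param name="passwordparam" value="f48306" />
      <formparam name="handle" value="form" />
   </credential>
</credentials>
"""


def find_credential(xml_text, url):
    """Return the first <credential> whose pattern matches the URL, or None."""
    root = ET.fromstring(xml_text)
    for credential in root.findall("credential"):
        # Assumption: the pattern is a regular expression for the whole URL.
        if re.fullmatch(credential.get("pattern"), url):
            return credential
    return None


def describe(credential):
    """Collect the attributes and nested parameters of one credential."""
    return {
        "type": credential.get("type"),
        "username": credential.get("username"),
        "params": {p.get("name"): p.get("value")
                   for p in credential.findall("param")},
        # A list of pairs, because formparam names may repeat.
        "formparams": [(p.get("name"), p.get("value"))
                       for p in credential.findall("formparam")],
    }


match = find_credential(CREDENTIALS_XML, "http://www.gxsoftware.com/web/show/id=1")
info = describe(match) if match is not None else None
no_match = find_credential(CREDENTIALS_XML, "http://www.example.com/")
```

The formparam elements are collected as a list of pairs rather than a dictionary because the same name (formpartcode in the example) can occur more than once.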



...

Additional Meta Data: meta.txt

...

In this example, documents whose URL contains the string "javadoc" get an additional field pagetype with the value "javadoc". The second line creates a field owner with the value "gx" for all documents from the website www.gxsoftware.com.

The format of the meta.txt file is <URL pattern><tab><index field><tab><index value>. The URL pattern is a regular expression. The separator must be a tab character, not several spaces. Some IDEs (such as Eclipse) can be configured to automatically convert tabs to spaces, which can lead to unwanted behavior.

The meta.txt file is read every minute, so the search engine does not have to be restarted after the file has been changed. These fields are set so that they can be used for filtering the search results. For example, based on the above meta.txt, it is easy to filter on all the items that have "gx" as the value of the field owner.
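
The meta.txt format described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the search engine's actual implementation: the names parse_meta and extra_fields are hypothetical, the sample content is modelled on the description and is not the original example, and applying a pattern when it matches anywhere in the URL is an assumption.

```python
import re


def parse_meta(text):
    """Parse '<URL pattern><tab><index field><tab><index value>' lines."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        # Columns separated by spaces instead of tabs give a wrong count here.
        if len(parts) != 3:
            raise ValueError(f"expected 3 tab-separated columns: {line!r}")
        pattern, field, value = parts
        rules.append((re.compile(pattern), field, value))
    return rules


def extra_fields(rules, url):
    """Return the additional index fields for one document URL."""
    fields = {}
    for pattern, field, value in rules:
        # Assumption: a pattern applies when it matches anywhere in the URL.
        if pattern.search(url):
            fields[field] = value
    return fields


# Hypothetical meta.txt content, modelled on the description above.
META_TXT = (
    ".*javadoc.*\tpagetype\tjavadoc\n"
    "http://www\\.gxsoftware\\.com/.*\towner\tgx\n"
)

rules = parse_meta(META_TXT)
fields = extra_fields(rules, "http://www.gxsoftware.com/docs/javadoc/index.html")
```

A line whose columns are separated by spaces does not split into three parts, so the sketch raises an error for it. This mirrors the warning about editors that convert tabs to spaces.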

