The XperienCentral JCR Index

The method of indexing content in XperienCentral was changed in version R28 and further refined in version R29.

Beginning in version R28, GX Software uses an XperienCentral-optimized implementation for indexing content. In XperienCentral R28, the implementation ran alongside the Apache Jackrabbit JCR index functionality. In XperienCentral versions R27 and earlier, XperienCentral used only Apache Jackrabbit for content indexing.

In XperienCentral versions R29 and higher, the Apache Jackrabbit JCR index functionality is disabled by default. If you want to use Apache Jackrabbit with XperienCentral, follow the steps in the last section of this topic. If you do not configure XperienCentral to use Apache Jackrabbit, the following functionality does not work out-of-the-box:

The JCR Browser
The JCR Import/Export Tool
Custom code which has not been rewritten to use the new search index

Advantages of the XperienCentral JCR Index

The XperienCentral JCR index stores its index in the database instead of in files on disk. The major advantage of this approach is that the index is shared by all nodes in the cluster, so there is no need to re-index when a new node is added to the cluster. Because there is only one index, errors caused by a mismatch between the separate indices on the nodes in the cluster can no longer occur. Finally, since the index is stored in the same database as the content to which it applies, there is no longer a need to re-index after copying an environment by restoring a backup.

Database Tables

The JCR index stores its index in two tables prefixed by wmJcrIndex: wmJcrIndexNodes and wmJcrIndexProperties. The wmJcrIndexNodes table stores the most important metadata for nodes stored in the JCR, for example the UUID, name and node type. The wmJcrIndexProperties table stores some of the properties of these nodes and refers to the node that a property belongs to in its UUID column.

Indexing

The JCR index is built automatically. Several mechanisms are in place to ensure the index is up-to-date.

Initial Index

When a new XperienCentral installation is started, or when an installation upgraded from a version lower than R28 is started, the JCR index is built by the webmanager-jcrindex-bundle. When this bundle starts, it checks whether there are any records in the index tables. If there are none, it starts indexing automatically. This initial indexing logs warning messages similar to the following:

Initializing JCR index...
Added 5000 records to the JCR index
Added 10000 records to the JCR index
...
Finished initializing JCR index of 12425 nodes in 13253 ms

Quick Scan Task

To prevent inconsistencies in the JCR index, a "quick scan" task runs every 5 minutes. This task verifies that the number of records in the JCR index matches the number of nodes in the JCR, which is a simple and inexpensive way to tell whether the index is up-to-date. Each time a mismatch is detected, an internal counter is incremented by 1. Only when 12 consecutive checks detect a mismatch (which takes approximately one hour) is the index considered to be inconsistent, which then leads to the "check and repair" task being executed. The reason for this is that the index may appear to be inconsistent when the checker runs at the exact moment that a JCR node is being added or removed. The probability that this happens 12 times in a row, however, is very small. The "quick scan" task logs the following warning message when the JCR index is considered to be inconsistent:

JCR Index is out of sync: 835203 bundles but 835204 records in the index
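
The decision logic described above can be illustrated with a minimal sketch. The class and method names are hypothetical and are not part of the XperienCentral API. The sketch also assumes that a matching scan resets the counter and that the counter starts afresh once a repair has been triggered, neither of which is stated explicitly above.

```java
/**
 * Sketch of the quick scan decision logic: only an unbroken run of
 * mismatches marks the index as inconsistent.
 * All names are hypothetical; this is not the XperienCentral implementation.
 */
final class QuickScanSketch {
    private final int threshold;        // 12 in the description above
    private int consecutiveMismatches;  // the "internal counter"

    QuickScanSketch(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records the outcome of one scan.
     *
     * @return true when the check and repair task should be started
     */
    boolean recordScan(long nodesInJcr, long recordsInIndex) {
        if (nodesInJcr == recordsInIndex) {
            // Assumption: a matching scan resets the counter.
            consecutiveMismatches = 0;
            return false;
        }
        consecutiveMismatches++;
        if (consecutiveMismatches >= threshold) {
            // Assumption: counting starts afresh after a repair is triggered.
            consecutiveMismatches = 0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        QuickScanSketch scan = new QuickScanSketch(12);
        for (int i = 1; i <= 12; i++) {
            boolean repair = scan.recordScan(835203L, 835204L);
            System.out.println("scan " + i + ": start check and repair = " + repair);
        }
    }
}
```

With a scan interval of 5 minutes and a threshold of 12, a persistent mismatch leads to a repair after roughly one hour, while a mismatch caused by a node being written during a scan is cleared by the next matching scan.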

Check and Repair Task

The check and repair task starts when the JCR index is considered to be inconsistent (see Quick Scan Task above). This task consists of two stages: the "check and repair missing records" stage and the "check and repair obsolete records" stage. The first stage loops over all nodes in the JCR and verifies that a corresponding record is available in the JCR index. If not, it adds the record to the index and logs warning messages similar to the following:

JCR index is missing record for uuid 0005781c-7689-4d48-8dfc-ecdc7a25a4ba
Adding missing record 0005781c-7689-4d48-8dfc-ecdc7a25a4ba to the JCR index

In the second stage, the task loops over all records in the JCR index and verifies that the corresponding node exists in the JCR. If it does not, it removes the record from the index and logs warning messages similar to the following:

JCR index contains obsolete record for uuid 0005781c-7689-4d48-8dfc-ecdc7a25a4ba
Removing obsolete record 0005781c-7689-4d48-8dfc-ecdc7a25a4ba from the JCR index

After running both stages, the index should be consistent again.
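
The two stages amount to reconciling two sets of UUIDs. The following sketch shows that idea with in-memory sets. It is an illustration only: the class and method names are hypothetical, and the real task works against the JCR and the index tables rather than against Java collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of the two-stage check and repair idea using plain sets of UUIDs.
 * All names are hypothetical; this is not the XperienCentral implementation.
 */
final class CheckAndRepairSketch {

    /** Brings indexUuids in line with jcrUuids and returns the messages produced. */
    static List<String> checkAndRepair(Set<String> jcrUuids, Set<String> indexUuids) {
        List<String> log = new ArrayList<>();

        // Stage 1: every node in the JCR must have a record in the index.
        for (String uuid : jcrUuids) {
            if (indexUuids.add(uuid)) { // add() returns true only if the record was missing
                log.add("Adding missing record " + uuid + " to the JCR index");
            }
        }

        // Stage 2: every record in the index must refer to an existing node.
        indexUuids.removeIf(uuid -> {
            boolean obsolete = !jcrUuids.contains(uuid);
            if (obsolete) {
                log.add("Removing obsolete record " + uuid + " from the JCR index");
            }
            return obsolete;
        });
        return log;
    }

    public static void main(String[] args) {
        Set<String> jcr = new LinkedHashSet<>(Arrays.asList("node-a", "node-b", "node-c"));
        Set<String> index = new LinkedHashSet<>(Arrays.asList("node-b", "node-c", "node-d"));
        checkAndRepair(jcr, index).forEach(System.out::println);
        System.out.println("index consistent: " + index.equals(jcr));
    }
}
```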

Reindex

In some very specific circumstances, a reindex task may be triggered. A reindex task repeats the initialization of the index as described in Initial Index above, but does not truncate the index before indexing. A reindex is initiated by adding an indexed property; see Indexed Properties below.

JCR Index Query Manager

The JcrIndexQueryManager interface is the primary service API for performing search queries on the index. It can be used to perform queries in the JCR on node type, properties that must match certain values and more. For complete information on the available query methods, see the XperienCentral Javadoc.

Indexed Properties

When queries are performed on the JcrIndexQueryManager using the property matching method, the performance of this method heavily depends on whether these properties are contained in the index or not. Properties that are stored in the wmJcrIndexProperties table of the JCR index are referred to as "indexed properties". Performing queries on those properties is fast because the query can directly operate on the corresponding property in this table. When a query is executed on properties that are not indexed, the query will be much slower because it needs to retrieve the property from the JCR first before it can check whether the property matches.

The "property matching method" is JcrIndexQueryManager#getNodes(Session session, String nodeType, Map<String, Object> filterValues).

When a property matching query is executed, the query manager first executes a database query that selects all nodes that match the indexed properties. Subsequently it iterates over all remaining nodes and checks which ones match the non-indexed properties. If a large number of nodes remain to be iterated over, the query may take a long time to execute. For that reason, it is strongly recommended either not to execute such queries or to make these properties indexed properties. When such a time-consuming query is executed on more than 5000 results, a warning message similar to the following is logged:

Heavy filtering using non-indexed properties: 1% of 44,150 results left after filtering on: [custompagemetadata:custompagetype, custompagemetadata:traininggroupcode, custompagemetadata:language, custompagemetadata:form]

This particular message warns that the JCR index query manager needs to iterate over 44,150 nodes in the JCR, retrieve the four properties it mentions and check whether they all match the specified values. As a result of that filtering, only 1% of those 44,150 nodes are left. This warning message is shown when filtering on non-indexed properties reduces the result by more than 85%, in other words, when the percentage shown in the warning message is smaller than 15%. Queries like these should be avoided. There are two ways to resolve this: either improve the query by matching on more indexed properties so that fewer nodes (44,150 in the example above) are left to apply the non-indexed filtering to, or make some or all of the non-indexed properties indexed. How to do so is explained in the next section.
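
The warning condition described above combines two thresholds. A minimal sketch of that check follows. The class, method and constant names are hypothetical, and the exact comparison operators used by XperienCentral at the boundaries are an assumption.

```java
/**
 * Sketch of the "heavy filtering" warning condition.
 * All names are hypothetical; this is not the XperienCentral implementation.
 */
final class HeavyFilterSketch {
    static final long MIN_CANDIDATES = 5000;      // more than 5000 results, per the text above
    static final long MAX_REMAINING_PERCENT = 15; // fewer than 15% left after filtering

    /**
     * @param candidates nodes left after filtering on indexed properties
     * @param remaining  nodes left after also filtering on non-indexed properties
     * @return true when a warning would be logged under the stated assumptions
     */
    static boolean shouldWarn(long candidates, long remaining) {
        if (candidates <= MIN_CANDIDATES) {
            return false;
        }
        // Integer arithmetic avoids rounding issues: remaining / candidates < 15%.
        return remaining * 100 < candidates * MAX_REMAINING_PERCENT;
    }

    public static void main(String[] args) {
        // Roughly the situation in the example message: about 1% of 44,150 nodes left.
        System.out.println(shouldWarn(44150, 442));
        // Same candidate count, but the non-indexed filter removes far fewer nodes.
        System.out.println(shouldWarn(44150, 30000));
    }
}
```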

Adding an Indexed Property

Adding and removing indexed properties is supported by the JcrIndexPropertyService service API which contains methods to retrieve the indexed properties, check whether a property is indexed and to add or remove an indexed property. Adding an indexed property is fairly easy using the JcrIndexPropertyService. For example:

JcrIndexedProperty property = new JcrIndexedProperty(
   "wmamodularcontent:stringvalue",       // The fully qualified name of the node type that holds the property
   "wmamodularcontent:templateproperty",  // The fully qualified name of the property
   PropertyType.STRING                    // The type of the property, referring to one of the types in javax.jcr.PropertyType
);
// propertyService is a reference to the JcrIndexPropertyService
propertyService.addIndexedProperty(property);

The code snippet above adds an indexed property for wmamodularcontent:templateproperty, which is a property of the node type wmamodularcontent:stringvalue and is of type STRING.

After adding an indexed property, a “re-index” flag is set to TRUE because no values are available yet for that property in the JCR index tables. The next time the quick scan task runs, a reindex is performed automatically. See the section Reindex above.

Removing an Indexed Property

To remove an indexed property, call JcrIndexPropertyService#removeIndexedProperty with a JcrIndexedProperty instance constructed with the same arguments that were used to add the property. No reindex is performed after removing an indexed property.

Configuring Apache Jackrabbit

To enable the Apache Jackrabbit JCR index, for example to use the JCR Browser, you must make a few configuration changes to your deployment. Follow these steps:

Stop XperienCentral.
Open the file <wm-root>/work/<jcr directory>/repository.xml in a text editor.

Add the following declarations to the <Workspace></Workspace> section:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex"> 
   <param name="path" value="${wsp.home}/index" />
   <param name="respectDocumentOrder" value="false" />
   <param name="useCompoundFile" value="true" />
   <param name="minMergeDocs" value="100" />
   <param name="volatileIdleTime" value="3" />
   <param name="maxMergeDocs" value="100000" />
   <param name="mergeFactor" value="100" /><!-- old: 10 -->
   <param name="bufferSize" value="100" /><!-- old: 10 -->
   <param name="cacheSize" value="100000" /><!-- old: 1000 -->
   <param name="forceConsistencyCheck" value="false" />
   <param name="autoRepair" value="true" />
   <param name="onWorkspaceInconsistency" value="log" />
</SearchIndex>

Add the following declarations to the <Versioning></Versioning> section:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
   <param name="path" value="${rep.home}/repository/index" />
   <param name="respectDocumentOrder" value="false" />
   <param name="useCompoundFile" value="true" />
   <param name="minMergeDocs" value="100" />
   <param name="volatileIdleTime" value="3" />
   <param name="maxMergeDocs" value="100000" />
   <param name="mergeFactor" value="100" /><!-- old: 10 -->
   <param name="bufferSize" value="100" /><!-- old: 10 -->
   <param name="cacheSize" value="100000" /><!-- old: 1000 -->
   <param name="forceConsistencyCheck" value="false" />
   <param name="autoRepair" value="true" />
   <param name="onWorkspaceInconsistency" value="log" />
</SearchIndex>

Save repository.xml.
Delete the directory <wm-root>/work/jcr/repository.
Restart XperienCentral. The Apache Jackrabbit JCR index will be built.
