Solr Search Replication

Since XperienCentral R44, the (frontend) Solr search index can easily be replicated across multiple nodes in a cluster. This means that the actual indexing of new or updated content is done only on the 'leader' node (usually the edit environment), while the other nodes in the cluster periodically download an update of that index to use locally.

[Diagram: replication of the Solr index from the leader node to the follower nodes]

The main advantages of this approach are:

  • Indexing is only done once, on the leader node, which increases overall performance

  • The index is consistent over all nodes; no more differences in search results between nodes

  • Maintenance of the index is easier as changes performed on the (leader) edit node are automatically synchronized to the other nodes

  • When a new node is added to the cluster, it retrieves the complete search index in a matter of minutes.

    • This removes the need to run a complete full index on the new node.

 

There are some (smaller) disadvantages as well:

  • The full index is also stored in the database (up to two times the size of the original index during the replication process), which means the database can grow significantly when the Solr index on disk is large.

  • When the leader node has problems, the indexing of updated content might temporarily stop.

    • Problems on the leader/editor node are usually detected and fixed quickly because they stop editors from doing their work. For the same reason the number of missed updates stays small: while the leader has problems, little new content is produced.

  • You can no longer use the Solr indexing process to populate the caches on the frontend nodes, because the follower nodes no longer crawl content themselves.

How it works

Default indexing behavior

Every node in an XC cluster has a Solr index on disk (usually located in /work/searchengine) which is used for every search query. Whenever content has changed, it needs to be reindexed (an HTTP request to the content's URL) and the result stored in that Solr index on disk.

The index on disk has a ‘generation’, a number indicating the version of the Solr index which increases on every update of the index.

 

The indexing can be triggered in a few ways:

  • By default a periodic indexing of all content is triggered by XC, typically every night.

    • All URLs are collected and crawled, and the results are stored in the Solr index

  • When using the Realtime Indexing functionality XC detects changes to content and then specifically reindexes the updated content only.

    • Only the URLs for the updated content are crawled and the results are stored in the Solr index

    • Periodic full indexing can be disabled when using Realtime Indexing.

      • When necessary a full reindex can also be triggered using Realtime Indexing

In the default setup the crawling is done on every node, the results are processed on every node and the index is updated on every node.

Solr replication

When Solr replication is enabled the behavior is different between the leader node and all the other follower nodes.

Before Solr 8.7 the ‘leader’ node was called the ‘master’ node and the ‘follower’ nodes were called ‘slave’ nodes.

Leader node

  • On the leader node all ‘write actions’ to the index are actually executed: when XC wants to index a content item, its URL is crawled and the Solr index on disk is updated

  • Periodically (every minute by default) XC checks whether the ‘generation’ ID of the Solr index on disk matches the Solr index in the database

    • If they don’t match (or there is no index stored in the database yet) the internal ‘Solr replication’ process is started which writes every single file of the index on disk to the database

    • When the new generation is successfully stored in the database the previous generation is removed from the database

      • For a short moment there can be up to 2 complete copies of the Solr index in the database, so the database must be sized to handle that

    • When a crawl action is active, the check is skipped: the active crawl might still update the Solr index on disk, so the replicated generation could already be outdated the moment it is written to the database.

Follower nodes

  • All write actions to the Solr index are ignored on the follower nodes

  • Periodically (every 20 seconds by default) Solr checks whether the ‘generation’ ID of the Solr index on disk matches the Solr index in the database

    • If they don’t match (or there is no index stored on disk yet) the internal ‘Solr replication’ process is started which writes every single file of the index to disk

 

Note: the leader node for Solr replication is the so-called taskmaster node in XC. In a setup with a single editor node, that editor node is always the taskmaster. In a clustered setup with multiple editor nodes, the taskmaster can be any one of the editor nodes.

Installation / Configuration

When running XperienCentral R44 or higher you can enable Solr replication. This is only useful when there are multiple nodes to replicate between.

Prerequisites

The following actions have to be performed before replication will actually work:

  • Make sure the following Java options are added to your setenv.sh or setenv.bat (see the sketch after this list):

    • -Dsolr.disable.shardsWhitelist=true

    • -Dsolr.disable.allowUrls=true

  • Make sure the solrconfig.xml contains the following block (a fuller sketch is given after this list):

    • <requestHandler name="/replication" class="solr.ReplicationHandler">

    • The pollInterval indicates how often the frontend nodes will check whether a new ‘generation’ of the Solr index is available. The default is every 20 seconds (00:00:20).

    • Make sure the masterUrl property points to the local machine on every node.

      • By default the URL is http://127.0.0.1/web/solr/replication but that requires that 127.0.0.1 is added as a backend alias in /web/setup

      • You can also point the URL to the edit (backend) hostname, as long as that hostname resolves to the local machine on every node as well: https://backendhostname/web/solr/replication
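
As a minimal sketch, assuming a standard Tomcat setup where setenv.sh appends to CATALINA_OPTS, the two Java options could be added like this (setenv.bat would use the equivalent Windows syntax):

  CATALINA_OPTS="$CATALINA_OPTS -Dsolr.disable.shardsWhitelist=true -Dsolr.disable.allowUrls=true"

The replication block in solrconfig.xml could look roughly like the sketch below. This only illustrates where the pollInterval and masterUrl properties live; the actual block shipped with XperienCentral may contain additional elements, and the masterUrl value shown here assumes 127.0.0.1 has been added as a backend alias in /web/setup:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- How often the follower polls for a new generation; default is every 20 seconds -->
      <str name="pollInterval">00:00:20</str>
      <!-- Must point to the local machine on every node -->
      <str name="masterUrl">http://127.0.0.1/web/solr/replication</str>
    </lst>
  </requestHandler>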

Configuration

To enable the Solr replication feature, enable the replication_enabled configuration entry within the wmasolrsearch configuration set in /web/setup.

The replication_crontabschedule configuration entry defines when the leader node starts the process of writing the Solr index to the database, if updated. By default this is every minute (0 * * * * ?).
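
The schedule uses a Quartz-style cron expression (seconds first, with ? for the day-of-week field). As an example, assuming the entry accepts the same format as the default value, letting the leader check every 5 minutes instead of every minute would look like this:

  0 0/5 * * * ?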

Monitoring and debugging

After configuring and enabling Solr replication you might want to verify that the replication is actually working. This section describes a few things to check when you suspect issues with Solr replication, or when you just want to confirm that it works as expected.

  • In the Search Tools tab of the XC Setup screen you should see extra information in the Server status fieldset:

    • Replication = true / false

      • indicates whether replication is enabled

      • When enabled another field appears:

        • Replication master = true / false

          • indicates whether the current node is the replication leader (master)

  • The database tables solrReplication and solrReplicationGeneration give a good insight into the status of the replication on the leader node:

    • The solrReplication table contains the actual file contents for a given generation

      • There should be at least a complete set of entries for one generation ID.

        • A complete set consists of:

          • command=indexversion&qt=%2Freplication&version=2&wt=javabin

          • command=filelist&generation=[generation id]&qt=%2Freplication&version=2&wt=javabin

          • and then 1 entry for every file that is actually in the Solr index on disk

      • There should be at most 2 different generation IDs in the generation column

      • The timestamp of the latest generation entries should roughly correspond with the first replication run on the leader node after the last actual change to the Solr index on disk

    • The solrReplicationGeneration table contains the actual generation ID and that should correspond with a complete set of entries in the solrReplication table with the same generation ID

  • On both the leader and the follower nodes you should see incoming requests to the following URL for the User-Agent 'Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0' when looking at the /web/admin/status page:

    • /web/solr/replication?command=indexversion&qt=%2Freplication&version=2&wt=javabin

    • This request should return a 200 status (see the command-line sketch after this list).

    • Whenever the index on disk needs to be updated on a follower node this initial request is followed by many requests to actually retrieve the contents of the index.

  • Connect a logger to the classes 'nl.gx.product.wmasolrsearch.servlet.SolrServlet' and 'nl.gx.product.wmasolrsearch.searchservice.ReplicationServiceImpl' to see the log messages of the two classes used by Solr replication in XC

  • The Solr Maintenance reusable tab Advanced / Solr status contains information about the replication status on every node in the cluster. This is very useful when you don’t have direct access to the frontend / follower nodes.

    • The tab shows information about all nodes in the cluster:

      • Node name

      • Whether Solr itself is active (in other words, whether it started successfully)

      • Number of results in the index

      • Replication status, like this:

        • Slave (generation: 2289516, indexversion: 1742608814202, status: OK)

    • Example:

      • [Screenshot: the Advanced / Solr status tab showing the replication status per node]
      • In this example the Slave/Follower node is not up to date with the Master/Leader node, which indicates a problem with replication
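
As referenced in the checklist above, a quick manual check of the replication endpoint can be done from the command line. This is a sketch, assuming the backend hostname from the prerequisites section; replace it with the hostname of the node you want to check. Because wt=javabin returns a binary response, the point of this check is mainly the HTTP status code:

  curl -i "https://backendhostname/web/solr/replication?command=indexversion&qt=%2Freplication&version=2&wt=javabin"

A 200 status indicates the replication handler is reachable on that node; any other status points to a configuration or connectivity problem.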
