Performance Checklist

This topic contains a short overview of all the factors in an XperienCentral deployment that affect performance.

Check the Versions of Related Software

Check which versions of XperienCentral, Java, the web server, the application server and the database are being used. More recent versions of these components may include performance improvements or fixes for performance issues, so running the latest versions usually gives the best performance. Also check the latest XperienCentral hardware and software requirements. You can view which software versions you are running in the Cluster tab of the Monitoring Dashboard.

Hardware

Check whether your hardware meets the minimum requirements (see Hardware and Software Requirements).

Server Setup

Investigate how many backends and frontends are being used. Are the backends only used by editors or do they also serve frontend visitors? What algorithm does the load balancer use to balance the load in the cluster? You can use the status and controller_status Administrative Pages to see how much traffic each server is handling and to find other information about the frontends and backends.

Content

Does the XperienCentral installation have multiple channels, a large number of pages and/or Content Repository items? How large is the JCR? Using the Administrative Pages, you can see information about all content items in your channel(s).

Indexing Settings

How frequently does the indexing job run? When is it scheduled and how long does it take? Does the site use real time indexing or full indexing?

Time Synchronization

Are the times on all servers synchronized? When the time between servers is not properly synchronized, this can cause one of two problems, depending on which server is ahead in time:

  • If the backend is ahead in time, the frontend fails to serve content from the cache for a period of time that equals the time difference.
  • If the frontend is ahead in time, it fails to refresh and keeps serving outdated content for a period of time that equals the time difference between it and the backend.
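The two cases above can be expressed as a small check. The following is a minimal sketch in Python, assuming you have already obtained the current time of a backend and a frontend by some means (for example from a monitoring script); the function name skew_effect is hypothetical and not part of XperienCentral.

```python
from datetime import datetime, timedelta


def skew_effect(backend_time: datetime, frontend_time: datetime) -> str:
    """Describe the caching symptom expected for a given clock difference."""
    skew = backend_time - frontend_time
    if skew > timedelta(0):
        # Backend ahead: the frontend cannot serve from the cache for the skew period.
        return f"backend ahead: frontend cannot serve from cache for about {skew}"
    if skew < timedelta(0):
        # Frontend ahead: the frontend does not refresh for the skew period.
        return f"frontend ahead: frontend keeps serving outdated content for about {-skew}"
    return "clocks in sync: no skew-related caching problem expected"
```

In practice, keeping all servers synchronized with the same time source avoids both cases.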

Use of SSIs

Does the site use many SSIs, especially those with a short cache timeout or no cache timeout? SSIs can be seen using ssidebug=true. If Apache SSIs are used, are pages without Apache SSIs dumped to .html and pages with Apache SSIs dumped to .shtml? In XperienCentral versions 10.9 and earlier, SSIs are executed synchronously, which means that if one page has many SSIs, these server-side HTTP requests are invoked one after another. From XperienCentral 10.10 onwards, the number of concurrent SSI requests is controlled by the render_threads and render_threads_incontext configuration settings (see the General tab of the Setup Tool).

Multiple SSIs can slow down page requests significantly. You can see the SSIs being executed using the Threads Administrative Page. For example:

Thread[TP-Processor39,5,main] TIMED_WAITING refresh 
at java.lang.Object.wait(Native Method)
at nl.gx.proxy.storage.URLManager$Task.waitFor(URLManager.java:696)
at nl.gx.proxy.storage.URLManager.getConnection(URLManager.java:206)
at nl.gx.proxy.storage.Cache.getConnection(Cache.java:229)
at nl.gx.proxy.storage.Cache$CacheFactory.getConnection(Cache.java:658)
at nl.gx.webmanager.handler.util.Util.getShowConnection(Util.java:452)

If you see this sort of message often, it might be an indication that there are too many SSIs and/or SSI invocation takes a long time.
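To get a rough count instead of judging by eye, the thread dump text can be scanned for threads waiting in the frame shown in the example. This is a minimal sketch in Python, assuming the dump has been saved as plain text in the same format as the example (a header line starting with Thread[ followed by "at ..." frames); the function name is hypothetical and not part of XperienCentral.

```python
# Frame taken from the example stack trace above.
SSI_WAIT_FRAME = "nl.gx.proxy.storage.URLManager$Task.waitFor"


def count_threads_waiting_on_ssi(dump: str) -> int:
    """Count the threads whose stack contains the URLManager wait frame."""
    count = 0
    in_thread = False
    counted = False
    for raw_line in dump.splitlines():
        line = raw_line.strip()
        if line.startswith("Thread["):
            # A new thread header starts; reset the per-thread flag.
            in_thread, counted = True, False
        elif in_thread and not counted and SSI_WAIT_FRAME in line:
            count += 1
            counted = True
    return count
```

Running this against several dumps taken a few seconds apart shows whether the number of waiting threads is consistently high.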

Check HTTP Requests

Using the status and controller_status Administrative Pages you can see the outstanding HTTP requests and the most recently handled HTTP requests. Which requests took a long time to complete? How many requests are currently open? How much traffic is the server handling? Be sure to verify the status on each backend and frontend individually.

Dumping Pages

Make sure that pages with very high traffic (like the homepage) are being dumped.

Minimize/Aggregate JS/CSS/img

Are static JS and CSS files minimized and aggregated? Are images reduced in size by a tool like Smush Image Compression? You could also consider compressing the statics in Apache (by using mod_deflate).

Optimize JS Invocation

You can optimize the speed of a web page by choosing how JavaScript is included on the page. By default, a script included using the <script> tag is loaded and executed synchronously: the browser stops parsing the rest of the page until the script has finished. Therefore, in general, it is a good idea to include such scripts just before the </body> tag. By doing so, the HTML is loaded and visible to the visitor before the JavaScript is loaded and executed, so the page appears to load faster. Furthermore, in HTML5 you can use the defer and async attributes to fine-tune JavaScript loading. With defer, the script is downloaded in parallel and executed after the page has been parsed, similar to what happens when the script is placed just before the </body> tag. With async, the script is downloaded in parallel and executed as soon as it is available.

Cache Headers

Are the Keep-Alive headers and the caching headers of static files configured properly? You can use Firebug in Firefox and Developer Tools in Chrome to see the exact headers being returned. The best way to cache static files is to give each one of them a version number. Because a changed file then gets a new URL, each version of a static file can be cached indefinitely.
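One way to version a static file is to derive the version from the file's content, so the URL changes only when the file changes. The sketch below assumes a content hash is used as the version; this page only calls for a version number, so the hashing scheme and the function name versioned_name are illustrative assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    # /static/site.css becomes /static/site.<digest>.css
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

A build step could rename the files this way and update the references in the page templates accordingly.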

Thread Analysis

Use the Threads Administrative Page to retrieve a thread dump at runtime. Alternatively, an application manager or developer can perform a kill -quit to create a thread dump and write it to the log file. When you compare several thread dumps created shortly after each other, you may find that some threads remain in the same place for an inordinate amount of time. Blocked threads need to be investigated because they are waiting to acquire a lock. Knowing what that lock is can help you identify problems. See http://geekexplains.blogspot.com/2008/07/threadstate-in-java-blocked-vs-waiting.html for a good explanation of the possible thread states and what they mean.
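Comparing dumps by hand becomes tedious with many threads. The following is a minimal sketch in Python that lists the threads reported as BLOCKED in every dump of a series. It assumes each thread header has the form Thread[name,priority,group] STATE, as on the Threads Administrative Page; dumps created with kill -quit use a different layout and would need a different pattern. All function names are hypothetical.

```python
import re

# Matches headers such as: Thread[TP-Processor39,5,main] TIMED_WAITING refresh
HEADER = re.compile(r"^Thread\[(?P<name>[^,\]]+),\d+,[^\]]*\]\s+(?P<state>[A-Z_]+)")


def thread_states(dump: str) -> dict:
    """Map each thread name in one dump to its reported state."""
    states = {}
    for line in dump.splitlines():
        match = HEADER.match(line.strip())
        if match:
            states[match.group("name")] = match.group("state")
    return states


def persistently_blocked(dumps: list) -> set:
    """Return the names of threads that are BLOCKED in every dump."""
    if not dumps:
        return set()
    blocked_per_dump = [
        {name for name, state in thread_states(d).items() if state == "BLOCKED"}
        for d in dumps
    ]
    return set.intersection(*blocked_per_dump)
```

Threads returned by persistently_blocked are the first candidates for a closer look at the lock they are waiting for.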

Memory Analysis

If you enable garbage collection logging in the start script of XperienCentral, all major and minor garbage collections are logged. Typically, minor collections are performed often but they should be completed quickly. Major garbage collections should not be performed frequently because they take a significant amount of time to finish. If you use the Concurrent Mark Sweep (CMS) garbage collector, look for the error "concurrent mode failure" because this causes a temporary "stop the world" pause.
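A quick scan of the garbage collection log can show how often major collections and CMS failures occur. This is a minimal sketch in Python, assuming the log lines contain the text "Full GC" for major collections and "concurrent mode failure" for CMS failures; the exact log format depends on the Java version and the logging flags used, so adjust the search strings to your own logs. The function name is hypothetical.

```python
def summarize_gc_log(lines) -> dict:
    """Count major collections and CMS concurrent mode failures."""
    summary = {"full_gc": 0, "concurrent_mode_failure": 0}
    for line in lines:
        if "Full GC" in line:
            summary["full_gc"] += 1
        if "concurrent mode failure" in line:
            summary["concurrent_mode_failure"] += 1
    return summary
```

For a log file, pass the open file object, for example summarize_gc_log(open("gc.log")), where gc.log is a placeholder for your own log file name.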

Using jmap you can also generate a heap analysis (jmap -heap) and an overview of objects present on the heap (jmap -histo). Dumping the complete heap provides the most useful information; however, the dump can be quite large and difficult to retrieve. More information about the use of YourKit and solving out of memory errors (OOMs) can be found here.

Browser Developer Tools

Tools like Firebug and the Chrome Developer Tools help you monitor the incoming and outgoing HTTP traffic including the following:

  • How many separate CSS and JS files are loaded when you open a page.
  • Cached static files. In most tools you can see whether they are served from the cache or not. Alternatively, you can check the HTTP response headers.
  • Static files, like images, and their sizes.
  • How long HTTP requests take to process.

Web Performance Tools

There are several tools that can help you performance test your web pages. Some of them provide a report that suggests improvements. Two such programs are Pingdom and JMeter.

Profiling

If you have access to a test, staging or development environment you can use profilers like YourKit to find out in detail which classes/methods consume the most CPU time. It can also be used for memory analysis.

XperienCentral Monitoring Tools

The Administrative Pages help you inspect many important metrics regarding your XperienCentral deployment.

Log Files

Check the Tomcat log file for errors or warnings that appear often and also check the size of the log file. You can edit logging.properties to customize the log entries (by severity). See Avoiding Clogged Logfiles. Is the log file regularly rotated? Check the web server access log files to see the incoming HTTP requests. Pay special attention to the identity of the user-agents. Bots like Googlebot and Bingbot often generate large amounts of traffic during their crawls. Usually the bot owner provides tools for configuring how the bot crawls your site.
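To see which user-agents generate the most requests, the access log can be summarized with a short script. This is a minimal sketch in Python, assuming the access log uses a format in which the user-agent is the last double-quoted field on each line (as in the Apache combined log format); the function name is hypothetical.

```python
import re
from collections import Counter

# The user-agent is assumed to be the last quoted field on the line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(lines, n=5):
    """Return the n most frequent user-agents with their request counts."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

If a single bot dominates the list, that is a reason to look at its crawl configuration.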

Session Timeout

Verify the number of concurrent HTTP sessions using Tomcat management tools and check the configuration for the session timeout. A high timeout, 60 minutes for example, can lead to too many sessions open at one time.
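The effect of the timeout can be estimated with simple arithmetic: a session stays open for the duration of the visit plus the timeout, so the number of open sessions is roughly the rate of new sessions multiplied by that lifetime. The sketch below is a rough estimate under that assumption, not a description of how Tomcat counts sessions; the function name and the figures in the usage note are illustrative.

```python
def estimated_open_sessions(new_sessions_per_minute: float,
                            timeout_minutes: float,
                            avg_visit_minutes: float = 0.0) -> float:
    """Estimate open sessions as arrival rate times session lifetime."""
    session_lifetime = avg_visit_minutes + timeout_minutes
    return new_sessions_per_minute * session_lifetime
```

For example, at 50 new sessions per minute, a 60 minute timeout keeps about 3000 sessions open, while a 10 minute timeout keeps about 500 open.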

JackRabbit Settings

Check the JackRabbit configuration settings in the repository.xml file if you experience out of memory errors.

Scheduled Tasks

There are job schedules that you configure in the General tab of the Setup Tool that can have a major impact on performance:

The settings and their explanations are:

  • formsengine_prehandling_cachetimeout: The PreHandlingElementHolderCacheRecalculator task evaluates whether a form element requires prehandling. If so, it updates the prehandling cache. By default, this is configured to run every 2 minutes. If your site has many form elements, this could have a major impact on performance. In that case it might be a good idea to set this setting to a longer time period.
  • current_rollover_detector_schedule: The CurrentRolloverDetectorImpl task checks whether content items need to be rolled over by looping over all content items every 5 minutes. An interval of 15 or 30 minutes may be better.
  • contentindex_queue_poller_schedule: The IndexQueueServiceImpl task updates the content index for inline search. Usually this is scheduled for every 20 seconds; however, the indexing might take longer than that, which causes lock contention.
  • contentindex_queue_iteration_limit: This setting determines the maximum number of items to index when a full content index is triggered.
  • contentindex_optimize_schedule: The content search optimizer runs a Lucene optimization every x minutes. Optimizing is very expensive and causes a lot of input/output. In many deployments it is a good idea to let this run once nightly.

Clean up after Copying a Database

When you copy a database from one environment to another, you should clean some tables of cluster IDs that are not applicable in the new environment. This applies in particular to modules that use revision numbers per cluster node. If the cluster nodes are not removed, the main revision table cannot delete any records since it thinks that some cluster nodes are not yet up to date.

This applies to:

  • The table wmGlobalIndexEventQueue and revision table wmLocalRevision. Remove nonexistent cluster IDs from the wmLocalRevision table, otherwise no records will be deleted from the wmGlobalIndexEventQueue table.
  • The table wm9_cluster_journal and revision table wm9_cluster_local_revision in a clustered environment. Remove nonexistent indexing cluster IDs from the wm9_cluster_local_revision table, otherwise the JCR janitor will not remove any records from the wm9_cluster_journal table.
  • The table rtindex_items and revision table rtindex_revision when real-time indexing is used. Remove nonexistent cluster IDs from the rtindex_revision table, otherwise no records will be deleted from the rtindex_items table.

AccessValveFilter

The AccessValveFilter by default uses a maximum of 5 simultaneous HTTP requests. Change it to a larger number, for example the maximum number of threads that the application server can handle. In addition, increase the refreshmax parameter of the filter, for example to 45.

  

The AccessValveFilter of XperienCentral controls the maximum number of requests an XperienCentral server will handle simultaneously.

It can be configured via 3 parameters in the web.xml: max, refreshmax and timeout.

Parameter “max”

The total maximum of requests that will be handled simultaneously.

The default value and the value that is used when no max parameter is specified is 5.

When this maximum is reached, additional requests are placed in a waiting queue until the number of running requests falls below this maximum.

In addition to the absolute maximum number of requests, you can also configure the following:

Parameter “refreshmax”

When the refreshmax number of requests is reached, the server will only handle additional requests for cached content until the total number of requests falls below refreshmax again.

Specifying a refreshmax value that is lower than the max parameter ensures that Interactive Forms requests are given a higher priority than the processing of content updates, for example when a page has been changed by an editor.

When the refreshmax parameter is not specified the value that is used will be the same as the max parameter.

Parameter “timeout”

The timeout parameter determines the amount of time, in seconds, that a request is considered an active request.

When a request cannot be completed within this timeout, it is no longer counted when determining whether the max or refreshmax number of requests has been reached.

When the timeout parameter is not specified, all requests are placed in the queue until they are handled.

The default web.xml of XperienCentral contains this AccessValveFilter configuration:


    <filter>
        <description>FIXME</description>
        <display-name>AccessValveFilter</display-name>
        <filter-name>AccessValveFilter</filter-name>
        <filter-class>nl.gx.filter.access.AccessValveFilter</filter-class>
        <init-param>
            <description>FIXME</description>
            <param-name>max</param-name>
            <param-value>5</param-value>
        </init-param>
    </filter>


Here is an example of how the AccessValveFilter can be configured in practice:


    <filter>
        <description>FIXME</description>
        <display-name>AccessValveFilter</display-name>
        <filter-name>AccessValveFilter</filter-name>
        <filter-class>nl.gx.filter.access.AccessValveFilter</filter-class>
        <init-param>
            <description>FIXME</description>
            <param-name>max</param-name>
            <param-value>100</param-value>
        </init-param>
        <init-param>
            <description>FIXME</description>
            <param-name>refreshmax</param-name>
            <param-value>40</param-value>
        </init-param>
    </filter>


This configuration limits the number of simultaneous requests that require content to be refreshed to 40 and reserves the remaining capacity, between 40 and 100 simultaneous requests, for requests for cacheable content.


The values max=100 and refreshmax=40 are probably very conservative, but that can only be determined reliably by performing a load test based on the expected peak load of the website.


It is important to note that other systems, such as back-end systems, are sometimes accessed during request handling.

A high(er) number for refreshmax can overload these other systems. When the performance of these other systems cannot be improved, a lower refreshmax should be considered.
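The interaction between max and refreshmax can be summarized as a decision rule. The following Python sketch is only a reading of the parameter descriptions above, not the filter's actual implementation: the function name is hypothetical, the timeout parameter is left out, and a result of False only means that the request is not handled immediately.

```python
from typing import Optional


def admit(active_requests: int, served_from_cache: bool,
          max_requests: int = 5, refreshmax: Optional[int] = None) -> bool:
    """Decide whether a new request is handled immediately."""
    if refreshmax is None:
        # Without refreshmax, the value of max is used.
        refreshmax = max_requests
    if active_requests >= max_requests:
        # Absolute limit reached: the request has to wait.
        return False
    if active_requests >= refreshmax and not served_from_cache:
        # Above refreshmax only requests for cached content are handled.
        return False
    return True
```

With max=100 and refreshmax=40, admit(40, False, 100, 40) returns False while admit(40, True, 100, 40) returns True, which matches the behavior described for the example configuration.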

Known Issues

Check JIRA to find any known performance issues in a specific XperienCentral version. Some known issues:

  • Performance issue with file distribution on MS SQL 2012 (GXWM-29415)
  • Personalization dialog very slow when BlueConic integration is active (GXWM-29507)
  • Performance issue in page tree (same Ajax request fired 6 times) (GXWM-29107)
  • Performance issue when many pages in page tree are expanded (GXWM-29112)
  • Content index queue service runs fullIndex, even when limit is set to 0 (GXWM-29098)
  • Major lock contention in IndexQueueServiceImpl (GXWM-29088)
  • The Entity manager should use hasNode instead of catching PathNotFoundException for performance reasons (GXWM-29095)

Maximum Stale Time

There is a setting called cache_max_stale_time on the General tab of the Setup Tool. By default, this value is 0, which means that this feature is disabled. If the max stale time is greater than 0, content is still returned from the cache for a while after the timestamp has expired. For example, if the max stale time is 5 minutes, content for which a cached version is available is returned from the cache for another 5 minutes after the timestamp has expired. This effectively creates a soft timestamp mechanism for all content.
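The soft timestamp behavior can be illustrated with a small decision function. This is a minimal sketch in Python based only on the description above; the function name, the use of seconds as the unit and the three result labels are assumptions made for illustration and do not reflect the actual cache implementation.

```python
def cache_decision(now: float, expires_at: float, max_stale_seconds: float = 0) -> str:
    """Classify a cached item as fresh, stale (still served) or expired."""
    if now < expires_at:
        return "fresh"
    if now < expires_at + max_stale_seconds:
        # Timestamp expired, but still within the max stale window.
        return "stale"
    return "expired"
```

With the default of 0, an item goes directly from fresh to expired. With a max stale time of 300 seconds, an item that expired 200 seconds ago is still served from the cache.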