Performance Checklist
This topic contains a short overview of all the factors in an XperienCentral deployment that affect performance.
Check the Versions of Related Software
Check which versions of XperienCentral, Java, the web server, the application server and the database are being used. Performance may have been improved or performance issues may have been fixed in more recent versions of these software components, so running the latest versions usually provides the best performance. Also check the latest XperienCentral hardware and software requirements. You can view which software versions you are running in the Cluster tab of the Monitoring Dashboard.
Hardware
Check whether your hardware meets the minimum requirements (see Hardware and Software Requirements).
Server Setup
Investigate how many backends and frontends are being used. You can use the status and controller_status Administrative Pages to see how much traffic the servers are handling. Are the backends only used by editors or do they also serve frontend visitors? What algorithm does the load balancer use to balance the load in the cluster? Use the Administrative Pages to find information about the frontends and backends.
Content
Does the XperienCentral installation have multiple channels, a large number of pages and/or Content Repository items? How large is the JCR? Using the Administrative Pages, you can see information about all content items in your channel(s).
Indexing Settings
How frequently does the indexing job run? When is it scheduled and how long does it take? Does the site use real time indexing or full indexing?
Time Synchronization
Are the times on all servers synchronized? When the time between servers is not properly synchronized, this can cause one of two problems, depending on which server is ahead in time:
- If the backend is ahead in time, the frontend fails to serve content from the cache for a period of time that equals the time difference.
- If the frontend is ahead in time, it fails to refresh and keeps serving outdated content for a period of time that equals the time difference between it and the backend.
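The two cases above can be sketched as a small calculation. This is only an illustration: the function name and the idea of passing in two clock readings taken at the same moment are assumptions and not part of XperienCentral.

```python
from datetime import datetime, timedelta


def describe_clock_skew(backend_time: datetime, frontend_time: datetime) -> str:
    """Describe the expected caching symptom for a pair of clock readings.

    Hypothetical helper: both readings are assumed to be taken at the same
    moment, for example by querying each server in quick succession.
    """
    skew = backend_time - frontend_time
    if skew > timedelta(0):
        # Backend is ahead: the frontend cannot serve from the cache for `skew`.
        return f"backend ahead by {skew}: frontend cannot serve from the cache for that long"
    if skew < timedelta(0):
        # Frontend is ahead: it keeps serving outdated content for `-skew`.
        return f"frontend ahead by {-skew}: frontend serves outdated content for that long"
    return "clocks are in sync"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    print(describe_clock_skew(now + timedelta(seconds=90), now))
```

Even a difference of a minute or two translates directly into a window of the same length in which the cache misbehaves, which is why the servers should be kept in sync with a time service.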
Use of SSIs
Does the site use many SSIs, especially ones with a short cache timeout or no cache timeout? SSIs can be seen using ssidebug=true. If Apache SSIs are used, are pages without Apache SSIs dumped to .html and pages with Apache SSIs dumped to .shtml? In XperienCentral versions 10.9 and earlier, SSIs are executed synchronously, which means that if one page has many SSIs, these server-side HTTP requests are invoked one after another. From XperienCentral 10.10 onwards, the number of concurrent SSI requests is controlled by the render_threads and render_threads_incontext configuration settings (see the General tab of the Setup Tool).
Multiple SSIs can slow down page requests significantly. You can see the SSIs being executed using the Threads Administrative Page. For example:
Thread[TP-Processor39,5,main] TIMED_WAITING refresh
at java.lang.Object.wait(Native Method)
at nl.gx.proxy.storage.URLManager$Task.waitFor(URLManager.java:696)
at nl.gx.proxy.storage.URLManager.getConnection(URLManager.java:206)
at nl.gx.proxy.storage.Cache.getConnection(Cache.java:229)
at nl.gx.proxy.storage.Cache$CacheFactory.getConnection(Cache.java:658)
at nl.gx.webmanager.handler.util.Util.getShowConnection(Util.java:452)
If you see this sort of stack trace often, it might be an indication that there are too many SSIs and/or that SSI invocation takes a long time.
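Counting such threads by hand is tedious when a dump contains hundreds of them. The following is a minimal sketch, assuming the dump text uses the same layout as the example above (a `Thread[...] STATE` header line followed by `at ...` frames). The function names are hypothetical and the parser is not an XperienCentral tool.

```python
import re
from typing import Dict, List

# Header lines look like: Thread[TP-Processor39,5,main] TIMED_WAITING refresh
HEADER = re.compile(r"^Thread\[(?P<name>[^,\]]+)[^\]]*\]\s+(?P<state>[A-Z_]+)")

# Frame taken from the example stack trace; it marks a thread waiting on an SSI.
SSI_WAIT_FRAME = "nl.gx.proxy.storage.URLManager$Task.waitFor"


def parse_threads(dump: str) -> List[Dict[str, object]]:
    """Split a thread dump into threads with their name, state and frames."""
    threads: List[Dict[str, object]] = []
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {"name": match["name"], "state": match["state"], "frames": []}
            threads.append(current)
        elif line.startswith("at ") and current is not None:
            current["frames"].append(line[3:])
    return threads


def threads_waiting_on_ssi(dump: str) -> List[str]:
    """Return the names of threads whose stack contains the SSI wait frame."""
    return [
        t["name"]
        for t in parse_threads(dump)
        if any(frame.startswith(SSI_WAIT_FRAME) for frame in t["frames"])
    ]
```

If the number of names returned stays high across several dumps taken a few seconds apart, that supports the suspicion that SSIs are the bottleneck.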
Check HTTP Requests
Using the status and controller_status Administrative Pages you can see the outstanding HTTP requests and the most recently handled HTTP requests. Which requests took a long time to complete? How many requests are currently open? How much traffic is the server handling? Be sure to verify the status on each backend and frontend individually.
Dumping Pages
Make sure that pages with very high traffic (like the homepage) are being dumped.
Minimize/Aggregate JS/CSS/img
Are static JS and CSS files minified and aggregated? Are images reduced in size by a tool like Smush Image Compression? You could also consider compressing the statics in Apache (by using mod_deflate).
Optimize JS Invocation
You can optimize the speed of a web page by choosing how JavaScript is included on the page. JavaScript included using a plain <script> tag is loaded and executed synchronously: the browser stops rendering the rest of the page until the script has run. Therefore, in general, it is a good idea to include those scripts just before the </body> tag. By doing so, the HTML is loaded and visible to the visitor, and the page appears to be loaded while the JavaScript is still being fetched and executed. Furthermore, in HTML5 you can use the defer and async attributes to fine-tune JS loading. Using defer, the script is downloaded in parallel and executed after the page has been parsed, similar to what happens when it is placed just before the </body> tag. Using async, the script is downloaded in parallel and executed as soon as it is available.
Cache Headers
Are the Keep-Alive headers and the caching headers of static files configured properly? You can use Firebug in Firefox and Developer Tools in Chrome to see the exact headers being returned. The best way to cache static files is to give each one of them a version number, so that a changed file gets a new URL. This allows static files to be cached forever.
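One way to version static files is sketched below. It uses a short content hash in place of a manually maintained version number, which is a substitute technique and not something XperienCentral prescribes. The function name and the example path are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_length: int = 8) -> str:
    """Return `path` with a short content hash inserted before the extension.

    The hash changes whenever the file content changes, so the resulting URL
    can be cached indefinitely without ever serving an outdated file.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    original = PurePosixPath(path)
    return str(original.with_name(f"{original.stem}.{digest}{original.suffix}"))


if __name__ == "__main__":
    # Prints something like /static/css/site.<8 hex characters>.css
    print(versioned_name("/static/css/site.css", b"body { margin: 0; }"))
```

The templates that reference the file must of course emit the versioned name, which is typically handled in the build or deployment step.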
Thread Analysis
Use the Threads Administrative Page to retrieve a thread dump at runtime. Alternatively, an application manager or developer can perform a kill -quit on the Java process to create a thread dump and write it to the log file. When you compare several thread dumps created shortly after each other, you may find threads that stay in the same place for an inordinate amount of time. Blocked threads need to be investigated because they are waiting to acquire a lock; knowing what that lock is can help you identify problems. See http://geekexplains.blogspot.com/2008/07/threadstate-in-java-blocked-vs-waiting.html for a good explanation of the possible thread states and what they mean.
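Comparing dumps can be partly automated. The sketch below assumes each dump uses a `Thread[name,priority,group] STATE` header line followed by `at ...` frames, as the Threads Administrative Page output appears to do; a dump produced by the JVM itself has a different layout and would need a different parser. All names here are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

HEADER = re.compile(r"^Thread\[(?P<name>[^,\]]+)[^\]]*\]\s+(?P<state>[A-Z_]+)")


def top_frames(dump: str) -> Dict[str, Tuple[str, str]]:
    """Map each thread name to (state, top stack frame) for one dump."""
    result: Dict[str, Tuple[str, str]] = {}
    name = None
    for raw in dump.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            name = match["name"]
            result[name] = (match["state"], "")
        elif line.startswith("at ") and name is not None and result[name][1] == "":
            # Only the first frame after the header is the top of the stack.
            result[name] = (result[name][0], line[3:])
    return result


def stuck_threads(dumps: List[str], state: Optional[str] = None) -> Dict[str, Tuple[str, str]]:
    """Return threads with the same state and top frame in every dump.

    Pass state="BLOCKED" to ignore idle pool threads, which also tend to
    sit in the same place from one dump to the next.
    """
    if not dumps:
        return {}
    snapshots = [top_frames(dump) for dump in dumps]
    stuck = {}
    for name, info in snapshots[0].items():
        if state is not None and info[0] != state:
            continue
        if info[1] and all(other.get(name) == info for other in snapshots[1:]):
            stuck[name] = info
    return stuck
```

A thread that is reported for three or four dumps in a row with state BLOCKED is a good starting point for the lock investigation described above.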
Memory Analysis
If you enable garbage collection logging in the start script of XperienCentral, all major and minor garbage collections are logged. Typically, minor collections are performed often, but they should complete quickly. Major garbage collections should not be performed frequently because they take a significant amount of time to finish. If you use the Concurrent Mark Sweep (CMS) garbage collector, look for the message "concurrent mode failure" because this causes a temporary "stop the world" pause.
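A quick way to scan a garbage collection log for these events is sketched below. The exact log format depends on the JVM version and the logging flags used, so the "Full GC" marker is an assumption based on older HotSpot log output, and the function name is hypothetical.

```python
from typing import Dict, Iterable

# Substrings to look for. "concurrent mode failure" is the message named in
# the text; "Full GC" is an assumed marker for major collections.
MARKERS = ("concurrent mode failure", "Full GC")


def scan_gc_log(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines contain each marker."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_gc_log.py /path/to/gc.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for marker, count in scan_gc_log(handle).items():
            print(f"{marker}: {count}")
```

A steadily growing count for either marker during normal operation suggests that the heap settings or the application's memory use need a closer look.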
Using jmap you can also generate a heap summary (jmap -heap) and an overview of the objects present on the heap (jmap -histo). Dumping the complete heap provides the most useful information; however, the dump can be quite large and difficult to retrieve. More information about the use of YourKit and solving OOMs can be found here.
Browser Developer Tools
Tools like Firebug and the Chrome Developer Tools help you monitor the incoming and outgoing HTTP traffic including the following:
- How many separate CSS and JS files are loaded when you open a page.
- Cached static files. In most tools you can see whether they are served from the cache or not. Alternatively, you can check the HTTP response headers.
- Static files, like images, and their sizes.
- How long HTTP requests take to process.
Web Performance Tools
There are several tools that can help you performance test your web pages. Some of them provide a report that suggests improvements. Two such programs are Pingdom and JMeter.
Profiling
If you have access to a test, staging or development environment, you can use profilers like YourKit to find out in detail which classes/methods consume the most CPU time. YourKit can also be used for memory analysis.
XperienCentral Monitoring Tools
The Administrative Pages help you inspect many important metrics regarding your XperienCentral deployment.
Log Files
Check the Tomcat log file for errors or warnings that appear often and also check the size of the log file. You can edit the logging.properties file to customize the log entries (by severity). See Avoiding Clogged Logfiles. Is the log file regularly rotated? Check the web server access log files to see the incoming HTTP requests. Pay special attention to the identity of the user agents. Bots like Googlebot and Bingbot often generate large amounts of traffic during their crawls. Usually you can use the tools provided by the bot owner to configure how the bot crawls your site.
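To see which user agents generate the most requests, the access log can be summarized with a short script. This is a sketch that assumes the web server writes the common "combined" log format, in which the user agent is the last double-quoted field on each line; adjust the pattern if your log format differs. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# In the combined log format the user agent is the last double-quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most frequent user agents with their request counts."""
    counts: Counter = Counter()
    for line in lines:
        match = USER_AGENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    import sys

    # Usage: python top_user_agents.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for agent, count in top_user_agents(handle):
            print(f"{count:8d}  {agent}")
```

If one or two crawlers dominate the list, that is a cue to review their crawl settings.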
Session Timeout
Verify the number of concurrent HTTP sessions using Tomcat management tools and check the configuration for the session timeout. A high timeout, 60 minutes for example, can lead to too many sessions open at one time.
JackRabbit Settings
Check the JackRabbit configuration settings in the repository.xml file if you experience out of memory errors.
Scheduled Tasks
There are job schedules that you configure in the General tab of the Setup Tool that can have a major impact on performance:
Setting | Explanation |
---|---|
formsengine_prehandling_cachetimeout | The PreHandlingElementHolderCacheRecalculator task evaluates whether a form element requires prehandling. If so, it updates the prehandling cache. By default, this is configured to run every 2 minutes. If your site has many form elements this could have a major impact on performance. In that case it might be a good idea to set this setting to a longer time period. |
current_rollover_detector_schedule | The CurrentRolloverDetectorImpl task checks whether content items need to be rolled over by looping over all content items every 5 minutes. A loop time of every 15 or 30 minutes may be better. |
contentindex_queue_poller_schedule | The IndexQueueServiceImpl task updates the content index for inline search. Usually this is scheduled for every 20 seconds; however, an indexing run might take longer than that, which causes lock contention. |
contentindex_queue_iteration_limit | This setting determines the maximum number of items to index when a full content index is triggered. |
contentindex_optimize_schedule | The content search optimizer runs a Lucene optimization every x minutes. Optimizing is very expensive and causes a lot of input/output. In many deployments it is a good idea to let this run once nightly. |
Clean up after Copying a Database
When you copy a database from one environment to another, you should clean some tables of cluster IDs that are not applicable in the new environment. This applies in particular to modules that use revision numbers per cluster node. If the cluster nodes are not removed, the main revision table cannot delete any records since it thinks that some cluster nodes are not yet up to date.
This applies to:
- The table wmGlobalIndexEventQueue and revision table wmLocalRevision. Remove nonexistent cluster IDs from the wmLocalRevision table, otherwise no records will be deleted from the wmGlobalIndexEventQueue table.
- The table wm9_cluster_journal and revision table wm9_cluster_local_revision in a clustered environment. Remove nonexistent indexing cluster IDs from the wm9_cluster_local_revision table, otherwise the JCR janitor will not remove any records from the wm9_cluster_journal table.
- The table rtindex_items and revision table rtindex_revision when real-time indexing is used. Remove nonexistent cluster IDs from the rtindex_revision table, otherwise no records will be deleted from the rtindex_items table.
AccessValveFilter
The AccessValveFilter by default allows a maximum of 5 simultaneous HTTP requests. Change it to a larger number, e.g. the maximum number of threads that the application server can handle. In addition, increase the refreshmax parameter of the filter, e.g. to 45.
Known Issues
Check JIRA to find any known performance issues in a specific XperienCentral version. Some known issues:
- Performance issue with file distribution on MS SQL 2012 (GXWM-29415)
- Personalization dialog very slow when BlueConic integration is active (GXWM-29507)
- Performance issue in page tree (same Ajax request fired 6 times) (GXWM-29107)
- Performance issue when many pages in page tree are expanded (GXWM-29112)
- Content index queue service runs fullIndex, even when limit is set to 0 (GXWM-29098)
- Major lock contention in IndexQueueServiceImpl (GXWM-29088)
- The Entity manager should use hasNode instead of catching PathNotFoundException for performance reasons (GXWM-29095)
Maximum Stale Time
There is a setting called cache_max_stale_time on the General tab of the Setup Tool. By default, this value is 0, which disables the feature. If the max stale time is greater than 0, content is still returned from the cache for a while after the timestamp has expired. For example, if the max stale time is 5 minutes, content for which a cached version is available is returned from the cache for another 5 minutes after the timestamp has expired. This effectively creates a soft timestamp mechanism for all content.
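The soft timestamp behaviour can be illustrated with a small decision function. This is a sketch of the idea only, not XperienCentral's implementation: the function name, the use of seconds as the unit and the handling of the exact expiry moment are all assumptions.

```python
def may_serve_from_cache(now: float, expires_at: float, max_stale_seconds: float = 0) -> bool:
    """Return True if a cached entry may still be served at time `now`.

    All values are in seconds on the same clock. With max_stale_seconds=0
    (the default) an entry stops being served once its timestamp expires.
    """
    return now < expires_at + max_stale_seconds


if __name__ == "__main__":
    expires_at = 1_000.0
    # Two minutes after expiry, with a max stale time of 5 minutes (300 s).
    print(may_serve_from_cache(expires_at + 120, expires_at, max_stale_seconds=300))
    # The same moment with the feature disabled.
    print(may_serve_from_cache(expires_at + 120, expires_at))
```

The trade-off is that visitors may see content that is up to the max stale time older than they otherwise would, in exchange for fewer requests that have to wait for fresh content to be generated.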