The /wiki/spaces/PD/pages/24707083 page explains how to add or change the metadata of a Solr document. However, the one piece of information you cannot change that way is the document's location. Sometimes you might need to modify a document's location. For example, your website might run on HTTPS while the SSL offloading is performed by a load balancer in front of your server. In a case like this, indexing the website with HTTPS URLs is not possible and you have to change them to HTTP. The wmasolrsearch bundle in XperienCentral defines a UrlProvider service in the nl.gx.product.wmasolrsearch.api package just for this purpose.
Adding URLs
To add your own set of URLs to the Solr index, you can define a new service that implements the UrlProvider interface and its method List<String> getUrlList(boolean includeAll). Using the OSGi dependency mechanism in XperienCentral, this new UrlProvider service is automatically picked up by the SearchService and its list of URLs is appended to the default list in XperienCentral.
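As an illustration, a provider that contributes a fixed set of extra URLs could look roughly like the sketch below. The UrlProvider interface declared here is only a local stand-in with the method signature mentioned above, so that the example compiles on its own; in a real bundle you would import the interface from nl.gx.product.wmasolrsearch.api instead. The class name ExtraUrlProvider, the example.com URLs, and the reading of includeAll as "a full index run" are assumptions made for this example, and registering the class as an OSGi service is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for nl.gx.product.wmasolrsearch.api.UrlProvider so that this
// sketch compiles on its own. In a real bundle, import the real interface.
interface UrlProvider {
    List<String> getUrlList(boolean includeAll);
}

// Hypothetical provider that contributes a fixed set of extra URLs.
class ExtraUrlProvider implements UrlProvider {

    // Placeholder URLs that are always returned.
    private static final List<String> ALWAYS = Arrays.asList(
            "http://www.example.com/landing");

    // Placeholder URLs that are only returned when includeAll is true.
    // ASSUMPTION: includeAll is interpreted here as "full index run".
    private static final List<String> FULL_ONLY = Arrays.asList(
            "http://www.example.com/archive/2019",
            "http://www.example.com/archive/2020");

    @Override
    public List<String> getUrlList(boolean includeAll) {
        // Copy into a new list so callers can modify the result safely.
        List<String> urls = new ArrayList<>(ALWAYS);
        if (includeAll) {
            urls.addAll(FULL_ONLY);
        }
        return urls;
    }
}
```

Calling new ExtraUrlProvider().getUrlList(false) returns only the landing page URL, while getUrlList(true) returns all three URLs.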
Changing or excluding URLs
If you want to change the URLs determined by the default UrlProvider of GX WebManager, you can do this by first capturing these URLs, changing them, and then feeding them to the SearchService in the wmasolrsearch add-on:
    private UrlProvider m_urlProvider;     // Injected by OSGi
    private SearchService m_searchService; // Injected by OSGi

    private void index(boolean fullIndex, boolean clearRest) {
        // Get all the URLs from GX WebManager that should be indexed
        String[] urls = m_urlProvider.getUrls(fullIndex);

        // Update the URLs in the list as you see fit
        ...

        // Index the URLs, allowing all hostnames and setting the follow-URL depth to 0.
        // The SearchService class also has a different indexPages method in which you can
        // specify the allowed hostnames and depth.
        m_searchService.indexPages(urls, clearRest);
    }
When you do this, you normally do not want the default indexer task of GX WebManager to run. You can disable it by emptying the wmasolrsearch.crontabschedule configuration setting on the GX WebManager setup page.
The following fragment shows the same steps in more detail, including logging and the optional list of allowed hostnames:

    // Get all the URLs that should be indexed
    String[] urls = m_urlProvider.getUrls(indexFullContent);
    LOG.info("The URL provider returned " + urls.length + " URLs to index.");

    // Add, remove or change the URLs
    urls = updateUrls(urls);
    LOG.info("Updating the URLs resulted in " + urls.length + " URLs to index.");

    // Index the URLs
    String allowedHostNames = m_configUtil.getAllowedHostNames();
    if (allowedHostNames != null && !allowedHostNames.trim().isEmpty()) {
        // Restrict indexing to the configured hostnames and set the depth to 0
        m_searchService.indexPages(urls, allowedHostNames.split(","), 0, m_configUtil.getClearRest());
    } else {
        // Allow all hostnames and set the depth to 0
        m_searchService.indexPages(urls, m_configUtil.getClearRest());
    }
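The updateUrls helper called in the fragment above is not part of the add-on; you have to write it yourself. A minimal sketch of such a helper is shown below, assuming the HTTPS-to-HTTP scenario described in the introduction. The class name UrlUpdater and the decision to skip empty entries and duplicates are choices made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical home for the updateUrls helper; the name is an assumption.
class UrlUpdater {

    private static final String HTTPS_PREFIX = "https://";

    // Rewrites https:// URLs to http://, skips null or blank entries,
    // and drops duplicates while keeping the original order.
    static String[] updateUrls(String[] urls) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (url == null || url.trim().isEmpty()) {
                continue;
            }
            String updated = url.trim();
            // Case-insensitive check for the https:// scheme prefix.
            if (updated.regionMatches(true, 0, HTTPS_PREFIX, 0, HTTPS_PREFIX.length())) {
                updated = "http://" + updated.substring(HTTPS_PREFIX.length());
            }
            if (!result.contains(updated)) {
                result.add(updated);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

Note that this sketch only replaces the scheme. A URL with an explicit port, such as one ending in :443, keeps that port and would need additional handling.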
URLs can be changed or removed by creating a new UrlFilter service. The filterURLs method must be implemented and must return the new, filtered URLs. Similar to the UrlProvider, all UrlFilter services are automatically picked up by the SearchService when the OSGi bundle is installed.
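As a rough sketch, a filter that excludes every URL under a given path could look like the example below. The exact signature of filterURLs is not given on this page, so the UrlFilter interface declared here (a list of URLs in, a list of URLs out) is an assumption that must be checked against the real interface in the wmasolrsearch API. The class name InternalPathUrlFilter and the /internal/ prefix are likewise placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Stand-in with an ASSUMED signature so that this sketch compiles on its own.
// Check the real UrlFilter interface before using this in a bundle.
interface UrlFilter {
    List<String> filterURLs(List<String> urls);
}

// Hypothetical filter that removes all URLs whose path starts with /internal/.
class InternalPathUrlFilter implements UrlFilter {

    private static final String EXCLUDED_PREFIX = "/internal/";

    @Override
    public List<String> filterURLs(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url != null && !isExcluded(url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    private static boolean isExcluded(String url) {
        try {
            String path = URI.create(url).getPath();
            return path != null && path.startsWith(EXCLUDED_PREFIX);
        } catch (IllegalArgumentException e) {
            // Strings that cannot be parsed as a URI are kept unchanged,
            // so this filter never silently drops them.
            return false;
        }
    }
}
```

Matching on the parsed path rather than on the raw string means that a query parameter that happens to contain /internal/ does not cause a URL to be excluded.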
Example
Download the urlProviderService.zip example.