Prerender.io

Last Updated: Jul 19, 2022
documentation for the dotCMS Content Management System

dotCMS includes an app that connects to and controls Prerender.io, a web service that can improve SEO rankings for client-side rendered JavaScript/SPA pages. The dotCMS Prerender app intercepts requests from indexing bots, such as Googlebot or Bingbot, and proxies them to the Prerender.io service. The service does the work of “prerendering” the JavaScript app, including any content returned by client-side API calls. It then returns the resulting rendered (or “hydrated”) HTML to dotCMS, which passes it on to the requesting bot for indexing.

To use this app, you need either an account with Prerender.io or your own instance of the Prerender.io open-source application, which your dotCMS instance can connect to and use.

How the App Works

  1. Check whether a prerendered page should be served:
    1. Check whether the request is from a crawler, as defined in crawlerUserAgents, or whether the request URL contains an _escaped_fragment_ parameter.
    2. Check that the request is not for a resource such as a JS or CSS file.
    3. (optional) Check that the URL is in the whitelist.
    4. (optional) Check that the URL isn't in the blacklist.
  2. Make a GET request to the Prerender service (PhantomJS server) for the page's prerendered HTML.
  3. Return that HTML to the crawler.
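
The checks in step 1 can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code: the function name, the list of resource extensions, the case-insensitive substring match on the User-Agent, and the choice to match regexes against the whole URL path are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of static-resource extensions; the app's real list may differ.
RESOURCE_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".ico", ".svg", ".woff")


def should_prerender(url, user_agent, crawler_user_agents, whitelist=(), blacklist=()):
    """Return True if a request looks like it should be sent for prerendering."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)

    # Check 1: a known crawler, or an _escaped_fragment_ query parameter.
    is_crawler = any(bot.lower() in user_agent.lower() for bot in crawler_user_agents)
    has_fragment = "_escaped_fragment_" in query
    if not (is_crawler or has_fragment):
        return False

    # Check 2: skip static resources such as JS and CSS files.
    if parts.path.lower().endswith(RESOURCE_EXTENSIONS):
        return False

    # Check 3 (optional): if a whitelist is set, the path must match one entry.
    if whitelist and not any(re.fullmatch(p, parts.path) for p in whitelist):
        return False

    # Check 4 (optional): the path must not match any blacklist entry.
    if any(re.fullmatch(p, parts.path) for p in blacklist):
        return False

    return True
```

For example, should_prerender("https://www.my-spa.com/blogs/improving-our-seo", "googlebot", ["googlebot", "bingbot"]) returns True in this sketch, while the same URL requested with an ordinary browser User-Agent returns False.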

Customization

Prerender Service URL

Defaults to the Prerender.io service at http://service.prerender.io/. If you've deployed the open-source Prerender.io service on your own infrastructure, you can set this URL to point there instead.

prerenderToken

This is the token from your Prerender.io account used to validate the prerender request.
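
As a rough illustration of how the service URL and token fit together, the sketch below builds the target URL and headers for a prerender request. It assumes the common Prerender middleware convention of appending the page URL to the service URL and sending the token in an X-Prerender-Token header; whether the dotCMS app does exactly this internally is an assumption, and the function name is hypothetical.

```python
def build_prerender_request(page_url, token, service_url="http://service.prerender.io/"):
    """Return the (target URL, headers) pair for a prerender GET request."""
    # Assumed convention: <service URL>/<full page URL>.
    target = service_url.rstrip("/") + "/" + page_url
    # Assumed convention: the account token travels in this request header.
    headers = {"X-Prerender-Token": token}
    return target, headers
```

Pointing service_url at a self-hosted instance changes only the first part of the target URL.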

protocol

If you want to make sure that the Prerender service queries your site over a specific protocol, set the protocol parameter to https or http. In most cases this should be https.

whitelist

This is a comma-separated list of URL regular expressions (regexes). When it is set, only URLs that match one of these regexes will be sent to Prerender.io for rendering.

Example: /products/.*,/blog/.*

blacklist

This is a comma-separated list of URL regexes that will never be sent to Prerender.io for rendering.

Example: /images/.*,/css/.*
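
To show how the two lists interact, here is a small Python sketch that parses the comma-separated values and applies them to a URL path. It is an illustration only: the function names are hypothetical, and matching each regex against the whole path is an assumption about the app's behavior.

```python
import re


def parse_patterns(value):
    """Split a comma-separated config string into compiled regexes."""
    return [re.compile(item.strip()) for item in value.split(",") if item.strip()]


def is_allowed(path, whitelist="", blacklist=""):
    """Apply the whitelist (if any) first, then the blacklist."""
    allow = parse_patterns(whitelist)
    deny = parse_patterns(blacklist)
    # An empty whitelist allows everything; otherwise one entry must match.
    if allow and not any(p.fullmatch(path) for p in allow):
        return False
    # Any blacklist match excludes the path.
    return not any(p.fullmatch(path) for p in deny)
```

With the example values above, is_allowed("/products/shoes", "/products/.*,/blog/.*", "/images/.*,/css/.*") returns True and is_allowed("/about", "/products/.*,/blog/.*") returns False. Because this sketch splits on every comma, a regex that itself contains a comma, such as a {1,3} quantifier, would be broken into two patterns.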

crawlerUserAgents

This is a comma-separated list of strings that will be matched against the request's User-Agent header. If one of these strings matches, the request will be proxied to Prerender.io.

Example: googlebot,bingbot

forwardedURLHeader

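Names the request header that carries the original public URL. This is important for servers behind a reverse proxy, where the URL that dotCMS sees differs from the public URL that should be used for prerendering.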

Testing

You can test if your requests are being prerendered by setting the correct User-Agent header in a page request. For example:

curl --head -H 'User-Agent: googlebot' https://www.my-spa.com/blogs/improving-our-seo 

If the request was prerendered, the response will include a Prerender request ID header, something like:

x-prerender-requestid: 28fcca74-71a4-4b5f-a293-380e81ec4cac
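
The same check can be scripted. The following Python sketch sends a HEAD request with a crawler User-Agent and looks for the x-prerender-requestid header shown above. The function names are hypothetical, and the sketch assumes the header is present only on prerendered responses.

```python
import urllib.request

PRERENDER_HEADER = "x-prerender-requestid"


def prerender_request_id(headers):
    """Return the Prerender request ID from response headers, or None."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(PRERENDER_HEADER)


def check_page(url, user_agent="googlebot"):
    """Send a HEAD request as a crawler and return the request ID, if any."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return prerender_request_id(response.headers)
```

Calling check_page("https://www.my-spa.com/blogs/improving-our-seo") would return the request ID string if the page was prerendered, or None otherwise.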

Note on Hashbang Navigation

If you are using a # in your URLs, make sure to change it to #!, known as a hashbang.

For hashbang URLs

To see: http://localhost:3000/#!/profiles/1234
Go to: http://localhost:3000/?_escaped_fragment_=/profiles/1234

For push-state URLs

To see: http://localhost:3000/profiles/1234
Go to: http://localhost:3000/profiles/1234?_escaped_fragment_=
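
The two mappings above can be expressed as a short Python helper. This is a simplified sketch with a hypothetical function name; it percent-encodes special characters in the fragment other than "/", which is a simplification of the full escaping rules in the AJAX crawling scheme.

```python
from urllib.parse import quote


def to_escaped_fragment_url(url):
    """Convert a hashbang or push-state URL to its _escaped_fragment_ form."""
    if "#!" in url:
        # Hashbang URL: move everything after "#!" into the query parameter.
        base, fragment = url.split("#!", 1)
        value = quote(fragment, safe="/")
    else:
        # Push-state URL: append an empty _escaped_fragment_ parameter.
        base, value = url, ""
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}_escaped_fragment_={value}"
```

Passing http://localhost:3000/#!/profiles/1234 yields http://localhost:3000/?_escaped_fragment_=/profiles/1234, and passing http://localhost:3000/profiles/1234 yields http://localhost:3000/profiles/1234?_escaped_fragment_=.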

For general information about AJAX crawling, see Google's documentation on its AJAX crawling protocol.
