SEO Considerations & Implications of Adopting a SPA/JavaScript React Framework

Recently (as in, the past year or so) there has been a lot of discussion around the SEO implications of migrating a site to a JS framework. 

This is a really tricky topic to dissect, and I feel as if little was truly known about what would actually happen, since most clients/sites have been slow to adopt this process. Recently, though, we’re starting to see actual case studies and results from these migrations, such as the results shared on Twitter by Pedro Dias: 

So, I think the time has come to look JS in the eye and say, “Hey: why do you tank my SEO?” 

Grab a beer (or tea. or coffee.) because things are about to get technical. 

To understand why a JavaScript framework may not be great for SEO, we need to go back to the basics to remember how Google actually renders and retrieves information. 

In general, there are three main aspects to the search engine retrieval process: 

  1. Crawler (i.e. Googlebot)
  2. Indexer (for Google, this is called “Caffeine”)
  3. Query Engine (i.e. the platform itself, such as Google)

The Crawler’s job is to find all URLs on the web (or a particular website), and to crawl them. This is done by reading HTML, and by following any URLs found within the traditional <a href=""> snippet. 

Once the Crawler has found its content, it sends that content to the Indexer, whose job it is to render the content at that particular URL (or set of URLs), and to make sense of the page(s). This process depends on many things such as page layout and PageRank (which Google does still use internally to determine a URL’s authority), as well as executing JavaScript.

This relationship can be cyclical, as the Crawler sends information to the Indexer, and then the Indexer may send information back to the Crawler as it discovers new URLs by rendering the page / executing JavaScript. The Indexer also helps the Crawler prioritize URLs based on what it determines to be high-value URLs. This affects how often the Crawler visits your website, and which pages it chooses to Crawl. 

So, they feed each other.

Crawling & Indexing JavaScript

When it comes to the question of whether Googlebot can crawl and index JavaScript, we have to take into consideration the two separate processes of the Crawler and the Indexer. 

At the end of the day, the short answer is that Google can and will crawl and index JavaScript pages. However, it is not as straightforward as that. 

To understand this further, we must separate the questions: 

  • Can Google crawl JavaScript? No.
  • Can Google index JavaScript? Yes. 

This is because Googlebot (i.e. the Crawler) can only really handle HTML and CSS – traditionally built pages and code. It must rely on Caffeine (i.e. the Indexer) to actually render the JavaScript before it can crawl your URLs and send them back to the Indexer to be prioritized and evaluated. 

This process is outlined in Google’s two-wave process for JS rendering and indexing, as shown below: 

(Image: Google’s two-wave process for JS rendering and indexing. Source: thesempost.com)

Overall, this makes the process of crawling and indexing a JavaScript site extremely inefficient and slow.

This is because on JavaScript sites (which use client-side rendering rather than server-side rendering), most (or all) internal links are not actually part of the HTML source code – what is handed to the Crawler initially is mostly a blank HTML document plus a large JS bundle (which takes a long time to download). 
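To make that concrete, here is a rough sketch (file names are placeholders) of what the raw HTML source of a typical client-side-rendered React app looks like before any JavaScript executes – essentially all the Crawler gets in the first wave:

    <!DOCTYPE html>
    <html>
      <head>
        <title>My App</title>
      </head>
      <body>
        <!-- An empty shell: no content and no internal links in the source -->
        <div id="root"></div>
        <!-- Content and navigation only exist after this bundle downloads and runs -->
        <script src="/static/js/bundle.js"></script>
      </body>
    </html>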

Possible Implications

So, looking at Google’s model, in CSR there’s nothing for Google to index in the source code during its first wave. And the second wave may occur hours or even a few weeks later, leaving you at risk for a partially indexed site.

The other risk factor is crawl budget. It’s not a new concept that Google does not have infinite patience; it only crawls a certain amount of content / a certain number of URLs on any given site, because it does not have unlimited resources. 

Since JavaScript websites add an extra layer of complexity to the process of crawling and indexing, there are inherent risks to your website, such as mismatched content priority (due to a perceived lack or surplus of internal links pointing to a particular URL), or again – a partially crawled and indexed website.

Also, this is only for Google, which, lucky for us, has the capability to even do anything with JavaScript-based sites. Other search engines like Bing, Yahoo, DuckDuckGo, and even Baidu do not have the same capabilities, and in most cases it’s been found that JavaScript-based pages are not even indexed, because those search engines’ indexers are not as sophisticated or powerful (remember, rendering JavaScript requires much more electricity and processing power). 

(Image source: moz.com)

So, if there’s any consideration at all for other search engines, know that your website could face real limitations on those platforms. 

Exploring Solutions

In addition to Pedro’s tweet mentioned at the beginning of this article, there have been other studies that show damage to rankings and organic traffic for sites that switch over to SPAs / other JavaScript-based technologies (like Hulu.com, for instance), and there have been others that show significant improvements when a different approach was adopted, or when reliance on JavaScript was reduced. So, we have to be careful.

The good news is that there are two main solutions which seem to help mitigate the negative implications of migrating to a JavaScript based site:

  • Isomorphic JavaScript / Isomorphic applications (sometimes called “universal applications”)
  • Pre-rendering

Isomorphic JavaScript is the solution actually recommended by Google. Here’s an explanation of both solutions though, from this article:

1. Pre-rendering: Essentially consists of listening for the search engine bot and sending it a pure HTML snapshot when it requests your page. This ensures that the user can still enjoy the fast speeds provided by CSR, while also serving the search engines the HTML content needed to index and rank your pages.

2. Isomorphic JavaScript: Recommended by Google, this option consists of both clients and search engines receiving a pre-rendered page of indexable HTML content at the initial load (essentially acting like SSR). All of the JS functionality is then layered on top of this to provide fast client-side performance. It also works best for both users and search engine bots…

To add a bit more context:

Pre-rendering can be helpful, but there are some possible pitfalls, such as having to manage and maintain another piece of software on your server, and there can sometimes be compatibility issues that cause the HTML output to be incorrect. These issues do not happen to every site, and using reliable services like https://prerender.io/ can help; however, it is something to keep in mind.
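If you do go the pre-rendering route, the setup is usually a small piece of middleware in front of your app. Here is a rough sketch, assuming an Express server and prerender.io’s prerender-node middleware (the token and port are placeholders):

    // Rough sketch of serving pre-rendered snapshots to bots with prerender.io's
    // Express middleware (prerender-node). The token and port are placeholders.
    const express = require('express');
    const prerender = require('prerender-node');

    const app = express();

    // When a recognized search engine bot requests a page, the middleware
    // returns a fully rendered HTML snapshot from the prerender service;
    // normal visitors still get the regular client-side app.
    app.use(prerender.set('prerenderToken', 'YOUR_PRERENDER_TOKEN'));

    app.use(express.static('build'));

    app.listen(3000);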

Isomorphic applications are considered the best of both worlds – the crawlability and indexability of HTML, with the speed of JavaScript. They allow the Crawler to see the same output the browser sees, because the content is already rendered and available when the search engine accesses the page. React JS is a framework that supports the isomorphic approach, so as long as your engineers have the skillset and bandwidth, I’d recommend this solution if possible. It is known to be the best for SEO purposes.

In general, even if the preferred Isomorphic JavaScript is not able to be fully implemented, I believe that the best solution should be a mix of SSR and CSR. This can allow for the initial HTML to be generated on the server while providing an interactive experience to the user. In other words, ideally there would be some level of SSR involved here.
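To illustrate what that looks like in practice, here is a minimal, simplified SSR sketch – it assumes Express, React 16+, a pre-built <App /> component, and a build step (e.g. Babel) for JSX; all names and paths are placeholders:

    // Minimal server-side rendering sketch. The server sends real, indexable
    // HTML for the requested URL, and the client bundle then takes over.
    import express from 'express';
    import React from 'react';
    import { renderToString } from 'react-dom/server';
    import App from './App';

    const app = express();

    app.get('*', (req, res) => {
      // Render the app to an HTML string on the server...
      const markup = renderToString(<App />);

      res.send(`<!DOCTYPE html>
        <html>
          <head><title>My App</title></head>
          <body>
            <div id="root">${markup}</div>
            <!-- ...then the client bundle "hydrates" that markup -->
            <script src="/static/js/bundle.js"></script>
          </body>
        </html>`);
    });

    app.listen(3000);

On the client side, the same <App /> would then be mounted with ReactDOM.hydrate() instead of ReactDOM.render(), so the server-rendered markup becomes interactive rather than being thrown away and re-rendered.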

Why an SSR/CSR Hybrid Approach?

At the end of the day, the SEO game is about efficiency and readability. So, it’s important to ensure that Googlebot and Caffeine can be highly efficient when processing your website, and that they can easily (and quickly) see the actual content, and related markup, that is on each page (through server-side rendering). 

This means taking steps so that critical content is loaded and presented to Google within 5 seconds. This is why an approach like Isomorphic JavaScript, or at least pre-rendering, can help preserve SEO while still enabling the inherent benefits of SPAs, like speed and flexibility. 

Another thing to consider when migrating your site to JS is to maintain unique, static URLs for all pages, rather than relying on hash-based routing that appends a # to the URL when new content is loaded (instead of creating a new, crawlable URL – which the History API’s pushState does). Why? Because, as you know, it’s important for SEO that each page have its own “real” URL which can be indexed. This allows those pages to build up authority and gain backlinks, and gives them an opportunity to rank for certain topics. 
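With React Router v4, for example, this largely comes down to choosing BrowserRouter (real URLs via the History API) over HashRouter (fragment URLs with a #). A simplified sketch, where the routes and components are just placeholders:

    import React from 'react';
    import { BrowserRouter, Route } from 'react-router-dom';

    // Placeholder components, just to make the sketch self-contained.
    const Home = () => <h1>Home</h1>;
    const Product = ({ match }) => <h1>Product: {match.params.slug}</h1>;

    // BrowserRouter uses the History API, so every view gets a real, indexable
    // URL (e.g. example.com/products/widget). HashRouter would instead produce
    // example.com/#/products/widget, which search engines treat as one page.
    const App = () => (
      <BrowserRouter>
        <Route exact path="/" component={Home} />
        <Route path="/products/:slug" component={Product} />
      </BrowserRouter>
    );

    export default App;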

So, all links should still include the “href” attribute so that Google can pick up those links (vs. relying solely on an onClick DOM event, which Google likely will not follow).
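For illustration (the component names are made up), here is the difference between a link a basic crawler can follow and one it likely cannot:

    import React from 'react';
    import { Link } from 'react-router-dom';

    // Crawlable: <Link> renders a real <a href="/pricing"> in the markup, so
    // Googlebot can discover the URL without executing any JavaScript.
    const CrawlableNav = () => <Link to="/pricing">Pricing</Link>;

    // Not reliably crawlable: there is no href, so the URL is only reachable
    // by running the onClick handler – something a basic crawler won't do.
    const RiskyNav = ({ history }) => (
      <span onClick={() => history.push('/pricing')}>Pricing</span>
    );

    export { CrawlableNav, RiskyNav };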

Other helpful tools for improving the SEO of an SPA/React Website include: 

  • React Router v4 (which will allow you to maintain an SEO-friendly URL structure for your website)
  • React Helmet (which will allow you to at least manage the metadata of a web document being served by React components. It’s described as “A document head manager for React.”)
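For React Helmet, for example, a minimal sketch of per-page metadata (the component and data shape are hypothetical) might look like this:

    import React from 'react';
    import { Helmet } from 'react-helmet';

    // Hypothetical product page: Helmet lets each route set its own title,
    // meta description, and canonical tag, which SSR can also render into <head>.
    const ProductPage = ({ product }) => (
      <div>
        <Helmet>
          <title>{product.name} | My Store</title>
          <meta name="description" content={product.summary} />
          <link rel="canonical" href={`https://www.example.com/products/${product.slug}`} />
        </Helmet>
        <h1>{product.name}</h1>
      </div>
    );

    export default ProductPage;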

If you’re interested in learning more and have a team of developers / programmers ready to jump in, this may also be a good resource for engineers as they seek to maximize performance and SEO through an SPA, as it outlines several technologies used in this hybrid approach: https://blog.digitalkwarts.com/server-side-rendering-with-reactjs-react-router-v4-react-helmet-and-css-modules/ 

QAing after you migrate to JS

If you’ve already migrated to JS or you’re about to, be sure you QA afterwards to ensure that your site is still able to be rendered. 

Going back to Pedro’s tweet, he left this suggestion in response to a question about how to know whether your JS is rendering properly: 

You could also check your outlinks to see how many show up with JavaScript enabled vs. disabled, as this indicates whether they can be detected by a basic crawler. Both Chrome and most crawling tools (like DeepCrawl or Screaming Frog) let you crawl with or without JS, so this should come in handy when testing. 
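As a rough, illustrative sanity check (not a replacement for a proper crawl – the URL below is a placeholder), you can fetch the raw HTML, which is roughly what a non-rendering crawler sees, and count the links in it, then compare that number to what the rendered DOM shows with JavaScript enabled:

    // Rough check: how many links exist in the raw, un-rendered HTML?
    // Compare this count to document.querySelectorAll('a[href]').length in the
    // browser console (JavaScript enabled). The URL below is a placeholder.
    const https = require('https');

    https.get('https://www.example.com/', (res) => {
      let html = '';
      res.on('data', (chunk) => (html += chunk));
      res.on('end', () => {
        const links = html.match(/<a\s[^>]*href=/gi) || [];
        console.log(`Links in raw HTML: ${links.length}`);
      });
    });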

So – what say ye? Do you have a love/hate relationship with JavaScript / React frameworks? Have you been impacted, or seen a site impacted, by a migration to JS? Share in the comments, or hit me up at @mhankins12 on Twitter!