Search engines use “crawlers” to discover pages on websites. These pages are then scored by hundreds of signals to determine if they should be “indexed” and where they should be “ranked” in the index for a given keyword.
Most technical SEO audits don’t analyze your website’s crawl, as off-the-shelf SEO tools don’t gather the proper data. This is a huge mistake, particularly for larger websites.
Mismanaging your website’s crawl can lead to:
- Decrease in your average daily crawl allocation
- Decrease in your indexation rate, and ultimately
- Decrease in your rankings and organic traffic
This webinar will explore the importance of managing your website’s crawl for increased visibility in search engines.
What is a Crawl Budget?
Google’s Gary Illyes noted there’s no “official” definition of crawl budget in a Google Webmaster blog post:
“Recently, we’ve heard a number of definitions for ‘crawl budget’, however, we don’t have a single term that would describe everything that ‘crawl budget’ stands for externally.”
He goes on to break the budget itself into three components:
1. Crawl rate limit
The number of simultaneous connections Googlebot may use, and the time it waits between fetches.
2. Crawl health
If a site responds quickly to Googlebot, the bot will see the site as healthy and can increase the crawl rate. Conversely, if the site responds poorly, the crawl rate can go down.
3. Crawl demand
If a site doesn’t have much content, isn’t adding new content, or has few links pointing to it, Google’s demand to crawl that site drops.
Hopefully you can see that there are things here we can influence, and that some of the activity you are already doing has a bigger impact on your ability to rank than you might have thought. Two further factors matter:
- Popularity – URLs that are more popular, tend to be crawled more often to maintain freshness
- Staleness – Google’s attempt to prune pages that no longer provide user value
What is my site’s crawl budget?
We can use Google Search Console (or Bing Webmaster Tools) to get the data we need.
Just log in to Search Console > Domain > Crawl > Crawl Stats to see the number of pages crawled per day.
How to read Crawl Stats reporting in GSC
Google doesn’t give any guidance on what these reports mean, so let’s take a look at 3 client websites to give some context:
1. Website with a stable crawl
The website above is experiencing a steady, healthy crawl from Googlebot. The statistics to the right show the variance in daily crawls. These variances are normal; there’s generally a large swing between “high” and “low”.
2. Website with a crawl that dropped suddenly
The website above experienced a drastic drop in pages crawled after misusing a directive in its robots.txt file. The directive told search engines NOT to crawl a large number of pages on the website, causing a sudden drop in pages crawled. In this instance, this was a bad move, as the blocked pages had value to searchers.
3. Website with an increasing crawl
The website above is experiencing an increase in pages crawled per day due to an influx of inbound links from authority websites (causing Googlebot to visit the site more). You could also see an increase in pages crawled by passing equity to pages using internal links OR publishing more content on your website.
What factors affect my site’s crawl budget?
Any of the following factors could have a negative impact on your crawl budget.
a. Faceted Navigation and Session Identifiers
For larger sites, faceted navigation is a way to filter and display certain results. It’s great for users, but not so much for Googlebot: it creates lots of duplicate content, sends Google the wrong message, and can mean Googlebot crawls pages we don’t want it to.
b. On-site duplicate content
This can happen easily on custom-built eCommerce websites, where multiple URLs serve the same content. The solution is to use canonicalization or dynamic URL parameter handling.
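As a simple illustration (the URLs below are hypothetical), two parameterized URLs serving the same product page can be collapsed with a canonical tag in the page’s `<head>`:

```html
<!-- Both of these hypothetical URLs render the same product page:
       https://example.com/shoes?color=red&sort=price
       https://example.com/shoes?sort=price&color=red
     A canonical tag tells Google which version to index: -->
<link rel="canonical" href="https://example.com/shoes" />
```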
Watch Google’s guidance on configuring URL parameters
c. Soft error pages
This is when Google sees a 200 (OK) status returned for a page that doesn’t exist and should instead return a 404 error. Google doesn’t want to waste time crawling these pages, and over time this can negatively impact how often Google visits your site.
You can check for soft errors in the Crawl Errors report in Google Search Console.
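As a rough sketch of what a soft 404 looks like in practice: the server answers 200 OK, but the body reads like an error page. The error phrases below are assumptions for illustration; tune them to your own site’s templates.

```python
def looks_like_soft_404(status_code, body):
    """Heuristic check for a 'soft 404': a 200 OK response whose body
    reads like an error page. A genuinely missing page should return
    status 404 so Googlebot stops wasting crawl budget on it."""
    if status_code == 404:
        return False  # proper hard 404 -- nothing to fix
    # Hypothetical phrases; replace with text from your own error template
    error_phrases = ("page not found", "no longer available", "404 error")
    text = body.lower()
    return status_code == 200 and any(p in text for p in error_phrases)
```

You would feed this the status code and body from each URL you suspect, then fix flagged pages to return a real 404 (or 410).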
d. Avoid infinite spaces
These happen when links go on and on. Google gives the example of a calendar with a “Next Month” link. Googlebot could theoretically follow those links forever.
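Building on Google’s calendar example, one way to spot such a trap in a crawl export is to flag date-based URLs pointing implausibly far into the future. The `/YYYY/MM` URL scheme below is an assumption for illustration; adapt the pattern to your own site.

```python
import re
from datetime import date

def is_calendar_trap(url, max_years_ahead=2):
    """Flag calendar-style URLs far in the future. A 'Next Month' link
    chain can generate pages forever, so a hypothetical URL like
    /events/2099/01 is likely an infinite space, not real content."""
    m = re.search(r"/(\d{4})/(\d{1,2})(?:/|$)", url)
    if not m:
        return False
    year = int(m.group(1))
    return year > date.today().year + max_years_ahead
```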
e. Low quality and spam content
Google wants to show users the best search results possible. Having a lot of low quality, thin or spammy content will cause Google to lose trust in your website.
How can I audit my website’s crawl budget?
The best way to audit your site’s crawl budget is to analyze your log files. Your server logs can normally be accessed via cPanel, Plesk or FTP.
Once you’ve downloaded your log files, you can use a tool such as Botify to understand where Googlebot is spending most of its time on your site.
You are looking to find out:
- Which pages get the most hits/requests
- Which pages return errors
- Which pages get the fewest hits/requests
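If you’d rather not use a tool, a short script can answer these questions directly. This sketch assumes the common/combined access log format; field positions vary by server, so adjust the pattern to your own logs.

```python
import re
from collections import Counter

# Matches the common/combined log format: IP, identity, user, timestamp,
# request line, status, size, referrer, user agent. Adjust as needed.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_stats(lines):
    """Count Googlebot hits per URL and per HTTP status code."""
    hits, statuses = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("path")] += 1
            statuses[m.group("status")] += 1
    return hits, statuses
```

`hits.most_common()` then gives you the most- and least-requested pages, and `statuses` shows how much of the crawl is being spent on errors.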
Once you have identified these pages and errors, you can start to look at increasing your crawl budget.
How can I increase my website’s crawl budget?
There are a number of things you can do, both onsite and offsite, to increase your crawl budget. Make sure you work through all of these methods.
a. Reduce crawl errors
The most common website errors are:
- Broken pages – 404 error pages
- Pages that time out
- Misused or missing directive tags – noindex / canonical / redirect
- Blocking pages
- Faulty mobile device sniffing – serving desktop pages to mobile users
How to find them
- Use a tool such as ScreamingFrog, SEMRush or AHREFs to find and fix your broken site links and check for any errors you can repair.
How to fix them
- Visit timed out pages to see what is causing them to fail – third-party embed, image sizes – repair or remove.
- Check all your directive tags at a page level to ensure you have the right tags in the right page. If you are using a plugin such as YoastSEO, make sure it is configured correctly.
- Ensure you recreate any sitemaps and resubmit them to your GSC.
- Check your site renders correctly on mobile devices.
b. Reduce Redirect Chains
Redirect chains stop the flow of link equity around a site and dramatically reduce Google’s crawl rate.
Matt Cutts in one of his webmaster help videos back in 2011 recommended 1-2 hops as ideal, three if needed, but suggested that once you got into the 4-6 range, the success rate of Googlebot following those redirect hops was very low.
How to find
- Use your log analyser to find redirects and see where they redirect to.
- Create a list of URLs that redirect to another URL that then also redirects. Botify has a great tool for doing this.
How to fix
- Once you have this list, you will be able to remove the chains by changing the redirect on the first URL to the last URL in the chain.
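As a sketch of that fix, given a mapping of redirect sources to targets (built from your log or crawl export; the structure here is an assumption), you can compute the final destination of every chain so each first URL can be repointed directly at it:

```python
def flatten_redirects(redirect_map):
    """Collapse redirect chains so every source points at its final URL.

    redirect_map is assumed to be {source_url: target_url}. Chains like
    A -> B -> C become A -> C and B -> C; redirect loops are skipped so
    you can fix them by hand.
    """
    flattened = {}
    for source in redirect_map:
        seen, target = {source}, redirect_map[source]
        while target in redirect_map and target not in seen:
            seen.add(target)
            target = redirect_map[target]
        if target in seen:  # redirect loop detected -- skip
            continue
        flattened[source] = target
    return flattened
```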
c. Better use of Robots.txt
Make sure you are using your robots.txt file to block parts of your site you don’t need Google to waste its crawl budget on.
These could include:
- Query parameters
- Subdomain or development pages
- Low-value pages – PPC pages, user forum profile pages.
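A hypothetical robots.txt covering those cases might look like this (the paths and parameters are examples only; adjust them to your own site structure, and be careful not to block pages with search value):

```
# Hypothetical robots.txt -- adapt paths to your own site
User-agent: *
# Parameterised duplicates of category pages
Disallow: /*?sort=
Disallow: /*?sessionid=
# Low-value sections that don't need organic visibility
Disallow: /forum/profile/
Disallow: /landing/ppc/
```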
d. Increase your website’s speed
As Google touched on above, site speed is a factor in your crawl health: the faster your site, the better your crawl health, and the more Google will crawl your site. Google has highlighted poor speed as a major reason for reduced crawl budget.
How to find issues
- Use a tool such as SEMRush, Pingdom or Google PageSpeed Insights to figure out how you can make your site respond faster.
How to fix issues
Some of the major elements to investigate will be:
- Browser caching – you can use a plugin such as W3 Total Cache, WP-Optimize, or WP Super Cache
- Image and code compression – make sure images are optimized before uploading and run WP Smush
e. Dynamic URL parameter settings
For eCommerce platforms with dynamic URLs, Googlebot treats any dynamic URLs that lead to the same page as separate pages. This means you may be wasting your crawl budget by having Google crawl the same page over and over, thinking each variation is a different page.
How to find
- With most big eCommerce sites you will see your site has parameter settings. Just look at the URLs in the address bar.
- You might not be able to work around that from a display point of view, but you can tell Google what to ignore and how to handle dynamic pages.
How to fix
- You can manage your URL parameters by going to Google Search Console and clicking Crawl > URL Parameters.
- From here, you can let Googlebot know about parameters your CMS adds to URLs that don’t change the page’s actual content.
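To see which URLs would collapse together once such parameters are ignored, a small script can normalize them. The parameter names below are assumptions for illustration; replace them with whatever your own CMS appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed (for this sketch) not to change page content
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "sort"}

def canonical_url(url):
    """Strip non-content query parameters so duplicate dynamic URLs
    collapse into one canonical form."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Running your crawled URL list through this and counting duplicates shows roughly how much crawl budget the parameters are costing you.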
f. Get more Backlinks
Backlinks not only help to improve your organic rankings, they push search crawlers to your site more often.
Based on our data (an average across FTF client data):
- 55% increase in organic traffic
- 87% increase in keyword rankings
- 38% increase in Leads and Revenue.
Wrapping it Up
While this is part of technical SEO, it’s something that gets overlooked by lots of SEOs. As you can now see, crawl budget plays a big part in making sure your site gets the crawl it needs.
If you don’t have enough crawl budget, or you’re using it up sending Google to the wrong pages, your site will never drive the traffic it should. As you can see from the above, this stuff works, and it can be a great way to unblock a site’s ability to drive organic search traffic.