Q: I’m looking for a tool that simply counts how many pages the website has, so I can calculate the inclusion ratio, or the percentage of pages indexed in the search engine. It seems like such a basic thing, yet I’ve been literally searching for hours and can’t find what I’m looking for. Can you help?
A: What a great question. Most people just do a site: search in Google to find the number of pages on their website (for example, you could search for <site:www.yourseoplan.com> and look at the number of pages returned in Google – currently 270 pages). But if you want to find out what portion of your total pages have been indexed by Google, you’ll need another way of counting the total number of pages on your site so you can compare it against Google’s index count.
We suggest using XML sitemaps along with Google Search Console to calculate the inclusion ratio of pages indexed in Google. If your website platform cannot create an XML sitemap, you may want to use one of the available sitemap tools. Here are some resources:
- For a small site under 500 pages, you could use the online version at http://www.xml-sitemaps.com/ or Screaming Frog.
- For larger sites, you can review other sitemap generators at: http://code.google.com/sm_thirdparty.html
If all you want to do is find out how many pages a site has, the sitemap process in this article can do that, or you can use a crawler (like Screaming Frog).
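If you already have a sitemap file, you can count its pages yourself. The following is a minimal sketch, assuming the file follows the standard sitemaps.org format (a `urlset` containing `url` entries with a `loc` each). The function names and the sample URLs are made up for illustration, and the sketch does not follow sitemap index files that point to other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return every non-empty <loc> value found under <url> entries."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def count_sitemap_pages(xml_text):
    """Count unique page URLs, so a URL listed twice is counted once."""
    return len(set(list_sitemap_urls(xml_text)))


# Hypothetical sample sitemap; the last entry repeats the first URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
  <url><loc>https://www.example.com/contact</loc></url>
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""

print(count_sitemap_pages(SAMPLE))
```

For a real site, you would read the sitemap from disk (for example with `open("sitemap.xml", "rb").read()`) and pass the contents to `count_sitemap_pages`.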
Once you have created a sitemap, sign in to Google Search Console and follow the steps below (if you don’t have an account, create one!):
- When logged into Google Search Console, go to the Sitemaps section.
- From here, add the sitemap that you created for your website.
- If all goes well, Google will show the URLs you have submitted (see below):
- Google does not show the discovered URLs immediately, but within a few days it will report the total number of pages it found in your sitemap.
With the submitted and processed sitemap, you can go to the Coverage section in Google Search Console and find a few details on how your site is indexed in Google:
(1) Submitted and Indexed
(2) Indexed, not submitted in sitemap
These two data points will help you understand which known pages are currently indexed in Google (916/973, about 94%), along with any other pages that Google found and decided to index that you did not know about (99!).
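The inclusion ratio itself is just the number of indexed pages divided by the number of pages you submitted. Here is a small sketch of that arithmetic using the figures quoted above (916 indexed out of 973 submitted). The function name is our own, not something Google Search Console provides.

```python
def inclusion_ratio(indexed_pages, total_pages):
    """Return the share of known pages that are indexed, as a fraction."""
    if total_pages <= 0:
        raise ValueError("total_pages must be greater than zero")
    return indexed_pages / total_pages


# Figures from the Coverage report example: 916 of 973 submitted pages indexed.
ratio = inclusion_ratio(916, 973)
print(f"{ratio:.1%}")  # prints 94.1%
```

Multiply the result by 100 if you prefer to work with a plain percentage rather than a fraction.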
In addition to the Valid and Indexed pages that Google reports, the Excluded tab of the Coverage section shows the indexing status of your other pages. This tab can help you understand why some pages that you expected to see under “Submitted and Indexed” were not indexed.