The 1,000-meter run (with steeplechase) – we are checking indexation speed. For this competition, I presented five similar site structures. Each of them had 1,000 subpages with unique content plus additional navigation pages (e.g. other subpages or categories). Below you can see the results for four running tracks.
This structure was very simple: 1,000 links to subpages with unique content placed on a single page (so 1,000 internal links). All SEO experts (including me…) repeat it like a mantra: no more than 100 internal links per page, or Google will not manage to crawl such an extensive page; it will simply ignore some of the links and will not index them. I decided to check whether that was true.
I must admit that I was disappointed by the results. I had very much hoped to demonstrate that the silo structure would speed up the crawling and indexation of the site. Unfortunately, it did not happen. This kind of structure is the one I usually recommend and implement on the websites I administer, mainly because of the internal linking possibilities it offers. Sadly, with a larger amount of content, it does not go hand in hand with indexation speed.
Nevertheless, to my surprise, Googlebot easily dealt with reading 1,000 internal links on one page, visited them over the 30 days and indexed the majority of them, even though it is commonly believed that the limit should be around 100 internal links per page. This means that if we want to speed up indexation, it is worth creating HTML sitemaps even with such a large number of links.
At the same time, classic pagination with noindex/follow clearly loses to pagination using index/follow and rel=canonical pointing to the first page. With the latter setup, Googlebot was not expected to index the individual paginated subpages. Nevertheless, out of 100 paginated subpages it indexed five, despite the canonical tag pointing to page one, which shows once again (I wrote about it here) that setting a canonical tag does not guarantee that a page will stay out of the search engine's index, or prevent the resulting mess in that index.
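For clarity, this is roughly what the two compared setups look like in the head of a paginated subpage; the URLs and paths are illustrative, not the actual test domains.

<!-- Setup 1: classic approach, paginated subpages kept out of the index -->
<!-- e.g. https://example.com/category/page/7 -->
<meta name="robots" content="noindex, follow">

<!-- Setup 2: paginated subpages left indexable, canonical pointing to page one -->
<!-- e.g. https://example.com/category/page/7 -->
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://example.com/category/">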
In the case of the above-described test, the last construction is the most effective one in terms of the number of pages indexed. If we introduced a new metric, Index Rate (IR), defined as the ratio of the number of Googlebot visits to the number of pages indexed within a given period, e.g. 30 days, then the best IR in our test would be 3.89 (running track 5) and the worst 6.46 (running track 2). This number represents the average number of Googlebot visits to a page required to index it (and keep it in the index). To refine IR further, it would be worth verifying indexation daily for each specific URL; then the metric would make even more sense.
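To make the metric concrete: IR = (number of Googlebot visits in the period) / (number of pages indexed at the end of that period). Purely as an illustration with made-up numbers (the raw visit counts are not quoted here), 3,890 Googlebot visits that result in 1,000 indexed pages give IR = 3890 / 1000 = 3.89, i.e. on average almost four visits were needed per indexed page.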
One of the key conclusions of this article (evident just a few days into the experiment) was going to be the demonstration that Googlebot ignores rel=next and rel=prev tags. Unfortunately, I was late publishing those results (I was waiting for more data), and on March 21 John Mueller announced to the world that, indeed, these tags are not used by Googlebot. I am just wondering whether the fact that I am typing this article in Google Docs has anything to do with it (#conspiracytheory).
It is worth taking a closer look at pages that use infinite scroll, i.e. content loaded dynamically as the user scrolls down to the lower parts of the page, with navigation based only on rel=prev and rel=next. If there is no other navigation, such as regular pagination hidden with CSS (invisible to the user but visible to Googlebot), we can be sure that Googlebot's access to newly loaded content (products, articles, photos) will be hindered. A minimal sketch of such a fallback follows below.
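The sketch uses illustrative element names and URLs: an infinite-scroll container plus plain a href pagination links that remain in the HTML, so a crawler can reach further pages without executing the scroll-triggered JavaScript.

<!-- Listing: further items are appended by JavaScript as the user scrolls down -->
<div id="product-list" data-next-page="/products?page=2">
  <!-- initial, server-rendered products -->
</div>

<!-- Crawlable fallback: plain links to the paginated versions of the listing.
     They can be visually de-emphasised with CSS, but they stay in the HTML,
     so Googlebot can follow them to page 2, 3, … without scrolling. -->
<nav class="pagination-fallback">
  <a href="/products?page=1">1</a>
  <a href="/products?page=2">2</a>
  <a href="/products?page=3">3</a>
</nav>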