Improved HTML spidering (Robots, Canonical, Rel)
We now parse a number of HTML elements to better understand your website and decide which files should be in your sitemap.
We now detect the link rel=“canonical” tag.
Where we detect this tag and it points to another page, we will not include the current page in the sitemap and will instead spider the URL specified in the tag's href attribute.
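To illustrate the rule above, here is a minimal sketch in Python using only the standard library. It is not our spider's actual code: the names CanonicalFinder and canonical_target are hypothetical, and comparing URLs as plain strings is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: records the href of the first <link rel="canonical">."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values and may be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def canonical_target(page_url, html):
    """Return the canonical URL if it points to another page, else None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs against the page being spidered.
    target = urljoin(page_url, finder.canonical)
    # Assumption: a plain string comparison decides whether it is "another page".
    return target if target != page_url else None
```

A return value of None means the current page stays in the sitemap. Any other value is the URL that would be spidered in its place.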
We now obey the meta tag for robots.
Where a noindex value is detected we will not index the given page, and where a nofollow value is detected we will stop following the URLs on that page.
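The check described above could look roughly like the following Python sketch. The names MetaRobotsParser and robots_policy are hypothetical, and the sketch only handles the noindex and nofollow values mentioned here. Other values are ignored.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        content = attributes.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_policy(html):
    """Return whether the page may be indexed and whether its links may be followed."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }
```

A page with no robots meta tag comes back as indexable and followable, which matches the default behaviour assumed in this sketch.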
Anchor rel attribute
We now obey rel=“nofollow” in anchor tags.
As with the meta robots tag, if we detect a nofollow value on an anchor we will not follow that anchor's URL.
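As a rough illustration of the anchor rule, the Python sketch below collects only the links a spider would be allowed to follow. The names FollowableLinks and followable_links are hypothetical and do not reflect our internal implementation.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Hypothetical helper: gathers hrefs from anchors not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel can carry several space-separated values, e.g. "noopener nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.links.append(href)


def followable_links(html):
    """Return the hrefs, in document order, that may be followed."""
    parser = FollowableLinks()
    parser.feed(html)
    return parser.links
```

Splitting the rel attribute on whitespace matters because nofollow is often combined with other values on the same anchor.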