Blog posts from 2014

📌 Improved HTML spidering (Robots, Canonical, Rel)

Friday 28 March 2014

We now parse a number of HTML elements to better understand your website and which files should be in your sitemap.

Canonical URLs

We now detect the link rel=“canonical” tag. Where we detect this tag and it points to another page, we will not include the current page in the sitemap and will instead spider the URL specified in the href attribute of the tag.
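The canonical rule described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the spider's actual code; the names CanonicalFinder and canonical_target are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def canonical_target(page_url, html):
    """Return the canonical URL if it points to another page, else None.

    A non-None result means: leave page_url out of the sitemap and
    spider the returned URL instead (hypothetical helper).
    """
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    target = urljoin(page_url, finder.canonical)  # resolve relative hrefs
    return target if target != page_url else None
```

For example, a page at http://example.com/shoes?sort=price whose canonical href is /shoes would be skipped in favour of http://example.com/shoes, while the /shoes page itself would be kept.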

📌 Text, HTML sitemaps, Robots.txt and more

Sunday 23 March 2014

This version includes some updates that people have been asking for, including a number of new sitemap formats. We recommend that you have a valid robots.txt file, an XML sitemap and an HTML sitemap in your website root folder to optimise your sitemap coverage.

HTML sitemaps

The great thing about an HTML sitemap is that when you publish it, any search engine can deal with it, whether they officially support sitemaps or not.
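To make the recommendation above concrete, here is a rough Python sketch of what an HTML sitemap and a robots.txt that points to an XML sitemap might look like. The function names, the page title and the example.com URLs are assumptions for illustration and do not reflect the files our tool actually produces.

```python
from html import escape


def html_sitemap(urls, title="Sitemap"):
    """Build a plain HTML page with one link per URL (illustrative layout)."""
    items = "\n".join(
        '    <li><a href="{0}">{0}</a></li>'.format(escape(u, quote=True))
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>{}</title></head>\n"
        "<body>\n  <ul>\n{}\n  </ul>\n</body>\n</html>\n"
    ).format(escape(title), items)


def robots_txt(sitemap_url):
    """Build a robots.txt that allows all crawling and names the sitemap."""
    return "User-agent: *\nDisallow:\n\nSitemap: {}\n".format(sitemap_url)


if __name__ == "__main__":
    print(html_sitemap(["http://example.com/", "http://example.com/about"]))
    print(robots_txt("http://example.com/sitemap.xml"))
```

Because the HTML version is just a page of ordinary links, any crawler that can follow links can use it.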

📌 How long does it take to generate my XML sitemap?

Friday 14 March 2014

Our spider can sometimes take a while to process your website, and people ask how long they should wait. The time to generate a sitemap can vary quite dramatically, from a few seconds to over 10 minutes, and is influenced by a number of factors.

Key Influencing Factors

Your website performance

The key limiting factor will usually be how fast your website / web server can respond to our spider’s requests.

📌 Home page redirects fixed

Tuesday 11 March 2014

Some homepage redirects were causing problems for our spider and resulted in sitemaps with no files for a small number of users. We believe this is now resolved. Thanks for the feedback.

📌 Why do you limit the number of URLs in a sitemap?

Sunday 9 March 2014

We get asked this quite a lot… The main reason is that XmlSitemapGenerator is a free tool and generating sitemaps is not a free process. Our spider indexes thousands of pages a day, using lots of server resources (memory, CPU and bandwidth) and racking up gigabytes of data as it indexes pages and builds up profiles. The overhead of the spidering process is quite large, especially at busy times when we have many people generating sitemaps.

📌 Improved download and error reports

Sunday 2 March 2014

We’ve made some improvements to the sitemap download page to make it easier to download your sitemaps. Firstly, we’ve made all the files available as a single zip file download. We have also added a simple table that gives you access to your XML sitemap, RSS sitemap and a new error report.

📌 Problems creating your XML sitemap?

Sunday 23 February 2014

Some users experience problems when they create an XML sitemap because of how their website is implemented and hosted. Here are some common problems with websites that result in inconsistent XML sitemaps: server / performance issues; inconsistent URLs / domains; iFramed homepage; bad header tags; incorrect server tags; page size too large; no “real” links / non-native behaviors; inconsistent behavior for different user agents / browsers; poor HTML markup; incorrect use of character sets; incorrect modified date header.

Server / performance issues

📌 Improved support for character encoding and redirects

Saturday 15 February 2014

Character encoding

We’ve improved our spider so that it can cope with a wider range of character sets, including Arabic and Chinese. Don’t forget that for this feature to work correctly it is important that we can understand your website encoding; otherwise our spider won’t interpret it correctly and your sitemap will contain strange characters and symbols.

Improved HTTP 301 redirect and 302 Moved handling

Not only do we now follow HTTP 301 and HTTP 302 automatically.
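To illustrate why the declared encoding matters, here is a small Python sketch that reads the charset from a Content-Type header and uses it to decode the page bytes. It is an assumption about one reasonable approach, not a description of how our spider does it; the function names and the UTF-8 fallback are choices made for the example.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Pull the charset parameter out of a Content-Type header value."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def decode_body(body, content_type, default="utf-8"):
    """Decode raw page bytes using the declared charset.

    Falls back to the default when the charset is missing or unknown;
    undecodable bytes become replacement characters instead of errors.
    """
    charset = charset_from_content_type(content_type, default)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # the server declared a charset Python does not know
        return body.decode(default, errors="replace")


if __name__ == "__main__":
    arabic = "مرحبا".encode("windows-1256")
    print(decode_body(arabic, "text/html; charset=windows-1256"))
    # Without the declaration the same bytes are decoded with the wrong charset
    print(decode_body(arabic, "text/html"))
```

The second call shows the failure mode mentioned above: when the encoding is not declared, the text comes out as strange symbols rather than readable Arabic.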

📌 Spider Performance Improvements

Thursday 13 February 2014

Our spider was looking a little tired! At times some users were waiting a while for it to complete its job and, in some particularly busy periods, getting timeout errors. The good news is we have done some housekeeping (clearing out millions of records, re-building, de-fragging, etc.) and we are now ticking over a bit more smoothly. Don’t forget, if you are having problems you can always contact us.

📌 Support for http-equiv="refresh" added and more ....

Sunday 9 February 2014

On the 9th of Feb we made some minor updates…

New: Follow meta refresh tag, e.g. http-equiv=“refresh”

We found quite a number of websites using meta tags in their homepage to redirect to another page. Previously we were not detecting this, so our spider only found a homepage. We now take the http-equiv tag URL and follow it. e.g. :

New: Automatically follow both www.
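The meta refresh handling described above could look something like the following Python sketch. It is a simplified illustration built on the standard library rather than our spider's real code; MetaRefreshFinder and refresh_target are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches an optional "url=" prefix in the part after the delay
_URL_PREFIX = re.compile(r"\s*url\s*=\s*(.*)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "0; url=home.html"; the URL part is optional
        _, sep, rest = (attr.get("content") or "").partition(";")
        if not sep:
            return  # plain reload such as content="5", nothing to follow
        match = _URL_PREFIX.match(rest)
        if match:
            rest = match.group(1)
        rest = rest.strip().strip("'\"")
        if rest:
            self.target = rest


def refresh_target(page_url, html):
    """Return the absolute URL a meta refresh points to, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urljoin(page_url, finder.target)
```

A spider using this would queue the returned URL for crawling instead of stopping at the homepage.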