Basic HTML concepts

This page introduces the key HTML concepts that affect sitemaps and explains how our spider processes your website.

Head meta tags

The head of a page can contain meta tags that direct search engines and our spider. For most pages you will want the page to be indexed and its links to be followed. Our spider follows these rules automatically.

<meta name="robots" content="index, follow" />
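To illustrate how a spider might read this tag, here is a minimal Python sketch. It is not the XmlSitemapGenerator's own code; the class and function names (RobotsMetaParser, robots_rules) are hypothetical, and it assumes that a page with no robots meta tag may be indexed and followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "index, follow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_rules(html):
    """Return (may_index, may_follow); both default to True (an assumption)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


print(robots_rules('<meta name="robots" content="index, follow" />'))
```

A page marked "noindex, nofollow" would return (False, False) from this sketch, so it would be left out of the sitemap and its links would not be queued.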

The XmlSitemapGenerator will follow all standard links, such as the example below.

<a href="/mylink/page.html">this is a link</a>

You can tell the spider to follow only links with a certain file extension, such as .htm, although normally you would want to include all extensions.

<a href="/mylink/page.html">this is a link</a>

Some URLs, such as the one below, have no file extension. Again, you would normally want the spider to follow all URLs.

<a href="/help/">this is a link</a>
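The extension rule described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (link_extension, should_follow) are hypothetical, and a URL with no extension is represented by an empty string so that it can be included in the allowed list.

```python
import posixpath
from urllib.parse import urlsplit


def link_extension(url):
    """Return the lower-cased file extension of the URL path, or "" if none."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def should_follow(url, allowed_extensions=None):
    """Follow every link unless an allow-list of extensions is supplied."""
    if allowed_extensions is None:
        return True
    return link_extension(url) in allowed_extensions


# With no allow-list, every link is followed.
print(should_follow("/mylink/page.html"))
# An allow-list of .htm only would skip .html pages and extensionless URLs.
print(should_follow("/mylink/page.html", {".htm"}))
print(should_follow("/help/", {".htm"}))
```

Restricting the list is easy to get wrong, which is why following all extensions is usually the safer choice.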

It will also follow links that include query strings, such as:

<a href="/mylink/page.html?pageid=121">this is a link</a>

Some websites include a unique ID or session ID for each user in the query string. If your website does this, you should make sure this parameter is added to the ignore list. We have added some of the common ones to the default values.

<a href="/mylink/page.html?SessionId=8765434567&pageid=121">this is a link</a>
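One way to picture the ignore list is as a step that removes the listed parameters before a URL is recorded, so that the same page is not added once per visitor. The Python sketch below makes that assumption; the parameter names in IGNORED_PARAMS are illustrative examples, not the tool's actual default values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical ignore list; compared case-insensitively.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with any ignored query string parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_ignored_params("/mylink/page.html?SessionId=8765434567&pageid=121"))
# prints /mylink/page.html?pageid=121
```

The pageid parameter is kept because it identifies a different page, whereas SessionId only identifies the visitor.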

The same rules apply to all links and URLs wherever they appear, for example in image maps and framesets.

If you have a large number of links to non-HTML files, for example images, zip files and other downloads, you can tell our spider to skip these.

<a href="/mylink/page.zip">this is a zip file link</a>
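Skipping works the other way round from following only certain extensions: everything is kept except the listed file types. A minimal Python sketch, assuming a hypothetical skip list (SKIPPED_EXTENSIONS) and function name (filter_links):

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative skip list; the real list is whatever you configure.
SKIPPED_EXTENSIONS = {".zip", ".gif", ".jpg", ".png", ".pdf"}


def filter_links(urls, skipped=SKIPPED_EXTENSIONS):
    """Return the links whose file extension is not in the skip list."""
    kept = []
    for url in urls:
        extension = posixpath.splitext(urlsplit(url).path)[1].lower()
        if extension not in skipped:
            kept.append(url)
    return kept


print(filter_links(["/mylink/page.zip", "/mylink/page.html", "/help/"]))
# prints ['/mylink/page.html', '/help/']
```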

Image maps

When you add an image to a web page you can define clickable hotspots on it using an image map. The spider will follow these links by default.

<img src="planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap">
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" href="sun.htm" alt="Sun">
<area shape="circle" coords="90,58,3" href="mercury.htm" alt="Mercury">
</map>
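As a rough illustration of how hotspot links can be discovered, the Python sketch below collects the href of every area element. The names AreaLinkParser and image_map_links are hypothetical, and this is not the spider's actual implementation.

```python
from html.parser import HTMLParser


class AreaLinkParser(HTMLParser):
    """Collects the href of each <area> hotspot in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "area":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def image_map_links(html):
    """Return the hotspot links found in the given HTML."""
    parser = AreaLinkParser()
    parser.feed(html)
    return parser.links


page = """
<img src="planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap">
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" href="sun.htm" alt="Sun">
<area shape="circle" coords="90,58,3" href="mercury.htm" alt="Mercury">
</map>
"""
print(image_map_links(page))
# prints ['sun.htm', 'mercury.htm']
```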

Framesets and iFrames

Framesets allow you to display a number of separate pages together as one.

<frameset cols="25%,*">
<frame src="frame_a.htm">
<frame src="frame_b.htm">
</frameset>

A similar concept is the iFrame, which allows you to embed another page within a page.

<iframe src="/test/myframe.htm"></iframe>
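Frames point at other pages through their src attribute rather than href. The Python sketch below shows one way those addresses could be collected and resolved against the address of the containing page. The names FrameParser and frame_urls, and the www.example.com base address, are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameParser(HTMLParser):
    """Collects the src of each <frame> and <iframe> element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, page_url):
    """Return absolute URLs for every framed page referenced in the HTML."""
    parser = FrameParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


page = """
<frameset cols="25%,*">
<frame src="frame_a.htm">
<frame src="frame_b.htm">
</frameset>
<iframe src="/test/myframe.htm"></iframe>
"""
for url in frame_urls(page, "http://www.example.com/docs/index.htm"):
    print(url)
```

Relative addresses such as frame_a.htm resolve against the folder of the containing page, while /test/myframe.htm resolves against the site root.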

As with other HTML elements, all frame formats will be spidered by default.

Images

The XmlSitemapGenerator will optionally include images in your sitemap. You can include all images or select them based on whether or not their alt and/or title attribute is populated.

<img src="/images/text.gif" title="My title here" alt="My alt caption here" />

As well as the alt and title attributes, you can also select images based on the image type / file extension, in the same way that you can for URLs.

<img src="/images/text.gif" title="My title here" alt="My alt caption here" />
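The image options above can be pictured as a set of filters applied to each img element. The Python sketch below is one possible reading of them; the function name select_images and its parameters (require_alt, require_title, extensions) are hypothetical and do not correspond to named settings in the tool.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageParser(HTMLParser):
    """Collects src, alt and title for each <img> element."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            if attr.get("src"):
                self.images.append({
                    "src": attr["src"],
                    "alt": attr.get("alt") or "",
                    "title": attr.get("title") or "",
                })


def select_images(html, require_alt=False, require_title=False, extensions=None):
    """Return the src of each image that passes the chosen filters."""
    parser = ImageParser()
    parser.feed(html)
    selected = []
    for image in parser.images:
        if require_alt and not image["alt"].strip():
            continue  # alt attribute missing or empty
        if require_title and not image["title"].strip():
            continue  # title attribute missing or empty
        extension = posixpath.splitext(urlsplit(image["src"]).path)[1].lower()
        if extensions is not None and extension not in extensions:
            continue  # file type not selected
        selected.append(image["src"])
    return selected


page = (
    '<img src="/images/text.gif" title="My title here" alt="My alt caption here" />'
    '<img src="/images/spacer.png" alt="">'
)
print(select_images(page, require_alt=True))
# prints ['/images/text.gif']
```

With no filters set, both images in the example would be returned, which corresponds to including all images.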