Webmaster's guide
How can I make my site appear in the search results?
Several factors determine whether and where a particular website will appear in the search results. These factors can include the number of sites that have links to that particular site, and the content of the pages.
In addition, the Exalead search results are organized in order of relevance for each user query. Therefore the position of a site will change according to the search terms entered.
If your site is new and not linked to from other sites, you can use our submission form to submit your site so that it may be indexed and therefore included in our search results.
How do I submit my site to the Exalead search engine?
If your site is linked to from other sites already included in the Exalead search index, you don't need to do anything. During our next indexing run, your site will be 'crawled' (indexed) and added to our engine. If your site is new and not linked to from other sites, use our submission form to submit your site.
Why does "Exabot" crawl my site?
"Exabot" is the User-Agent of Exalead's robot. Its role is to collect and index data from all around the world to supply our search engine. The Exabot agent crawls your site so that its content may be included in our main index, and hence included in our search results pages.
Does the Exalead robot respect the rules recorded in a robots.txt file and the robots META tag?
Yes. Exalead's Exabot robot fully complies with robots.txt and robots meta tag standards. Please visit robotstxt.org for more information regarding these specifications. Exalead also supports the special characters * and $, which were not included in the initial specification.
Robots.txt standards:
- To prevent indexing of pages from a particular directory (for example, football), enter the following in your robots.txt file:
User-agent: Exabot
Disallow: /football
- To prevent indexing of a particular file type (.gif, for example), enter the following in your robots.txt file:
User-agent: Exabot
Disallow: *.gif$
- To prevent indexing of dynamic pages, enter the following in your robots.txt file:
User-agent: Exabot
Disallow: *?
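To illustrate how the * and $ characters in the rules above can be read, here is a minimal Python sketch. It is not Exabot's actual matching code: the function names pattern_to_regex and is_blocked are hypothetical, and the sketch assumes that * stands for any run of characters, that a trailing $ anchors the pattern to the end of the URL path, and that patterns are matched from the start of the path.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex.

    Assumption: '*' means "any run of characters" and a trailing '$'
    means "the path must end here".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# The three example rules from the list above.
rules = ["/football", "*.gif$", "*?"]

for candidate in ["/football/scores.html", "/images/logo.gif",
                  "/search?q=news", "/index.html"]:
    print(candidate, "blocked" if is_blocked(candidate, rules) else "allowed")
```

Under these assumptions, only /index.html in the sample list is left open to crawling.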
Robots META tag standards:
- To prevent robots from indexing a page on your site, place the following META tag in the <HEAD> section of the page:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
- To authorize robots to index a page but instruct them not to follow the links on that page, use the following tag:
<meta name="ROBOTS" content="NOFOLLOW">
Does your robot limit its bandwidth while crawling a site?
Yes, using several methods:
- We respect a delay of three seconds between pages.
- We use the "Last-Modified / If-Modified-Since" mechanism to refresh static content, if this function is supported by your server.
- We use the "ETag / If-None-Match" mechanism to refresh dynamic content, if this function is supported by your server.
- We use HTTP compression (gzip/deflate) to reduce the number of bytes transferred to around one fifth of the file size, if this function is supported by your server.
- We limit the bandwidth used while crawling multimedia files such as MP3.
- We use detection algorithms to avoid using bandwidth for indexing non-relevant multimedia content.
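The conditional-request and compression mechanisms listed above can be sketched in Python as follows. This is an illustration only, not Exabot's code: the function build_refresh_request, the URL, the date, and the ETag value are all hypothetical. It shows the request headers a crawler might send so that a server which supports them can answer "304 Not Modified" instead of resending an unchanged page.

```python
import urllib.request


def build_refresh_request(url, last_modified=None, etag=None):
    """Build a GET request that lets the server reply 304 Not Modified."""
    request = urllib.request.Request(url)
    # Ask for a compressed body when the server supports it.
    request.add_header("Accept-Encoding", "gzip, deflate")
    if last_modified:
        # Validator taken from the Last-Modified header of an earlier response.
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Validator taken from the ETag header of an earlier response.
        request.add_header("If-None-Match", etag)
    return request


# Hypothetical values remembered from a previous crawl of the page.
req = build_refresh_request(
    "http://www.example.com/index.html",
    last_modified="Wed, 21 Oct 2015 07:28:00 GMT",
    etag='"abc123"',
)

# No network access happens here; the request is only built and inspected.
for name, value in req.header_items():
    print(name + ": " + value)
```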
You can also specify a desired crawl-delay by adding the following text to your robots.txt file:
User-agent: Exabot
Crawl-delay: 10
While you can adjust the crawl-delay to suit your needs, remember that the longer the crawl-delay, the more slowly your site will be indexed.
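As a rough illustration of that trade-off, the Python sketch below reads the Crawl-delay rule with the standard urllib.robotparser module and estimates how long a crawl would take. The page count of 1000 is a hypothetical figure, and the estimate assumes one request per delay interval; it does not describe how Exabot actually schedules its requests.

```python
import urllib.robotparser

robots_txt = """\
User-agent: Exabot
Crawl-delay: 10
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Delay, in seconds, requested for the Exabot user agent.
delay = parser.crawl_delay("Exabot")

# Hypothetical site size, used only to show the effect of the delay.
page_count = 1000
estimated_seconds = page_count * delay

print("Crawl-delay:", delay, "seconds")
print("Estimated time for", page_count, "pages:", estimated_seconds, "seconds")
```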
What types of documents are crawled by the Exalead robot?
The robot crawls HTML content, popular office file formats (PDF, Word, Excel, PowerPoint, Corel WordPerfect, OpenOffice and Rich Text Format), Macromedia Shockwave Flash, and other multimedia content.
How may I exclude my site from being crawled by the Exalead robot?
Create a simple text file named robots.txt, type in the following rules, and place the file in the root directory of your site:
User-agent: Exabot
Disallow: /
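For more information, please refer to the question "Does the Exalead robot respect the rules recorded in a robots.txt file and the robots META tag?" above.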
How may I protect certain parts of my site from being crawled by the Exalead robot?
Type the following rule into your robots.txt file, using "football" as the name for an example directory:
User-agent: Exabot
Disallow: /football
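For more information, please refer to the question "Does the Exalead robot respect the rules recorded in a robots.txt file and the robots META tag?" above.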
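Before publishing such rules, you may wish to check them locally. The Python sketch below uses the standard urllib.robotparser module to do so; the helper exabot_may_fetch and the www.example.com URLs are hypothetical. This standard-library parser matches plain path prefixes and should not be assumed to interpret the * and $ extensions, so the sketch only covers simple rules like the two shown above.

```python
import urllib.robotparser


def exabot_may_fetch(robots_txt, url):
    """Return True if the given robots.txt text lets Exabot fetch the URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Exabot", url)


# Rule set that excludes the whole site.
block_everything = "User-agent: Exabot\nDisallow: /\n"
# Rule set that excludes only the example directory.
block_directory = "User-agent: Exabot\nDisallow: /football\n"

print(exabot_may_fetch(block_everything, "http://www.example.com/index.html"))
print(exabot_may_fetch(block_directory, "http://www.example.com/football/scores.html"))
print(exabot_may_fetch(block_directory, "http://www.example.com/index.html"))
```

With these rule sets, only the last URL is reported as fetchable.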
How may I protect only certain pages of my site from being crawled by the Exalead robot?
To protect specific pages from being crawled by the Exalead robot, a special META tag must be used between the "head" tags in the HTML files of your site.
- If you do not wish the Exalead robot to follow specific links in a page of your site but still wish for the page to be indexed, you must add the following META tag: <meta name="robots" content="nofollow">.
- If you do not want a particular page to be indexed, but you want the links of that page to other pages on your site to be followed, you must add the following META tag: <meta name="robots" content="noindex">.
- You may combine the two directives to prevent both the indexing of the page and the following of its links: <meta name="robots" content="nofollow,noindex">.
Visit robotstxt.org for more information.
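The effect of these tags can be sketched with a short Python example that reads a page's robots META tag and reports whether the page may be indexed and whether its links may be followed. It is an illustration only, not Exabot's parser: the class RobotsMetaParser and the function robots_directives are hypothetical names, and the sketch assumes that a page with no robots META tag may be both indexed and followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the index/follow permissions declared by a robots META tag."""

    def __init__(self):
        super().__init__()
        # Assumed defaults when no robots META tag is present.
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The tag name and directive values are compared case-insensitively.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        directives = {d.strip().lower() for d in content.split(",")}
        if "noindex" in directives:
            self.index = False
        if "nofollow" in directives:
            self.follow = False


def robots_directives(html):
    """Return (may_index, may_follow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.index, parser.follow


page = '<html><head><meta name="robots" content="nofollow,noindex"></head></html>'
may_index, may_follow = robots_directives(page)
print("index:", may_index, "follow:", may_follow)
```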
How may I request that the Exalead robot refresh its index of my site?
This is completely automatic, and will be done the next time our robot crawls your site. However if you wish to speed up this process, submit the page using our site submission form.
How do I remove a page from the index?
If a page is no longer indexable (because it either returns an error or is excluded by the robots.txt file), it will be removed the next time the engine crawls the site. To accelerate the process, you can submit the page to the engine using our site submission form, and it will be removed from the index at that point.
You can also use an HTML meta tag to prevent the Exalead robot from indexing certain pages. This tag is placed in the <head> section of your page.
- To prevent robots from indexing a page on your site, place the following meta tag in the <head> section of the page:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
- To allow robots to index a page, but to instruct them not to follow the links on that page, use the following tag:
<meta name="ROBOTS" content="NOFOLLOW">
How do I indicate to Exalead that my site has moved?
Add a permanent (301) redirect from every page of your previous site to the corresponding page of the new one. When Exabot refreshes its index of your site, it will remove the links to the previous site and replace them with links to the new one. If you want to accelerate the process, you can submit the old and the new home pages to the search engine using our site submission form.
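How you configure the redirect depends on your web server. Purely as an illustration, the Python sketch below uses the standard http.server module to answer every request to the old site with a 301 response pointing at the same path on the new site. The address in NEW_SITE, the port, and the names new_location, MovedPermanentlyHandler and run are hypothetical, and a production site would normally set this up in its web server configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical address of the new site.
NEW_SITE = "http://www.new-example.com"


def new_location(path):
    """Map a path on the old site to the same path on the new site."""
    return NEW_SITE + path


class MovedPermanentlyHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with a permanent redirect."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", new_location(self.path))
        self.end_headers()

    # HEAD requests receive the same status line and headers.
    do_HEAD = do_GET


def run(port=8080):
    """Serve redirects on the given port until interrupted."""
    HTTPServer(("", port), MovedPermanentlyHandler).serve_forever()
```

Calling run() on the old host would start the redirecting server; nothing is started when the code is merely loaded.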
How do I test my robots.txt file?
You can use our robots.txt analysis tool to:
- Check specific URLs to see if your robots.txt file allows or blocks them.
- See if Exabot had trouble parsing any lines in your robots.txt file.
- Test changes to your robots.txt file.
This tool allows you to verify whether your robots.txt file excludes or permits access to certain URLs. The file is refreshed with each test, so you can test in real time. By contrast, the 'live' version of the file used by the robot (the version on your web server) is only refreshed once per day. You should therefore expect a slight delay before changes made to the live version take effect.
Why does the thumbnail preview image for my site look strange?
The rendering tool we use to create thumbnails is similar to that of the Safari browser (it is based on the KHTML layout engine). If your site does not render well in Safari, the thumbnail generated by Exalead may likewise not render well. We recommend you optimize your site for display in Safari/Konqueror so that the thumbnail generated will be of the highest quality possible.
In addition, the Exalead thumbnail generator cannot yet interpret Flash files. If your site uses Flash and the thumbnail is not appearing as you desire, we recommend you specify an alternate image to display for browsers which do not support Flash. Our engine will use this image to create a thumbnail. To provide an alternate image, add an "img" tag after the "embed" tag within the "object" tag containing your Flash animation. For further information, please refer to your Flash documentation.
For example:
<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,29,0" width="530" height="80">
<param name="movie" value="media/movie.swf" />
<param name="quality" value="high" />
<embed src="media/movie.swf" quality="high" pluginspage="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" width="530" height="80"></embed>
<img src="media/image.jpg" alt="Our product showcase" />
</object>