Crawler-friendly homepage

Last updated on August 1st, 2017 at 01:52 pm

For a site to be archived, it is essential that it has a crawler-friendly homepage.

The Portuguese Web Archive crawler archives the web by first crawling the homepage of each site (e.g. http://www.fccn.pt) and then following links to the remaining content.
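
The homepage-first behaviour can be illustrated with a minimal sketch. This is not the Portuguese Web Archive's actual crawler: the `SITE` map, its example.org URLs and the `crawl` function are hypothetical, and the site is modelled as an in-memory dictionary instead of being fetched over HTTP. It only shows that a page is discovered when some chain of links leads to it from the homepage.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "http://www.example.org/": [
        "http://www.example.org/about",
        "http://www.example.org/news",
    ],
    "http://www.example.org/about": ["http://www.example.org/"],
    "http://www.example.org/news": ["http://www.example.org/news/2017"],
    "http://www.example.org/news/2017": [],
    "http://www.example.org/orphan": [],  # exists, but no page links to it
}


def crawl(homepage, site):
    """Return the URLs reachable from the homepage, in breadth-first order."""
    seen = {homepage}
    queue = deque([homepage])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        # Follow every link on the current page that has not been seen yet.
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


found = crawl("http://www.example.org/", SITE)
```

In this sketch the page at `/orphan` never appears in `found`, because no link leads to it from the homepage.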

If the crawler cannot process the homepage of a site, it cannot find the links to the rest of the site's content. Therefore, to create a crawler-friendly homepage:

  • Prefer the HTML format;
  • Ensure that all content can be reached by following links from the homepage;
  • Do not create homepages composed exclusively of images or animations (e.g. Flash). If you must create a homepage of this kind, provide an alternative version of the homepage in HTML format.
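
As a rough way to see why these recommendations matter, the sketch below extracts links from two hypothetical homepages using Python's standard `html.parser` module. The `LinkExtractor` class, the `extract_links` helper and both sample pages are assumptions made for illustration; a real crawler may recognise more link types than plain `<a href>` anchors.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of anchor targets found in the given HTML text."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical homepage written in plain HTML with ordinary links.
HTML_HOMEPAGE = """
<html><body>
  <h1>Example site</h1>
  <a href="/about.html">About</a>
  <a href="/news.html">News</a>
</body></html>
"""

# Hypothetical homepage made only of a Flash animation, with no HTML links.
FLASH_ONLY_HOMEPAGE = """
<html><body>
  <object data="intro.swf" type="application/x-shockwave-flash"></object>
</body></html>
"""

html_links = extract_links(HTML_HOMEPAGE)
flash_links = extract_links(FLASH_ONLY_HOMEPAGE)
```

Here `html_links` contains the two anchor targets, while `flash_links` is empty: a link-following crawler starting from the Flash-only page would have nothing to follow.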