
Objectives of Arquivo.pt – the Portuguese web archive

The creation of Arquivo.pt represents a historic milestone in the preservation of knowledge for future generations. Arquivo.pt is a system that regularly crawls the Portuguese web and stores and provides access to the results over the long term. It provides the following services:

  • Term search over the archived content: identifies web-archived content from across the years that contains given terms;
  • URL search over the archived content: identifies the different versions of the content collected from a given URL;
  • Historical collections of web content for research purposes: the web holds information on a wide range of subjects and reflects how society changes over time. Researchers from many fields use the web as a source of information for their studies. Providing access to historical web collections lets these researchers store and process web data locally on their own computers without having to crawl the web themselves;
  • Parallel processing system for archived data: it will allow researchers to run their own programs over the archived web data on several computers in parallel.

We also aim to:

  • Train human resources in web archiving to enable the maintenance of the system in the future;
  • Export know-how, experience and technology in web archiving to other countries, especially Portuguese-speaking ones;
  • Help increase the number of domains registered under .PT, since free historical archiving of the information published under this domain could be an additional motivation for registrars;
  • Publish scientific and technical papers to share the acquired knowledge and receive feedback from the community on the work performed.