Main objectives of the Portuguese Web Archive (Arquivo.pt)

The creation of a Portuguese Web Archive represents a historic milestone in the preservation of knowledge for future generations. By building a system that supports regular crawls of the Portuguese web, together with long-term storage of and access to the crawled contents, we intend to provide the following services:

  • Term search over the archived contents: it will make it possible to find contents, archived over the years, that contain given terms;
  • URL search over the archived contents: it will make it possible to identify the successive versions of a content gathered from a given URL;
  • Historical collections of web contents for research purposes: the web holds information about a wide variety of subjects and reflects how society changes over time. Researchers from different fields use the web as a source of information for their studies. Providing web collections will enable these researchers to store and process web data locally on their computers without having to crawl the web themselves;
  • Parallel processing system for archived data: it will allow researchers to run their programs over the archived web data using several computers in parallel.
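
To make the difference between term search and URL search concrete, here is a minimal sketch over a tiny in-memory archive. The Capture record, the term_search and url_search functions and the sample data are all hypothetical, chosen only for illustration; they do not describe the archive's actual implementation or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One archived version of a content (all fields are illustrative)."""
    url: str
    timestamp: str  # crawl time as YYYYMMDDhhmmss, so strings sort chronologically
    text: str


def term_search(captures, terms):
    """Return the captures whose text contains every given term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [c for c in captures if wanted <= set(c.text.lower().split())]


def url_search(captures, url):
    """Return every archived version of one URL, oldest first."""
    return sorted((c for c in captures if c.url == url), key=lambda c: c.timestamp)


# Hypothetical sample data: two versions of one page and one other page.
ARCHIVE = [
    Capture("http://example.pt/", "20100105120000", "Welcome to the old homepage"),
    Capture("http://example.pt/", "20080301090000", "Homepage under construction"),
    Capture("http://example.pt/news", "20100105120500", "Election news and results"),
]

if __name__ == "__main__":
    for c in term_search(ARCHIVE, ["homepage"]):
        print("term match:", c.url, c.timestamp)
    for c in url_search(ARCHIVE, "http://example.pt/"):
        print("version:", c.timestamp, c.text)
```

In this sketch, term search answers "which archived contents mention these words?", while URL search answers "what did this address look like over time?". A real archive would rely on a full-text index rather than scanning every capture.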

We also want to achieve the following goals:

  • Train human resources in web archiving to enable the maintenance of the system in the future;
  • Export know-how, experience and technology in web archiving to other countries, especially Portuguese-speaking ones;
  • Help increase the number of domains registered under .PT: the free historical archiving of information published under this domain could be an additional motivation for registrants;
  • Publish scientific and technical papers that share the acquired knowledge and gather feedback from the community on the work performed.