Main objectives of the Portuguese Web Archive (Arquivo.pt)
Objectives of the Portuguese Web Archive
The creation of a Portuguese Web Archive represents a historic milestone in the preservation of knowledge for future generations. By creating a system that supports regular crawls of the Portuguese web, together with long-term storage of and access to the collected data, we intend to provide the following services:
- Term search over the archived contents: it will make it possible to find contents archived over the years that contain given terms;
- URL search over the archived contents: it will allow users to identify the different versions of content gathered from a given URL;
- Historical collections of web contents for research purposes: the web holds information on a wide variety of subjects and reflects changes in society over time. Researchers from different fields use the web as a source of information for their studies. Providing web collections will enable these researchers to store and process web data locally on their own computers without having to crawl the web themselves;
- Parallel processing system for archived data: it will allow researchers to run their programs over the archived web data using several computers in parallel.
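The search and processing services listed above can be illustrated with a small sketch. It is a toy, in-memory model and not a description of how the archive is implemented: the `Capture` record, the `ToyArchive` class, the `example.pt` URLs and the sample texts are all hypothetical, and a local thread pool stands in for the "several computers in parallel" of the processing service.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Capture:
    """One archived version of a page (hypothetical record layout)."""
    url: str
    timestamp: str  # crawl time as YYYYMMDDhhmmss, so string order is time order
    text: str


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class ToyArchive:
    """In-memory stand-in for an archive index (assumption, for illustration)."""

    def __init__(self):
        self.by_url = defaultdict(list)   # url -> list of captures
        self.by_term = defaultdict(set)   # term -> set of captures containing it

    def add(self, capture):
        self.by_url[capture.url].append(capture)
        for term in set(tokenize(capture.text)):
            self.by_term[term].add(capture)

    def url_search(self, url):
        """Return every archived version of a URL, oldest first."""
        return sorted(self.by_url.get(url, []), key=lambda c: c.timestamp)

    def term_search(self, query):
        """Return the captures that contain all query terms, oldest first."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.by_term.get(t, set()) for t in terms))
        return sorted(hits, key=lambda c: (c.timestamp, c.url))


def count_words(capture):
    """Example of a researcher-supplied program applied to one capture."""
    return len(tokenize(capture.text))


def total_words_parallel(captures, workers=4):
    """Apply count_words to all captures using a pool of local threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_words, captures))


# Made-up sample data: two versions of one page and one version of another.
archive = ToyArchive()
archive.add(Capture("http://example.pt/", "19961013120000",
                    "Welcome to the Portuguese web"))
archive.add(Capture("http://example.pt/", "20080101120000",
                    "News from the Portuguese web archive"))
archive.add(Capture("http://example.pt/about", "20080102120000",
                    "About this archive"))

versions = archive.url_search("http://example.pt/")
matches = archive.term_search("Portuguese archive")
all_captures = [c for caps in archive.by_url.values() for c in caps]
word_total = total_words_parallel(all_captures)
```

Here `url_search` corresponds to the URL search service (all versions of one address over time), `term_search` to the term search service, and `total_words_parallel` to running a user program over the archived data. A real deployment would distribute the work across machines and keep its indexes on disk rather than in memory.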
We also want to achieve the following goals:
- Train human resources in web archiving to enable the maintenance of the system in the future;
- Export know-how, experience and technology in web archiving to other countries, especially Portuguese-speaking ones;
- Contribute to increasing the number of domains registered under .PT: the free historical archiving of the information published under this domain could be an additional motivation for registrars;
- Publish scientific and technical papers to share the acquired knowledge and to receive feedback from the community on the work performed.