Arquivo.pt, the Portuguese web archive, is a research infrastructure that lets you search and access files archived from the web since 1996. Its main objective is to preserve information published on the Web for research purposes.
1. What is it for?
It can be used for:
- searching for information from the past that is no longer available on the Web.
- providing research resources, for instance, in the fields of History, Sociology or Linguistics.
2. What motivated its creation?
After one year, only 20% of a set of addresses remain valid (Ntoulas, 2004). In other words, 8 out of 10 of the pages saved in your browser Favorites will be lost after one year.
The amount of information that is published solely on the web has grown dramatically over the past few years. However, not long after it has been published, a large amount of this information ceases to be available online and is irrevocably lost.
If we wish future generations to have access to this information, it is important to archive and preserve what is published on the web.
3. What is the difference between Arquivo.pt and the Internet Archive?
Arquivo.pt offers:
- comprehensive crawls of the Portuguese Web
- search by term and address (URL)
- automatic processing of the archived data for research purposes
The Internet Archive:
- collects content worldwide and covers the Portuguese Web only partially
- only allows search by address (URL)
4. Do you have any published statistics regarding the Portuguese Web?
Yes. The characteristics of the Portuguese Web were studied from a crawl performed in 2008. Several scientific papers have been published.
5. Is it possible to access the data for research purposes?
Yes. If you want to perform studies on the archived data, feel free to contact us.
6. Can I help preserve the Web?
Yes. Anyone can collaborate with Arquivo.pt.
7. Other questions?
If you have not found the answer to your question, feel free to contact us.