What is Arquivo.pt – the Portuguese Web Archive?

Arquivo.pt – the Portuguese Web Archive is a research infrastructure that enables searching and accessing files archived from the Web since 1996. Its main objective is to preserve information published on the Web for research purposes.

1. What is it for?

It can be used for:

  • searching for past information that is no longer available on the Web.
  • providing resources for research in fields such as History, Sociology or Linguistics.

2. What motivated its creation?

After one year, only 20% of a set of web addresses remained valid (Ntoulas, 2004). In other words, 8 out of 10 of the pages saved in your browser's Favorites can be expected to be lost within a year.

The amount of information published solely on the Web has grown dramatically over the past few years. However, shortly after publication, a large share of this information ceases to be available online and is irrevocably lost.

If we want future generations to have access to this information, it is important to archive and preserve what is published on the Web.

3. What is the difference between the Portuguese Web Archive and the Internet Archive?

The Portuguese Web Archive provides:

  • comprehensive crawls of the Portuguese Web
  • search by term and address (URL)
  • the possibility of automatically processing the archived data for research purposes

The Internet Archive:

  • collects content worldwide, including only part of the Portuguese Web
  • only allows search by address (URL)

4. Do you have any published statistics regarding the Portuguese Web?

Yes. The characteristics of the Portuguese Web were studied using a crawl performed in 2008, and several scientific papers on the results have been published.

5. Is it possible to access the data for research purposes?

Yes. If you want to perform studies on the archived data, feel free to contact us.

6. Have another question?

If you have not found the answer to your question, feel free to contact us.