General information

Arquivo.pt, the Portuguese web archive, is a research infrastructure that enables search of and access to files archived from the web since 1996. Its main objective is to preserve information published on the Web for research purposes.

1. What is it for?

It can be used for:

  • searching for information from the past that is no longer available on the Web.
  • providing research resources, for instance, in the fields of History, Sociology or Linguistics.

2. When was Arquivo.pt born?

Officially, Arquivo.pt started operating in January 2008. However, the idea of creating a Portuguese web archive came up in 2001, with a scientific project called tomba!, run by a research group at the Faculty of Sciences of the University of Lisbon. In 2007, the Portuguese web archive was created at FCCN.

3. What motivated its creation?

After one year, only 20% of a set of web addresses remain valid (Ntoulas, 2004). In other words, 8 out of 10 pages saved in your browser's Favorites will be lost within a year.

The amount of information that is published solely on the web has grown dramatically over the past few years. However, not long after it has been published, a large amount of this information ceases to be available online and is irrevocably lost.

If we wish future generations to have access to this information, it is important to archive and to preserve what is published on the web.

4. How far in time does Arquivo.pt go?

Arquivo.pt preserves pages dating back to 1996. Until 2007, content was acquired mainly from the Internet Archive. After that, Arquivo.pt began to build its own web collections.

5. What is the difference between Arquivo.pt and the Internet Archive?

Arquivo.pt provides:

  • comprehensive crawls of the Portuguese Web
  • search by term and address (URL)
  • automated processing of the archived data for research purposes

The Internet Archive:

  • collects content worldwide and covers the Portuguese Web only partially
  • only allows search by address (URL)

6. Do you have any published statistics regarding the Portuguese Web?

Yes. Statistics are available on the webpage Arquivo in Numbers. In addition, the characteristics of the Portuguese Web were studied based on a crawl performed in 2008, and several scientific papers have been published.

7. Is it possible to access the data for research purposes?

Yes. If you want to perform studies on the archived data, feel free to contact us.

9. Other questions?

If you have not found the answer to your question, feel free to contact us.