A first attempt to archive the .EU domain

Last updated on August 1st, 2019 at 02:35 pm

News updated on August 1, 2019.

Arquivo.pt (the Portuguese web-archive) performed an experiment to preserve .EU web sites.

The .EU domain is commonly used for sites related to Europe. The strategy adopted to archive the World Wide Web has been to delegate responsibility for each national domain to the respective national archiving institution. However, the .EU domain does not fit this model because it covers multiple nations. Thus, the preservation of .EU sites has not yet been assigned to, or undertaken by, any institution.

RESAW is a European network that aims to create a Research Infrastructure for the Study of Archived Web Materials (resaw.eu).

Arquivo.pt made a first attempt to crawl and preserve web sites hosted under the .EU domain within the scope of RESAW activities. This first crawl began on 21 November 2014 and finished on 16 December 2014.

We performed two more crawls of the .EU domain. Each crawl was indexed and became searchable through Arquivo.pt one year after its finish date. Moreover, we made available a prototype that enables focused search over the .EU crawls, which demonstrates how simple it is to create search engines that target specific collections by using the “collection” search parameter.
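As an illustration of the idea only, the sketch below builds a search URL that restricts a query to one collection. Only the “collection” parameter name comes from the text above; the endpoint address, the "query" parameter name and the collection identifier "EU" are assumptions made for this example and may not match the actual Arquivo.pt service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the real search address is not given in the text.
BASE_URL = "https://archive.example/search"


def build_collection_query(terms, collection):
    """Return a search URL restricted to a single collection.

    "collection" is the parameter named in the text; "query" is an
    assumed name for the free-text search terms.
    """
    params = {"query": terms, "collection": collection}
    return f"{BASE_URL}?{urlencode(params)}"


# "EU" is a placeholder collection identifier, not a confirmed value.
url = build_collection_query("european parliament", "EU")
print(url)
```

A focused search engine over another collection would, under the same assumptions, differ only in the value passed as the collection argument.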

Collaborations with researchers interested in studying the collected web data sets or crawl logs are welcome.

To know more

2nd best paper at LA-Web

Last updated on December 20th, 2019 at 02:49 pm

The paper “Trends in Web characteristics” authored by the Portuguese Web Archive team received an award at the conference LA-Web 2009.

The paper, written by João Miranda and Daniel Gomes, was distinguished as the 2nd best paper presented at the 7th Latin American Web Congress.

The paper presents results on the evolution of Web characteristics, derived from the comparison of a crawl performed by the PWA in 2008 with previous studies.

rARC Linux version now available

Last updated on December 20th, 2019 at 02:55 pm

rARC version for Linux is now available.

rARC is a pioneering system being developed within the Portuguese Web Archive project. Its main goal is to enable Internet users to provide storage space on their computers to help preserve web contents for the future. Anyone can contribute to preserving the web by providing a small amount of space to keep a backup of a small part of the archived data.

This version was successfully tested on the following distributions:

  • Fedora 10, 11 (Gnome)
  • Ubuntu 9.04, 9.10 (Gnome)

Please contact us if you find that this version is compatible with other distributions or if you encounter any problems.

rARC Windows 7 version now available

Last updated on December 20th, 2019 at 03:04 pm

rARC version for Windows 7 is now available.

rARC is a pioneering system being developed within the Portuguese Web Archive project. Its main goal is to enable Internet users to provide storage space on their computers to help preserve web contents for the future. Anyone can contribute to preserving the web by providing a small amount of space to keep a backup of a small part of the archived data.

Please contact us if you encounter any problems.

Search over the past Web is available

Last updated on September 29th, 2017 at 03:25 pm

The Portuguese Web Archive released a service that enables search and access to web contents that are no longer available online.


This beta version of the service includes 130 million contents of the Portuguese web archived between 1996 and 2007.

It supports advanced search options, such as date range restrictions.

Please send us your comments and critiques. They are most welcome.

Search the past now!

2005 contents provided by the National Library can now be searched

Last updated on December 20th, 2019 at 03:41 pm

The contents provided by the National Library of Portugal were successfully integrated and can be searched through our experimental search system.

In 2005, the National Library of Portugal, in collaboration with INESC, conducted a series of web crawls to gather information related to the national elections.

This project was named RECOLHA and the data collected comprised over 14 million contents (165 GB).

In 2009 these data were supplied to the Portuguese Web Archive.

They were successfully integrated and can now be searched through our experimental search system.

Note that searches are performed over all the archived data, regardless of origin, so you will not be able to tell which results come from the RECOLHA contents.

Scientific study about Web accessibility conducted in collaboration with the Archive

Last updated on December 20th, 2019 at 03:44 pm

This research, conducted by the HCIM group from the University of Lisbon in collaboration with the Portuguese Web Archive, presents a measurement of the accessibility of the Portuguese Web for people with disabilities.