Lend a little disk space to preserve the Web

Last updated on August 5th, 2024 at 11:39 am

Anyone can help preserve the information published on the Web. You just need to install a simple application on your computer.

The information published on the Web is a resource of great historical value that must be preserved for future generations.

rARC is an application that enables anyone to help preserve the Web by donating a little disk space to store a backup copy of the archived information.

This way, if information is lost from the central repository due to, for instance, a natural disaster, it can be retrieved from the computers of the people who installed rARC.

rARC includes a screen saver that displays examples of archived pages. You can uninstall rARC or reduce the donated space whenever you want.

Comments and suggestions are most welcome.

The Portuguese Web Archive publishes a ranking of the most generous contributors and the approximate locations of the backup copies.

Start contributing to rARC now!

Thank you for your support.

Paper presented at EPIA 2009

Last updated on August 4th, 2024 at 06:03 pm

An Updated Portrait of the Portuguese Web presented at EPIA 2009

The paper An Updated Portrait of the Portuguese Web, by João Miranda and Daniel Gomes, was presented at the 14th Portuguese Conference on Artificial Intelligence (EPIA 2009) in Aveiro.

This paper presents a characterization of the Portuguese Web derived from a crawl performed by the Portuguese Web Archive in March 2008, comprising 48 million documents and 2.5 TB of data.

Paper will be presented at LA-Web on November 11

Last updated on August 5th, 2024 at 12:20 pm

The paper Trends in Web Characteristics, by João Miranda and Daniel Gomes, will be presented on November 11, 2009, at the 7th Latin American Web Congress (LA-Web 2009) in Merida, Mexico.

This paper presents trends in the evolution of the Web derived from three characterizations performed over a five-year interval. The Portuguese Web was used as a case study, and several metrics regarding site and content characteristics were analyzed.

Session at ISCTE “Archive.pt as an infrastructure for research in Social Sciences and Humanities”

Last updated on August 4th, 2024 at 06:05 pm

Session at ISCTE (Lisbon) “Archive.pt as an infrastructure for research in Social Sciences and Humanities”

Did you miss it?

No problem. Here are all the presentations:

Portuguese Web Archive – a Memory Infrastructure @DLM2014

Last updated on August 4th, 2024 at 06:08 pm

Presentation about the Archive.pt service and the importance of web archiving for preserving the memory of humanity.

The presentation will take place on Thursday, 13 November, at 17:15 in Lisbon, at the DLM Forum – Making the Information Governance Landscape in Europe.

The Forum will be held at Instituto Superior Técnico.

@dlmforum2014 #DLM2014

iPRES 2013: Five new communications about the Portuguese Web Archive

Last updated on December 20th, 2019 at 05:22 pm

The activities of the Portuguese Web Archive were presented at iPRES 2013, a scientific conference on the preservation of digital objects, which took place in Lisbon from 2 to 6 September.

System administrator for the Portuguese Web Archive

Last updated on September 28th, 2017 at 01:12 pm

The Portuguese Web Archive (www.archive.pt) is looking for an administrator for its large-scale distributed system.

The system administrator who joins our team will be responsible for operating a web archive system composed of more than 60 servers and for maintaining the quality of the results it provides.

Requirements

  • Higher education degree in Computer Engineering.
  • Experience in the design, operation and administration of large-scale distributed systems exposed to the Internet.
  • Technical knowledge of Apache HTTP Server, Apache Tomcat, Java and Linux.
  • Experience with technologies for monitoring and managing distributed systems (e.g. Nagios, Cacti, Ganglia, Rex, Puppet, Chef, Spacewalk, Jenkins).

Preferred experience

  • Distributed processing (e.g. Hadoop, HBase).
  • Web information search (e.g. Apache Solr, Lucene).
  • Web archiving technologies (e.g. Heritrix, Wayback Machine, NutchWAX).
  • Software management platforms (e.g. Selenium, SonarQube, Ant, Maven, Git, SVN).
  • Load balancing and fault tolerance tools (e.g. LVS).

We would appreciate your help in spreading the word about this offer.

It’s now easier to download the PWA software.

Last updated on August 5th, 2024 at 11:30 am

All the source code, binary and documentation files can now be downloaded as a set of large compressed files.

The software developed to create the Portuguese Web Archive is available as a free open-source project hosted at Google Code under the name pwa-technologies.

The files below contain a dump of the pwa-technologies software and were created to facilitate the distribution and preservation of the developed software. Please feel free to join us and help improve this software:

We hope this project will help enable access to our digital memory.