The Importance of Web Archives for Humanities
Anyone can help preserve the information published on the Web. You just need to install a simple application on your computer.
The information published on the web is a resource of great historical value that must be preserved for future generations.
rARC is an application that lets anyone help preserve the web by providing a little disk space to store a backup copy of the archived information.
This way, if information is lost from the central repository due to, for instance, a natural disaster, it can be retrieved from the computers of the people who installed rARC.
rARC includes a screen saver that presents examples of archived pages. You can uninstall rARC or reduce the donated space whenever you want.
Comments and suggestions are most welcome.
Thank you for your support.
The paper “Trends in Web characteristics” will be presented at LA-Web 2009, on November 11, 2009.
This paper presents trends in the evolution of the Web derived from the analysis of 3 characterizations performed within an interval of 5 years. The Web portion used as a case study was the Portuguese Web. Several metrics regarding site and content characteristics were analyzed.
Our former colleague Miguel Costa defended his PhD thesis at the University of Lisbon on 4 November 2014. The slides and video are available!
- PhD thesis “Information Search in Web Archives”
- Presentation video
- Slides of the presentation
- Official announcement
- Bibliographic reference (bibtex)
Congratulations and thank you for this important contribution to Web Archiving!
The Portuguese Web Archive's activities were presented at Ipres’2013, a scientific conference on the preservation of digital objects that took place in Lisbon from 2 to 6 September.
The following communications about the Portuguese Web Archive were made during Ipres’2013:
- Acquiring and providing access to historical web collections
- Adapting search user interfaces to web archives
- Query suggestion for web archive search
- Information Search in Web Archives (invited talk)
- Web Archiving: Lessons and Potential (panel discussion)
See all the details on our Publications page.
The Portuguese Web Archive (www.archive.pt) is looking for an administrator for its large-scale distributed system.
The system administrator we are looking for to strengthen our team will be in charge of operating a web archive system composed of more than 60 servers and maintaining the quality of the results it provides.
- Higher education in Computer Engineering.
- Experience in the design, operation and administration of large-scale distributed systems exposed to the Internet.
- Technical knowledge of Apache HTTP Server, Apache Tomcat, Java and Linux.
- Experience with technologies to monitor and manage distributed systems (e.g. Nagios, Cacti, Ganglia, Rex, Puppet, Chef, SpaceWalk, Jenkins).
- Distributed processing (e.g. Hadoop, HBase).
- Web information search (e.g. Apache Solr, Lucene).
- Web archiving technologies (e.g. Heritrix, Wayback Machine, NutchWAX).
- Software management platforms (e.g. Selenium, SonarQube, Ant, Maven, Git, SVN).
- Load balancing and fault tolerance tools (e.g. LVS).
We appreciate your help disseminating this offer.
All the source code, binaries and documentation can now be downloaded as a set of large compressed files.
The software developed to create the Portuguese Web Archive is available as a free open-source project hosted at Google Code under the name pwa-technologies.
The files below contain a dump of the PWA-technologies software and were created to facilitate the distribution and preservation of the developed software. Please feel free to join us and help improve this software:
We hope this project can be useful to enable access to our digital memory.
The Foundation for National Scientific Computing, home of the Portuguese Web Archive, has joined the IIPC.
The International Internet Preservation Consortium (IIPC) is a worldwide consortium that brings together 44 organizations from 25 countries.
The IIPC is dedicated to improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage.
The Portuguese Web Archive is an innovative service based on cutting-edge technology that requires continuous investment in Research and Development activities.
Joining the IIPC is a crucial milestone toward establishing international partnerships that enable the collaborative development of the tools used by the Portuguese Web Archive and improve the quality of the service it provides.
The Portuguese Web Archive published a study that gives an overview of the lessons learned during its development, focusing on web data acquisition, ranking of search results and user interface design.
Several organizations around the world are struggling to archive information from the web before it vanishes. However, users demand efficient and effective search mechanisms to access the already vast collections of historical information held by web archives. The Portuguese Web Archive is the largest full-text searchable web archive publicly available. It supports search over 1.2 billion files archived from the web since 1996.
Invented to exchange data between scientists, the web is now used to share publications and knowledge. But how much of this will still be available in years to come?
Read the full article on page 25 of the GÉANT CONNECT magazine.