Ipres’2013: Five new communications about the Portuguese Web Archive

The activities of the Portuguese Web Archive were presented at Ipres’2013, a scientific conference on the preservation of digital objects that took place in Lisbon from 2 to 6 September.

System administrator for the Portuguese Web Archive

The Portuguese Web Archive (www.archive.pt) is looking for an administrator for its large-scale distributed system.

The system administrator we are looking for to strengthen our team will be in charge of operating a web archive system composed of more than 60 servers and of maintaining the quality of the results it provides.

Requirements

  • Higher education in Computer Engineering.
  • Experience in the design, operation and administration of large-scale distributed systems exposed to the Internet.
  • Technical knowledge of Apache HTTP Server, Apache Tomcat, Java and Linux.
  • Experience with technologies to monitor and manage distributed systems (e.g. Nagios, Cacti, Ganglia, Rex, Puppet, Chef, SpaceWalk, Jenkins).

Preferred experience

  • Distributed processing (e.g. Hadoop, HBase).
  • Web information search (e.g. Apache Solr, Lucene).
  • Web archiving technologies (e.g. Heritrix, Wayback Machine, NutchWAX).
  • Software management platforms (e.g. Selenium, SonarQube, Ant, Maven, Git, SVN).
  • Load balancing and fault tolerance tools (e.g. LVS).

We appreciate your help in disseminating this offer.

It’s now easier to download the PWA software.

All the source code, binary and documentation files can now be downloaded as a set of large compressed files.

The software developed to create the Portuguese Web Archive is available as a free open-source project hosted at Google Code under the name pwa-technologies.

The files below contain a dump of the PWA-technologies software and were created to facilitate the distribution and preservation of the developed software. Please feel free to join us and help improve this software:

We hope this project will be useful in enabling access to our digital memory.

The Portuguese Web Archive joined the International Internet Preservation Consortium

The Foundation for National Scientific Computing (FCCN), home of the Portuguese Web Archive, joined the IIPC.

The International Internet Preservation Consortium (IIPC) is a worldwide consortium that brings together 44 organizations from 25 countries.

The IIPC is dedicated to improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage.

The Portuguese Web Archive is an innovative service based on cutting-edge technology that requires permanent investment in Research and Development activities.

Joining the IIPC is a crucial milestone in establishing international partnerships that enable the collaborative development of the tools used by the Portuguese Web Archive and improve the quality of the service provided.

 

How to create a billion-scale searchable web archive

The Portuguese Web Archive published a study that gives an overview of the lessons learned while developing the Portuguese Web Archive, focusing on web data acquisition, ranking of search results and user interface design.

Several organizations around the world are struggling to archive information from the web before it vanishes. However, users demand efficient and effective search mechanisms to access the already vast collections of historical information held by web archives. The Portuguese Web Archive is the largest full-text searchable web archive publicly available. It supports search over 1.2 billion files archived from the web since 1996.

The paper “Creating a Billion-Scale Searchable Web Archive” was presented at the Temporal Web Analytics Workshop 2013, in Rio de Janeiro, Brazil.

New video: “The Portuguese Web Archive: An overview”

The Portuguese Web Archive preserves and provides access to information published on the web that is of main interest to the Portuguese community. It provides a free and publicly available full-text search service over more than 1 billion files archived from the web since 1996.

This video provides an overview of the services provided by the Portuguese Web Archive:

Satisfaction surveys of the Portuguese Web Archive at Jornadas FCCN 2012

The demonstration session of the Portuguese Web Archive invited participants to try out the Archive and its features in an informal and relaxed setting, in order to identify faults and points to correct and to record users’ suggestions for improving the system.

Interested participants were invited to take a three-step challenge, which included finding historical pages in the Archive.

The participants were asked to fill in a satisfaction survey, on an increasing scale of satisfaction from 1 to 7, with questions related to their experience with the Portuguese Web Archive. The results show that users liked using the service (6.1 average), that they easily learned to use it (5.9 average) and that they easily found the information they were seeking (5.1 average). It should be noted that the users stated that they would use the service in the future (6.1) and would tell their friends about it (6.2).

The results are positive regarding the quality of the new interface and made it possible to set priorities for future service improvements.
