Technical report documents the creation of a searchable web archive

Last updated on September 29th, 2017 at 02:17 pm

This report presents some of the work developed to create an efficient and effective web archive service, from data acquisition to user interface design.

The results of this research were applied to create the Portuguese Web Archive, which has been publicly available since January 2010. It supports full-text search over 1 billion items archived from 1996 to 2010. The software developed is available as an open source project.

I. P. Santarém, 7th and 8th Feb.: learn more about the Portuguese Web Archive

Last updated on September 29th, 2017 at 02:22 pm

Come and meet the Archive’s team.

The Portuguese Web Archive will be presented at Jornadas FCCN on 7th and 8th of February 2012, with the following activities (in Portuguese):

93% of the searches are answered in less than 5 seconds

Last updated on September 29th, 2017 at 02:40 pm

Data from April to June, 2011

  • 93% of the full-text searches performed on the Portuguese Web Archive were answered in less than 5 seconds.
  • 95% of the URL searches were answered in less than 5 seconds.
  • 73% of user clicks were on the first page of results.
  • We wrote 72 000 lines of code to improve the original search system based on the Archive-access project.
  • Try our search!

Trends in the evolution of the Web: we have published a video about our study

Last updated on December 20th, 2019 at 05:49 pm

A talk on the evolution of Web characteristics, based on a scientific study performed by the Portuguese Web Archive.

The presentation focuses on the following points:

  • The Web
  • Web Archiving and crawlers
  • Web characteristics and their evolution over 5 years
  • The importance of studying Web trends for the design of tools that process Web data

Find out more: