Technical report documents the creation of a searchable web archive

Last updated on August 4th, 2024 at 06:22 pm

This report presents some of the work developed to create an efficient and effective web archive service, from data acquisition to user interface design.

The results of this research were applied to create the Portuguese Web Archive, which has been publicly available since January 2010. It supports full-text search over 1 billion contents archived between 1996 and 2010. The developed software is available as an open-source project.

93% of the searches are answered in less than 5 seconds

Last updated on August 4th, 2024 at 06:20 pm

Data from April to June, 2011

  • 93% of the full-text searches performed on the Portuguese Web Archive were answered in less than 5 seconds.
  • 95% of the URL searches were answered in less than 5 seconds.
  • 73% of the user clicks were on the first page of results.
  • We wrote 72 000 lines of code to improve the original search system based on the Archive-access project.
  • Try our search!
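
A percentage such as "93% answered in less than 5 seconds" can be derived from a log of response times. The following is a minimal sketch of that calculation, not the measurement code used by the Portuguese Web Archive; the function name and the sample latencies are hypothetical and serve only as illustration.

```python
def share_answered_within(latencies_s, threshold_s=5.0):
    """Return the percentage of requests answered in less than threshold_s seconds."""
    if not latencies_s:
        raise ValueError("no latency samples")
    # Count only requests strictly faster than the threshold.
    fast = sum(1 for t in latencies_s if t < threshold_s)
    return 100.0 * fast / len(latencies_s)


# Hypothetical response times in seconds (not real Portuguese Web Archive data).
sample = [0.8, 1.2, 4.9, 5.0, 7.3, 2.1, 0.4, 3.3, 6.0, 1.1]
pct = share_answered_within(sample)
print(f"{pct:.0f}% of the searches were answered in less than 5 seconds")
```

A request that takes exactly 5 seconds is not counted as fast here, which matches the "less than 5 seconds" wording.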

Trends in the evolution of the Web: we have published a video about our study

Last updated on August 4th, 2024 at 06:21 pm

This talk covers the evolution of Web characteristics, based on a scientific study conducted by the Portuguese Web Archive.

The presentation focuses on the following points:

  • The Web
  • Web Archiving and crawlers
  • Web characteristics and their evolution over 5 years
  • The importance of studying Web trends for the design of tools that process Web data

Find out more: