Technical report documents the creation of a searchable web archive

This report describes some of the work carried out to create an efficient and effective web archive service, from data acquisition to user interface design.

The results of this research were applied to create the Portuguese Web Archive, which has been publicly available since January 2010. It supports full-text search over 1 billion contents archived from 1996 to 2010. The software developed is available as an open-source project.


93% of the searches are answered in less than 5 seconds

Data from April to June, 2011

  • 93% of the full-text searches performed on the Portuguese Web Archive were answered in less than 5 seconds.
  • 95% of the URL searches were answered in less than 5 seconds.
  • 73% of user clicks were on the first page of results.
  • We wrote 72 000 lines of code to improve the original search system based on the Archive-access project.
  • Try our search!

Trends in the evolution of the Web: we have published a video about our study

A talk on the evolution of Web characteristics, based on a scientific study performed by the Portuguese Web Archive.

The presentation focuses on the following points:

  • The Web
  • Web Archiving and crawlers
  • Web characteristics and their evolution over 5 years
  • The importance of studying Web trends for the design of tools that process Web data
