Characterizing Search Behavior in Web Archives: we have published a video about a study we performed

Last updated on December 20th, 2019 at 04:03 pm

This talk characterizes the search behavior of web archive users, based on a scientific study performed by the Portuguese Web Archive.

The presentation focuses on the following points:

  • Web archiving
  • Search log analysis
  • How do users search?
  • Time dimension in searches

Paper was considered one of the best at CIWAI-2008

Last updated on December 20th, 2019 at 04:08 pm

The paper about the archive and measurement of the Portuguese web was considered one of the best of the CIWAI-2008 conference.

The paper Arquivo e medição da web portuguesa (in Portuguese), about the archive and measurement of the Portuguese web, was considered one of the best papers of the CIWAI-2008 conference.

This recognition was given to 3 of the 66 published works. The conference received 216 submissions.

The paper presents the Portuguese web archive project and a characterization of the Portuguese web obtained from a crawl performed in February 2008.

According to the results, the Portuguese web comprises at least 56 million contents, which correspond to 2.8 TB of data.

The authors were invited to publish an extended version of the paper in a scientific journal.

Open position at the Portuguese Web Archive

Last updated on October 2nd, 2017 at 11:19 am

Open position at the Portuguese Web Archive project

The Portuguese Web Archive announces an open position for a graduate/MSc in Computer Science with experience in Java and Web usability.

Speaking and writing skills in Portuguese are required.

The application deadline is March 2nd, 2009.

For further information please check the description of the open position.

New member of the PWA

Last updated on October 2nd, 2017 at 11:05 am

David Cruz is the newest member of the Portuguese Web Archive team.

The open position at the Portuguese Web Archive was filled by David Cruz.

David is 25 years old and received his bachelor's and master's degrees in Informatics from the University of Lisbon.

He was a researcher at the human-computer interfaces group (HCIM), his master's thesis focused on geographic information retrieval, and he collaborated with Linguateca, which works on natural language processing for Portuguese.

At the PWA, he is now responsible for the usability and accessibility of the user interfaces.

He will also be in charge of the integration of historical web data collections supplied by external entities.

Welcome!

Lend a little disk space to preserve the Web

Last updated on October 2nd, 2017 at 10:56 am

Anyone can help preserve the information published on the Web. You just need to install a simple application on your computer.

The information published on the web is a resource of great historical value that must be preserved for future generations.

rARC is an application that lets anyone help preserve the web by providing a little disk space to store a backup copy of the archived information.

This way, if information is lost from the central repository due to, for instance, a natural disaster, it can be retrieved from the computers of the people who installed rARC.

rARC includes a screen saver that presents examples of archived pages. You can uninstall rARC or reduce the donated space whenever you want.

Comments and suggestions are most welcome.

The Portuguese Web Archive publishes a ranking of the most generous contributors and the approximate location of the backup copies.

Start contributing to rARC now!

Thank you for your support.

Paper will be presented at LA-Web on November 11

Last updated on October 2nd, 2017 at 10:50 am

The paper Trends in Web Characteristics will be presented at LA-Web 2009, on November 11, 2009.

The paper Trends in Web Characteristics, by João Miranda and Daniel Gomes, will be presented at the 7th Latin American Web Congress (LA-Web), in Merida, Mexico.

This paper presents trends in the evolution of the Web, derived from the analysis of 3 characterizations performed over an interval of 5 years. The Portuguese Web was the portion of the Web used as a case study. Several metrics regarding site and content characteristics were analyzed.