2005 contents provided by the National Library can now be searched

The contents provided by the National Library of Portugal were successfully integrated and can be searched through our experimental search system.

In 2005, the National Library of Portugal, in collaboration with INESC, conducted a series of web crawls to gather information related to the national elections.

This project was named RECOLHA, and the data it collected comprised over 14 million contents (165 GB).

In 2009, these data were supplied to the Portuguese Web Archive.

They were successfully integrated and can now be searched through our experimental search system.

Note that searches are performed over all the archived data, regardless of origin, so you will not be able to tell which results come from the RECOLHA collection.

Scientific study about Web accessibility conducted in collaboration with the Archive

This research, conducted by the HCIM group from the University of Lisbon in collaboration with the Portuguese Web Archive, measures the accessibility of the Portuguese Web for people with disabilities.

Scientific study presents a characterization of the information needs of Web Archive users

This research focuses on what users' intents are and which topics interest them most. Three instruments were used to collect quantitative and qualitative data: search logs, an online questionnaire, and a laboratory study.

The paper "Understanding the Information Needs of Web Archive Users", by Miguel Costa and Mário J. Silva, was presented at the 10th International Web Archiving Workshop in Vienna, Austria.

Web archivists: please answer 3 quick questions regarding resources committed to web archiving

We are conducting a preliminary study of the resources being committed worldwide to web archiving.

If you are a web archivist, please send us your answers to these 3 quick questions:

  • What is the name of your web archive initiative (please state if you want to remain anonymous)?
  • How much data have you archived (number of files, disk space occupied)?
  • How many people work at your web archive (in person-months)?

Any additional comments are welcome.

Thank you very much.

Characterizing Search Behavior in Web Archives: we have published a video about a study we performed

A talk characterizing the search behavior of web archive users, based on a scientific study performed by the Portuguese Web Archive.

The presentation focuses on the following points:

  • Web archiving
  • Search log analysis
  • How do users search?
  • Time dimension in searches

Find out more:

Paper was considered one of the best at CIWAI-2008

The paper "Arquivo e medição da web portuguesa" (in Portuguese), about the archive and measurement of the Portuguese web, was considered one of the best papers at the CIWAI-2008 conference.

This recognition was given to 3 of the 66 published works. The conference received 216 submissions.

The paper presents the Portuguese web archive project and a characterization of the Portuguese web obtained from a crawl performed in February 2008.

According to the results, the Portuguese web comprises at least 56 million contents, corresponding to 2.8 TB of data.

The authors were invited to publish an extended version of the paper in a scientific journal.

New member of the PWA

David Cruz is the newest member of the Portuguese Web Archive team.

The open position at the Portuguese Web Archive (PWA) was filled by David Cruz.

David is 25 years old and holds bachelor's and master's degrees in Informatics from the University of Lisbon.

He was a researcher at HCIM, the human-computer interfaces group. His master's thesis focused on geographic information retrieval, and he collaborated with Linguateca, a project on natural language processing for Portuguese.

At the PWA, he is now responsible for the usability and accessibility of the user interfaces.

He will also be in charge of the integration of historical web data collections supplied by external entities.

Welcome!