Scientific and cultural institutions can contribute to the development of Arquivo.pt, the Portuguese Web Archive.
The Portuguese Web Archive requires ongoing research and development. Given the scope of the service and the challenges that must be overcome, collaborations with external institutions can yield valuable results. The tasks described below could be carried out as part of research and development projects.
If you work in these fields, or others that you may find relevant for the project, and you are interested in collaborating with the Portuguese Web Archive, please contact us.
To create a web archive that is useful to the community, it is necessary to understand users’ needs and expectations. The Web Archive may be useful to ordinary citizens and to researchers from several areas, such as historians, linguists and sociologists, who have different requirements and expectations of the system. Studies that identify the different user profiles of a web archive would be a valuable contribution.
Testing of developed systems
The developed systems will be thoroughly tested before being released to the public. The participation of critical-minded testers is crucial to detect required improvements, for instance in usability and system security.
Textual information retrieval over historical collections
In addition to archiving the information published on the web, it is crucial to keep it accessible. The algorithms currently used by search engines address only a single web collection and do not take into account historical collections built incrementally over time. Searching for information in historical web archives is a complex problem, and research on it has only just begun.
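To make the difference from single-collection search concrete, the following is a minimal sketch of an inverted index in which every posting carries a capture date, so that a query can be restricted to a time range. It is an illustration only and not a description of how the Portuguese Web Archive is implemented; the class name `TemporalIndex`, its methods and the example URL are hypothetical.

```python
from collections import defaultdict
from datetime import date


class TemporalIndex:
    """Toy inverted index over archived page versions (hypothetical)."""

    def __init__(self):
        # term -> set of (url, capture_date) pairs, one per archived version
        self._postings = defaultdict(set)

    def add(self, url, captured, text):
        """Index one archived version of a page."""
        for term in set(text.lower().split()):
            self._postings[term].add((url, captured))

    def search(self, query, start, end):
        """Return versions containing all query terms, captured in [start, end]."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted((u, d) for u, d in hits if start <= d <= end)


index = TemporalIndex()
index.add("http://example.pt/", date(2008, 1, 1), "Portuguese web archive")
index.add("http://example.pt/", date(2012, 1, 1), "Portuguese web archive search")
results = index.search("archive", date(2000, 1, 1), date(2010, 12, 31))
```

The same URL appears once per capture, which is the main point where a historical collection departs from a conventional index that stores a single current version of each page.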
A picture is worth a thousand words, but sometimes a thousand words are not enough to find the image we want. Web search engines look for images based on the texts associated with them. However, making this association is not trivial and it often produces erroneous results. The study of efficient mechanisms to improve the extraction of texts and their association with images could lead to an additional search service in our project. The Portuguese Web Archive will hold a large number of images, which will enable the development and testing of new image search algorithms using real data.
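As an illustration of the simplest form of text-to-image association, the sketch below uses Python's standard `html.parser` module to pair each image in a page with the text of its `alt` and `title` attributes. This is an assumed baseline for discussion, not the method used by the project or by any particular search engine; the names `ImageTextExtractor` and `extract_image_texts` are hypothetical.

```python
from html.parser import HTMLParser


class ImageTextExtractor(HTMLParser):
    """Collect (src, text) pairs from <img> tags (hypothetical baseline)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return  # an image without a source cannot be indexed
        # Join whichever of the alt and title attributes carry a value.
        text = " ".join(v for v in (attr.get("alt"), attr.get("title")) if v)
        self.images.append((src, text))


def extract_image_texts(html):
    """Return a list of (image source, associated text) pairs."""
    parser = ImageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.images


page = '<p><img src="a.png" alt="Lisbon tram" title="Tram 28"><img src="b.png"></p>'
pairs = extract_image_texts(page)
```

The second image in the example ends up with empty text, which shows why attribute text alone is often insufficient and why captions, surrounding paragraphs or link text are also candidates for association.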
The number of videos available on the web has increased significantly in recent years. Information formerly published as text, such as user manuals, is now frequently published as video. However, as with image search, current search services only process the texts associated with the videos and do not allow searching for information within the videos themselves. Moreover, the results refer to full videos, which requires users to watch the whole video even when they are only interested in a few seconds of it. Thus, identifying relevant information within a video is more difficult and time-consuming than within a text. Research on mechanisms that enable information search within videos is an interesting field.
Appropriate user interfaces to search archived information
The usability of an information system's user interface has repeatedly proven to be a key factor in the success of a project. The study of a user interface and middleware to provide access to the archived information is a challenging task, involving extensive research and testing with real users.