Arquivo.pt launched a new version called Isis on 7 January 2025.
Support for Flash using the Ruffle emulator
The highlight of the new version of Arquivo.pt is the ability to play Flash animations and interactive content.
Flash technology was used on websites in the early years of the Web. However, it has become obsolete and current browsers, such as Google Chrome or Edge, no longer support it, which prevents such content from being displayed. Software emulation is a way of giving access to content produced with obsolete technologies.
Arquivo.pt has therefore included Ruffle, a Flash Player emulator that allows you to visualise Flash content that was previously inaccessible to the user.
Examples of Flash animations on Arquivo.pt
Access the following sites on Arquivo.pt, bearing in mind that they are generally designed to be used on a desktop computer.
Arquivo.pt query logs are unique resources for research
Arquivo.pt provides a “Google-like” service that enables searching pages and images collected from the web since the 1990s. Notice that Arquivo.pt search complements live-web search engines because it enables temporal search over information that is no longer available online on its original websites.
Analyzing user behavior is an important research topic to understand users’ information needs and enhance the quality of search results. Thus, when a user interacts with a search engine, the system records the user’s actions in a file called the query log. Query logs from web archives are unique resources for research because they describe the real needs of web-archive users about the historical information published online over time.
Research case study
Flavie Gallois and Adam Jatowt from the University of Innsbruck, and Ricardo Campos from the University of Beira Interior and INESC TEC analyzed user search behavior based on the Arquivo.pt search query log dataset collected over a period of 3 months from June to September 2021 (Analyzing User Search Behaviour in Temporal Web Repositories through Search Query Log Analysis).
This study analyzed query features such as length, type or frequency and compared the obtained results with previous work about user search behavior over web-archives and live-web search engines.
This study revealed interesting trends and patterns about how users search for information within web archives, with strong potential for future research work.
How do web-archive users search?
The users came from Portugal in 85.7% of the queries. However, automatic language identification detected Portuguese in only 37% of the queries. This suggests that users search web archives in languages other than their own.
Users of Arquivo.pt tend to use longer queries with more words and characters in comparison to previous studies, both over web archives and live-web search engines. About 92% of the queries had 5 or fewer terms (average of 25 characters), with 3 being the most common number of submitted terms. In previous work about search behavior in web archives, it was observed that users tended to submit from 1 to 3 terms per query, with 1 term as the most common submission.
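The query-length figures above can be reproduced on any list of queries with a few lines of Python. This is a minimal sketch, not the method used in the study: it assumes that terms are separated by whitespace, and the sample queries and function names are made up for illustration.

```python
from collections import Counter


def term_count_distribution(queries):
    """Map each term count to the number of queries that have it."""
    # Assumption: terms are whitespace-separated; the study may tokenise differently.
    return Counter(len(q.split()) for q in queries if q.strip())


def share_with_at_most(distribution, max_terms):
    """Fraction of queries with max_terms terms or fewer."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    short = sum(n for terms, n in distribution.items() if terms <= max_terms)
    return short / total


# Made-up sample queries, not taken from the dataset.
sample_queries = [
    "expo 98",
    "jornal publico 2005",
    "universidade de coimbra",
    "benfica",
    "eleições europeias 2019 resultados candidatos lista",
]

dist = term_count_distribution(sample_queries)
most_common_terms, _ = dist.most_common(1)[0]
print(most_common_terms, share_with_at_most(dist, 5))
```

On the real dataset, the same two functions would be applied to the contents of the query column instead of the sample list.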
Users tend to issue multiple queries within a session instead of a single query, possibly indicating a need for refining their search queries or exploring multiple options for inquiry.
87.7% of the queries submitted to Arquivo.pt came from desktop browsers, despite Arquivo.pt providing mobile-friendly user interfaces. Old web-archived pages are not responsive and render poorly on mobile devices. Thus, it is to be expected that users mostly access web archives from their desktops.
Users refined the time span of the search (using the datepickers) in about 50% of queries, which indicates awareness of the temporal needs peculiar to web-archive usage. Interestingly, users modified the From datepicker more frequently than the To datepicker. Note that keeping the default time span may simply fit the user's information needs and does not necessarily indicate a lack of awareness of the time span function.
Only a small percentage of users included specific years in their query terms (4%), potentially suggesting that in these cases the time span function was insufficient, or unnoticed by some users.
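One simple way to estimate how many queries mention a year is a regular expression. The sketch below is an illustration only: the definition of a year (a standalone 4-digit number between 1990 and 2029) and the function names are assumptions, and the study may have used a different rule.

```python
import re

# Assumption: a "year" is a standalone 4-digit token from 1990 to 2029.
YEAR_PATTERN = re.compile(r"\b(?:199\d|20[0-2]\d)\b")


def mentions_year(query):
    """Return True if the query contains a standalone year."""
    return YEAR_PATTERN.search(query) is not None


def share_mentioning_year(queries):
    """Fraction of queries that mention at least one year."""
    if not queries:
        return 0.0
    return sum(mentions_year(q) for q in queries) / len(queries)


# Made-up sample queries, not taken from the dataset.
sample = ["eleições europeias 2019", "expo 98", "benfica", "jornal publico"]
print(share_mentioning_year(sample))
```

Two-digit years such as "98" are deliberately not counted, because they cannot be told apart from other numbers without more context.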
The obtained results suggest that users are more conscious of their information needs and have improved their search techniques to be more effective over web-archives instead of just using them out of curiosity as first-comers.
What is searched in a web-archive?
The authors of the study applied automatic named entity recognition over the user queries and derived a set of word clouds that graphically provide a glimpse of the most common information needs of Arquivo.pt users:
Access to research Arquivo.pt query dataset
Arquivo.pt released a set of resources to support research studies over its
Query_Dataset_ArquivoPT.7z (in UTF-8): this file contains the full query log dataset available for research, collected over a period of 3 months from June to September 2021. Be careful when opening it, because some applications such as Microsoft Excel may use the wrong charset and damage the content, for instance in column L “QUERY”.
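A way to avoid charset problems is to read the extracted file with a tool that lets you state the encoding explicitly. The sketch below is a hedged example: it assumes that the archive extracts to a comma-separated file with a header row containing a QUERY column, which should be checked against the actual file. The demo file name and its content are made up.

```python
import csv
import tempfile
from pathlib import Path


def read_queries(csv_path, column="QUERY"):
    """Read one column from a CSV file, decoding it explicitly as UTF-8."""
    # Assumption: comma-delimited file whose first row holds the column names.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle)]


# Self-contained demo with a tiny made-up file instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_query_log.csv"
    demo_path.write_text("QUERY\neleições\ncâmara municipal\n", encoding="utf-8")
    demo_queries = read_queries(demo_path)

print(demo_queries)
```

Because the encoding is passed explicitly, accented characters in Portuguese queries are preserved regardless of the default charset of the operating system.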
The first step to understand user behavior is to define evaluation metrics. Defining metrics is a powerful tool to set long and short-term goals to decide which new products and features should be released to the users.
We share a work-in-progress report which aggregates information about Web Archive Search Evaluation Metrics. This contributes to comparing users’ search behavior between live-web and web-archive search engines. Feel free to comment directly on the collaborative document or to contact us.
This report also provides a summary of references to previous work, query workflows and the structure of the corresponding query logs produced by Arquivo.pt, to make it easier for researchers to study these data sets.
Some web-archived pages are reproduced incompletely due to problems that occurred during the archiving process (e.g. broken formatting or missing embedded images).
Complete page is a function of Arquivo.pt that allows users to recover missing elements of web-archived pages from other web archives or from the original websites.
When viewing a page archived in Arquivo.pt, a user just needs to open the Options menu in the top right corner and choose Complete page.
This process is performed automatically.
How does Complete page work?
If you open a web-archived page that appears incomplete, try the Complete page option and wait.
Arquivo.pt will search for missing elements on the Internet and in other web archives using the Memento protocol. If it succeeds, the obtained elements will be immediately displayed on the web-archived page.
Later, these recovered elements are integrated into the Arquivo.pt collection, so that the web-archived page will appear more complete in the future accesses performed by any user.
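To give an idea of what a Memento lookup involves, the sketch below builds (without sending) the kind of request defined by the Memento protocol (RFC 7089): a request to a TimeGate carrying an Accept-Datetime header with the desired date. This is not the Arquivo.pt implementation. The TimeGate address, the URL layout and the function name are hypothetical and used only for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate address, used only for illustration.
TIMEGATE_BASE = "https://timegate.example.org/timegate/"


def build_timegate_request(original_url, wanted):
    """Build a HEAD request asking a TimeGate for the version of
    original_url closest to the datetime `wanted` (which must be in UTC)."""
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    accept_datetime = format_datetime(wanted, usegmt=True)
    return Request(
        TIMEGATE_BASE + original_url,
        headers={"Accept-Datetime": accept_datetime},
        method="HEAD",
    )


request = build_timegate_request(
    "http://example.com/image.png",
    datetime(2005, 1, 1, tzinfo=timezone.utc),
)
print(request.full_url)
print(request.header_items())
```

A TimeGate that supports the protocol would answer such a request by pointing to an archived version (a memento) of the missing element, identified by a Memento-Datetime header in the response.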
Using Complete page on the home page of artist Cristina Guerra’s website recovered a missing image.
For example, the website of artist Cristina Guerra archived in 2005 had a missing image. By using Complete page, it was possible in 2021 to obtain this missing image from another web archive which preserved it.
Participate in collaborative curation to improve the quality of Arquivo.pt!
Due to the high number of web-archived pages, it is not possible for Arquivo.pt to complete them all automatically. Therefore, the collaboration of users to identify important pages with missing elements and try to complete them is important.
By using Complete page, users help improve the quality of the historical webpages preserved in Arquivo.pt!
Always try to complete web-archived pages that look incomplete. If you detect any problem, contact us.
Spread the word about the Arquivo.pt Complete page!
This event is a meeting for sharing knowledge among the entities that make up the national higher education and research community.
The event is attended by decision-makers of the institutions, people in charge of computer technical services and people responsible for libraries and documentation services, among others.
Arquivo.pt presented two 90-minute sessions on June 28th, from 2:30 p.m. to 6 p.m., under the theme “Arquivo.pt services for managing citations and cybersecurity”, and presented the Arquivo.pt Memorial service in the Zapping session.
Agenda
June 28, 2:30-4 p.m.: Arquivo.pt: available services and system architecture
When a user enters a set of words about a topic in the Arquivo.pt search box and clicks on the “Narrative” button, the user is directed to the “Conta-me Histórias” service, which automatically analyzes the news from 25 websites archived by Arquivo.pt over time and presents a chronology of news related to the topic.
Figure 1: Search results for pages about “Justin Bieber”.
Figure 2: Narrative of news about “Justin Bieber” from Portuguese news sites preserved by Arquivo.pt generated by the “Conta-me Histórias” service.
Create your narrative now!
“Conta-me Histórias” researches, analyzes and aggregates thousands of results to generate each narrative about a topic. It is recommended to choose descriptive words about well-defined themes, personalities or events to obtain good narratives.
Creating a narrative is useful for researchers, journalists or citizens who want to quickly get an overview of how a topic evolved over time, saving them a lot of time and effort.
Go to Arquivo.pt and try to create a narrative about a theme of your choice.
Web Archiving Conference 2021 – the most important meeting in the field of Web preservation, where experts share new knowledge and experiences
RESAW Conference – meeting of the European RESAW network (Research Infrastructure for the Study of Archived Web Materials) this year in its 4th edition, mainly addressed to the community of researchers from non-technological scientific areas, such as Social Sciences, Arts and Humanities.
Contributions of Arquivo.pt to the international community
Arquivo.pt presented some results of the work developed in the last year, with emphasis on the functionalities that improve the reproduction of archived content, such as “Complete the page”.
Two historical collections were integrated into Arquivo.pt: Geocities and the Internet Memory Foundation. Arquivo.pt also created special collections about the 2019 European Elections and Covid-19.
The contents of Arquivo.pt are accessible to any researcher regardless of the country they are in and therefore it is a useful service to the international community.
Presentations
Arquivo.pt updates 2021: presentation at the IIPC – General Assembly, by Daniel Gomes (Video)
Complete the page. 1 minute drop in (presentation at the IIPC – General Assembly “complete the page”), by Daniel Gomes (Slide, Video)
A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, by Ivo Branco (Slides, Video)
Enhancing access to research the Geocities historical collection, by Pedro Gomes (Slides, Video)
Arquivo.pt launched a new version, called Basileus, on November 11, 2020.
The purpose of this version was to improve the user experience when browsing through the different interfaces of Arquivo.pt.
Adjustments were made at the level of Web design which resulted in greater consistency in the structure of the code, in the graphic aspects and in the interactions, such as colors, fonts and buttons.
On November 28, 2017, Arquivo.pt launched a new version named Afrodite.
The main novelty is the adaptation of user interfaces to mobile devices.
It also enables access to the mobile versions of the preserved sites.
Arquivo.pt began to preserve the mobile web too!
Mobile version of the homepage of Arquivo.pt.
It is now easier to use Arquivo.pt everywhere
Using your mobile phone, try searching for all of the versions that Arquivo.pt preserved from the website of the organization where you worked or studied.
List of preserved versions of a site.
Mobile version of textual search.
Mobile versions of preserved sites can also be accessed.
And more…
Improvements were also made to the desktop user interfaces, including a new responsive footer and a new language selection bar.
The alpha version of a new API was also published to improve the automatic access to the information preserved by Arquivo.pt.