May 2025 – sobre.arquivo.pt

Portuguese Legislative Elections 2025 had a special collection by Arquivo.pt

May 26, 2025 by Ricardo Basílio

Last updated on May 28th, 2025 at 09:04 am

Arquivo.pt carried out a special collection of content published online in connection with the Legislative Elections of 18 May 2025.

More than 8,000 unique pages were recorded before and after the elections, amounting to around 250 gigabytes of information.

This collection includes news items from the media, party websites and other citizen publications documenting this important event in Portuguese life.

The data collected is available for researchers to use in their work and projects.

Methodology for collecting the electoral event

The collection was carried out using a semi-automatic methodology that allows information to be identified and collected quickly and saves resources. The steps were as follows:

preparation of a list of search terms;
automatic search with the Bing Search API;
extraction of a list of page addresses or URLs;
recording (using Browsertrix-crawler);
integration into Arquivo.pt;
making the dataset available for research.

The starting point for identifying pages for this electoral event was a list of search terms, including words, names, dates, website addresses and also words in other languages. For example, we used ‘eleições’, ‘legislativas’ and ‘2025’, candidate names, party websites, newspaper websites and ‘eleições Portugal’ in other European languages, to find foreign media pages that referred to the Portuguese elections. A total of 384 search terms were used.

The extracted addresses or URLs were then recorded. This approach accepts that some pages will miss the target and favours speed, an important factor in this type of event.
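Steps 2 and 3 of the list above can be sketched in Python as follows. This is a minimal illustration, not the code used by Arquivo.pt: it assumes the search responses have already been fetched and parsed as JSON, and that each one lists its hits under `webPages` and `value` with a `url` field, which is the shape of Bing Web Search results. The function names `normalise` and `extract_urls` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case scheme and host and drop the fragment so trivial variants collapse."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def extract_urls(responses):
    """Collect unique page URLs, in first-seen order, from parsed search responses.

    Assumption: each response is a dict shaped like {"webPages": {"value": [{"url": ...}]}}.
    Responses without hits are skipped silently.
    """
    seen = set()
    urls = []
    for response in responses:
        for hit in response.get("webPages", {}).get("value", []):
            url = normalise(hit["url"])
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

The result is a plain list of addresses that can be written to a file, one per line, and handed to the recording step. No relevance filtering is attempted here, in keeping with the choice to favour speed over precision.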

One search to identify web pages was carried out before the elections and two more in the following week, each followed by the corresponding recording, in order to add new content to the collection.
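One way to make sure each later round only adds new content is to compare its URL list with everything found in earlier rounds. The sketch below assumes each round is simply a list of URL strings; the function name `new_urls_per_round` is hypothetical and this is not necessarily how Arquivo.pt handled it.

```python
def new_urls_per_round(rounds):
    """For each search round, keep only the URLs not seen in any earlier round.

    `rounds` is a list of lists of URL strings, in chronological order.
    Duplicates inside a round are also dropped, preserving first-seen order.
    """
    seen = set()
    result = []
    for urls in rounds:
        fresh = [u for u in dict.fromkeys(urls) if u not in seen]
        seen.update(fresh)
        result.append(fresh)
    return result
```

With three rounds (one before the elections and two after), the second and third lists returned contain only the pages that still need to be recorded.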

Finally, all the data from this special collection was published. Researchers are invited to use this information for projects or studies and to compete for the annual Arquivo.pt Award.

Legislative Elections 2025 data set

The dataset Legislative elections 2025: list of web pages with electoral content for preservation at Arquivo.pt was published on the open data portal Dados.gov.pt.

Find out more about the electoral collections of previous years

MOOC on Arquivo.pt and web archives launched and open to the community

May 16, 2025 by Ricardo Basílio

Last updated on May 21st, 2025 at 11:42 am

The online training programme on Arquivo.pt, entitled The Web of the Past: Preservation and Research, has been launched and is open free of charge on the NAU platform to anyone who wants to deepen their knowledge of web archiving and Arquivo.pt services.

Daniel Gomes, manager of Arquivo.pt, who developed this training programme, announced it first-hand at the Faculdade de Letras da Universidade de Coimbra, during the workshop Digital preservation: tools and practices, held on May 7, 2025.

Registration open on the NAU platform for the MOOC web archiving

NAU – Sempre a Aprender is the e-learning platform of FCCN, the digital services unit of the Foundation for Science and Technology (FCT). The NAU initiative focuses on supporting the publication and promotion of content in the Massive Open Online Course (MOOC) format in Portuguese.

The aim of this programme is to develop skills in searching the Web’s digital memory, with an emphasis on using Arquivo.pt both in everyday life and in the context of studies and research.

The programme is divided into four courses:

Preservação da web e arquivos (Preserving the web and archives)
Pesquisar e aceder ao passado com o Arquivo.pt (Search and access the past with Arquivo.pt)
Bem publicar para bem preservar (Publish well to preserve well)
Casos de uso do Arquivo.pt (Arquivo.pt use cases)

There are no special requirements apart from a computer with Internet access and a browser such as Google Chrome or Internet Explorer.

Spread the word: arquivo.pt/mooc

Find out more

Interview Internet Day, May 17, published on NAU website

Arquivo.pt at the University of Coimbra to talk about digital preservation

May 10, 2025 by Ricardo Basílio

Last updated on May 22nd, 2025 at 06:45 pm

Arquivo.pt took part in the workshop entitled “Digital preservation: tools and practices”, promoted by the Faculty of Letters of the University of Coimbra, on the afternoon of May 7, 2025. We highlight the initial panel, moderated by Inês Santos, with excellent talks by Moisés Rockembach (University of Coimbra), Humberto Innarelli (Unicamp, Brazil) and Daniel Gomes (Arquivo.pt, digital service of FCCN-FCT).

The aim of the meeting was to offer the community a critical reflection on new trends in digital preservation tools and practices.

Digital preservation is a cross-cutting issue for organizations, as they all produce and generate information in digital format. There is a growing range of tools and solutions that promise greater efficiency in information processing. Many are labeled Artificial Intelligence. Such an abundance of products and frameworks calls for greater discussion and a critical approach. And this was achieved brilliantly by the panel of speakers.

Three approaches to Artificial Intelligence and Digital Preservation

This meeting brought together three authors of works on digital preservation in Amphitheatre III of the Faculty of Letters of the University of Coimbra to discuss different approaches.

Moisés Rockembach, co-author with Caterina Pavão of Arquivamento da Web e preservação digital (Archiving the Web and Digital Preservation), the first work in Portuguese on web archives, focused his presentation on the impact of Artificial Intelligence on digital preservation systems, namely on searching for and accessing information and, for example, on classification and indexing processes. With regard to the impact of the new tools that digital technology offers us, he referred to a phrase by Demi Gretscko: “The process of searching for and capturing information described in the text could certainly be improved in the future, especially when considering the contribution of new tools, such as those of Artificial Intelligence”.

There are Artificial Intelligence tools that offer interesting ways of accessing information, both in their novelty and in their format. Archiving must take this reality into account and test the extent to which it can transform the way in which many types of content are disseminated and accessed. One example to illustrate this idea was the presentation of a Podcast generated by Artificial Intelligence, based on chapter 2 of the book on Web Archives, which deals with digital preservation policies.

Link to Podcast generated by Artificial Intelligence (published on Instagram, in Portuguese)

Humberto Innarelli, author of Criptex da preservação digital (Digital preservation cryptex), coordinator of the Arquivo Edgard Leuenroth (AEL), specialist archival researcher at Unicamp, São Paulo, and PhD professor at the Paula Souza Centre, São Paulo, raised the question of the future of digital preservation. Until now, the practice for preserving dynamic digital content has been to convert it into static documents. However, information is increasingly delivered to us dynamically, from databases, algorithms and Artificial Intelligence. What’s the next step? Archival practice needs to look not only at metadata, as it has done in recent years, but also at what explains how the information was generated (what we might call paradata). This is the only way to put archives and digital preservation in a long-term perspective. A hundred or two hundred years from now we should still be able to access the digital information produced today.

Daniel Gomes, editor of the book The Past Web and founder of Arquivo.pt, discussed the issue of Artificial Intelligence as it relates to non-artificial, human-produced content. What added value do tools that generate text, images, audio or video bring? If we consider, for example, that a Podcast on digital preservation used a book written by a human author as its basis, what new knowledge did it generate? Little or none. So, what has come to be called Artificial Intelligence can be considered a way of presenting human knowledge and in no way exempts humanity from continuing to think, research and produce new knowledge.

Arquivo.pt preserves content that has been published by individuals and organizations and in this sense is a unique source of its kind. Information published on the web is important for reporting and better understanding recent history, since the 1990s. Any Artificial Intelligence tool will have to go back to the point where the information was created by people. The human origin of the content preserved by Arquivo.pt, and the same can be said of traditional archives, makes them of enormous value, even considering their economic value. How much is the information stored in a web archive worth?

New MOOC (Massive Open Online Course) about web archiving

Daniel Gomes, Manager of Arquivo.pt, has announced first-hand the online course on the NAU platform: The Web of the Past: Preservation and Research (in Portuguese).

The online course, or MOOC (Massive Open Online Course), is available for those who want to deepen their knowledge of web preservation.

The short link for dissemination is arquivo.pt/mooc

Preserved Arquivo.pt data and its automatic processing by APIs

Vasco Rato, developer of Arquivo.pt, showed how the automatic processing interfaces, Application Programming Interfaces (APIs), work.

Arquivo.pt data can be processed by Artificial Intelligence. The works competing for the Arquivo.pt Award have already demonstrated this, as have projects such as GlórIA, a Large Language Model developed at NOVA-FCT.

Finally, Ricardo Basílio, digital curator, showed how anyone can save a page or an entire website on their own computer in a standardized format, compatible with web archives. ArchiveWeb.page and browsertrix-crawler were used for this, as training tools. This practice allows the community to be increasingly active in preserving institutional information published on the Web.

Agenda

14h30 Panel – Moderator: Inês Santos, University of Coimbra

Digital Preservation and Artificial Intelligence – Moisés Rockembach, University of Coimbra – Slides
Cryptex for Digital Preservation: The Next Step – Humberto Innarelli, Unicamp – Slides
Arquivo.pt and Web Preservation – Daniel Gomes, FCCN-FCT – Slides

16h00 Break

Open Data for Research. Automatic information processing through APIs – Vasco Rato, FCCN-FCT – Slides
Demo – Archiving the Web: do-it-yourself – Ricardo Basílio, FCCN-FCT – Slides
- Manual recording demo with ArchiveWeb.page
- Automatic recording demo with Browsertrix-crawler

17h00 End

Image gallery

Images on the Coimbra University social media

Video of some moments from the event (published on Facebook)

Workshop at the Faculty of Letters of the University of Coimbra

Arquivo.pt in Coimbra at scientific computing event Jornadas FCCN 2025

May 10, 2025 by Ricardo Basílio

Last updated on May 17th, 2025 at 12:52 pm

The Arquivo.pt team was in Coimbra between 6 and 8 May, at Jornadas FCCN to promote the preservation of the Portuguese Internet, as dissemination and promotion are an important part of its mission.

The Jornadas FCCN event is the responsibility of FCT’s digital services and annually brings together hundreds of participants from higher education institutions and other entities linked to science and technology.

On Tuesday morning, Pedro Gomes presented the Arquivo.pt highlights at the FCCN Zapping session and in the afternoon, from 4.30pm to 6pm, there was the Arquivo.pt session, Hands on for archiving the Web.

On Wednesday 7th, at 2.30pm, the Arquivo.pt team went to the University of Coimbra to take part in a meeting organised by the Faculty of Arts and Humanities (FLUC) entitled Digital preservation: tools and practices (Amphitheatre III, Floor 4).

Late on Wednesday afternoon, Daniel Gomes took part in the session Democratising AI: making Artificial Intelligence accessible to all, speaking about the contribution of Arquivo.pt to the AMÁLIA LLM.

Arquivo.pt highlights at FCCN’s Zapping session

Pedro Gomes, who is in charge of Arquivo.pt’s collections, showed the oldest image archived on Arquivo.pt, which comes from the old University of Coimbra website. He highlighted the new functionality that allows Flash content to be played, as well as Arquivo.pt’s statistics, awards and datasets.

Hands-on web archiving

This session, led by Ricardo Basílio, digital curator at Arquivo.pt, showed how to save web pages in standardised format using your own computer.

We believe that ‘do-it-yourself’ training is part of Arquivo.pt’s mission to promote the preservation of the Internet. By showing how website recording works, we’re also strengthening the community’s connection to Arquivo.pt.

The session was aimed at those who need to save high-quality copies of websites. Participants were challenged to record static pages as well as pages with interactive content, videos and social networks. Based on the questions that arose during the practical exercises, we clarified doubts and showed that archiving web content is very easy.

We used the ArchiveWeb.page extension, a tool from Webrecorder.net, which the participants could obtain free of charge and install on their own computers.

If you are a computer scientist or advanced IT user

For those who need to save entire websites automatically, we briefly mentioned Browsertrix-crawler, an advanced tool that runs in Docker, on Linux. Computer scientists and advanced IT users are all invited to try their hand at recording and archiving websites.
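As a rough idea of what an automatic recording looks like, the sketch below builds a Docker command line for Browsertrix-crawler without running it. It assumes Docker is installed and uses the public `webrecorder/browsertrix-crawler` image with its `--url`, `--collection` and `--generateWACZ` options; the seed address, the collection name and the helper name `browsertrix_command` are hypothetical, and the settings used in the session may well have been different.

```python
import shlex
from pathlib import Path


def browsertrix_command(seed_url, collection, output_dir):
    """Return the argument list for a single-seed crawl packaged as a WACZ file.

    The crawler writes its output under /crawls/ inside the container,
    so that path is mapped to `output_dir` on the host.
    """
    volume = f"{Path(output_dir).resolve()}:/crawls/"
    return [
        "docker", "run", "--rm",
        "-v", volume,
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", seed_url,            # page where the crawl starts
        "--collection", collection,   # name of the resulting collection
        "--generateWACZ",             # bundle the recording into one WACZ file
    ]


cmd = browsertrix_command("https://example.org/", "my-site", "./crawls")
print(shlex.join(cmd))  # shows the command so it can be reviewed before running
```

Printing the command first, rather than executing it directly, makes it easy to check the seed and the output folder. It can then be pasted into a terminal or passed to `subprocess.run(cmd, check=True)`.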

The demonstrations and exercises we propose using ArchiveWeb.page or Browsertrix-crawler also apply to advanced use cases and respond to organizations’ day-to-day web archiving needs.

Materials for the “hands-on” session

Democratising AI: making Artificial Intelligence accessible to everyone

On the second day of the FCCN Conference, 7 May 2025, in the session dedicated to Artificial Intelligence, Daniel Gomes, from FCCN-FCT, and João Magalhães, from NOVA-FCT, presented “AMÁLIA: Automatic Multimodal Language Assistant with AI”.

Daniel Gomes explained how Arquivo.pt is used for large-scale processing, specifically through the Arquivo.pt Application Programming Interfaces (APIs).

APIs allow researchers to access information from Arquivo.pt automatically and develop various applications in research projects. For example, projects such as Conta-me Histórias, the Portuguese language model GlórIA LLM and, currently, AMÁLIA LLM have used APIs.
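To give a sense of what automatic access looks like, here is a small Python sketch of a full-text query to Arquivo.pt. It assumes the public text search endpoint at `https://arquivo.pt/textsearch`, with the parameters `q`, `from`, `to` and `maxItems`, and a JSON response whose results sit in a `response_items` list with `tstamp` and `linkToArchive` fields. These names should be checked against the current API documentation, and the helper names `build_query_url` and `archived_links` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt full-text search API.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_query_url(terms, date_from=None, date_to=None, max_items=50):
    """Build a search URL; dates are timestamps such as 20250501000000."""
    params = {"q": terms, "maxItems": max_items}
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def archived_links(payload):
    """Return (timestamp, archived page link) pairs from a parsed JSON response.

    Assumption: results are listed under the `response_items` key.
    """
    return [
        (item.get("tstamp"), item.get("linkToArchive"))
        for item in payload.get("response_items", [])
    ]
```

The URL can be fetched with any HTTP client, for instance `urllib.request.urlopen`, and the body parsed with `json.load` before being passed to `archived_links`. Keeping URL construction and response parsing separate from the network call makes both easy to test offline.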

Presentation slides