Thematic collections to discover in the online sessions “Café with Arquivo.pt”

Last updated on December 4th, 2025 at 01:15 pm

“Café with Arquivo.pt” is a series of online sessions, kept short so that anyone can attend during working hours. Its aim is to raise awareness of Arquivo.pt and gather contributions from the community on topics related to web preservation.

In December 2025, a new series was launched dedicated to the thematic collections that Arquivo.pt publishes in the form of data sets on the Dados.Gov open data platform.

For example, websites related to theatre, music, schools, parishes, elections and other topics are preserved in Arquivo.pt. In the thematic sessions of Café with Arquivo.pt, we will highlight sets of websites whose history can be found in the web archive.

Each session is dedicated to a specific topic and features a guest speaker who talks about their institution or project and comments on the topic of the day.

Thematic collections series

1st session – Local elections: how we archive websites and election programmes

  • Guest speakers: Mário Rui André and Gonçalo Pereira Costa – LPP / Lisboa Para Pessoas newspaper
  • Date: December 3, 2025
  • Language: Portuguese, translation to English available on Zoom
  • Registration (free)

Materials

Summary

Guests Mário Rui André and Gonçalo Pereira Costa, journalists from the newspaper LPP / Lisboa Para Pessoas, talked to us about the Portal das Autárquicas da Lisboa Metropolitana (Lisbon Metropolitan Local Elections Portal) they created, which provides information about the candidates and their electoral programmes. Arquivo.pt, which has collected thousands of electoral pages and websites (more than 3 terabytes of information), briefly explained the methodology used.

In this session, you will learn

  • How the local elections in the Lisbon Metropolitan Area went from a journalistic perspective;
  • What methodology was used to collect electoral content on the Internet;
  • How to use the web archive to obtain information from the past.

Previous seasons

In-person session dedicated to Arquivo.pt closes the “Archives of Knowledge” cycle

Last updated on December 16th, 2025 at 08:12 pm

On 19 November, the last 2025 session of the cycle Archives of Knowledge: Science, History and Memory (Arquivos do Saber: Ciência, História e Memória), an initiative of the FCT Science and Technology Archive, took place.

The event took place in the small auditorium of the FCCN premises, FCT’s digital services unit, at Avenida do Brasil, 101, in Lisbon.

More than 30 participants attended, and it was an opportunity for them to learn more about Arquivo.pt.

Event programme

The session opened with speeches by Maria Paula Diogo, member of the Board of Directors of the Fundação para a Ciência e a Tecnologia (FCT), Paula Meireles, coordinator of the Science and Technology Archive (Arquivo de Ciência e Tecnologia), and João Nuno Ferreira, vice-president of FCT and general coordinator of the digital services unit, FCCN.

The guest speakers were Rúben Almeida, from INESC TEC / FEUP, who gave a presentation entitled Minha Região – O Teu Portal Autárquico, and Joaquim José, from the Instituto Politécnico da Guarda, who talked about Memor.pt – Explore a Memória Digital Portuguesa. They won 1st and 2nd place, respectively, in the Arquivo.pt 2025 Award. The session was moderated by João Gomes, area director of FCCN, FCT’s digital services unit.

19 November programme – ‘Arquivos do Saber’ cycle

The Science and Technology Archive and the dissemination of its collection

The cycle Archives of Knowledge: Science, History and Memory (Arquivos do Saber: Ciência, História e Memória), organised by FCT, has been running since February this year, with the aim of disseminating the documentary collection of its Science and Technology Archive (Arquivo de Ciência e Tecnologia), as well as others relevant to the history and memory of Science and Technology in Portugal. The sessions are short and take place in an informal and sharing environment.

Image gallery

5th session of the Archives of Knowledge: Science, History and Memory cycle, at FCCN

Photos by Leonor Arrimar, FCT

Session video

Speakers and presentation

Ranking search results on Arquivo.pt on World Digital Preservation Day

Last updated on November 7th, 2025 at 03:55 pm

On World Digital Preservation Day, Arquivo.pt promoted an online session dedicated to annotating search results on Arquivo.pt, on 6 November, from 3 p.m. to 4 p.m.

The following topics were covered:

i) Access as a priority – text search as a search engine for the past
ii) How archived content is processed
iii) Annotations as quality assurance – demonstration

Importance of ranking results

The Arquivo.pt team has been reimplementing text search on Arquivo.pt, but needs to measure the quality of the new implementation by comparing it with the previous one. To do this, it is calling on the community for help.
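As an illustration of how relevance annotations can be turned into a quality score for comparing two rankings, here is a minimal sketch using normalised discounted cumulative gain (nDCG). The choice of metric, the label names and the numeric gains are assumptions made for this example; the text does not say which measure the Arquivo.pt team uses.

```python
import math

# Hypothetical mapping from the four annotation buttons to numeric gains.
GAIN = {
    "very_relevant": 2,
    "partially_relevant": 1,
    "not_relevant": 0,
    "inaccessible": 0,
}

def dcg(labels):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    return sum(GAIN[label] / math.log2(rank + 2) for rank, label in enumerate(labels))

def ndcg(labels):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg(sorted(labels, key=GAIN.get, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0

# Invented annotations for the same query under two search implementations.
old_ranking = ["not_relevant", "partially_relevant", "very_relevant"]
new_ranking = ["very_relevant", "partially_relevant", "not_relevant"]
print(f"old: {ndcg(old_ranking):.3f}  new: {ndcg(new_ranking):.3f}")
```

A higher score means that relevant results appear closer to the top, so averaging the score over many annotated queries gives one way to compare the two implementations.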

How to rank results on Arquivo.pt

1. Go to: https://anota.arquivo.pt

2. A random survey will appear (in Portuguese).

Example: “cavalo lusitano” “Associação Portuguesa do Cavalo Puro Sangue Lusitano” Entre 6 de agosto de 1991 e 1 de janeiro de 2010

3. Indicate the relevance of the result by selecting one of the buttons:

Annotation buttons: Very relevant, Partially relevant, Not relevant, Inaccessible content.

4. After finishing your annotation session, click the ‘Export’ button, which downloads a file named annotations.json.

5. Submit by clicking the ‘Enviar’ (Submit) button and uploading the annotations.json file. Alternatively, you can send it by email to contacto@arquivo.pt.

Please refer to the guide (Guia de anotação de resultados de pesquisa) for a complete list of instructions.

Dataset on 2025 Portuguese Local Elections at Arquivo.pt

Last updated on December 3rd, 2025 at 12:56 pm

Local elections (“autárquicas”) were held in Portugal on 12 October 2025, and Arquivo.pt compiled a special collection of electoral content published on the web, resulting in 3.5 terabytes of information for research and academic work.

440 search terms were used to obtain 43,000 page addresses, along with the websites of parishes, municipalities, and political parties.

Here we explain the various steps involved in collecting data on the elections:

How to identify election-related content on the web

To identify content related to the elections, we used a list of search terms, for example, “eleições autárquicas 2025”, “habitação autárquicas 2025”, “promessas autárquicas 2025”. After the elections, other terms were added, such as “vitória autárquicas 2025”, “resultados autárquicas 2025”.

The search terms aim to cover various election-related topics, such as politics, society and economics, as well as media outlets, candidate names, and regions of the country.

In the collection on local elections, the Google search engine was used to perform each search. Some advanced search parameters were used: number of results (&num=100), news results (&tbm=nws), image results (&udm=2). After the elections, the results were restricted using the “last week” filter.
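The following sketch shows how search addresses with these parameters could be assembled. Only the three parameters (num=100, tbm=nws, udm=2) come from the text; the function name, the mode names and the way the parameters are combined are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/search"

def search_url(term, mode="web"):
    """Build a search address for one term (hypothetical helper)."""
    params = {"q": term, "num": 100}   # number of results
    if mode == "news":
        params["tbm"] = "nws"          # news results
    elif mode == "images":
        params["udm"] = 2              # image results
    return f"{BASE}?{urlencode(params)}"

for mode in ("web", "news", "images"):
    print(search_url("eleições autárquicas 2025", mode))
```

urlencode takes care of escaping accented characters and spaces in the search term.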

In each search, the addresses listed on the search engine results pages (SERP) were extracted using the Google Rank Checker, Keyword SERP Ranking Tool. This tool works as a browser extension that exports the list of results in JSON format.

In total, 1,400 searches or queries were performed on Google (800 before the elections and 600 after the elections). Finally, the results of all searches (.json files) were compiled into a document and converted into a table. Each result contains various data, such as relevance, the domain from which it was extracted, the link or URL, the title of the publication, the date of the search, and the query.
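The compilation step can be sketched as follows: several JSON exports are flattened into rows and written out as a single table. The field names and the assumption that each export is a list of result objects are hypothetical; the real schema of the extension's export may differ.

```python
import csv
import io
import json

# Hypothetical column names, modelled on the data listed in the text.
COLUMNS = ["rank", "domain", "url", "title", "search_date", "query"]

def exports_to_rows(json_texts):
    """Flatten several JSON exports (each a list of results) into table rows."""
    rows = []
    for text in json_texts:
        for result in json.loads(text):
            rows.append({col: result.get(col, "") for col in COLUMNS})
    return rows

def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample export standing in for one .json file.
sample = json.dumps([{
    "rank": 1,
    "domain": "example.pt",
    "url": "https://example.pt/a",
    "title": "Exemplo",
    "search_date": "2025-10-01",
    "query": "eleições autárquicas 2025",
}])
table = rows_to_csv(exports_to_rows([sample, sample]))
print(table)
```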

It should be noted that the list obtained represents only a small portion of everything published on the Web about the elections. In addition, the same list contains results unrelated to the purpose of the collection (false positives) and some repetitions. To save time, no lines were deleted.
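Since repetitions were left in the list, anyone reusing it may want to detect or collapse them first. The sketch below is one possible way to do so, with invented addresses; it is not part of the published methodology.

```python
from collections import Counter

def repeated_urls(urls):
    """Return the addresses that occur more than once, with their counts."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}

def unique_seeds(urls):
    """Drop repetitions while keeping the original order."""
    return list(dict.fromkeys(urls))

# Invented example list.
seeds = [
    "https://example.pt/a",
    "https://example.pt/b",
    "https://example.pt/a",
]
print(repeated_urls(seeds))
print(unique_seeds(seeds))
```

False positives cannot be filtered this way; they would still need manual review or a relevance filter.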

This exercise resulted in 45,000 pages (seeds) with news, articles, and publications related to the elections to be used in the collection process by Arquivo.pt. This dataset, 2025 Local Elections, is available on the open data platform Dados.Gov.

A list of parish councils, municipal councils and political parties with their respective websites has also been added.

How the contents were recorded and limitations to be taken into account

The addresses obtained before and after the elections were recorded using two web crawlers, Heritrix and Browsertrix-crawler. These tools record pages from a given starting address (seed), then follow the links found there up to a certain limit, in this case a maximum of five times (five hops).
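The hop limit can be pictured as a breadth-first traversal that stops following links once a page is five hops away from the seed. The sketch below works on a small in-memory link graph with invented page names; it illustrates the idea only and is not how Heritrix or Browsertrix-crawler are implemented or configured.

```python
from collections import deque

def crawl(seed, links, max_hops=5):
    """Return {url: hops} for every page reachable from seed within max_hops."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # limit reached: record the page but do not follow its links
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops

# Invented chain of pages: p0 -> p1 -> ... -> p7.
links = {f"p{i}": [f"p{i + 1}"] for i in range(7)}
reached = crawl("p0", links)
print(reached)
```

With a limit of five hops, pages p0 to p5 are recorded, while p6 and p7 are never visited.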

Heritrix was used for an initial generic collection of pages, as it is capable of quickly processing lists containing thousands of addresses: 25,858 URLs before the elections and 17,258 URLs after the elections. It generated 541 gigabytes of information.

Browsertrix-crawler was used to improve the collection of dynamic content. This crawler records pages through a browser, which takes longer but captures content that would otherwise escape collection.

The collection with Browsertrix-crawler was carried out in stages: first the parish websites were recorded in August and September, and then, between October 9 and November 5, news about the elections and 8,850 social media posts. It generated 2.9 terabytes of information.
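As a quick check of how the figures quoted for the two crawlers add up, the short calculation below sums the Heritrix URL counts and the data volumes. It assumes decimal units (1 terabyte = 1,000 gigabytes), which the text does not specify.

```python
# Figures quoted in the text for the Heritrix crawl.
heritrix_urls = 25_858 + 17_258   # URLs before and after the elections
heritrix_tb = 541 / 1000          # 541 gigabytes, assuming 1 TB = 1,000 GB

# Figure quoted in the text for the Browsertrix-crawler crawl.
browsertrix_tb = 2.9

total_tb = heritrix_tb + browsertrix_tb
print(heritrix_urls, round(total_tb, 2))
```

The total is just over 43,000 URLs and roughly 3.4 terabytes, in line with the rounded figures given for the collection as a whole.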

As for the limits of the collection, we identified a few: some websites block automatic access, even though the Arquivo.pt agent identifies itself; social media content behind a login cannot be replayed on Arquivo.pt; and some videos cannot be played back due to their format.

How and when to access data for research and work creation

EAWP48 is the identifying name of the collection that will bring together content on the Local Elections of 12 October 2025. It is described in the list of collections at Arquivo.pt.

In the coming months, the content will be indexed and the CDXJ indexes will be available to researchers in the Arquivo.pt dataset list.
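For readers unfamiliar with CDXJ, each line of such an index typically holds a sortable URL key, a 14-digit timestamp and a JSON block describing the capture. The sketch below parses one line under that assumption; the sample line is invented, and the fields in the JSON block vary between indexing tools.

```python
import json

def parse_cdxj_line(line):
    """Split a CDXJ line into its key, timestamp and JSON fields."""
    surt, timestamp, payload = line.rstrip("\n").split(" ", 2)
    return {"surt": surt, "timestamp": timestamp, **json.loads(payload)}

# Invented sample line, not taken from the Arquivo.pt indexes.
sample = (
    'pt,example)/autarquicas 20251012120000 '
    '{"url": "https://example.pt/autarquicas", "mime": "text/html", "status": "200"}'
)
record = parse_cdxj_line(sample)
print(record["timestamp"], record["url"])
```

Because the lines are sorted by key and timestamp, researchers can list all captures of a page without opening the archived files themselves.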

After one year, the collected content will be accessible through the Arquivo.pt search engine. Anyone will then be able to search election pages by text or image.

For further information, please contact us.

Data collected on the 2025 Local Elections

Find out more about electoral collections from previous years