Arquivo.pt in Porto at the FCCN 2026 Conference


The Arquivo.pt team held a session open to the public on 5 May during the Jornadas FCCN 2026.

The session was attended by around 80 participants and covered topics that are currently central to Arquivo.pt’s work. These included the use of the preserved collection for research, its application in artificial intelligence (AI) tools, and participation in large language model (LLM) projects for the Portuguese language.

The national meeting Jornadas FCCN 2026 took place at the Faculdade de Economia da Universidade do Porto between 5 and 7 May 2026. About 1,000 people attended. It was an opportunity to meet many of the people we interact with throughout the year.

How the web archive is being used for research, AI and LLMs

How can three decades of the history of the Portuguese web be used for research, technological innovation and to train artificial intelligence models? In this session by Arquivo.pt at the FCCN Conference, it was demonstrated, in a practical and accessible way, how the preserved collection is now being given a new lease of life — from generative AI projects to the development of open-source tools for the entire academic community.
The session was divided into five parts, each focusing on specific new features and real-world use cases.

1. Amália AI: AI trained using data from Arquivo.pt – inspiration, methods and results

Pedro Gomes demonstrated how historical data from Arquivo.pt was used in the development of Amália, a large language model (LLM) for the Portuguese language. He explained the data preparation process, the specific challenges of the Portuguese web, and provided examples of what the model can generate when drawing on decades of national digital archives.

It was an inspiring presentation for anyone wishing to understand the real impact of archived web collections on AI projects.

2. New text search with Apache Solr: faster, more modern and scalable

In 2025, we began redesigning the text search system of Arquivo.pt. In this part of the session, Vasco Rato spoke about this ongoing work:

  • how a search engine for past web pages works internally;
  • what challenges arise when indexing billions of pages;
  • and how the new architecture using Apache Solr paves the way for more comprehensive, faster and more flexible searches.

3. The use of AI for code generation

Ivo Branco demonstrated how the use of Artificial Intelligence to generate code is significantly speeding up the development of Arquivo.pt. What once began as a ‘vague improvement’ now quickly becomes a concrete task on the work plan, thanks to AI’s ability to propose solutions, structure code and support process automation.

The Arquivo.pt manager also highlighted improvements to the page replay system, which is now based on ZipNum, a compressed, block-based index format that drastically reduces the time taken to access archived content, even when dealing with billions of records.
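To give a rough idea of why a ZipNum-style index speeds up replay, here is a minimal Python sketch, not Arquivo.pt's implementation. It assumes simplified index lines of the form "SURT-key timestamp" and keeps the compressed blocks in a list, whereas a real ZipNum index stores them at byte offsets in one file. Sorted lines are packed into gzip blocks, and a small summary holding the first line of each block is binary-searched, so a lookup only decompresses the blocks around the requested key instead of scanning the whole index. The block size of three lines and the captures below are invented for illustration.

```python
import bisect
import gzip


def build_zipnum(cdx_lines, lines_per_block=3):
    """Sort index lines and pack them into gzip blocks plus a summary index.

    summary[i] is the first line of blocks[i]. The tiny block size is only
    for illustration.
    """
    lines = sorted(cdx_lines)
    summary, blocks = [], []
    for start in range(0, len(lines), lines_per_block):
        chunk = lines[start:start + lines_per_block]
        summary.append(chunk[0])
        blocks.append(gzip.compress("\n".join(chunk).encode("utf-8")))
    return summary, blocks


def lookup(summary, blocks, url_key):
    """Return every index line for url_key, decompressing only nearby blocks."""
    prefix = url_key + " "
    # Binary search on the summary: the first match can start in the block
    # just before the first summary entry that sorts at or after the prefix.
    start = max(bisect.bisect_left(summary, prefix) - 1, 0)
    matches = []
    for block in blocks[start:]:
        for line in gzip.decompress(block).decode("utf-8").split("\n"):
            if line.startswith(prefix):
                matches.append(line)
            elif line > prefix:
                # Lines are sorted, so no later line can match this key.
                return matches
    return matches


# Invented captures in "SURT-key timestamp" form.
captures = [
    "pt,sapo)/ 20010101000000",
    "pt,arquivo)/ 20080101000000",
    "pt,fct)/ 20050101000000",
    "pt,fct)/ 20150101000000",
    "pt,gov)/ 20020101000000",
    "pt,arquivo)/ 20200101000000",
    "pt,fccn)/ 19990101000000",
    "pt,publico)/ 19980101000000",
    "pt,fct)/ 20250101000000",
]
summary, blocks = build_zipnum(captures)
print(lookup(summary, blocks, "pt,fct)/"))
```

In a production index each block typically holds thousands of lines and the summary is small enough to keep in memory, so one binary search plus one or two block reads replaces a scan over billions of records.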

The use of AI enables us to implement these optimisations more quickly, improve the quality of the code produced, and free up the team’s time for areas of greater innovation and research.

4. Archive your website right away

To conclude, Ricardo Basílio gave a practical demonstration of how anyone can archive web content on their own initiative:

  • archive a page directly to Arquivo.pt in seconds using ArchivePageNow;
  • save content to your own computer in WARC format using Webrecorder;
  • understand how these files can be reused, analysed or preserved in the long term.

5. Thematic collections: preserving your memories

From the environment to elections, and from science to digital culture, Arquivo.pt regularly produces themed collections to preserve key moments in society.

This point was not covered during the session and will be made available shortly. However, we have included a comment on it at the end of the session video, explaining how these special collections are defined, curated and preserved, and how they can be used for teaching, research or simply out of historical curiosity.

Session sponsor

Patrício Cachaço presented Fortinet Secure LAN solutions: Security-Driven Networking with AIOps.

Session materials

Image gallery

Jornadas FCCN 2026


Arquivo.pt at the IIPC Web Archiving Conference in Brussels


The Arquivo.pt team attended the Web Archiving Conference (WAC) and the IIPC General Assembly in Brussels from April 20 to 23, 2026.

The Web Archiving Conference is the largest event dedicated to Internet preservation. It brings together initiatives from around the world, such as the Internet Archive, national libraries, and research centers that develop methodologies for using historical web content.

The International Internet Preservation Consortium (IIPC) brings together web archiving initiatives from around the world, coordinating efforts to maintain and develop standards, tools, collections, and training.

The Belgian Web Archive

The KBR (De Koninklijke Bibliotheek van België), the Belgian National Library, located in the heart of Brussels, hosted the Web Archiving Conference, which drew approximately 250 participants. The conference’s opening session featured a presentation of the results and the conclusion of the pilot project for the Belgian Web Archive.

In 2017, Belgium launched a project called PROMISE (PReserving Online Multiple Information: towards a Belgian Strategy) for a national web archive. Starting in 2019, with funding from the Belgian Science Policy Office (BELSPO), a five-year pilot phase was carried out, culminating in the presentation of a web archive prototype in 2026. Partners in this project included the national archives (State Archives of Belgium, AGR) and, on the research side, Ghent University.

The collection of Belgian web content was carried out under the existing legal deposit system for printed materials, which was adapted in December 2016 to include digital web content.

The PROMISE project used open-source tools shared by the IIPC community (for data collection, the Browsertrix-crawler from Webrecorder.net; for playback, the pywb software). Access to the content is restricted and limited to the library system, and the collection has been enriched with metadata and catalog information.

Presentations by Arquivo.pt

To show what Arquivo.pt has been doing to promote access to its collection and demonstrate the value of its service, we gave three presentations.

Image gallery

IIPC WAC 2026

Opening of WAC 2026 at KBR
Arquivo.pt team at IIPC WAC
Presentation by Vasco Rato
Presentation by Pedro Gomes

The Bridges project at the Universidade de Évora is collaborating with Arquivo.pt


Arquivo.pt is collaborating with the Bridges – Ponte Cultural project at CIDEHUS – Centro Interdisciplinar de História, Culturas e Sociedades of the Universidade de Évora (Portugal).

Arquivo.pt’s contribution consists of providing educational content on the preservation of content published on the Internet and curating special collections related to the project’s thematic focus, such as women, immigration, and regional content from the Alentejo in the blogosphere.

The BRIDGES project, led by Principal Investigator María Zozaya, is being carried out at the University of Évora and is funded by the PRR following a Portuguese national call by PLANAPP-FCT on Science for Policy (S4P25-LT 24: BRIDGES). Work “developed under the Science4Policy 2025, an annual Science for Policy Project call, an initiative promoted by Centre for Planning and Evaluation of Public Policies in partnership with the Foundation for Science and Technology, financed by Portugal’s Recovery and Resilience Plan”.

International Seminar “Women in Focus”

On March 17 and 18, an international seminar titled “Women in Focus: From Narrative to Representation in Language, Art, Heritage, and the Digital World” was held.

Arquivo.pt participated in the online session with a presentation titled “Women’s Visibility on the Web: A Mirror of Society Since the 1990s.” The presentation demonstrated how the archive contains historical web content that is useful for studying women’s issues. Three research projects that utilized Arquivo.pt were highlighted, namely:

As part of this initiative, Arquivo.pt is compiling a thematic collection on Portuguese women who have made significant contributions to culture, art, and science. The list of URLs will be available on the open data portal Dados.gov.

Materials of the online session

II International Seminar “I have a dream”

From April 15 to 17, the 2nd International Seminar of the BRIDGES Project was held, entitled: «I have a dream. À luz da diversidade: arte, cultura, políticas públicas e mundo digital» (In the Light of Diversity – Art, Culture, Public Policy, and the Digital World).

Erik Bran Marino and Rafael Prezado, doctoral students at the University of Évora, presented Narrative Monitoring: Analysis of Conspiracy Theories on Population Replacement in the Portuguese Web Archive (1996–2021).

On the Narrative Monitor website, one can view the results and take a quiz.

The project “Narrative Monitoring” is one of the winners of the 2025 Arquivo.pt Award, having placed third. It was developed by the CIDEHUS research team, consisting of Erik Bran Marino, Rafael Prezado, Ana Sofia Ribeiro, and Renata Vieira. It serves as an example of how Arquivo.pt is used in a research context.

The digital curator of Arquivo.pt addressed the topic “Data on Multiracial Diversity at Arquivo.pt,” demonstrating how the web has served as a space for freedom of expression and for self-affirmation or advocacy.

The work of archiving the Web and preserving its memory, in turn, requires initiative and community participation. Several international and national examples of this “activist” aspect of Web archiving were cited:

As part of this session, Arquivo.pt is compiling a thematic collection on migration and the PRCT (Comparative Analysis of Conspiracy Theories in Europe), based on the 150 search terms used in Erik Bran Marino’s research.

Materials of the online session

FCCN presents Arquivo.pt at the “File Not Found” event in Lisbon


From March 23 to 26, Lisbon hosted the File Not Found event, organized by the Goethe-Institut. Over the course of four days, the initiative brought together national and international experts to explore the role of archives in the digital age, particularly their cultural, social, and political value in a constantly evolving digital world. The discussion highlighted practices, challenges, and responsibilities associated with the preservation of information heritage in this context of increasing digitization.

On the final day of the conference, March 26, João Gomes, area director at FCCN, the digital services unit of FCT, participated in the panel “Archiving Online: Power, Risk, and Digital Care Practices.” His presentation focused on Arquivo.pt, the public service for preserving Portuguese web content, developed by FCCN.

João Gomes presented the mission and progress of Arquivo.pt, emphasizing the importance of ensuring that information published online can be preserved and reused by researchers, journalists, public entities, and citizens. He also highlighted the service’s role in promoting digital literacy and advocating for open access to information.

Learn more about collaborations with Arquivo.pt

Arquivo.pt participated in the International Digital Curation Conference in Zagreb

IDCC 2026 Zagreb

Last updated on March 16th, 2026 at 12:38 pm


Arquivo.pt participated in the International Digital Curation Conference with a presentation by Ricardo Basílio, digital curator, entitled “How Arquivo.pt is preserving scientific research project websites and promoting data reuse”.

IDCC 2026 took place in Zagreb, Croatia, from February 16 to 18. This annual event is organized by the Digital Curation Centre (DCC), a leading consortium in data management and curation for scientific research. It had 219 attendees from 30 countries, including 5 from Portugal.

The same panel, moderated by Mikala Narlock from Indiana University, featured the following presentations: “Organizing a community to survive research ecosystem instability”, by Lauren Phegley from the University of Pennsylvania; “What should be saved? The impact of austerity on data rescue”, by Shona Jane Ferguson from the UK Centre for Ecology & Hydrology; and “How do you calculate the carbon footprint of your digital preservation activities?”, by Jenny Mitcham from the Digital Preservation Coalition.

Contemporary challenges in digital curation

The theme of this year’s conference was “AI, austerity, and authoritarianism: contemporary challenges in digital curation.”

At the opening, Antica Čulina, from the Ruđer Bošković Institute, addressed the reliability of science, which requires transparent, scrutinized processes and well-documented, unbiased data.

In parallel sessions, other current challenges were addressed, such as carbon footprint, the use of AI, successful cases of data management, and community engagement.

In the closing session, web preservation was highlighted in a presentation on the Data Rescue Project by Mikala Narlock from Indiana University and Linda Kellam from the University of Pennsylvania.

Urgency is a determining factor in web preservation, especially when scientific research results are involved.

Tribute to Kevin Ashley

The conference closed with a tribute to Kevin Ashley, director of the DCC since April 2010. Since the 1990s, he has worked on developing and providing digital preservation services, including as head of digital archives at the University of London Computer Centre (ULCC). As leader of the DCC and a great communicator, he has been a charismatic figure in the development of data management planning, advice, guidance, and training.

In Portugal, we have records of two presentations by Kevin Ashley at the 5ª Conferência Luso-Brasileira sobre Acesso Aberto (CONFOA) at the Universidade de Coimbra in 2014, which we recall here:

Contribution of Arquivo.pt to the preservation of scientific research results

Arquivo.pt, a digital service provided by FCT, has among its priorities the preservation of all types of information published on the Web related to research projects, such as project websites, abstracts of scientific publications, news in the media related to projects and, in general, all information on the Web referenced in scientific publications.

One example, presented to the conference participants: in 2021, Arquivo.pt identified and collected 17 terabytes of information related to projects funded by the European Commission’s H2020 program. At that time, 46% of H2020 projects did not list their websites or project pages in the data published on the European data portal CORDIS.

Based on this successful initiative, Arquivo.pt has been systematically collecting content related to the projects, in collaboration with RCAAP, PTCRIS, and Ciência Vitae, from which URLs of publications available on the Web are obtained.

Use of Arquivo.pt by researchers

While Arquivo.pt has taken the initiative to archive web content produced by researchers, the number of use cases of Arquivo.pt has grown year on year. In other words, more researchers are using the data and testing methodologies. Examples include LLMs for the Portuguese language, such as GlórIA and AmálIA, and the works competing for the Arquivo.pt Award.

For example, in 2025, a group of researchers from CIDEHUS – Centro Interdisciplinar de História, Culturas e Sociedades da Universidade de Évora, used Arquivo.pt to create the work Narrative Monitoring: Analysis of Conspiracy Theories of Population Replacement in the Portuguese Web Archive (1996-2021).

The aim was to show the audience that the preservation of scientific research results requires the involvement of the researchers themselves. Once they are familiar with and use Arquivo.pt, they are also better prepared to take care of the preservation of their publications.

Learn more

Special collection of web content on the Presidential Elections. We need your help!


Last updated on March 13th, 2026 at 11:30 am

The 2026 Portuguese presidential election took place between January 18 and February 15. Arquivo.pt collected 2.3 terabytes of electoral content and now provides data on the entire process, such as search terms, identified content, and archived content.

The 2026 Presidential Elections were held in two rounds: the first on January 18 and the second on February 8, with the second-round vote held later in 20 parishes in the wake of the storms that ravaged the country. The collection is therefore expected to include news about the affected areas as well as the political interventions of the presidential candidates.

Call for community participation in identifying and archiving election-related content

On January 15, Arquivo.pt invited the community to participate in collecting information about the elections: “Candidates’ websites, news articles, opinion columns, or social media posts—everything is useful for representing our life in democracy. Have you found interesting election-related content? Participate in identifying and archiving election-related content.”

Two modalities were suggested:

Arquivo.pt methodology for thematic coverage of the elections

Following the practice adopted in previous elections, the procedure consisted of the following steps:

  • definition of search terms
  • identification of search engine results pages (SERP)
  • phased recording of seeds (starting addresses for crawler use)
  • integration into Arquivo.pt
  • publication of the dataset

A search term is a combination of words used in a search engine. For example: candidate_name+presidential_elections 2026+Portugal.
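The way such search terms are combined can be sketched in a few lines of Python. This is only an illustration, assuming each term is simply a candidate name plus a context and a place; the candidate names and contexts below are placeholders, and the actual list of terms used by Arquivo.pt is the one published with the data on the process.

```python
from itertools import product


def build_search_terms(candidates, contexts, place="Portugal"):
    """Return one query per (candidate, context) pair: name + context + place."""
    return [f"{name} {context} {place}" for name, context in product(candidates, contexts)]


# Placeholder names and contexts, for illustration only.
terms = build_search_terms(
    ["Candidate A", "Candidate B"],
    ["presidential elections 2026", "debate"],
)
for term in terms:
    print(term)
```

When a term is sent to a search engine in a form-encoded query string, its spaces become + signs, which is why the example in the text is written with +.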

Google was used to identify electoral content, and the Google Rank Checker, Keyword SERP Ranking Tool was used to extract the results. Limits recently imposed by the search engine on simple manual searches, which now return only 10 results at a time, make this method less efficient.

The recording was phased as follows: before and after the first round, on January 12 and 23, before and after the second round on February 5 and 12, and a final recording of all seeds on February 18.

The result was 2.3 terabytes of data, comprising 11.4 million files, obtained from approximately 34,000 seeds using Heritrix and Browsertrix-crawler.

The contents are archived in the collection with the ID EAWP51 and will be accessible through the Arquivo.pt interface after one year. For now, information about how content was searched for and identified is available.

2026 Presidential Election Data Set

Available on the open data platform Dados.gov:

Find out more about electoral collections from previous years

Thematic collections to discover in the online sessions “Café with Arquivo.pt”


Last updated on March 31st, 2026 at 01:05 pm

“Café with Arquivo.pt” is a series of short online sessions, designed so that anyone can attend during working hours. Its aim is to raise awareness of Arquivo.pt and to gather contributions from the community on topics related to web preservation.

In December 2025, a new series was launched dedicated to the thematic collections that Arquivo.pt publishes as datasets on the Dados.Gov open data platform. For example, websites related to theater, music, schools, parishes, elections, and other topics are preserved in Arquivo.pt. The aim is to highlight sets of websites whose history is preserved in Arquivo.pt and to improve their preservation.

Next session

April 15 – In place of the usual “Coffee with Arquivo.pt”, an online session will be held in collaboration with the Bridges – Ponte Cultural project, CIDEHUS, Universidade de Évora, dedicated to the topic of “Immigration.”

The work Narrative Monitoring: Analysis of Conspiracy Theories on Population Replacement in the Portuguese Web Archive (1996–2021), which placed third in the 2025 Arquivo.pt Award, authored by Erik Bran Marino, Rafael Prezado, Ana Sofia Ribeiro, and Renata Vieira, will be presented.

Arquivo.pt is contributing to this session by presenting a special collection on the topic, collected this month.

Guest speakers: Erik Bran Marino, Rafael Prezado

Past sessions

24/03/2026 – Feminist activism and digital memory: archival practices and fragile data

Materials

Summary

The online sessions “Coffee with Arquivo.pt” aim to raise awareness of efforts in Portugal to preserve content published on the Internet and to encourage participation by researchers and the public. The highlight of this session is the project “FEMglocal – Glocal Feminist Movements: Interactions and Contradictions.” A theoretical framework will be presented, followed by a presentation and discussion of the results and activities carried out. For example, identifying websites and other digital channels used by feminist movements has yielded a useful dataset for studying the topic. This raises the question: how can we archive all this fragile digital content circulating on the Internet? As a contribution from Arquivo.pt, we will briefly demonstrate how to collect thousands of pieces of content published on the Web about a specific topic—for example, feminism.

“FEMglocal – Glocal Feminist Movements: Interactions and Contradictions” (PTDC/COM-CSS/4049/2021 / DOI 10.54499/PTDC/COM-CSS/4049/2021) is a project funded by national funds through FCT – Foundation for Science and Technology, I.P., with the participation of DivIntLab (CICANT) and of the DigiPlArt Exploratory Project (2024.13064.PEX), also funded through FCT.

Learn more about the project: www.femglocal.pt

3/12/2025 – Local elections: how we archive websites and election programmes

  • Guest speakers: Mário Rui André and Gonçalo Pereira Costa – LPP / Lisboa Para Pessoas newspaper
  • Date: December 3, 2025
  • Language: Portuguese, with translation to English available on Zoom
  • Registration: free (closed)

Materials

Summary

Guests Mário Rui André and Gonçalo Pereira Costa, journalists from the newspaper LPP / Lisboa Para Pessoas, talked to us about the Portal das Autárquicas da Lisboa Metropolitana (Lisbon Metropolitan Local Elections Portal) they created, which provides information about the candidates and their electoral programmes. Arquivo.pt, which collected thousands of electoral pages and websites (more than 3 terabytes of information), briefly explained the methodology it used.

In this session, you will learn

  • How the local elections in the Lisbon Metropolitan Area went from a journalistic perspective;
  • What methodology was used to collect electoral content on the Internet;
  • How to use the web archive to obtain information from the past.

Previous seasons

In-person session dedicated to Arquivo.pt closes the “Archives of Knowledge” cycle

Last updated on December 16th, 2025 at 08:12 pm

On 19 November, the cycle Archives of Knowledge: Science, History and Memory (Arquivos do Saber: Ciência, História e Memória), an initiative of the FCT Science and Technology Archive, held its last session of 2025.

The event took place in the small auditorium of the FCCN premises, FCT’s digital services unit, at Avenida do Brasil, 101, in Lisbon.

More than 30 participants attended, and it was an opportunity for them to learn more about Arquivo.pt.

Event programme

This session opened with speeches by Maria Paula Diogo, member of the Board of Directors of the Fundação para a Ciência e a Tecnologia (FCT), Paula Meireles, coordinator of the Science and Technology Archive (Arquivo de Ciência e Tecnologia), and João Nuno Ferreira, vice-president of FCT and general coordinator of the digital services unit, FCCN.

The guest speakers were Rúben Almeida, from INESC TEC/FEUP, who gave a presentation entitled Minha Região – O Teu Portal Autárquico, and Joaquim José, from the Instituto Politécnico da Guarda, who talked about Memor.pt – Explore a Memória Digital Portuguesa, winners of 1st and 2nd place in the Arquivo.pt 2025 Award, respectively. The session was moderated by João Gomes, area director of FCCN, FCT’s digital services unit.


19 November programme – ‘Arquivos do Saber’ cycle

The Science and Technology Archive and the dissemination of its collection

The cycle Archives of Knowledge: Science, History and Memory (Arquivos do Saber: Ciência, História e Memória), organised by FCT, has been running since February this year, with the aim of disseminating the documentary collection of its Science and Technology Archive (Arquivo de Ciência e Tecnologia), as well as others relevant to the history and memory of Science and Technology in Portugal. The sessions are short and take place in an informal and sharing environment.

Image gallery

5th session of the Arquivos do Saber: Ciência, História e Memória cycle, at FCCN


Photos by Leonor Arrimar, FCT

Session video

Speakers and presentation

Ranking search results on Arquivo.pt on World Digital Preservation Day

Annotating search results on Arquivo.pt

Last updated on November 7th, 2025 at 03:55 pm



On World Digital Preservation Day, Arquivo.pt promoted an online session dedicated to annotating search results on Arquivo.pt, on 6 November, from 3 p.m. to 4 p.m.

The following topics were covered:

i) Access as a priority – text search as a search engine for the past
ii) How archived content is processed
iii) Annotations as quality assurance – demonstration

Importance of ranking results

The Arquivo.pt team has been reimplementing text search on Arquivo.pt, but needs to measure the quality of the new implementation by comparing it with the previous one. To do this, it is calling on the community for help.

How to rank results on Arquivo.pt

1. Go to: https://anota.arquivo.pt

2. A random survey will appear (in Portuguese).

Example: “cavalo lusitano” “Associação Portuguesa do Cavalo Puro Sangue Lusitano” Entre 6 de agosto de 1991 e 1 de janeiro de 2010 (between 6 August 1991 and 1 January 2010)

3. Indicate the relevance of the result by selecting one of the buttons:

Annotation buttons: Very relevant, Partially relevant, Not relevant, Inaccessible content.

4. When you finish your annotation session, click ‘Export’ (the button for this purpose), which downloads a file named annotations.json.

5. Submit it by clicking the ‘Enviar’ (Submit) button and uploading the annotations.json file. Alternatively, you can send it by email to contacto@arquivo.pt.

Please refer to the guide (Guia de anotação de resultados de pesquisa) for a complete list of instructions.

Dataset on 2025 Portuguese Local Elections at Arquivo.pt

Last updated on December 3rd, 2025 at 12:56 pm

Local elections (“autárquicas”) were held in Portugal on 12 October 2025, and Arquivo.pt compiled a special collection of electoral content published on the web, resulting in 3.5 terabytes of information for research and academic work.

440 search terms were used to obtain 43,000 page addresses, along with the websites of parishes, municipalities, and political parties.

Here we explain the various steps involved in collecting data on the elections:

How to identify election-related content on the web

To identify content related to the elections, we used a list of search terms, for example, “eleições autárquicas 2025”, “habitação autárquicas 2025”, “promessas autárquicas 2025”. After the elections, other terms were added, such as “vitória autárquicas 2025” and “resultados autárquicas 2025”.

The search terms were chosen to cover a range of election-related topics, such as politics, society and the economy, as well as media outlets, candidate names, and regions of the country.

In the collection on local elections, the Google search engine was used to perform each search. Some advanced search parameters were used: number of results (&num=100), news results (&tbm=nws), image results (&udm=2). After the elections, the results were restricted using the “last week” filter.
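As an illustration of these parameters, the sketch below builds result-page addresses for one search term with Python's standard urllib.parse. Which parameters were combined in each query is an assumption here (num=100 for web and news results, udm=2 on its own for images), the "last week" filter is not modelled, and Google may change or ignore these parameters at any time.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"

# Assumed parameter combinations for each result type (see the lead-in).
SERP_VARIANTS = {
    "web": {"num": "100"},
    "news": {"num": "100", "tbm": "nws"},
    "images": {"udm": "2"},
}


def serp_urls(term):
    """Map each result type to a search URL for the given term."""
    return {
        kind: f"{GOOGLE_SEARCH}?{urlencode({'q': term, **params})}"
        for kind, params in SERP_VARIANTS.items()
    }


for kind, url in serp_urls("eleições autárquicas 2025").items():
    print(kind, url)
```

urlencode percent-encodes the accented characters as UTF-8 and turns spaces into +, so Portuguese terms can be used as they are.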

For each search, the addresses listed on the search engine results pages (SERP) were extracted using the Google Rank Checker, Keyword SERP Ranking Tool. This tool works as a browser extension that exports the list of results in JSON format.

In total, 1,400 searches or queries were performed on Google (800 before the elections and 600 after the elections). Finally, the results of all searches (.json files) were compiled into a document and converted into a table. Each result contains various data, such as relevance, the domain from which it was extracted, the link or URL, the title of the publication, the date of the search, and the query.
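The compilation step can be sketched as follows. The field names of the extension's JSON export are not documented here, so the sketch assumes (hypothetically) that each exported file is a list of objects with position, url and title keys, and that the query and search date are known for each file and passed in alongside it; adapt the keys to the real export. Every result is kept, in line with the choice not to delete any rows.

```python
import csv
import io
import json
from urllib.parse import urlsplit

# Columns of the compiled table, following the fields listed in the text.
FIELDS = ["relevance", "domain", "link", "title", "search_date", "query"]


def rows_from_export(export_json, query, search_date):
    """Turn one exported results page into table rows.

    Assumes (hypothetically) a JSON list of {"position", "url", "title"} objects.
    """
    rows = []
    for item in json.loads(export_json):
        rows.append({
            "relevance": item["position"],
            "domain": urlsplit(item["url"]).netloc,
            "link": item["url"],
            "title": item.get("title", ""),
            "search_date": search_date,
            "query": query,
        })
    return rows


def compile_table(exports):
    """Merge (export_json, query, search_date) triples into one CSV text.

    No row is removed, so false positives and repeated URLs stay in.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for export_json, query, search_date in exports:
        writer.writerows(rows_from_export(export_json, query, search_date))
    return out.getvalue()


# Invented exports, for illustration only.
exports = [
    ('[{"position": 1, "url": "https://www.example.pt/a", "title": "A"},'
     ' {"position": 2, "url": "https://news.example.com/b", "title": "B"}]',
     "eleições autárquicas 2025", "2025-10-01"),
    ('[{"position": 1, "url": "https://www.example.pt/a", "title": "A"}]',
     "resultados autárquicas 2025", "2025-10-14"),
]
print(compile_table(exports))
```

The resulting link column is the raw material for the list of seeds, repeats included.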

It should be noted that the list obtained represents only a small portion of everything published on the Web about the elections. In addition, the same list contains results unrelated to the purpose of the collection (false positives) and some repetitions. To save time, no lines were deleted.

This exercise resulted in 45,000 pages (seeds) with news, articles, and publications related to the elections to be used in the collection process by Arquivo.pt. This dataset, 2025 Local Elections, is available on the open data platform Dados.Gov.

A list of parish councils, municipal councils and political parties with their respective websites has also been added.

How the contents were recorded and limitations to be taken into account

The addresses obtained before and after the elections were fed into two web crawlers, Heritrix and Browsertrix-crawler. These tools record pages starting from a given address (seed) and then follow the links found there up to a set limit, in this case a maximum of five links away from the seed (five hops).
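The seed-and-hops idea can be illustrated with a breadth-first traversal over a small in-memory link graph. This is a minimal sketch, not how Heritrix or Browsertrix-crawler are configured: the site dictionary and the get_links function are hypothetical stand-ins for fetching pages and extracting their links, and real crawlers also apply scope rules and politeness delays and distinguish kinds of hops (links, embedded resources, redirects).

```python
from collections import deque


def crawl(seeds, get_links, max_hops=5):
    """Breadth-first crawl: returns {url: hops from the nearest seed}.

    Pages exactly max_hops away are still captured, but their links are
    not followed any further.
    """
    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hops:
            continue
        for link in get_links(url):
            if link not in hops:
                hops[link] = hops[url] + 1
                queue.append(link)
    return hops


# Hypothetical site: each page links to the next one in a chain.
site = {
    "seed": ["p1"],
    "p1": ["p2"],
    "p2": ["p3"],
    "p3": ["p4"],
    "p4": ["p5"],
    "p5": ["p6"],
}
captured = crawl(["seed"], lambda url: site.get(url, []))
print(sorted(captured.items(), key=lambda item: item[1]))
```

Breadth-first order guarantees that each page is recorded with its shortest distance from a seed, which is what a hop limit is measured against.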

Heritrix was used for an initial generic collection of pages, as it is capable of quickly processing lists containing thousands of addresses: 25,858 URLs before the elections and 17,258 URLs after the elections. It generated 541 gigabytes of information.

Browsertrix-crawler was used to improve the collection of dynamic content. This crawler records pages through a browser, which takes longer but captures content that would otherwise escape collection.

The collection was carried out with Browsertrix-crawler in stages: first, the parish websites were recorded in August and September; then, between October 9 and November 5, news about the elections and 8,850 social media posts were recorded. This generated 2.9 terabytes of information.

As for the limits of the collection, we were able to identify a few: access blocked by some websites that defend themselves against automatic access, despite the Arquivo.pt agent being identified; social media content behind a login that cannot be reproduced on Arquivo.pt; videos that cannot be reproduced due to their format.

How and when to access data for research and work creation

EAWP48 is the identifying name of the collection that will bring together content on the Local Elections of 12 October 2025. It is described in the list of collections at Arquivo.pt.


In the coming months, the content will be indexed and the CDXJ indexes will be available to researchers in the Arquivo.pt dataset list.
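Once the CDXJ indexes are published, each line can be read with a few lines of code. The sketch below assumes the common CDXJ layout used by pywb: a SURT key, a 14-digit timestamp and a JSON object with fields such as url, mime and status (status stored as a string). The sample lines are invented, and the exact fields in Arquivo.pt's indexes may differ.

```python
import json
from datetime import datetime


def parse_cdxj_line(line):
    """Split one CDXJ line into (SURT key, capture datetime, JSON fields)."""
    surt_key, timestamp, payload = line.rstrip("\n").split(" ", 2)
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return surt_key, captured_at, json.loads(payload)


def html_captures_between(lines, start, end):
    """Return URLs of successful HTML captures with start <= capture time < end."""
    urls = []
    for line in lines:
        _, captured_at, fields = parse_cdxj_line(line)
        if (start <= captured_at < end
                and fields.get("mime") == "text/html"
                and fields.get("status") == "200"):
            urls.append(fields["url"])
    return urls


# Invented sample lines, for illustration only.
sample = [
    'pt,example)/ 20251010093000 {"url": "https://example.pt/", "mime": "text/html", "status": "200"}',
    'pt,example)/logo.png 20251010093001 {"url": "https://example.pt/logo.png", "mime": "image/png", "status": "200"}',
    'pt,example)/programa 20251105120000 {"url": "https://example.pt/programa", "mime": "text/html", "status": "200"}',
]
october = html_captures_between(sample, datetime(2025, 10, 1), datetime(2025, 11, 1))
print(october)
```

Filtering by capture date in this way makes it possible, for example, to separate pages recorded before and after election day.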

After one year, the collected content will be accessible through the Arquivo.pt search engine. Anyone will then be able to search election pages by text or image.

For further information, please contact us.

Data collected on the 2025 Local Elections

Find out more about electoral collections from previous years