The Arquivo.pt team held a session open to the public on 5 May during the Jornadas FCCN 2026.
The session was attended by around 80 participants and covered topics that are currently central to Arquivo.pt’s work. These included the use of the preserved collection for research, its application in artificial intelligence (AI) tools, and participation in large language model (LLM) projects for the Portuguese language.
The national meeting Jornadas FCCN 2026 took place at the Faculdade de Economia da Universidade do Porto between 5 and 7 May 2026. About 1,000 people attended. It was an opportunity to meet many of the people we interact with throughout the year.
How the Internet archive is being used for research, AI and LLMs
How can three decades of the history of the Portuguese web be used for research, for technological innovation and for training artificial intelligence models? This Arquivo.pt session at Jornadas FCCN showed, in a practical and accessible way, how the preserved collection is being given a new lease of life, from generative AI projects to the development of open-source tools for the entire academic community.
The session was divided into five parts, each focusing on specific new features and real-world use cases.
1. Amália AI: AI trained using data from Arquivo.pt – inspiration, methods and results
Pedro Gomes demonstrated how historical data from Arquivo.pt was used in the development of Amália, a large language model (LLM) for the Portuguese language. He explained the data preparation process, the specific challenges of the Portuguese web, and provided examples of what the model can generate when drawing on decades of national digital archives.
It was an inspiring presentation for anyone wishing to understand the real impact of archived web collections on AI projects.
2. New text search with Apache Solr: faster, more modern and scalable
In 2025, we redesigned the text search system for Arquivo.pt. In this part of the session, Vasco Rato spoke about this ongoing work:
- how a search engine for old web pages works internally;
- what challenges arise when indexing billions of pages;
- and how the new architecture using Apache Solr paves the way for more comprehensive, faster and more flexible searches.
3. The use of AI for code generation
Ivo Branco demonstrated how using artificial intelligence to generate code is significantly speeding up the development of Arquivo.pt. What used to start out as a ‘vague improvement’ now quickly becomes a concrete task on the work plan, thanks to AI’s ability to propose solutions, structure code and support process automation.
The manager of Arquivo.pt also highlighted improvements to the page replay system, which is now based on ZipNum, a technology that drastically reduces the time taken to access archived content — even when dealing with billions of records.
The use of AI enables us to implement these optimisations more quickly, improve the quality of the code produced, and free up the team’s time for areas of greater innovation and research.
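As a rough illustration of why a ZipNum-style index keeps lookups fast, the sketch below follows the general idea behind the format: sorted index lines are compressed in blocks, and a small summary index of each block's first key is searched first, so only the relevant blocks need to be decompressed. This is a minimal sketch under stated assumptions, not the Arquivo.pt implementation; the record layout, the sample keys, the block size and the function names `build_index` and `lookup` are all hypothetical.

```python
import bisect
import gzip


def build_index(sorted_lines, block_size=3000):
    """Compress sorted index lines in blocks and record each block's first key."""
    first_keys = []  # summary index: first key of every block
    blocks = []      # gzip-compressed blocks (stand-in for data stored on disk)
    for start in range(0, len(sorted_lines), block_size):
        chunk = sorted_lines[start:start + block_size]
        first_keys.append(chunk[0].split(" ", 1)[0])
        blocks.append(gzip.compress("\n".join(chunk).encode("utf-8")))
    return first_keys, blocks


def lookup(key, first_keys, blocks):
    """Return every index line for `key`, decompressing only candidate blocks."""
    # Start at the last block whose first key is smaller than `key`:
    # matches may begin at the end of that block and continue into later ones.
    i = max(bisect.bisect_left(first_keys, key) - 1, 0)
    matches = []
    while i < len(blocks) and first_keys[i] <= key:
        for line in gzip.decompress(blocks[i]).decode("utf-8").split("\n"):
            if line.split(" ", 1)[0] == key:
                matches.append(line)
        i += 1
    return matches


# Hypothetical index lines: "<url key> <timestamp> <archive file>"
records = sorted(
    [
        "pt,arquivo)/ 20100101000000 a.warc.gz",
        "pt,fccn)/ 20050101000000 b.warc.gz",
        "pt,fccn)/ 20150101000000 c.warc.gz",
        "pt,publico)/ 19990101000000 d.warc.gz",
        "pt,sapo)/ 19980101000000 e.warc.gz",
    ],
    key=lambda line: line.split(" ", 1),
)

# A tiny block size is used here only so that the example spans several blocks.
first_keys, blocks = build_index(records, block_size=2)
fccn_captures = lookup("pt,fccn)/", first_keys, blocks)
print(fccn_captures)
```

Because the summary index holds only one key per block, it stays small enough to search quickly even when the full index has billions of lines, and each lookup touches a handful of compressed blocks rather than the whole file.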
4. Archive your website instantly
To conclude, Ricardo Basílio gave a practical demonstration of how to archive web content on your own initiative:
- archive a page directly to Arquivo.pt in seconds using ArchivePageNow;
- save content to your own computer in WARC format using Webrecorder;
- understand how these files can be reused, analysed or preserved in the long term.
5. Thematic collections: preserving your memories
From the environment to elections, and from science to digital culture, Arquivo.pt regularly produces themed collections to preserve key moments in society.
This topic was not covered during the session. However, we have added a commentary on it at the end of the session video, which will be made available shortly. We wanted to explain how these special collections are defined, curated and preserved, and how they can be used for teaching, research or simply out of historical curiosity.
Session sponsor
Patrício Cachaço presented Fortinet Secure LAN solutions: Security-Driven Networking with AIOps.
Session materials
- Slides – “Como o arquivo da Internet está a ser usado para a investigação, IA e LLMs” (“How the Internet archive is being used for research, AI and LLMs”)
- Slides – Demo “Arquive o seu site na hora” and “Coleções temáticas” (“Archive your website instantly” and “Thematic collections”)
- Video of the session (available soon)



