Arquivo.pt was recognised in the ‘Promoting a more Innovative and Digital Society’ category.
This category highlights the innovative aspect of organisations’ digital transition.
The manager of Arquivo.pt, Daniel Gomes, and the web developer in charge of Arquivo.pt’s collections, Pedro Gomes, were present at the ceremony, which took place in Oeiras on 3 December 2024.
Daniel Gomes explains how a web preservation service contributes to a more sustainable information society, in a video prepared for the awards ceremony.
The Digital Transformation Award (4th edition in 2024) aims to ‘recognise and disseminate best practices in the adoption and implementation of information and communication technologies (ICT), with a view to a more digital society sustained by public and private institutions that are more efficient and closer to the citizen’ (APDSI website).
The 2024 edition of this award received 33 nominations in 3 categories:
Effectiveness/Efficiency of Organisations
Proximity to Citizens and a More Inclusive Society
Promoting a more Innovative and Digital Society
The aim of this joint FCT team was precisely to bring about the meeting and sharing of experiences between various institutions that inevitably have to manage information, both in traditional formats such as paper and in digital formats.
The meeting had 243 participants and 29 speakers throughout the day. Nine of the twenty-seven presentations were submitted for a session called ‘Community Space’.
Digital information was the main theme of the speeches. At the opening, the Head of the DGLAB – Direção Geral do Livro, dos Arquivos e das Bibliotecas (Directorate for Books, Archives and Libraries), Silvestre Lacerda, recalled that the DGLAB was a pioneer among public organisations in tackling the issue of digital preservation. FCT vice-president Francisco Santos emphasised the economic value of data for scientific research.
Digital preservation is not just about technology, as Henrique São Mamede, Professor at Universidade Aberta, INESC TEC, said at the opening conference. It’s also about people, the human factor, the environment outside organisations and new sensibilities such as sustainability and ecology. Hence the importance of creating bridges, of using Artificial Intelligence, for example, in conjunction with ethics. Presentation.
Throughout the day, four panels brought together presentations on various preservation contexts such as the digitisation of sound, image and video, research data, regulatory frameworks, management systems for digitised or born-digital information, dissemination and access, and use in academic research.
Panel 1: Digital preservation initiatives and realities
The first panel was moderated by João Gomes, Director of Advanced Services at FCT, and brought to the table the diversity of contexts in which the issue of preservation and access arises. Here we highlight one aspect of each presentation and invite you to follow the links to learn more about these initiatives.
Moisés Rockemback, Professor at the University of Coimbra and co-author of the book Arquivamento da web e preservação digital (Web archiving and digital preservation), spoke about the first initiatives carried out in Brazil to preserve content published on the Web. The websites of the candidates in the Brazilian elections, for example, are ephemeral by nature but have become material for historiographical research by being preserved in a web archive. From a more theoretical perspective, he addressed the issue of memory. Preserving the web allows us to bring to light events that were only broadcast on digital media such as the web and, in this sense, postpones the end of history expressed in the metaphor of the ‘Dark Age’, a time of darkness, empty of information. Presentation.
Pedro Penteado, Director of Archival and Standardisation Services, presented a set of instruments that the DGLAB has developed, such as the Macro Estrutura Funcional (MEF) (Macro Functional Structure), the Avaliação Suprainstitucional da Informação Arquivística (ASIA) (Super-institutional Assessment of Archival Information) project and the Lista Consolidada na Plataforma CLAV (Consolidated List on the CLAV Platform), which allow the different public administration bodies to comply with legislation and standardise classification and assessment practices. He recalled that these tools are flexible enough to meet the specific needs of organisations. Presentation.
Pedro Príncipe, Head of the Documentation Services Division at the University of Minho, spoke about research data. The preservation of and access to data is fundamental to the production of science. To achieve this, it is necessary to combine initiatives and work in networks and create communities of practice. The GDI Forum is an example of how useful it is to meet professionals. Certification is highly recommended, as demonstrated by the University of Minho, which has certified its repository, as it is an extra reason to create robustness and to achieve the FAIR (Findable, Accessible, Interoperable, and Reusable) objectives. Presentation.
Hilário Lopes, RTP’s Deputy Director of Institutional Relations and Archive, described the path to digitalisation that has completely changed the way we access the archive of RTP (Portuguese Radio and Television). Until 2001, digitisation was done on request; from that year onwards, content was digitised on a massive scale. Since 2007, the contents have been accessible in digital format, which has facilitated access and use. RTP Memória and Portal RTP are two examples of access to the audiovisual heritage of public radio and television. Presentation.
Panel 2: Preserving and reusing Web information
The theme of web archiving was highlighted in the second panel, moderated by Daniel Gomes, manager of Arquivo.pt and its initiator on 8 November 2007.
Ricardo Basílio, digital curator at Arquivo.pt, presented the online exhibition ‘Memories of 25 April on the Internet’, created in collaboration with the 50 Years of 25 April Commemorative Commission, based on preserved web pages. Selected pages about the 25 April celebrations across the country were highlighted through a guided tour of the exhibition. Presentation.
António Campos and Hélder Mestre, from the Arquivo Municipal de Sines (Sines City Council Archive), showed how, since 2020, they have been preserving web content of local interest in collaboration with Arquivo.pt. They record web pages with ArchiveWeb.page, a Webrecorder tool, send a copy of the files to Arquivo.pt, transcribe images and videos verbatim, and also use PDF as the most traditional format for archiving news. The issue of accessibility to content for people with special needs is fundamental in the preservation process. Presentation.
Finally, Daniel Gomes emphasised how much has been done in the last 17 years in the field of web preservation, to the point where we now have a functional service that everyone can use. As a testimony to those early days, we found a page from Diário Digital newspaper from November 2006.
Panel 3: Preserving the present and safeguarding the future
The third panel was moderated by Paula Meireles, Coordinator of the Archive, Documentation and Information service at the Foundation for Science and Technology (FCT) and brought four other realities to the table.
Filipe Guimarães Silva, Executive Director of the Fundação Mário Soares e Maria Barroso, and António Coelho, Digital Reproduction Coordinator, delved into the technical issues related to digitisation, based on the case of the collection, which is also accessible on the Casa Comum portal. Quality control is the most important factor in obtaining a preservable digital version. You don’t always need expensive technology to get good results. It is essential to follow standards and ensure that quality metadata is generated. Presentation.
Fernanda Gonçalves, Director of Archives at the São João Local Health Unit, showed how the São João Digital Clinical Repository is transforming access to clinical files with advantages in terms of both speed and quality of information. The information management model at this huge institution poses immense challenges for preservation and continued access, as it involves creating interoperability between multiple systems. What’s more, this is sensitive data with different levels of access. This is where the archive comes in as an asset. The archive service must rise to the challenges of any organisation in order to serve all its ‘clients’. Presentation.
Augusto Ribeiro, head of the Documentation and Information Management Service at UPdigital, University of Porto, explained how the university collection is being preserved. From the treatment of paper documents to their digitisation and inclusion in the digital repository, it’s important to guarantee their robustness. This work has been progressive and systematic, i.e. it follows a plan where all the pieces fit together as the work is carried out. Presentation.
Pedro Penteado (DGLAB) presented the ‘Digital Preservation Guide’ project that is being developed in collaboration with the Asociación Latinoamericana de Archivos (ALA). This initiative will structure content on digital preservation in a pragmatic way. Soon, professionals will have a knowledge base to consult whenever they carry out digital preservation activities. Presentation.
Panel 4: Community space
The fourth panel, moderated by Paula Carvalho, from FCT’s Science and Technology Archive, included 9 short presentations submitted by the community. Below, we present the abstracts submitted by the authors:
Justiça do Futuro: + Digital – Alexandra Lourenço, Albertina Catrola, Alexandra Henriques, António Dias, Cristina Ferreira, Inês Nunes, Rute Ramos | SGMJ
The presentation showed the work that the Commission has been doing to identify archives, documentation centres and collections of all kinds with material about 25 April. Some public collections are practically unknown, and other material is held in private collections. Inventorying and publicising them is therefore the first step in promoting study and knowledge about 25 April.
Finally, Maria Inácia Rezola announced the award of the Honourable Mention ‘25 de Abril and Democracy’, together with a prize of 5,000 euros, in the Arquivo.pt Award 2025, to the best work on 25 April that uses Arquivo.pt.
Image gallery
Encontro Dia Mundial da Preservação Digital 2024 #WDPD2024
Credits: photos by Leonor Arrimar (FCT). Included are some images of mobile devices sent in by participants.
Arquivo.pt won the award in the category of “Best Digital Project of Central Public Administration”.
This category annually recognizes a project that has contributed “unequivocally to the development of the Central Public sector through digital means, as well as the Digital Economy in Portugal”.
The Head of Arquivo.pt Daniel Gomes, the FCCN Deputy General Coordinator Salomé Branco and the FCT vice-president Francisco Santos were present at the ceremony held on October 24 at the Técnico Innovation Center in Lisbon and received the award.
Arquivo.pt receives Award for Best Governmental service
Navegantes XXI Awards
The Navegantes XXI (Navigators XXI) Awards are an annual initiative by ACEPI – Digital Economy Association, created with the mission “To Promote and Develop the Digital Economy in Portugal”.
The competition rewards the best of the Digital Economy and Society in Portugal in its most diverse aspects. It currently comprises 20 categories that reward the most innovative Portuguese digital transformation projects, ideas and institutions. Three prizes are also awarded for special categories outside the competition.
The month of September marks the beginning of a year’s work and also the end of many websites that are hopelessly lost. Historic websites are lost unnecessarily when they are remodelled or shut down without a good copy of their content being made.
There are tools that allow websites to be saved immediately by the organisations that manage them. In addition, there is the on-demand archiving service for high-quality websites that Arquivo.pt provides to partner organisations or in occasional collaborations.
This article aims to highlight the Browsertrix Crawler used by Arquivo.pt, without excluding other tools, which can be useful to information managers and IT departments.
Use of Browsertrix Crawler by Arquivo.pt for high-quality collections
Browsertrix Crawler is a tool that lets you record entire websites and lists of web pages automatically and in a format compatible with web archives.
Arquivo.pt uses the Browsertrix Crawler to make high-quality collections (RAQs) of sites at the request of the community: for example, when a site is about to be shut down, when it is going to be remodelled or, periodically, to maintain a good history of a particular site.
Requests for high-quality collections (RAQs) to Arquivo.pt are increasingly frequent: 77 requests from January to September 2024. This is a sign that there is greater concern about the preservation of web content.
What you need to use Browsertrix Crawler locally
The group that developed the Browsertrix Crawler, Webrecorder.net, led by Ilya Kreymer, has the motto ‘web archiving for all’. Its tools make it possible to record the Internet in a decentralised way and on a small scale.
The Browsertrix Crawler is available and can be installed on your computer for small collections.
The basic version of Browsertrix that Arquivo.pt is using requires basic command line knowledge, which is the only barrier for non-experts.
From Arquivo.pt’s own experience, using the Browsertrix Crawler is easy in multidisciplinary teams, where there is always someone with enough knowledge of Linux commands to provide occasional support.
Demonstration of recording entire websites on your own computer
To promote the preservation of sites in Web archive format, Arquivo.pt presents a use case for the Browsertrix Crawler. It’s useful for anyone who wants to deepen their knowledge and practice of saving sites in a local environment.
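As an illustration only, the following Python sketch assembles the kind of command line a local recording might use. It assumes Docker is installed and that the publicly distributed `webrecorder/browsertrix-crawler` image is used; the function name, the output folder and the example address are hypothetical, and this is not necessarily the configuration Arquivo.pt uses. The sketch only builds and prints the command; starting the crawl is left as an explicit, separate step.

```python
import shlex
import subprocess
from pathlib import Path


def build_crawl_command(seed_url, collection, output_dir="crawls", workers=1):
    """Return a docker command (as a list) for one Browsertrix Crawler run.

    Assumption: the crawl is run through the public Docker image, and the
    recorded files are written to `output_dir` on the local machine.
    """
    out = Path(output_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{out}:/crawls/",          # share the local output folder with the container
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", seed_url,                # first page of the site to record
        "--scopeType", "prefix",          # stay under the seed address
        "--workers", str(workers),        # number of browser windows used in parallel
        "--collection", collection,       # name of the resulting collection
        "--generateWACZ",                 # also package the result as a WACZ file
    ]


def run_crawl(command):
    """Start the crawl; requires Docker and network access."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    cmd = build_crawl_command("https://example.org/", "example-site")
    print(shlex.join(cmd))  # inspect the command before running it
    # run_crawl(cmd)        # uncomment to actually record the site
```

Printing the command first lets a non-expert check it with a colleague before anything is recorded.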
Other tools used by Arquivo.pt to record content
Brozzler: a tool for improving the history of daily and monthly collection sites
Brozzler is a similar tool to Browsertrix Crawler in that it also bases its recording on a browser. It is used and maintained by the Internet Archive.
Arquivo.pt has been using Brozzler since at least 2018 to record web pages with interactive content and for high-quality collections (RAQs).
Lists of up to 200 sites are successfully recorded by Brozzler. For example, the 125 daily collection sites (FAWP) are recorded with Brozzler at the beginning of each month. During the month, another list of 75 monthly collection sites (MAWP) is recorded using Brozzler.
At the end of 2023, Arquivo.pt compared Brozzler and Browsertrix Crawler and chose to keep these two tools.
Heritrix, pywb and ArchiveWeb.page: tools for thousands of sites or one page
The Heritrix crawler is Arquivo.pt’s main recording tool. It is used on huge lists of websites, such as the .PT domain sites, to which other Portuguese sites are added, totalling more than half a million.
To complete the list of recording tools used by Arquivo.pt, mention should be made of pywb, which comes into play, for example, when an Arquivo.pt user uses the ‘Complete the page’ functionality or the ArchivePageNow service.
IPL – Politécnico de Lisboa, through its Distance Learning Group (EaD@IPL), organised a series of webinars for its community dedicated to Arquivo.pt and the preservation of content published on the Internet.
This initiative was attended by IPL – Politécnico de Lisboa lecturers and researchers, as well as people linked to the institution’s communications department.
The cycle of webinars took place in three sessions, between May and July 2024, and followed the training programme that Arquivo.pt has been offering for several years.
Presentation materials
1st webinar – Arquivo.pt: a new tool for researching the past. Publish well to preserve well. June 5, 2024.
Archiving content published on the web and using a web archive on a day-to-day basis is an unusual practice, largely due to the community’s lack of knowledge about the existence and operation of Arquivo.pt.
As a result of this series of webinars, the collaboration between the IPL – Politécnico de Lisboa and Arquivo.pt was strengthened, with a view to preserving its institutional websites and other interesting content that is available on various online media (news, events, references to teachers, researchers and students).
May 18, International Museum Day, was celebrated all over the country with free admission, guided tours, entertainment and exhibitions related to memory and heritage.
Arquivo.pt contributed with an exhibition of old web pages, entitled “Digital Memory through the Internet of the Past”, which was on display at one of the stands at the National Coach Museum in Lisbon.
The pages were selected to show different aspects of the Alentejo over time. From 2016, pages relating to the Heritales project were selected.
Heritales and Crowd-Recycling drew attention to the preservation of the Internet’s memory
Heritales is a project based in Évora that aims to study and disseminate heritage in all its manifestations. It is known for its main event created in 2016, HERITALES – International Heritage Film Festival.
Crowd-Recycling is a project focused on good practices for sustainability.
Heritales, Crowd-Recycling and Arquivo.pt carried out this action in collaboration with the aim of giving visibility to content published on the web over time. Preserving and giving access to digital content is fundamental to enhancing heritage.
Why an exhibition of old websites is a good idea
Making an exhibition of websites over time is relatively easy: all you have to do is come up with a theme, which can also be the history of an institution, and choose pages preserved on Arquivo.pt.
An exhibition of old websites is an original idea for the target audience. It often features texts and images that only existed on the web.
By drawing attention to the websites, we realize that many things were left unrecorded and this changes our view of the content we publish today. We start taking more care to save important pages, for example by taking action or saving them on the spot with SavePageNow.
World Internet Day was on May 17th
The day before International Museum Day was World Internet Day (May 17). The proximity of the two commemorations ties in with the theme of preserving memory.
Portugal connected to the Internet for the first time in 1991, with the FCCN project “RCCN IP Service”.
The initiatives were as follows: a journey through time, a special collection on the theme “April 25”, a presentation at the “50 years of April International Congress” and the inclusion of a special mention in the 2025 edition of the Arquivo.pt Award.
Memories of April 25 on the Internet exhibition
The exhibition Memories of April 25 on the Internet presents a selection of web pages about the celebrations of April 25 in various regions of the country, since the beginning of the web in the 1990s.
The criteria for choosing the pages for the exhibition were as follows:
Pages relating to the April 25 commemorations;
Pages found on Arquivo.pt on dates close to the anniversary each year;
Diversity to include different areas of the country;
Popular demonstrations and official ceremonies.
A historical memory without web archives is incomplete. The aim of this journey through time is to invite citizens to travel back in time, browsing through old web pages and reliving recent episodes in our life as a democracy.
The dataset contains a list of keywords put into a search engine in order to obtain results on the topic of “April 25”. The search considered names of people, places, political, social and cultural aspects, as well as words associated with the event.
The searches were carried out on March 22, 2024 using the Bing Search API, an automatic search service that returns results according to the relevance criteria of the Bing service itself and others configured by us.
A total of 12,650 unique web page addresses were obtained. It is hoped that the recording of these pages will be useful for the organizations that produced this content, for researchers who want to study our history and for citizens who cultivate a sense of memory and democracy.
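The text does not say how the list of unique addresses was produced. As a minimal sketch, assuming the search results are held as a mapping from each keyword to the addresses it returned, duplicates across keywords could be removed as below. The function names and the normalisation rules (lower-casing the host, dropping the fragment) are illustrative assumptions, not the procedure actually used.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Lower-case scheme and host and drop the fragment; keep path and query."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def unique_urls(results_by_keyword):
    """Merge the results of all keywords, keeping the first occurrence of each address."""
    seen = set()
    ordered = []
    for urls in results_by_keyword.values():
        for url in urls:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                ordered.append(key)
    return ordered


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {
        "25 de Abril": ["https://Example.org/a#top", "https://example.org/a"],
        "Revolução dos Cravos": ["https://example.org/a", "https://example.org/b"],
    }
    for address in unique_urls(sample):
        print(address)
```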
Participation in the 50 years of April International Congress
On May 2, 2024, João Gomes, Director of Advanced Services at the FCCN Scientific Computing Unit of the Foundation for Science and Technology I.P., presented Arquivo.pt to the participants of the 50 years of April International Congress, as a distinctive service, open to citizens and useful for organizations.
Arquivo.pt is a web preservation service available to all citizens who want to search for old content published on the web.
Using Arquivo.pt contributes to a better understanding of our history. It also provides useful services for cybersecurity, such as the Arquivo.pt Memorial, which is able to maintain institutions’ old websites, preventing attacks and saving them resources.
Special mention for “April 25 and Democracy” at the Arquivo.pt Awards 2025
In 2025, as part of the celebrations for the 50th anniversary of April 25, a special mention will be made of work on the theme “April 25 and Democracy”.
We therefore challenge researchers and interested citizens to create innovative works using Arquivo.pt.
If you have any questions about the Arquivo.pt Award, please contact us.
The session held during the Jornadas FCCN 2024 was entitled “Arquivo.pt at the service of culture” and aimed to highlight two of Arquivo.pt’s collaborations in the field of culture and knowledge, namely with Wikipedia Portugal and the Virtual Museum of Tourism (MUVITUR).
At the FCCN Zapping session, Arquivo.pt presented the Arquivo404 service, which allows websites to offer historical content instead of the negative “Page not found”.
The post-Day Workshop, promoted by ARDITI, was open to regional institutions and citizens in general. It was entitled “Arquivo.pt and the preservation of Internet memory”.
The contents were structured according to the training program run by Arquivo.pt and preceded by an introduction placing it among the other services of FCCN – Computação Científica da FCT.
Just as important as the content was the dialog that was established during the sessions between the participants and the Arquivo.pt team to clarify doubts or ask questions.
Web preservation is increasingly important for organizations that want to preserve part of their institutional memory and develop security policies.
ARDITI gave an important signal about preserving the web memory of Madeiran institutions by hosting and promoting the Arquivo.pt training sessions.
If you want to promote the preservation of web content in your organization, check out the Arquivo.pt training and contact us.
Artificial Intelligence (AI) covers various areas of knowledge, such as linguistics and computing, and is present in the new technologies used by citizens on a daily basis.
One example is when we search for information on the Internet and the computer generates an amazingly accurate response, in language very close to our own.
Natural Language Processing (NLP) is what allows machines to perfect the algorithm that generates these answers tailored to Internet users.
The problem is that natural language processing models have been developed more for the English language and less for Portuguese and other languages with less representation.
The more the processing models are trained on a language, the better they will be able to interpret the complexities of the language. But this is only possible if quality data is available.
Portuguese text collection on Arquivo.pt available for research
Arquivo.pt appears here as the largest Portuguese-language textual dataset in Portugal, available in open access, for researchers to train NLP models.
In recent years, researchers from various research groups and projects have drawn attention to the usefulness of preserved web data for large-scale processing.
Arquivo.pt has more than 1 Petabyte of preserved web content dating back to the 1990s, including everything that can be found on web pages. It’s not just text, but also images, audio files, video, page code and various metadata.
The content is accessible via the search interface and the Arquivo.pt APIs.
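As a hedged illustration of programmatic access, the sketch below builds a full-text search request and reads the JSON reply. The endpoint address, the parameter names (`q`, `from`, `to`, `maxItems`) and the response field names are assumptions based on the publicly documented text search API and should be checked against the current Arquivo.pt API documentation before use; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the full-text search API; verify against the official documentation.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_query_url(query, date_from=None, date_to=None, max_items=10):
    """Build the request address for a full-text search, optionally limited by date."""
    params = {"q": query, "maxItems": max_items}
    if date_from:
        params["from"] = date_from   # e.g. "1996" or a longer timestamp
    if date_to:
        params["to"] = date_to
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def search(query, **kwargs):
    """Run the search (needs network access) and return (timestamp, original URL, archived link)."""
    with urlopen(build_query_url(query, **kwargs), timeout=30) as response:
        data = json.load(response)
    # Field names below are assumed; missing fields simply yield None.
    return [
        (item.get("tstamp"), item.get("originalURL"), item.get("linkToArchive"))
        for item in data.get("response_items", [])
    ]


if __name__ == "__main__":
    print(build_query_url("25 de Abril", date_from="1996", date_to="2000", max_items=5))
```

For training language models, a loop over many such requests would be needed; keeping the address builder separate from the network call makes it easy to log and review the requests first.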
One of the projects that used Arquivo.pt to obtain large amounts of text is called GlórIA, a large language model (LLM) focused on European Portuguese.
Arquivo.pt, Portugal’s national web preservation service, has earned a prominent position by being named one of the top 3 government services in the 2023 Portugal Digital Awards. This recognition is testimony to the crucial role played by Arquivo.pt in the preservation and accessibility of Portugal’s digital heritage.
The three finalists in the category Best Government Project (best digital transformation project in the public administration sector) were Arquivo.pt, the Porto Digital Association and Banco de Portugal, which received the winning award.
Mission and recognition
Arquivo.pt, developed by the FCCN – National Scientific Computing, stands out as an innovative initiative in the field of digital preservation. Its mission is to collect and archive web content, allowing users to access past versions of web pages, documents and other online resources.
The recognition at the Portugal Digital Awards highlights not only the importance of digital preservation, but also the effectiveness and relevance of Arquivo.pt as a government service. By providing a journey through time via the Internet, this resource becomes a valuable tool for researchers, academics and the general public.
Commitment to digital preservation
Participation in the award underlines Arquivo.pt’s commitment to improving the historical record of the evolution of the Web in Portugal. This service not only contributes to the country’s digital memory, but also facilitates research, promoting understanding of digital evolution over time.
In addition, Arquivo.pt’s distinction reflects FCCN’s ongoing effort to develop and improve innovative services that benefit society. Digital preservation is a crucial component in ensuring that Portugal’s digital heritage is passed on to future generations, and Arquivo.pt fulfills this role in a unique way.
In conclusion, recognition in the Portugal Digital Awards 2023, a competition that received over 300 candidate services, solidifies Arquivo.pt’s position as one of the leading government services at the forefront of digital preservation. This achievement highlights the growing importance of digital preservation in the digital age in which we live.