World Digital Preservation Day celebrated at Portuguese National Archive Torre do Tombo

Last updated on November 18th, 2024 at 11:23 am

Let’s talk about preservation and access!

On November 7, 2024, the New Paths to Information Preservation and Access Meeting was held. It was organised jointly by Arquivo.pt and the Arquivo de Ciência e Tecnologia, the former located on Avenida do Brasil and the latter on Avenida D. Carlos I, in Lisbon. Both are services of the Fundação para a Ciência e a Tecnologia (FCT).

The aim of this joint FCT team was to bring institutions together to share experiences, since all of them inevitably have to manage information, both in traditional formats such as paper and in digital formats.

The meeting had 243 participants and 29 speakers throughout the day. Nine of the twenty-seven presentations were submitted for a session called ‘Community Space’.

The Portuguese Association of Librarians, Archivists, Information and Documentation Professionals (APBAD) made an important contribution to publicising the event to the community and was present with an information stand.

An international day dedicated to digital preservation

World Digital Preservation Day was celebrated on the same day. It is an initiative of the Digital Preservation Coalition (DPC), with which Arquivo.pt has been associated since the first edition in 2017. Jane Winters, Chair of the DPC, sent a video message to join this initiative in Portugal.

Digital information was the main theme of the speeches. At the opening, the Head of the DGLAB – Direção Geral do Livro, dos Arquivos e das Bibliotecas (Directorate for Books, Archives and Libraries), Silvestre Lacerda, recalled that the DGLAB was a pioneer among public organisations in tackling the issue of digital preservation. FCT vice-president Francisco Santos emphasised the economic value of data for scientific research.

Digital preservation is not just about technology, as Henrique São Mamede, Professor at Universidade Aberta, INESC TEC, said at the opening conference. It’s also about people, the human factor, the environment outside organisations and new sensibilities such as sustainability and ecology. Hence the importance of creating bridges, of using Artificial Intelligence, for example, in conjunction with ethics.

Throughout the day, four panels brought together presentations on various preservation contexts such as the digitisation of sound, image and video, research data, regulatory frameworks, management systems for digitised or born-digital information, dissemination and access, and use in academic research.

Panel 1: Digital preservation initiatives and realities

The first panel was moderated by João Gomes, Director of Advanced Services at FCT, and brought to the table the diversity of contexts in which the issue of preservation and access arises. Here we highlight one aspect of each presentation and invite you to follow the links to learn more about these initiatives.

Moisés Rockembach, Professor at the University of Coimbra and co-author of the book Arquivamento da web e preservação digital (Web archiving and digital preservation), spoke about the first initiatives carried out in Brazil to preserve content published on the Web. The websites of the candidates in the Brazilian elections, for example, are ephemeral by nature but have become material for historiographical research by being preserved in a web archive. From a more theoretical perspective, he addressed the issue of memory. Preserving the web allows us to bring to light events that were only broadcast on digital media such as the web and, in this sense, postpones the end of history expressed in the metaphor of the ‘Dark Age’, a time of darkness, empty of information.

Pedro Penteado, Director of Archival and Standardisation Services, presented a set of instruments that the DGLAB has developed, such as the Macro Estrutura Funcional (MEF) (Macro Functional Structure), the Avaliação Suprainstitucional da Informação Arquivística (ASIA) (Super-institutional Assessment of Archival Information) project and the Lista Consolidada na Plataforma CLAV (Consolidated List on the CLAV Platform), which allow the different public administration bodies to comply with legislation and standardise classification and assessment practices. He recalled that these tools are flexible enough to meet the specific needs of organisations.

Pedro Príncipe, Head of the Documentation Services Division at the University of Minho, spoke about research data. The preservation of and access to data is fundamental to the production of science. To achieve this, it is necessary to combine initiatives and work in networks and create communities of practice. The GDI Forum is an example of how useful it is to meet professionals. Certification is highly recommended, as demonstrated by the University of Minho, which has certified its repository, as it is an extra reason to create robustness and to achieve the FAIR (Findable, Accessible, Interoperable, and Reusable) objectives.

Hilário Lopes, RTP’s Deputy Director of Institutional Relations and Archive, described the path to digitalisation that has completely changed the way we access the archive of RTP (Portuguese Radio and Television). Until 2001, digitisation was done on request; from that year onwards, the contents were digitised on a massive scale. Since 2007, the contents have been accessible in digital format, which has facilitated access and use. RTP Memória and Portal RTP are two examples of access to the audiovisual heritage of public radio and television.

Panel 2: Preserving and reusing Web information

The theme of web archiving was highlighted in the second panel, moderated by Daniel Gomes, manager of Arquivo.pt and its initiator on 8 November 2007.

Ricardo Basílio, digital curator at Arquivo.pt, presented the online exhibition ‘Memories of 25 April on the Internet’, created in collaboration with the 50 Years of 25 April Commemorative Commission, based on preserved web pages. Select pages about the 25 April celebrations across the country were highlighted through a guided tour of the exhibition.

Joana Paulino, historian and researcher at the Faculdade de Ciências Sociais e Humanas da Universidade Nova de Lisboa, showed how technologies contribute to the development of studies in areas traditionally far removed from technologies, based on her experience at the Digital Humanities Laboratory.

António Campos and Hélder Mestre, from the Arquivo Municipal de Sines (Sines City Council Archive), showed how, since 2020, they have been preserving web content of local interest in collaboration with Arquivo.pt. They record web pages with ArchiveWeb.page, a Webrecorder tool, send a copy of the files to Arquivo.pt, transcribe images and videos verbatim, and also use PDF as the most traditional format for archiving news. The issue of accessibility to content for people with special needs is fundamental in the preservation process.

António Ramiro and Carmen Fonseca, winners of the Arquivo.pt 2024 Award, presented their work Noticioso.pt. It’s a project that reuses information from Arquivo.pt to challenge citizens’ critical capacity.

Finally, Daniel Gomes emphasised how much has been done in the last 17 years in the field of web preservation, to the point where we now have a functional service that everyone can use. As a testimony to those early days, we found a page from Diário Digital newspaper from November 2006.

Panel 3: Preserving the present and safeguarding the future

The third panel was moderated by Paula Meireles, Coordinator of the Archive, Documentation and Information service at the Foundation for Science and Technology (FCT) and brought four other realities to the table.

Filipe Guimarães Silva, Executive Director of the Fundação Mário Soares e Maria Barroso, and António Coelho, Digital Reproduction Coordinator, delved into the technical issues related to digitisation, based on the case of the foundation’s collection, which is also accessible on the Casa Comum portal. Quality control is the most important factor in obtaining a preservable digital version. You don’t always need expensive technology to get good results. It is essential to follow standards and ensure that quality metadata is generated.

Fernanda Gonçalves, Director of Archives at the São João Local Health Unit, showed how the São João Digital Clinical Repository is transforming access to clinical files with advantages in terms of both speed and quality of information. The information management model at this huge institution poses immense challenges for preservation and continued access, as it involves creating interoperability between multiple systems. What’s more, this is sensitive data with different levels of access. This is where the archive comes in as an asset. The archive service must rise to the challenges of any organisation in order to serve all its ‘clients’.

Augusto Ribeiro, head of the Documentation and Information Management Service at UPdigital, University of Porto, explained how the university collection is being preserved. From the treatment of paper documents to their digitisation and inclusion in the digital repository, it’s important to guarantee their robustness. This work has been progressive and systematic, i.e. it follows a plan where all the pieces fit together as the work is carried out.

Pedro Penteado (DGLAB) presented the ‘Digital Preservation Guide’ project that is being developed in collaboration with the Asociación Latinoamericana de Archivos (ALA). This initiative will structure content on digital preservation in a pragmatic way. Soon, professionals will have a knowledge base to consult whenever they carry out digital preservation activities.

Panel 4: Community space

The fourth panel, moderated by Paula Carvalho, from FCT’s Science and Technology Archive, included 9 short presentations submitted by the community. Below, we present the abstracts submitted by the authors:

Celebrating the 50th anniversary of 25 de Abril at the closing session

Maria Inácia Rezola, Executive Commissioner of the Mission Structure for the Commemorations of the 50th Anniversary of the Revolution of 25 de Abril 1974, presented a historical perspective of the impact of 25 April on Portuguese society, namely through the way it is commemorated throughout the country.

She showed the work that the Commission has been doing to identify archives, documentation centres and collections of all kinds with material about 25 April. There are public collections that are practically unknown, and others that are in private collections. Inventorying and publicising them is therefore the first step in promoting study and knowledge about 25 de Abril.

Finally, Maria Inácia Rezola announced the award of the Honourable Mention ‘25 de Abril and Democracy’, together with a prize of 5,000 euros, in the Arquivo.pt Award 2025, to the best work on 25 April that uses Arquivo.pt.

Image gallery

Encontro Dia Mundial da Preservação Digital 2024 #WDPD2024

Opening session: Silvestre Lacerda, Director of the DGLAB, and Francisco Santos, Vice-President of FCT
Opening session: João Gomes, Director of Advanced Services at FCT
Opening session: Jane Winters, Digital Preservation Coalition (DPC)
Henrique São Mamede, Universidade Aberta, INESC TEC
Panel 1
Moisés Rockembach, Universidade de Coimbra
Moisés Rockembach, Universidade de Coimbra, and Ricardo Basílio, Arquivo.pt
Pedro Penteado, DGLAB
Pedro Príncipe, Universidade do Minho
Hilário Lopes, RTP Archive
Panel 2
Ricardo Basílio, Arquivo.pt (FCT)
Joana Paulino, NOVA-FCSH
Hélder Mestre and António Campos, Arquivo Municipal de Sines
António Ramiro and Carmen Fonseca, Noticioso.pt
Panel 3
Paula Meireles, FCT
Augusto Ribeiro, Universidade do Porto, UPdigital
Arquivo.pt stand

Credits: photos by Leonor Arrimar (FCT). Some images taken on mobile devices and sent in by participants are also included.

Video


Video by Leonor Arrimar (FCT)

Learn more

Previous editions of World Digital Preservation Day with Arquivo.pt

Arquivo.pt received the award for Best Central Public Administration Digital Project

Arquivo.pt receives Award for Best Governmental service

Last updated on October 31st, 2024 at 12:42 pm


Arquivo.pt, a digital service of the Foundation for Science and Technology (FCT)-FCCN Unit, was one of the winners of the Navegantes XXI Awards, 2024 edition.

Arquivo.pt won the award in the category of “Best Digital Project of Central Public Administration”.

This category annually recognizes a project that has contributed “unequivocally to the development of the Central Public sector through digital means, as well as the Digital Economy in Portugal”.

The Head of Arquivo.pt Daniel Gomes, the FCCN Deputy General Coordinator Salomé Branco and the FCT vice-president Francisco Santos were present at the ceremony held on October 24 at the Técnico Innovation Center in Lisbon and received the award.

Images: Arquivo.pt wins the Navegantes XXI Award; the Arquivo.pt team receives the Navegantes XXI Award.

Navegantes XXI Awards

The Navegantes XXI (Navigators XXI) Awards are an annual initiative by ACEPI – Digital Economy Association, created with the mission “To Promote and Develop the Digital Economy in Portugal”.

The competition rewards the best of the Digital Economy and Society in Portugal in its most diverse aspects. It currently comprises 20 categories that reward the most innovative Portuguese projects, ideas and institutions in digital transformation. Three prizes are also awarded for special categories outside the competition.

Meet all the winners.

Save websites before they disappear with the Browsertrix Crawler tool


Last updated on September 14th, 2024 at 10:07 pm

September marks the beginning of a new working year and also the end of many websites. Remodelled or shut down without a good copy of their content being made, historic websites are lost for good, and unnecessarily so.

There are tools that allow websites to be saved immediately by the organisations that manage them. In addition, there is the on-demand archiving service for high-quality websites that Arquivo.pt provides to partner organisations or in occasional collaborations.

This article aims to highlight the Browsertrix Crawler used by Arquivo.pt, without excluding other tools, which can be useful to information managers and IT departments.

Use of Browsertrix Crawler by Arquivo.pt for high-quality collections

Browsertrix Crawler is a tool that lets you record entire websites and lists of web pages automatically and in a format compatible with web archives.

Arquivo.pt uses the Browsertrix Crawler to make high-quality collections (RAQs) of sites at the request of the community, for example when a site is about to be shut down, when it is going to be remodelled or, periodically, to maintain a good history of a particular site.

An illustrative case is the Almada City Council website, recorded in April 2021 at the request of the Municipal Archive. Another case is the website of the newspaper Notícias de Leiria, which was recorded before its closure in December 2023.

Requests for high-quality collections (RAQs) to Arquivo.pt are increasingly frequent: 77 requests from January to September 2024. This is a sign that there is greater concern about the preservation of web content.

What you need to use Browsertrix Crawler locally

The group that developed the Browsertrix Crawler, Webrecorder.net, led by Ilya Kreymer, has the motto ‘web archiving for all’. Its tools make it possible to record the Internet in a decentralised way and on a small scale.

The Browsertrix Crawler is available and can be installed on your computer for small collections.

The basic version of Browsertrix that Arquivo.pt is using requires basic command line knowledge, which is the only barrier for non-experts.

In Arquivo.pt’s own experience, using the Browsertrix Crawler is easy in multidisciplinary teams, where there is always someone with enough knowledge to use Linux commands and provide occasional support.

Demonstration of recording entire websites on your own computer

To promote the preservation of sites in Web archive format, Arquivo.pt presents a use case for the Browsertrix Crawler. It’s useful for anyone who wants to deepen their knowledge and practice of saving sites in a local environment.
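As a rough illustration of what a local recording involves, the sketch below only assembles the command line for one crawl; it does not run anything. It assumes the crawler is used through its public Docker image (`webrecorder/browsertrix-crawler`) with the `--url`, `--collection`, `--workers` and `--generateWACZ` options described in the project's README. The function name, the default folder and the example values are hypothetical, and this is not necessarily how Arquivo.pt runs the tool, so check the options against the version you install.

```python
import shlex
from pathlib import Path


def build_crawl_command(url, collection, crawls_dir="crawls", workers=1):
    """Return the argument list for one Browsertrix Crawler run via Docker.

    Assumption: the image name and flags follow the project's README;
    they are not taken from the article.
    """
    host_dir = Path(crawls_dir).resolve()  # Docker bind mounts want an absolute path
    return [
        "docker", "run",
        "-v", f"{host_dir}:/crawls/",        # where the crawler writes its output
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", url,                        # seed site to record
        "--collection", collection,          # name of the resulting collection
        "--workers", str(workers),           # number of parallel browser workers
        "--generateWACZ",                    # package the result as a WACZ file
    ]


if __name__ == "__main__":
    # Hypothetical example values; prints the command instead of executing it.
    cmd = build_crawl_command("https://example.org/", "example-site")
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review it with a colleague before a long crawl is started; it could then be passed to `subprocess.run` once Docker is installed.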

Other tools used by Arquivo.pt to record content

Brozzler: a tool for improving the history of daily and monthly collection sites

Brozzler is a similar tool to Browsertrix Crawler in that it also bases its recording on a browser. It is used and maintained by the Internet Archive.

Arquivo.pt has been using Brozzler since at least 2018 to record web pages with interactive content and for high-quality collections (RAQs).

Lists of up to 200 sites are successfully recorded by Brozzler. For example, the 125 daily collection sites (FAWP) are recorded with Brozzler at the beginning of each month. During the month, another list of 75 monthly collection sites (MAWP) is recorded using Brozzler.

At the end of 2023, Arquivo.pt compared Brozzler and Browsertrix Crawler and chose to keep these two tools.

Heritrix, pywb and ArchiveWeb.page: tools for thousands of sites or one page

The Heritrix crawler is Arquivo.pt’s main recording tool. It is used on huge lists of websites, such as the .PT domain sites, to which other Portuguese sites are added, totalling more than half a million.

On the opposite side is the ArchiveWeb.page extension, which Arquivo.pt uses for short page-by-page recordings and also for the Web archiving: do-it-yourself! training course.

To complete the list of recording tools used by Arquivo.pt, mention should be made of pywb, which comes into play, for example, when an Arquivo.pt user uses the ‘Complete the page’ functionality or the SavePageNow service.

2024 European and Portuguese elections in special Arquivo.pt collections

European Elections

Last updated on October 9th, 2024 at 05:48 pm

Arquivo.pt made special collections on the three elections that took place this year: the Parliamentary elections on 10 March, the elections in Madeira on 26 May and the European elections on 9 June.

More than 70,000 pages with content related to the elections and political life in Portugal and Europe were identified and around 4 terabytes of information collected.

We would like to thank the people who contributed to the selection of pages. Teachers and students are encouraged to do work using the special collections on elections that Arquivo.pt has produced over the years.

Find out more about the collection procedure and the results obtained.

Portuguese Parliamentary Elections (Legislativas 2024)

The Portuguese Parliamentary Elections took place on 10 March 2024 to elect the members of the Assembly of the Republic for the 16th Legislature of the Third Portuguese Republic.

We would like to highlight the community’s contribution to this collection with a manual selection of 827 pages, which helped to improve the quality of the collection.

Around 500 compound terms or keywords were used to search for content published on the web about the elections. The service used for the automatic search was the Bing Search API. The results were limited to the top 20.

For example, the compound term ‘head-to-head legislative 2024’ found pages relating to debates between candidates. The term ‘legislative housing 2024’ found pages relating to party proposals for housing. The term ‘legislativas 2024 site:expresso.pt’ identified Expresso pages about the elections. The names of the candidates were also used.

After the elections, search terms specific to that period were used, such as ‘legislative victory 2024’, ‘legislative defeat 2024’ or ‘legislative results 2024’, among others.

The automatic search in the Bing Search API resulted in 34,120 addresses obtained before the elections and 5,803 after the elections.
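The search procedure described above can be sketched as follows. This is a minimal illustration, not Arquivo.pt's actual script: `build_queries`, `collect_urls` and the `search` callable are hypothetical names, and `search` stands in for whatever client sends a query to the search API and returns a ranked list of URLs. Only the top-20 limit and the example terms come from the article.

```python
TOP_N = 20  # the article states that results were limited to the top 20


def build_queries(base, year, topics, sites=()):
    """Combine a base word with topics and site filters into compound terms."""
    queries = [f"{base} {topic} {year}" for topic in topics]
    queries += [f"{base} {year} site:{site}" for site in sites]
    return queries


def collect_urls(queries, search, top_n=TOP_N):
    """Run every query and keep each URL once, in first-seen order.

    `search` is a hypothetical callable: query string -> ranked list of URLs.
    """
    seen = set()
    urls = []
    for query in queries:
        for url in search(query)[:top_n]:  # keep only the top results
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    def fake_search(query):
        # Stand-in for a real search client, used only for this demonstration.
        return [f"https://example.org/{query.replace(' ', '-')}/{i}" for i in range(30)]

    queries = build_queries("legislative", 2024, ["housing", "results"], sites=["expresso.pt"])
    print(len(collect_urls(queries, fake_search)))
```

Passing the search client in as a parameter keeps the term-building and de-duplication logic testable without any network access or API key.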

The websites of political parties, including parties without parliamentary seats, were also collected during the election period.

Not all the content identified could actually be recorded, due to the limitations of the recording tools or the restrictions of the websites themselves.

The tools Heritrix, Brozzler and Browsertrix-cloud (beta version), courtesy of Webrecorder.net, were used for the recording.

The recording took place between 6 and 20 March and resulted in 3.2 Terabytes of information. The contents have been included in the EAWP45 special collection and will be available after one year.

To find out more, consult the open dataset:

Madeira Legislative Assembly elections 2024

The elections for the Legislative Assembly of Madeira took place on 26 May. Arquivo.pt carried out a special collection of content published on the web.

We began by automatically searching for news, election pages and websites related to the elections in Madeira. We used a list of search terms to put into the Bing Search API.

The aim was to obtain as many URLs as possible related to the event or topic in question, i.e. the Madeiran elections. To do this, several limits were set for the results: top 10, top 20, top 50 and top 100. The documentation of this process shows that the more results are included, the greater the number of pages that are not very relevant and sometimes off target.
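One simple way to observe the effect of these limits is to count how many distinct addresses each cutoff yields. The sketch below assumes the ranked results of each search term are already held in memory; the function name and the data layout are hypothetical. It measures volume only, so judging relevance would still require a manual check.

```python
def unique_urls_per_cutoff(ranked_results, cutoffs=(10, 20, 50, 100)):
    """Count distinct URLs obtained when every result list is cut at each limit.

    `ranked_results` maps a search term to its ranked list of URLs
    (a hypothetical in-memory layout, not a format used by the article).
    """
    counts = {}
    for cutoff in cutoffs:
        urls = set()
        for ranked in ranked_results.values():
            urls.update(ranked[:cutoff])  # keep only the top `cutoff` results
        counts[cutoff] = len(urls)
    return counts
```

Comparing the counts side by side shows how quickly the list of addresses grows as the limit is raised, which is the point at which a sample of the extra pages can be reviewed for relevance.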

All the addresses (12,656) were recorded on 7 June with the Heritrix crawler.

Find out more by consulting the open dataset:

European elections 2024 in multilingual collection

The European elections took place on 9 June in Portugal. In some countries, such as Estonia, Czechia and Italy, the elections were held on a different date.

Arquivo.pt collected pages relating to the European Elections in the 27 countries of the European Union and in the 24 official languages.

The same methodology was used as for the 2019 European Elections collection, i.e. a multilingual and semi-automatic search.

A list of 40 compound terms or keywords was used and translated into the 24 official EU languages. The terms were translated into the various languages in 2019 by the EU Publications Office. This resulted in a multilingual list of 960 terms to put into the Bing Search API.
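The size of the multilingual list follows directly from the numbers above: 40 terms in each of 24 languages gives 40 × 24 = 960 search terms. A minimal sketch of flattening such a list, assuming the translations are held as a mapping from language code to terms (a hypothetical layout with placeholder values):

```python
def multilingual_terms(translations):
    """Flatten {language: [terms]} into a list of (language, term) pairs."""
    return [
        (language, term)
        for language, terms in translations.items()
        for term in terms
    ]


if __name__ == "__main__":
    # Placeholder data: 24 made-up language codes with 40 made-up terms each.
    translations = {
        f"lang{n:02d}": [f"term {i}" for i in range(40)]
        for n in range(24)
    }
    print(len(multilingual_terms(translations)))
```

Keeping the language code next to each term makes it possible to report results per language afterwards.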

Before the elections, on 3 June, the first search was carried out, resulting in 8,986 unique addresses, limiting the number of results to the top 20.

After the elections, new search terms were added with the names of the main candidates for the European Parliament in each country of the European Union. This second post-election search yielded 15,371 unique addresses.

The tool used for this collection was Heritrix. The collection was limited to three ‘hops’. In this case, the crawler follows links up to three times. This means that we opted for a certain restraint in the depth of the recording. Three ‘hops’ in the Heritrix crawler is enough to record one page (in other applications also called ‘page’ or ‘single page’ recording).
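The idea of limiting a crawl by hops can be illustrated with a breadth-first traversal over a small link graph held in memory. This is a simplified model, not Heritrix code: every followed link counts as one hop here, whereas a real crawler applies more detailed scoping rules, and the function name and graph layout are hypothetical.

```python
from collections import deque


def crawl_within_hops(seeds, links, max_hops=3):
    """Return {url: hops from the nearest seed} for URLs within max_hops.

    `links` maps a URL to the URLs it links to (a toy stand-in for
    fetching pages and extracting their links).
    """
    depth = {}
    queue = deque()
    for seed in seeds:
        if seed not in depth:
            depth[seed] = 0
            queue.append(seed)
    while queue:
        url = queue.popleft()
        if depth[url] == max_hops:
            continue  # hop budget exhausted: do not follow links from here
        for target in links.get(url, ()):
            if target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth
```

With a chain of pages a → b → c → d → e and `a` as the seed, a limit of three hops reaches `d` and leaves `e` unrecorded.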

The content was recorded between 7 and 20 June and included in the EAWP46 special collection. It will be available after 1 year.

Find out more by consulting the open dataset:

Learn more about past election collections

Portuguese at the 2024 Olympics and Paralympics in IIPC’s international collection of websites


Last updated on September 11th, 2024 at 04:23 pm

Paralympic Games. Miguel Monteiro, gold medallist, returns to Lisbon (News on the RTP website, 2 September, selected for international collection)

Arquivo.pt has contributed to the international collection of web pages on the Summer Olympic Games taking place in Paris from 26 July to 11 August 2024 and is doing the same for the Summer Paralympics taking place from 28 August to 8 September.

The initiative to create the “2024 Summer Olympics/Paralympics IIPC CDG” collection is the responsibility of the International Internet Preservation Consortium (IIPC), the world’s leading organisation in the field of Internet preservation, through its Content Development Working Group.

The IIPC’s collaborative collections aim to promote the creation of thematic collections and collections based on international events. The web pages are recorded and then made available on the Archive-it service.

The pages of this collection will also be available on Arquivo.pt for those who want to carry out studies on sport and Olympism.

How the pages about Portuguese athletes were selected

At the Olympic Games 73 athletes represented Portugal in 15 sports, and at the Paralympic Games 27 athletes in 10 sports.

The criterion for selecting pages for the international collection was news about the athletes. For each athlete, pages were selected about their expectations before the games, their performance in the competition and their comments during and after the competition.

Some athletes have more news selected than others, and the same goes for the sites from which the news comes. The selection of pages was not limited to the first results presented by the search engine. We looked for a variety of channels and news from regional and local sites, some from the region or city where the athletes came from.

More than 500 pages to remember the Portuguese presence in Paris

The contribution of Arquivo.pt, as you can see in the table, already has more than 500 web pages.

Portuguese Seeds – 2024 Summer Olympics and Paralympics, International Internet Preservation Consortium – Content Development Working Group (IIPC CDG)

Collaborate in the collection via the IIPC form

Helena Byrne, curator of web archives at the British Library and main curator of this collection, invites everyone to send in interesting pages to record: And we’re off – Get Involved in Web Archiving the Summer Games – Paris 2024.

The following public form is available to contribute:

2024 Summer Olympics & Paralympics

IPL – Politécnico de Lisboa organised a series of webinars with Arquivo.pt

IPL – Politécnico de Lisboa, through its Distance Learning Group (EaD@IPL), organised a series of webinars for its community dedicated to Arquivo.pt and the preservation of content published on the Internet.

This initiative was attended by IPL – Politécnico de Lisboa lecturers and researchers, as well as people linked to the institution’s communications department.

The cycle of webinars took place in three sessions, between May and July 2024, and followed the training programme that Arquivo.pt has been offering for several years.

Presentation materials

Why training on web preservation is important

Archiving content published on the web and using a web archive on a day-to-day basis are still uncommon practices, largely because the community is unaware of the existence and operation of Arquivo.pt.

For example, in this cycle of webinars with the IPL – Politécnico de Lisboa, participants were given tools that allow them to use the web archive immediately and creatively, such as the SavePageNow service, the historical content search service and, for use in interdisciplinary teams, Application Programming Interfaces (APIs).
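
As an illustration of how the APIs mentioned above might be used, the following Python sketch builds a query URL for the Arquivo.pt full-text search. The endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `from`, `to` and `maxItems` are assumptions based on the public API documentation and should be checked against its current version. `build_search_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt full-text search API;
# verify it against the current API documentation before use.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, year_from=None, year_to=None, max_items=10):
    """Build a search URL (hypothetical helper; parameter names are assumptions)."""
    params = {"q": query, "maxItems": max_items}
    if year_from is not None:
        # Timestamps are assumed to use the YYYYMMDDHHMMSS form.
        params["from"] = f"{year_from}0101000000"
    if year_to is not None:
        params["to"] = f"{year_to}1231235959"
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("Politécnico de Lisboa", year_from=2010, year_to=2020)
print(url)
```

The resulting URL could then be opened in a browser or fetched with any HTTP client, which is one way an interdisciplinary team might share reproducible searches.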

As a result of this series of webinars, the collaboration between the IPL – Politécnico de Lisboa and Arquivo.pt was strengthened, with a view to preserving its institutional websites and other interesting content that is available on various online media (news, events, references to teachers, researchers and students).

Meet the winners of the Arquivo.pt Award 2024!

Last updated on September 26th, 2024 at 06:13 pm

The winners of the Arquivo.pt 2024 Award were announced by the Público newspaper, the official media partner for this edition.

27 applications were received.

The awards ceremony took place during the closing session of the Ciência 2024 meeting, on 5 July, at Centro de Congressos da Alfândega do Porto.

1st place – “Noticioso – Desafiar percepções”

The winner of the 10,000 euro prize was the work “Noticioso – Challenging perceptions” developed by Carmen Fonseca and António Ramiro (Cubbo Team).

“Noticioso” is a platform where users can compare media coverage of various topics through a game (Quiz). It also allows users to explore trends over time using an analytical tool. How well do you know Portuguese news? Come and find out.

For example, which topic made the most news between 2000 and 2020: global warming or Sporting Club of Portugal? Arquivo.pt data says it was Sporting football club.

2nd place – “Habitação.PT: Uma visão do Mercado de Habitação em Portugal”

The 2nd prize of 3,000 euros was awarded to the work “Habitação.PT: Uma visão do Mercado de Habitação em Portugal” (“Habitação.PT: An overview of the housing market in Portugal”) by Diogo Gonçalves.

“Habitação” is a tool that allows the user to interactively explore the evolution of the average value of the Portuguese housing and rental market, contextualised with news published on the subject and housing policies.

For example, in Lisbon in 2009 the price was around 1600 €/m², rising to 4800 €/m² in 2023. The rise in housing prices is contextualised by news stories over time.
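
To put the Lisbon figures in perspective, the short Python sketch below works out the total increase and the equivalent average yearly growth. It is not part of the winning tool: the two prices are the rounded values quoted above, and the assumption of a steady compound rate between 2009 and 2023 is a simplification made only for illustration.

```python
# Rounded values quoted in the text (€/m²); treat them as approximate.
price_2009 = 1600.0
price_2023 = 4800.0
years = 2023 - 2009

# Total increase over the whole period, as a percentage of the 2009 price.
total_increase_pct = (price_2023 / price_2009 - 1) * 100

# Equivalent compound annual growth rate, assuming steady growth (a simplification).
annual_growth_pct = ((price_2023 / price_2009) ** (1 / years) - 1) * 100

print(f"Total increase: {total_increase_pct:.0f}%")
print(f"Average yearly growth: {annual_growth_pct:.1f}%")
```

In other words, the price tripled over 14 years, which corresponds to growth of roughly 8% per year under this simplified assumption.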

3rd place – “Pegada Lusa”

The 3rd place prize of 2,000 euros was awarded to the work “Pegada Lusa” (Portuguese green footprint), developed by Sérgio Teixeira and Diana Teixeira.

“Pegada Lusa” is a work that shows the evolution of sustainable policies and initiatives in the various regions of the country, based on an analysis of projects and good practice from the United Nations Sustainable Development Goals (SDGs).

For example, the Porto region has a sustainability index (“Green Score”) of 57%, based on the content of the news analysed.

Honourable Mention granted by Público newspaper: “Uma viagem no tempo com o Público e o Expresso”

The newspaper Público, official partner of the 7th edition of the Arquivo.pt prize, awarded its Honourable Mention to the work “Uma viagem no tempo com o Público e o Expresso” (“A journey through time with Público and Expresso”), by Rita Marques Costa and Beatriz Malveiro.

“A journey through time with Público and Expresso” analyses and compares the web pages of Público and Expresso since 1998, showing the website user how the digital versions of these media have evolved.

For example, in 2014, both Público and Expresso began to emphasise the headlines on their homepages and Expresso began to have a daily digital edition.

Honourable Mention granted by Aveiro Media Competence Centre (AMCC): “discordAR: a Proximidade dos Partidos na Assembleia da República”

The Aveiro Media Competence Centre (AMCC) has awarded its Honourable Mention to the work “discordAR: a Proximidade dos Partidos na Assembleia da República”, by Miguel Salema and Sebastião Fonte.

“discordAR: The Proximity of the Parties in the Assembly of the Republic” is an app that shows the proximity between political parties, using votes in the Portuguese Parliament.

For example, we can see the percentage of votes in the same direction between the Parties in the period relating to the XII Legislature (2012 to 2015).
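
The idea of a “percentage of votes in the same direction” can be sketched in a few lines of Python. This is a minimal illustration and not the method used by the discordAR authors: the function `agreement_percentage`, the party names and the ballot data are all invented for the example.

```python
def agreement_percentage(votes_a, votes_b):
    """Percentage of shared ballots on which two parties voted the same way."""
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for ballot in shared if votes_a[ballot] == votes_b[ballot])
    return 100.0 * same / len(shared)


# Invented sample data: ballot identifier -> vote cast by the party.
party_x = {"b1": "for", "b2": "against", "b3": "abstain", "b4": "for"}
party_y = {"b1": "for", "b2": "for", "b3": "abstain", "b4": "for"}

print(agreement_percentage(party_x, party_y))
```

Only ballots on which both parties voted are compared, so a party that was absent from a vote does not count as disagreeing.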

Honourable Mention granted by .PT: “ArquivoNC – o arquivo web do Jornal de Notícias da Covilhã”

The DNS.PT Association awarded an Honourable Mention to the Professor who encouraged the submission of the work “ArquivoNC – o arquivo web do Jornal de Notícias da Covilhã” (“ArquivoNC – the web archive of the Jornal de Notícias da Covilhã”), thus promoting the use of Arquivo.pt as a training and learning tool in the classroom. The work was created by student Rodrigo Dias da Silva, supervised by Professor Ricardo Campos, from the University of Beira Interior (UBI).

“ArquivoNC – the web archive of the Jornal de Notícias da Covilhã” is a work within the scope of the final project of the Engineering course at the University of Beira Interior (UBI) that provides access to ten years of web pages of the newspaper Notícias da Covilhã from the news preserved by Arquivo.pt between 2009 and 2019.

Awards Ceremony

Interviews

Dissemination materials

Press

Know more

Higher education library mobility program brings professionals to Arquivo.pt

Arquivo.pt headquarters, operated by FCCN-FCT, on the LNEC campus in Lisbon.

On May 24, FCCN welcomed professionals from Higher Education Libraries (HEL) for the first time as part of “My library is your library”, a program promoted by the Higher Education Libraries Working Group (GT-BES) of the Portuguese Association of Librarians, Archivists, Documentalists and Information Professionals (BAD).

This mobility program offers short-term visits for exchanging experiences and gaining hands-on contact with good practices, fostering collaboration and knowledge of Portuguese higher education institutions (HEIs) among professionals in the field.

Advanced services for knowledge

In this first edition of the program at FCCN, the participating colleagues (three professionals from the University of Lisbon and one from the Catholic University of Porto) were offered a tour of the digital support services for higher education institutions operated by FCCN-FCT.

Some services are familiar to information professionals, such as B-On and RCAAP. Others are back-office services and therefore less visible, but they are essential for higher education institutions. Examples include Eduroam, which provides access to the Internet, RCTSaai for authentication and RCTS CERT for responding to security incidents.

Highlights include the Arquivo.pt and NAU services

The day highlighted Arquivo.pt and the NAU Platform, two services in the field of knowledge that are available to higher education institutions and also to society.

The Arquivo.pt team showed the back office of this Internet preservation service in Portugal and carried out a practical exercise in recording and integrating content into the web archive.

The NAU Platform is a platform for MOOCs (Massive Open Online Courses) created with the aim of democratizing knowledge, promoting digital literacy, enabling education and training for broad communities of users, particularly the Portuguese and Lusophone population.

More recently, with its integration into the North American platform edx.org, it has also been made available to all potential Portuguese-speaking trainees around the world. Participants in the program were shown how to build a MOOC on the edX platform.

The program also included a visit to the Data Center and the professional television studio at the FCCN.

Visit by participants in the Higher Education Libraries mobility program to the FCCN TV studio

To know more

A week of job shadowing at Arquivo.pt: from Prague to Lisbon

By: Marie Haškovcová and Luboš Svoboda, Webarchiv, National Library of the Czech Republic, May 13th to 17th, 2024.

A visit within the EU Erasmus+ programme

Thanks to the EU Erasmus+ programme, focused on adult education – staff mobility, we were able to spend a week job shadowing at the Portuguese web archive Arquivo.pt and compare the strategies of the Czech web archive – Webarchiv with the approaches of our Portuguese colleagues.

In both cases, these are archives focused on national (Czech and Portuguese) content on the Internet.

Arquivo.pt

While the Czech web archive is part of the National Library of the Czech Republic, the Portuguese archive (Arquivo.pt) is part of the FCCN, under the FCT – Foundation for Science and Technology, which aims to contribute to the development of science, technology and knowledge.

FCT provides IT services to the Portuguese higher education and research system, as well as high-speed internet connectivity. The institutional background of both archives is also reflected in the specifics of their concepts.

The visit included a presentation of the team and the campus and departmental spaces, a presentation of the activities of both archives and a discussion of the different aspects of our work – technical and curatorial tools, technologies and processes, the legislative environment and ethical issues, data storage, some services, research activities, perspectives and future plans.

The Czech web archive

The Czech web archive was founded in 2000; its oldest archival copies date back to 2001, and it currently holds more than 580 TB of data. Like Arquivo.pt, it harvests content from the national domain based on a list of URLs obtained from its provider. Its acquisition strategy supplements these so-called comprehensive harvests with thematic and selective harvests.

Topic collections relate to a specific topic or event, can be one-off or continuously built, and combine manually selected and automatically scraped resources. Selective harvests are intended for long-term harvesting, have detailed cataloging records that are part of the Czech national bibliography, and are licensed, so their archival copies are freely available through the catalogue.

Among Webarchiv’s research activities, we presented a project that detects so-called dead websites through the Extinct Websites application and builds a database for monitoring broader changes in the Czech web, as well as the WACloud project, which aims to extract big data from the web archive.

Exchanging knowledge and experience

Among the Portuguese projects, we were particularly interested in CitationSaver. We also discussed the Memorial project, the harvesting of the Portuguese Wikipedia, and the activities of the Portuguese archive related to education in web archiving (training courses).

The meeting was enriched by the discussion of specific topic collections.

  • The Czech net art collection documents digital art and its transformation in the online space, providing a unique art historical perspective.
  • Another important collection is Social networks of Members of Parliament of the Czech Republic 2021-2025, which preserves the online communications and interactions of Czech MPs and is invaluable for the study of political marketing and public political life.
  • The GitHub collection archives important repositories from this popular developer platform, preserving key domestic software projects and their code for future generations.
  • Finally, the Crypto, NFT, Blockchain, Web3, Metaverse collection charts the rise and impact of technology in the digital asset space.

These collections are key resources for research and analysis of digital culture, policy, and technology. Discussing them at web archivist meetings contributes to the further development of archival methods and technological innovation.

We focused on exchanging knowledge and experience in seeds acquisition, workflow optimization and sharing technical tips and tricks.

Sharing best practices

We discussed best practices for identifying and collecting key web resources, a critical step in ensuring a comprehensive and representative archive. We shared various strategies for automating and streamlining workflows, including the use of web scraping tools and advanced content filtering.

Technical discussions included solutions to common problems such as harvesting dynamic web pages and overcoming access restrictions. The meeting provided a valuable platform for sharing innovative methods and fostering collaboration among experts, furthering the development of effective and sustainable digital archiving.

Luboš Svoboda, web curator, Marie Haškovcová, head of Webarchiv, and Ricardo Basílio, Arquivo.pt web curator, visiting the FCCN-FCT TV studio during the Erasmus+ visit.

Exhibition of old websites to mark International Museum Day

May 18, International Museum Day, was celebrated all over the country with free admission, guided tours, entertainment and exhibitions related to memory and heritage.

Arquivo.pt contributed with an exhibition of old web pages, entitled “Digital Memory through the Internet of the Past”, which was on display at one of the stands at the National Coach Museum in Lisbon.

The pages were selected to show different aspects of the Alentejo over time. From 2016, pages relating to the Heritales project were selected.

Heritales and Crowd-Recycling drew attention to the preservation of the Internet’s memory

Heritales is a project based in Évora that aims to study and disseminate heritage in all its manifestations. It is known for its main event created in 2016, HERITALES – International Heritage Film Festival.

Crowd-Recycling is a project focused on good practices for sustainability.

Heritales, Crowd-Recycling and Arquivo.pt carried out this action in collaboration with the aim of giving visibility to content published on the web over time. Preserving and giving access to digital content is fundamental to enhancing heritage.

Why an exhibition of old websites is a good idea

Making an exhibition of websites over time is relatively easy: all you have to do is choose a theme, which can be the history of an institution, and select pages preserved on Arquivo.pt.

An exhibition of old websites is an original idea for the target audience. It often features texts and images that only existed on the web.

By drawing attention to the websites, we realize that many things were left unrecorded, and this changes our view of the content we publish today. We start taking more care to save important pages, for example by saving them on the spot with SavePageNow.

Heritales, Crowd-Recycling and Arquivo.pt on International Museum Day at the National Coach Museum

World Internet Day was on May 17th

The day before International Museum Day was World Internet Day (May 17). The proximity of the two commemorations ties in with the theme of preserving memory.

Portugal connected to the Internet for the first time in 1991, with the FCCN project “RCCN IP Service”.

To remember how it all happened, here are the three suggestions that FCCN published on social media for this day: