Portuguese municipal elections 2021 preserved by Arquivo.pt

Last updated on May 8th, 2023 at 05:09 pm

Thousands of pages about the elections to preserve before they disappear

On 26 September 2021 the local elections were held in Portugal, an event marked by the Covid-19 pandemic. The communication of the candidates was mainly based on the media and publications through the Web.

Electoral websites are of manifest historical importance. However, they are difficult to identify because they appear and disappear quickly. In the case of municipal elections, the number of candidates and the variety of channels used make the task even more challenging.

Arquivo.pt, as in previous elections, launched a special collection to preserve contents concerning the municipal elections.

How the electoral content published on the Web was identified

The first step was the manual identification of election-related content by municipality and parish. For this purpose help was requested from people and organisations with the following initiatives:

  • collaborative list “Municipal Elections 2021: we need your help!”;
  • request for collaboration from the archive services of the 308 municipalities in the identification of electoral sites and candidates of the respective municipality;
  • request to the Parties to send the names of their lead candidates.

The Eyedata – Social Data Lab site, which made the names of candidates from all over the country available on the Web, was also used as a source of information, as was the Wikipedia page Eleições autárquicas portuguesas de 2021.

This manual identification process resulted in a list of 255 addresses which documented the candidacies for the 2021 Municipal Elections. Note that 61% of the identified addresses pointed to private social media platforms (54% facebook.com, 5% instagram.com and 2% twitter.com).

Much of this content of national interest could not be preserved because these foreign private companies do not allow it.

The list with names of candidates by county, party or coalition was used to create automatic searches in Bing that identified the most relevant electoral contents.

For instance, by combining the term “autárquicas 2021” with the name of a candidate and the respective municipality, one obtains results related to that candidate, such as news, initiatives of his/her campaign or the official page of his/her electoral campaign.
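The query construction described above can be sketched as follows. This is only an illustration: the function name, the sample candidates and the choice to quote each part as a phrase are assumptions, and the source does not say how the queries were actually submitted to Bing.

```python
def build_queries(candidates, term="autárquicas 2021"):
    """Return one search query per (candidate name, municipality) pair."""
    queries = []
    for name, municipality in candidates:
        # Quoting each part asks the search engine to treat it as a phrase
        # (an assumption; the original queries may have been unquoted).
        queries.append(f'"{term}" "{name}" "{municipality}"')
    return queries


# Hypothetical sample rows; real names would come from the candidate list.
sample = [("Maria Silva", "Braga"), ("João Costa", "Faro")]
for query in build_queries(sample):
    print(query)
```

Each resulting string can then be passed to whatever search service is available, and the returned URLs reviewed before being added to the crawl list.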

This methodology was also applied in the Presidential Elections 2021 and in the European Elections 2019. The technical report A transnational crawl of the European Parliamentary Elections 2019 details the applied methodology.

Content collection and availability in Arquivo.pt

Between 22 August and 8 October 2021, Arquivo.pt exhaustively gathered pages related to the 2021 local elections.

The resulting collection, called “Municipal Elections 2021” (EAWP39), gathers 31 million files that total 2.7 terabytes of information and will become available one year later.

Researchers who want to study the 2021 local elections and need early access to the collected contents can contact Arquivo.pt.

To know more

Memory of events and festivals of art: PARA SEMPRE

Last updated on February 8th, 2022 at 10:57 am

The exhibition Memória de festivais e eventos de arte proposes a look at the Portuguese art scene present on the Web and includes a chronology of these events.

This online exhibition presents the results of the PARA SEMPRE project in a systematic and structured way.

Online exhibition – arteparasempre.wordpress.com

The project’s second online product will be a directory of references of artists, galleries and projects in the area of contemporary Portuguese art to be made available during 2022, at the Gulbenkian Art Library webpage.

Cycle of Webinars “Art forever on the web”

A cycle of three webinars entitled “Art forever on the web” was held between April and July 2021, aimed at artists, curators, gallerists and event producers, among others.

The sessions had an average of 58 participants each, who rated their satisfaction at an average of 4.6 on a scale from 1 to 5. The three sessions aimed at disseminating knowledge about the digital preservation of information on the web and the requirements for publishing preservable information.

Identification of artists, galleries and projects

The first step was to identify relevant artists, galleries and projects in the contemporary Portuguese art scene. We started from an initial set of 63 agents (artists, galleries and projects), to which 573 artists belonging to the Modern Collection of the Calouste Gulbenkian Foundation and the BAA – FCG Collection of Artist Books and Independent Publishing were added.

Throughout these months, 636 elements were thus identified (social networks and websites active in 2020), which were subsequently analysed.

The conclusions of the analysis carried out within the project were presented in the last webinar, held on July 1, 2021.

Special feature on art websites and blogs

In April 2021, Arquivo.pt created a special collection based on the initial identification of artists, galleries and projects and obtained 2.8 terabytes of preserved information.

New contents about art websites were recorded, using tools that allow higher quality collections, such as Brozzler and Webrecorder.

A collaborative project of digital curation

“PARA SEMPRE” (forever) is a digital curatorial project applied to the information made available on the web by the several agents of the contemporary Portuguese art scene (artists, galleries and hybrid sites).

Its main purpose is to contribute to the preservation/reuse of past and future pages, to ensure the preservation of the digital memory of current Portuguese art available at Arquivo.pt, and to promote knowledge on this theme by presenting it in a systematized and structured way.

Its creation results from the meeting of the missions of two organizations: Arquivo.pt, which aims to ensure the preservation of the Portuguese web, and the Calouste Gulbenkian Foundation Art Library, which acts as an agent in the development of knowledge about contemporary Portuguese art. The project is part of ROSSIO (Research Infrastructure in the Social Sciences, Arts and Humanities).

Training in collaboration with the City Council of Lisbon

Last updated on December 13th, 2021 at 12:02 pm

A cycle of webinars was held between October and December 2021, organised by the Department of Development and Training of the Municipality of Lisbon within the digital skills program Passaporte Competências Digitais of the Câmara Municipal de Lisboa, in collaboration with Centro Qualifica +ValorLx, the ROSSIO Infrastructure and Arquivo.pt (Fundação para a Ciência e a Tecnologia, I.P.).

The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.

The sessions were open by registration and had a total of 126 participants (average of 31 per session).

The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.

Sessions held

September 15 – Arquivo.pt. What is it? What is it for?

Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explained how any citizen can use it to consult Web pages from the past in the most diverse cases and talked about the importance of preserving digital memory.

November 11 – Arquivo.pt API: automatic access to preserved Web information

Vasco Rato, web developer at Arquivo.pt, presented the Arquivo.pt APIs (Application Programming Interfaces). These enable the development of innovative and useful applications for organizations through the automatic processing of historical information preserved from the Web.

November 25 – Archive the Web: do-it-yourself!

Ricardo Basílio, digital curator at Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record Web pages in a standard format on one's own computer, which allows a person or an organization to build its own small-scale Web archive.

December 9 – Publish on the Web: best practices by Arquivo.pt

Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web contents. How many contents are in formats that make their future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.

Know more about Arquivo.pt training

Arquivo.pt is open to collaborations aiming at training professionals in organizations or common citizens on Web preservation.

Learn about the training modules and contact us.

 

H2020 projects preserved by Arquivo.pt

Last updated on June 16th, 2023 at 01:40 pm

The main objective of Arquivo.pt is to preserve online information for research and education purposes.

Previously, Arquivo.pt identified and preserved Research & Development project websites funded by the European Union during the FP4, FP5, FP6 and FP7 programmes (1994-2013).

Arquivo.pt has now contributed to preserving online information that documents R&D projects funded by the Horizon 2020 programme (2014-2021). It preserved 197 million web files (17 TB) related to science for future access.

H2020 projects publish valuable information online, but it is being lost

Websites about Research and Development (R&D) projects are increasingly being used to publish and disseminate important scientific information that complements published literature (e.g. data sets, documentation or software).

However, after projects end, the corresponding websites usually disappear, causing a permanent loss of unique and valuable scientific information.

Arquivo.pt automatically identified URLs that document H2020 Research and Development projects

The European Union’s Open Data Portal published a data set from the Community Research and Development Information Service (CORDIS) that documents H2020 research projects. However, of the 31 129 projects listed, only 46% presented a project URL.

Arquivo.pt developed a low-cost methodology that automatically identifies URLs related to R&D projects to be systematically preserved. This automatic identification is achieved by combining open data sets with web search services. The methodology is detailed in a scientific article presented at the International Conference on Digital Preservation 2016.

In sum, we extracted 106 300 unique URLs from the following open data sets:

Then, we extracted the acronym and title of the projects from the data sets and automatically searched the web for additional URLs using the Bing Search API.
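As a rough sketch of this step, the snippet below builds one search query from the acronym and title of each project that lacks a URL. The field names (`id`, `acronym`, `title`, `projectUrl`) and the second sample record are hypothetical; they are not taken from the CORDIS data set or from the software Arquivo.pt released, and the call to the search service itself is left out.

```python
def queries_for_missing_urls(projects):
    """Map project id -> search query for projects that have no project URL."""
    queries = {}
    for project in projects:
        if project.get("projectUrl"):
            continue  # a URL is already known, no search needed
        parts = [project.get("acronym", "").strip(), project.get("title", "").strip()]
        query = " ".join(part for part in parts if part)
        if query:
            queries[project["id"]] = query
    return queries


# Field names and the second record are hypothetical placeholders.
projects = [
    {"id": "1", "acronym": "EXTMOS",
     "title": "Extended Model of Organic Semiconductors",
     "projectUrl": "http://extmos.eu/"},
    {"id": "2", "acronym": "DEMO",
     "title": "Hypothetical Demo Project", "projectUrl": ""},
]
print(queries_for_missing_urls(projects))
```

Only the second record produces a query here, since the first already lists a project URL.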

All the data sets and tools developed have been made publicly available in open access so that they can be reused and collaboratively enhanced. In particular, you can access the software developed to automatically identify additional URLs about H2020 projects.

197 million web files related to science were preserved

Arquivo.pt identified and preserved 197 million web files (17 TB) that document R&D projects funded by Horizon 2020.

In 2021, we can already witness project websites that are no longer available online, such as the Extended Model of Organic Semiconductors (EXTMOS) project (http://extmos.eu/). However, it was preserved and can be accessed at Arquivo.pt:

Archived version at Arquivo.pt (https://arquivo.pt/wayback/20170427182603/http://extmos.eu/) of the home page of the EXTMOS Research and Development project (http://extmos.eu/) funded by H2020.
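The archived address cited for EXTMOS follows a visible pattern: a base address, a 14-digit capture timestamp and the original URL. A minimal sketch that composes and splits such addresses, assuming other captures follow the same pattern as this one example (the function names are illustrative):

```python
from datetime import datetime

ARQUIVO_WAYBACK = "https://arquivo.pt/wayback"  # base taken from the EXTMOS example


def archived_url(original_url, captured_at):
    """Compose an archived address from an original URL and a capture time."""
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{ARQUIVO_WAYBACK}/{timestamp}/{original_url}"


def parse_archived_url(url):
    """Split an archived address into (capture time, original URL)."""
    prefix = ARQUIVO_WAYBACK + "/"
    if not url.startswith(prefix):
        raise ValueError("not an Arquivo.pt wayback address")
    timestamp, _, original = url[len(prefix):].partition("/")
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


print(archived_url("http://extmos.eu/", datetime(2017, 4, 27, 18, 26, 3)))
```

Run on the EXTMOS values, the first function reproduces the address quoted in the caption.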

Contributions to complement the European Open Data Sets

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested in preserving this digital heritage:

To learn more about this collection, you can watch the video Preservation of web content related to Horizon 2020.

References

Are you a researcher?

“Major Minors” on World Digital Preservation Day 2021

Last updated on December 13th, 2021 at 12:03 pm

The winners of the Arquivo.pt Award 2021 were the guests of the Arquivo.pt online session on World Digital Preservation Day 2021.

As in previous years, Arquivo.pt joined this international initiative by holding an open session where useful knowledge was shared with the community.

Paulo Martins, Leandro Costa and José Carlos Ramalho, who guided this work, spoke about the “Major Minors” project and how they used the contents preserved by Arquivo.pt.

The “Major Minors” project is an ontology of press clippings from Portuguese newspapers with reference to social minorities. It aims to map and study the representation of minorities in the Portuguese journalistic context over the first two decades of the 21st century.

Please share the slides and the video.

Agenda

November 4th

3:00 pm – Welcome and news by Daniel Gomes (slides PDF, 3MB)
3:10 pm – Major Minors by Paulo Martins, Leandro Costa and José Carlos Ramalho (slides PDF, 5MB)
3:40 pm – Questions and answers
4:00 pm – End

Session video

Create automatic narratives about any topic!

Arquivo.pt provides a new function that allows you to automatically create temporal narratives on any topic.

The “Narrative” functionality, integrated into Arquivo.pt in September 2021, is the result of the collaboration between “Conta-me Histórias”, winner of the Arquivo.pt Award 2018, and Arquivo.pt.

The “Conta-me Histórias” (Tell me Stories) project was developed by researchers from the Laboratory of Artificial Intelligence and Decision Support (LIAAD – INESCTEC) affiliated with the Instituto Politécnico de Tomar – Center for Research in Smart Cities (CI2), the University of Porto and the University of Innsbruck.

How does it work?

When a user enters a set of words about a topic in the Arquivo.pt search box and clicks on the “Narrative” button, the user is directed to the “Conta-me Histórias” service, which automatically analyzes the news from 25 websites archived by Arquivo.pt over time and presents a chronology of news related to the topic.

For example, if we search for “Justin Bieber” and click on the “Narrative” button (Figure 1), we will be directed to the “Conta-me Histórias” service, where we will automatically obtain a narrative of archived news (Figure 2).

Figure 1: Search results for pages about “Justin Bieber”.

Figure 2: Narrative of news about “Justin Bieber” from Portuguese news sites preserved by Arquivo.pt generated by the “Conta-me Histórias” service.
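To make the idea of a chronology concrete, the sketch below orders archived headlines by capture time and groups them by year. It is not the “Conta-me Histórias” algorithm, which the text above does not detail; the input format (14-digit timestamp plus headline) and the placeholder headlines are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def chronology(items):
    """Group (timestamp, headline) pairs by year, oldest first."""
    by_year = defaultdict(list)
    for timestamp, headline in sorted(items):  # string sort works for fixed-width timestamps
        date = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        by_year[date.year].append((date.date().isoformat(), headline))
    return dict(by_year)


# Placeholder data, not real archived news.
items = [
    ("20160101000000", "Headline B"),
    ("20150301120000", "Headline A"),
    ("20150705080000", "Headline C"),
]
for year, entries in chronology(items).items():
    print(year, entries)
```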

Create your narrative now!

“Conta-me Histórias” searches, analyzes and aggregates thousands of results to generate each narrative about a topic. To obtain good narratives, choose descriptive words about well-defined themes, personalities or events.

Creating a narrative is useful for researchers, journalists or citizens who want a quick overview of how a topic evolved over time, saving them considerable time and effort.

Go to Arquivo.pt and try to create a narrative about a theme of your choice.

Tell us about your experience so we can improve the service!

Book “The Past Web: Exploring Web Archives” available in Green Open access!

Last updated on September 13th, 2022 at 04:15 pm

Since 2006, no book had been published that reflected the state of the art in web preservation and the research conducted on web archives.

The main goal of the new book The Past Web: Exploring Web Archives was to create a new, up-to-date resource to educate more people in the field of web preservation and to make web archives known to researchers and academics.

As such, the book is primarily aimed at the academic and scientific communities, and presents the most innovative methods for exploring information from the past preserved by web archives.

Daniel Gomes, head of Arquivo.pt, led the book’s editorial team, along with the field specialists Elena Demidova, Jane Winters and Thomas Risse. In total, the book resulted from the contributions of 40 authors from around the world who are experts in web archiving.

The book is divided into 6 parts where we find various resources for exploring pages archived from the Internet since the 1990s.

We can also learn how to preserve our collective memory in the Digital Era, which strategies to use when selecting online content, and what impact web archives have on preserving historical information.

The book aims to support professors in their mission to transmit innovative and adequate knowledge for the digital literacy required to train professionals for the 21st century.

Daniel Gomes, from Arquivo.pt, draws attention to the need to include web archives in teaching plans and emphasizes that this knowledge brings a great competitive advantage, especially for students of Humanities and Social Sciences.

An innovative detail of this book is that all its cited links have been preserved by Arquivo.pt so that the references remain valid over time.

The book was available for free to be downloaded from Portuguese higher education institutions (b-On member entities) until March 6th 2022.

However, you can still download a pre-print version of the book (Green Open Access).

Links

Book launch at Jornadas FCCN 2021


2019 websites available and Arquivo.pt surpasses 10 billion files

Last updated on December 16th, 2021 at 06:43 pm

The information collected from the Web during 2019 is now available in Arquivo.pt (respecting the one-year embargo period).

Screenshot from www.politico.eu preserved by Arquivo.pt, collected on June 18, 2019. Article about the Notre Dame fire in Paris, “Notre Dame fire ‘fully extinguished’ as fundraising starts”.

Remember and research historical events in 2019, such as

Arquivo.pt has visited 2 million sites and collected 1.7 billion files, 131 TB in total, so that you can access the memory of past events.

In 2021, Arquivo.pt provides open access to more than 10 billion files (721 TB) from 27 million websites.

Arquivo.pt certified as an open data provider

Last updated on August 17th, 2022 at 08:39 am

Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) with the aim of improving the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e a Tecnologia, I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

The Portuguese Law No. 68/2021 of 2021-08-26 approves the general principles on open data and transposes the European Directive.

Arquivo.pt was certified as a Public Administration open data provider

The AMA recognized Arquivo.pt as a public service and open data provider and awarded its certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
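As an illustration of automatic access, the sketch below only builds the address of a full-text search request and performs no network call. The endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `maxItems`, `from` and `to` are assumptions that should be checked against the documentation at https://arquivo.pt/api before use.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against https://arquivo.pt/api.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def textsearch_url(query, max_items=10, date_from=None, date_to=None):
    """Build a search request URL (parameter names are assumptions)."""
    params = {"q": query, "maxItems": max_items}
    if date_from:
        params["from"] = date_from  # e.g. a 14-digit timestamp such as "20100101000000"
    if date_to:
        params["to"] = date_to
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


print(textsearch_url("Portal do Governo", max_items=5))
```

The resulting address can be opened in a browser or fetched with any HTTP client to inspect the response format.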

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Video presentation at the IIPC Web Archiving Conference 2022

Presentations in the IIPC Web Archiving Conference and RESAW 2021

Last updated on November 17th, 2022 at 05:37 pm

During the week of 14 to 18 June, three international meetings were held by videoconference with the participation of Arquivo.pt:

    • International Internet Preservation Consortium (IIPC) – General Assembly – general assembly of the consortium that gathers the Web archiving initiatives around the world
    • Web Archiving Conference 2021 – the most important meeting in the field of Web preservation, where experts share new knowledge and experiences
    • RESAW Conference – meeting of the European RESAW network (Research Infrastructure for the Study of Archived Web Materials) this year in its 4th edition, mainly addressed to the community of researchers from non-technological scientific areas, such as Social Sciences, Arts and Humanities.

Contributions of Arquivo.pt to the international community

Arquivo.pt presented some results of the work developed in the last year, with emphasis on the functionalities that improve the reproduction of archived contents, such as “Complete the page”.
Two historical collections were integrated into Arquivo.pt: Geocities and the Internet Memory Foundation. Arquivo.pt also created special collections about the 2019 European Elections and Covid-19.
The contents of Arquivo.pt are accessible to any researcher regardless of the country they are in, which makes it a useful service to the international community.

Presentations

  • Arquivo.pt updates 2021: presentation at the IIPC – General Assembly, by Daniel Gomes (Video)
  • Complete the page. 1 minute drop in (presentation at the IIPC – General Assembly “complete the page”), by Daniel Gomes (Slide, Video)
  • A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, by Ivo Branco (Slides, Video)
  • Enhancing access to research the Geocities historical collection, by Pedro Gomes (Slides, Video)
Complete the page – demo. Slide used in the IIPC 1 minute presentation, at the IIPC General Assembly 2021