training in preserving open data published online.
AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.
Arquivo.pt is a service operated by the Fundação para a Ciência e Tecnologia I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.
EU open data directive includes documents on websites
“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording.
…
(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability
…
(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.
…
(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.
…
Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.
Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.
In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through API (https://arquivo.pt/api) or by reusing derived datasets.
Derived datasets available on the Open Data Portal
Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:
Web Archiving Conference 2021 – the most important meeting in the field of Web preservation, where experts share new knowledge and experiences
RESAW Conference – meeting of the European RESAW network (Research Infrastructure for the Study of Archived Web Materials) this year in its 4th edition, mainly addressed to the community of researchers from non-technological scientific areas, such as Social Sciences, Arts and Humanities.
Contributions of Arquivo.pt to the international community
Arquivo.pt presented some results of the work developed in the last year, with emphasis on the functionalities that improve the reproduction of the archived contents, such as the “Complete the page”.
Two historical collections were integrated on the Arquivo.pt: the Geocities and the Internet Memory Foundation. Arquivo.pt did special collections about the 2019 European Elections and Covid-19.
The contents of Arquivo.pt are accessible to any researcher regardless of the country they are in and therefore it is a useful service to the international community.
Presentations
Arquivo.pt updates 2021: presentation at the IIPC – General Assembly, by Daniel Gomes (Vídeo)
Complete the page. 1 minute drop in (presentation at the IIPC – General Assembly “complete the page”), by Daniel Gomes (Slide, Video)
A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, by Ivo Branco (Slides, Vídeo)
Enhancing access to research the Geocities historical collection, by Pedro Gomes (Slides, Vídeo)