Arquivo.pt has contributed to the international collection of web pages on the Summer Olympic Games taking place in Paris from 26 July to 11 August 2024 and is doing the same for the Summer Paralympics taking place from 28 August to 8 September.
The pages of this collection will also be available on Arquivo.pt for those who want to carry out studies on sport and Olympism.
How the pages about Portuguese athletes were selected
At the Olympic Games, 73 athletes represented Portugal in 15 sports; at the Paralympic Games, 27 athletes competed in 10 sports.
The criterion for selecting pages for the international collection was news about the athletes. For each athlete, pages were selected about their expectations before the games, their performance in the competition and their comments during and after the competition.
Some athletes have more news selected than others, and the same goes for the sites from which the news comes. The selection of pages was not limited to the first results presented by the search engine. We looked for a variety of channels and news from regional and local sites, some from the region or city where the athletes came from.
More than 500 pages to remember the Portuguese presence in Paris
The contribution of Arquivo.pt, as you can see in the table, already has more than 500 web pages.
Some web-archived pages are reproduced incompletely because of problems that occurred during the archiving process (e.g. broken formatting or missing embedded images).
Complete page is a function of Arquivo.pt that recovers missing elements in web-archived pages from other web archives or from the original websites.
When viewing a page archived in Arquivo.pt, a user just needs to open the Options menu in the top right corner and choose Complete page.
This process is performed automatically.
How does Complete page work?
If you open a web-archived page that appears incomplete, try the Complete page option and wait.
Arquivo.pt will search for missing elements on the Internet and in other web archives using the Memento protocol. If it succeeds, the obtained elements will be immediately displayed on the web-archived page.
Later, these recovered elements are integrated into the Arquivo.pt collection, so that the web-archived page appears more complete on future visits by any user.
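As a rough illustration of the kind of lookup the Memento protocol (RFC 7089) allows, the sketch below prepares, without sending, a request that asks a TimeGate for the version of a resource closest to a given date by means of the Accept-Datetime header. The TimeGate address and the image URL are placeholders, and the function name is invented for this example; how Arquivo.pt performs the lookup internally is not described here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_memento_request(timegate_base, original_url, when):
    """Return (url, headers) for a Memento TimeGate request (RFC 7089)."""
    # A TimeGate is conventionally addressed as <base>/<original URL>.
    url = timegate_base.rstrip("/") + "/" + original_url
    # Accept-Datetime carries the desired moment as an HTTP date in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return url, {"Accept-Datetime": http_date}


# Hypothetical TimeGate and resource, for illustration only.
url, headers = build_memento_request(
    "https://timegate.example.org/timegate",
    "http://www.example.pt/images/logo.png",
    datetime(2005, 6, 1, 12, 0, tzinfo=timezone.utc),
)
print(url)
print(headers["Accept-Datetime"])
```

A real client would send this request and follow the redirect that the TimeGate returns to the archived copy.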
Completing the home page of artist Cristina Guerra’s website recovered a missing image.
For example, the website of artist Cristina Guerra, archived in 2005, had a missing image. By using Complete page, it was possible in 2021 to obtain this image from another web archive that had preserved it.
Participate in collaborative curation to improve the quality of Arquivo.pt!
Due to the high number of web-archived pages, it is not possible for Arquivo.pt to complete them all automatically. Therefore, the collaboration of users to identify important pages with missing elements and try to complete them is important.
By using Complete page, users help improve the quality of the historical web pages preserved in Arquivo.pt!
Always try to complete web-archived pages that look incomplete. If you detect any problem, contact us.
Spread the word about the Arquivo.pt Complete page!
The platform is maintained by the Celestino Domingues Library of the Estoril Higher Institute for Tourism and Hotel Studies (ESHTE), with the participation of institutions from various areas of heritage that act as content providers.
The digitized contents that can be consulted in the catalog and accessed at the provider institutions included sound, image, photography and printed material, but websites were missing.
Thus, the idea for MUVITUR’s new “Web Pages” collection emerged.
Collaboration between MUVITUR and Arquivo.pt
In 2019, a collaboration between Arquivo.pt and MUVITUR began with the aim of identifying websites related to Tourism in Portugal and disseminating the history of content published on the Web since 1996.
In 2022, a list was established with about 400 records of websites of various entities related to tourism, hotels, travel agencies, pages of municipalities’ websites dedicated to tourism and others.
Collection of records in the MUVITUR catalog with webpages preserved at Arquivo.pt.
How the integration was done
MUVITUR uses Nyron software, which allows content from different sources to be aggregated using the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interoperability protocol, which is very common among libraries, archives and museums to provide content to portals such as Europeana.
Arquivo.pt, however, does not make information available through OAI-PMH, so it was necessary to find an alternative way to create records in Nyron with descriptive information about the preserved sites.
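For readers unfamiliar with OAI-PMH, a harvesting request is an ordinary HTTP GET whose verb parameter names the operation. The sketch below builds a ListRecords request for Dublin Core metadata (the oai_dc prefix that every OAI-PMH repository must support). The repository address is a placeholder and the function name is invented for this example; it does not describe how Nyron or MUVITUR is configured.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set of the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
request_url = oai_list_records_url("https://repository.example.org/oai")
print(request_url)
```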
The procedure for integration was as follows:
The XML schema with the fields for the metadata, according to what works in Nyron, was exported to an Excel sheet.
The information was entered manually, respecting the format and syntax, in collaboration with the computer technicians.
The XML file with the inserted data was validated and imported into Nyron.
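Part of the procedure above could be scripted. The following is only a sketch under stated assumptions: the spreadsheet is taken to be exported as CSV, and the element names (records, record, and the column names) are invented, since the actual Nyron schema is not shown here. Re-parsing the output checks that the XML is well formed, not that it is valid against a schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for the spreadsheet exported as CSV; columns and values are placeholders.
SHEET = (
    "denomination,organization,website,archived_version\n"
    "Example Tourism Site,Example Organization,http://www.example.pt/,"
    "https://archive.example.org/2011/http://www.example.pt/\n"
)


def rows_to_xml(csv_text):
    """Turn each CSV row into a <record> element with one child per column."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = value.strip()
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(SHEET)
# Well-formedness check only; schema validation would need a separate tool.
parsed = ET.fromstring(xml_text)
print(len(parsed.findall("record")), "record(s)")
```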
Creating records in catalogs is largely a manual task and requires human curation. However, it was possible to input information to be automatically processed in the records of the Website collection. For example, the thumbnail was obtained using the Arquivo.pt API, more specifically the linkToScreenShot, visible in the technical details of a preserved page (see the options menu on the top right of a replayed page).
Other elements, such as the site’s title, could also be obtained automatically through the Arquivo.pt API; however, the quality of that information depends on what the site’s producers inserted, and it may not be accurate. The dates that limit the temporal scope can also be obtained automatically, but the manual method was chosen to control the information presented.
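A minimal sketch of how such fields might be read from one item of an API response is shown below. The sample JSON is invented, and the response_items, title and tstamp keys are assumptions to be checked against the API documentation. The screenshot key is looked up without regard to case, because the casing written above (linkToScreenShot) may not match the API exactly.

```python
import json

# Invented sample shaped like a search response; all values are placeholders.
SAMPLE = """
{"response_items": [
  {"title": "Example Tourism Site",
   "tstamp": "20110315120000",
   "linkToScreenshot": "https://archive.example.org/screenshot/20110315120000/example.pt"}
]}
"""


def get_field(item, name):
    """Look up a key ignoring case, since the exact casing is an assumption."""
    wanted = name.lower()
    for key, value in item.items():
        if key.lower() == wanted:
            return value
    return None


def describe(item):
    """Collect the fields a catalog record could reuse."""
    return {
        "title": get_field(item, "title") or "(untitled)",
        "thumbnail": get_field(item, "linkToScreenShot"),
        "year": (get_field(item, "tstamp") or "")[:4],
    }


record = describe(json.loads(SAMPLE)["response_items"][0])
print(record)
```

Because the title is whatever the site’s producers published, a record built this way would still benefit from human review.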
As the project continues, the collection will grow with new records, as there are thousands of websites about the Tourism sector.
Description of Web contents in the MUVITUR catalog
In the collection “Paginas Web” the following data are used:
Denomination – usually the title of the website
Organization – the entity to which the publication belongs
Website address on the Internet
Address for version in Arquivo.pt
Moment(s) to remember
Link for miniature in Arquivo.pt
Descriptors
Geographical data (location, coordinates, geographical name)
The presentation of the information was adjusted to be aligned with that of other MUVITUR resources and contains links to Arquivo.pt.
For example, in the record for the Turismo do Algarve site, we find a link to a moment to remember in 2011 and another link to the site’s history in Arquivo.pt under “Consultar objecto”.
Organizations can create collections of Websites from their area
The National Library of Australia, for example, included records of preserved Websites in its catalog. In the Library of Congress there are collections of old Websites alongside traditional resources.
However, websites are rarely included in museums.
With this unprecedented project we can say that preserved Web sites have gained citizenship in digital platforms dedicated to cultural heritage.
MUVITUR has paved the way with this project for other entities to create collections of websites of their interest on their own platforms.
The project’s second online product will be a directory of references of artists, galleries and projects in the area of contemporary Portuguese art to be made available during 2022, at the Gulbenkian Art Library webpage.
The sessions averaged 58 participants each, and participants rated their satisfaction at an average of 4.6 on a scale from 1 to 5. The three sessions aimed to disseminate knowledge about the digital preservation of information on the web and the requirements for publishing preservable information.
Identification of artists, galleries and projects
The first step was to identify relevant artists, galleries and projects in the contemporary Portuguese art scene. We started from an initial set of 63 agents (artists, galleries and projects), to which 573 artists belonging to the Modern Collection of the Calouste Gulbenkian Foundation and the BAA – FCG Collection of Artist Books and Independent Publishing were added.
Throughout these months, 636 elements were thus identified (social networks and websites active in 2020), which were subsequently analysed.
The conclusions of the analysis carried out within the project were presented in the last webinar, held on July 1, 2021:
In April 2021, Arquivo.pt created a special collection based on the initial identification of artists, galleries and projects and obtained 2.8 terabytes of preserved information.
New contents about art websites were recorded, using tools that allow higher quality collections, such as Brozzler and Webrecorder.
A collaborative project of digital curation
“PARA SEMPRE” (forever) is a digital curatorial project applied to the information made available on the web by the several agents of the contemporary Portuguese art scene (artists, galleries and hybrid sites).
Its main purpose is to contribute to the preservation/reuse of past and future pages, to ensure the preservation of the digital memory of current Portuguese art available at Arquivo.pt, and to promote knowledge on this theme by presenting it in a systematized and structured way.
The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.
The sessions were open by registration and had a total of 126 participants (average of 31 per session).
The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.
Sessions held
September 15 – Arquivo.pt. What is it? What is it for?
Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explains how any citizen can use it to consult Web pages from the past in a wide range of cases and talks about the importance of preserving digital memory.
November 11 – API Arquivo.pt: automatic access to preserved Web information
Vasco Rato, web developer at Arquivo.pt, presented the Arquivo.pt APIs (Application Programming Interfaces). These enable the development of innovative and useful applications for organizations through the automatic processing of historical information preserved from the Web.
Ricardo Basílio, digital curator at Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record Web pages in a standardized format on one’s own computer, which allows a person or an organization to build their own small-scale Web archive.
December 9 – Publish on the Web: best practices by Arquivo.pt
Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web content. How much content is in formats that make future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.
training in preserving open data published online.
AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.
Arquivo.pt is a service operated by the Fundação para a Ciência e a Tecnologia, I.P. that preserves data published on the Web from 1996 to the present day, making it accessible to any citizen for memory and research purposes.
EU open data directive includes documents on websites
“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).
…
(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability
…
(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.
…
(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.
…
Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.
Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.
In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
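As an illustration of automatic access, the sketch below only builds (and does not send) a full-text search request limited to one site and a date range, as one might do to look for official information on a past government website. The textsearch path and the q, siteSearch, from, to and maxItems parameters reflect one reading of the public API documentation and should be checked at https://arquivo.pt/api; the domain and query are placeholders.

```python
from urllib.parse import urlencode


def build_search_url(query, site=None, date_from=None, date_to=None, max_items=10):
    """Build a full-text search URL; endpoint and parameter names are assumptions."""
    params = {"q": query, "maxItems": max_items}
    if site:
        params["siteSearch"] = site  # restrict results to one website
    if date_from:
        params["from"] = date_from   # timestamps written as YYYYMMDDHHMMSS
    if date_to:
        params["to"] = date_to
    return "https://arquivo.pt/textsearch?" + urlencode(params)


# Placeholder query and domain, for illustration only.
search_url = build_search_url(
    "programa do governo",
    site="example.gov.pt",
    date_from="20050101000000",
    date_to="20111231235959",
)
print(search_url)
```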
Derived datasets available on the Open Data Portal
Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:
Colectiva de Artistas. 2008.04.19 a 2008.06.07. Galeria Quadrado Azul. Porto. Composition from a Webpage preserved on Arquivo.pt: www.quadradoazul.pt, 22nd October 2008.
On April 29, May 27 and July 1, from 3 to 4:30 pm, webinars geared to the community of artists, curators, gallerists and event producers will be held, open also to anyone interested in learning more about preserving art websites.
Throughout the sessions, participants will learn in detail about the functionalities of Arquivo.pt in order to take advantage of this public Web preservation service. They will have technical information, in the form of recommendations and best practices, to create preservable websites. Finally, they will learn how to use available tools to save their websites in a standardized format so that their contents are not lost.
This cycle of Webinars is an initiative of the “Forever” Project, a collaboration between the Calouste Gulbenkian Foundation Art Library and Arquivo.pt under the ROSSIO infrastructure.
At the end of 2020, we recommend some texts that put the future in perspective.
We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).
I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?
My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.
It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.
Memorial (high-quality preservation) and image search were highlighted as new developments in Arquivo.pt during Jornadas de Computação Científica 2019, held from 6 to 8 at the University of Azores in Ponta Delgada.
On the first day of this annual event, Arquivo.pt delivered a training session in four parts:
Memory of the web: a forgotten heritage?, by Daniel Gomes (in Portuguese);
Curation of institutional websites, by Ricardo Basílio (in Portuguese);
Automatic access and processing of preserved Web data (APIs), by Fernando Melo (in Portuguese);
Recommendations for web publication of preservable information, by Daniel Bicho (in Portuguese).
Participants learned about the web preservation service that Arquivo.pt offers the community for researching and safeguarding digital heritage, and about how they can help preserve the Web.
In addition to the Jornadas 2019, the Arquivo.pt team also gave two classroom presentations. The first was to students of the Informatics – Networks and Multimedia course at the University of Azores, and the second was at the Escola Secundária das Laranjeiras (high school) in Ponta Delgada.
To schedule a training session with Arquivo.pt, contact us.
The hands-on presentation showed how anyone, even a non-IT expert, can adequately capture, store and replay a website or a social media page of an institution. Basílio also gave specific examples of how to gather and share collections of institutional content previously published on the Web: a list, an exhibition, or the recovery of past content to be published on Twitter or Facebook, etc.
A librarian can be a curator of websites
Human, qualitative evaluation is the focus of the digital curator, even when using a tool as capable as Webrecorder. The most important point is to enable librarians to practice micro-archiving and create local collections.
Video (40 minutes, in Portuguese)
Presentation (PDF, in Portuguese)