Completing webpages from the past: it is possible!

Last updated on October 16th, 2023 at 06:59 pm

Some web-archived pages are reproduced incompletely due to problems that occurred during the archiving process (e.g. broken formatting or missing embedded images).

Complete page is an Arquivo.pt feature that recovers missing elements in web-archived pages from other web archives or from the original websites.

When viewing a page archived in Arquivo.pt, a user just needs to open the Options menu in the top right corner and choose Complete page.

This process is performed automatically.

How does Complete page work?

If you open a web-archived page that appears incomplete, try the Complete page option and wait.

Arquivo.pt will search for missing elements on the Internet and in other web archives using the Memento protocol. If it succeeds, the obtained elements will be immediately displayed on the web-archived page.

Later, these recovered elements are integrated into the Arquivo.pt collection, so that the web-archived page will appear more complete in future accesses by any user.
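As an illustration of the Memento protocol mentioned above, here is a minimal sketch of how a client could prepare a request for the copy of a missing element closest to a given date. It is not Arquivo.pt's implementation. The TimeGate address, the function name build_timegate_request and the example URL are assumptions made for this example; only the Accept-Datetime header comes from the Memento specification (RFC 7089).

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: a public Memento TimeGate aggregator. The endpoints that
# Arquivo.pt actually queries are not described in this article.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def build_timegate_request(original_url: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the URL and headers of a Memento TimeGate request (nothing is sent)."""
    # RFC 7089: the desired date travels in the Accept-Datetime header,
    # written as an HTTP date in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE + original_url, {"Accept-Datetime": http_date}


url, headers = build_timegate_request(
    "http://example.org/missing-image.jpg",  # placeholder, not a real page element
    datetime(2005, 6, 1, tzinfo=timezone.utc),
)
print(url)
print(headers)
```

The returned URL and headers can be passed to any HTTP client. A TimeGate normally answers with a redirect to the archived copy closest to the requested date.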


Completing the home page of artist Cristina Guerra’s website recovered a missing image.

For example, the website of artist Cristina Guerra archived in 2005 had a missing image. By using Complete page, it was possible in 2021 to obtain this missing image from another web archive which preserved it.

Participate in collaborative curation to improve the quality of Arquivo.pt!

Due to the high number of web-archived pages, it is not possible for Arquivo.pt to complete them all automatically. The collaboration of users in identifying important pages with missing elements and trying to complete them is therefore essential.

By using Complete page, users help improve the quality of the historical webpages preserved in Arquivo.pt!

Always try to complete web-archived pages that look incomplete. If you detect any problem, contact us.

Spread the word about the Arquivo.pt Complete page!

Virtual Museum of Tourism MUVITUR created a collection of preserved Websites


Last updated on February 26th, 2024 at 09:07 am

MUVITUR – Virtual Museum of Tourism is a portal that aggregates digital content about Tourism in Portugal.

The platform is maintained by the Celestino Domingues Library of the Estoril Higher Institute for Tourism and Hotel Studies (ESHTE), with the participation of institutions from various areas of heritage as content providers.

The digitized content that can be consulted in the catalog and accessed at the provider institutions included sound, image, photography and printed material, but websites were missing.

Thus, the idea for MUVITUR’s new “Web Pages” collection emerged.

Collaboration between MUVITUR and Arquivo.pt

In 2019, a collaboration between Arquivo.pt and MUVITUR began with the aim of identifying websites related to Tourism in Portugal and disseminating the history of content published on the Web since 1996.

In 2022, a list was established with about 400 records of websites of various entities related to tourism, hotels, travel agencies, pages of municipalities’ websites dedicated to tourism and others.

This database resulted in the first collection of preserved websites about Tourism in Portugal.

Collection of records in the MUVITUR catalog with webpages preserved at Arquivo.pt. 

How the integration was done

MUVITUR uses Nyron software, which allows content from different sources to be aggregated using the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interoperability protocol, which is very common among libraries, archives and museums to provide content to portals such as Europeana.

Arquivo.pt, however, does not make information available through OAI-PMH, so it was necessary to find an alternative way to create records in Nyron with descriptive information about the preserved sites.

The procedure for integration was as follows:

  • The XML schema with the fields for the metadata, according to what works in Nyron, was exported to an Excel sheet.
  • The information was entered manually, respecting the format and syntax, in collaboration with the computer technicians.
  • The XML file with the inserted data was validated and imported into Nyron.
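The steps above can be sketched as follows. This is a minimal illustration, not the procedure actually used with Nyron. It assumes the spreadsheet rows have already been read into dictionaries. The element and field names (records, record, denomination and so on) are hypothetical, since the real Nyron schema is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, loosely based on the catalog fields; the real
# Nyron XML schema is not described in this article.
REQUIRED_FIELDS = ("denomination", "organization", "original_url", "archived_url")


def rows_to_xml(rows: list[dict[str, str]]) -> str:
    """Turn spreadsheet-like rows into one XML document, rejecting incomplete rows."""
    root = ET.Element("records")
    for number, row in enumerate(rows, start=1):
        # Basic validation before import: every required field must be filled in.
        missing = [name for name in REQUIRED_FIELDS if not row.get(name, "").strip()]
        if missing:
            raise ValueError(f"row {number} is missing: {', '.join(missing)}")
        record = ET.SubElement(root, "record")
        for name in REQUIRED_FIELDS:
            ET.SubElement(record, name).text = row[name].strip()
    return ET.tostring(root, encoding="unicode")


rows = [{
    "denomination": "Turismo do Algarve",
    "organization": "Example organization",  # placeholder value
    "original_url": "http://example.org/",   # placeholder value
    "archived_url": "https://arquivo.pt/wayback/20110101000000/http://example.org/",  # placeholder
}]
xml_text = rows_to_xml(rows)
print(xml_text)
```

Checking each row before building the XML mirrors the validation step: an incomplete row stops the export instead of producing a record that the catalog would later reject.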

Creating records in catalogs is largely a manual task and requires human curation. However, it was possible to input information to be automatically processed in the records of the Website collection. For example, the thumbnail was obtained using the Arquivo.pt API, more specifically the linkToScreenShot field, visible in the technical details of a preserved page (see the options menu on the top right of a replayed page).

Other elements, such as the site’s title, could also be obtained automatically through the Arquivo.pt API; however, the quality of this information depends on what the site’s producers inserted and may not be accurate. The dates that limit the temporal scope can also be obtained automatically, but the manual method was chosen to control the information presented.

As the project continues, the collection will grow with new records, as there are thousands of websites about the Tourism sector.
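As a rough illustration, the sketch below pulls the thumbnail link, title and capture date out of one API result that has already been decoded from JSON. The sample item is made up. The key names title and tstamp and the 14-digit timestamp format are assumptions about the API response. The thumbnail key is written as in this article, so check its exact spelling against the API documentation.

```python
from datetime import datetime


def summarize_item(item: dict, screenshot_key: str = "linkToScreenShot") -> dict:
    """Extract catalog-relevant values from one (already decoded) API result item."""
    # Assumption: capture timestamps are 14-digit strings, YYYYMMDDHHMMSS.
    tstamp = str(item.get("tstamp", ""))
    captured = None
    if len(tstamp) == 14 and tstamp.isdigit():
        captured = datetime.strptime(tstamp, "%Y%m%d%H%M%S")
    # Titles come from the site's producers, so they may be empty or inaccurate.
    title = (item.get("title") or "").strip() or None
    return {"title": title, "thumbnail": item.get(screenshot_key), "captured": captured}


sample_item = {  # made-up values, for illustration only
    "title": " Turismo do Algarve ",
    "tstamp": "20110315120000",
    "linkToScreenShot": "https://example.org/screenshot",
}
summary = summarize_item(sample_item)
print(summary)
```

Returning None for a missing title or date makes it easy for a curator to spot which values still need to be filled in by hand.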

Description of Web contents in the MUVITUR catalog

In the collection “Paginas Web” the following data are used:

  • Denomination – usually the title of the website
  • Organization – the entity to which the publication belongs
  • Website address on the Internet
  • Address of the version in Arquivo.pt
  • Moment(s) to remember
  • Link to the thumbnail in Arquivo.pt
  • Descriptors
  • Geographical data (location, coordinates, geographical name)

The presentation of the information was adjusted to be aligned with that of other MUVITUR resources and contains links to Arquivo.pt.

For example, in the record of the Turismo do Algarve site, we find a link to a moment to remember in 2011 and another link to the history in Arquivo.pt under “Consultar objecto”.

Detail of the record for the “Turismo do Algarve” site

Organizations can create collections of Websites from their area

The National Library of Australia, for example, included records of preserved Websites in its catalog. In the Library of Congress there are collections of old Websites alongside traditional resources.

However, websites are rarely included in museums.

With this unprecedented project, we can say that preserved websites have earned their place on digital platforms dedicated to cultural heritage.

MUVITUR has paved the way with this project for other entities to create collections of websites of their interest on their own platforms.

Other results of the collaboration

Memory of events and festivals of art: PARA SEMPRE


Last updated on February 8th, 2022 at 10:57 am

The exhibition Memória de festivais e eventos de arte proposes a look at the Portuguese art scene present on the Web and includes a chronology of these events.

This online information product presents the results of the PARA SEMPRE project in a systematic and structured way.

Online exhibition – arteparasempre.wordpress.com

The project’s second online product will be a directory of references of artists, galleries and projects in the area of contemporary Portuguese art to be made available during 2022, at the Gulbenkian Art Library webpage.

Cycle of Webinars “Art forever on the web”

A cycle of webinars entitled “Art forever on the web” was held between April and July 2021, aimed at artists, curators, gallerists and event producers, among others.

The sessions averaged 58 participants, who rated their satisfaction at 4.6 on average on a scale from 1 to 5. The three sessions aimed to disseminate knowledge about the digital preservation of information on the web and the requirements for publishing preservable information.

Identification of artists, galleries and projects

The first step was to identify relevant artists, galleries and projects in the contemporary Portuguese art scene. We started from an initial set of 63 agents (artists, galleries and projects), to which 573 artists belonging to the Modern Collection of the Calouste Gulbenkian Foundation and the BAA – FCG Collection of Artist Books and Independent Publishing were added.

Throughout these months, 636 elements were thus identified (social networks and websites active in 2020), which were subsequently analysed.

The conclusions of the analysis carried out within the project were presented in the last webinar, held on July 1, 2021.

Special feature on art websites and blogs

In April 2021, Arquivo.pt made a special collection based on the initial identification of artists, galleries and projects and obtained 2.8 terabytes of preserved information.

New contents about art websites were recorded, using tools that allow higher quality collections, such as Brozzler and Webrecorder.

A collaborative project of digital curation

“PARA SEMPRE” (forever) is a digital curatorial project applied to the information made available on the web by the several agents of the contemporary Portuguese art scene (artists, galleries and hybrid sites).

Its main purpose is to contribute to the preservation/reuse of past and future pages, to ensure the preservation of the digital memory of current Portuguese art available at Arquivo.pt, and to promote knowledge on this theme by presenting it in a systematized and structured way.

Its creation results from the encounter of the missions of two organizations: one that aims to ensure the preservation of the Portuguese web, Arquivo.pt, and another that assumes itself as an agent in the development of knowledge about contemporary Portuguese art, the Calouste Gulbenkian Foundation Art Library. The project is part of ROSSIO (Research Infrastructure in the Social Sciences, Arts and Humanities).

Training in collaboration with the City Council of Lisboa

Last updated on December 13th, 2021 at 12:02 pm

A cycle of webinars was held between October and December 2021, organised by the Department of Development and Training of the Municipality of Lisbon, within the digital skills program Passaporte Competências Digitais of the Câmara Municipal de Lisboa, in collaboration with Centro Qualifica +ValorLx, the ROSSIO Infrastructure and Arquivo.pt (Fundação para a Ciência e a Tecnologia, I.P.).

The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.

The sessions were open by registration and had a total of 126 participants (average of 31 per session).

The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.

Sessions held

September 15 – Arquivo.pt. What is it? What is it for?

Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explained how any citizen can use it to consult web pages from the past in a wide variety of cases and talked about the importance of preserving digital memory.

November 11 – API Arquivo.pt: automatic access to preserved Web information

Vasco Rato, web developer at Arquivo.pt, presented the Arquivo.pt APIs (Application Programming Interfaces). These enable the development of innovative and useful applications for organizations through the automatic processing of historical information preserved from the Web.

November 25 – Archive the Web: do-it-yourself!

Ricardo Basílio, digital curator at Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record web pages in a standardized format on one's own computer, which allows a person or an organization to build their own small-scale web archive.

December 9 – Publish on the Web: best practices by Arquivo.pt

Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web content. How much content is in formats that make future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.

Know more about Arquivo.pt training

Arquivo.pt is open to collaborations aimed at training professionals in organizations, or ordinary citizens, in Web preservation.

Learn about the training modules and contact us.

 

Arquivo.pt certified as an open data provider


Last updated on August 17th, 2022 at 08:39 am

Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) with the aim of improving the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e a Tecnologia, I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

The Portuguese Law No. 68/2021 of 2021-08-26 approves the general principles on open data and transposes the European Directive.

Arquivo.pt was certified as a Public Administration open data provider

The AMA recognized Arquivo.pt as a public service and open data provider and awarded its certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
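To give an idea of what automatic exploration could look like, the sketch below only builds the address of a full-text search request and sends nothing. The endpoint path and the parameter names (q, siteSearch, from, to, maxItems) are assumptions to be checked against the documentation at https://arquivo.pt/api. The function name and the example values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: address of the full-text search endpoint; confirm it in the
# API documentation before relying on it.
BASE = "https://arquivo.pt/textsearch"


def build_search_url(query, site=None, start=None, end=None, max_items=10):
    """Build a search URL; start and end are timestamps (assumed YYYYMMDDHHMMSS)."""
    params = {"q": query, "maxItems": max_items}
    if site:
        params["siteSearch"] = site  # restrict results to one website
    if start:
        params["from"] = start       # lower bound of the capture date
    if end:
        params["to"] = end           # upper bound of the capture date
    return f"{BASE}?{urlencode(params)}"


search_url = build_search_url(
    "turismo",
    site="example.org",  # placeholder site
    start="19960101000000",
    end="20211231235959",
)
print(search_url)
print(parse_qs(urlsplit(search_url).query))
```

Building the URL separately from sending it keeps the example runnable offline. The result can then be fetched with any HTTP client and decoded as JSON.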

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Video presentation at the IIPC Web Archiving Conference 2022

“Art Forever on the Web”: Cycle of Webinars


Last updated on July 6th, 2021 at 01:23 pm


Colectiva de Artistas. 2008.04.19 a 2008.06.07. Galeria Quadrado Azul. Porto. Composition from a Webpage preserved on Arquivo.pt: www.quadradoazul.pt, 22nd October 2008.

On April 29, May 27 and July 1, from 3 to 4:30 pm, webinars geared to the community of artists, curators, gallerists and event producers will be held, open also to anyone interested in learning more about preserving art websites.

Throughout the sessions, participants will learn in detail about the functionalities of Arquivo.pt in order to take advantage of this public Web preservation service. They will receive technical information, in the form of recommendations and best practices, to create preservable websites. Finally, they will learn how to use available tools to save their websites in a standardized format so that their contents are not lost.

This cycle of Webinars is an initiative of the “Forever” Project, a collaboration between the Calouste Gulbenkian Foundation Art Library and Arquivo.pt under the ROSSIO infrastructure.

For more details and sharing, please see the program (PDF) (in Portuguese).

Sign up!

April 29 – The Arquivo.pt and the preservation of digital memory
May 27 – Recommendations for creating preservable websites for the future
July 1 – Archiving the Web: do-it-yourself!

Presentations from past sessions

Online archives or archives of the online?


At the end of 2020, we recommend some texts that put the future in perspective.

We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).

I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?

My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.

It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.

The complete text (in Portuguese) is available on pages 23 to 26 of the open-access book “Tendências 2021”.

The challenge is to cultivate awareness about the importance of preserving content online by learning how to do it in practice.

Happy New Year!

Arquivo.pt training in Azores islands


Last updated on July 15th, 2022 at 12:56 pm

Memorial (high-quality preservation) and image search were highlighted as new developments in Arquivo.pt during Jornadas de Computação Científica 2019, held from 6 to 8 at the University of Azores in Ponta Delgada.

On the first day of this annual event, Arquivo.pt delivered a training session in four parts:

  • Memory of the web: a forgotten heritage?, by Daniel Gomes (in Portuguese);
  • Curation of institutional websites, by Ricardo Basílio (in Portuguese);
  • Automatic access and processing of preserved Web data (APIs), by Fernando Melo (in Portuguese);
  • Recommendations for web publication of preservable information, by Daniel Bicho (in Portuguese).

Participants learned about the web preservation service that Arquivo.pt offers to the community for the purpose of researching and safeguarding digital heritage, and how they can help preserve the Web.

In addition to the Jornadas 2019, the Arquivo.pt team also gave two presentations in classroom settings. The first was to students of the Informatics – Networks and Multimedia course at the University of Azores, and the second at the Escola Secundária das Laranjeiras (high school) in Ponta Delgada.

To schedule a training session with Arquivo.pt, contact us.


 

Webinar Web Archiving in academic libraries


Last updated on July 15th, 2022 at 01:29 pm

Web archiving process

“Curation of preserved websites – how it works” was the subject of the webinar promoted by Associação Portuguesa de Bibliotecários, Arquivistas e Documentalistas (APBAD, Lisbon), the Portuguese association of librarians, this past October 9, and presented by Ricardo Basílio, librarian and digital curator at Arquivo.pt.

Gathering the online memory of the Universities

The hands-on presentation showed how anyone, even a non-IT expert, can adequately capture, store and replay a website or a social page of an institutional website. Basílio also gave specific examples of how to gather and share collections of institutional content previously published on the Web: a list, an exhibition, the recovery of past content to be published on Twitter or Facebook, etc.

A librarian can be a curator of websites

Human and qualitative evaluation is the focus of the digital curator, even when using a tool as capable as Webrecorder. The most important point is to enable librarians to practice micro-archiving and create local collections.

Video (40 minutes, in Portuguese)
Presentation (PDF, in Portuguese)