Arquivo.pt presentations at IIPC GA/WAC, RESAW 2023 and CLEOPATRA

Last updated on March 10th, 2024 at 05:23 pm

Meeting the Web Archive Community

The International Internet Preservation Consortium (IIPC), a consortium that brings together Web preservation initiatives from around the world, held its General Assembly with its members on May 10, 2023.

The IIPC Web Archiving Conference (IIPC WAC) followed on May 11 and 12. It is open to the community, so people and organizations that are not IIPC members but are interested in Web preservation can participate.

The two events were jointly hosted by KB – National Library of the Netherlands, and by Beeld & Geluid – Netherlands Institute for Sound & Vision.

Contributions from Arquivo.pt to the Web Archiving Conference

Arquivo.pt participated in the IIPC working group meetings (Training Working Group and Curators Working Group) and gave presentations in the thematic sessions Collaborations & Outreach and Program infrastructure (sessions 7 and 17).

  • Arquivo.pt updates 2023 (slides)
  • Linking web archiving with arts and humanities: the collaboration between ROSSIO and Arquivo.pt (video, slides)
  • Arquivo.pt behind the curtains (slides)

Meeting the RESAW research community

RESAW (Research Infrastructure for the Study of Archived Web Materials) is an initiative created in 2012 with the aim of promoting studies based on archived Web content, in areas such as Social Sciences, Digital Arts and Humanities.

The RESAW 2023 conference was held at the MUCEM Lab (Mediterranean Institute of Heritage Crafts) in Marseille on June 5-6, 2023, under the theme Exploring the Archived Web During a Highly Transformative Age.

Contributions from Arquivo.pt to RESAW 2023

Arquivo.pt gave presentations in the sessions Web Archive in Mediterranean area and its merge (4.A), From online Tools to Web Archive (6.B), Towards a participatory approach to collections (9.A) and Digging up the materials for writing web history (9.B).

  • How to research governmental web data? (abstract, slides)
  • Archiving Cryptocurrencies (abstract, slides)
  • Time to explore, time to learn from the archived web: Arquivo.pt training initiative (abstract, slides)
  • Exhibiting Web Memories from Arquivo.pt: a call for community participation (abstract, slides)

CLEOPATRA Project Meeting

The CLEOPATRA Project, led by the L3S Research Center at the Gottfried Wilhelm Leibniz University of Hannover, has run a training programme for doctoral researchers (Early Stage Researchers, PhD) since 2019.

Arquivo.pt has participated in three courses: Incentives design for hybrid multilingual information processing and analytics, in Southampton; National and transnational media coverage of European parliamentary elections, 2004-2014, in London; and NLP for under-resourced languages, in Zagreb, Croatia.

In 2022, Arquivo.pt hosted two researchers at its facilities, who used the archived resources and received dedicated support from the Arquivo.pt team to develop their research.

The CLEOPATRA Project ended in 2023 with a meeting on May 16 in Hannover, which brought together professors, researchers and representatives of the institutions involved.

Daniel Gomes, Arquivo.pt’s Manager, highlighted the new tools that Arquivo.pt makes available and the results of the work carried out by the researchers hosted at Arquivo.pt.

Virtual Museum of Tourism MUVITUR created a collection of preserved Websites

Last updated on February 26th, 2024 at 09:07 am

MUVITUR – Virtual Museum of Tourism is a portal that aggregates digital content about Tourism in Portugal.

The platform is maintained by the Celestino Domingues Library of the Estoril Higher Institute for Tourism and Hotel Studies (ESHTE), with heritage institutions from various areas taking part as content providers.

The digitized content that can be consulted in the catalog and accessed at the provider institutions included sound, images, photographs and printed material, but no websites.

Thus, the idea for MUVITUR’s new “Web Pages” collection emerged.

Collaboration between MUVITUR and Arquivo.pt

In 2019, Arquivo.pt and MUVITUR began a collaboration with the aim of identifying websites related to Tourism in Portugal and disseminating the history of content published on the Web since 1996.

In 2022, a list of about 400 website records was compiled, covering entities related to tourism such as hotels and travel agencies, the tourism pages of municipalities’ websites, and others.

This database resulted in the first collection of preserved websites about Tourism in Portugal.

Collection of records in the MUVITUR catalog with webpages preserved at Arquivo.pt. 

How the integration was done

MUVITUR uses the Nyron software, which aggregates content from different sources through OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). This interoperability protocol is widely used by libraries, archives and museums to provide content to portals such as Europeana.

Arquivo.pt, however, does not make information available through OAI-PMH, so an alternative way had to be found to create records in Nyron with descriptive information about the preserved sites.

The procedure for integration was as follows:

  • The XML schema with the fields for the metadata, according to what works in Nyron, was exported to an Excel sheet.
  • The information was entered manually, respecting the format and syntax, in collaboration with the computer technicians.
  • The XML file with the inserted data was validated and imported into Nyron.
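
The steps above can be sketched in code. This is only an illustration, assuming the spreadsheet is exported as CSV: the element and column names below are hypothetical, since the actual Nyron schema is not described here, and the sample values are placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column/element names; the real Nyron schema is not shown in this article.
FIELDS = ["denomination", "organization", "url", "archived_url", "thumbnail_url"]

def rows_to_xml(csv_text):
    """Convert spreadsheet rows (exported as CSV) into one XML document."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Basic validation: every field must be filled in before import.
        missing = [name for name in FIELDS if not (row.get(name) or "").strip()]
        if missing:
            raise ValueError(f"row {row!r} is missing fields: {missing}")
        record = ET.SubElement(root, "record")
        for name in FIELDS:
            ET.SubElement(record, name).text = row[name].strip()
    return ET.tostring(root, encoding="unicode")

# Placeholder data used only to exercise the conversion.
sample = (
    "denomination,organization,url,archived_url,thumbnail_url\n"
    "Example site,Example Org,http://example.org/,"
    "https://arquivo.pt/wayback/20110101000000/http://example.org/,"
    "https://example.org/thumb.png\n"
)
xml_text = rows_to_xml(sample)
print(xml_text)
```

Rejecting incomplete rows before building the XML mirrors the validation step in the procedure, so that errors surface before the file is imported.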

Creating records in catalogs is largely a manual task and requires human curation. However, some information in the records of the website collection could be filled in automatically. For example, the thumbnail was obtained using the Arquivo.pt API, more specifically the linkToScreenShot field, which is visible in the technical details of a preserved page (see the options menu on the top right of a replayed page).

Other elements, such as the site’s title, could also be obtained automatically through the Arquivo.pt API. However, the quality of this information depends on what the site’s producers inserted, and it may not be accurate. The dates that limit the temporal scope can also be obtained automatically, but they were entered manually to control the information presented.
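
As an illustration of this kind of automatic lookup, the sketch below builds a query URL and picks candidate catalog fields from a JSON response. The endpoint, the versionHistory and maxItems parameters, and the response field names (response_items, title, tstamp, linkToArchive, linkToScreenshot) are assumptions to be checked against the current Arquivo.pt API documentation. The sample response is made up and no request is sent.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint and parameter names; check them against the current API documentation.
API_ENDPOINT = "https://arquivo.pt/textsearch"

def version_history_query(site_url, max_items=1):
    """Build a query URL asking for preserved versions of one site."""
    params = {"versionHistory": site_url, "maxItems": max_items}
    return API_ENDPOINT + "?" + urlencode(params)

def catalogue_fields(response_text):
    """Pick candidate catalog fields from the first item of a JSON response."""
    items = json.loads(response_text).get("response_items", [])
    if not items:
        return None
    first = items[0]
    return {
        "title": first.get("title", ""),
        "archived_url": first.get("linkToArchive", ""),
        # The article spells this field linkToScreenShot; both spellings are tried.
        "thumbnail_url": first.get("linkToScreenshot") or first.get("linkToScreenShot", ""),
        "timestamp": first.get("tstamp", ""),
    }

# Made-up response used only to exercise the parsing.
sample_response = json.dumps({
    "response_items": [{
        "title": "Example tourism site",
        "tstamp": "20110101000000",
        "linkToArchive": "https://arquivo.pt/wayback/20110101000000/http://example.org/",
        "linkToScreenshot": "https://example.org/screenshot.png",
    }]
})

query_url = version_history_query("http://example.org/")
fields = catalogue_fields(sample_response)
print(query_url)
print(fields)
```

A curator would still review the returned title before using it, since its quality depends on what the site’s producers published.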

As the project continues, the collection will grow with new records, since there are thousands of websites about the Tourism sector.

Description of Web contents in the MUVITUR catalog

Records in the “Páginas Web” (Web Pages) collection use the following fields:

  • Denomination – usually the title of the website
  • Organization – the entity to which the publication belongs
  • Website address on the Internet
  • Address of the preserved version in Arquivo.pt
  • Moment(s) to remember
  • Link to the thumbnail in Arquivo.pt
  • Descriptors
  • Geographical data (location, coordinates, geographical name)

The presentation of the information was adjusted to be aligned with that of other MUVITUR resources and contains links to Arquivo.pt.

For example, in the record for the Turismo do Algarve site, we find a link to a moment to remember in 2011 and another link to the history in Arquivo.pt under “Consultar objecto” (View object).

Detail of the record for the Turismo do Algarve site

Organizations can create collections of Websites from their area

The National Library of Australia, for example, included records of preserved Websites in its catalog. In the Library of Congress there are collections of old Websites alongside traditional resources.

However, websites are rarely included in museums.

With this unprecedented project, we can say that preserved websites have earned their place on digital platforms dedicated to cultural heritage.

MUVITUR has paved the way with this project for other entities to create collections of websites of their interest on their own platforms.

Other results of the collaboration

CitationSaver preserves citations to web resources

Last updated on April 20th, 2023 at 09:37 pm

Documents cite web content by referencing their URLs so that readers can later access them.

In scientific articles, these citations matter even more for the integrity of the research, because they often reference information that is essential to reproduce an experiment or analysis.

For example, links in a scientific article may cite the datasets, software or web news that supported the research, which are not included in the text of the article.

To meet the need to preserve the integrity of documents, Arquivo.pt launched CitationSaver.

CitationSaver automatically extracts cited links in a document and preserves their content (e.g. web pages cited in a book) so that they can be retrieved later from Arquivo.pt.

Use CitationSaver to preserve the integrity of your documents

Upload a document and CitationSaver will extract the cited URLs, archive their content and make it available on Arquivo.pt shortly afterwards. There are three ways to submit a document:

  • insert the address (URL) of the PDF or TXT file, if it is published online
  • upload the file in PDF or TXT format
  • paste the text containing the addresses you want to preserve (e.g. References section of an article or Bibliography of a book).
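
To give an idea of what extracting cited URLs from pasted text involves, here is a minimal sketch. It is not CitationSaver’s actual implementation: the regular expression, the function name and the sample references are assumptions chosen for illustration.

```python
import re

# Simple pattern for http(s) URLs; it stops at whitespace, quotes and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")
TRAILING_PUNCTUATION = ".,;:!?"

def extract_cited_urls(text):
    """Return the distinct URLs found in the text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Sentence punctuation glued to the end of a URL is not part of it.
        url = match.rstrip(TRAILING_PUNCTUATION)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

# Placeholder reference list used only to exercise the extraction.
references = """
[1] Example dataset. https://example.org/data.csv.
[2] Example news (see http://example.com/news?id=7), accessed 2023.
[3] Example dataset again: https://example.org/data.csv
"""
cited = extract_cited_urls(references)
print(cited)
```

Removing duplicates keeps each cited address from being submitted for archiving more than once.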

More information

Project “Renascer” brings back old websites

Last updated on April 17th, 2023 at 06:32 pm

Organizations often keep the domains of websites that are no longer in use, either to prevent others from buying them or simply because the domains were forgotten.

The aim of project Renascer (Reborn) is to bring back historical websites whose content is no longer available online and whose domain continues to be held by their authors.

“Forgotten” domains can cause cybersecurity problems

In May 2023, the domain hmsportugal.pt of the Harvard Medical School-Portugal project referenced just one default web page hosted on an active server and the domain continued to be owned by its author.

In this situation, the original content of the website was inaccessible even though its author still owned the domain.

Furthermore, since the domain was still pointing to an active web server, cybersecurity issues could occur if this server was not being properly maintained.

The domain hmsportugal.pt could be reborn to reference the contents of this website preserved by Arquivo.pt.

How are websites Reborn?

The domain owner only has to redirect it to Arquivo.pt, through the Memorial service.

For example, the mctes.pt domain now points again to its original contents preserved by Arquivo.pt, so the website has been reborn.

Examples of Reborn domains

Project Renascer identified active domains managed by FCCN that were not referencing any content and gave them a new life by pointing them to their historical contents preserved by Arquivo.pt.

Contact Arquivo.pt to have the historical websites of your organization reborn.

See the following examples of Reborn websites:

15 years of Arquivo.pt celebrated at an event promoted by Wikimedia

Last updated on August 18th, 2023 at 03:29 pm

On November 8, 2007, the Portuguese Web Archive was officially created and later named Arquivo.pt.

To celebrate this date, Wikimedia Portugal and Arquivo.pt jointly organized an online event dedicated to the preservation of digital heritage.

Agenda

  • Introduction – André Barbosa, Wikimedia Portugal (Video)
  • 15 years of Arquivo.pt – Daniel Gomes, Arquivo.pt (Slides, Video)
  • Wikimedia at the University: Exploration and Projects at NOVA FCSH – Rute Correia, WMPT Residency at NOVA FCSH (Slides, Video)
  • GLAM Wiki: a general introduction – Giovanna Fontenelle, Wikimedia Foundation, Brazil (Slides, Video)
  • Demo of the open-access resources at Arquivo.pt – Daniel Gomes (Video)

More information

Afghanistan Websites and the fall of the regime in August 2021

Last updated on September 26th, 2022 at 03:57 pm

Afghanistan Ministry of Economy website with Karima Faryabi (recorded August 17, 2021)

On August 15, 2021, the Taliban took over the presidential palace in Kabul, completing the fall of the regime that had been in place for 20 years, since the aftermath of the 9/11 attacks on the United States.

The community of Web archivists, through the Content Development Working Group of the International Internet Preservation Consortium, was challenged to record Afghan websites, given the risk that they would disappear under the new regime.

No time to lose when it comes to preserving the Web

Arquivo.pt reacted quickly, launching an automatic content search focused on .af domain sites and on international media news about the ongoing events.

On August 17, the websites began to be recorded.

In total, 1800 website addresses from Afghanistan (ending in .af) and 500 news stories from media around the world were used.

These addresses (URLs), known as “seeds”, were obtained through automated searches using the Bing Search API and immediately queued for recording.
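
The sketch below illustrates how seeds could be derived from search results. It is not the pipeline Arquivo.pt used: it assumes a JSON response shaped like the Bing Web Search API’s webPages.value list, parses a made-up sample instead of sending a request, and keeps one site root per host. The function name and the hosts are hypothetical.

```python
import json
from urllib.parse import urlsplit

def seeds_from_search_response(response_text, tld=".af"):
    """Return one seed (site root) per distinct host under the given top-level domain."""
    data = json.loads(response_text)
    results = data.get("webPages", {}).get("value", [])
    seeds = []
    seen_hosts = set()
    for result in results:
        parts = urlsplit(result.get("url", ""))
        host = (parts.hostname or "").lower()
        # Keep only hosts under the target domain, once each.
        if host.endswith(tld) and host not in seen_hosts:
            seen_hosts.add(host)
            seeds.append(f"{parts.scheme}://{host}/")
    return seeds

# Made-up search response with hypothetical hosts.
sample_response = json.dumps({
    "webPages": {"value": [
        {"name": "News item", "url": "https://ministry.example.af/news/1"},
        {"name": "About", "url": "https://ministry.example.af/about"},
        {"name": "Foreign coverage", "url": "https://www.example.com/afghanistan"},
        {"name": "Health", "url": "http://health.example.af/"},
    ]}
})
seeds = seeds_from_search_response(sample_response)
print(seeds)
```

Reducing each result to its site root is a design choice for this example; news stories from international media would instead be kept as full page addresses.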

Content available to know Afghanistan’s history

As a result of the collection carried out, more than 400 Gigabytes of information became available at Arquivo.pt, which anyone can use for research in the most diverse areas.

The main contribution of Arquivo.pt to the community of Web archivists was its use of automatic search, which allows a quick reaction when recording Web content at imminent risk of being lost.

Learn more

Arquivo.pt open data set (Dados.gov)

Content collected by the Content Development Working Group of the International Internet Preservation Consortium, available at the Archive-It service

Cultural heritage on the Web: the online presence of museums

Last updated on August 2nd, 2024 at 12:16 pm

The Portuguese Museums Network was the community invited to participate in the cycle of three webinars entitled “Cultural Heritage on the Web: online presence of museums”.

The aim is to raise awareness among museum managers and professionals of the importance of preserving content published on the Web, and to present the services and tools of Arquivo.pt.

This initiative is promoted by the Direção Geral do Património Cultural, through the Departamento de Museus, Conservação e Credenciação and the Divisão de Museus e Credenciação, which welcomed the proposal of Arquivo.pt (FCT, I.P.) and integrated it into its training offer.

Information and materials

June 21, 2022 – Arquivo.pt and the preservation of digital memory (1st Webinar)

In this session Arquivo.pt is presented as a useful service to museums and institutions that the community can count on to preserve digital cultural heritage, specifically Web content.

  • Speaker: Ricardo Basílio, digital curator (standing in for Daniel Gomes, manager of Arquivo.pt)
  • Time: 15:30 – 17:00
  • Slides (PDF)
  • Video

June 22, 2022 – Publishing Well to Preserve Well (2nd Webinar)

This session deals with the aspects that an institution must take into account to create and maintain preservable websites.

  • Speaker: Pedro Gomes, responsible for the Arquivo.pt collections
  • Time: 15:30 – 17:00
  • Slides
  • Video

June 27, 2022 – Archiving the Web: DIY (3rd Webinar)

This session offers a tutorial for creating a local web archive, recording contents in a standard format with open tools that anyone can use.

  • Speaker: Ricardo Basílio, digital curator
  • Time: 15:30 – 17:00
  • Video
  • Slides

June 28, 2022 – Repeat of the first session (extra session)

Open session for those who were not able to participate in the 1st session.

  • Speaker: Ricardo Basílio, digital curator
  • Time: 15:30 – 17:00
  • Video
  • Slides

Online exhibition: discover museums’ online presence over time

Municipality of Sines and Arquivo.pt together on the International Archives Day

Last updated on June 27th, 2022 at 08:40 am

The Municipal Archive of the Municipality of Sines and Arquivo.pt celebrated the International Archives Day, June 9, at the Salão Nobre dos Paços do Concelho, with a Workshop on preserving the digital memory of Sines (Portugal).

The meeting was broadcast online to share this experience of collaborative curation of Web content with the community of archivists.

Collaboration between a municipal archive and a web archive

This meeting continued a collaboration between the two teams that developed during the pandemic.

The Arquivo Municipal de Sines made a selective and systematic collection of Web content related to the Municipality of Sines, with the collaboration of local media, such as Rádio Miróbriga and Rádio Sines.

In turn, Arquivo.pt provided training on tools such as Webrecorder.net, which records content in a standardized format, and prepared useful services such as SavePageNow, which allows pages to be recorded on the fly directly on Arquivo.pt.

Local history is better with preserved Web pages

This collaboration resulted in the preservation of thousands of Web pages (about 200 gigabytes of information) about the experience of the pandemic in the geographical area of Sines and Santiago do Cacém.

The copies of the Web ARChive (WARC) files sent to Arquivo.pt have been integrated into its collections so that they become available.

Presentations

Arquivo404 presents web-archived pages instead of “pages not found”

Last updated on November 14th, 2023 at 02:46 pm

Does your website present “Error 404 – Page not found” messages to your users?

Arquivo.pt offers a solution for this problem through Arquivo404.

Just insert a single line of code in the page that generates the 404 error message on your website and web-archived pages will be presented to your users instead of pages not found.

See these examples on websites that installed arquivo404.

How does Arquivo404 work?

When a page is no longer on a website, Arquivo404 checks if a preserved version exists.

When a user tries to access a page that is no longer available on a website, Arquivo404 automatically checks if there is a version of that page preserved in Arquivo.pt.

If the page exists in Arquivo.pt, a link is presented so that the user may visit this version. If it does not exist, the normal error page is displayed.
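
The decision described above can be sketched as follows. This is not the arquivo404 script itself, which runs as JavaScript in the visitor’s browser: the function name is hypothetical, the list of capture timestamps stands in for whatever lookup the service performs, and both the replay URL pattern and the choice of the most recent capture are assumptions.

```python
# Assumed replay URL pattern: <prefix>/<14-digit timestamp>/<original URL>.
REPLAY_PREFIX = "https://arquivo.pt/wayback"

def not_found_message(requested_url, capture_timestamps):
    """Return the text for a 404 page, with an archive link when one exists."""
    if not capture_timestamps:
        # No preserved version: fall back to the normal error message.
        return "Page not found."
    # Fixed-width timestamps sort correctly as strings; picking the latest is an assumption.
    latest = max(capture_timestamps)
    link = f"{REPLAY_PREFIX}/{latest}/{requested_url}"
    return f"Page not found. A preserved version is available at {link}"

with_archive = not_found_message(
    "http://example.org/old-page", ["20150101120000", "20190630080000"]
)
without_archive = not_found_message("http://example.org/never-archived", [])
print(with_archive)
print(without_archive)
```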

See Arquivo404 at work in this example of an error page that presents a link automatically generated by Arquivo404.

How to install arquivo404 on your website?

The simplest way to install arquivo404 is to insert the following JavaScript in the HTML code that generates the “Page not found” message:

<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.call();"></script>

The Arquivo404 code can easily be adapted. For example, you can create a customised error message.

Hint for WordPress websites: When editing the 404 error page and inserting the arquivo404 script inside the <body>, you must put the <!-- wp:html --> tag at the beginning and the <!-- /wp:html --> tag at the end, otherwise the script will be deleted.

If you have any questions or issues, please contact us!

Learn more

Short link to this page: arquivo.pt/arquivo404en

How to preserve Web references from Wikipedia?

Wikimedia Portugal and Arquivo.pt

Last updated on May 19th, 2022 at 07:05 pm

Wikimedia Portugal has started a collaboration with Arquivo.pt that aims to draw the community’s attention to the preservation of content published on Wikipedia.

Eighty percent of the pages published on the Web disappear or change within just one year of their publication. At the same time, Wikipedia relies mostly on information published on the Web. The disappearance of this reference information undermines the reliability of Wikipedia articles.

Webinar cycle “Cultural Heritage on the Web: how to preserve references in Wikipedia?”

The cycle of Webinars, promoted by Wikimedia Portugal, includes educational content that enriches the training of information and communication professionals but also the digital literacy of any citizen.

Arquivo.pt and the preservation of digital memory (1st Webinar)

Gonçalo Themudo, President of Wikimedia Portugal, introduced the 1st webinar of the cycle entitled “Cultural Heritage on the Web: how to preserve references in Wikipedia?”. He stressed the importance of preserving the references (URLs) used by authors when publishing articles in Wikipedia. Daniel Gomes, Manager of Arquivo.pt, showed how Arquivo.pt preserves Web contents and how the community of Wikipedia authors can contribute to the effective preservation of those contents.

  • Held on February 22, 2022
  • Speaker: Daniel Gomes, Arquivo.pt
  • Slides
  • Video

Automatic access and processing of preserved information from the Web through APIs (2nd Webinar)

This webinar presents Arquivo.pt’s APIs (Application Programming Interfaces), which enable the automatic processing of historical information preserved from the Web in order to develop innovative and useful applications for organizations. It is mainly intended for IT professionals (e.g. Web developers, Web designers, Web marketers).

  • Date: 22 Mar. 2022 15:00 – 16:30
  • Speaker: Vasco Rato, Arquivo.pt
  • Slides
  • Video

Web archiving: do it yourself! (3rd Webinar)

This webinar shows how to preserve cultural information of a municipal and national nature published on the Web. Through practical cases, it demonstrates how anyone can use free tools to archive information published on the Web in a proper format that allows its preservation for the future. It is intended for any Internet user but is particularly useful for those responsible for communication and information management in organisations.

  • Date: 19 Apr. 2022 15:00 – 16:30
  • Speaker: Daniel Gomes, Arquivo.pt
  • Slides
  • Video