15 years of Arquivo.pt celebrated in a meeting promoted by Wikimedia


Last updated on November 14th, 2022 at 04:25 pm

On November 8, 2007, the Portuguese Web Archive was officially created and later named Arquivo.pt.

To celebrate this date, Wikimedia Portugal and Arquivo.pt joined forces to organise an online event dedicated to the preservation of digital heritage.

Agenda

  • Introduction – André Barbosa, Wikimedia Portugal (Video)
  • 15 years of Arquivo.pt – Daniel Gomes, Arquivo.pt (Slides; Video)
  • Wikimedia at the University: Exploration and Projects at NOVA FCSH – Rute Correia, WMPT Residency at NOVA FCSH (Slides; Video)
  • GLAM Wiki: a general introduction – Giovanna Fontenelle, Wikimedia Foundation, Brazil (Slides; Video)
  • Demo of the open-access resources at Arquivo.pt – Daniel Gomes (Video)

More information


Afghanistan Websites and the fall of the regime in August 2021


Last updated on September 26th, 2022 at 03:57 pm


Afghanistan Ministry of Economy website with Karima Faryabi (recorded August 17, 2021)

On August 15, 2021, the Taliban took over the presidential palace in Kabul, completing the fall of the regime that had been in place for 20 years, since the aftermath of the 9/11 attacks on the United States.

The community of Web archivists, through the Content Development Working Group of the International Internet Preservation Consortium, was challenged to record Afghan websites, given the risk that they would disappear under the new regime.

No time to lose when it comes to preserving the Web

Arquivo.pt reacted quickly, launching an automatic content search focused on .af domain sites and on international media news about the ongoing events.

On August 17, the websites began to be recorded.

The recording started from 1800 website addresses from Afghanistan (ending in .af) and 500 news stories from media around the world.

These addresses (URLs, or “seeds”) were obtained through automated searches using the Bing Search API and immediately queued for recording.
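As an illustration only, the step of turning search results into a seed list for the .af sites could look like the sketch below. It assumes the results arrive as a plain list of URL strings; the function name `select_seeds` and the de-duplication rule are hypothetical and are not the code Arquivo.pt used. The call to the search API itself is left out.

```python
from urllib.parse import urlsplit


def select_seeds(candidates, tld=".af"):
    """Keep one copy of each URL whose host name ends in the given TLD."""
    seen = set()
    seeds = []
    for raw in candidates:
        url = raw.strip()
        parts = urlsplit(url)
        host = parts.hostname  # None when the string has no scheme://host part
        if host is None or not host.endswith(tld):
            continue
        # Treat URLs that differ only by host case or a trailing slash as one seed.
        key = (parts.scheme, host, parts.path.rstrip("/"), parts.query)
        if key not in seen:
            seen.add(key)
            seeds.append(url)
    return seeds
```

Calling `select_seeds(search_results)` on a list of result URLs returns the distinct .af addresses in the order they were found, ready to be handed to a crawler.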

Content available to know Afghanistan’s history

As a result of the collection carried out, more than 400 Gigabytes of information became available at Arquivo.pt, which anyone can use for research in the most diverse areas.

The main contribution of Arquivo.pt to the community of Web archivists was the use of automatic search, which allows a quick reaction when recording Web content at imminent risk of being lost.

Know more

Arquivo.pt open data set (Dados.gov)

Content collected by the Content Development Working Group of the International Internet Preservation Consortium available at the Archive-it service

Cultural heritage on the Web: the online presence of museums

Last updated on July 7th, 2022 at 09:26 pm

The Portuguese Museums Network was the community invited to participate in the cycle of three webinars entitled “Cultural Heritage on the Web: online presence of museums”.

The aim is to raise awareness among museum managers and professionals about the importance of preserving content published on the Web and to make known the services and tools of Arquivo.pt.

This initiative is promoted by the Direção Geral do Património Cultural, through the Departamento de Museus, Conservação e Credenciação and the Divisão de Museus e Credenciação, which welcomed the proposal of Arquivo.pt (FCT, I.P.) and integrated it into its training offer.

Information and materials

June 21, 2022 – Arquivo.pt and the preservation of digital memory (1st Webinar)

In this session Arquivo.pt is presented as a useful service to museums and institutions that the community can count on to preserve digital cultural heritage, specifically Web content.

  • Speaker: Ricardo Basílio, digital curator (standing in for Daniel Gomes, manager of Arquivo.pt)
  • Time: 15:30 – 17:00
  • Slides (PDF)
  • Video

June 22, 2022 – Publishing Well to Preserve Well (2nd Webinar)

This session deals with the aspects that an institution must take into account to create and maintain preservable websites.

  • Speaker: Pedro Gomes, responsible for the Arquivo.pt collections
  • Time: 15:30 – 17:00
  • Slides
  • Video

June 27, 2022 – Archiving the Web: DIY (3rd Webinar)

This session offers a tutorial on creating a local web archive, recording content in a standard format with open tools that anyone can use.

  • Speaker: Ricardo Basílio, digital curator
  • Time: 15:30 – 17:00
  • Video
  • Slides

June 28, 2022 – Repeat of the first session (extra session)

Open session for those who were not able to participate in the 1st session.

  • Speaker: Ricardo Basílio, digital curator
  • Time: 15:30 – 17:00
  • Video
  • Slides

Online exhibition: discover museums’ online presence over time

 

Municipality of Sines and Arquivo.pt together on the International Archives Day


Last updated on June 27th, 2022 at 08:40 am

The Municipal Archive of the Municipality of Sines and Arquivo.pt celebrated the International Archives Day, June 9, at the Salão Nobre dos Paços do Concelho, with a Workshop on preserving the digital memory of Sines (Portugal).

The meeting was broadcast online with the aim of sharing with the community of archivists what has been an experience of collaborative curation of Web content.

Collaboration between a municipal archive and a web archive

This meeting continued a collaboration that the two teams developed during the pandemic.

The Arquivo Municipal de Sines made a selective and systematic collection of Web content related to the Municipality of Sines, with the collaboration of local media, such as Rádio Miróbriga and Rádio Sines.

In turn, Arquivo.pt provided training on tools such as Webrecorder.net, which records content in a standardized format, and prepared useful services such as SavePageNow, which allows pages to be recorded on the fly directly on Arquivo.pt.

Local history is better with preserved Web pages

This collaboration resulted in the preservation of thousands of Web pages (about 200 gigabytes of information) about the experience of the pandemic in the geographical area of Sines and Santiago do Cacém.

The copies of the Web ARChive (WARC) files sent to Arquivo.pt have been integrated so that they become available.

Presentations

Fix “page not found” errors on your website


Last updated on January 5th, 2023 at 11:14 am

Does your website present “Error 404 – Page not found” messages to your users?

Arquivo.pt offers a solution for this problem through arquivo404.

Just insert a single line of code in the page that generates the 404 error message on your website.

How Arquivo404 works


When a page is no longer on a website, Arquivo404 checks if a preserved version exists.

When a user tries to access a page that is no longer available on a website, arquivo404 automatically checks if there is a version of that page preserved in Arquivo.pt.

If the page exists in Arquivo.pt, a link is presented so that the user may visit this version. If it does not exist, the normal error page is displayed.
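The decision described above can be sketched in Python as follows. This illustrates the logic only and is not arquivo404's actual JavaScript: the function names are hypothetical, the list of capture timestamps is assumed to come from a lookup in Arquivo.pt that is not shown, and linking to the most recent capture is an assumption.

```python
from typing import Optional, Sequence

# URL pattern of archived pages on Arquivo.pt:
# https://arquivo.pt/wayback/<14-digit timestamp>/<original URL>
WAYBACK_PREFIX = "https://arquivo.pt/wayback"


def archived_link(url: str, timestamp: str) -> str:
    """Build the address of one preserved version of a page."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


def link_for_missing_page(url: str, capture_timestamps: Sequence[str]) -> Optional[str]:
    """Return a link to a preserved version, or None to show the normal error page."""
    if not capture_timestamps:
        return None
    # 14-digit timestamps sort chronologically as plain strings.
    latest = max(capture_timestamps)
    return archived_link(url, latest)
```

A 404 handler would call `link_for_missing_page` with the requested address and show the returned link in the error message when the result is not `None`.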

See Arquivo404 at work in this example of an error page that presents a link automatically generated by arquivo404.

How to install arquivo404 on your website?

The simplest way to install arquivo404 is to insert the following JavaScript into the HTML code that generates the “Page not found” message:

<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.call();"></script>

The code in arquivo404 can easily be adapted. You can for example create a customised error message.

If you have any questions or issues, please contact us!

To know more

Short link to this page: arquivo.pt/arquivo404en

How to preserve Web references from Wikipedia?


Last updated on May 19th, 2022 at 07:05 pm

Wikimedia Portugal has started a collaboration with Arquivo.pt that aims at raising the community’s attention to the preservation of contents published on Wikipedia.

Eighty percent of the pages published on the Web disappear or are changed within one year of their publication. At the same time, Wikipedia relies mostly on information published on the Web. The disappearance of these references undermines the reliability of Wikipedia articles.

Webinar cycle “Cultural Heritage on the Web: how to preserve references in Wikipedia?”

The cycle of Webinars, promoted by Wikimedia Portugal, includes educational content that enriches the training of information and communication professionals but also the digital literacy of any citizen.

Arquivo.pt and the preservation of digital memory (1st Webinar)

Gonçalo Themudo, President of Wikimedia Portugal, introduced the 1st webinar of the cycle, entitled “Cultural heritage on the Web: how to preserve references in Wikipedia?”. He stressed the importance of preserving the references (URLs) used by authors when publishing articles in Wikipedia. Daniel Gomes, Manager of Arquivo.pt, showed how Arquivo.pt preserves Web content and how the community of Wikipedia authors can contribute to the effective preservation of that content.

  • Held on February 22, 2022
  • Speaker: Daniel Gomes, Arquivo.pt
  • Slides
  • Video

Automatic access and processing of preserved information from the Web through APIs (2nd Webinar)

This webinar presents Arquivo.pt’s APIs (Application Programming Interfaces), which enable the automatic processing of historical information preserved from the Web in order to develop innovative and useful applications for organizations. It is mainly intended for IT professionals (e.g. Web developers, Web designers, Web marketers).

  • Date: 22 Mar. 2022 15:00 – 16:30
  • Speaker: Vasco Rato, Arquivo.pt
  • Slides
  • Video

Web archiving: do it yourself! (3rd Webinar)

Webinar that presents how to preserve cultural information of a municipal and national nature published on the Web. It demonstrates through practical cases how anyone can archive information published on the web in a proper format that will allow its preservation for the future using free tools. This Webinar is intended for any Internet user but is particularly useful for those responsible for communication and information management in organisations.

  • Date: 19 Apr. 2022 15:00 – 16:30
  • Speaker: Daniel Gomes, Arquivo.pt
  • Slides
  • Video

Training in collaboration with the City Council of Lisbon


Last updated on December 13th, 2021 at 12:02 pm


A cycle of webinars was held between October and December 2021, organised by the Department of Development and Training of the Municipality of Lisbon within the digital skills programme Passaporte Competências Digitais of the Câmara Municipal de Lisboa, in collaboration with Centro Qualifica +ValorLx, Infraestrutura ROSSIO and Arquivo.pt (Fundação para a Ciência e a Tecnologia, I.P.).

The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.

The sessions were open by registration and had a total of 126 participants (average of 31 per session).

The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.

Sessions held

September 15 – Arquivo.pt. What is it? What is it for?

Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explained how any citizen can use it to consult Web pages from the past in a wide range of situations and talked about the importance of preserving digital memory.

November 11 – API Arquivo.pt: automatic access to preserved Web information

Vasco Rato, web developer at Arquivo.pt, presented Arquivo.pt’s APIs (Application Programming Interfaces). These enable the development of innovative and useful applications for organizations through the automatic processing of historical information preserved from the Web.

November 25 – Archive the Web: do-it-yourself!

Ricardo Basílio, digital curator at Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record Web pages in a standardized format on one's own computer, which allows a person or an organization to set up their own small-scale Web archive.

December 9 – Publish on the Web: best practices by Arquivo.pt

Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web content. How much content is in formats that make future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.

Know more about Arquivo.pt training

Arquivo.pt is open to collaborations aiming at training professionals in organizations or common citizens on Web preservation.

Learn about the training modules and contact us.

 

H2020 projects preserved by Arquivo.pt


Last updated on December 6th, 2021 at 05:03 pm

The main objective of Arquivo.pt is to preserve online information for research and education purposes.

Previously, Arquivo.pt identified and preserved Research & Development project websites funded by the European Union during the FP4, FP5, FP6 and FP7 programmes (1994-2013).

Now, Arquivo.pt has contributed to preserving online information that documents R&D projects funded by the Horizon 2020 programme (2014-2021). It preserved 197 million web files (17 TB) related to science for future access.

H2020 projects publish valuable information online but it is being lost

Websites about Research and Development (R&D) projects are increasingly being used to publish and disseminate important scientific information that complements published literature (e.g. data sets, documentation or software).

However, after projects end, the corresponding websites usually disappear, causing a permanent loss of unique and valuable scientific information.

Arquivo.pt automatically identified URLs that document H2020 Research and Development projects

The European Union’s Open Data Portal published a data set from the Community Research and Development Information Service (CORDIS) that documents H2020 research projects. However, of the 31 129 projects listed, only 46% included a project URL.

Arquivo.pt developed a low-cost methodology that automatically identifies URLs related to R&D projects so that they can be systematically preserved. This automatic identification is achieved by combining open data sets with web search services. The methodology is detailed in a scientific article presented at the International Conference on Digital Preservation 2016.

In sum, we extracted 106 300 unique URLs from the following open data sets:

Then, we extracted the acronym and title of the projects from the data sets and automatically searched the web for additional URLs using the Bing Search API.
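A rough sketch of how search queries might be derived from a project's acronym and title is shown below. The exact query formulation used by Arquivo.pt is not given here, so the three query shapes and the function name `project_queries` are assumptions for illustration; sending the queries to a search service is left out.

```python
def project_queries(acronym, title):
    """Build candidate web search queries for one project (hypothetical shapes)."""
    acronym = acronym.strip()
    title = " ".join(title.split())  # collapse stray whitespace from the data set
    queries = []
    if acronym and title:
        queries.append(f'"{acronym}" "{title}"')
    if title:
        queries.append(f'"{title}"')
    if acronym:
        queries.append(f'"{acronym}" H2020')
    return queries
```

Each query would be submitted to the search service and the returned URLs added to the list of candidate project addresses.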

All the data sets and tools developed have been made publicly available in open access so that they can be reused and collaboratively enhanced. In particular, you can access the software developed to automatically identify additional URLs about H2020 projects.

197 million web files related to science were preserved

Arquivo.pt identified and preserved 197 million web files (17 TB) that document R&D projects funded by Horizon 2020.

In 2021, we can already witness project websites that are no longer available online, such as the Extended Model of Organic Semiconductors (EXTMOS) project (http://extmos.eu/). However, it was preserved and can be accessed at Arquivo.pt:

Archived version at Arquivo.pt (https://arquivo.pt/wayback/20170427182603/http://extmos.eu/) of the home page of the EXTMOS Research and Development project (http://extmos.eu/) funded by H2020.
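The address of an archived version, such as https://arquivo.pt/wayback/20170427182603/http://extmos.eu/, carries both the capture time and the original URL. The sketch below splits the two apart; it assumes that the 14 digits follow the YYYYMMDDhhmmss layout, and the function name is hypothetical.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://arquivo.pt/wayback/"


def parse_archived_url(archived_url):
    """Split an Arquivo.pt archived URL into (capture time, original URL)."""
    if not archived_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not an Arquivo.pt wayback URL")
    rest = archived_url[len(WAYBACK_PREFIX):]
    timestamp, _, original = rest.partition("/")
    # Assumed timestamp layout: YYYYMMDDhhmmss
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured_at, original
```

For the EXTMOS example, the function returns a capture time of April 27, 2017 together with the original address http://extmos.eu/.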

Contributions to complement the European Open Data Sets

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested in preserving this digital heritage:

To learn more about this collection, watch the video Preservation of web content related to Horizon 2020.

References

Are you a researcher?

Arquivo.pt certified as an open data provider


Last updated on August 17th, 2022 at 08:39 am

Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) with the aim of improving the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e a Tecnologia, I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

The Portuguese Law No. 68/2021 of 2021-08-26 approves the general principles on open data and transposes the European Directive.

Arquivo.pt was certified as a Public Administration open data provider

The AMA recognized Arquivo.pt as a public service and open data provider and awarded its certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
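A minimal sketch of automatic access through the API is given below. The endpoint `https://arquivo.pt/textsearch`, the parameter names (`q`, `from`, `to`, `maxItems`) and the `response_items` field are assumptions that should be checked against the documentation at https://arquivo.pt/api before use; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed full-text search endpoint; confirm at https://arquivo.pt/api
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, date_from=None, date_to=None, max_items=10):
    """Build a search URL; the parameter names are assumptions."""
    params = {"q": query, "maxItems": max_items}
    if date_from:
        params["from"] = date_from  # e.g. "20100101000000"
    if date_to:
        params["to"] = date_to
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def search(query, **kwargs):
    """Run the search over the network and return the list of result items."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as response:
        payload = json.load(response)
    # "response_items" is the assumed name of the result list in the JSON reply.
    return payload.get("response_items", [])
```

For example, `search("Portal do Governo", date_from="20100101000000", max_items=5)` would request up to five preserved pages matching the query that were captured from 2010 onwards.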

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Video presentation at the IIPC Web Archiving Conference 2022

Internet Memory Foundation collection available in Arquivo.pt


Last updated on September 15th, 2021 at 09:29 am

The historical collection of web content generated during the Internet Memory Foundation’s (IMF) activity has been donated to Arquivo.pt and is now searchable!

The IMF, a European organization dedicated to preserving web content, was wound up in 2018.

The 1st web archiving project in Europe (2004-2010) was led by Julien Masanès (who was guest of honour at the celebration of 10 years of Arquivo.pt) and was called the European Archive Foundation.

In 2010, Julien Masanès, the “father” of Web archives in Europe, created the IMF.

Examples of pages from the collection donated by the IMF

The collection donated by the IMF has now been integrated in the Arquivo.pt collection to be preserved for posterity.

This collection is composed of 142 million files that total 6.3 TB of historical information whose texts or images can now be searched through Arquivo.pt.


Life Science Competence in Europe portal, 2009.


LIMES project homepage (Land and Sea Monitoring for Environment and Security), 2009.


Project Intelligence-territoriale homepage, 2009.

European Parliament news page on the 20th anniversary of the fall of the Berlin Wall, 2009.

Le Figaro on the French presidential election, 2012.

Reuters news story about WikiLeaks, 2011.


Internet Memory Foundation homepage, 2014.

Search this new collection!

This new collection has been named “InternetMemory” in the Arquivo.pt collections list.

Searches can be made on this collection using the collection search parameter or through the custom search page available at arquivo.pt/InternetMemory.
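As a sketch, restricting a search to this collection could be done by adding the collection name to the query string. The `textsearch` endpoint and the `q` and `collection` parameter names used here are assumptions to be checked against the Arquivo.pt API documentation; only the collection name “InternetMemory” comes from the text above.

```python
from urllib.parse import urlencode


def collection_search_url(query, collection="InternetMemory"):
    """Build a search URL limited to one collection (assumed parameter names)."""
    params = {"q": query, "collection": collection}
    return "https://arquivo.pt/textsearch?" + urlencode(params)
```

Opening the URL returned by `collection_search_url("WikiLeaks")` would then list only results preserved in the InternetMemory collection.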
