Exhibition of old websites to mark International Museum Day

Heritales, Crowd-Recycling and Arquivo.pt on International Museum Day

May 18, International Museum Day, was celebrated all over the country with free admission, guided tours, entertainment and exhibitions related to memory and heritage.

Arquivo.pt contributed with an exhibition of old web pages, entitled “Digital Memory through the Internet of the Past”, which was on display at one of the stands at the National Coach Museum in Lisbon.

The pages were selected to show different aspects of the Alentejo over time. Pages related to the Heritales project, from 2016 onwards, were also included.

Heritales and Crowd-Recycling drew attention to the preservation of the Internet’s memory

Heritales is a project based in Évora that aims to study and disseminate heritage in all its manifestations. It is known for its main event created in 2016, HERITALES – International Heritage Film Festival.

Crowd-Recycling is a project focused on good practices for sustainability.

Heritales, Crowd-Recycling and Arquivo.pt collaborated on this initiative to give visibility to content published on the web over time. Preserving digital content and providing access to it is fundamental to enhancing heritage.

Why an exhibition of old websites is a good idea

Making an exhibition of websites over time is relatively easy: all you have to do is choose a theme, which can also be the history of an institution, and select pages preserved in Arquivo.pt.

An exhibition of old websites offers its audience something original: it often features texts and images that only ever existed on the web.

Drawing attention to old websites makes us realize how much was left unrecorded, and this changes how we see the content we publish today. We start taking more care to save important pages, for example by saving them on the spot with SavePageNow.

Heritales, Crowd-Recycling and Arquivo.pt on International Museum Day at the National Coach Museum

World Internet Day was on May 17th

The day before International Museum Day was World Internet Day (May 17). The proximity of the two commemorations ties in with the theme of preserving memory.

Portugal connected to the Internet for the first time in 1991, with the FCCN project “RCCN IP Service”.

To remember how it all happened, here are the three suggestions that FCCN published on social media for this day:

Arquivo.pt applied for the DPC Awards 2024


Last updated on May 20th, 2024 at 04:46 pm

The Digital Preservation Coalition Awards

The Digital Preservation Coalition (DPC) is dedicated to promoting digital preservation and associated best practices.

The DPC Awards promote exemplary and innovative digital preservation use cases from all over the world.

The Arquivo.pt team submitted two applications to the DPC Awards 2024 in the categories of “Safeguarding the Digital Legacy” and “Research and Innovation”.

The Award for Safeguarding the Digital Legacy celebrates the practical application of preservation tools to protect at-risk digital objects.

The Award for Research and Innovation recognizes excellence in practical research and innovation activities.

Arquivo.pt applications to the DPC Awards

#1 Arquivo.pt catalog of tools for digital preservation

Much of the information that rules modern-day life is born digital and disseminated online. However, invaluable digital objects published online are continuously being lost.

Arquivo.pt is a public infrastructure which supports the preservation of digital objects published online to safeguard this digital legacy for future generations.

In October 2023, after 15 years of research and development, Arquivo.pt released a Catalog of 13 innovative tools to support the preservation of at-risk online content, from acquisition to dissemination (e.g. search and access, APIs, training, open datasets, exhibitions).

Arquivo.pt safeguards online digital objects of worldwide interest for research and education.

The Arquivo.pt Catalog was submitted to the Safeguarding the Digital Legacy Award.

#2 Searching preserved web-images

Images published online are precious digital assets that document contemporary times for future generations.

This initiative describes the research and development of an innovative image search system that enables the discovery of, and access to, billions of preserved images acquired from the web since the 1990s.

This research was applied to enhance the Arquivo.pt web archive with an image search service publicly available to any Internet user, officially launched in August 2022.

The resulting scientific and technical publications are available in open-access and the developed software is available as free open-source software to be reused and enhanced by the community.

This work on searching images preserved in web archives was submitted to the Research and Innovation Award.

Know more

Commemoration of the 50th anniversary of April 25 – the Portuguese revolution of 1974


Arquivo.pt joined the celebrations of the 50th anniversary of April 25, the Portuguese Revolution of 1974, as part of the initiatives promoted by the Fundação para a Ciência e a Tecnologia (FCT) in partnership with the Estrutura de Missão – Comissão Comemorativa 50 anos 25 de Abril.

The initiatives were as follows: a journey through time (the exhibition Memories of April 25 on the Internet), a special collection on the theme “April 25”, a presentation at the “50 years of April International Congress” and a special mention in the 2025 edition of the Arquivo.pt Award.

Memories of April 25 on the Internet exhibition

The exhibition Memories of April 25 on the Internet presents a selection of web pages about the celebrations of April 25 in various regions of the country, since the beginning of the web in the 1990s.

The criteria for choosing the pages for the exhibition were as follows:

  • Pages relating to the April 25 commemorations;
  • Pages found on Arquivo.pt on dates close to the anniversary each year;
  • Diversity to include different areas of the country;
  • Popular demonstrations and official ceremonies.

A historical memory without web archives is incomplete. The aim of this journey through time is to invite citizens to travel back in time, browsing through old web pages and reliving recent episodes in our life as a democracy.

See the exhibition: arquivo.pt/50anos25abril

Special collection on April 25 – the Portuguese Revolution of 1974

To mark the anniversary, Arquivo.pt carried out a special collection on the topic of “April 25” and made the results available in an open dataset, published on the Dados.gov portal.

The dataset contains a list of keywords submitted to a search engine to obtain results on the topic of “April 25”. The keywords covered names of people and places, political, social and cultural aspects, and words associated with the event.

The searches were carried out on March 22, 2024 using the Bing Search API, an automated search service that returns results ranked by Bing's own relevance criteria and by additional criteria we configured.

A total of 12,650 unique web page addresses were obtained. We hope that archiving these pages will be useful for the organizations that produced this content, for researchers who want to study our history and for citizens who cultivate a sense of memory and democracy.
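The post does not say how the Bing results were processed. The sketch below shows one way to turn many keyword queries into a list of unique addresses. It assumes responses in the JSON shape of the Bing Web Search API v7, with page addresses under `webPages.value[].url` and the key sent in the `Ocp-Apim-Subscription-Key` header. The normalization rule (lowercase scheme and host, drop `#fragments`) and the `pt-PT` market are our own assumptions, not the criteria Arquivo.pt configured.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request

# Endpoint of the Bing Web Search API v7, shown for illustration only.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_request(query: str, api_key: str, count: int = 50, offset: int = 0) -> Request:
    """Prepare (but do not send) one search request for a keyword."""
    params = urlencode({"q": query, "count": count, "offset": offset, "mkt": "pt-PT"})
    return Request(f"{BING_ENDPOINT}?{params}", headers={"Ocp-Apim-Subscription-Key": api_key})


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the #fragment, keep path and query."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def unique_result_urls(responses: list[dict]) -> list[str]:
    """Collect unique page URLs from decoded JSON responses, in first-seen order."""
    seen: set[str] = set()
    ordered: list[str] = []
    for response in responses:
        for page in response.get("webPages", {}).get("value", []):
            url = normalize_url(page["url"])
            if url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered
```

Each prepared request could be sent with `urllib.request.urlopen` and decoded with `json.load`. The deduplication step is what reduces many overlapping result lists to a single count of unique addresses.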

Participation in the 50 years of April International Congress

João Gomes, Director of Advanced Services, FCCN-FCT presenting the Arquivo.pt Memorial service at the 50 years of April International Congress

On May 2, 2024, João Gomes, Director of Advanced Services at the FCCN Scientific Computing Unit of the Foundation for Science and Technology I.P., presented Arquivo.pt to the participants of the 50 years of April International Congress, as a distinctive service, open to citizens and useful for organizations.

This event, organized by the Estrutura de Missão – Comissão Comemorativa 50 anos 25 de Abril and the University of Lisbon, included a presentation of two FCT services for citizens: Arquivo.pt and NAU’s massive open online courses (MOOCs).

Arquivo.pt is a web preservation service available to all citizens who want to search for old content published on the web.

Using Arquivo.pt contributes to a better understanding of our history. It also provides services that are useful for cybersecurity, such as the Arquivo.pt Memorial. The Memorial keeps institutions’ old websites accessible without the institutions having to maintain them online, which reduces their exposure to attacks and saves resources.

Special mention for “April 25 and Democracy” at the Arquivo.pt Awards 2025

The Arquivo.pt Award is held annually and honors works that use Arquivo.pt.

In 2025, as part of the celebrations for the 50th anniversary of April 25, a special mention will be made of work on the theme “April 25 and Democracy”.

We therefore challenge researchers and interested citizens to create innovative works using Arquivo.pt.

If you have any questions about the Arquivo.pt Award, please contact us.

Arquivo.pt reaches 1 PetaByte of preserved information!

Reaching 1 PetaByte of preserved content, predominantly in Portuguese and accessible to both researchers and ordinary citizens, is a milestone worth celebrating in the month of Arquivo.pt's 16th anniversary.

At Arquivo.pt you can search for information published on the Web in the past, such as:

  • The first European page
  • News from The New York Times in 2008
  • European Film Awards 2014

Discover more through the pages selected for the Arquivo.pt Online Exhibitions.

Purpose and mission of the Portuguese Web Archive

Arquivo.pt was created on November 8, 2007 with the aim of preserving content from the Portuguese Web.

In 2013, as a service operated by the Fundação para a Ciência e a Tecnologia (FCT), its mission was formulated as follows: “To promote the preservation of content available on the national Internet, ensuring that it is made available to the scientific community and the general public” (Decreto-Lei no. 55/2013).

In recent years, Arquivo.pt has created new services. CitationSaver allows researchers to preserve the web content referenced in their scientific articles. Memorial and Complete page facilitate access to content scattered across this huge 1 PetaByte block of data.

Where did so much information come from?

To reach 1 PetaByte, Arquivo.pt periodically archived content from websites in the .PT domain and from Portuguese websites in other domains.

In addition, daily and monthly collections were made from a small number of government sites and the main news sites in Portugal.

As part of international collaborations, content was collected from sites in various languages, for example on the 2019 European Elections.

Content prior to 2008 came from the Internet Archive and donations, such as a collection made by the National Library and INESC on the 2005 Legislative Elections.

The largest Portuguese-language dataset available to researchers

By making 1 PetaByte of information available, in open access and through the use of APIs (Application Programming Interfaces), Arquivo.pt is a useful tool for research.

For example, a researcher who wants to do a study on elections in Portugal can use the entire Arquivo.pt collection. Better still, they can focus on just a few special collections dedicated to the elections, choosing the ones that interest them and downloading just a few Terabytes to process automatically with the APIs.
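The sketch below illustrates that workflow: it builds paginated queries for the Arquivo.pt TextSearch API, restricted to a time window and, optionally, to one website. The endpoint and the parameter names (`q`, `from`, `to`, `siteSearch`, `maxItems`, `offset`) reflect our understanding of the public API, so check them against the official documentation. The query and site in the usage line are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public endpoint of the Arquivo.pt TextSearch API; check the official docs.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def to_timestamp(moment: datetime) -> str:
    """Format a date as the 14-digit yyyyMMddHHmmss timestamp used by web archives."""
    return moment.strftime("%Y%m%d%H%M%S")


def textsearch_urls(query: str, start: datetime, end: datetime,
                    site: str = "", page_size: int = 50, pages: int = 3) -> list[str]:
    """Build one query URL per page of results, using offset-based pagination."""
    urls = []
    for page in range(pages):
        params = {
            "q": query,
            "from": to_timestamp(start),
            "to": to_timestamp(end),
            "maxItems": page_size,
            "offset": page * page_size,
        }
        if site:
            params["siteSearch"] = site  # restrict results to a single website
        urls.append(f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}")
    return urls
```

For example, `textsearch_urls("eleições legislativas", datetime(2005, 1, 1), datetime(2005, 12, 31, 23, 59, 59), site="www.example.pt")` yields three URLs covering the first 150 results. Each URL can then be fetched and processed automatically.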

Contributions from the various teams and friends of Arquivo.pt

The development of Arquivo.pt is more than a technological achievement: it has depended on the dedication and persistence of the various teams that have worked on it since 2007.

It also owes much to the many friends of Arquivo.pt, who were always on hand to help improve it, and to the feedback of the user community.

Congratulations to all! Thank you.

World Digital Preservation Day dedicated to Justice

Last updated on November 13th, 2023 at 08:59 am

The Instituto de Gestão Financeira e Equipamentos da Justiça (IGFEJ) and Secretaria Geral do Ministério da Justiça (SGMJ), in collaboration with BAD, organized the event “Digital Preservation in Justice” to mark World Digital Preservation Day on November 2, 2023.

The event, which took place in the auditorium of the Polícia Judiciária in Lisbon, was attended by representatives from the government’s justice department and professionals from the archives, communications and IT departments.

How to use Arquivo.pt to preserve institutional websites

Arquivo.pt took part in the presentation “Preserve your website”, which addressed the issue of preserving institutional websites and critical aspects such as cybersecurity.

Justice entities can benefit from Arquivo.pt and its various services to ensure good preservation of their websites, mitigate cybersecurity threats and provide historical content to citizens.

The presentation concluded with the following recommendations:

  • Inventory and publicize your current and historical websites
  • Use Arquivo.pt services collaboratively
  • Save content in a standardized format with ArchiveWeb.page

Resources

University of Lisbon preserved over 100 historical websites in the Arquivo.pt Memorial


Last updated on March 27th, 2024 at 11:17 am

More than 100 historical websites from the Faculty of Sciences of the University of Lisbon (FCUL) are now accessible through the Memorial service of Arquivo.pt.

FCUL’s IT Department sent Arquivo.pt a list of old websites hosted on its servers. These sites were no longer updated, but their historical content remained of interest to the community (e.g. websites of research projects or scientific events).

Arquivo.pt preserved these websites in collaboration with their owners, seeking to maintain a faithful representation of the published content for the future.

FCUL then redirected the domain of each website to Arquivo.pt. This allowed it to disconnect the corresponding servers and save the resources spent on maintaining them (e.g. electricity, data center space, human resources).

The showcase of MiNEMA


Landing page of www.minema.di.fc.ul.pt at Memorial do Arquivo.pt.

The MiNEMA scientific program website was the first that FCUL integrated into the Memorial. The website stopped being updated in 2009, when the project ended. FCUL invested resources in maintaining it for another 10 years, until it became necessary to take it down for cybersecurity reasons.

The Memorial of Arquivo.pt emerged as an alternative. Since 2020, FCUL only needs to maintain the domain www.minema.di.fc.ul.pt, while Arquivo.pt preserves the information contained on the website.

Please note that the website’s content continues to be displayed in search engine results.

Follow FCUL and preserve your historical websites in the Memorial!

An increasing number of institutions are turning to the Memorial of Arquivo.pt to safely preserve the content of their historical websites. For example, FCUL preserved 116 websites, the Government IT Network Management Center preserved 23 and the Foundation for Science and Technology preserved 40.

Public institutions have priority to benefit from this service. However, other entities can also request it as long as they own the website domain.

Identify which of your historical websites are candidates for the Memorial of Arquivo.pt and contact us!

To know more

Completing webpages from the past: it is possible!

Last updated on October 16th, 2023 at 06:59 pm

Some web-archived pages are reproduced incompletely due to problems that occurred during the archiving process (e.g. broken formatting or missing embedded images).

Complete page is an Arquivo.pt function that recovers missing elements of web-archived pages from other web archives or from the original websites.

When viewing a page archived in Arquivo.pt, the user just needs to open the Options menu in the top right corner and choose Complete page.

This process is performed automatically.

How does Complete page work?

If you open a web-archived page that appears incomplete, try the Complete page option and wait.

Arquivo.pt will search for missing elements on the Internet and in other web archives using the Memento protocol. If it succeeds, the obtained elements will be immediately displayed on the web-archived page.
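The post does not detail how Complete page uses Memento, so the sketch below only illustrates the protocol itself (RFC 7089). It reads a TimeMap in `application/link-format` and keeps the entries whose `rel` includes `memento`. It then picks the capture closest to a target date, which is the kind of lookup a client makes when searching other archives for a missing image. It assumes quoted attribute values, as in the RFC examples.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# One link-format entry: <uri> followed by ; name="value" pairs.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_timemap(text: str) -> list[tuple[str, datetime]]:
    """Return (memento URI, capture datetime) for entries whose rel includes 'memento'."""
    mementos = []
    for uri, attrs in LINK_RE.findall(text):
        params = dict(ATTR_RE.findall(attrs))
        rels = params.get("rel", "").split()  # e.g. "first memento" -> ["first", "memento"]
        if "memento" in rels and "datetime" in params:
            # Memento datetimes use the HTTP date format, e.g. "Tue, 15 Mar 2005 00:00:00 GMT".
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    return mementos


def closest_memento(text: str, target: datetime) -> Optional[str]:
    """Pick the memento captured closest to target; a naive target is treated as UTC."""
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    mementos = parse_timemap(text)
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[1] - target))[0]
```

Choosing the nearest capture, rather than any capture, keeps a recovered image as close as possible in time to the page that embeds it.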

Later, these recovered elements are integrated into the Arquivo.pt collection, so that the web-archived page will appear more complete in future accesses by any user.


Completing the home page of artist Cristina Guerra’s website recovered a missing image.

For example, the website of artist Cristina Guerra archived in 2005 had a missing image. By using Complete page, it was possible in 2021 to obtain this missing image from another web archive which preserved it.

Participate in collaborative curation to improve the quality of Arquivo.pt!

Due to the high number of web-archived pages, Arquivo.pt cannot complete them all automatically. Users can therefore help by identifying important pages with missing elements and trying to complete them.

By using Complete page, users contribute to improving the quality of the historical web pages preserved in Arquivo.pt!

Whenever a web-archived page looks incomplete, give Complete page a try. If you detect any problem, contact us.

Spread the word about the Arquivo.pt Complete page!

Arquivo.pt presentations at IIPC GA/WAC, RESAW 2023 and CLEOPATRA

Last updated on March 10th, 2024 at 05:23 pm

Meeting the Web Archive Community

The International Internet Preservation Consortium (IIPC), a consortium that brings together Web preservation initiatives from around the world, held its General Assembly with its members on May 10, 2023.

On the following days, May 11 and 12, the IIPC Web Archiving Conference (IIPC WAC) was held, an initiative open to the community, where people or entities not associated with the IIPC and interested in the Web preservation domain can participate.

The two events were jointly hosted by KB – National Library of the Netherlands, and by Beeld & Geluid – Netherlands Institute for Sound & Vision.

Contributions from Arquivo.pt at the Web Archiving Conference

Arquivo.pt participated in the IIPC working group meetings (Training Working Group and Curators Working Group) and contributed with presentations in the thematic sessions Collaborations & Outreach and Program infrastructure (sessions 7 and 17).

  • Arquivo.pt updates 2023 (slides)
  • Linking web archiving with arts and humanities: the collaboration between ROSSIO and Arquivo.pt (video, slides)
  • Arquivo.pt behind the curtains (slides)

Meeting the RESAW research community

RESAW (Research Infrastructure for the Study of Archived Web Materials) is an initiative created in 2012 with the aim of promoting studies based on archived Web content, in areas such as Social Sciences, Digital Arts and Humanities.

The RESAW 2023 conference was held at the MUCEM Lab (Mediterranean Institute of Heritage Crafts) in Marseille on June 5-6, 2023, under the theme Exploring the Archived Web During a Highly Transformative Age.

Contributions from Arquivo.pt to RESAW 2023

Arquivo.pt contributed presentations to the following sessions: Web Archive in Mediterranean area and its merge (4.A), From online Tools to Web Archive (6.B), Towards a participatory approach to collections (9.A) and Digging up the materials for writing web history (9.B).

  • How to research governmental web data? (abstract, slides)
  • Archiving Cryptocurrencies (abstract, slides)
  • Time to explore, time to learn from the archived web: Arquivo.pt training initiative (abstract, slides)
  • Exhibiting Web Memories from Arquivo.pt: a call for community participation (abstract, slides)

CLEOPATRA Project Meeting

The CLEOPATRA Project, led by the L3S Research Center at the Gottfried Wilhelm Leibniz University of Hannover, has developed a training programme for doctoral researchers (Early Stage Researchers pursuing a PhD) since 2019.

Arquivo.pt has participated in three courses: Incentives design for hybrid multilingual information processing and analytics, in Southampton; National and transnational media coverage of European parliamentary elections, 2004-2014, in London; and NLP for under-resourced languages, in Zagreb, Croatia.

In 2022, Arquivo.pt welcomed two researchers to its facilities. They used the archived resources and received special support from the Arquivo.pt team to develop their research.

The CLEOPATRA Project ended in 2023 with a meeting in Hannover on May 16, which brought together professors, researchers and representatives of the institutions involved.

Daniel Gomes, Arquivo.pt’s Manager, highlighted the new tools that Arquivo.pt makes available and the results of the work carried out by the researchers who had worked with Arquivo.pt.

Virtual Museum of Tourism MUVITUR created a collection of preserved Websites

Collection of records in the MUVITUR catalog with web pages preserved in Arquivo.pt

Last updated on February 26th, 2024 at 09:07 am

MUVITUR – Virtual Museum of Tourism is a portal that aggregates digital content about Tourism in Portugal.

The platform is maintained by the Celestino Domingues Library of the Estoril Higher Institute for Tourism and Hotel Studies (ESHTE), with the participation of content-providing institutions from various areas of heritage.

The digitized content that can be consulted in the catalog and accessed at the provider institutions included sound recordings, images, photographs and printed material, but websites were missing.

Thus, the idea for the MUVITUR’s new “Web Pages” collection emerged.

Collaboration between MUVITUR and Arquivo.pt

In 2019, Arquivo.pt and MUVITUR began a collaboration to identify websites related to Tourism in Portugal and to disseminate the history of content published on the Web since 1996.

In 2022, a list of about 400 records was compiled. It covers websites of various entities related to tourism, such as hotels, travel agencies, municipal web pages dedicated to tourism and others.

This database resulted in the first collection of preserved websites about Tourism in Portugal.

Collection of records in the MUVITUR catalog with webpages preserved at Arquivo.pt. 

How the integration was done

MUVITUR uses Nyron software, which allows content from different sources to be aggregated using the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interoperability protocol, which is very common among libraries, archives and museums to provide content to portals such as Europeana.
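For readers unfamiliar with OAI-PMH, the sketch below shows what a harvester requests and reads back. It first sends a `ListRecords` request with the `oai_dc` (Dublin Core) metadata format. It then keeps sending requests that carry the `resumptionToken` until the token comes back empty. The base URL in the usage note is hypothetical, and this is not how Nyron is implemented; it only illustrates the protocol.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = "oai_dc"  # unqualified Dublin Core
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Extract identifiers and Dublin Core titles, plus the token for the next page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [title.text for title in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, token or None  # an empty token marks the last page
```

A harvest loop would call `list_records_url("https://example.org/oai")`, parse the response, and repeat with the returned token until `parse_list_records` returns `None` as the token.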

Arquivo.pt, however, does not make information available through OAI-PMH so it was necessary to find alternative ways to create a record in Nyron with descriptive information from preserved sites.

The procedure for integration was as follows:

  • The XML schema with the metadata fields supported by Nyron was exported to an Excel sheet.
  • The information was entered manually, respecting the format and syntax, in collaboration with the computer technicians.
  • The XML file with the inserted data was validated and imported into Nyron.
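The sketch below illustrates the spreadsheet-to-XML step. It assumes the sheet was saved as CSV and uses hypothetical column and element names, since the post does not describe the actual Nyron schema. It stops at rows missing required fields and re-parses its own output to confirm the XML is well-formed. Validating against the XML schema itself would require an additional tool, such as an XSD validator.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column names; the real Nyron schema is not described in the post.
REQUIRED = ("denomination", "organization", "url", "archive_url")


def rows_to_xml(csv_text: str) -> str:
    """Turn spreadsheet rows (exported as CSV) into an XML batch of records."""
    root = ET.Element("records")
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        missing = [name for name in REQUIRED if not (row.get(name) or "").strip()]
        if missing:
            raise ValueError(f"row {line_no}: missing {', '.join(missing)}")
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            if name is None:  # extra cells without a header are ignored
                continue
            ET.SubElement(record, name).text = (value or "").strip()
    xml_text = ET.tostring(root, encoding="unicode")
    ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
    return xml_text
```

Failing early on incomplete rows matches the manual procedure above: errors are caught before the file reaches the import step.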

Creating records in catalogs is largely a manual task that requires human curation. However, some information in the records of the Website collection could be obtained automatically. For example, the thumbnail was obtained using the Arquivo.pt API, more specifically the linkToScreenShot, visible in the technical details of a preserved page (see the options menu at the top right of a replayed page).
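The sketch below shows how thumbnail links could be collected from an API response instead of being copied by hand. It assumes a JSON response from the Arquivo.pt TextSearch API whose results sit in a `response_items` list with `originalURL` and screenshot entries. Because the post spells the field `linkToScreenShot`, the keys are matched case-insensitively; check both names against a real response.

```python
import json


def screenshot_links(response_json: str) -> dict[str, str]:
    """Map each result's original URL to its screenshot link, if one is given."""
    data = json.loads(response_json)
    links = {}
    for item in data.get("response_items", []):  # assumed name of the results list
        # Compare keys case-insensitively: the exact capitalization of the
        # screenshot field should be confirmed in a real API response.
        fields = {key.lower(): value for key, value in item.items()}
        screenshot = fields.get("linktoscreenshot")
        if screenshot:
            links[fields.get("originalurl", "")] = screenshot
    return links
```

The resulting mapping could fill the "Link for miniature" field for each website in the collection.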

Other elements, such as the site’s title, could also be obtained automatically through the Arquivo.pt API. However, their quality depends on what the site’s producers entered, and they may not be accurate. The dates limiting the temporal scope could also be obtained automatically, but the manual method was chosen to control the information presented.

In the continuation of the project, the collection will be increased with new records, as there are thousands of websites about the Tourism sector.

Description of Web contents in the MUVITUR catalog

The “Páginas Web” (Web Pages) collection uses the following data:

  • Denomination – usually the title of the website
  • Organization – the entity to which the publication belongs
  • Website address on the Internet
  • Address for version in Arquivo.pt
  • Moment(s) to remember
  • Link for miniature in Arquivo.pt
  • Descriptors
  • Geographical data (location, coordinates, geographical name)
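The sketch below holds these fields in code as a small dataclass with hypothetical attribute names. It assumes the moments to remember are stored as 14-digit capture timestamps. It also assumes replay links follow the `https://arquivo.pt/wayback/<timestamp>/<url>` pattern.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Tuple

WAYBACK = "https://arquivo.pt/wayback"  # assumed replay URL prefix


@dataclass
class WebPageRecord:
    denomination: str                  # usually the title of the website
    organization: str                  # entity the publication belongs to
    url: str                           # website address on the Internet
    archive_url: str = ""              # address of the version in Arquivo.pt
    thumbnail_url: str = ""            # link to the miniature in Arquivo.pt
    moments: list[str] = field(default_factory=list)      # yyyyMMddHHmmss timestamps
    descriptors: list[str] = field(default_factory=list)
    location: str = ""
    coordinates: Optional[Tuple[float, float]] = None     # (latitude, longitude)
    geographical_name: str = ""

    def __post_init__(self) -> None:
        # Reject moments that are not 14-digit capture timestamps.
        for ts in self.moments:
            if not re.fullmatch(r"\d{14}", ts):
                raise ValueError(f"not a yyyyMMddHHmmss timestamp: {ts!r}")

    def moment_links(self) -> list[str]:
        """Replay links for each moment to remember."""
        return [f"{WAYBACK}/{ts}/{self.url}" for ts in self.moments]
```

Storing timestamps rather than full links keeps each record independent of the replay URL format, which is built in only one place.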

The presentation of the information was adjusted to be aligned with that of other MUVITUR resources and contains links to Arquivo.pt.

For example, the record of the Turismo do Algarve website includes a link to a moment to remember in 2011 and another link to its history in Arquivo.pt under “Consultar objecto” (View object).

Detail of the record of the Turismo do Algarve website

Organizations can create collections of Websites from their area

The National Library of Australia, for example, included records of preserved Websites in its catalog. In the Library of Congress there are collections of old Websites alongside traditional resources.

However, websites are rarely included in museum collections.

With this unprecedented project we can say that preserved Web sites have gained citizenship in digital platforms dedicated to cultural heritage.

With this project, MUVITUR has paved the way for other entities to create collections of websites of their interest on their own platforms.

Other results of the collaboration

CitationSaver preserves citations to web resources

Last updated on April 20th, 2023 at 09:37 pm

Documents cite web content by referencing their URLs so that readers can later access them.

In the case of scientific articles, these citations are even more important for maintaining the integrity of research. They often reference information essential to reproducing an experiment or analysis.

For example, links in a scientific article may cite the datasets, software or web news that supported the research, which are not included in the text of the article.

To address the need to preserve the integrity of documents, Arquivo.pt launched CitationSaver.

CitationSaver automatically extracts cited links in a document and preserves their content (e.g. web pages cited in a book) so that they can be retrieved later from Arquivo.pt.


Use CitationSaver to preserve the integrity of your documents

Upload a document and CitationSaver will extract the cited URLs, archive their content and make it available on Arquivo.pt shortly afterwards. There are three ways to submit a document:

  • insert the address (URL) of the PDF or TXT file, if it is published online
  • upload the file in PDF or TXT format
  • paste the text containing the addresses you want to preserve (e.g. References section of an article or Bibliography of a book).
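The post does not describe CitationSaver's own extraction method. As a simplified sketch of the idea, the function below pulls `http(s)` addresses out of pasted reference text with a regular expression. It then trims trailing punctuation and removes duplicates. The approach is deliberately simple: it stops at closing parentheses and misses URLs broken across lines, as well as DOIs written without a scheme.

```python
import re

# http(s) URL up to whitespace, quotes, angle brackets, ')' or ']' (a simplification).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
TRAILING_PUNCTUATION = ".,;:"


def extract_urls(text: str) -> list[str]:
    """Return the distinct URLs cited in text, in order of first appearance."""
    found: list[str] = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)  # drop sentence punctuation
        if url not in found:
            found.append(url)
    return found
```

Each extracted address would then be submitted for archiving. Deduplicating first avoids recording the same citation twice when a reference list repeats it.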

More information