Arquivo.pt is finalist for the DPC Awards 2024

dpc-award-thumb

Last updated on August 12th, 2024 at 11:50 am

The Digital Preservation Coalition Awards

The Digital Preservation Coalition (DPC) is dedicated to promoting digital preservation and associated best practices.

The DPC Awards promote exemplary and innovative digital preservation use cases from all over the world.

The Arquivo.pt team submitted two applications to the DPC Awards 2024 in the categories of “Safeguarding the Digital Legacy” and “Research and Innovation”.

The Award for Safeguarding the Digital Legacy celebrates the practical application of preservation tools to protect at-risk digital objects.

The Award for Research and Innovation recognizes excellence in practical research and innovation activities.

Arquivo.pt applications to the DPC Awards

#1 Arquivo.pt catalog of tools for digital preservation

Information that rules modern-day lives is born-digital and disseminated online. However, invaluable digital objects published online have been continuously lost.

Arquivo.pt is a public infrastructure which supports the preservation of digital objects published online to safeguard this digital legacy for future generations.

Thus, in October 2023 after 15 years of research and development, Arquivo.pt released a Catalog of 13 innovative tools to support the preservation of at-risk online content, from acquisition to dissemination (e.g. search and access, APIs, training, open data sets, exhibitions).

Arquivo.pt safeguards online digital objects of worldwide interest for research and education.

The Arquivo.pt Catalog was selected as finalist to the Safeguarding the Digital Legacy Award.

#2 Searching preserved web-images

Images published online are precious digital assets that document contemporary times for future generations.

This initiative describes the research and development of an innovative image search system that enables the discovery and access to billions of preserved images acquired from the web since the 1990s.

This research was applied to enhance the Arquivo.pt web archive with an image search service publicly available to any Internet user, officially launched in August 2022.

The resulting scientific and technical publications are available in open-access and the developed software is available as free open-source to be reused and enhanced by the community.

This work on searching images preserved in web archives applied for the Research and Innovation Award.

Know more

Analysis of the Arquivo.pt query dataset

demo-wordcloud-arquivopt3

Last updated on December 10th, 2024 at 03:12 pm

Arquivo.pt query logs are unique resources for research

Arquivo.pt provides a “Google-like” service that enables searching pages and images collected from the web since the 1990s. Notice that Arquivo.pt search complements live-web search engines because it enables temporal search over information that is no longer available online on its original websites.

Analyzing user behavior is an important research topic to understand users’ information needs and enhance the quality of search results. Thus, when a user interacts with a search engine, the system records the user’s actions in a file called the query log. Query logs from web archives are unique resources for research because they describe the real needs of web-archive users about the historical information published online over time.

Research case study

Flavie Gallois and Adam Jatowt from the University of Innsbruck, and Ricardo Campos from the University of Beira Interior and INESC TEC analyzed user search behavior based on the Arquivo.pt search query log dataset collected over a period of 3 months from June to September 2021 (Analyzing User Search Behaviour in Temporal Web Repositories through Search Query Log Analysis).

This study analyzed query features such as length, type or frequency and compared the obtained results with previous work about user search behavior over web-archives and live-web search engines.

This study revealed interesting trends and patterns about how users search for information within web archives, with strong potential for future research work.

How do web-archive users search?

Figure 1 : Distribution of country origin of users
Figure 1 : Distribution of country origin of users
Figure 2: Distribution of languages used in queries
Figure 2: Distribution of languages used in queries

The users came from Portugal in 85.7% of the queries. However, the Portuguese language was identified through automatic language identification of queries as being used in only 37% of the queries. This suggests that users apply other languages than their own to search in web archives.

Users of Arquivo.pt tend to use longer queries with more words and characters in comparison to previous studies, both over web archives and live-web search engines. About 92% of the queries had 5 or fewer terms (average of 25 characters), with 3 being the most common number of submitted terms. In previous work about search behavior in web archives, it was observed that users tended to submit from 1 to 3 terms per query, with 1 term as the most common submission.

Users tend to issue multiple queries within a session instead of a single query, possibly indicating a need for refining their search queries or exploring multiple options for inquiry.

87,7% of the queries submitted to Arquivo.pt used Desktop Browsers, despite Arquivo.pt providing mobile-friendly user interfaces. Old web-archived pages are not responsive and render poorly on mobile devices. Thus, it is expectable that users mostly use web archives through their desktops.

Figure 3: Arquivo.pt users can refine the time span of their queries by using the From and To datepickers.
Figure 3: Arquivo.pt users can refine the time span of their queries by using the From and To datepickers.

Users refined the time span of the search (using the datepickers) in about 50% of queries which indicates awareness of temporal needs peculiar to web-archive usage. Interestingly, users modified the From datepicker more frequently than the To datepicker. Notice that keeping the default time span may fit the user information needs and does not necessarily indicate the lack of awareness about the existence of the function to define time span (peculiar to web-archive search).

Only a small percentage of users included specific years in their query terms (4%), potentially suggesting that in these cases the time span function was insufficient, or unnoticed by some users.

The obtained results suggest that users are more conscious of their information needs and have improved their search techniques to be more effective over web-archives instead of just using them out of curiosity as first-comers.

What is searched in a web-archive?

The authors of the study applied automatic named entity recognition over the user queries and derived a set of word clouds that graphically provide a glimpse of the most common information needs of Arquivo.pt users:

Figure 4: Word cloud of the most frequent query terms submitted to Arquivo.pt.
Figure 4: Word cloud of the most frequent query terms submitted to Arquivo.pt.
Figure 6: The most frequent Geographical Locations in query terms submitted to Arquivo.pt.
Figure 6: The most frequent Geographical Locations in query terms submitted to Arquivo.pt.
Figure 6: The most frequent Organizations in query terms submitted to Arquivo.pt
Figure 6: The most frequent Organizations in query terms submitted to Arquivo.pt
Figure 7: The most frequent Persons in query terms submitted to Arquivo.pt.
Figure 7: The most frequent Persons in query terms submitted to Arquivo.pt.

Access to research Arquivo.pt query dataset

Arquivo.pt released a set of resources to support research studies over its

Query log dataset

Original log files (samples)

Documentation

Evaluation Metrics for web-archive search

The first step to understand user behavior is to define evaluation metrics. Defining metrics is a powerful tool to set long and short-term goals to decide which new products and features should be released to the users.

We share a work-in-progress report which aggregates information about Web Archive Search Evaluation Metrics. This contributes to comparing users’ search behavior between live-web and web-archive search engines. Feel free to comment directly on the collaborative document or to contact us.

This report also provides a summary of references about previous work, query workflows and structure of the corresponding query logs produced by Arquivo.pt, to facilitate the work from the researchers to study these data sets.

Know more

Artificial Intelligence processes data from Arquivo.pt

Artificial Intelligence AI

Last updated on July 16th, 2024 at 08:33 am

Artificial Intelligence (AI), covers various areas of knowledge, such as linguistics and computing, and is present in the new technologies used by citizens on a daily basis.

For example, when we search for information on the Internet and the computer generates an amazingly accurate response, in a language very close to our own.

Natural Language Processing (NLP) is what allows machines to perfect the algorithm that generates these answers tailored to Internet users.

The problem is that natural language processing models have been developed more for the English language and less for Portuguese and other languages with less representation.

The more the processing models are trained on a language, the better they will be able to interpret the complexities of the language. But this is only possible if quality data is available.

Portuguese text collection on Arquivo.pt available for research

Arquivo.pt appears here as the largest Portuguese-language textual dataset in Portugal, available in open access, for researchers to train NLP models.

In recent years, researchers from various research groups and projects have drawn attention to the usefulness of preserved web data for large-scale processing.

Arquivo.pt has more than 1 Petabyte of preserved web content dating back to the 1990s, including everything that can be found on web pages. It’s not just text, but also images, audio files, video, page code and various metadata.

The content is accessible via the search interface and the Arquivo.pt APIs.

In order to make it easier to download archived resources from the web, Arquivo.pt has created indexes for researchers in CDXJ format.

GlórIA, a model for the Portuguese language

One of the projects that used Arquivo.pt to obtain large amounts of text is called GlórIA and is a large-scale language model (LLM) focused on the European Portuguese language.

“Despite the abundance of LLMs for many high-resource languages, the availability of such models remains limited for European Portuguese” as the authors of GlórIA project, Ricardo Lopes, João Magalhães, David Semedo, researchers at the NOVA School of Science and Technology, explain in their article GlórIA – A Generative and Open Large Language Model for Portuguese.

The model used 35 billion tokens, or expressions that machines can process, from various sources.

Arquivo.pt contributed a collection of 1.4M European Portuguese archived news and periodicals.

You can try generating text in European Portuguese using the GlórIA API inference on the Hugging Face Model card.

If you want to develop a project or study using Arquivo.pt, you can start your research and, if you need help, contact us.

Know more

Prepare a work for the Arquivo.pt Award 2024!

1080x556-EN-arquivopt-award

Last updated on August 6th, 2024 at 05:15 pm

1080x556-EN-arquivopt-award

Until May 6, 2024, Arquivo.pt is launching the challenge of creating a work based on  historical information preserved from the Web.

In this 7th edition of the Arquivo.pt Award, €15,000 will be awarded to the 3 best works (€10,000 for 1st place), plus 3 honorable mentions.

Know more at: arquivo.pt/award

Honorable mentions for authors and professors

To promote the use of the Arquivo.pt in the context of teaching, research or professional usage, three partners institutions promoted honorable mentions with an associated prize.

  • The Público newspaper will award an Honorable Mention to works based on the Público online content preserved by Arquivo.pt.
  • The Aveiro Media Competence Center (AMCC) will award an Honorable Mention to the best work on the web archive of one or more Portuguese online media.
  • Association DNS.PT will award an Honorable Mention to a professor or teacher who has encouraged the submission of works.

Share and spread the word!

Help us spreading the word about the Arquivo.pt Award 2024 among potential candidates!

Cross-lingual research datasets on 2019 European Parliamentary Elections

Daniel Gomes and Diego Alves at presenting at CLEOPATRA final event

Last updated on December 13th, 2024 at 01:55 pm

Arquivo.pt preserved online documents in several languages about the 2019 European Parliamentary Elections

The 2019 European Parliamentary Elections were an event of international relevance. The strategy to preserve the relevant information on the World Wide Web is delegated to national institutions. However, the preservation of web pages that document transnational events is not officially assigned. 

The Arquivo.pt team, with the aim of preserving the cross-lingual online content that documents this event, applied a combination of human and automatic selection processes.

The process of generating the collection about the 2019 European Parliamentary Elections was performed in two steps.

In the first step, 40 relevant terms in Portuguese about the 2019 European Parliamentary Elections were identified, and then, automatically translated into the 24 official languages of the European Union: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish. 

These translations were reviewed in collaboration with the Publications Office of the European Union. Besides that, in parallel, a collaborative list was launched to gather contributions of relevant seeds from the international community.

In the second step, the Arquivo.pt team iteratively ran 6 crawls (99 million web files, 4.8 TB) using different configurations and crawling software, to maximize the quality of the collected content. 

The obtained web-data was aggregated into one special collection identified as EAWP23 and became searchable and accessible through Arquivo.pt in July 2020 (https://arquivo.pt/ee2019).

CLEOPATRA project: Cross-lingual Event-centric Open Analytics Research Academy

Daniel Gomes and Diego Alves at presenting at CLEOPATRA final event
Daniel Gomes and Diego Alves at presenting at CLEOPATRA final event

The CLEOPATRA ITN was a Marie Skłodowska-Curie Innovative Training Network aimed to generate ways to better understand the massive digital coverage of major events in Europe over the past decades. 

The main goal was to facilitate advanced cross-lingual processing of textual and visual information related to key contemporary events at large scale and develop innovative methods for efficient access and interaction with multilingual information.

In total, 14 Early-Stage Researchers hosted across 9 European Universities developed their research while enrolled as Ph.D. students. 

Associated partners such as Arquivo.pt contributed to CLEOPATRA by hosting and training early-stage researchers such as Diego Alves. As part of the training program, he conducted a secondment at Arquivo.pt in Lisbon from June to August 2022. 

The idea was to develop part of his research about syntactic structures of EU languages using the textual resources preserved by the Arquivo.pt and exchange knowledge with the web-archiving experts on the strategies to extract and process historical web-data. 

Diego Alves defended his Ph.D thesis entitled Computational typological analysis of syntactic structures in European languages in July 2023 at the Faculty of Humanities and Social Sciences of the University of Zagreb (Croatia). 

Generating textual datasets for Natural Language Processing

Diego Alves’ work originated cross-lingual datasets about the 2019 European Parliamentary Elections precious for research.

This work will be detailed in chapter “Robustness of Corpus-based Typological Strategies for Dependency Parsing” of the open-access CLEOPATRA book entitled “Event Analytics across Languages and Communities”.

A 3-step Natural Language Processing pipeline was developed to generate research textual datasets that can be used in several types of digital humanities studies:

  1. Extract text: The textual content was extracted from each web-archived URL using the newspaper3k Python library. The language of each extracted text was determined using the langdetect library, to separate the texts written in different languages across distinct files;
  2. Clean extracted texts: a Python script was applied to clean the texts by removing unnecessary information (e.g.: repeated instances, empty lines, etc.);
  3. Double-check of language identification: the language of each cleaned extracted text was verified again to eliminate possible errors originated during the previous steps.

Two new research datasets are openly available!

The result was a dataset of cleaned and language-verified texts publicly available. Each file contains the texts in a given language about the 2019 European Union Elections. The distribution of extracted texts for each language is described in the figure below:

Number of tokens of each corpus extracted from the collection 2019 European Union Elections preserved by Arquivo.pt (EAWP23).
Number of tokens of each corpus extracted from the collection 2019 European Union Elections preserved by Arquivo.pt (EAWP23).

The aforementioned corpus was automatically annotated regarding part-of-speech and dependency relations to generate a corpus with syntactic information which is useful for linguistic studies. 

The multilingual model of the UDify tool (Kondratyuk and Straka, 2019) was applied. 

The texts in these annotated corpora followed the same order of the respective raw-texts files. Each sentence is annotated following the Universal Dependencies framework in the CoNNL-U format, which is the reference in terms of syntactic annotation in Natural Language Processing. Thus, each file in this dataset contains the annotated texts in a given language about the 2019 European Union Elections

The syntactically annotated texts about the 2019 European Elections are publicly available!

Know more

Arquivo.pt presentations at IIPC GA/WAC, RESAW 2023 and CLEOPATRA

Last updated on March 10th, 2024 at 05:23 pm

Meeting the Web Archive Community

The International Internet Preservation Consortium (IIPC), a consortium that brings together Web preservation initiatives from around the world, held its General Assembly with its members on May 10, 2023.

On the following days, May 11 and 12, the IIPC Web Archiving Conference (IIPC WAC) was held, an initiative open to the community, where people or entities not associated with the IIPC and interested in the Web preservation domain can participate.

The two events were jointly hosted by KB – National Library of the Netherlands, and by Beeld & Geluid – Netherlands Institute for Sound & Vision.

Contributions from the Arquivo.pt at the Web Archiving Conference

Arquivo.pt participated in the IIPC working group meetings (Training Working Group and Curators Working Group) and contributed with presentations in the thematic sessions Collaborations & Outreach and Program infrastructure (sessions 7 and 17).

  • Arquivo.pt updates 2023 (slides)
  • Linking web archiving with arts and humanities: the collaboration between ROSSIO and Arquivo.pt (video, slides)
  • Arquivo.pt behind the curtains (slides)

Meeting the RESAW research community

RESAW (Research Infrastructure for the Study of Archived Web Materials) is an initiative created in 2012 with the aim of promoting studies based on archived Web content, in areas such as Social Sciences, Digital Arts and Humanities.

The RESAW 2023 conference was held at the MUCEM Lab (Mediterranean Institute of Heritage Crafts) in Marseille on June 5-6, 2023, under the theme Exploring the Archived Web During a Highly Transformative Age.

Contributions from Arquivo.pt to RESAW 2023

Arquivo.pt contributed with presentations to the sessions Web Archive in Mediterranean area and its merge (4.A), From online Tools to Web Archive (6.B.), Towards a participatory approach to collections (9. A.), Digging up the materials for writing web history (9.B.).

  • How to research governmental web data? (abstract, slides)
  • Archiving Cryptocurrencies (abstract, slides)
  • Time to explore, time to learn from the archived web: Arquivo.pt training initiative (abstract, slides)
  • Exhibiting Web Memories from Arquivo.pt: a call for community participation (abstract, slides)

CLEOPATRA Project Meeting

The CLEOPATRA Project, led by the L3S Research Center at the Gottfried Wilhelm Leibniz University of Hannover, has developed since 2019 a training programme for doctoral researchers (Early Stage Researcher, PhD).

Arquivo.pt has participated in three courses: Incentives design for hybrid multilingual information processing and analytics, in Southampton; National and transnational media coverage of European parliamentary elections, 2004-2014, London; and NLP for under-resourced languages, in Zagreb, Croatia.

In 2022, the Arquivo.pt welcomed two researchers in its facilities who used the archived resources and received special support from the Arquivo.pt team to develop their research.

The CLEOPATRA Project ended in 2023 with a meeting on the 16th May, in Hannover, which brought together Professors, Researchers and representatives of the institutions involved.

Daniel Gomes, Arquivo.pt’s Manager, highlighted the new tools that Arquivo.pt makes available and the results of the work carried out by the researchers that have passed through Arquivo.pt.

Portuguese municipal elections 2021 preserved by Arquivo.pt

thumbnail_eleicoes_autarquicas

Last updated on May 8th, 2023 at 05:09 pm

Thousands of pages about the elections to preserve before they disappear

On 26 September 2021 the local elections were held in Portugal, an event marked by the Covid-19 pandemic. The communication of the candidates was mainly based on the media and publications through the Web.

Electoral websites are of manifest historical importance. However, they are difficult to identify because they appear and disappear quickly. In the case of municipal elections, the number of candidates and the variety of channels used makes the task even more challenging.

Arquivo.pt, as in previous elections, launched a special collection to preserve contents concerning the municipal elections.

How was the electoral content published on the Web identified

The first step was the manual identification of election-related content by municipality and parish. For this purpose help was requested from people and organisations with the following initiatives:

  • collaborative list “Municipal Elections 2021: we need your help!
  • request for collaboration from the archive services of the 308 municipalities in the identification of electoral sites and candidates of the respective municipality;
  • request to the Parties to send the names of their lead candidates.

The Eyedata – Social Data Lab site was used, which made the names of candidates from all over the country available on the Web.  The Wikipedia page Eleições autárquicas portuguesas de 2021 was also used as a source of information.

This manual identification process resulted in a list of 255 addresses which documented the candidacies for the 2021 Municipal Elections. Notice that 61% of the identified addresses pointed to private social media platforms: 54% facebook.com, 5% instagram.com and 2% twitter.com).

Much of this content of national interest could not be preserved because these foreign private companies do not allow it.

The list with names of candidates by county, party or coalition was used to create automatic searches in Bing that identified the most relevant electoral contents.

For instance, by combining the term “autárquicas 2021” with the name of a candidate and the respective municipality, one obtains results related to that candidate, such as news, initiatives of his/her campaign or the official page of his/her electoral campaign.

This methodology was applied in the Presidential Elections 2021 and in the Europeia Elections 2019. The technical report A transnational crawl of the European Parliamentary Elections 2019 details the applied methodology.

Content collection and availability in Arquivo.pt

Between 22nd August and 8th October 2021, the Arquivo.pt gathered, in an exhaustive manner, pages related to the Local Government Elections 2021.

The resulting collection called Municipal Elections 2021″ (EAWP39) gathers 31 million files that total 2.7 TeraBytes of information and will be available one year later.

Researchers who want to make a study on the 2021 Local Elections and need early access to the collected contents can contact Arquivo.pt.

To know more

H2020 projects preserved by Arquivo.pt

Thumbnail H2020 projects

Last updated on August 5th, 2024 at 04:50 pm

The main objective of Arquivo.pt is to preserve online information for research and education purposes.

Previously, Arquivo.pt identified and preserved Research & Development project websites funded by the European Union during the FP4, FP5, FP6 and FP7 programmes (1994-2013).

Now, Arquivo.pt contributed to preserve online information that documents R&D projects funded by the Horizon 2020 programme (2014-2021). It preserved 197 million web files (17 TB) related to science for future access.

H2020 projects publish valuable information online but are being lost

Websites about Research and Development (R&D) projects are increasingly being used to publish and disseminate important scientific information that complements published literature (e.g. data sets, documentation or software).

However, after projects ending, the corresponding websites usually disappear causing a permanent loss of unique and valuable scientific information.

Arquivo.pt automatically identified URLs that document H2020 Research and Development projects

The European Union’s Open Data Portal published a data set from the Community Research and Development Information Service (CORDIS) that documents H2020 research projects. However, from the 31 129 projects listed, only 46% presented a project URL.

Arquivo.pt developed a low-cost methodology that automatically identifies URLs related to R&D projects to be systematically preserved. This automatic identification is achieved through the combination of open data sets with web search services. This methodology is detailed on a scientific article published at the International Conference on Digital Preservation 2016.

In sum, we extracted 106 300 unique URLs from the following open data sets:

Then, we extracted the acronym and title of the projects from the data sets and automatically searched the web for additional URLs using the Bing Search API.

All the data sets and tools developed have been made publicly available in open access so that they can be reused and collaboratively enhanced. In particular, you can access the software developed to automatically identify additional URLs about H2020 projects.

197 million web files related to science were preserved

Arquivo.pt identified and preserved 197 million web files (17 TB) that document R&D projects funded by Horizon 2020.

In 2021, we can already witness project websites that are no longer available online, such as the Extended Model of Organic Semiconductors (EXTMOS) project (http://extmos.eu/). However, it was preserved and can be accessed at Arquivo.pt:

Archived version at Arquivo.pt (https://arquivo.pt/wayback/20170427182603/http://extmos.eu/) of the home page of the EXTMOS Research and Development project (http://extmos.eu/)funded by H2020.
Archived version at Arquivo.pt of the home page of the EXTMOS Research and Development project funded by H2020.

Contributions to complement the European Open Data Sets

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested on preserving this digital heritage:

If you want to know more information about this collection you can watch the video Preservation of web content related to Horizon 2020.

References

Are you a researcher?

Create automatic narratives about any topic!

thumbnail-narrative-q2

Last updated on August 5th, 2024 at 04:51 pm

Arquivo.pt provides a new function that allows you to automatically create temporal narratives on any topic.

The “Narrative” functionality, integrated into Arquivo.pt in September 2021, is the result of the collaboration between “Conta-me Histórias”, winner of the Arquivo.pt Award 2018, and Arquivo.pt.

The Conta-me Histórias” (Tell me Stories) project was developed by researchers from the Laboratory of Artificial Intelligence and Decision Support (LIAAD – INESCTEC )  and affiliated to the institutions Instituto Politécnico de Tomar – Center for Research in Smart Cities (CI2) ; University of Porto and University of Innsbruck .

How it works?

When a user enters a set of words about a topic in the Arquivo.pt search box and clicks on the “Narrative” button, the user is directed to the “Conta-me Histórias” service, which automatically analyzes the news from 25 websites archived by Arquivo.pt over time and presents a chronology of news related to the topic.

For example, if we search for “Just Bieber” and click on the “Narrative” button (Figure 1), we will be directed to the “Conta-me Histórias” , where we will automatically obtain a narrative of archived news (Figure 2).

example-narrative-arquivopt

Figure 1: Search results for pages about “Justin Bieber”.

example-tell-me-stories-arquivopt

Figure 2: Narrative of news about “Justin Bieber” from Portuguese news sites preserved by Arquivo.pt generated by the “Conta-me Histórias” service.

Create your narrative now!

“Conta-me Histórias” researches, analyzes and aggregates thousands of results to generate each narrative about a topic. It is recommended to choose descriptive words about well-defined themes, personalities or events to obtain good narratives.

Creating a narrative is useful for researchers, journalists or citizens who want to quickly get an overview of the evolution of a topic along time, thus saving them a lot of time and effort.

Go to Arquivo.pt and try to create a narrative about a theme of your choice.

Tell us about your experience so we can improve the service!

Presentations in the IIPC Web Archiving Conference and RESAW 2021

Thumbnail IIPC WAC 2021

Last updated on August 4th, 2024 at 06:22 pm

During the week of 14 to 18 June, three international meetings were held by videoconference with the participation of the Arquivo.pt:

    • International Internet Preservation Consortium (IIPC) – General Assembly – general assembly of the consortium that gathers the Web archiving initiatives around the world
    • Web Archiving Conference 2021 – the most important meeting in the field of Web preservation, where experts share new knowledge and experiences
    • RESAW Conference – meeting of the European RESAW network (Research Infrastructure for the Study of Archived Web Materials) this year in its 4th edition, mainly addressed to the community of researchers from non-technological scientific areas, such as Social Sciences, Arts and Humanities.

Contributions of Arquivo.pt to the international community

Arquivo.pt presented some results of the work developed in the last year, with emphasis on the functionalities that improve the reproduction of the archived contents, such as the “Complete the page”.
Two historical collections were integrated on the Arquivo.pt: the Geocities and the Internet Memory Foundation. Arquivo.pt did special collections about the 2019 European Elections and Covid-19.
The contents of Arquivo.pt are accessible to any researcher regardless of the country they are in and therefore it is a useful service to the international community.

Presentations

  • Arquivo.pt updates 2021: presentation at the IIPC – General Assembly, by Daniel Gomes (Vídeo)
  • Complete the page. 1 minute drop in (presentation at the IIPC – General Assembly “complete the page”), by Daniel Gomes (Slide, Video)
  • A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, by Ivo Branco (Slides, Vídeo)
  • Enhancing access to research the Geocities historical collection, by Pedro Gomes (Slides, Vídeo)
Complete the page - demo
Complete the page – demo. Slide used in the IIPC 1 minute presentation, at the IIPC General Assembly 2021