Portuguese municipal elections 2021 preserved by Arquivo.pt

Last updated on July 15th, 2022 at 11:42 am

Thousands of pages about the elections to preserve before they disappear

Local elections were held in Portugal on 26 September 2021, an event marked by the Covid-19 pandemic. Candidates communicated mainly through the media and through publications on the Web.

Electoral websites are of manifest historical importance. However, they are difficult to identify because they appear and disappear quickly. In the case of municipal elections, the number of candidates and the variety of channels used make the task even more challenging.

Arquivo.pt, as in previous elections, launched a special collection to preserve contents concerning the municipal elections.

How electoral content published on the Web was identified

The first step was the manual identification of election-related content by municipality and parish. To this end, help was requested from people and organisations through the following initiatives:

  • collaborative list “Municipal Elections 2021: we need your help!”;
  • request for collaboration from the archive services of the 308 municipalities in the identification of electoral sites and candidates of the respective municipality;
  • request to the parties to send the names of their lead candidates.

The Eyedata – Social Data Lab site, which made the names of candidates from all over the country available on the Web, was also used. The Wikipedia page Eleições autárquicas portuguesas de 2021 served as a further source of information.

The list of candidate names by municipality, party or coalition was used to create automatic Bing searches that identified the most relevant electoral content.

For instance, by combining the term “autárquicas 2021” with the name of a candidate and the respective municipality, one obtains results related to that candidate, such as news, initiatives of his/her campaign or the official page of his/her electoral campaign.
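The query construction described above can be sketched as follows. This is only a minimal illustration, not the script used by Arquivo.pt: the `Candidate` record, the `build_query` function and the sample names are hypothetical, and quoting each part as a phrase is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; the real candidate lists may use other fields.
    name: str
    municipality: str
    party: str


def build_query(candidate: Candidate, term: str = "autárquicas 2021") -> str:
    """Combine the election term with the candidate name and municipality."""
    # Each part is quoted so that a search engine can treat it as a phrase.
    return f'"{term}" "{candidate.name}" "{candidate.municipality}"'


# Placeholder data, not real candidates.
candidates = [
    Candidate("Maria Exemplo", "Lisboa", "Partido A"),
    Candidate("João Exemplo", "Porto", "Coligação B"),
]
queries = [build_query(c) for c in candidates]
```

Each resulting string can then be submitted to a web search service, and the top results reviewed as candidate pages for the collection.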

This methodology was also applied in the Presidential Elections 2021 and in the European Elections 2019. The technical report A transnational crawl of the European Parliamentary Elections 2019 details the applied methodology.

Content collection and availability in Arquivo.pt

Between 22nd August and 8th October 2021, Arquivo.pt exhaustively gathered pages related to the Local Government Elections 2021.

The resulting collection, called “Municipal Elections 2021” (EAWP39), gathers 31 million files totalling 2.7 terabytes of information and will become available one year later.

Researchers who want to study the 2021 Local Elections and need early access to the collected content can contact Arquivo.pt.

H2020 projects preserved by Arquivo.pt

Last updated on December 6th, 2021 at 05:03 pm

The main objective of Arquivo.pt is to preserve online information for research and education purposes.

Previously, Arquivo.pt identified and preserved Research & Development project websites funded by the European Union during the FP4, FP5, FP6 and FP7 programmes (1994-2013).

Arquivo.pt has now contributed to preserving online information that documents R&D projects funded by the Horizon 2020 programme (2014-2021). It preserved 197 million web files (17 TB) related to science for future access.

H2020 projects publish valuable information online, but it is being lost

Websites about Research and Development (R&D) projects are increasingly being used to publish and disseminate important scientific information that complements published literature (e.g. data sets, documentation or software).

However, after projects end, the corresponding websites usually disappear, causing a permanent loss of unique and valuable scientific information.

Arquivo.pt automatically identified URLs that document H2020 Research and Development projects

The European Union’s Open Data Portal published a data set from the Community Research and Development Information Service (CORDIS) that documents H2020 research projects. However, of the 31 129 projects listed, only 46% presented a project URL.

Arquivo.pt developed a low-cost methodology that automatically identifies URLs related to R&D projects so that they can be systematically preserved. This automatic identification is achieved by combining open data sets with web search services. The methodology is detailed in a scientific article published at the International Conference on Digital Preservation 2016.

In sum, we extracted 106 300 unique URLs from the following open data sets:

Then, we extracted the acronym and title of the projects from the data sets and automatically searched the web for additional URLs using the Bing Search API.

All the data sets and tools developed have been made publicly available in open access so that they can be reused and collaboratively enhanced. In particular, you can access the software developed to automatically identify additional URLs about H2020 projects.

197 million web files related to science were preserved

Arquivo.pt identified and preserved 197 million web files (17 TB) that document R&D projects funded by Horizon 2020.

As of 2021, some project websites are already no longer available online, such as that of the Extended Model of Organic Semiconductors (EXTMOS) project (http://extmos.eu/). However, it was preserved and can be accessed at Arquivo.pt:

Archived version at Arquivo.pt (https://arquivo.pt/wayback/20170427182603/http://extmos.eu/) of the home page of the EXTMOS Research and Development project (http://extmos.eu/) funded by H2020.
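
The archived address quoted above appears to consist of a fixed prefix, a 14-digit capture timestamp and the original URL. The following sketch builds and parses addresses of that shape. The layout is inferred from this single example, reading the timestamp as YYYYMMDDhhmmss is an assumption, and the function names are hypothetical.

```python
import re
from datetime import datetime

WAYBACK_PREFIX = "https://arquivo.pt/wayback/"
# Assumed layout: prefix, 14-digit timestamp (YYYYMMDDhhmmss), original URL.
_PATTERN = re.compile(r"^https://arquivo\.pt/wayback/(\d{14})/(.+)$")


def build_archived_url(original_url: str, captured_at: datetime) -> str:
    """Compose an archived address from the original URL and a capture time."""
    return f"{WAYBACK_PREFIX}{captured_at:%Y%m%d%H%M%S}/{original_url}"


def parse_archived_url(archived_url: str) -> tuple[datetime, str]:
    """Split an archived address into its capture time and original URL."""
    match = _PATTERN.match(archived_url)
    if match is None:
        raise ValueError(f"not an archived address of the assumed shape: {archived_url}")
    timestamp, original_url = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


captured_at, original = parse_archived_url(
    "https://arquivo.pt/wayback/20170427182603/http://extmos.eu/"
)
```

Under that reading, the EXTMOS home page shown here was captured on 27 April 2017.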

Contributions to complement the European Open Data Sets

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested in preserving this digital heritage:

To learn more about this collection, watch the video Preservation of web content related to Horizon 2020.

Are you a researcher?

Create automatic narratives about any topic!

Arquivo.pt provides a new function that allows you to automatically create temporal narratives on any topic.

The “Narrative” functionality, integrated into Arquivo.pt in September 2021, is the result of the collaboration between “Conta-me Histórias”, winner of the Arquivo.pt Award 2018, and Arquivo.pt.

The “Conta-me Histórias” (Tell me Stories) project was developed by researchers from the Laboratory of Artificial Intelligence and Decision Support (LIAAD – INESCTEC) affiliated with the Instituto Politécnico de Tomar – Center for Research in Smart Cities (CI2), the University of Porto and the University of Innsbruck.

How does it work?

When a user enters a set of words about a topic in the Arquivo.pt search box and clicks on the “Narrative” button, the user is directed to the “Conta-me Histórias” service, which automatically analyzes the news from 25 websites archived by Arquivo.pt over time and presents a chronology of news related to the topic.

For example, if we search for “Justin Bieber” and click on the “Narrative” button (Figure 1), we are directed to the “Conta-me Histórias” service, where a narrative of archived news is automatically generated (Figure 2).

Figure 1: Search results for pages about “Justin Bieber”.

Figure 2: Narrative of news about “Justin Bieber” from Portuguese news sites preserved by Arquivo.pt generated by the “Conta-me Histórias” service.
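
To give a rough idea of what a chronology of news involves, the sketch below filters dated headlines by topic words and groups them by month. It is a deliberately simplified illustration and not the analysis performed by “Conta-me Histórias”: the `build_chronology` function is hypothetical and the headlines are invented placeholders.

```python
from collections import defaultdict
from datetime import date


def build_chronology(headlines, topic):
    """Group headlines that mention every topic word by (year, month)."""
    words = topic.lower().split()
    timeline = defaultdict(list)
    # Sorting the (date, title) pairs first keeps the periods in time order.
    for published, title in sorted(headlines):
        if all(word in title.lower() for word in words):
            timeline[(published.year, published.month)].append(title)
    return dict(timeline)


# Invented placeholder headlines, not real archived news.
headlines = [
    (date(2016, 11, 20), "Placeholder headline B about Justin Bieber"),
    (date(2013, 3, 11), "Placeholder headline A about Justin Bieber"),
    (date(2016, 11, 25), "Placeholder headline C about Justin Bieber"),
    (date(2014, 5, 2), "Placeholder headline about another topic"),
]
chronology = build_chronology(headlines, "justin bieber")
```

The unrelated headline is dropped, and the remaining ones are grouped under March 2013 and November 2016.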

Create your narrative now!

“Conta-me Histórias” searches, analyzes and aggregates thousands of results to generate each narrative about a topic. To obtain good narratives, choose descriptive words about well-defined themes, personalities or events.

Creating a narrative is useful for researchers, journalists or citizens who want to quickly get an overview of how a topic evolved over time, saving them a lot of time and effort.

Go to Arquivo.pt and try to create a narrative about a theme of your choice.

Tell us about your experience so we can improve the service!

Presentations in the IIPC Web Archiving Conference and RESAW 2021

Last updated on August 17th, 2021 at 07:35 pm

During the week of 14 to 18 June, three international meetings were held by videoconference with the participation of Arquivo.pt:

    • International Internet Preservation Consortium (IIPC) – General Assembly – general assembly of the consortium that gathers the Web archiving initiatives around the world
    • Web Archiving Conference 2021 – the most important meeting in the field of Web preservation, where experts share new knowledge and experiences
    • RESAW Conference – meeting of the European RESAW network (Research Infrastructure for the Study of Archived Web Materials) this year in its 4th edition, mainly addressed to the community of researchers from non-technological scientific areas, such as Social Sciences, Arts and Humanities.

Contributions of Arquivo.pt to the international community

Arquivo.pt presented some results of the work developed in the last year, with emphasis on the functionalities that improve the reproduction of archived content, such as “Complete the page”.
Two historical collections were integrated into Arquivo.pt: Geocities and the Internet Memory Foundation. Arquivo.pt also created special collections about the 2019 European Elections and Covid-19.
The contents of Arquivo.pt are accessible to any researcher, regardless of the country they are in, which makes it a useful service for the international community.

Presentations

  • Arquivo.pt updates 2021: presentation at the IIPC – General Assembly, by Daniel Gomes (Video)
  • Complete the page. 1 minute drop in (presentation at the IIPC – General Assembly “complete the page”), by Daniel Gomes (Slide)
  • A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, by Ivo Branco (Slides, Video)
  • Enhancing access to research the Geocities historical collection, by Pedro Gomes (Slides, Video)
Complete the page – demo. Slide used in the IIPC 1 minute presentation, at the IIPC General Assembly 2021

New “query suggestions” on Arquivo.pt!

Arquivo.pt launched a new version, named Caronte, on January 19, 2021.

In this version we improved the query suggestions feature (Did you mean).

Whenever a user enters a query containing a potential error, Arquivo.pt presents a suggestion for an alternative query.

For example, when searching for “now york” you get the suggestion “Did you mean: new york”.

Figure 1: Example of the query suggestion feature when searching for the term “now york”.
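
One simple way to produce such a suggestion is to compare the query with a list of known queries and propose the closest one. The sketch below does this with Python's standard `difflib` module. It is an illustration of the general idea only and is not how Arquivo.pt implements the feature: the `suggest_query` function, the list of known queries and the similarity cutoff are all assumptions.

```python
from difflib import get_close_matches
from typing import Optional


def suggest_query(query: str, known_queries: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known query, or None if no suggestion applies."""
    normalized = query.strip().lower()
    if normalized in known_queries:
        # The query is already a known one, so nothing needs to be suggested.
        return None
    matches = get_close_matches(normalized, known_queries, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical list of known queries.
known = ["new york", "lisboa", "porto"]
suggestion = suggest_query("now york", known)
if suggestion is not None:
    print(f"Did you mean: {suggestion}")
```

A production feature would more likely rely on query logs or term statistics from the index, which this sketch does not attempt to model.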

The call for applications to the Arquivo.pt Award 2021, open until 4th May, was also highlighted on the home page.

Help us to improve!

To help us, just search Arquivo.pt using any device (e.g. laptop, mobile phone, tablet).

If you encounter any problem, please contact us!

Remember to always send us the address of the page where you detected the problem.

Arquivo.pt preserves websites of national scientific projects

Last updated on October 1st, 2021 at 09:11 am

Preserving scientific project websites is important

The contents of the websites tend to disappear when the scientific projects are finished.

The preservation of scientific project websites is important because it:

  • documents the development of projects;
  • ensures access to unique technical and scientific content that researchers have posted on the project websites (e.g. presentations, photographs, data sets);
  • reinforces the visibility of the results of projects financed by FCT.

Experimental collection of scientific project websites in 2016

In 2016, Arquivo.pt automatically collected the websites of projects financed by FCT.

The information about these websites was dispersed as it was not recorded during the administrative process.

FCT had financed scientific projects for about 20 years, so the number of sites was likely too high for them to be identified manually.

Arquivo.pt therefore developed an automatic methodology for identifying these websites.

The FCT database had a total of 11 996 project entries but did not include references to web addresses. Applying the automatic methodology, 7 956 URLs related to the funded scientific projects were identified.

The collection of content referenced by these addresses resulted in the preservation of 600 721 files (72 GB), including content such as research group web pages, researchers’ personal pages or project-related blogs.

Online references in scientific project reports have been preserved since 2020

Since June 2020, the website addresses of projects financed by FCT must be registered in the progress and final reports submitted to FCT.

Arquivo.pt started using these addresses to preserve the contents of websites of national scientific projects in a systematic way.

1st official collection of scientific project websites

In June 2020, Arquivo.pt obtained 263 addresses related to 100 scientific projects from the reports submitted to FCT. Most of the addresses (67%) did not have any version previously preserved in Arquivo.pt.

The addresses obtained point to online resources such as the websites of the projects, R&D units, news in the media, articles in scientific journals or repositories, databases, videos on YouTube or Facebook pages.

In July 2020, a special collection was launched from this set of addresses, which resulted in 6.9 GB of information obtained by visiting 31,606 URLs.

Exhibition about Research & Development projects

The Scientific Research Memory is an online exhibition dedicated to the websites of scientific projects funded by the Foundation for Science and Technology (FCT) that Arquivo.pt has preserved.

There are also websites of the Research & Development Units financed by FCT.

Memorial do Arquivo.pt preserves scientific websites for free

The Memorial do Arquivo.pt service has preserved historic FCT websites that have been disabled. These were created for events or initiatives that have ended and therefore their contents are no longer updated.

To include a website in the Memorial, Arquivo.pt starts by making a high-quality collection of its contents.

Then, the collected contents are validated in collaboration with those responsible for the website.

Finally, the website address is redirected to the contents that have been preserved by Arquivo.pt.

For example, if someone wants to access any page on the Scientific Archives Meeting held in 2014, they will be redirected to Arquivo.pt.

Thus, the contents remain accessible over time, and any existing links and references in scientific communications do not break.

The digital preservation service Memorial do Arquivo.pt is free of charge for websites of the academic and scientific community: just send a request to contacto@arquivo.pt.

Online archives or archives of the online?

At the end of 2020, we recommend some texts that put the future in perspective.

We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).

I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?

My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.

It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.

The complete text (in Portuguese) is available on pages 23 to 26 of the open-access book “Tendências 2021”.

The challenge is to cultivate awareness about the importance of preserving content online by learning how to do it in practice.

Happy New Year!

World Digital Preservation Day 2020

Last updated on November 23rd, 2020 at 06:20 pm

On November 5, World Digital Preservation Day, Arquivo.pt held an online session open to the community.

Registration form (free but required)

The speaker for this session was the winner of the Arquivo.pt 2020 Award, Miguel Ramalho, who presented his work. “Desarquivo” is a web application that searches for entities on Arquivo.pt and returns a graph.

As in 2017, 2018 and 2019, we invited everyone to get to know Arquivo.pt, and to use it in research and in the preservation of memory.

World Digital Preservation Day is promoted by the Digital Preservation Coalition (UK) and is an occasion for initiatives around the world, shared on social networks with the WDPD2020 hashtag.

Agenda:

November 5th

3:00 pm – Welcome! Presentation of the Arquivo.pt team (slides, 1 MB, PDF)
3:05 pm – Archive News – Daniel Gomes (slides, 2.6 MB, PDF)
3:15 pm – Desarquivo, 1st place in the Arquivo.pt Awards 2020, by Miguel Ramalho (slides, 3 MB, PDF)
3:45 pm – Questions
4:00 pm – Conclusion

Session video

Satisfaction query

Collection about Covid-19 in Portugal

Last updated on June 18th, 2021 at 08:26 am

Suggest web pages about Covid-19

Arquivo.pt invites everyone to suggest web pages that document the Covid-19 pandemic to be preserved for future access. Help us to keep a complete memory of Portuguese life during this period.

Suggest pages using this form: https://tinyurl.com/arquivopt-covid19

Thousands of web pages to tell the story of the pandemic in Portugal

Arquivo.pt has been carrying out special collections of web pages related to the Covid-19 pandemic since March 2020.

“Future academics, scientists and journalists who are studying the Portuguese response to the Covid-19 pandemic will want to read first-hand testimonies of those affected, official records of the number of victims, and recommendations from doctors, politicians and scientists at the time”, Público newspaper, May 1, 2020 edition.

Content was collected daily from a set of 106 sites on the theme of Covid-19. This set includes, for example, websites of the media, government, associations and university initiatives.

Another set includes Twitter pages (108 identified in May), YouTube videos (815 identified in May) and also pages from Reddit and GitHub.

Suggestions from the community were included. For example, archivists from Sines (Portugal) collected local news related to Covid-19 (9 GB). The Revisionista.pt project also contributed by identifying pages from newspapers. People also sent suggestions through the public form.

Collaboration with IIPC for international collection

In February 2020, the International Internet Preservation Consortium (IIPC), the main organization on Web preservation, proposed to its members a collection about the Novel Coronavirus (Covid-19) outbreak.

Arquivo.pt contributed with 1 237 seeds, mainly in Portuguese. With successive contributions from other countries, the IIPC collection reached over 7 000 pages in July 2020.

A form is also available for anyone to suggest content for this international collection.

The IIPC collection “Novel Coronavirus (COVID-19)” is accessible via the Internet Archive's Archive-It service.

Arquivo.pt carried out three crawls of the international collection compiled by the IIPC (the 1st on March 23, the 2nd on June 15 and the 3rd in late August), thus gathering international content useful to researchers worldwide.

Methodology for the selection of pages for the Covid-19 collection

We started by identifying terms related to the Coronavirus theme that included health, economic, political, geographic or organizational aspects.

Then, the Bing Azure service was used to automatically obtain, through a script, the following information for the first 10 results for each term: the page address, the title and the position in the results list.
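
The extraction step described above can be sketched as follows. The sketch assumes a response shaped like the JSON returned by the Bing Web Search API v7, where results are listed under `webPages.value` with `name` and `url` fields. The source does not state which API version or response format was used, so that shape, the `extract_top_results` function and the sample data are assumptions.

```python
def extract_top_results(term: str, response: dict, limit: int = 10) -> list[dict]:
    """Return the address, title and position of the first results for a term."""
    # Assumed response shape: {"webPages": {"value": [{"name": ..., "url": ...}]}}
    pages = response.get("webPages", {}).get("value", [])
    return [
        {"term": term, "position": position, "url": page["url"], "title": page["name"]}
        for position, page in enumerate(pages[:limit], start=1)
    ]


# Invented sample response with placeholder URLs.
sample_response = {
    "webPages": {
        "value": [
            {"name": "Placeholder page one", "url": "https://example.org/one"},
            {"name": "Placeholder page two", "url": "https://example.org/two"},
        ]
    }
}
rows = extract_top_results("covid-19 portugal", sample_response)
```

Each row can then be written to a spreadsheet or CSV file so that the results for every term can be reviewed before deciding how to collect them.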

Considering the list of results, it was decided which software would be used and which settings would be the best to collect the pages.

For example, in the case of a newspaper section dedicated to Covid-19, it was necessary to decide whether to record just one page or whether it made sense to collect the entire site exhaustively.

Various types of software were used to collect the pages. Heritrix was used for the daily collections from 106 sites, Brozzler was chosen to capture the 108 Twitter accounts, and videos were captured manually using Webrecorder and Browsertrix.

Meet the winners of the Arquivo.pt Award 2020!

Last updated on February 18th, 2022 at 12:33 pm

The winners of the Arquivo.pt 2020 Award were announced by the Público newspaper, the official media partner of this year’s edition, which granted an honorable mention to the best work based on the contents of the newspaper. A total of 29 candidate works were received.

The award ceremony took place during Science 2020 – Meeting with Science and Technology, on November 4, at the Lisbon Congress Center.

1st place – “Desarquivo”

The winner of the 10,000 euros prize was the work “Desarquivo”, developed by Miguel Ramalho.

“Desarquivo” is a website that enables searching for named entities (e.g. people, organizations and places) and identifying relationships among them, based on news published in online newspapers over time.

The search results are presented in the form of a graph or network of relationships that enables a journalist, researcher or any common citizen to dynamically explore the relationships among historical information preserved from the Web by Arquivo.pt.

For example, a user can explore ideological proximity among political parties over time.
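
A common way to derive such a network of relationships is to count how often two entities are mentioned in the same news article. The sketch below shows that general technique only. It is not Miguel Ramalho's implementation: it assumes the entities of each article have already been extracted, and the `build_cooccurrence_graph` function and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(articles):
    """Count how often each pair of entities appears in the same article."""
    edges = Counter()
    for entities in articles:
        # Sorting makes (A, B) and (B, A) count as the same undirected edge,
        # and the set removes repeated mentions within one article.
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges


# Placeholder entity lists, one per article.
articles = [
    ["Party A", "Party B"],
    ["Party B", "Party A", "Lisboa"],
    ["Lisboa"],
]
graph = build_cooccurrence_graph(articles)
```

The edge weights (here, 2 for the pair “Party A” and “Party B”) can be computed per time period to see how a relationship strengthens or weakens over time.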

2nd place – “Arquivo.pt Extension”

The 2nd prize, in the amount of 3,000 euros, was awarded to the work “Arquivo.pt Extension”, a browser extension developed by Rodrigo Marques and Hugo Silva.

This extension enables users to perform advanced searches on Arquivo.pt directly from the browser, without having to leave the page they are currently viewing.

The “Arquivo.pt Extension” is available for download in the Chrome Web Store.

3rd place – “Arquivo Económico .pt”

The 3rd prize, in the amount of 2,000 euros, was awarded to the work “Arquivo Económico .pt” by Nuno Bragança.

The “Arquivo Económico .pt” organizes and presents information preserved by Arquivo.pt about the prices of products since the time of the escudo, the former Portuguese currency.

As a result, we have a website that enables searching the price of consumer goods by different categories, such as supermarket, transportation or others, on given dates.

For example, users can easily find out how much a trip from Lisbon to Porto or a cell phone call cost in 1999.

Honorable Mention granted by Público newspaper

Jornal Público, official partner of the 3rd edition of the Arquivo.pt Award, granted its Honorable Mention to the work “Jornal do Passado”, developed by Bruno Galhardo.

“Jornal do Passado” is a game for all ages, developed for Android, in which users test their knowledge about news or events by guessing the date on which they occurred.

As a result, we have an app that enables searching the historical information preserved by Arquivo.pt in a pedagogical and fun way.

Image gallery

Award ceremony at the closing session of the Science 2020 meeting (Encontro Ciência 2020), in the main auditorium of the Lisbon Congress Center