Afghanistan Websites and the fall of the regime in August 2021


Last updated on September 26th, 2022 at 03:57 pm


Afghanistan Ministry of Economy website with Karima Faryabi (recorded August 17, 2021)

On August 15, 2021, the Taliban took over the presidential palace in Kabul, completing the fall of the regime that had been in place for 20 years, since the aftermath of the 9/11 attacks on the United States.

The community of Web archivists, through the Content Development Working Group of the International Internet Preservation Consortium (IIPC), was challenged to record Afghan websites, given the risk that they would disappear under the new regime.

No time to lose when it comes to preserving the Web

Arquivo.pt reacted quickly, launching an automatic content search focused on .af domain sites and on international media news about the ongoing events.

On August 17, the websites began to be recorded.

The crawl started from 1,800 website addresses from Afghanistan (ending in .af) and 500 media news stories from around the world.

The addresses, URLs or “seeds” were obtained through automated search using the Bing Search API and immediately put into recording.
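As a rough illustration of the .af part of this step, the sketch below filters a list of search-result URLs down to distinct seeds under the .af top-level domain. It does not call the Bing Search API; the input list, the host names and the `select_seeds` function are hypothetical, and the normalisation rules are assumptions rather than the procedure Arquivo.pt actually used.

```python
from urllib.parse import urlsplit


def select_seeds(candidate_urls, tld=".af"):
    """Keep one URL per distinct http(s) address whose host ends in `tld`."""
    seen = set()
    seeds = []
    for url in candidate_urls:
        url = url.strip()
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Skip non-web schemes and hosts outside the target domain.
        if parts.scheme not in ("http", "https") or not host.endswith(tld):
            continue
        # Assumed normalisation: ignore host case and a trailing slash.
        key = (host, parts.path.rstrip("/"), parts.query)
        if key not in seen:
            seen.add(key)
            seeds.append(url)
    return seeds


# Hypothetical search results, used only for illustration.
results = [
    "https://ministry.example.af/",
    "https://MINISTRY.example.af",
    "https://news.example.com/kabul",
    "http://portal.example.af/en/about",
]
seeds = select_seeds(results)
```

Here `seeds` keeps the first spelling of each distinct .af address and drops the duplicate and the non-.af result. The media news seeds mentioned above would need a different selection rule, since they are not restricted to .af.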

Content available to know Afghanistan’s history

As a result of the collection carried out, more than 400 Gigabytes of information became available at Arquivo.pt, which anyone can use for research in the most diverse areas.

The main contribution of Arquivo.pt to the community of Web archivists was the use of automatic search, which allows a quick reaction when recording Web content at imminent risk of being lost.

Know more

Arquivo.pt open data set (Dados.gov)

Content collected by the Content Development Working Group of the International Internet Preservation Consortium available at the Archive-it service

Portuguese municipal elections 2021 preserved by Arquivo.pt


Last updated on July 15th, 2022 at 11:42 am

Thousands of pages about the elections to preserve before they disappear

Local elections were held in Portugal on 26 September 2021, an event marked by the Covid-19 pandemic. Candidates communicated mainly through the media and through publications on the Web.

Electoral websites are of manifest historical importance. However, they are difficult to identify because they appear and disappear quickly. In the case of municipal elections, the number of candidates and the variety of channels used make the task even more challenging.

Arquivo.pt, as in previous elections, launched a special collection to preserve contents concerning the municipal elections.

How was the electoral content published on the Web identified

The first step was the manual identification of election-related content by municipality and parish. For this purpose help was requested from people and organisations with the following initiatives:

  • collaborative list “Municipal Elections 2021: we need your help!”;
  • request for collaboration from the archive services of the 308 municipalities in the identification of electoral sites and candidates of the respective municipality;
  • request to the Parties to send the names of their lead candidates.

The Eyedata – Social Data Lab site, which made the names of candidates from all over the country available on the Web, was also used. The Wikipedia page Eleições autárquicas portuguesas de 2021 served as a further source of information.

The list with names of candidates by county, party or coalition was used to create automatic searches in Bing that identified the most relevant electoral contents.

For instance, by combining the term “autárquicas 2021” with the name of a candidate and the respective municipality, one obtains results related to that candidate, such as news, initiatives of his/her campaign or the official page of his/her electoral campaign.
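The query construction described above can be sketched as follows. This is only an illustration: the candidate records, the `build_queries` function and the choice to put the candidate name in quotation marks are assumptions, not details taken from the Arquivo.pt methodology.

```python
def build_queries(candidates, term="autárquicas 2021"):
    """Combine the election term with each candidate's name and municipality."""
    queries = []
    for candidate in candidates:
        # Quoting the name (an assumption) asks the engine for an exact match.
        queries.append(
            f'{term} "{candidate["name"]}" {candidate["municipality"]}'
        )
    return queries


# Hypothetical candidate list, used only for illustration.
candidates = [
    {"name": "Maria Exemplo", "municipality": "Lisboa", "party": "Partido A"},
    {"name": "João Modelo", "municipality": "Porto", "party": "Coligação B"},
]
queries = build_queries(candidates)
```

Each resulting string, such as `autárquicas 2021 "Maria Exemplo" Lisboa`, would then be submitted to the search engine, and the returned addresses collected as seeds.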

This methodology was also applied in the Presidential Elections 2021 and in the European Elections 2019. The technical report A transnational crawl of the European Parliamentary Elections 2019 details the applied methodology.

Content collection and availability in Arquivo.pt

Between 22 August and 8 October 2021, Arquivo.pt exhaustively collected pages related to the Local Government Elections 2021.

The resulting collection, called “Municipal Elections 2021” (EAWP39), gathers 31 million files totalling 2.7 terabytes of information and will become available one year later.

Researchers who want to make a study on the 2021 Local Elections and need early access to the collected contents can contact Arquivo.pt.

To know more

Memory of events and festivals of art: PARA SEMPRE


Last updated on February 8th, 2022 at 10:57 am

The exhibition Memória de festivais e eventos de arte proposes a look at the Portuguese art scene present on the Web and includes a chronology of these events.

This online exhibition presents the results of the PARA SEMPRE project in a systematic and structured way.


Online exhibition – arteparasempre.wordpress.com

The project’s second online product will be a directory of references of artists, galleries and projects in the area of contemporary Portuguese art to be made available during 2022, at the Gulbenkian Art Library webpage.

Cycle of Webinars “Art forever on the web”

A cycle of Webinars entitled “Art forever on the web” was held, between April and July 2021, oriented to artists, curators, gallerists and event producers, among others.

Each session drew an average of 58 participants, who rated their satisfaction at an average of 4.6 on a scale from 1 to 5. The three sessions aimed at disseminating knowledge about the digital preservation of information on the web and the requirements for publishing preservable information.

Identification of artists, galleries and projects

The first step was to identify relevant artists, galleries and projects in the contemporary Portuguese art scene. We started from an initial set of 63 agents (artists, galleries and projects), to which 573 artists belonging to the Modern Collection of the Calouste Gulbenkian Foundation and the BAA – FCG Collection of Artist Books and Independent Publishing were added.

Throughout these months, 636 elements were thus identified (social networks and websites active in 2020), which were subsequently analysed.

The conclusions of the analysis carried out within the project were presented in the last webinar, held on July 1, 2021.

Special feature on art websites and blogs

In April 2021, Arquivo.pt made a special collection based on the initial identification of artists, galleries and projects and obtained 2.8 terabytes of preserved information.

New contents about art websites were recorded, using tools that allow higher quality collections, such as Brozzler and Webrecorder.

A collaborative project of digital curation

“PARA SEMPRE” (forever) is a digital curatorial project applied to the information made available on the web by the several agents of the contemporary Portuguese art scene (artists, galleries and hybrid sites).

Its main purpose is to contribute to the preservation/reuse of past and future pages, to ensure the preservation of the digital memory of current Portuguese art available at Arquivo.pt, and to promote knowledge on this theme by presenting it in a systematized and structured way.

It arose from the convergence of the missions of two organizations: Arquivo.pt, which aims to ensure the preservation of the Portuguese web, and the Calouste Gulbenkian Foundation Art Library, which acts as an agent in the development of knowledge about contemporary Portuguese art. The project is part of ROSSIO (Research Infrastructure in the Social Sciences, Arts and Humanities).

Arquivo.pt certified as an open data provider


Last updated on August 17th, 2022 at 08:39 am

Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) with the aim of improving the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e Tecnologia I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

The Portuguese Law No. 68/2021 of 2021-08-26 approves the general principles on open data and transposes the European Directive.

Arquivo.pt was certified as a Public Administration open data provider

AMA recognized Arquivo.pt as a public service and open data provider and awarded it a certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through API (https://arquivo.pt/api) or by reusing derived datasets.
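As a minimal sketch of the automatic access route, the snippet below only builds a request URL for a full-text search. The endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `from`, `to` and `maxItems` are assumptions here and should be checked against the documentation at https://arquivo.pt/api; the `build_search_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify it against https://arquivo.pt/api before use.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, year_from=None, year_to=None, max_items=10):
    """Build a search URL, optionally restricted to a range of years."""
    params = {"q": query, "maxItems": max_items}
    # Assumed timestamp format: YYYYMMDDHHMMSS.
    if year_from is not None:
        params["from"] = f"{year_from}0101000000"
    if year_to is not None:
        params["to"] = f"{year_to}1231235959"
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("Portal do Governo", year_from=2005, year_to=2010)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, and the response parsed according to the format the API documentation describes.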

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Video presentation at the IIPC Web Archiving Conference 2022

Special collection of Portuguese Presidential Elections

Form to suggest a web page, a web site or other web content

Arquivo.pt invites all citizens to suggest web pages related to the 2021 Presidential Elections to be preserved for the future.

The Presidential Elections will take place in Portugal on January 24, 2021.

Your suggestions are important so that Arquivo.pt can keep a more complete memory of this important electoral event.

To suggest web pages use this form (https://tinyurl.com/presidenciais-sugerir)

Cross-lingual collection about the 2019 European Elections is available


Last updated on August 30th, 2022 at 10:46 am

Screenshot of an archived page on Arquivo.pt: https://www.european-elections.eu

The special collection of web pages about the 2019 European Elections is available for search at Arquivo.pt.

To compile this collection, pages written in 24 European languages were identified through automatic searches on the Bing search engine and suggestions from 17 European countries.

We emphasize the collaboration of the Publications Office of the European Union, which reviewed the list of search terms in the different languages of the European Union.

Between May and July 2019, Arquivo.pt exhaustively collected pages related to the European Elections in several countries.

The resulting collection named “European Elections 2019” comprises 99 million web files that sum 4.8 Terabytes of information.

The technical report “A transnational crawl of the European Parliamentary Elections 2019” details the applied methodology. This methodology has also been applied to generate other thematic collections, such as the one about Covid-19.

We invite all citizens, especially researchers, to try this service, created specifically to search the cross-lingual and international collection about the 2019 European Elections: https://arquivo.pt/ee2019

Video “A transnational and cross-lingual crawl of the European Parliamentary Elections 2019”

A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, Ivo Branco, IIPC Web Archiving Conference and RESAW 2021 (slides)

To know more:

We preserved the Portuguese Local Elections of 2017

Last updated on August 30th, 2022 at 10:52 am

Arquivo.pt performed 2 web crawls of information related to the Portuguese Local Elections of 2017.

We appealed to the community to suggest relevant Web pages so that we could preserve them.

The 2 crawls occurred during and after the campaign period, using the list of 410 Web pages suggested by the community and 13 887 web pages found automatically using search engines.

The result was an archive of 2 265 887 Web resources (360 GB).

Among the preserved web pages are the official sites of the candidates, news, blogs and articles with personal opinions about the elections.

Arquivo.pt respects an embargo period of 1 year, and for that reason this collection will only be available by the end of 2018.

Meanwhile, you can consult the preserved pages about the previous elections of 2013, such as:

We would like to thank all the volunteers that collaborated with this initiative.