Last updated on September 7th, 2020 at 10:35 am
Suggest web pages about Covid-19
Arquivo.pt invites everyone to suggest web pages that document the Covid-19 pandemic to be preserved for future access.
Help us keep a complete memory of Portuguese life during this period.
Suggest pages using this form: https://tinyurl.com/arquivopt-covid19
Thousands of web pages to tell the story of the pandemic in Portugal
Arquivo.pt has been carrying out special collections of web pages related to the Covid-19 pandemic since March 2020.
“Future academics, scientists and journalists who are studying the Portuguese response to the Covid-19 pandemic will want to read first-hand testimonies of those affected, official records of the number of victims, and recommendations from doctors, politicians and scientists at the time”, Público newspaper, May 1, 2020 edition.
Content was collected daily from a set of 106 sites on the theme of Covid-19. This set includes, for example, websites of media outlets, government bodies, associations and university initiatives.
Another set comprises Twitter pages (108 identified in May), YouTube videos (815 identified in May) and pages from Reddit and GitHub.
Suggestions from the community were included. For example, archivists from Sines (Portugal) collected local news related to Covid-19 (9 GB). The “Revisionista.pt” project also contributed by identifying pages from newspapers. Members of the public sent suggestions through the public form.
Collaboration with IIPC for international collection
In February 2020, the International Internet Preservation Consortium (IIPC), the main international organization for web preservation, proposed to its members a collection about the Novel Coronavirus (Covid-19) outbreak.
Arquivo.pt contributed 1237 seeds, mainly in Portuguese. With successive contributions from other countries, the IIPC collection reached over 7000 pages in July 2020.
A form is also available for anyone to suggest content for this international collection.
Arquivo.pt crawled the international collection compiled by the IIPC twice, first on March 23 and again on June 15, thus bringing together international content useful for national and worldwide researchers.
Methodology for the selection of pages for the Covid-19 collection
We started by identifying terms related to the Coronavirus theme that included health, economic, political, geographic or organizational aspects.
Then, the Bing Azure service was used to automatically obtain, through a script, the following information for the first 10 results for each term: the page address, the title and the position in the results list.
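The step above can be illustrated with a minimal Python sketch. It is not the script used by Arquivo.pt, which is not reproduced here. It assumes the Bing Web Search API v7 endpoint, its Ocp-Apim-Subscription-Key header and its webPages.value response layout; the function names extract_results and search are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Bing Web Search API v7 endpoint. The service and script
# actually used for the collection may have differed.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def extract_results(term, response, limit=10):
    """Return (term, position, address, title) rows for the first `limit` results."""
    pages = response.get("webPages", {}).get("value", [])
    return [
        (term, position, page["url"], page["name"])
        for position, page in enumerate(pages[:limit], start=1)
    ]


def search(term, api_key, limit=10):
    """Query the search service for one term and parse the JSON reply."""
    query = urllib.parse.urlencode({"q": term, "count": limit})
    request = urllib.request.Request(
        f"{BING_ENDPOINT}?{query}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_results(term, json.load(reply), limit)
```

Calling search once per term and writing the returned rows to a CSV file would yield a list with the page address, title and position of each result, which is the information described above.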
Considering the list of results, it was decided which software would be used and which settings would be the best to collect the pages.
For example, in the case of a newspaper section dedicated to Covid-19, it is necessary to decide whether to record just one page or whether it makes sense to collect the entire site exhaustively.
Various types of software were used to collect the pages. For example, Heritrix was used for the daily collections from 106 sites, Brozzler was chosen to capture the 108 Twitter pages, and videos were captured manually using Webrecorder and Browsertrix.
Learn more
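The choice of tool per type of content can be sketched as a simple routing function. This is only an illustration: in practice the decision was made by people looking at the list of results, and the rule that every other address goes to Heritrix is a simplifying assumption. The function name choose_tool and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def choose_tool(url):
    """Suggest a capture tool for an address, based only on its host name."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "twitter.com" or host.endswith(".twitter.com"):
        return "Brozzler"
    if host == "youtu.be" or host == "youtube.com" or host.endswith(".youtube.com"):
        # Videos were captured manually, not by an unattended crawler.
        return "Webrecorder/Browsertrix (manual)"
    # Assumption: regular websites go to the daily Heritrix crawl.
    return "Heritrix"


for address in (
    "https://twitter.com/example_account",
    "https://www.youtube.com/watch?v=example",
    "https://news.example.pt/coronavirus",
):
    print(address, "->", choose_tool(address))
```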
Coronavirus (Covid-19) Search Terms
Results obtained through the Bing Azure search service
Contribution of Arquivo.pt to the 1st international collection (1,235 addresses) – 20 February
Contribution of Arquivo.pt to the 2nd international collection (75 addresses) – 16 March
All addresses of the international collection collected by Arquivo.pt – June 15th
List of websites about Covid-19 in Portugal for daily collection with Heritrix
List of Twitter pages for the national collection with Brozzler
List of YouTube videos for the national collection with Webrecorder
Crawl logs of the Arquivo.pt Covid-19 collection (10 GB)