Last updated on November 16th, 2020 at 10:18 am
Suggest web pages about Covid-19
Arquivo.pt invites everyone to suggest web pages that document the Covid-19 pandemic so that they can be preserved for future access. Help us keep a complete memory of Portuguese life during this period.
Suggest pages using this form: https://tinyurl.com/arquivopt-covid19
Thousands of web pages to tell the story of the pandemic in Portugal
Arquivo.pt has been carrying out special collections of web pages related to the Covid-19 pandemic since March 2020.
“Future academics, scientists and journalists who are studying the Portuguese response to the Covid-19 pandemic will want to read first-hand testimonies of those affected, official records of the number of victims, and recommendations from doctors, politicians and scientists at the time”, Público newspaper, May 1, 2020 edition.
Content was collected daily from a set of 106 sites on the theme of Covid-19. This set includes, for example, websites of media outlets, government bodies, associations and university initiatives.
Suggestions from the community were included. For example, archivists from Sines (Portugal) collected local news related to Covid-19 (9 GB). The Revisionista.pt project also contributed by identifying pages from newspapers, and people sent suggestions through the public form.
Collaboration with IIPC for international collection
In February 2020, the International Internet Preservation Consortium (IIPC), the main organization for web preservation, proposed to its members a collection about the Novel Coronavirus (Covid-19) outbreak.
Arquivo.pt contributed 1 237 seeds, mainly in Portuguese. With successive contributions from other countries, the IIPC collection reached over 7 000 pages in July 2020.
A form is also available for anyone to suggest content for this international collection.
Arquivo.pt crawled the international collection compiled by the IIPC three times: first on March 23, then on June 15, and again in late August. This gathered international content useful to researchers worldwide.
Methodology for the selection of pages for the Covid-19 collection
We started by identifying terms related to the Coronavirus theme that included health, economic, political, geographic or organizational aspects.
Then, the Bing Azure service was used to automatically obtain, through a script, the following information for the first 10 results for each term: the page address, the title and the position in the results list.
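The script itself is not reproduced here, so the following is only a minimal sketch of the step just described: for each term, keep the address, title and position of the first 10 results. It assumes the JSON response shape of the Bing Web Search API v7 (`webPages.value` entries with `name` and `url`), and the endpoint, header name and function names are illustrative assumptions rather than the ones Arquivo.pt used.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint (Bing Web Search API v7); not taken from the original script.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def extract_top_results(term, response, limit=10):
    """Return one row per result: search term, position, title and address."""
    # Assumed response shape: {"webPages": {"value": [{"name": ..., "url": ...}]}}
    pages = response.get("webPages", {}).get("value", [])
    return [
        {
            "term": term,
            "position": position,  # 1-based rank in the results list
            "title": page.get("name", ""),
            "url": page.get("url", ""),
        }
        for position, page in enumerate(pages[:limit], start=1)
    ]


def search_term(term, api_key, limit=10):
    """Query the search service for one term (needs network access and a valid key)."""
    query = urllib.parse.urlencode({"q": term, "count": limit})
    request = urllib.request.Request(
        f"{BING_ENDPOINT}?{query}",
        headers={"Ocp-Apim-Subscription-Key": api_key},  # assumed header name
    )
    with urllib.request.urlopen(request) as reply:
        return extract_top_results(term, json.load(reply), limit)
```

Calling `search_term` once per term and concatenating the returned rows would give a single results table that can be reviewed before choosing crawl settings. Keeping the parsing in `extract_top_results` separate from the network call means it can be checked against a saved response without a subscription key.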
Considering the list of results, it was decided which software would be used and which settings would be the best to collect the pages.
For example, in the case of a newspaper section dedicated to Covid-19, it was necessary to decide whether to record just one page or whether it made sense to collect the entire site exhaustively.
Various types of software were used to collect the pages. Heritrix was used for the daily collections from 106 sites, Brozzler was chosen for capturing 108 Twitter accounts, and videos were captured manually using Webrecorder and Browsertrix.
- Coronavirus (Covid-19) Search Terms
- Results obtained through the Bing Azure search service
- Contribution of Arquivo.pt to the 1st international collection (1,235 addresses) – 20 February
- Contribution of Arquivo.pt to the 2nd international collection (75 addresses) – 16 March
- All addresses of the international collection collected by Arquivo.pt – June 15th
- List of websites about Covid-19 in Portugal for daily collection with Heritrix
- List of Twitter pages for the national collection with Brozzler
- List of YouTube videos for the national collection with Webrecorder
- Crawl logs of the Arquivo.pt Covid-19 collection (10 GB)