At the end of 2020, we recommend some texts that put the future in perspective.
We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).
I was invited to write about the challenges and threats to online archives. The first question that came to me was: what is meant by an “online archive”?
My concern lies in the “archives of the online” because there is not even an established awareness of the need for them, whether at an academic, governmental or individual level.
It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.
Welcome to the second season of the Online Cafe with Arquivo.pt
Talk directly to the Arquivo.pt team and get answers to all your questions! Arquivo.pt has launched a new cycle of online chat sessions between the team and you. Brief introductory presentations will be given, leaving time to ask all your questions about how to get more out of Arquivo.pt or how to apply to the Arquivo.pt Awards.
21st session – Billions of images to search on Arquivo.pt – all about the Arquivo.pt API
In March 2021 Arquivo.pt launched a new version with 1800 million images available. Image search in web archives at this scale is innovative and unique in the world. This session explained in detail the process used for indexing, as well as the best way to take advantage of these resources by using the API to create new works based on images.
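As a rough illustration of using the API for images, the sketch below only builds a request address for an image search. The endpoint path and the parameter names (`q`, `maxItems`, `offset`) are assumptions made for this example and should be checked against the official Arquivo.pt API documentation; no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint for image search; verify it in the official API documentation.
IMAGE_SEARCH_ENDPOINT = "https://arquivo.pt/imagesearch"


def build_image_search_url(query: str, max_items: int = 10, offset: int = 0) -> str:
    """Build the address of an image search request (hypothetical helper)."""
    # Parameter names are assumptions for illustration only.
    params = {"q": query, "maxItems": max_items, "offset": offset}
    return f"{IMAGE_SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_image_search_url("lisboa", max_items=5)
print(url)
```

The resulting address could then be fetched with any HTTP client and the JSON answer used as the starting point for a new work based on archived images.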
André Mourão, Ph.D. in Computer Science, works at Arquivo.pt (Portuguese Web Archive) on indexing information, especially images, from the Internet of the past to the present. He is a researcher working on ways to search and interpret multimedia data (e.g., images, text, video) effectively at large scales. He is also the co-creator of Revisionista.PT, which uncovers post-publication edits in Portuguese news articles (with Flávio Martins), and an Associated Member at the NOVA LINCS research center.
20th session – March 26 – The Online Centenary of the Great War
Daniela Major, invited speaker of the 20th Café with Arquivo.pt, presented a use case about historical research and old websites. The commemoration of the Centenary of the 1st World War generated several strategies for international cooperation in the 21st century, and this diversity is present in the information published on commemorative websites. In this session, Daniela Major showed how she used web archives to start a study in the context of Contemporary History, as well as the methodological implications for her work.
Daniela Major is a PhD student in Digital Humanities at the School of Advanced Study, University of London. In 2019, within the scope of the ROSSIO Infrastructure, she started at Arquivo.pt a study on the celebrations of World War I based on the preserved contents of the Web. Currently, her PhD focuses on the media impact of the idea of Europe over the past 15 years, in an effort to combine intellectual history with digital humanities.
This event, Cafe with Arquivo.pt, took place for the first time 1 year ago, on 27 March 2020. Cheers!
Special Session – March 2 – Arquivo.pt Award story and questions
Summary: The Arquivo.pt Award was created in 2018 in order to promote the use of Arquivo.pt for innovative works. How many candidates have applied since then? What areas did the works focus on? What is the balance between Studies and Applications? Who were the winners and what were their main contributions? These and other questions were the guidelines for this session dedicated to the 2021 Award.
This session was presented by Daniel Gomes and the Arquivo.pt team.
This session was dedicated to the websites of Portuguese newspapers. Diogo Silva da Cunha talked about his first contact with Arquivo.pt and his approach, embodied in a “research route” as a way of delimiting the scope of his analysis. He also presented the results of his research on the newspapers Correio da Manhã, Diário de Notícias, Expresso and Público.
17th session – January 15 – How to do an exhibition of old web pages without being an IT expert (tutorial)
This session, presented by Ricardo Basílio, the Arquivo.pt digital curator, focused on the practical aspects of preparing an exhibition of old pages, for example: the use of long links, typical of web archives, the graphic aspects to be taken into account and the navigation routes between web pages. WordPress.com was used as the platform to show how easy it is to build a web exhibition. The core aspects of disseminating content from web archives also apply to other platforms.
16th session – December 11 – Arquivo Económico .pt
Arquivo Económico .PT, authored by Nuno Bragança and awarded 3rd place in the Arquivo.pt Awards 2020, is a web app for discovering how the prices of a set of frequently used products changed on web pages over time and comparing them with current prices. Data are obtained automatically from Arquivo.pt, processed and presented in an intuitive way for the common user. The possibility of comparing the present with the past based on information from the archived web shows how it can be useful not only to satisfy curiosity but also to support studies in many areas.
In this session we met the winners of 2nd place in the Arquivo.pt Awards 2020. Rodrigo Marques and Hugo Silva talked about their work “Arquivo.pt Extension”, which is a browser extension that allows users to search on Arquivo.pt. They showed through practical examples how the extension saves time and makes access to Arquivo.pt easier.
The speaker for this session was the winner of the Arquivo.pt 2020 Award, Miguel Ramalho, who presented his work. “Desarquivo” is a web application that searches for entities on Arquivo.pt and returns a graph.
As in 2017, 2018 and 2019, we invited everyone to get to know Arquivo.pt, and to use it in research and in the preservation of memory.
Geocities.com was the first major “social network” which enabled anyone to create their website and publish information on the Web. It was created in 1994, acquired by Yahoo in 1999 and shut down in 2009.
By making the historical collection of Geocities available, Arquivo.pt intends to contribute to the development of innovative studies in areas such as Arts, Humanities or Sociology (see a project summary).
Thousands of web pages to tell the story of the pandemic in Portugal
Arquivo.pt has been carrying out special collections of web pages related to the Covid-19 pandemic since March 2020.
“Future academics, scientists and journalists who are studying the Portuguese response to the Covid-19 pandemic will want to read first-hand testimonies of those affected, official records of the number of victims, and recommendations from doctors, politicians and scientists at the time”, Público newspaper, May 1, 2020 edition.
Daily, content was collected from a set of 106 sites on the theme of Covid-19. This set includes, for example, websites for the media, government, associations and university initiatives.
Another set contains Twitter pages (108 identified in May), YouTube videos (815 identified in May) and also pages from Reddit and GitHub.
Suggestions from the community were included. For example, archivists from Sines (Portugal) collected local news related to Covid-19 (9 GB). The Revisionista.pt project also contributed and identified pages from newspapers. People sent suggestions through the public form.
Collaboration with IIPC for international collection
Arquivo.pt carried out 3 captures of the international collection compiled by the IIPC, the 1st on March 23, the 2nd on June 15 and the 3rd in late August, thus gathering international content useful for researchers worldwide.
Methodology for the selection of pages for the Covid-19 collection
We started by identifying terms related to the Coronavirus theme that included health, economic, political, geographic or organizational aspects.
Then, the Bing Azure service was used to automatically obtain, through a script, the following information for the first 10 results for each term: the page address, the title and the position in the results list.
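This step can be sketched as follows. It is a minimal illustration, not the script actually used: it assumes the search service returns JSON shaped like `{"webPages": {"value": [{"name": ..., "url": ...}]}}`, the function name `top_results` is hypothetical, and the sample answer is made up so that the example runs without calling any service.

```python
from typing import Any


def top_results(term: str, response: dict[str, Any], limit: int = 10) -> list[dict[str, Any]]:
    """Keep the address, title and position of the first `limit` results for a term."""
    # Assumed response shape: {"webPages": {"value": [{"name": ..., "url": ...}, ...]}}
    pages = response.get("webPages", {}).get("value", [])
    return [
        {"term": term, "position": position, "url": page.get("url"), "title": page.get("name")}
        for position, page in enumerate(pages[:limit], start=1)
    ]


# Made-up answer standing in for what the search service would return.
sample = {
    "webPages": {
        "value": [
            {"name": "Example page A", "url": "https://example.org/a"},
            {"name": "Example page B", "url": "https://example.org/b"},
        ]
    }
}

rows = top_results("covid-19", sample)
for row in rows:
    print(row["position"], row["url"], row["title"])
```

Running the same function over every term gives one list of candidate pages that can then be reviewed by hand.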
Considering the list of results, it was decided which software would be used and which settings would be the best to collect the pages.
For example, in the case of a newspaper section dedicated to Covid-19, it was necessary to decide whether to record just one page or whether it made sense to collect the entire site exhaustively.
Various types of software were used to collect the pages. Heritrix was used for the daily collections from 106 sites, Brozzler was chosen for capturing the 108 Twitter accounts, and videos were captured manually using Webrecorder and Browsertrix.
The winner of the 10,000 euro prize was the work “Desarquivo” developed by Miguel Ramalho.
“Desarquivo” is a website that enables searching for named entities (e.g. people, organizations and places) and identifying relationships among them, based on news published in online newspapers over time.
The search results are presented in the form of a graph or network of relationships that enables a journalist, researcher or any common citizen to dynamically explore the relationships among historical information preserved from the Web by Arquivo.pt.
For example, a user can explore ideological proximity among political parties over time.
This external service is useful for research use cases, in areas such as Web design, Art, Communication or History, where it is necessary to access the original visual aspect of a page from the past in the most reliable way possible.