Geocities.com was the first major “social network” that enabled anyone to create their own website and publish information on the Web. It was created in 1994, acquired by Yahoo in 1999 and shut down in 2009.
By making the historical collection of Geocities available, Arquivo.pt intends to contribute to the development of innovative studies in areas such as Arts, Humanities or Sociology (see a project summary).
Thousands of web pages to tell the story of the pandemic in Portugal
Arquivo.pt has been carrying out special collections of web pages related to the Covid-19 pandemic since March 2020.
“Future academics, scientists and journalists who are studying the Portuguese response to the Covid-19 pandemic will want to read first-hand testimonies of those affected, official records of the number of victims, and recommendations from doctors, politicians and scientists at the time”, Público newspaper, May 1, 2020 edition.
Content was collected daily from a set of 106 sites on the theme of Covid-19. This set includes, for example, websites of media outlets, government bodies, associations and university initiatives.
Another set comprises Twitter pages (108 identified in May), YouTube videos (815 identified in May) and pages from Reddit and GitHub.
Suggestions from the community were included. For example, archivists from Sines (Portugal) collected local news related to Covid-19 (9 GB). The Revisionista.pt project also contributed by identifying pages from newspapers. Members of the public sent suggestions through the public form.
Collaboration with IIPC for international collection
Arquivo.pt carried out three crawls of the international collection compiled by the IIPC: the first on March 23, the second on June 15 and the third in late August, thus gathering international content useful to researchers worldwide.
Methodology for the selection of pages for the Covid-19 collection
We started by identifying terms related to the Coronavirus theme that included health, economic, political, geographic or organizational aspects.
Then, the Bing Azure service was used to automatically obtain, through a script, the following information for the first 10 results for each term: the page address, the title and the position in the results list.
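The result-gathering step can be sketched as follows. This is only an illustration under stated assumptions, not the script that was actually used: the `search` argument stands in for a call to the search service, and `fake_search`, `collect_top_results` and the field names are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List


def collect_top_results(
    terms: List[str],
    search: Callable[[str], List[Dict[str, str]]],
    top_n: int = 10,
) -> List[Dict[str, object]]:
    """Record address, title and rank of the first `top_n` results per term."""
    rows: List[Dict[str, object]] = []
    for term in terms:
        hits = list(search(term))[:top_n]  # keep only the first top_n results
        for position, hit in enumerate(hits, start=1):
            rows.append(
                {
                    "term": term,
                    "position": position,  # 1-based rank in the results list
                    "url": hit["url"],
                    "title": hit["title"],
                }
            )
    return rows


def fake_search(term: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the search service: returns 12 dummy results."""
    return [
        {"url": f"https://example.org/{term}/{i}", "title": f"{term} result {i}"}
        for i in range(1, 13)
    ]


rows = collect_top_results(["covid-19", "coronavirus"], fake_search)
```

Passing the search function as a parameter keeps the ranking logic independent of any particular search provider, so the same loop can be reused if the service changes.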
Based on the list of results, we decided which software to use and which settings were best suited to collecting the pages.
For example, in the case of a newspaper section dedicated to Covid-19, it was necessary to decide whether to record just one page or whether it made sense to collect the entire site exhaustively.
Various types of software were used to collect the pages. Heritrix was used for the daily collections from the 106 sites. Brozzler was chosen for capturing the 108 Twitter accounts, and videos were captured manually using Webrecorder and Browsertrix.
The winner of the 10,000-euro prize was the work “Desarquivo”, developed by Miguel Ramalho.
“Desarquivo” is a website that enables searching for named entities (e.g. people, organizations and places) and identifying relationships among them, based on news published in online newspapers over time.
The search results are presented in the form of a graph or network of relationships that enables a journalist, researcher or any citizen to dynamically explore the relationships among historical information preserved from the Web by Arquivo.pt.
For example, a user can explore ideological proximity among political parties over time.
Talk directly to the Arquivo.pt team and get answers to all your questions!
The Arquivo.pt team chats with you through online sessions.
Brief introductory presentations will be given, leaving time to ask all your questions about how to get more out of Arquivo.pt or how to apply to the Arquivo.pt Awards.
Sessions held in the 1st season
1st session, 27 March – Website Preservation: Do It Yourself!
The 1st session (in Portuguese) was about Website Preservation: Do It Yourself! and featured Ricardo Basílio (Digital Curator of Arquivo.pt) and Daniel Gomes (Manager of Arquivo.pt).
The app meuParlamento.pt was the winner of the Arquivo.pt Award 2019. Nuno Moniz presented the relevance of this app to citizen participation in politics. Arian Pasquali and Tomás Amaro, also authors of this work, were present. The session continued with questions related to the development of works based on Arquivo.pt.
3rd session, April 17 – Arquivo.pt Award and News on Arquivo.pt
After the Easter break, the Arquivo.pt Online Café was back, presented by Daniel Gomes. This session was dedicated to answering questions from those who were finalizing their work to compete for the Arquivo.pt Award. Finally, the new interface of Arquivo.pt was presented.
6th session, May 8 – Arquivo.pt API – How to process data at large scale?
André Mourão, R&D engineer, explained the Arquivo.pt APIs (Application Programming Interfaces) through examples and cases in the session held on 8 May. One doesn’t need to be an IT expert to see the potential of the API when used in research or to build new tools.
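As a rough illustration of what querying such an API can look like, the sketch below builds a full-text search request. It assumes the public endpoint `https://arquivo.pt/textsearch` with the parameters `q`, `maxItems`, `from` and `to`, and a JSON response containing a `response_items` list; these details are assumptions to be checked against the current Arquivo.pt API documentation, and the function names are hypothetical.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current Arquivo.pt API documentation.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_query_url(
    query: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    max_items: int = 50,
) -> str:
    """Build a search URL; `start`/`end` are timestamps such as 20200301000000."""
    params: Dict[str, object] = {"q": query, "maxItems": max_items}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def fetch_items(url: str) -> List[dict]:
    """Download the JSON response and return its (assumed) list of result items."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload.get("response_items", [])


# Building the URL needs no network access; fetch_items(example_url) would.
example_url = build_query_url("covid-19", start="20200301000000", max_items=5)
```

For processing at larger scale, the same URL builder could be called in a loop over many terms or date ranges, with the results written to disk between requests.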
7th session, May 15 – Website Preservation: Do It Yourself!
Ricardo Basílio, Arquivo.pt’s web curator, presented a tutorial dedicated to Webrecorder and Browsertrix. These tools are useful for capturing websites locally on a small scale. By demonstrating how they work, Arquivo.pt wants to encourage the community: anyone can make a selection of pages or websites and preserve them in a standardized format.
8th session, May 22 – The history of video games on the Portuguese web
Miguel Costa, a web developer passionate about the Web, technologies and video games, talked about the main figures of the national video game business and about the first Portuguese video game. In Arquivo.pt he found archived video game files and a lot of related information.
9th session, May 29 – Straight Edge in the metropolitan area of Lisbon
In the 9th session of the Café, we took a closer look at Straight Edge and its presence in the punk/hardcore scene of the Lisbon metropolitan area in the 1990s. Diogo Duarte, anthropologist and researcher at the Contemporary History Institute of Universidade Nova de Lisboa, talked about his work on the topic and about the importance of Arquivo.pt for studying this movement and other expressions of popular culture.
10th session, June 5 – Health and Internet: an evolution
Health and Internet was the topic of the 10th session of the Arquivo.pt Café, presented by Rita Espanha, professor and researcher at ISCTE (University Institute of Lisbon) and CIES (Centre for Research and Studies in Sociology). The Internet has become the preferred medium through which citizens seek information and build their own knowledge in all areas of their lives, including health. State agencies have in turn developed services that use the Internet. Part of the population, however, has been left out because it has not kept up with this change. The other part, which has easy access to information, does not always have the critical sense to evaluate that information and use it to its advantage. All of these issues became more evident during the Covid-19 pandemic.
“Tell me Stories” (“Conta-me Histórias”) is a service that creates temporal narratives based on the contents preserved by Arquivo.pt. This application was the winner of the Arquivo.pt Award 2018. One of its authors, Ricardo Campos (IPT; INESC TEC), talked about the latest developments of the service. Arian Pasquali, a member of the development team, also took part in the discussion.
Researchers in NLP (Natural Language Processing) will find in this session an excellent use case explained in detail by its author. Miguel Won, researcher at INESC-ID (Lisbon), talked about the opinion sections of the media. How do commentators read events and how does this reflect their political position? Based on this question, he developed the Web application Arquivo de Opinião, awarded in 2018, which presents a history of the opinion columns of Portuguese newspapers, built from pages preserved by Arquivo.pt. In this session we learned about the latest developments of the project, which now also collects pages from social networks.
14th session, July 10 – Museum of Portuguese Web Design
Sandra Antunes, Professor at the School of Technology and Management of Viseu (ESTGV), spoke about virtual spaces for the memory of Portuguese Web design and showed the importance of a museum to fill gaps in the areas of preservation, exhibition and history of Portuguese Web design.
The exhibition of Arquivo.pt is being displayed at the library of the Faculty of Sciences of the University of Lisbon (FCUL) until April 30.
Eight posters with old web pages invite students, researchers and professors to use Arquivo.pt in their work and to apply to the Arquivo.pt 2020 Award. There will be a training session at FCUL on March 12, from 4:30 p.m. to 6:00 p.m., in room 1.3.15.
This exhibition has been going through several Higher Education institutions, but in the case of FCUL it is a return to its origins.
Arquivo.pt was officially launched at the FCCN in November 2007, with the aim of collecting and preserving Portuguese Web content using specific technologies similar to those of the Internet Archive.
Three researchers from FCUL were part of the core team. They developed the Arquivo.pt service in its early years. In 2010, they presented a prototype of a search service, a Google for the past, innovative in the context of web archiving.
Currently, Arquivo.pt also has an image search and an API (Application Programming Interface). It maintains the perspective followed by the first project at FCUL, which is based on creating useful services for the community.
Arquivo.pt Memorial is the most recent service, created for institutions that want to keep old sites accessible even after disconnecting them from their servers. As an example, you can visit the Minema project (finished years ago) and see how this service works.