Tutorial: how to explore Arquivo.pt using Python

Last updated on November 30th, 2022 at 03:10 pm

The Programming Historian aims to develop digital skills among Humanities researchers through the publication of practical lessons in several languages.

The call “Computational analysis skills for large-scale humanities data” resulted in 7 new lessons.

One of them was the tutorial “Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt” developed by Daniel Gomes and Ricardo Campos.

It shows how to explore the Arquivo.pt user interface and its Application Programming Interface (API) to execute advanced queries, process large amounts of data or build new services, such as Tell me stories.

All the resources developed are freely available in open access.
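As a rough illustration of what an advanced query through the API can look like, the sketch below builds a full-text search URL and condenses a JSON response into a short list. It is not taken from the tutorial: the endpoint `https://arquivo.pt/textsearch`, the parameters `q`, `from`, `to` and `maxItems`, and the response fields `response_items`, `tstamp`, `title` and `linkToArchive` are assumptions to be checked against the official Arquivo.pt API documentation, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt full-text search API (verify in the official docs).
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_textsearch_url(query, date_from=None, date_to=None, max_items=10):
    """Build a search URL; dates are assumed to use the YYYYMMDDHHMMSS format."""
    params = {"q": query, "maxItems": max_items}
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def summarize_items(payload):
    """Reduce a decoded JSON response to (timestamp, title, archived link) tuples.

    The field names used here are assumptions about the response layout.
    """
    return [
        (item.get("tstamp", ""), item.get("title", ""), item.get("linkToArchive", ""))
        for item in payload.get("response_items", [])
    ]
```

The URL can then be fetched with any HTTP client, and the decoded JSON passed to `summarize_items`. Keeping URL construction separate from the network call makes the query logic easy to check offline.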

Open-access resources of the tutorial “Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt”

 

 

Cultural heritage on the Web: the online presence of museums

Last updated on July 7th, 2022 at 09:26 pm

The Portuguese Museums Network was the community invited to participate in the cycle of three webinars entitled “Cultural Heritage on the Web: online presence of museums”.

The aim is to raise awareness among museum managers and professionals about the importance of preserving content published on the Web and to make known the services and tools of Arquivo.pt.

This initiative is promoted by the Direção Geral do Património Cultural, through the Departamento de Museus, Conservação e Credenciação and the Divisão de Museus e Credenciação, which welcomed the proposal of Arquivo.pt (FCT, I.P.) and integrated it into its training offer.

Information and materials

June 21, 2022 – Arquivo.pt and the preservation of digital memory (1st webinar)

In this session, Arquivo.pt is presented as a useful service for museums and institutions, one the community can count on to preserve digital cultural heritage, specifically Web content.

  • Speaker: Ricardo Basílio, digital curator (standing in for Daniel Gomes, manager of Arquivo.pt)
  • Duration: 15h30 -17h00
  • Slides (PDF)
  • Video

June 22, 2022 – Publishing Well to Preserve Well (2nd Webinar)

This session deals with the aspects that an institution must take into account to create and maintain preservable websites.

  • Speaker: Pedro Gomes, responsible for the Arquivo.pt collections
  • Duration: 15h30 -17h00
  • Slides
  • Video

June 27, 2022 – Archiving the Web: DIY (3rd Webinar)

This session offers a tutorial on creating a local web archive, recording content in a standard format and using open tools that anyone can use.

  • Speaker: Ricardo Basílio, digital curator
  • Duration: 15h30 -17h00
  • Video
  • Slides

June 28, 2022 – Repeat of the first session (extra session)

Open session for those who were not able to participate in the 1st session.

  • Speaker: Ricardo Basílio, digital curator
  • Duration: 15h30 -17h00
  • Video
  • Slides

Online exhibition: discover museums’ online presence over time

 

Municipality of Sines and Arquivo.pt together on the International Archives Day

Last updated on June 27th, 2022 at 08:40 am

The Municipal Archive of the Municipality of Sines and Arquivo.pt celebrated the International Archives Day, June 9, at the Salão Nobre dos Paços do Concelho, with a Workshop on preserving the digital memory of Sines (Portugal).

The meeting was broadcast online with the aim of sharing with the community of archivists what has been an experience of collaborative curation of Web content.

Collaboration between a municipal archive and a web archive

This meeting continued a collaboration between the two teams that developed during the pandemic period.

The Arquivo Municipal de Sines made a selective and systematic collection of Web content related to the Municipality of Sines, with the collaboration of local media, such as Rádio Miróbriga and Rádio Sines.

In turn, Arquivo.pt contributed training on tools such as Webrecorder.net, which records in a standardized format, and prepared useful services such as SavePageNow, which allows pages to be recorded on the fly directly on Arquivo.pt.

Local history is better with preserved Web pages

This collaboration resulted in the preservation of thousands of Web pages (about 200 Gigabytes of information) about the experience of the pandemic in the geographical area of Sines and Santiago do Cacém.

The copies of the Web Archive files (WARCs) sent to Arquivo.pt have been integrated so that they become available.

Presentations

Training in collaboration with the City Council of Lisbon

Last updated on December 13th, 2021 at 12:02 pm

A cycle of webinars was held between October and December 2021, organised by the Department of Development and Training of the Municipality of Lisbon, within the digital skills program Passaporte Competências Digitais of the Câmara Municipal de Lisboa, in collaboration with Centro Qualifica +ValorLx, the Infraestrutura ROSSIO and Arquivo.pt (Fundação para a Ciência e a Tecnologia, I.P.).

The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.

The sessions were open by registration and had a total of 126 participants (average of 31 per session).

The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.

Sessions held

September 15 – Arquivo.pt. What is it? What is it for?

Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explains how any citizen can use it to consult Web pages from the past in the most diverse cases and talks about the importance of preserving digital memory.

November 11 – Arquivo.pt API: automatic access to preserved Web information

Vasco Rato, web developer at Arquivo.pt, presented the Arquivo.pt APIs (Application Programming Interfaces). These enable organizations to develop innovative and useful applications through the automatic processing of historical information preserved from the Web.

November 25 – Archive the Web: do-it-yourself!

Ricardo Basílio, digital curator of Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record Web pages in a standardized format on one's own computer, which allows a person or an organization to build their own small-scale web archive.

December 9 – Publish on the Web: best practices by Arquivo.pt

Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web content. How much content is in formats that make future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.

Learn more about Arquivo.pt training

Arquivo.pt is open to collaborations aimed at training professionals in organizations or ordinary citizens in Web preservation.

Learn about the training modules and contact us.

 

Book “The Past Web: Exploring Web Archives” available in Green Open access!

Last updated on September 13th, 2022 at 04:15 pm

No book reflecting the state of the art in web preservation and the research conducted on web archives had been published since 2006.

The main goal of the new book The Past Web: Exploring Web Archives was to create a new, up-to-date resource to educate more people in the field of web preservation and to make web archives known to researchers and academics.

As such, the book is primarily aimed at the academic and scientific communities, and presents the most innovative methods for exploring information from the past preserved by web archives.

Daniel Gomes, head of Arquivo.pt, led the book’s editorial team, along with the field specialists Elena Demidova, Jane Winters and Thomas Risse. In total, the book resulted from the contributions of 40 authors from around the world who are experts in web archiving.

The book is divided into 6 parts where we find various resources for exploring pages archived from the Internet since the 1990s.

We can also learn how to preserve our collective memory in the Digital Era, which strategies to use when selecting online content, and what impact web archives have on preserving historical information.

The book aims to support professors in their mission to transmit innovative and adequate knowledge for the digital literacy required to train professionals for the 21st century.

Daniel Gomes, from Arquivo.pt, draws attention to the need to include web archives in teaching plans and emphasizes that this knowledge brings a great competitive advantage, especially for students of Humanities and Social Sciences.

An innovative detail of this book is that all its cited links have been preserved by Arquivo.pt so that the references remain valid over time.

The book was available for free to be downloaded from Portuguese higher education institutions (b-On member entities) until March 6th 2022.

However, you can still download a pre-print version of the book (Green Open Access).

Links

Book launch at Jornadas FCCN 2021

Book presentation

“Art Forever on the Web”: Cycle of Webinars

Last updated on July 6th, 2021 at 01:23 pm

Colectiva de Artistas. 2008.04.19 a 2008.06.07. Galeria Quadrado Azul. Porto. Composition from a Webpage preserved on Arquivo.pt: www.quadradoazul.pt, 22nd October 2008.

On April 29, May 27 and July 1, from 3 to 4:30 pm, webinars geared to the community of artists, curators, gallerists and event producers will be held, open also to anyone interested in learning more about preserving art websites.

Throughout the sessions, participants will learn in detail about the functionalities of Arquivo.pt in order to take advantage of this public Web preservation service. They will have technical information, in the form of recommendations and best practices, to create preservable websites. Finally, they will learn how to use available tools to save their websites in a standardized format so that their contents are not lost.

This cycle of Webinars is an initiative of the “Forever” Project, a collaboration between the Calouste Gulbenkian Foundation Art Library and Arquivo.pt under the ROSSIO infrastructure.

For more details and sharing, please see the program (PDF) (in Portuguese).

Sign up!

April 29 – The Arquivo.pt and the preservation of digital memory
May 27 – Recommendations for creating preservable websites for the future
July 1 – Archiving the Web: do-it-yourself!

Presentations from the sessions held

Online archives or archives of the online?

At the end of 2020, we recommend some texts that put the future in perspective.

We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).

I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?

My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.

It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.

The complete text (in Portuguese) is available on pages 23 to 26 of the open-access book “Tendências 2021”.

The challenge is to cultivate awareness about the importance of preserving content online by learning how to do it in practice.

Happy New Year!

Online Cafe with Arquivo.pt is back

Last updated on August 23rd, 2022 at 04:17 pm

Share this page: arquivo.pt/onlinecafe

Welcome to the second season of the Online Cafe with Arquivo.pt

Talk directly to the Arquivo.pt team and get answers to all your questions! Arquivo.pt has launched a new cycle of online sessions where the team chats with you. Brief introductory presentations will be given, leaving time to ask all your questions about how to get more out of Arquivo.pt or how to apply to the Arquivo.pt Awards.

Sessions held

21st session – Billions of images to search on Arquivo.pt – all about the Arquivo.pt API

In March 2021 Arquivo.pt launched a new version with 1800 million images available. Searching for images in web archives at this scale is innovative and unique in the world. In this session, the indexing process was explained in detail, as well as the best way to take advantage of these resources using the API to create new works based on images.

André Mourão, Ph.D. in Computer Science, works at Arquivo.pt (Portuguese Web Archive) on indexing information, especially images, from the Internet of the past to the present. He is a researcher working on ways to search and interpret multimedia data (e.g., images, text, video) effectively at large scales. He is also the co-creator of Revisionista.PT, which uncovers post-publication edits in Portuguese news articles (with Flávio Martins), and an Associated Member at the NOVA LINCS research center.

20th session – March 26 – The Online Centenary of the Great War

Daniela Major, invited speaker of the 20th Café with Arquivo.pt, presented a use case about historical research and old websites. The commemoration of the Centenary of the 1st World War generated several strategies for international cooperation in the 21st century, and this diversity is present in the information published on commemorative websites. In this session, Daniela Major showed how she used Web archives to start a study in the context of Contemporary History, as well as the methodological implications for her work.

Daniela Major is a PhD student in Digital Humanities at the School of Advanced Study, University of London. In 2019, within the scope of the ROSSIO Infrastructure, she started a study at Arquivo.pt on the commemorations of World War I based on preserved Web content. Currently, her PhD focuses on the media impact of the idea of Europe over the past 15 years, in an effort to combine intellectual history with digital humanities.

This event, Cafe with Arquivo.pt, took place for the first time 1 year ago, on 27 March 2020. Cheers!

Special Session – March 2 – Arquivo.pt Award story and questions

Summary: The Arquivo.pt Award was created in 2018 to promote the use of Arquivo.pt for innovative works. How many candidates have applied so far? What areas did the works focus on? What is the balance between Studies and Applications? Who were the winners and what were their main contributions? These and other questions guided this session dedicated to the 2021 Award.

This session was presented by Daniel Gomes and the Arquivo.pt team.

18th session – Newspapers and web archives

This session was dedicated to the websites of Portuguese newspapers. Diogo Silva da Cunha talked about his first contact with Arquivo.pt and his approach, which took shape as a “research route” as a way of delimiting the scope of his analysis. He also presented the results of his research on the newspapers Correio da Manhã, Diário de Notícias, Expresso and Público.

Diogo Silva da Cunha is a PhD student at the Institute of Social Sciences at the University of Lisbon and a collaborator at the Center for Philosophy of Sciences at the same university. He recently published a study on the preservation of newspapers’ web pages in the book “O choque tecno-liberal, os media e o jornalismo: estudos críticos sobre a realidade Portuguesa”. In 2017, he participated in the Digital Humanities project “Investiga XXI” at Arquivo.pt.

17th session – January 15 – How to do an exhibition of old web pages without being an IT expert (tutorial)

This session, presented by Ricardo Basílio, the Arquivo.pt digital curator, focused on the practical aspects of preparing an exhibition of old web pages, for example: the use of long links, typical of web archives; graphic aspects to take into account; and navigation routes between web pages. WordPress.com was used as the platform to show how easy it is to build a web exhibition. The core principles of disseminating content from Web archives also apply to other platforms.

  • Video and presentation: (soon)

To learn more about Portuguese newspapers in Arquivo.pt, see the exhibition Memória da Imprensa Portuguesa.

16th session – December 11 – Arquivo Económico .pt

Arquivo Económico .PT, authored by Nuno Bragança and winner of 3rd place in the Arquivo.pt Awards 2020, is a web app that lets users discover the prices of a set of frequently used products on web pages over time and compare them with current prices. The data are obtained automatically from Arquivo.pt, processed and presented in a way that is intuitive for the ordinary user. Comparing the present with the past based on archived web information shows how it can be useful not only to satisfy curiosity but also to support studies in many areas.

Satisfaction survey for this presentation

15th session – November 24 – Arquivo.pt Extension

In this session we met the winners of 2nd place in the Arquivo.pt Awards 2020. Rodrigo Marques and Hugo Silva talked about their work “Arquivo.pt Extension”, which is a browser extension that allows users to search Arquivo.pt. They showed through practical examples how the extension saves time and eases access to Arquivo.pt.

Satisfaction survey for this presentation

Special session – World Digital Preservation Day 2020 – November 5

In November, World Digital Preservation Day is broadly celebrated and, to mark this international initiative, Arquivo.pt held an online session open to the community. The special guest of this session was the winner of the Arquivo.pt Award 2020, Miguel Ramalho, who told us about his work entitled “Desarquivo”.

Sessions held between March and July 2020

Online Cafe with Arquivo.pt

Last updated on November 24th, 2020 at 05:18 pm

Welcome to the Arquivo.pt Online Cafe!

Talk directly to the Arquivo.pt team and get answers to all your questions!

The Arquivo.pt team chats with you through online sessions.

Brief introductory presentations will be given, leaving time to ask all your questions about how to get more out of Arquivo.pt or how to apply to the Arquivo.pt Awards.

Sessions held in the 1st season

1st session, 27 March – Website Preservation: Do It Yourself!

The 1st session (in Portuguese) was about Website Preservation: Do It Yourself! and featured Ricardo Basílio (Digital Curator of Arquivo.pt) and Daniel Gomes (Manager of Arquivo.pt).

2nd session, April 3 – meuParlamento.pt

The app meuParlamento.pt was the winner of the Arquivo.pt Award 2019. Nuno Moniz presented the relevance of this app to citizen participation in politics. Arian Pasquali and Tomás Amaro, also authors of this work, were present. The session continued with questions related to the development of works based on Arquivo.pt.

3rd session, April 17 – Arquivo.pt Award and News on Arquivo.pt

After the Easter break, the Arquivo.pt Online Café was back, presented by Daniel Gomes. This session was dedicated to clarifying doubts for those who were finalizing their work to compete for the Arquivo.pt Award. Finally, the new interface of Arquivo.pt was presented.

4th session, April 24 – Revisionista.PT – Uncovering the News

Flávio Martins and André Mourão, creators of Revisionista.pt, talked about this tool, which uses Arquivo.pt to show the revisions made to a news item after its publication in newspapers.
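The idea of surfacing post-publication edits can be sketched by comparing two archived captures of the same article. The example below is only an illustration using Python's standard `difflib` module. It is not how Revisionista.pt is implemented, and the function name and sample texts are hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return the lines removed ("- ") or added ("+ ") between two captures."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical text of one article captured at two different moments.
first_capture = "Headline\nThe event gathered 10 people."
later_capture = "Headline\nThe event gathered 100 people."

for line in changed_lines(first_capture, later_capture):
    print(line)
```

In practice the two texts would come from archived versions of the same URL, with the page markup stripped before comparison.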

5th session, April 30 – Public speeches about violence in private

Zélia Teixeira, Professor at Fernando Pessoa University and psychologist, presented an analysis of 217 news items on domestic violence collected in Arquivo.pt from the three main daily newspapers.

6th session, May 8 – Arquivo.pt API – How to process data at large scale?

André Mourão, R&D engineer, explained the Arquivo.pt APIs (Application Programming Interfaces) through examples and cases in the session held on 8 May. One doesn’t need to be an IT expert to see the potential of the API when used in research or new tools.

7th session, May 15 – Website Preservation: Do It Yourself!

Ricardo Basílio, Arquivo.pt’s web curator, presented a tutorial dedicated to Webrecorder and Browsertrix. These tools are useful for capturing websites locally on a small scale. By demonstrating how they work, Arquivo.pt wants to encourage the community: anyone can make a selection of pages or websites and preserve them in a standardized format.

8th session, May 22 – The history of video games on the Portuguese web

Miguel Costa, a Web developer passionate about the Web, technologies and video games, talked about the main figures of the national video game business and about the first Portuguese video game. In Arquivo.pt he found archived video game files and a lot of information.

9th session, May 29 – Straight Edge in the metropolitan area of Lisbon

In the 9th session of the Café, we got to know Straight Edge and its presence in the punk/hardcore scene of the metropolitan area of Lisbon in the 90s more closely. Diogo Duarte, anthropologist and researcher at the Contemporary History Institute of Universidade Nova de Lisboa, talked about his work dedicated to the theme and about the importance of Arquivo.pt for studying this movement and other expressions of popular culture.

10th session, June 5 – Health and Internet: an evolution

Health and the Internet was the topic of the 10th session of the Arquivo.pt Café, presented by Rita Espanha, professor and researcher at ISCTE (University Institute of Lisbon) and CIES (Centre for Research and Studies in Sociology). The Internet has become the preferred medium where citizens seek information and build their own knowledge in all areas of their lives, including health. State agencies, in turn, have developed services that use the Internet. Part of the population has been left out, having not kept up with this change. The part of the population that does have easy access to information does not always have the critical sense to evaluate it and use it to their advantage. All of these issues became more evident during the Covid-19 pandemic.

11th session, June 19 – Creating and managing preservable websites

The Arquivo.pt team presented a set of good practices for publishing information on the Web so that it can be preserved.

12th session, June 26 – “Tell me Stories”, “Conta-me Histórias”

“Tell me Stories”, “Conta-me Histórias” is a service that creates temporal narratives based on the contents preserved by Arquivo.pt. This application was the winner of the Arquivo.pt Award 2018. One of its authors, Ricardo Campos (IPT; INESC TEC), talked about the service’s developments. Arian Pasquali, member of the development team, also participated in the discussion.

13th session, July 3 – Arquivo de Opinião

Researchers in NLP (Natural Language Processing) will find in this session an excellent use case explained in detail by its author. Miguel Won, researcher at INESC-ID (Lisbon), talked about the opinion sections of the media. How do commentators read events and how does this reflect their political position? Based on this question, he developed the Web application Arquivo de Opinião, awarded in 2018, which presents a history of the opinion columns of Portuguese newspapers, drawn from the pages of Arquivo.pt. In this session we got to know what is new in the project, which now also collects pages from social networks.

14th session, July 10 – Museum of Portuguese Web Design

Sandra Antunes, Professor at the School of Technology and Management of Viseu (ESTGV) spoke about virtual spaces for the memory of Portuguese Web design and showed the importance of a museum to fill gaps in the areas of preservation, exhibition and history of Portuguese Web design.

Sessions of the 2nd season

Arquivo.pt training in Azores islands

Last updated on July 15th, 2022 at 12:56 pm

Memorial (high-quality preservation) and image search were highlighted as new developments in Arquivo.pt during Jornadas de Computação Científica 2019, held from 6 to 8 at the University of Azores in Ponta Delgada.

On the first day of this annual event, Arquivo.pt ran a training session in 4 parts:

  • Memory of the web: a forgotten heritage?, by Daniel Gomes (in Portuguese);
  • Curation of institutional websites, by Ricardo Basílio (in Portuguese);
  • Automatic access and processing of preserved Web data (APIs), by Fernando Melo (in Portuguese);
  • Recommendations for web publication of preservable information, by Daniel Bicho (in Portuguese).

Participants learned about the web preservation service that Arquivo.pt offers to the community for the purpose of researching and safeguarding digital heritage, and how they can help preserve the Web.

In addition to the Jornadas 2019, the Arquivo.pt team also made two presentations in classroom settings. The first was to students of the Informatics – Networks and Multimedia course at the University of Azores, and the second at the Escola Secundária das Laranjeiras (high school) in Ponta Delgada.

To schedule a training session with Arquivo.pt, contact us.

Jornadas 2019

Daniel Gomes in Azores islands
Universidade dos Açores
Jornadas 2019
Jornadas 2019 Melo
Jornadas 2019 Daniel Bicho
Azores
Class at Universidade dos Açores
Azores, Escola das Laranjeiras