Training in collaboration with the City Council of Lisbon


Last updated on December 13th, 2021 at 12:02 pm


A cycle of webinars was held between October and December 2021, organised by the Department of Development and Training of the Municipality of Lisbon, within the digital skills programme Passaporte Competências Digitais (Câmara Municipal de Lisboa), in collaboration with Centro Qualifica +ValorLx, Infraestrutura ROSSIO and Arquivo.pt (Fundação para a Ciência e a Tecnologia, I.P.).

The aim of this initiative was to present the services of Arquivo.pt and disseminate their use so that the historical heritage published on the web can be preserved and exploited by any citizen.

The sessions were open by registration and had a total of 126 participants (average of 31 per session).

The speakers’ presentations were recorded and can now be accessed, along with the slides from each session.

Sessions held

September 15 – Arquivo.pt. What is it? What is it for?

Daniel Gomes, manager of Arquivo.pt, the public Web preservation service operated by the Fundação para a Ciência e a Tecnologia, I.P., explained how any citizen can use it to consult web pages from the past in a wide range of situations and talked about the importance of preserving digital memory.

November 11 – API Arquivo.pt: automatic access to information preserved from the Web

Vasco Rato, web developer at Arquivo.pt, presented the Arquivo.pt APIs (Application Programming Interfaces). These enable the development of innovative and useful applications for organizations through the automatic processing of historical information preserved from the Web.

November 25 – Archive the Web: do-it-yourself!

Ricardo Basílio, digital curator of Arquivo.pt, presented a tutorial on using the Webrecorder.net tools to record web pages in a standardized format on one's own computer, which allows a person or an organization to build their own small-scale web archive.

December 9 – Publish on the Web: best practices by Arquivo.pt

Pedro Gomes, the engineer responsible for the Arquivo.pt crawls, addressed the issue of publishing preservable web content. How much content is in formats that make future access difficult or impossible? These situations were illustrated with practical cases and recommendations on how to avoid them. In short, it all boils down to publishing well in order to preserve well.

Know more about Arquivo.pt training

Arquivo.pt is open to collaborations aimed at training professionals in organizations or ordinary citizens in Web preservation.

Learn about the training modules and contact us.

 

H2020 projects preserved by Arquivo.pt


Last updated on August 5th, 2024 at 04:50 pm

The main objective of Arquivo.pt is to preserve online information for research and education purposes.

Previously, Arquivo.pt identified and preserved Research & Development project websites funded by the European Union during the FP4, FP5, FP6 and FP7 programmes (1994-2013).

Now, Arquivo.pt has contributed to preserving online information that documents R&D projects funded by the Horizon 2020 programme (2014-2021). It preserved 197 million web files (17 TB) related to science for future access.

H2020 projects publish valuable information online but it is being lost

Websites about Research and Development (R&D) projects are increasingly being used to publish and disseminate important scientific information that complements published literature (e.g. data sets, documentation or software).

However, after projects end, the corresponding websites usually disappear, causing a permanent loss of unique and valuable scientific information.

Arquivo.pt automatically identified URLs that document H2020 Research and Development projects

The European Union’s Open Data Portal published a data set from the Community Research and Development Information Service (CORDIS) that documents H2020 research projects. However, of the 31 129 projects listed, only 46% presented a project URL.

Arquivo.pt developed a low-cost methodology that automatically identifies URLs related to R&D projects so that they can be systematically preserved. This automatic identification is achieved by combining open data sets with web search services. The methodology is detailed in a scientific article published at the International Conference on Digital Preservation 2016.

In sum, we extracted 106 300 unique URLs from the following open data sets:

Then, we extracted the acronym and title of the projects from the data sets and automatically searched the web for additional URLs using the Bing Search API.
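The search step can be pictured with a minimal sketch. It only builds candidate query strings from a project's acronym and title and does not call any search service. The function name `build_queries` and the three query patterns are hypothetical illustrations, not the formulation actually used by Arquivo.pt.

```python
def build_queries(acronym: str, title: str) -> list[str]:
    """Return candidate web search queries for one project.

    The query patterns below are assumptions made for illustration only.
    """
    # Collapse repeated whitespace that often appears in exported data sets.
    acronym = " ".join(acronym.split())
    title = " ".join(title.split())

    queries = []
    if acronym and title:
        # Acronym plus exact title: the most specific query.
        queries.append(f'{acronym} "{title}"')
    if title:
        # Exact title alone, for projects with ambiguous acronyms.
        queries.append(f'"{title}"')
    if acronym:
        # Acronym plus programme name, for pages that never spell out the title.
        queries.append(f"{acronym} H2020 project")
    return queries


queries = build_queries("EXTMOS", "Extended Model of Organic Semiconductors")
```

Each query would then be submitted to a web search service and the returned URLs merged with those already extracted from the open data sets.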

All the data sets and tools developed have been made publicly available in open access so that they can be reused and collaboratively enhanced. In particular, you can access the software developed to automatically identify additional URLs about H2020 projects.

197 million web files related to science were preserved

Arquivo.pt identified and preserved 197 million web files (17 TB) that document R&D projects funded by Horizon 2020.

By 2021, some project websites were already no longer available online, such as that of the Extended Model of Organic Semiconductors (EXTMOS) project (http://extmos.eu/). However, it was preserved and can be accessed at Arquivo.pt:

Archived version at Arquivo.pt (https://arquivo.pt/wayback/20170427182603/http://extmos.eu/) of the home page of the EXTMOS Research and Development project (http://extmos.eu/) funded by H2020.
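The archived address above follows a visible pattern: a prefix, a 14-digit number and the original URL. The sketch below builds and splits such addresses, assuming the 14 digits encode the capture date and time as YYYYMMDDhhmmss (a common web archive convention, not stated in this text). The function names are illustrative.

```python
from datetime import datetime

# Prefix taken from the archived EXTMOS address cited in the text.
WAYBACK_PREFIX = "https://arquivo.pt/wayback"
# Assumed layout of the 14-digit segment: YYYYMMDDhhmmss.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def archived_url(original_url: str, captured_at: datetime) -> str:
    """Build an archived-page address from an original URL and a capture time."""
    timestamp = captured_at.strftime(TIMESTAMP_FORMAT)
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def parse_archived_url(url: str) -> tuple[datetime, str]:
    """Split an archived-page address into capture time and original URL."""
    prefix = WAYBACK_PREFIX + "/"
    if not url.startswith(prefix):
        raise ValueError(f"not an Arquivo.pt wayback address: {url}")
    # The first slash after the prefix separates timestamp and original URL.
    timestamp, _, original_url = url[len(prefix):].partition("/")
    return datetime.strptime(timestamp, TIMESTAMP_FORMAT), original_url


captured, original = parse_archived_url(
    "https://arquivo.pt/wayback/20170427182603/http://extmos.eu/"
)
```

Under that assumption, the EXTMOS capture dates from 27 April 2017.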

Contributions to complement the European Open Data Sets

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested in preserving this digital heritage:

To learn more about this collection, watch the video Preservation of web content related to Horizon 2020.

References

Are you a researcher?

Arquivo.pt certified as an open data provider


Last updated on August 17th, 2022 at 08:39 am

Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) to improve the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e a Tecnologia, I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording.

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability.

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

The Portuguese Law No. 68/2021 of 2021-08-26 approves the general principles on open data and transposes the European Directive.

Arquivo.pt was certified as a Public Administration open data provider

The AMA recognized Arquivo.pt as a public service and open data provider and awarded its certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
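As a rough illustration of automatic access, the sketch below only builds a full-text search request URL and sends nothing over the network. The endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `from`, `to` and `maxItems` are assumptions based on the public API documentation and should be checked at https://arquivo.pt/api before use.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the full-text search API; confirm at https://arquivo.pt/api.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(
    query: str,
    from_ts: Optional[str] = None,
    to_ts: Optional[str] = None,
    max_items: int = 10,
) -> str:
    """Build a search URL restricted to an optional time span.

    from_ts and to_ts are 14-digit timestamps (YYYYMMDDhhmmss); the parameter
    names used here are assumptions, not confirmed by this text.
    """
    params = {"q": query, "maxItems": max_items}
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts
    # urlencode escapes spaces and special characters in the query.
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(
    "Portal do Governo", from_ts="19960101000000", to_ts="20001231235959", max_items=5
)
```

The resulting URL could be fetched with any HTTP client. The structure of the response is not described here and should be taken from the API documentation.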

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Video presentation at the IIPC Web Archiving Conference 2022

Internet Memory Foundation collection available in Arquivo.pt


Last updated on August 9th, 2024 at 04:15 pm

The historical collection of web content generated during the Internet Memory Foundation’s (IMF) activity has been donated to Arquivo.pt and is now searchable!

The IMF was a European organization dedicated to preserving web content; it was wound up in 2018.

The first web archiving project in Europe (2004-2010) was led by Julien Masanès (who was guest of honour at the celebration of 10 years of Arquivo.pt) and was called the European Archive Foundation.

In 2010, Julien Masanès, the “father” of Web archives in Europe, created the IMF.

Examples of pages from the collection donated by the IMF

The collection donated by the IMF has now been integrated in the Arquivo.pt collection to be preserved for posterity.

This collection is composed of 142 million files that total 6.3 TB of historical information whose texts or images can now be searched through Arquivo.pt.

Life Science Competence in Europe portal, 2009.

LIMES project homepage (Land and Sea Monitoring for Environment and Security), 2009.

Intelligence-territoriale project homepage, 2009.

European Parliament news page on the 20th anniversary of the fall of the Berlin Wall, 2009.

Le Figaro on the French presidential election, 2012.

Reuters with a news item about WikiLeaks, 2011.

Internet Memory Foundation homepage, 2014.

Search this new collection!

This new collection has been named “InternetMemory” in the Arquivo.pt collections list.

Searches can be made on this collection using the collection search parameter or through the custom search page available at arquivo.pt/InternetMemory.
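A search limited to this collection might be expressed as in the sketch below, which builds the request URL without sending it. The endpoint `https://arquivo.pt/textsearch` and the parameter name `collection` are assumptions to be verified against the API documentation; only the collection name "InternetMemory" comes from the text.

```python
from urllib.parse import urlencode

# Assumed full-text search endpoint; verify against the Arquivo.pt API documentation.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_collection_search_url(query: str, collection: str) -> str:
    """Build a search URL limited to one named collection.

    The parameter name "collection" is an assumption made for illustration.
    """
    params = {"q": query, "collection": collection}
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


imf_url = build_collection_search_url("Berlin Wall", "InternetMemory")
```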


 

Arquivo.pt preserves websites of national scientific projects


Last updated on October 1st, 2021 at 09:11 am

Preserving scientific project websites is important

Website content tends to disappear once scientific projects end.

The preservation of scientific project websites is important because it:

  • documents the development of projects;
  • ensures access to unique technical and scientific content that researchers have posted on the project websites (e.g. presentations, photographs, data sets);
  • reinforces the visibility of the results of projects financed by FCT.

Experimental collection of scientific projects websites in 2016

In 2016, Arquivo.pt automatically collected websites of projects financed by FCT.

The information about these websites was dispersed as it was not recorded during the administrative process.

FCT had been financing scientific projects for about 20 years, so the number of sites was likely too high to identify manually.

Arquivo.pt therefore developed an automatic methodology for identifying these websites.

The FCT database had a total of 11,996 project entries but did not include references to web addresses. Applying the automatic methodology, 7 956 URLs related to the funded scientific projects were identified.

The collection of content referenced by these addresses resulted in the preservation of 600 721 files (72 GB), including content such as research group web pages, researchers’ personal pages or project-related blogs.

Online references in scientific project reports have been preserved since 2020

Since June 2020, the website addresses of projects financed by FCT must be registered in the progress and final reports submitted to FCT.

Arquivo.pt started using these addresses to preserve the contents of websites of national scientific projects in a systematic way.

1st official collection of scientific project websites

In June 2020, Arquivo.pt obtained 263 addresses related to 100 scientific projects from the reports submitted to FCT. Most of the addresses (67%) did not have any version previously preserved in Arquivo.pt.

The addresses obtained point to online resources such as the websites of the projects, R&D units, news in the media, articles in scientific journals or repositories, databases, videos on YouTube or Facebook pages.

In July 2020, a special collection was launched from this set of addresses, which resulted in 6.9 GB of information obtained by visiting 31,606 URLs.

Exhibition about Research & Development projects

The Scientific Research Memory is an online exhibition dedicated to the websites of scientific projects funded by the Foundation for Science and Technology (FCT) that Arquivo.pt has preserved.

There are also websites of the Research & Development Units financed by FCT.

Memorial do Arquivo.pt preserves scientific websites for free

The Memorial do Arquivo.pt service has preserved historic FCT websites that have been disabled. These were created for events or initiatives that have ended and therefore their contents are no longer updated.

To include a website in the Memorial, Arquivo.pt starts by making a high quality collection of its contents.

Then, the collected contents are validated in collaboration with those responsible for the website.

Finally, the website address is redirected to the contents that have been preserved by Arquivo.pt.

For example, if someone wants to access any page on the Scientific Archives Meeting held in 2014, they will be redirected to Arquivo.pt.

Thus, the contents remain accessible over time, and any links or references to them in scientific communications do not break.

The digital preservation service Memorial do Arquivo.pt is free of charge for websites of the academic and scientific community; just send a request to contacto@arquivo.pt.

To know more

Online archives or archives of the online?


At the end of 2020, we recommend some texts that put the future in perspective.

We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).

I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?

My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.

It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.

The complete text (in Portuguese) is available on pages 23 to 26 of the open-access book “Tendências 2021”.

The challenge is to cultivate awareness about the importance of preserving content online by learning how to do it in practice.

Happy New Year!

World Digital Preservation Day 2020


Last updated on November 23rd, 2020 at 06:20 pm


On November 5, World Digital Preservation Day, Arquivo.pt held an online session open to the community.

Registration form (free but required)

The speaker for this session was the winner of the Arquivo.pt 2020 Award, Miguel Ramalho, who presented his work. “Desarquivo” is a web application that searches for entities on Arquivo.pt and returns a graph.

As in 2017, 2018 and 2019, we invited everyone to get to know Arquivo.pt, and to use it in research and in the preservation of memory.

World Digital Preservation Day is promoted by the Digital Preservation Coalition (UK) and is an occasion for initiatives around the world, shared on social networks with the WDPD2020 hashtag.

Agenda:

November 5th

3:00 pm – Welcome! Presentation of the Arquivo.pt team (slides, 1 MB, PDF)
3:05 pm – Archive News – Daniel Gomes (slides, 2.6 MB, PDF)
3:15 pm – Desarquivo, 1st place in the Arquivo.pt Awards 2020, by Miguel Ramalho (slides, 3 MB, PDF)
3:45 pm – Questions
4:00 pm – Conclusion

Session video

Satisfaction survey

Search the Geocities history!


Last updated on September 23rd, 2021 at 03:30 pm

Geocities.com was the first major “social network” which enabled anyone to create their website and publish information on the Web. It was created in 1994, acquired by Yahoo in 1999 and shut down in 2009.

Initiatives have been emerging to preserve the content of Geocities, such as the Archive Team project, which gathered 641 GB of information in 2009, oOCities or Geocities.ws.

Arquivo.pt also integrated Geocities history in its collections!

Now, anyone can explore Geocities through the innovative tools provided by Arquivo.pt (e.g. full-text search, image search or API).

By making the historical collection of Geocities available, Arquivo.pt intends to contribute to the development of innovative studies in areas such as Arts, Humanities or Sociology (see a project summary).

Search Geocities now at: arquivo.pt/searchGeocities

Examples of Geocities preserved websites

Video Enhancing access to research the Geocities historical collection

Enhancing access to research the Geocities historical collection, Pedro Gomes, RESAW 2021 (slides)

 

Cross-lingual collection about the 2019 European Elections is available


Last updated on August 30th, 2022 at 10:46 am

Screenshot of an archived page on Arquivo.pt: https://www.european-elections.eu

The special collection of web pages about the 2019 European Elections is available for search at Arquivo.pt.

To compile this collection, pages written in 24 European languages were identified through automatic searches on the Bing search engine and suggestions from 17 European countries.

We emphasize the collaboration of the Publications Office of the European Union, which reviewed the list of search terms in the different languages of the European Union.

Between May and July 2019, Arquivo.pt exhaustively collected pages related to the European Elections in several countries.

The resulting collection, named “European Elections 2019”, comprises 99 million web files totalling 4.8 terabytes of information.

The technical report “A transnational crawl of the European Parliamentary Elections 2019” details the applied methodology. This methodology has been applied to generate other thematic collections, such as the one about Covid-19.

We invite all citizens, especially researchers, to try this service, created specifically to search the cross-lingual and international collection about the 2019 European Elections: https://arquivo.pt/ee2019

Video “A transnational and cross-lingual crawl of the European Parliamentary Elections 2019”

A transnational and cross-lingual crawl of the European Parliamentary Elections 2019, Ivo Branco, IIPC Web Archiving Conference and RESAW 2021 (slides)

To know more:

Online Cafe with Arquivo.pt


Last updated on November 24th, 2020 at 05:18 pm

Welcome to the Arquivo.pt Online Cafe!

Talk directly to the Arquivo.pt team and get answers to all your questions!

The Arquivo.pt team chats with you through online sessions.

Brief introductory presentations will be given, leaving time to ask all your questions about how to get more out of Arquivo.pt or how to apply to the Arquivo.pt Awards.

Sessions held in the 1st season

1st session, 27 March – Website Preservation: Do It Yourself!

The 1st session (in Portuguese) was about Website Preservation: Do It Yourself! and featured Ricardo Basílio (Digital Curator of Arquivo.pt) and Daniel Gomes (Manager of Arquivo.pt).

2nd session, April 3 – meuParlamento.pt

The app meuParlamento.pt was the winner of the Arquivo.pt Award 2019. Nuno Moniz presented the relevance of this app to citizen participation in politics. Arian Pasquali and Tomás Amaro, also authors of this work, were present. The session continued with questions related to the development of works based on Arquivo.pt.

3rd session, April 17 – Arquivo.pt Award and News on Arquivo.pt

After the Easter break, the Arquivo.pt Online Café was back, presented by Daniel Gomes. This session was dedicated to answering questions from those who were finalizing their work to compete for the Arquivo.pt Award. Finally, the new interface of Arquivo.pt was presented.

4th session, April 24 – Revisionista.PT – Uncovering the News

Flávio Martins and André Mourão, creators of Revisionista.pt, talked about this tool, which uses Arquivo.pt to show the revisions made to a given news article after its publication in newspapers.

5th session, April 30 – Public speeches about violence in private

Zélia Teixeira, Professor at Fernando Pessoa University and Psychologist, brought us an analysis of 217 news articles on domestic violence, collected in Arquivo.pt from the three main daily newspapers.

6th session, May 8 – Arquivo.pt API – How to process data at large scale?

André Mourão, R&D engineer, explained the Arquivo.pt APIs (Application Programming Interfaces) through examples and cases in the session held on 8 May. One doesn’t need to be an IT expert to see the potential of the API when used in research or in new tools.

7th session, May 15 – Website Preservation: Do It Yourself!

Ricardo Basílio, Arquivo.pt’s web curator, presented a tutorial dedicated to Webrecorder and Browsertrix. These tools are useful for capturing websites locally on a small scale. By demonstrating how they work, Arquivo.pt wants to encourage the community: anyone can make a selection of pages or websites and preserve them in a standardized format.

8th session, May 22 – The history of video games on the Portuguese web

Miguel Costa, a web developer passionate about the Web, technologies and video games, talked about the main figures of the national video game business and about the first Portuguese video game. In Arquivo.pt he found archived video game files and a lot of related information.

9th session, May 29 – Straight Edge in the metropolitan area of Lisbon

In the 9th session of the Café, we got to know Straight Edge and its presence in the punk/hardcore scene of the metropolitan area of Lisbon in the 90s more closely. Diogo Duarte, anthropologist and researcher at the Contemporary History Institute of Universidade Nova de Lisboa, talked about his work dedicated to the theme and about the importance of Arquivo.pt for studying this movement and other expressions of popular culture.

10th session, June 5 – Health and Internet: an evolution

Health and Internet was the topic of the 10th session of the Arquivo.pt Café, presented by Rita Espanha, professor and researcher at ISCTE (University Institute of Lisbon) and CIES (Centre for Research and Studies in Sociology). The Internet has become the privileged medium where citizens seek information and build their own knowledge in all areas of their lives, including health. State agencies in turn have developed services that use the Internet. Part of the population, however, has been left out, not having followed this change. The other part of the population, which has easy access to information, does not always have the critical sense to evaluate information and use it to its advantage. All of these issues became more evident during the Covid-19 pandemic.

11th session, June 19 – Creating and managing preservable websites

The Arquivo.pt team presented a set of good practices for publishing information on the Web so that it can be preserved.

12th session, June 26 – “Tell me Stories”, “Conta-me Histórias”

“Tell me Stories”, “Conta-me Histórias” is a service that creates temporal narratives, based on the contents preserved by Arquivo.pt. This application was the winner of the Arquivo.pt Prize 2018. One of its authors, Ricardo Campos (IPT; INESC TEC), talked about the service developments. Arian Pasquali, member of the development team, also participated in the discussion.

13th session, July 3 – Arquivo de Opinião

Researchers in NLP (Natural Language Processing) will find in this session an excellent use case explained in detail by its author. Miguel Won, researcher at INESC-ID (Lisbon), talked about the opinion sections of the media. How do commentators read events and how does this reflect their political position? Based on this question, he developed the web application Arquivo de Opinião, awarded in 2018, which presents a history of the opinion columns of Portuguese newspapers, drawn from the pages of Arquivo.pt. In this session we got to know the latest news about the project, which now also collects pages from social networks.

14th session, July 10 – Museum of Portuguese Web Design

Sandra Antunes, Professor at the School of Technology and Management of Viseu (ESTGV) spoke about virtual spaces for the memory of Portuguese Web design and showed the importance of a museum to fill gaps in the areas of preservation, exhibition and history of Portuguese Web design.

Sessions of the 2nd season