Arquivo.pt certified as an open data provider


Arquivo.pt has been collaborating with the Agência para a Modernização Administrativa (AMA) to improve the preservation of Public Administration websites.

Collaboration is based on three action points:

AMA is the public organisation responsible for promoting digital means in Public Administration and aims to modernise and simplify citizens’ access to State services.

Arquivo.pt is a service operated by the Fundação para a Ciência e Tecnologia I.P. that preserves data published on the Web between 1996 and the present day, making them accessible to any citizen for memory and research purposes.

EU open data directive includes documents on websites

The Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information stipulates the following:

“(30) This Directive lays down the definition of the term ‘document’ and that definition should include any part of a document. The term ‘document’ should cover any representation of acts, facts or information — and any compilation of such acts, facts or information — whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording).

(34) To facilitate re-use, public sector bodies should, where possible and appropriate, make documents, including those published on websites, available through an open and machine-readable format and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability

(35) A document should be considered to be in a machine-readable format if it is in a file format that is structured in such a way that software applications can easily identify, recognise and extract specific data from it. Data encoded in files that are structured in a machine-readable format should be considered to be machine-readable data. A machine-readable format can be open or proprietary. They can be formal standards or not.

(60) The Commission should facilitate the cooperation among Member States and support the design, testing, implementation and deployment of interoperable electronic interfaces that enable more efficient and secure public services.”

Arquivo.pt is a public service that has the mission of preserving documents published on Internet sites to enable their long-term open access and provides interoperable electronic interfaces (APIs) for their automatic processing.

Arquivo.pt was certified as a Public Administration open data provider

The AMA recognized Arquivo.pt as a public service and open data provider and awarded its certification seal on the Open Data Portal.

Arquivo.pt collects general information published on the Web of interest to the Portuguese community. However, it is also responsible for the preservation of Public Administration websites, such as the Portal do Governo, in collaboration with the Management Center for the Government Electronic Network (CEGER).

Any citizen can access the open data resulting from these historical archives and, for example, search for official information published on the websites of successive governments.

In 2021, Arquivo.pt provided open access to over 10 billion files (721 TB) from 27 million websites. The open data preserved by Arquivo.pt can be explored through the search interface, automatically through the API (https://arquivo.pt/api) or by reusing derived datasets.
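
As an illustration of automatic access, the sketch below builds a full-text search request in Python. It is a minimal sketch under assumptions: the endpoint path (`/textsearch`), the parameter names (`q`, `siteSearch`, `from`, `to`, `maxItems`), the `response_items` field and the example site are not stated in this text and should be checked against the API documentation at https://arquivo.pt/api.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm it in the documentation at https://arquivo.pt/api
TEXT_SEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, site=None, date_from=None, date_to=None, max_items=10):
    """Build a full-text search URL; all parameter names are assumptions."""
    params = {"q": query, "maxItems": max_items}
    if site:
        params["siteSearch"] = site   # restrict results to one website
    if date_from:
        params["from"] = date_from    # timestamp as YYYYMMDDHHMMSS
    if date_to:
        params["to"] = date_to
    return TEXT_SEARCH_ENDPOINT + "?" + urlencode(params)


def fetch_results(url):
    """Download the JSON response and return its list of result items."""
    with urlopen(url, timeout=30) as response:
        # "response_items" is the assumed name of the result list
        return json.load(response).get("response_items", [])
```

A call such as `fetch_results(build_search_url("governo", site="portugal.gov.pt"))` would then return the archived pages matching the query, provided the assumptions above hold.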

Derived datasets available on the Open Data Portal

Besides the original web artefacts preserved at Arquivo.pt, this service has generated open datasets derived from its activities, which are now available in open access so that they can be reused:

Resources list

Special collection of Portuguese Presidential Elections

Form to suggest a web page, a web site or other web content

Arquivo.pt invites all citizens to suggest web pages related to the 2021 Presidential Elections to be preserved for the future.

The Presidential Elections will take place in Portugal on January 24, 2021.

Your suggestions are important so that Arquivo.pt can keep a more complete memory of this important electoral event.

To suggest web pages, use this form (https://tinyurl.com/presidenciais-sugerir).

Arquivo.pt preserves websites of national scientific projects


Last updated on January 5th, 2021 at 06:47 pm

Preserving scientific project websites is important

The contents of these websites tend to disappear once the scientific projects have finished.

The preservation of scientific project websites is important because it:

  • documents the development of projects;
  • ensures access to unique technical and scientific content that researchers have posted on the project websites (e.g. presentations, photographs, datasets);
  • reinforces the visibility of the results of projects financed by FCT.

Experimental collection of scientific project websites in 2016

In 2016, Arquivo.pt automatically collected the websites of projects financed by FCT.

The information about these websites was scattered because it had not been recorded during the administrative process.

FCT had been financing scientific projects for about 20 years, so the sites were likely too numerous to identify manually.

Arquivo.pt therefore developed an automatic methodology for identifying these websites.

The FCT database had a total of 11,996 project entries but did not include references to web addresses. Applying the automatic methodology, 7,956 URLs related to the funded scientific projects were identified.

The collection of content referenced by these addresses resulted in the preservation of 600,721 files (72 GB), including content such as research group web pages, researchers’ personal pages or project-related blogs.

Online references in scientific project reports have been preserved since 2020

Since June 2020, the website addresses of projects financed by FCT must be registered in the progress and final reports submitted to FCT.

Arquivo.pt started using these addresses to preserve the contents of websites of national scientific projects in a systematic way.

1st official collection of scientific project websites

In June 2020, Arquivo.pt obtained 263 addresses related to 100 scientific projects from the reports submitted to FCT. Most of the addresses (67%) did not have any version previously preserved in Arquivo.pt.

The addresses obtained point to online resources such as the websites of the projects, R&D units, news in the media, articles in scientific journals or repositories, databases, videos on YouTube or Facebook pages.

In July 2020, a special collection was launched from this set of addresses, which resulted in 6.9 GB of information obtained by visiting 31,606 URLs.

Exhibition about Research & Development projects

The Scientific Research Memory is an online exhibition dedicated to the websites of scientific projects funded by the Foundation for Science and Technology (FCT) that Arquivo.pt has preserved.

There are also websites of the Research & Development Units financed by FCT.

Memorial do Arquivo.pt preserves scientific websites for free

The Memorial do Arquivo.pt service has preserved historic FCT websites that have been disabled. These were created for events or initiatives that have ended and therefore their contents are no longer updated.

To include a website in the Memorial, Arquivo.pt starts by making a high-quality collection of its contents.

Then, the collected contents are validated in collaboration with those responsible for the website.

Finally, the website address is redirected to the contents that have been preserved by Arquivo.pt.

For example, if someone wants to access any page on the Scientific Archives Meeting held in 2014, they will be redirected to Arquivo.pt.

Thus, the contents remain accessible over time, and any existing links or references in scientific communications do not break.
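
The redirection step can be pictured as a simple URL rewrite. This is a minimal sketch in Python, not a description of how the Memorial is actually implemented: the replay prefix `https://arquivo.pt/wayback`, the timestamp and the host `meeting2014.example.org` are all hypothetical values chosen for illustration.

```python
from urllib.parse import urlsplit

# Assumed replay prefix of the form <prefix>/<timestamp>/<original URL>
REPLAY_PREFIX = "https://arquivo.pt/wayback"

# Hypothetical table: retired host -> timestamp of the validated capture
MEMORIAL_SITES = {"meeting2014.example.org": "20141201000000"}


def redirect_target(requested_url, sites=MEMORIAL_SITES):
    """Return the archived URL for a retired site, or None if not in the Memorial."""
    host = urlsplit(requested_url).hostname
    timestamp = sites.get(host)
    if timestamp is None:
        return None  # the site is not in the Memorial, so no redirect applies
    # Keep the full original address so that deep links continue to resolve
    return f"{REPLAY_PREFIX}/{timestamp}/{requested_url}"
```

Because the whole original address is kept in the target, a link to any page of the retired site maps to the corresponding preserved page rather than to a generic landing page.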

The digital preservation service Memorial do Arquivo.pt is free of charge for websites of the academic and scientific community. Just send a request to contacto@arquivo.pt.

To know more

Online archives or archives of the online?


At the end of 2020, we recommend some texts that put the future in perspective.

We highlight the theme of preserving online content presented in the ebook “Tendências 2021” (Trends 2021). The contribution of Daniel Gomes, the Arquivo.pt manager, was entitled “Arquivos online ou do online?” (Online archives or archives of the online?).

I was invited to write about the challenges and threats to online archives. The first question that came to me was what is meant by an “online archive”?

My concern lies in the “archives of the online” because there is not even an established awareness about their need, whether at an academic, governmental or individual level.

It is technologically impossible to preserve all information available online. But it is absurd not to be aware that we have to preserve some of the information online for short, medium and long term access.

The complete text (in Portuguese) is available on pages 23 to 26 of the open-access book “Tendências 2021”.

The challenge is to cultivate awareness about the importance of preserving content online by learning how to do it in practice.

Happy New Year!

World Digital Preservation Day 2020


Last updated on November 23rd, 2020 at 06:20 pm


On November 5, World Digital Preservation Day, Arquivo.pt held an online session open to the community.

Registration form (free but required)

The speaker for this session was the winner of the Arquivo.pt 2020 Award, Miguel Ramalho, who presented his work. “Desarquivo” is a web application that searches for entities on Arquivo.pt and returns a graph.

As in 2017, 2018 and 2019, we invited everyone to get to know Arquivo.pt, and to use it in research and in the preservation of memory.

World Digital Preservation Day is promoted by the Digital Preservation Coalition (UK) and is an occasion for initiatives around the world, shared on social networks with the WDPD2020 hashtag.

Agenda:

November 5th

3:00 pm – Welcome! Presentation of the Arquivo.pt team (slides, 1 MB, PDF)
3:05 pm – Archive News – Daniel Gomes (slides, 2.6 MB, PDF)
3:15 pm – Desarquivo, 1st place in the Arquivo.pt Awards 2020, by Miguel Ramalho (slides, 3 MB, PDF)
3:45 pm – Questions
4:00 pm – Conclusion

Session video

Satisfaction survey

Search the Geocities history!


Last updated on November 4th, 2020 at 03:04 pm

Geocities.com was the first major “social network” which enabled anyone to create their website and publish information on the Web. It was created in 1994, acquired by Yahoo in 1999 and shut down in 2009.

Initiatives have been emerging to preserve the content of Geocities, such as the Archive Team project, which gathered 641 GB of information in 2009, OoCities or Geocities.ws.

Arquivo.pt has also integrated the history of Geocities into its collections!

Now, anyone can explore Geocities through the innovative tools provided by Arquivo.pt (e.g. full-text search, image search or API).

By making the historical collection of Geocities available, Arquivo.pt intends to contribute to the development of innovative studies in areas such as Arts, Humanities or Sociology (see a project summary).

Search Geocities now at: arquivo.pt/searchGeocities

Examples of Geocities preserved websites

Cross-lingual collection about the 2019 European Elections is available


Last updated on September 28th, 2020 at 12:04 pm

Screenshot of an archived page on Arquivo.pt: https://www.european-elections.eu

The special collection of web pages about the 2019 European Elections is available for search at Arquivo.pt.

To compile this collection, pages written in 24 European languages were identified through automatic searches on the Bing search engine and suggestions from 17 European countries.

We emphasize the collaboration of the Publications Office of the European Union, which reviewed the list of search terms in the different languages of the European Union.

Between May and July 2019, Arquivo.pt exhaustively collected pages related to the European Elections in several countries.

The resulting collection, named “European Elections 2019”, comprises 99 million web files totalling 4.8 terabytes of information.

The technical report “A transnational crawl of the European Parliamentary Elections 2019” details the applied methodology. This methodology has been applied to generate other thematic collections, such as the one about Covid-19.

We invite all citizens, especially researchers, to try this service, created specifically to search the cross-lingual, international collection about the 2019 European Elections: https://arquivo.pt/ee2019

To know more

Search 17 million images from the past with Arquivo.pt!


At the end of 2018, Arquivo.pt launched an experimental service for searching images from the past, which made it possible to search around 4 million images drawn from some of the Arquivo.pt collections.

From April 2019, it became possible to search for images from all the collections of Arquivo.pt.


You can now search over 17 million unique images (over 50 pixels in width and height) published since 1996.

Find pages from the past through the new image search service.

Try the “Visit Page” option to find the Web page from the past that contained the image you selected.


Try the image search service now!

Search images from the past with Arquivo.pt at https://arquivo.pt/images.jsp?l=en

Arquivo.pt 2019 Award: Submissions are open!

Last updated on February 19th, 2019 at 04:12 pm

Submissions for Arquivo.pt Award 2019 are officially open! The deadline is May 3, at 1 pm (Lisbon time).

The three best works will receive a total of 15,000 euros.

Individual or group works on any subject are accepted. The only requirement is that they use Arquivo.pt as their main source of information.

Here’s how to apply: arquivo.pt/award2019

Good luck!

EXPO’98: Twenty years by Arquivo.pt

Last updated on December 13th, 2018 at 01:59 pm

With the theme “Os Oceanos, Um Património para o Futuro” (“Oceans, a Heritage for the Future”), EXPO’98 took place in Lisbon in 1998.

Considered a turning point for Lisbon and Portugal, the event ran from May 2 to September 30, 1998 and attracted around 11 million visitors. EXPO’98 also aimed to highlight the 500 years of the Portuguese Discoveries.

The importance of EXPO’98 did not stop there:

  • It was a strategic project for the country and culminated with the regeneration of an area of about 340 hectares in the eastern part of the city, next to the Tagus River.
  • It was attended by 146 countries and 14 international organizations.

EXPO’98 website preserved by Arquivo.pt

However, in the digital world much information disappears after a short time. Only through Arquivo.pt can you still browse the EXPO’98 website, which serves as a historical and research resource about the event.

Arquivo.pt preserves this moment so that the memory of those days is not lost and to support future research about their impact on the lives of all Portuguese.

Travel through EXPO’98 with Arquivo.pt!

Examples of EXPO’98 pages preserved by Arquivo.pt