
Arquivo.pt preserved websites about Research & Development projects funded by the EU

Last updated on July 27th, 2021 at 04:07 pm

Arquivo.pt automatically identified R&D project websites to preserve their content. It preserved 52 million web files (7 TB) related to science for future access.

R&D websites publish valuable information but are being lost

Websites about Research and Development (R&D) projects are increasingly used to publish important scientific information that complements the published literature (e.g. data sets, documentation or software). However, after projects end, the corresponding websites usually disappear, causing a permanent loss of unique and valuable scientific information.
Percentage of project URLs from the EU Open Data Portal that referenced relevant content in November 2015 distributed per work programme since FP4 (1994). 

Online information related to R&D projects is not being fully documented. For example, the European Union’s Open Data Portal lacks a project URL for 92% of the projects funded by the 7th Framework Programme (FP7).

Arquivo.pt automatically identified URLs related to Research and Development projects

The main objective of Arquivo.pt is to preserve online information for scientific and academic purposes. Therefore, it developed a pragmatic and low-cost process that automatically identifies URLs related to R&D projects to be systematically preserved. Automatic identification is achieved through the combination of open data sets with free search services. This work is detailed in an article published at the International Conference on Digital Preservation 2016.

All the data sets and tools developed during this research have been made publicly available in open access so that they can be reused and collaboratively enhanced.
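
The identification step described above, combining an open data set with a search service, can be sketched roughly as follows. This is only an illustration and not the code released by Arquivo.pt: the Project fields, the excluded hosts, the query format and the search callable are all assumptions made for the example. Any search service can be plugged in as a function that takes a query and returns a list of URLs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlparse


@dataclass
class Project:
    """One record from an open data set (field names are hypothetical)."""
    acronym: str
    title: str
    url: str = ""  # empty when the data set has no URL for the project


# A search service is modelled as any function: query -> ranked list of URLs.
SearchFn = Callable[[str], List[str]]

# Assumption: portal pages that describe projects are not project websites.
EXCLUDED_HOSTS = {"cordis.europa.eu", "ec.europa.eu"}


def build_query(project: Project) -> str:
    """Combine acronym and title into a single search query."""
    return f"{project.acronym} {project.title}".strip()


def candidate_urls(project: Project, search: SearchFn, max_results: int = 3) -> List[str]:
    """Return the top search results that may be the project website."""
    kept: List[str] = []
    for url in search(build_query(project)):
        if len(kept) >= max_results:
            break
        host = urlparse(url).netloc.lower()
        if host in EXCLUDED_HOSTS or url in kept:
            continue  # skip portal pages and duplicates
        kept.append(url)
    return kept


def identify(projects: Iterable[Project], search: SearchFn) -> Dict[str, List[str]]:
    """Map each project acronym to its known URL or to search candidates."""
    found: Dict[str, List[str]] = {}
    for project in projects:
        if project.url:
            found[project.acronym] = [project.url]
        else:
            found[project.acronym] = candidate_urls(project, search)
    return found
```

Keeping the search service behind a plain function makes it possible to swap providers or to test the logic with canned results, without changing the rest of the sketch.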

52 million web files related to science were preserved

Applying this process has already enabled the preservation of 52 million files (7 TB) obtained from 53 993 websites of R&D projects financed since FP4 (1994). One example is the WEZARD project, funded by FP7, which aimed at “preparing the future research community in the area of air transport system robustness when it is faced with weather hazards”. The website for this project (www.wezard.eu) is no longer available online, but it was preserved and can be accessed at Arquivo.pt.
All the websites identified and preserved during this project have been accessible through Arquivo.pt since March 2017.
Preserved website of the WEZARD project (www.wezard.eu), funded by FP7 between 2011 and 2013, available at Arquivo.pt.

Contributions to complement the European Open Data Portal data sets

The developed process was applied to the data sets published through the European Open Data Portal in an attempt to fill in the missing project URLs. The results showed that the completeness of the FP7 data set improved by 86.6%.
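
One plausible way to measure this kind of completeness is the share of project records that carry a non-empty URL, computed before and after the identified URLs are merged in. The sketch below assumes that definition; this text does not say how the reported figure was calculated, and the record layout and the field names "acronym" and "url" are hypothetical.

```python
from typing import Dict, List


def completeness(records: List[Dict[str, str]], field: str = "url") -> float:
    """Fraction of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


def fill_missing(
    records: List[Dict[str, str]],
    found: Dict[str, str],
    key: str = "acronym",
    field: str = "url",
) -> List[Dict[str, str]]:
    """Return copies of the records with missing URLs filled from `found`."""
    result = []
    for record in records:
        updated = dict(record)  # leave the input data set untouched
        has_url = (updated.get(field) or "").strip()
        if not has_url and updated.get(key) in found:
            updated[field] = found[updated[key]]
        result.append(updated)
    return result


# Made-up sample data, for illustration only.
sample = [
    {"acronym": "A", "url": "http://a.example/"},
    {"acronym": "B", "url": ""},
    {"acronym": "C", "url": ""},
    {"acronym": "D"},
]
identified = {"B": "http://b.example/", "C": "http://c.example/"}

before = completeness(sample)
after = completeness(fill_missing(sample, identified))
```

Comparing the two values shows how much of the gap the identified URLs close for a given data set.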

All the resulting data sets were made publicly available so that they can be improved and reused by other organizations also interested in preserving this digital heritage (FP4, FP5, FP6, FP7).
