This event is a meeting for sharing knowledge among the entities that make up the national higher education and research community.
The event brings together institutional decision-makers, heads of IT services and people responsible for libraries and documentation services, among others.
Arquivo.pt presented two 90-minute sessions on June 28th, from 2:30 p.m. to 6 p.m., under the theme “Arquivo.pt services for managing citations and cybersecurity”, and presented the Arquivo.pt Memorial service in the Zapping session.
Agenda
June 28, 2:30-4:00 p.m.: Arquivo.pt: available services and system architecture
The platform is maintained by the Celestino Domingues Library of the Estoril Higher Institute for Tourism and Hotel Studies (ESHTE), with content provided by participating institutions from various areas of heritage.
The digitized contents that could be consulted in the catalog and accessed at the provider institutions included sound, images, photographs and printed material, but no websites.
Thus, the idea for MUVITUR’s new “Web Pages” collection emerged.
Collaboration between MUVITUR and Arquivo.pt
In 2019, a collaboration between Arquivo.pt and MUVITUR began with the aim of identifying websites related to Tourism in Portugal and disseminating the history of content published on the Web since 1996.
In 2022, a list was established with about 400 records of websites of various entities related to tourism, hotels, travel agencies, pages of municipalities’ websites dedicated to tourism and others.
Collection of records in the MUVITUR catalog with webpages preserved at Arquivo.pt.
How the integration was done
MUVITUR uses Nyron software, which allows content from different sources to be aggregated using the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interoperability protocol, which is very common among libraries, archives and museums to provide content to portals such as Europeana.
Arquivo.pt, however, does not make information available through OAI-PMH, so it was necessary to find an alternative way to create records in Nyron with descriptive information about the preserved sites.
The procedure for integration was as follows:
The XML schema with the fields for the metadata, according to what works in Nyron, was exported to an Excel sheet.
The information was entered manually, respecting the format and syntax, in collaboration with the computer technicians.
The XML file with the inserted data was validated and imported into Nyron.
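The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spreadsheet is assumed to be exported as CSV, and the element names (`records`, `record`, `denomination`, `organization`, `url`, `archived_url`) are hypothetical placeholders, not the actual Nyron schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names; the real Nyron schema is not shown in this text.
FIELDS = ["denomination", "organization", "url", "archived_url"]


def rows_to_xml(csv_text: str) -> str:
    """Convert spreadsheet rows (exported as CSV) into one XML document."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [f for f in FIELDS if not (row.get(f) or "").strip()]
        if missing:
            # Basic validation before import: every field must be filled in.
            raise ValueError(f"row is missing fields: {missing}")
        record = ET.SubElement(root, "record")
        for field in FIELDS:
            ET.SubElement(record, field).text = row[field].strip()
    return ET.tostring(root, encoding="unicode")


# Illustrative values only, not a real catalog record.
sample = (
    "denomination,organization,url,archived_url\n"
    "Example Tourism Site,Example Org,http://example.org/,"
    "https://arquivo.pt/wayback/2011/http://example.org/\n"
)
xml_text = rows_to_xml(sample)
```

Rejecting incomplete rows before the XML is built mirrors the validation step: errors made during manual entry surface before the import rather than inside the catalog.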
Creating records in catalogs is largely a manual task and requires human curation. However, some of the information in the records of the Website collection could be filled in automatically. For example, the thumbnail was obtained using the Arquivo.pt API, more specifically the linkToScreenShot field, visible in the technical details of a preserved page (see the options menu on the top right of a replayed page).
Other elements, such as the site’s title, could also be obtained automatically through the Arquivo.pt API. However, the quality of that information depends on what the site’s producers inserted, and it may not be accurate. The dates that limit the temporal scope can also be obtained automatically, but the manual method was chosen to control the information presented.
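A minimal sketch of how the thumbnail link and title could be read from an API response follows. The endpoint `https://arquivo.pt/textsearch`, the parameters `q` and `maxItems`, and the response key `response_items` are assumptions that should be checked against the Arquivo.pt API documentation. The field lookup ignores case because the exact spelling of `linkToScreenShot` in the response may differ.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the Arquivo.pt API docs.
API_ENDPOINT = "https://arquivo.pt/textsearch"


def build_query_url(site_url: str, max_items: int = 1) -> str:
    """Build a search URL for preserved versions of a site."""
    return API_ENDPOINT + "?" + urlencode({"q": site_url, "maxItems": max_items})


def get_field(item: dict, name: str):
    """Look up a field ignoring case, since the exact spelling may vary."""
    wanted = name.lower()
    for key, value in item.items():
        if key.lower() == wanted:
            return value
    return None


def extract_record_data(response_text: str) -> list[dict]:
    """Pick the title and screenshot link out of each returned item."""
    payload = json.loads(response_text)
    items = payload.get("response_items", [])  # assumed key name
    return [
        {
            "title": get_field(item, "title"),
            "thumbnail": get_field(item, "linkToScreenShot"),
        }
        for item in items
    ]


# Illustrative response, not real API output.
sample_response = json.dumps({
    "response_items": [
        {"title": "Example site", "linkToScreenshot": "https://example.org/shot.png"}
    ]
})
records = extract_record_data(sample_response)
query_url = build_query_url("example.org")
```

In practice the text passed to `extract_record_data` would come from fetching `query_url`, and a curator would still review the returned title before it goes into the catalog.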
As the project continues, the collection will grow with new records, since there are thousands of websites about the Tourism sector.
Description of Web contents in the MUVITUR catalog
In the collection “Paginas Web” the following data are used:
Denomination – usually the title of the website
Organization – the entity to which the publication belongs
Website address on the Internet
Address for version in Arquivo.pt
Moment(s) to remember
Link to the thumbnail in Arquivo.pt
Descriptors
Geographical data (location, coordinates, geographical name)
The presentation of the information was adjusted to be aligned with that of other MUVITUR resources and contains links to Arquivo.pt.
For example, in the record for the Turismo do Algarve site, we find a link to a moment to remember in 2011 and another link to the history in Arquivo.pt under “Consultar objecto”.
Organizations can create collections of Websites from their area
The National Library of Australia, for example, included records of preserved Websites in its catalog. In the Library of Congress there are collections of old Websites alongside traditional resources.
However, websites are rarely included in museum collections.
With this unprecedented project we can say that preserved Web sites have gained citizenship in digital platforms dedicated to cultural heritage.
MUVITUR has paved the way with this project for other entities to create collections of websites of their interest on their own platforms.
The research and education community has been requesting the bulk download of web-archived data and index files (CDXJ), for instance, to feed AI training models, optimize routing of web archive requests or recover information from selected websites (e.g. news).
Arquivo.pt has begun making all its CDXJ index files publicly available in real time to facilitate the bulk download of web-archived data.
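Once downloaded, a CDXJ index can be processed line by line. The sketch below assumes the common CDXJ layout of a URL key, a timestamp and a JSON block separated by spaces. The sample lines and the JSON field names (`url`, `status`) are illustrative and not taken from a real Arquivo.pt index.

```python
import json
from typing import NamedTuple


class CdxjEntry(NamedTuple):
    key: str        # searchable URL key (typically in SURT form)
    timestamp: str  # capture time, typically 14 digits: YYYYMMDDhhmmss
    fields: dict    # JSON block with details about the capture


def parse_cdxj_line(line: str) -> CdxjEntry:
    """Split one CDXJ line into key, timestamp and the JSON block."""
    key, timestamp, json_block = line.strip().split(" ", 2)
    return CdxjEntry(key, timestamp, json.loads(json_block))


def captures_in_year(lines, year: str):
    """Yield entries whose timestamp starts with the given year."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_cdxj_line(line)
        if entry.timestamp.startswith(year):
            yield entry


# Illustrative lines, not taken from a real Arquivo.pt index.
sample_lines = [
    'org,example)/ 20110315120000 {"url": "http://example.org/", "status": "200"}',
    'org,example)/news 20190101000000 {"url": "http://example.org/news", "status": "200"}',
]
hits = list(captures_in_year(sample_lines, "2011"))
```

Because `captures_in_year` is a generator, the same code can stream over an open file, so large index files do not need to fit in memory.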
Documents cite web content by referencing their URLs so that readers can later access them.
In the case of scientific articles, the importance of these citations is even greater to maintain the integrity of research works because they often reference essential information to enable the reproducibility of an experiment or analysis.
For example, links in a scientific article may cite the datasets, software or web news that supported the research, which are not included in the text of the article.
To respond to the need of preserving the integrity of documents, Arquivo.pt launched the CitationSaver.
CitationSaver automatically extracts cited links in a document and preserves their content (e.g. web pages cited in a book) so that they can be retrieved later from Arquivo.pt.
Use CitationSaver to preserve the integrity of your documents
Upload a document and CitationSaver will extract the cited URLs, archive their content and make it available on Arquivo.pt shortly afterwards. There are 3 methods to upload a document:
insert the address (URL) of the PDF or TXT file, if it is published online
upload the file in PDF or TXT format
paste the text containing the addresses you want to preserve (e.g. References section of an article or Bibliography of a book).
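The extraction step can be illustrated with a short sketch. This is not CitationSaver's actual implementation, which is not described here. It is a simple regular-expression approach that assumes the links appear as plain `http` or `https` addresses in the text.

```python
import re

# Simple pattern for http(s) links; real documents may need more careful handling.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the distinct URLs found in the text, in order of appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Illustrative reference list, not real citations.
references = """
[1] Example dataset, https://example.org/data/set1.csv.
[2] Example news story (http://example.com/news/2011/story), accessed 2023.
[3] Example dataset again: https://example.org/data/set1.csv
"""
cited = extract_urls(references)
```

Duplicates are removed so that each cited address would be submitted for preservation only once.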
Organizations often keep the domains of websites that are no longer in use, either to prevent the domains from being bought by others or simply because they were forgotten.
The aim of project Renascer (Reborn) is to bring back historical websites whose content is no longer available online and whose domain continues to be held by their authors.
“Forgotten” domains can cause cybersecurity problems
In this situation, the original content of the website was inaccessible despite the fact that the domain continued to be owned by the author of the website.
Furthermore, since the domain was still pointing to an active web server, cybersecurity issues could occur if this server was not being properly maintained.
The domain owner only has to redirect it to Arquivo.pt, through the Memorial service.
For example, the mctes.pt domain now points back to its original contents preserved by Arquivo.pt, so this website has been reborn.
Examples of Reborn domains
Project Renascer identified active domains managed by FCCN that were not referencing any content, and gave them new life by pointing them to their historical contents preserved by Arquivo.pt.
Contact Arquivo.pt to bring the historical websites of your organization back to life.
Arquivo.pt is a free public service that allows searching and accessing Web pages preserved since the 1990s, for example to view an old news story or access an old version of a website.
The collaboration between the AMCC and Arquivo.pt takes the form of a training program entitled Arquivo.pt: Digital Skills for the Media, delivered in four webinars, and of the AMCC Honorable Mention awarded to work done on Portuguese centenary newspapers in the Arquivo.pt Award 2023.
Webinar cycle: Arquivo.pt: digital skills for media
The webinar cycle aims to equip trainees with digital skills that enable them to solve problems caused by the disappearance of digital information and gain competitive advantage in the production of unique and exclusive content.
Webinar 1: A tool for quickly searching the past
Date: March 24, 2023 Time: 14:00-15:30 (in Portuguese)
Until May 4th, Arquivo.pt challenges participants to create a work based on historical information preserved from the Web.
In this 6th edition of the Arquivo.pt Award, 15 000 euros will be granted to the three best works (1st place: 10 000 euros).
Works about any subject may be submitted, done individually or in a group. The only condition is that Arquivo.pt is the main source of information.
The Público newspaper will grant an Honorable Mention for works based on the web-archived content of Público online.
The Aveiro Media Competence Center (AMCC) will also grant an Honorable Mention to one of the submitted works that focuses on the archives of the online version of century-old newspapers.