Arquivo.pt in Paris for international event


Last updated on August 21st, 2024 at 12:04 pm

The Arquivo.pt team took part in the General Assembly and Web Archiving Conference of the International Internet Preservation Consortium (GA&WAC 2024), an annual event that brings together web archiving initiatives from around the world.

The National Library of France (BnF), in partnership with the Institut national de l’audiovisuel (INA), hosted this meeting, which took place from April 24 to 26, 2024, in the iconic François Mitterrand building in Paris.

For three days, participants were able to share knowledge and experience on the preservation of information published on the Web.

Arquivo.pt contributed the following presentations:

  • Training the Trainers – Helping Web Archiving Professionals become Confident Trainers (Pre-Conference Workshop, Training Working Group) – Ricardo Basílio (Abstract, slides)
  • 80 Thousand Pages On Street Art: Exploring Techniques To Build Thematic Collections (Session#02: Unique Content) – Ricardo Basílio (Abstract, video, slides)
  • Renascer Project Brings Back Old Websites at Arquivo.pt (Session#04: Delivery & Access) – Ricardo Basílio, Daniel Gomes and Vasco Rato (Abstract, video, slides)
  • Arquivo.pt CitationSaver: Preserving Citations for Online Documents (Session#09: Digital Preservation) – Pedro Gomes, Daniel Gomes (Abstract, video, slides)
  • Fixing Broken Links with Arquivo404 (Poster session 2) – Vasco Rato, Daniel Gomes (Abstract, slides)

Web archiving training on Madeira Island


Last updated on May 8th, 2024 at 07:31 pm

The Arquivo.pt team was in Funchal between April 15 and 19, 2024, and presented two different sessions on web preservation. The first took place during the Jornadas FCCN 2024 and the second was a workshop, after the event had ended, at the headquarters of the Regional Agency for the Development of Research, Technology and Innovation (ARDITI).

Arquivo.pt at Jornadas FCCN 2024

The session held during the Jornadas FCCN 2024 was entitled “Arquivo.pt at the service of culture” and aimed to highlight two of Arquivo.pt’s collaborations in the field of culture and knowledge, namely with Wikipedia Portugal and the Virtual Museum of Tourism (MUVITUR).

At the FCCN Zapping session, Arquivo.pt presented the Arquivo404 service, which allows websites to offer archived versions of missing pages instead of a “Page not found” error.

Workshop with ARDITI

The post-conference workshop, promoted by ARDITI, was open to regional institutions and citizens in general. It was entitled “Arquivo.pt and the preservation of Internet memory”.

The contents followed the training program run by Arquivo.pt and were preceded by an overview that placed Arquivo.pt among the other services of FCCN – Computação Científica da FCT.

Just as important as the content was the dialogue between the participants and the Arquivo.pt team during the sessions, which gave participants the chance to ask questions and clear up doubts.

Web preservation is increasingly important for organizations that want to preserve part of their institutional memory and develop security policies.

By hosting and promoting the Arquivo.pt training sessions, ARDITI sent an important signal about preserving the web memory of Madeiran institutions.

If you want to promote the preservation of web content in your organization, check out the Arquivo.pt training and contact us.


Artificial Intelligence processes data from Arquivo.pt


Last updated on July 16th, 2024 at 08:33 am

Artificial Intelligence (AI) covers various areas of knowledge, such as linguistics and computing, and is present in the new technologies that citizens use on a daily basis.

For example, AI is at work when we search for information on the Internet and the computer generates an amazingly accurate response, in a language very close to our own.

Natural Language Processing (NLP) is what allows machines to perfect the algorithm that generates these answers tailored to Internet users.

The problem is that natural language processing models have been developed more for the English language and less for Portuguese and other languages with less representation.

The more the processing models are trained on a language, the better they will be able to interpret the complexities of the language. But this is only possible if quality data is available.

Portuguese text collection on Arquivo.pt available for research

Arquivo.pt stands out here as the largest open-access Portuguese-language textual dataset in Portugal that researchers can use to train NLP models.

In recent years, researchers from various research groups and projects have drawn attention to the usefulness of preserved web data for large-scale processing.

Arquivo.pt holds more than 1 petabyte of preserved web content dating back to the 1990s, including everything that can be found on web pages: not just text, but also images, audio files, video, page code and various metadata.

The content is accessible via the search interface and the Arquivo.pt APIs.

In order to make it easier to download archived resources from the web, Arquivo.pt has created indexes for researchers in CDXJ format.
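As a rough illustration of how such an index might be read, the sketch below parses a single CDXJ line in Python. It assumes the common CDXJ layout (a SURT-style URL key, a 14-digit capture timestamp, then a JSON block). The sample line and the field names inside the JSON block are made up for illustration and may differ from the ones in the Arquivo.pt indexes.

```python
import json
from datetime import datetime


def parse_cdxj_line(line: str) -> dict:
    """Split one CDXJ line into its URL key, capture time and JSON fields."""
    # Layout assumed here: "<surt key> <YYYYMMDDhhmmss> <JSON object>"
    surt_key, timestamp, json_block = line.strip().split(" ", 2)
    record = json.loads(json_block)
    record["surt_key"] = surt_key
    record["captured_at"] = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return record


# Hypothetical sample line; field names and values are illustrative only.
sample = (
    'pt,example)/ 20100101120000 '
    '{"url": "http://example.pt/", "mime": "text/html", "status": "200", '
    '"filename": "example.warc.gz", "offset": "1024", "length": "2048"}'
)

record = parse_cdxj_line(sample)
print(record["url"], record["captured_at"].isoformat())
```

In an index of this kind, the filename, offset and length fields are what would let a researcher fetch one archived record without downloading the whole archive file.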

GlórIA, a model for the Portuguese language

One of the projects that used Arquivo.pt to obtain large amounts of text is GlórIA, a large language model (LLM) focused on European Portuguese.

“Despite the abundance of LLMs for many high-resource languages, the availability of such models remains limited for European Portuguese,” explain the authors of the GlórIA project, Ricardo Lopes, João Magalhães and David Semedo, researchers at the NOVA School of Science and Technology, in their article GlórIA – A Generative and Open Large Language Model for Portuguese.

The model was trained on 35 billion tokens (the units of text that a model processes), drawn from various sources.

Arquivo.pt contributed a collection of 1.4M European Portuguese archived news and periodicals.

You can try generating text in European Portuguese using the inference API on the GlórIA model card on Hugging Face.

If you want to develop a project or study using Arquivo.pt, you can start your research and, if you need help, contact us.


Arquivo.pt in the top 3 of government services in Portugal


Last updated on August 6th, 2024 at 05:32 pm

Arquivo.pt, Portugal’s national web preservation service, has earned a prominent position by being named one of the top 3 government services in the 2023 Portugal Digital Awards. This recognition is testimony to the crucial role played by Arquivo.pt in the preservation and accessibility of Portugal’s digital heritage.

The three finalists in the Best Government Project category (best digital transformation project in the public administration sector) were Arquivo.pt, the Porto Digital Association and Banco de Portugal, which won the award.

Mission and recognition

Arquivo.pt, developed by FCCN – National Scientific Computing, stands out as an innovative initiative in the field of digital preservation. Its mission is to collect and archive web content, allowing users to access past versions of web pages, documents and other online resources.


The recognition at the Portugal Digital Awards highlights not only the importance of digital preservation, but also the effectiveness and relevance of Arquivo.pt as a government service. By providing a journey through time via the Internet, this resource becomes a valuable tool for researchers, academics and the general public.

Commitment to digital preservation

Participation in the award underlines Arquivo.pt’s commitment to improving the historical record of the evolution of the Web in Portugal. This service not only contributes to the country’s digital memory, but also facilitates research, promoting understanding of digital evolution over time.

In addition, Arquivo.pt’s distinction reflects FCCN’s ongoing effort to develop and improve innovative services that benefit society. Digital preservation is a crucial component in ensuring that Portugal’s digital heritage is passed on to future generations, and Arquivo.pt fulfills this role in a unique way.

In conclusion, recognition in the Portugal Digital Awards 2023, a competition that received over 300 candidate services, solidifies Arquivo.pt’s position as one of the leading government services at the forefront of digital preservation. This achievement highlights the growing importance of digital preservation in the digital age in which we live.
