Commemoration of the 50th anniversary of April 25 – the Portuguese revolution of 1974

Arquivo.pt joined the celebrations of the 50th anniversary of April 25, the Portuguese Revolution of 1974, as part of the initiatives promoted by the Fundação para a Ciência e a Tecnologia (FCT) in partnership with the Estrutura de Missão – Comissão Comemorativa 50 anos 25 de Abril.

The initiatives were as follows: a journey through time in the form of an online exhibition, a special collection on the theme “April 25”, a presentation at the “50 years of April International Congress” and a special mention in the 2025 edition of the Arquivo.pt Award.

Memories of April 25 on the Internet exhibition

The exhibition Memories of April 25 on the Internet presents a selection of web pages about the celebrations of April 25 in various regions of the country, since the beginning of the web in the 1990s.

The criteria for choosing the pages for the exhibition were as follows:

  • Pages relating to the April 25 commemorations;
  • Pages found on Arquivo.pt on dates close to the anniversary each year;
  • Diversity to include different areas of the country;
  • Popular demonstrations and official ceremonies.

A historical memory without web archives is incomplete. The aim of this journey through time is to invite citizens to travel back in time, browsing through old web pages and reliving recent episodes in our life as a democracy.

See the exhibition: arquivo.pt/50anos25abril

Special collection on April 25 – the Portuguese Revolution of 1974

To mark the anniversary, Arquivo.pt carried out a special collection on the topic of “April 25” and made the results available in an open dataset, published on the Dados.gov portal.

The dataset contains a list of keywords put into a search engine in order to obtain results on the topic of “April 25”. The search considered names of people, places, political, social and cultural aspects, as well as words associated with the event.

The searches were carried out on March 22, 2024 using the Bing Search API, an automatic search service that returns results according to the relevance criteria of the Bing service itself and others configured by us.

A total of 12,650 unique web page addresses were obtained. It is hoped that the recording of these pages will be useful for the organizations that produced this content, for researchers who want to study our history and for citizens who cultivate a sense of memory and democracy.
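The step from per-keyword search results to a list of unique addresses can be sketched as follows. This is only an illustration under stated assumptions: the function name and the sample results are hypothetical, no search API is called, and URLs are compared as exact strings, which may differ from how the published dataset was actually deduplicated.

```python
def unique_urls(results_by_keyword):
    """Merge per-keyword result lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for urls in results_by_keyword.values():
        for url in urls:
            # Exact string comparison: an assumption, no URL normalization is done.
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical results for two keywords; not taken from the real dataset.
results = {
    "25 de Abril": ["https://example.pt/a", "https://example.pt/b"],
    "Revolução dos Cravos": ["https://example.pt/b", "https://example.pt/c"],
}
print(len(unique_urls(results)))
```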

Participation in the 50 years of April International Congress

João Gomes, Director of Advanced Services, FCCN-FCT presenting the Arquivo.pt Memorial service at the 50 years of April International Congress

On May 2, 2024, João Gomes, Director of Advanced Services at the FCCN Scientific Computing Unit of the Foundation for Science and Technology I.P., presented Arquivo.pt to the participants of the 50 years of April International Congress, as a distinctive service, open to citizens and useful for organizations.

This event, organized by the Estrutura de Missão – Comissão Comemorativa 50 anos 25 de Abril and the University of Lisbon, included a presentation of two FCT services for citizens: Arquivo.pt and NAU’s massive open online courses.

Arquivo.pt is a web preservation service available to all citizens who want to search for old content published on the web.

Using Arquivo.pt contributes to a better understanding of our history. It also provides useful services for cybersecurity, such as the Arquivo.pt Memorial, which is able to maintain institutions’ old websites, preventing attacks and saving them resources.

Special mention for “April 25 and Democracy” at the Arquivo.pt Award 2025

The Arquivo.pt Award is held annually and honors works that use Arquivo.pt.

In 2025, as part of the celebrations for the 50th anniversary of April 25, a special mention will be made of work on the theme “April 25 and Democracy”.

We therefore challenge researchers and interested citizens to create innovative works using Arquivo.pt.

If you have any questions about the Arquivo.pt Award, please contact us.

Arquivo.pt in Paris for international event

Last updated on August 21st, 2024 at 12:04 pm

The Arquivo.pt team took part in the General Assembly and Web Archiving Conference of the International Internet Preservation Consortium (GA&WAC 2024), an event that annually brings together web archiving initiatives from around the world.

The National Library of France (BnF), in partnership with the Institut national de l’audiovisuel (INA), hosted this meeting, which took place from April 24 to 26, 2024, in the iconic François Mitterrand building in Paris.

For three days, participants were able to share knowledge and experience on the preservation of information published on the Web.

Arquivo.pt contributed the following presentations:

  • Training the Trainers – Helping Web Archiving Professionals become Confident Trainers (Pre-Conference Workshop, Training Working Group) – Ricardo Basílio (Abstract, slides)
  • 80 Thousand Pages On Street Art: Exploring Techniques To Build Thematic Collections (Session#02: Unique Content) – Ricardo Basílio (Abstract, video, slides)
  • Renascer Project Brings Back Old Websites at Arquivo.pt (Session#04: Delivery & Access) – Ricardo Basílio, Daniel Gomes and Vasco Rato (Abstract, video, slides)
  • Arquivo.pt CitationSaver: Preserving Citations for Online Documents (Session#09: Digital Preservation) – Pedro Gomes, Daniel Gomes (Abstract, video, slides)
  • Fixing Broken Links with Arquivo404 (Poster session 2) – Vasco Rato, Daniel Gomes (Abstract, slides)

Training about web archiving in Madeira island

Last updated on May 8th, 2024 at 07:31 pm

The Arquivo.pt team was in Funchal between April 15 and 19, 2024, and presented two different sessions on web preservation. The first took place during the Jornadas FCCN 2024 and the second was a workshop, after the event had ended, at the headquarters of the Regional Agency for the Development of Research, Technology and Innovation (ARDITI).

Arquivo.pt at Jornadas FCCN 2024

The session held during the Jornadas FCCN 2024 was entitled “Arquivo.pt at the service of culture” and aimed to highlight two of Arquivo.pt’s collaborations in the field of culture and knowledge, namely with Wikipedia Portugal and the Virtual Museum of Tourism (MUVITUR).

At the FCCN Zapping session, Arquivo.pt presented the Arquivo404 service, which allows websites to offer historical content instead of the negative “Page not found”.

Workshop with ARDITI

The post-event workshop, promoted by ARDITI, was open to regional institutions and citizens in general. It was entitled “Arquivo.pt and the preservation of Internet memory”.

The contents followed the training program run by Arquivo.pt and were preceded by an overview of how Arquivo.pt fits among the other services of FCCN – Computação Científica da FCT.

Just as important as the content was the dialog that was established during the sessions between the participants and the Arquivo.pt team to clarify doubts or ask questions.

Web preservation is increasingly important for organizations that want to preserve part of their institutional memory and develop security policies.

ARDITI gave an important signal about preserving the web memory of Madeiran institutions by hosting and promoting the Arquivo.pt training sessions.

If you want to promote the preservation of web content in your organization, check out the Arquivo.pt training and contact us.

Artificial Intelligence processes data from Arquivo.pt

Last updated on July 16th, 2024 at 08:33 am

Artificial Intelligence (AI) covers various areas of knowledge, such as linguistics and computing, and is present in the new technologies used by citizens on a daily basis.

For example, when we search for information on the Internet, the computer can generate an amazingly accurate response in a language very close to our own.

Natural Language Processing (NLP) is what allows machines to perfect the algorithm that generates these answers tailored to Internet users.

The problem is that natural language processing models have been developed more for the English language and less for Portuguese and other languages with less representation.

The more the processing models are trained on a language, the better they will be able to interpret the complexities of the language. But this is only possible if quality data is available.

Portuguese text collection on Arquivo.pt available for research

Arquivo.pt appears here as the largest Portuguese-language textual dataset in Portugal, available in open access, for researchers to train NLP models.

In recent years, researchers from various research groups and projects have drawn attention to the usefulness of preserved web data for large-scale processing.

Arquivo.pt has more than 1 Petabyte of preserved web content dating back to the 1990s, including everything that can be found on web pages. It’s not just text, but also images, audio files, video, page code and various metadata.

The content is accessible via the search interface and the Arquivo.pt APIs.

In order to make it easier to download archived resources from the web, Arquivo.pt has created indexes for researchers in CDXJ format.
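As a rough illustration of working with such indexes, the sketch below parses a single CDXJ line. It assumes the common CDXJ layout (a sortable URL key, a 14-digit timestamp, then a JSON object); the function name, the sample line and its JSON field names are invented for illustration and may not match the exact fields in the Arquivo.pt indexes.

```python
import json
from datetime import datetime


def parse_cdxj_line(line):
    """Split one CDXJ line into its URL key, capture time and JSON fields."""
    # The first two space-separated tokens are the key and the timestamp;
    # everything after them is treated as a JSON object.
    urlkey, timestamp, json_part = line.strip().split(" ", 2)
    record = json.loads(json_part)
    record["urlkey"] = urlkey
    record["captured_at"] = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return record


# Hypothetical sample line; not taken from a real Arquivo.pt index.
sample = (
    'pt,example)/noticias 20080315120000 '
    '{"url": "http://example.pt/noticias", "mime": "text/html", "status": "200"}'
)
record = parse_cdxj_line(sample)
print(record["url"], record["captured_at"].year)
```

Reading an index line by line in this way would let a researcher filter captures by date or media type before downloading anything.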

GlórIA, a model for the Portuguese language

One of the projects that used Arquivo.pt to obtain large amounts of text is called GlórIA, a large language model (LLM) focused on European Portuguese.

“Despite the abundance of LLMs for many high-resource languages, the availability of such models remains limited for European Portuguese”, as the authors of the GlórIA project, Ricardo Lopes, João Magalhães and David Semedo, researchers at the NOVA School of Science and Technology, explain in their article GlórIA – A Generative and Open Large Language Model for Portuguese.

The model used a corpus of 35 billion tokens, the units of text that machines process, from various sources.

Arquivo.pt contributed a collection of 1.4M European Portuguese archived news and periodicals.

You can try generating text in European Portuguese using the inference API on the GlórIA model card on Hugging Face.

If you want to develop a project or study using Arquivo.pt, you can start your research and, if you need help, contact us.

Arquivo.pt in the top 3 of government services in Portugal

Last updated on August 6th, 2024 at 05:32 pm

Arquivo.pt, Portugal’s national web preservation service, has earned a prominent position by being named one of the top 3 government services in the 2023 Portugal Digital Awards. This recognition is testimony to the crucial role played by Arquivo.pt in the preservation and accessibility of Portugal’s digital heritage.

The three finalists in the category Best Government Project (best digital transformation project in the public administration sector) were Arquivo.pt, the Porto Digital Association and Banco de Portugal, which received the winning award.

Mission and recognition

Arquivo.pt, developed by FCCN, the national scientific computing unit of FCT, stands out as an innovative initiative in the field of digital preservation. Its mission is to collect and archive web content, allowing users to access past versions of web pages, documents and other online resources.

The recognition at the Portugal Digital Awards highlights not only the importance of digital preservation, but also the effectiveness and relevance of Arquivo.pt as a government service. By providing a journey through time via the Internet, this resource becomes a valuable tool for researchers, academics and the general public.

Commitment to digital preservation

Participation in the award underlines Arquivo.pt’s commitment to improving the historical record of the evolution of the Web in Portugal. This service not only contributes to the country’s digital memory, but also facilitates research, promoting understanding of digital evolution over time.

In addition, Arquivo.pt’s distinction reflects FCCN’s ongoing effort to develop and improve innovative services that benefit society. Digital preservation is a crucial component in ensuring that Portugal’s digital heritage is passed on to future generations, and Arquivo.pt fulfills this role in a unique way.

In conclusion, recognition in the Portugal Digital Awards 2023, a competition that received over 300 candidate services, solidifies Arquivo.pt’s position as one of the leading government services at the forefront of digital preservation. This achievement highlights the growing importance of digital preservation in the digital age in which we live.

Arquivo.pt reaches 1 PetaByte of preserved information!

The collection of 1 PetaByte of content predominantly in Portuguese, accessible to both researchers and ordinary citizens, is a milestone that deserves to be celebrated, in the month of its 16th anniversary.

At Arquivo.pt you can search for information published on the Web in the past, such as:

  • The first European page
  • News from The New York Times in 2008
  • European Film Awards 2014

Discover more pages through the selected pages in the Arquivo.pt Online Exhibitions.

Purpose and mission of the Portuguese Web Archive

Arquivo.pt was created on November 8, 2007 with the aim of preserving content from the Portuguese Web.

In 2013, as a service operated by the Fundação para a Ciência e a Tecnologia (FCT), its mission was formulated as follows: “To promote the preservation of content available on the national Internet, ensuring that it is made available to the scientific community and the general public” (Decreto-Lei no. 55/2013).

In recent years, Arquivo.pt has created new services, such as CitationSaver, which allows researchers to record references to web content in their scientific articles, and Memorial and Complete page, which facilitate access to content scattered throughout the huge 1 PetaByte block of data.

Where did so much information come from?

In order to reach the 1 PetaByte volume, Arquivo.pt periodically recorded content from websites in the .PT domain and from Portuguese websites in other domains.

In addition, frequent daily and monthly collections were made from a small number of government sites and the main news sites in Portugal.

As part of international collaborations, content was collected from sites in various languages, for example on the 2019 European Elections.

Content prior to 2008 came from the Internet Archive and donations, such as a collection made by the National Library and INESC on the 2005 Legislative Elections.

The largest Portuguese-language dataset available to researchers

By making 1 PetaByte of information available, in open access and through the use of APIs (Application Programming Interfaces), Arquivo.pt is a useful tool for research.

For example, a researcher who wants to do a study on elections in Portugal can use the entire Arquivo.pt collection. Better still, they can focus on just a few special collections dedicated to the elections, choosing the ones that interest them and downloading just a few Terabytes to process automatically with the APIs.
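A first step in such a study could be a date-bounded full-text query. The sketch below only builds the request URL and makes no network call. The endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `from`, `to` and `maxItems` are assumptions here and should be checked against the Arquivo.pt API documentation; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; confirm against the Arquivo.pt API documentation.
TEXT_SEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, start, end, max_items=50):
    """Build a query URL restricted to captures between two 14-digit timestamps."""
    params = {"q": query, "from": start, "to": end, "maxItems": max_items}
    return f"{TEXT_SEARCH_ENDPOINT}?{urlencode(params)}"


# Example: pages mentioning legislative elections, captured during 2005.
url = build_search_url("eleições legislativas", "20050101000000", "20051231235959")
print(url)
```

Keeping URL construction separate from the actual request makes it easy to log and review the exact queries behind a study, which helps with reproducibility.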

Contributions from the various teams and friends of Arquivo.pt

The development of Arquivo.pt is more than a technological issue and has been due to the dedication and persistence of the various teams that have worked on it since 2007.

It was also due to the contribution of many friends of Arquivo.pt, who were always on hand to help improve, and to the response of the user community.

Congratulations to all! Thank you.

Arquivo404 more powerful!

Last updated on August 9th, 2024 at 12:59 pm

Arquivo.pt has been launching innovative complementary services that help organizations operate more efficiently.

The new release of Arquivo.pt named Helios was launched on November 13, 2023 and includes developments in Arquivo404 and CitationSaver.

Arquivo404 with new methods for defining time intervals

Arquivo404 is a service that presents website users with links to web-archived versions, instead of laconic “Page not found” error messages.

However, sometimes it is necessary to specify the correct version of a web-archived page to be displayed. For example, a website’s domain may have belonged to another entity in the past, and only web-archived versions since the website came under its current owners should be displayed.

For this purpose, 3 new methods for configuring Arquivo404 were released:

  • setMinimumDate( minDate : Date ) – specifies the earliest date of the web-archived version of the URL that can be displayed.
  • setMaximumDate( maxDate : Date ) – specifies the latest date of the web-archived version of the URL that can be displayed.
  • setMostRelevantMemento( criterion : ‘oldest’ | ‘most-recent’ ) – specifies the order of results for the versions retrieved from the web archive. By default, the oldest version is displayed ( ‘oldest’ ).

In short, Arquivo404 now allows you to define whether to display the oldest or most recent web-archived page to the users, within a certain time interval.
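The combined effect of the three settings can be modelled in a few lines. Arquivo404 itself is configured in the JavaScript of the web page; the Python sketch below is not its implementation, only an illustration of the selection logic described above. The function name `pick_memento` is hypothetical, and treating both date bounds as inclusive is an assumption.

```python
from datetime import datetime


def pick_memento(captures, min_date=None, max_date=None, criterion="oldest"):
    """Return the capture date to show, or None if none falls inside the window."""
    if criterion not in ("oldest", "most-recent"):
        raise ValueError("criterion must be 'oldest' or 'most-recent'")
    # Keep only captures inside the configured interval (bounds assumed inclusive).
    in_window = [
        c for c in captures
        if (min_date is None or c >= min_date)
        and (max_date is None or c <= max_date)
    ]
    if not in_window:
        return None
    return min(in_window) if criterion == "oldest" else max(in_window)


# Hypothetical capture dates for one URL.
captures = [datetime(2005, 4, 25), datetime(2012, 4, 25), datetime(2019, 4, 25)]
# The domain changed owners in 2010, so earlier captures are excluded.
print(pick_memento(captures, min_date=datetime(2010, 1, 1)))
```

With the default criterion, the 2005 capture is skipped because it predates the minimum date, which mirrors the change-of-ownership scenario.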

CitationSaver processes HTML documents

CitationSaver is a service that extracts citations to online resources from documents and archives the cited content. This service is particularly useful for maintaining the integrity of scientific articles and the reproducibility of the experiments and studies described in them.

Many open-access articles are published in hypertext format (HTML). CitationSaver now processes documents in HTML format, in addition to PDF and TXT formats.

For example, if a user finds an article on the Web that contains citations to online resources, they simply need to submit the URL of the article to CitationSaver. The URLs cited in the article will be extracted and their content will be web-archived for later access.

Example of an article from the Journal of Integrated Coastal Management, available on SciELO
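The extraction step can be pictured with a minimal sketch that collects the absolute links found in an HTML document. This is not CitationSaver's actual implementation: the class and function names are hypothetical, only `<a href>` links are considered, and URLs written as plain text are ignored.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags, without duplicates."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            is_absolute = bool(value) and value.startswith(("http://", "https://"))
            if name == "href" and is_absolute and value not in self.urls:
                self.urls.append(value)


def extract_cited_urls(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.urls


# Hypothetical article fragment with a repeated link and a relative link.
sample_html = """
<p>See <a href="https://example.org/report.pdf">the report</a>,
<a href="/about">about us</a> and again
<a href="https://example.org/report.pdf">the report</a>;
data at <a href="http://example.net/data">this page</a>.</p>
"""
print(extract_cited_urls(sample_html))
```

Each URL in the resulting list would then be a candidate for archiving.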

Give us feedback about our services and if you detect any issue, please contact us.

World Digital Preservation Day dedicated to Justice

Last updated on November 13th, 2023 at 08:59 am

The Instituto de Gestão Financeira e Equipamentos da Justiça (IGFEJ) and Secretaria Geral do Ministério da Justiça (SGMJ), in collaboration with BAD, organized the event “Digital Preservation in Justice” to mark World Digital Preservation Day on November 2, 2023.

The event, which took place in the auditorium of the Polícia Judiciária in Lisbon, was attended by representatives from the government’s justice department and professionals from the archives, communications and IT departments.

How to use Arquivo.pt to preserve institutional websites

Arquivo.pt took part in the presentation “Preserve your website”, which addressed the issue of preserving institutional websites and critical aspects such as cybersecurity.

Justice entities can benefit from Arquivo.pt and its various services to ensure good preservation of their websites, mitigate cybersecurity threats and provide historical content to citizens.

The presentation concluded with the following recommendations:

  • Inventory and publicize your current and historical websites
  • Use Arquivo.pt services collaboratively
  • Save content in a standardized format with ArchiveWeb.page

Prepare a work for the Arquivo.pt Award 2024!

Last updated on August 6th, 2024 at 05:15 pm

Until May 6, 2024, Arquivo.pt challenges you to create a work based on historical information preserved from the Web.

In this 7th edition of the Arquivo.pt Award, €15,000 will be awarded to the 3 best works (€10,000 for 1st place), plus 3 honorable mentions.

Know more at: arquivo.pt/award

Honorable mentions for authors and professors

To promote the use of Arquivo.pt in teaching, research and professional contexts, three partner institutions are sponsoring honorable mentions with an associated prize.

  • The Público newspaper will award an Honorable Mention to works based on the Público online content preserved by Arquivo.pt.
  • The Aveiro Media Competence Center (AMCC) will award an Honorable Mention to the best work on the web archive of one or more Portuguese online media.
  • Association DNS.PT will award an Honorable Mention to a professor or teacher who has encouraged the submission of works.

Share and spread the word!

Help us spread the word about the Arquivo.pt Award 2024 among potential candidates!

University of Lisbon preserved over 100 historical websites in the Arquivo.pt Memorial

Last updated on March 27th, 2024 at 11:17 am

More than 100 historical websites from the Faculty of Sciences of the University of Lisbon (FCUL) are now accessible through the Memorial service of Arquivo.pt.

FCUL’s IT Department sent Arquivo.pt a list of old websites hosted on its servers that were no longer updated, but whose historical content continues to be of interest to the community (e.g. websites of research projects or scientific events).

Arquivo.pt preserved these websites in collaboration with their owners, seeking to maintain a faithful representation of the published content for the future.

FCUL redirected the domain of each website to Arquivo.pt and was then able to disconnect the respective servers and save the resources spent on their maintenance (e.g. electricity, data center space, human resources).

The MiNEMA showcase

Landing page of www.minema.di.fc.ul.pt at Memorial do Arquivo.pt.

The MiNEMA scientific program website was the first that FCUL integrated into the Memorial. This website stopped being updated in 2009 when the project ended. FCUL invested resources in maintaining the website for another 10 years until it became necessary to shut it down for cybersecurity reasons.

The Memorial of Arquivo.pt emerged as an option and, since 2020, FCUL only needs to maintain the domain www.minema.di.fc.ul.pt while Arquivo.pt preserves the information contained on the website.

Please note that the website’s content continues to be displayed in search engine results.

Follow FCUL and preserve your historical websites in the Memorial!

An increasing number of institutions are turning to the Memorial of Arquivo.pt to safely preserve the content of their historical websites. For example, FCUL preserved 116 websites, the Government IT Network Management Center preserved 23 and the Foundation for Science and Technology preserved 40.

Public institutions have priority to benefit from this service. However, other entities can also request it as long as they own the website domain.

Identify the historical websites that are candidates for integration into the Memorial of Arquivo.pt and contact us!
