CDXJ index files are available to support bulk access

A group of researchers looking at a server rack

Last updated on May 5th, 2023 at 01:39 pm

The research and education community has been requesting support for the bulk download of web-archived data and index files (CDXJ), for instance to feed AI training models, optimize the routing of web archive requests, or recover information from selected websites (e.g. news).

Arquivo.pt has begun making all its CDXJ index files publicly available in real time to facilitate the bulk download of web-archived data. Learn how at:

Your comments and suggestions to improve this service are most welcome!

Please share this information with anyone who may be interested.
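For readers new to the format: each line of a CDXJ index typically holds a canonicalized (SURT) URL key, a 14-digit capture timestamp and a JSON block describing the capture. The sketch below parses such lines and keeps only captures from one website, which is the first step when recovering content from selected sites. It assumes the common pywb-style layout `<surt> <timestamp> <json>`; the exact JSON keys in Arquivo.pt's files are not stated here, and the sample line is made up for illustration.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class CdxjRecord(NamedTuple):
    """One capture described by a CDXJ index line."""
    surt_key: str   # canonicalized (SURT) form of the URL, e.g. "pt,publico)/"
    timestamp: str  # 14-digit capture time, YYYYMMDDhhmmss
    fields: dict    # JSON block (assumed keys such as url, mime, status)


def parse_cdxj_line(line: str) -> CdxjRecord:
    """Split a CDXJ line into SURT key, timestamp and JSON block.

    Splitting at most twice keeps any spaces inside the JSON block intact.
    """
    surt_key, timestamp, json_block = line.rstrip("\n").split(" ", 2)
    return CdxjRecord(surt_key, timestamp, json.loads(json_block))


def filter_by_prefix(lines: Iterable[str], surt_prefix: str) -> Iterator[CdxjRecord]:
    """Yield only the records whose SURT key starts with surt_prefix (one website)."""
    for line in lines:
        if line.strip() and line.startswith(surt_prefix):
            yield parse_cdxj_line(line)


# Hypothetical sample line, not taken from a real Arquivo.pt index file.
sample = ('pt,publico)/ 20230505133900 '
          '{"url": "https://www.publico.pt/", "mime": "text/html", "status": "200"}')
record = parse_cdxj_line(sample)
print(record.surt_key, record.timestamp, record.fields["url"])
```

Because index files are sorted by SURT key, all captures of one site sit next to each other, so a prefix filter like this can also be replaced by a binary search on very large files.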

Tutorial: how to explore Arquivo.pt using Python

Last updated on July 17th, 2023 at 01:44 pm

The Programming Historian aims to develop digital skills among Humanities researchers through the publication of practical lessons in several languages.

The call “Computational analysis skills for large-scale humanities data” resulted in 7 new lessons.

One of them was the tutorial “Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt” developed by Daniel Gomes and Ricardo Campos.

It shows how to explore the Arquivo.pt user interface and its Application Programming Interface (API) to run advanced queries, process large amounts of data or build new services, such as Tell me stories.
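The tutorial itself is not reproduced here. As a minimal sketch of the kind of API query it describes, the code below builds a full-text search request limited to a date range and extracts the hits from a JSON response. The endpoint `https://arquivo.pt/textsearch`, the parameters `q`, `from`, `to`, `maxItems` and the response keys `response_items`, `tstamp`, `title`, `originalURL` are assumptions to check against the official API documentation.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt full-text search API.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_textsearch_url(query: str, date_from: str, date_to: str,
                         max_items: int = 50) -> str:
    """Build a search URL restricted to a date range (YYYYMMDDhhmmss).

    Parameter names are assumptions; urlencode handles spaces and accents.
    """
    params = {"q": query, "from": date_from, "to": date_to, "maxItems": max_items}
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def extract_hits(response_body: str) -> list:
    """Return (timestamp, title, original URL) tuples from a JSON response.

    The key names used here are assumptions about the response layout;
    missing keys yield None instead of raising.
    """
    data = json.loads(response_body)
    return [
        (item.get("tstamp"), item.get("title"), item.get("originalURL"))
        for item in data.get("response_items", [])
    ]


url = build_textsearch_url("golf", "20100101000000", "20201231235959", 10)
print(url)
```

The network call is left out so the sketch runs offline; in practice the body could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed to `extract_hits`.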

All resources developed for the tutorial are freely available in open access.

Open-access resources of the tutorial “Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt”


Millions of images from the past!


Last updated on August 23rd, 2022 at 04:21 pm

Arquivo.pt launched a new version, named Dionisius, on March 24th, 2021.

1.8 billion images from the past Web are now searchable on Arquivo.pt.

Supporting large-scale image search over web archives is a worldwide innovation.

To learn more about the development of this system, watch the video “Arquivo.pt image search 2020-2021”.

Try, for example, a search for images about “golf” to see how it returns images gathered from archived websites.


Results page from a search for the term “golf” on Arquivo.pt.

The new image search API also lets you create new works to submit to the Arquivo.pt Awards.
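Projects built on the image search API usually need more results than one response returns, so they request results page by page. The sketch below generates one request URL per page for a query such as “golf”. The endpoint `https://arquivo.pt/imagesearch` and the parameters `q`, `maxItems` and `offset` are assumptions, not confirmed by this page; verify them in the official API documentation before use.

```python
from typing import Iterator
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt image search API.
IMAGESEARCH_ENDPOINT = "https://arquivo.pt/imagesearch"


def paged_imagesearch_urls(query: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield one request URL per page of image results.

    Uses an assumed "offset" parameter: page n starts at n * page_size.
    """
    for page in range(pages):
        params = {"q": query, "maxItems": page_size, "offset": page * page_size}
        yield f"{IMAGESEARCH_ENDPOINT}?{urlencode(params)}"


for request_url in paged_imagesearch_urls("golf", page_size=20, pages=3):
    print(request_url)
```

Generating the URLs separately from fetching them makes it easy to throttle requests or stop early when a page comes back empty.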

Help us to improve!

To help, just try an image search on Arquivo.pt using any device (e.g. laptop, mobile phone, tablet).

If you have any comments, please contact us!

Remember to always send us the URL of the page you are referring to.

To learn more