
CDXJ index files are available to support bulk access

Last updated on May 5th, 2023 at 01:39 pm

The research and education community has been requesting support for the bulk download of web-archived data and index files (CDXJ), for instance to train AI models, optimize the routing of web archive requests, or recover information from selected websites (e.g. news).

Arquivo.pt has begun making all its CDXJ index files publicly available in real time to facilitate the bulk download of web-archived data. Learn how at:

Your comments and suggestions are most welcome and will help us improve this service!

Please disseminate this information among potentially interested parties.
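
For readers new to CDXJ index files, the following is a minimal Python sketch of how one index line might be read. It assumes the commonly used CDXJ layout of a sortable URL key, a 14-digit timestamp, and a JSON block on each line. The sample line, the field names (`url`, `status`, `filename`, `offset`, `length`) and the helper `parse_cdxj_line` are hypothetical illustrations; the files published by Arquivo.pt may use different fields.

```python
import json
from typing import NamedTuple


class CdxjRecord(NamedTuple):
    surt_key: str   # sortable, reversed-host form of the URL
    timestamp: str  # capture time, assumed to be YYYYMMDDhhmmss
    fields: dict    # remaining metadata decoded from the JSON block


def parse_cdxj_line(line: str) -> CdxjRecord:
    """Split one CDXJ line into key, timestamp and JSON metadata.

    Hypothetical helper: assumes the three parts are separated by
    single spaces and that the JSON block is the rest of the line.
    """
    surt_key, timestamp, json_block = line.rstrip("\n").split(" ", 2)
    return CdxjRecord(surt_key, timestamp, json.loads(json_block))


# Made-up sample line for illustration only (not real Arquivo.pt data).
sample = (
    'pt,example)/news 20230505133900 '
    '{"url": "http://example.pt/news", "status": "200", '
    '"filename": "example.warc.gz", "offset": "1024", "length": "512"}'
)

record = parse_cdxj_line(sample)
print(record.surt_key, record.timestamp, record.fields["filename"])
```

Under these assumptions, the `filename`, `offset` and `length` values would be what a bulk-download tool needs to fetch a single archived record from a larger archive file.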