A group of researchers looks at a server rack

CDXJ index files are available to support bulk access

Last updated on August 22nd, 2024 at 10:48 am

The research and education community has been requesting bulk download of web-archived data and of the corresponding index files (CDXJ), for instance to train AI models, optimize the routing of web archive requests, or recover information from selected websites (e.g. news).
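For readers who have not worked with CDXJ before, the following minimal Python sketch shows how a single index line could be read. It assumes the widely used CDXJ layout of a sort-friendly URL key, a 14-digit timestamp, and a JSON block, separated by single spaces. The sample line, the `CdxjRecord` type, the `parse_cdxj_line` helper, and the JSON field names are illustrative assumptions and are not taken from Arquivo.pt's files, which may use different fields.

```python
import json
from typing import NamedTuple


class CdxjRecord(NamedTuple):
    """One parsed index line (hypothetical helper type for this sketch)."""
    urlkey: str      # sort-friendly form of the URL
    timestamp: str   # capture time as YYYYMMDDhhmmss
    fields: dict     # remaining metadata decoded from the JSON block


def parse_cdxj_line(line: str) -> CdxjRecord:
    # Split at most twice so that spaces inside the JSON block are preserved.
    urlkey, timestamp, payload = line.strip().split(" ", 2)
    return CdxjRecord(urlkey, timestamp, json.loads(payload))


# Hypothetical sample line; field names are assumed, not confirmed by the source.
SAMPLE = (
    'pt,example)/noticias 20240822104800 '
    '{"url": "https://example.pt/noticias", "mime": "text/html", '
    '"status": "200", "filename": "example.warc.gz", '
    '"offset": "1024", "length": "2048"}'
)

record = parse_cdxj_line(SAMPLE)
print(record.urlkey, record.timestamp, record.fields["filename"])
```

In an index of this shape, the file name, offset, and length would point to where the archived record sits inside a larger archive file, which is what makes selective bulk retrieval practical.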

Arquivo.pt has begun making all its CDXJ index files publicly available in real time to facilitate the bulk download of web-archived data. Learn how at:

Your comments and suggestions are most welcome and will help us improve this service!

Please share this information with anyone who may be interested.