Crawling and archiving Web content

How often do you collect the Portuguese Web and how long does it take? Do you collect the whole Portuguese Web? Which media types do you archive? What about dynamically generated pages? Do you archive restricted-access data? What is the Arquivo.pt crawler? How does it work? Have I been archived? What is the …