Texts published using textual formats

Last updated on August 1st, 2017 at 01:54 pm

For archived content to be found within a web archive, it is essential that texts are published in textual formats.

Web archives process the text contained in archived content to make it searchable. However, it is frequently impossible to extract text from content in a non-textual format, such as images, executable programs or videos.

  • Publish texts using HTML because it is the most widely used and best supported textual format on the Web.
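
The difference between text published as HTML and text embedded in an image can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and is not the extraction pipeline of any particular web archive. The `TextExtractor` class, the `extract_text` function and both sample pages are hypothetical names and data made up for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping script and style blocks."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text of an HTML document as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample pages: the same information as HTML text and as an image.
textual_page = "<html><body><h1>Opening hours</h1><p>Monday to Friday</p></body></html>"
image_page = '<html><body><img src="opening-hours.png"></body></html>'

print(extract_text(textual_page))      # Opening hours Monday to Friday
print(repr(extract_text(image_page)))  # ''
```

The page that publishes its text as HTML yields words that a search index could use, while the page that shows the same information as an image yields an empty string.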

How to detect if content is in a textual format

The following simple test detects whether text within a piece of content was published in an adequate format:

  1. Select the text in the content;
  2. Choose Edit -> Copy in the browser;
  3. Choose Edit -> Paste in a text editor or word processor, such as Microsoft Word.

If you cannot successfully complete one of these steps, then the text was probably published in an inadequate format.
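
When many files need to be checked, a coarse automated filter can complement the manual test. The sketch below classifies content by the media type declared in its HTTP `Content-Type` header. It is an assumption-laden approximation and does not replace the copy-and-paste test: the `looks_textual` function and the `TEXTUAL_TYPES` set are hypothetical and were chosen for illustration only, and a page served as `text/html` can still present its text inside images.

```python
# Hypothetical, non-exhaustive set of media types treated as textual here,
# in addition to everything under "text/".
TEXTUAL_TYPES = {"application/xhtml+xml", "application/xml", "application/json"}


def looks_textual(content_type: str) -> bool:
    """Guess from a Content-Type header value whether content is textual."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type in TEXTUAL_TYPES


print(looks_textual("text/html; charset=utf-8"))  # True
print(looks_textual("image/png"))                 # False
print(looks_textual("video/mp4"))                 # False
```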