Managing your digital files

We are often asked for recommendations as to which file formats are best-suited to long-term preservation and access. While the Borthwick Institute for Archives places no restrictions on the types of files that we collect, it is important to consider how your choices today will affect whether or not your files will be accessible and understandable in the future. In particular, the file format that you choose can have a significant impact on how easily your files can be shared and used. The following guidance is intended to help you identify file formats suitable for long-term preservation. Generally speaking, these file formats are open-source, widely-used, and are either uncompressed or use lossless compression.

Please note: This information is intended to provide some guidance as potential donors create and manage their files. We do not ask that donors migrate all files to conform with these recommendations before transfer to the archives. Instead, staff at the Borthwick will work with you to make decisions regarding file formats.

Why are open-source file formats better for long-term preservation and access?

Every file format has a set of specifications detailing how it is encoded and how it is interpreted by the appropriate software. In cases where file formats are owned and managed commercially, these specifications may not be openly available to the public. As a result, continued access to proprietary file formats depends on ongoing support from the organisation that develops and manages them. When support for a format ends, whether due to changing priorities at the organisation or because the organisation goes out of business, lack of access to the format specifications increases the risk of that format becoming obsolete. In contrast, organisations or communities committed to maintaining open-source formats make specifications and other related documentation openly available to users. What’s more, these formats are often compatible with multiple software platforms.

The Borthwick does not recommend use of proprietary file formats, as these formats may become difficult or impossible to access in the future. Examples of proprietary file formats include Photoshop's .PSD files and Paint Shop Pro's .PSP files (for still images) and RealMedia's .RM files (for audio files).

How does compression affect long-term preservation and access?

To save storage space, some file formats use compression algorithms to reduce the overall file size. Files are compressed in one of two ways, using either lossless or lossy compression. Lossless compression is reversible: it stores redundant data more compactly rather than discarding it, so the original can be reconstructed exactly. This allows for reduced file size without compromising the quality of the file. In contrast, lossy compression permanently discards data judged to be of negligible value to the end-user, and may discard more each time the file is edited and re-saved. This means that the quality of a file using lossy compression can degrade with each new copy or change.
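The reversibility of lossless compression can be illustrated with a short Python sketch. It uses the standard-library zlib module purely as an example of a lossless algorithm; the sample text is made up for the demonstration, and nothing here reflects a tool that the Borthwick uses.

```python
import zlib

# Made-up, highly repetitive sample data (an assumption for illustration).
original = b"Borthwick Institute for Archives. " * 100

# Compress, then decompress again.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# The compressed copy is smaller than the original...
print(len(original), len(compressed))
# ...and decompressing it gives back exactly the same bytes.
print(restored == original)
```

A lossy format offers no equivalent round trip: once data has been discarded, it cannot be recovered from the compressed file.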

How should I choose file formats?

The file format that you choose will depend largely on what is most suitable for the use and display of your content. In general:

  • Do choose file formats that are widely used and well-documented.
  • Do choose file formats that are widely supported, meaning that they are compatible with multiple software platforms rather than being dependent on a single piece of software developed by a single company. 
  • Wherever possible, do use file formats that are either uncompressed or use lossless compression, so as not to compromise the quality of the file. With that being said, if your files are already stored using lossy file formats (JPEG, MP3, and MP4 for example), there is no need to migrate them to uncompressed or lossless file formats.
  • Don’t embed files within other files unless absolutely necessary:
    • It is better to store groups of individual files than a single file containing embedded files. Be sure to give your files concise, meaningful names and store them together in a single folder. Write a brief description of the folder’s contents and save this as a TXT file. This description should explain the relationship between the files in order that they can be interpreted and used in the future. Name this file README.txt.
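For those comfortable with scripting, the README.txt described above could be started automatically. The following Python sketch is only one possible approach: the function name write_readme, the example file names, and the layout of the description are all assumptions made for illustration, and the description itself still needs to be written by someone who understands the files.

```python
from pathlib import Path
import tempfile


def write_readme(folder: Path, description: str) -> Path:
    """Write README.txt in `folder`: a description, then a list of its files."""
    names = sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.name != "README.txt"
    )
    lines = [description, "", "Files in this folder:"]
    lines += [f"- {name}" for name in names]
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Demonstration in a temporary folder with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "report.pdf").write_bytes(b"")
    (folder / "figure-1.tiff").write_bytes(b"")
    readme = write_readme(folder, "Annual report and its accompanying figure.")
    readme_text = readme.read_text(encoding="utf-8")
    print(readme_text)
```

The file is written as UTF-8 plain text, in line with the TXT recommendation in the list of formats.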

What file formats do you recommend?

The file formats that you choose will depend largely on what is most suitable for the use and display of your content. The following recommendations are intended to help guide your decision-making.

 

Office documents and text-based files

PDF/A: Portable Document Format (Archival)

PDF: Portable Document Format

DOCX: MS Word Open XML Document (created in MS Office 2007 onwards)

PPTX: MS PowerPoint Open XML Document (created in MS Office 2007 onwards)

XLSX: MS Excel Open XML Document (created in MS Office 2007 onwards)

ODT: OpenDocument Text Document (created in OpenOffice)

ODS: OpenDocument Spreadsheet (created in OpenOffice)

ODP: OpenDocument Presentation (created in OpenOffice)

TXT: Plain Text File (ANSI or UTF-8 encoded)

RTF: Rich Text Format File

XML: Extensible Markup Language Data File

CSV: Comma Separated Values File

TSV: Tab Separated Values File

Raster (or bit-map) image files

TIFF: Tagged Image File Format

JPEG/JFIF: Joint Photographic Experts Group JPEG Interchange Format File (lossy compression)

JPEG 2000: Joint Photographic Experts Group (lossless compression)

GIF: Graphics Interchange Format

PNG: Portable Network Graphics

Vector image files

SVG: Scalable Vector Graphics File

Audio files

WAV: Waveform Audio File Format

FLAC: Free Lossless Audio Codec File

AIFF: Audio Interchange File Format

MP3: Moving Picture Experts Group Audio Layer 3 (lossy compression)

Video files

AVI: Audio Video Interleave File (uncompressed)

MOV: QuickTime Movie (uncompressed)

MXF: Material Exchange Format

MP4: Moving Picture Experts Group (with H.264 encoding)

How should I name and organise my files?

Organise your files into meaningfully named folders with logical relationships.

Using unique, concise, and descriptive file and folder names will help ensure that you and other users are able to find the information that you need when you need it. Choose a file-naming system that is easy for you to use and manage, and which provides a short but meaningful description of the file or folder’s contents. Once you have chosen a file-naming system, be consistent in its application.

Keep file names short: Long file names can cause problems for computers. Keep file names under 25 characters. Where possible, abbreviate, truncate, and use acronyms, but only if those abbreviations and acronyms will be understood by other users.

Avoid certain special characters: Certain special characters (such as ^ ~ \ / : * < > | ! # % & £ $ , . ‘) are often used by computers to indicate specific commands, which may result in problems if they appear in your file names. The exceptions to this rule are hyphens ( - ) and underscores ( _ ), which can both be used in file names.

Do not put spaces in file names: Spaces in file names can also cause problems. Instead use hyphens (“file-name”), underscores (“file_name”) or camel-casing (“fileName”).
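The three rules above (length, special characters, spaces) can be checked automatically. The Python sketch below is a rough illustration rather than an official tool: the function name check_file_name is hypothetical, and it rests on two assumptions that the guidance does not state, namely that the 25-character limit counts the extension and that the final dot separating the extension is permitted.

```python
# Characters the guidance advises against; hyphens and underscores are allowed.
DISCOURAGED = set("^~\\/:*<>|!#%&£$,.'‘’")
MAX_LENGTH = 25  # the guidance says "under 25 characters"


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in `name` (an empty list if none)."""
    problems = []
    # Assumption: the last dot introduces the extension and is allowed.
    stem = name.rpartition(".")[0] or name
    # Assumption: the length limit applies to the whole name, extension included.
    if len(name) >= MAX_LENGTH:
        problems.append("too long")
    if " " in stem:
        problems.append("contains spaces")
    if any(ch in DISCOURAGED for ch in stem):
        problems.append("contains special characters")
    return problems


print(check_file_name("policy_v1.docx"))
print(check_file_name("my file (final)!.docx"))
print(check_file_name("minutes_of_the_annual_general_meeting.docx"))
```

The first name passes all three checks, the second is flagged for its spaces and exclamation mark, and the third is flagged as too long.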

Dates at the beginning of a file name will enable chronological sorting: This provides a quick and easy way to sort your files by date if appropriate. The recommended format for dates is YYYY-MM-DD (for example, 2nd February, 2023 would be represented as 2023-02-02).
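The reason YYYY-MM-DD works is that ordinary alphabetical sorting of such names is also chronological. A minimal Python sketch follows; the helper dated_name and the example file names are hypothetical.

```python
from datetime import date


def dated_name(day: date, label: str, extension: str) -> str:
    """Build a name such as 2023-02-02_minutes.pdf (date first, then a label)."""
    return f"{day.isoformat()}_{label}.{extension}"


# Three made-up files, deliberately listed out of date order.
names = [
    dated_name(date(2023, 2, 2), "minutes", "pdf"),
    dated_name(date(2022, 11, 30), "minutes", "pdf"),
    dated_name(date(2023, 1, 15), "minutes", "pdf"),
]

# A plain alphabetical sort puts them in date order.
sorted_names = sorted(names)
print(sorted_names)
# ['2022-11-30_minutes.pdf', '2023-01-15_minutes.pdf', '2023-02-02_minutes.pdf']
```

Formats that put the day or month first (such as 02-02-2023) do not sort this way, which is why the year-first form is recommended.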

Where relevant, use file names for version control: Where you are creating multiple versions of a file, use the file name to help you keep track of those versions. For example:

  • policy_draft.docx, policy_final.docx
  • policy_v1.docx, policy_v2.docx, policy_v3.docx