Tuesday, September 27, 2016

Digital Preservation File Names

Digital Preservation File Names. Chris Erickson. September 27, 2016. Updated 31 Oct. 2016.
     While processing some collections, we had difficulty creating the METS XML files because of certain characters in the file names. A character may be valid on one system but cause problems on another. Comments on the internet suggest that only a few characters are actually forbidden, but the experience of a number of people suggests that some systems do not support every character in file names. We decided it was better to use only alphanumeric characters, with underscores as separators and a full stop (period) before the extension. When preserving digital files, it is important to remember that the files may be used by a variety of computer systems over their lifetime. To give the files the greatest chance of remaining usable in the future, it is best to follow some basic standards when naming them.
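
As an illustration of one way a file name can break XML generation, here is a minimal Python sketch. It is not necessarily the failure described above: the `file` element, its `name` attribute, and the sample file name are hypothetical and are not taken from the METS schema. It shows that an unescaped ampersand makes the XML ill-formed, while escaping the value with the standard library's `quoteattr` keeps it parseable.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical file name containing an ampersand.
file_name = "reports&letters.txt"

# Naive string building leaves the ampersand unescaped,
# so the resulting XML is not well formed.
naive = '<file name="' + file_name + '"/>'
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# quoteattr escapes &, < and > and adds the surrounding quotes.
escaped = "<file name=" + quoteattr(file_name) + "/>"
parsed_name = ET.fromstring(escaped).get("name")

print(naive_ok)     # whether the naive XML parsed
print(escaped)      # the escaped element
print(parsed_name)  # the name recovered by the parser
```

Escaping fixes the XML itself, but other tools in a workflow may still mishandle the same characters, which is why restricting the file names is the more conservative choice.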

Here are some suggestions we are considering:
  1. Decide on file naming conventions so that file names have meaning.
  2. Use file extensions, which help identify the type of file (such as .txt, .doc, .wav, .jpg).
  3. Keep file names to a reasonable length. Length limits vary between operating systems, so it is best to stay under 30 characters.
  4. Avoid spaces in file names. Spaces are acceptable in most file names, but they can cause difficulty when processing. Underscores may be used as a separator.
  5. Avoid punctuation and special characters. The safest characters to use are numbers and letters. Operating systems differ in case sensitivity, so do not rely on case alone to distinguish file names. Some characters to avoid for our preservation system are spaces, ampersands, brackets, and commas.
  6. Don’t start or end the file name with a space, special character, or punctuation mark.
  7. These conventions apply to folders as well as files.
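
The conventions in the list above could be checked automatically. The following is a minimal Python sketch under stated assumptions, not a tool we actually use: the function name `is_safe_file_name` is hypothetical, the 30-character limit is taken to include the extension, and underscores are treated as not allowed at the start or end of the name.

```python
import re

# Assumption: the limit counts the whole name, extension included.
MAX_LENGTH = 30

# Letters, digits, and underscores, starting and ending with a letter or digit,
# followed by exactly one period and an alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?\.[A-Za-z0-9]+")


def is_safe_file_name(name: str) -> bool:
    """Return True if the name is under MAX_LENGTH characters and matches SAFE_NAME."""
    return len(name) < MAX_LENGTH and SAFE_NAME.fullmatch(name) is not None


for candidate in ["annual_report_2016.txt", "annual report.txt", "smith&jones.doc"]:
    print(candidate, is_safe_file_name(candidate))
```

Folder names have no extension, so a check for folders would need a pattern without the final period and extension.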
Characters that others have had difficulty with, and which should not be used in file names:

#  pound                  <  left angle bracket     $  dollar sign          +  plus sign
%  percent                >  right angle bracket    !  exclamation point    `  backtick
&  ampersand              *  asterisk               '  single quote         |  pipe
{  left curly bracket     ?  question mark          "  double quote         =  equal sign
}  right curly bracket    /  forward slash          :  colon
\  backslash                 blank space            @  at sign
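
The characters listed above can also be detected and replaced in existing file names. The sketch below is one possible approach in Python, under stated assumptions: the names `PROBLEM_CHARS`, `find_problem_chars`, and `replace_problem_chars` are hypothetical, the quote characters are taken to be the straight ASCII quotes, and each problem character is replaced with an underscore.

```python
# Characters from the list above, plus the space character.
# Assumption: the quotes are the straight ASCII single and double quotes.
PROBLEM_CHARS = set("#%&{}\\<>*?/$!'\":@+`|= ")


def find_problem_chars(name: str) -> list:
    """Return the problem characters found in name, in order of first appearance."""
    found = []
    for ch in name:
        if ch in PROBLEM_CHARS and ch not in found:
            found.append(ch)
    return found


def replace_problem_chars(name: str, separator: str = "_") -> str:
    """Replace every problem character in name with the separator."""
    return "".join(separator if ch in PROBLEM_CHARS else ch for ch in name)


print(find_problem_chars("Smith & Jones.doc"))
print(replace_problem_chars("Smith & Jones.doc"))
```

Two different original names can map to the same replaced name, so any real renaming step should check for collisions and keep a record of the original names.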

It appears that XML in general has a specific problem with ampersands and brackets in file names (in XML, & and < are reserved markup characters and must be escaped). Some other sources of information:

