For the past several months we have been working on a number of digital preservation projects, which include:
- Reworking the ContentDM to Rosetta ingest interface. Originally it handled only images or simple objects. It has been expanded to include compound objects, particularly those with page-level metadata, page-by-page transcriptions, and such.
- Improving our unstructured data ingest process. It uses a spreadsheet template for metadata about the files to be ingested into Rosetta. The content creator can enter the metadata, or our file discovery tool can traverse a directory structure and enter file and folder metadata into the spreadsheet template. The collection I am just finishing with this tool totaled about 45,000 TIFF images.
- Restructuring our digital ingest workflows from project-based into a digital pipeline. We now have a shared drive between Rosetta and our content creators, as well as more disk storage space. This makes it easier to transfer files at the end of a project, or, for a long project, to transfer files as they go.
- Using all this to keep up with new projects as they are created and add them to Rosetta, which leaves more time to ingest the backlog of projects waiting for preservation. The rate of ingest now, depending on how much preparation the collections need, is usually a couple of TB each week.
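
For readers curious what a file discovery step like the one mentioned above might look like, here is a minimal sketch in Python. It is not our actual tool: the column names, the CSV output (standing in for the spreadsheet template), and the function names `discover_files` and `write_template` are all assumptions made for illustration.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names; the real spreadsheet template is not described here.
COLUMNS = ["folder", "filename", "size_bytes", "modified_utc", "md5"]


def md5_of(path, chunk_size=1024 * 1024):
    """Compute an MD5 checksum in chunks so large TIFFs are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def discover_files(root):
    """Walk the directory tree under root and return one metadata row per file."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # folders are recorded through each file's "folder" column
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "folder": path.parent.relative_to(root).as_posix(),
            "filename": path.name,
            "size_bytes": stat.st_size,
            "modified_utc": modified.isoformat(),
            "md5": md5_of(path),
        })
    return rows


def write_template(rows, out_path):
    """Write the discovered rows to a CSV file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_template(discover_files("collection"), "collection_metadata.csv")` would produce one row per file, which a content creator could then review and extend with descriptive metadata before ingest. The folder and file names in that call are placeholders.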