Things to consider when scanning archival documents
First off, let’s distinguish between digital and digitised.
Digital refers to an object that was “born” digital in the first instance: for example, a photograph taken on a digital camera or a document created in a word processing system. A digital object is originally created as a series of 0’s and 1’s, and technological tools are then used to convert it to a human-recognisable form. Such objects are created and stored in a digital environment from “birth”.
Digitisation is the act of taking an analog object, document or image and recreating it so that it can be displayed in an electronic format. For example, scanning a paper document digitises it, which means the object can now be displayed by electronic means. The paper is the original object, and the act of scanning it has produced a secondary object now held as 0’s and 1’s.
There are a number of ways to digitise an object. Scanning is the most obvious when it comes to archival documents. However, taking a digital photo of an object is also “digitisation”. Personally, I think digital photography has many merits for particular archives, such as photographic archives and fragile or oversized objects and artifacts, or for ease of viewing. For paper documents, though, scanning is the mainstream method for many digitisation purposes, especially large-scale projects.
For the following we will refer to scanning archival paper documents (as that is what I generally work with), but the questions apply equally to anything else you wish to represent digitally. We will also assume that the reason for digitisation is (as in my case at work) to provide better access to the source original, not to allow its destruction. That course of action requires some planning and forward thinking.
When thinking about digitising archives for better access there are a number of questions to ask. These are some that I would recommend taking into consideration.
What is the purpose of scanning the document?
Is it for permanent retention, quick or easy access, or a backup of the original? All of these?
Knowing the intended use of the digitised object will help determine the quality and file size you need. A quick reference copy digitised in grayscale at the lowest file size would not serve the purpose, for example, of a document whose original is to be disposed of after digitisation. Understanding whether the digitised object is required for the short, medium or long term will play a key role in your decision making. For short-term retention you could get away with lower quality. For long-term retention, and for documents that have a reasonable risk of serving as evidential records, the recommendation is to go for the best quality possible. Digitise like for like: colour for colour, with quality to match the quality of the original.
What format will I use, and where will I store the digital files?
It is all well and good to digitise information, but where are you going to store those files, and in what format? If you choose a proprietary scanning system, will that product continue to be available, and in what format is the digital data saved? Are you storing the files in a cloud environment or in-house? Who will manage those files to ensure they remain accessible into the future? What disaster recovery and security systems are in place to provide ongoing access to those files?
What quality and processing standards will I set for digitisation?
Will the replication be in colour, grayscale or bi-tonal? Will you use Optical Character Recognition (OCR) technology when digitising documents? What resolution (dots per inch) should be used? What type of compression should be used?
There are some incredibly useful guidelines for helping to make these decisions. Again, it comes back to the purpose and lifespan required for the digitised object. For archival objects, of course, we will recommend the best replication possible with the tools and technological capability available at that point in time. This then has to be weighed up against cost, risk, time and data storage capability.
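To get a feel for how colour mode and resolution drive storage needs, here is a rough sketch in Python that estimates the uncompressed size of a single scanned page. The function name, the bit depths (24-bit colour, 8-bit grayscale, 1-bit bi-tonal) and the A4 page size are my own assumptions for illustration, not values taken from any particular guideline, and real files will usually be smaller once compression is applied.

```python
# Rough estimate of the uncompressed size of one scanned page.
# Assumed bit depths (common values, not taken from a specific standard):
BITS_PER_PIXEL = {"colour": 24, "grayscale": 8, "bitonal": 1}

MM_PER_INCH = 25.4


def uncompressed_scan_bytes(width_mm, height_mm, dpi, mode):
    """Return the approximate number of bytes needed before compression."""
    # Convert the physical page size to pixels at the chosen resolution.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


# Hypothetical example: one A4 page (210 mm x 297 mm) at 300 dpi.
for mode in ("colour", "grayscale", "bitonal"):
    size = uncompressed_scan_bytes(210, 297, 300, mode)
    print(f"{mode:>9}: {size / 1_000_000:.1f} MB uncompressed")
```

Under these assumptions a single A4 page in colour at 300 dpi comes to roughly 26 MB before compression, which is why the choice has to be weighed against storage across a whole collection.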
What metadata will I use when digitising?
Metadata, I hear you say. What’s that? That’s the information held behind the digital file that tells you all the good stuff, such as when the object was created, by whom, when it was last modified, when it was last accessed, file size, file type, dpi etc. We can use metadata to assess the authenticity and integrity of that file, and much more. Also included in this is the naming of the file. I like to use the same archival descriptors as the source original, with DIGI added as a descriptor to differentiate. For example: 2017_012_235_Minutes of the ABC Team_digi
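As a small illustration, the naming pattern and some of the basic file metadata mentioned above could be handled along these lines in Python. This is only a sketch: the function names are hypothetical, the file extension and the stripping of characters that some file systems reject are my own additions, and details such as creator or dpi are not available from the file system and would have to come from the scanning software or a catalogue.

```python
import os
import re
from datetime import datetime, timezone


def digitised_name(reference, title, extension):
    """Build a file name from the archival reference plus a 'digi' suffix."""
    # Assumption: drop characters that some file systems do not allow.
    safe_title = re.sub(r'[<>:"/\\|?*]', "", title).strip()
    return f"{reference}_{safe_title}_digi.{extension.lstrip('.')}"


def basic_file_metadata(path):
    """Collect the metadata the file system itself records for a file."""
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "file_name": os.path.basename(path),
        "file_type": os.path.splitext(path)[1].lstrip(".").lower(),
        "size_bytes": info.st_size,
        "last_modified": as_utc(info.st_mtime),
        "last_accessed": as_utc(info.st_atime),
    }


# Hypothetical usage, following the example name above.
print(digitised_name("2017_012_235", "Minutes of the ABC Team", "pdf"))
```

Calling basic_file_metadata on a scanned file returns a small dictionary that could be saved alongside the file or copied into a catalogue record.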
How will you provide access to the digitised object?
Will the end result remain in your internal system, or do you want to put it out for a wider audience? If so, you need to think about the contents of the digitised object and take into account any personal privacy issues, confidential information and copyright factors. In addition, whether you are storing the end product in systems you maintain or on hardware someone else owns, what rights do you have to your data should something go wrong?
If any of these points have helped you think a bit more about how you intend to approach a digitisation project, then I would be keen to hear from you. The digital framework is moving rapidly, and as an archivist I’m always keen to continue learning and growing.
Just a plug here for the National Digital Forum coming up in Wellington in November. I plan on attending, as there are some great workshops and talks on the digital realm in the GLAM sector. Three days not to be missed, by the looks of it. Go ahead, have a look and sign up today – National Digital Forum
Photo by Samuel Zeller on Unsplash