What is metadata?

You may have heard this term before. Until we started our big project to digitize select Massachusetts church records, it was a very nebulous concept to me. Basically, metadata is data about materials that contain data: catalog records for books, database entries for journal articles, descriptions of photographs, that sort of thing. If you are planning a big digital project, this is a concept you will definitely need to get comfortable with.

If you have 5 minutes, I recommend watching this YouTube video, which gives a nice overview and some examples.

OK, did you go watch it? Cool. Now what does that mean for us when we do our digital projects? I'm glad you asked.

When we send church records out to be scanned, what we get back is an external hard drive filled with image files in TIF and JPG format, plus a PDF version of each multi-page document. But when we open the drive, the file names look something like this: "Middleboro_v01_01".

Does that tell me anything about what that image captured? Not a clue. What we need to know is: What's on that page? Is it a baptismal record? Minutes of a committee meeting? A membership list? What dates does the information on that page span? Is there any other major information that might help researchers decide whether they want to take a closer look at that page? The page might mention a church member by name, or the minister and other church leaders, or it might not. On top of that, every file in a project shares a lot of the same metadata: when the image was created, by whom, with what equipment and at what settings, and the major bibliographic details of the original collection (town of origin, span of years, etc.).
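For the technically curious, here is a minimal sketch of what one page's metadata record might look like. Everything in it is hypothetical: the field names are made up for illustration, and it assumes the scanner's naming convention means town, then volume ("v01"), then image number. Notice that the file name fills in only the first few fields; the descriptive ones stay empty until a person actually reads the page.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical metadata record for one scanned page image."""
    filename: str
    collection: str  # assumed: first part of the file name is the church/town
    volume: int      # assumed: "v01" means volume 1
    image: int       # assumed: last number is the image's position in the volume
    # Descriptive fields that only a person reading the page can fill in
    record_type: str = ""  # e.g. "baptisms", "committee minutes", "membership list"
    date_span: str = ""    # e.g. a range of years covered by the page
    names: list[str] = field(default_factory=list)  # members, ministers, leaders


def record_from_filename(filename: str) -> PageRecord:
    """Build a mostly empty record from a name like 'Middleboro_v01_01.tif'."""
    stem = filename.rsplit(".", 1)[0]  # drop an extension such as ".tif", if any
    collection, vol, img = stem.split("_")
    return PageRecord(
        filename=filename,
        collection=collection,
        volume=int(vol.lstrip("v")),
        image=int(img),
    )


rec = record_from_filename("Middleboro_v01_01.tif")
print(rec.collection, rec.volume, rec.image, repr(rec.record_type))
```

Running it prints `Middleboro 1 1 ''`: the computer can tell us which town, volume, and image we're looking at, but "what's on this page?" is still blank. That blank field is where the people-power comes in.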

Sometimes this information can be generated very quickly, especially if the records are relatively modern and typed. In those cases, "OCR" (Optical Character Recognition) can be applied, and then it's a matter of proofreading what the computer produced to make sure that an "m" is an "m" and not an "n". When we deal with handwritten documents that are hundreds of years old, with possible water damage, torn or damaged pages, ink bleed-through, darkened paper, and/or faded ink, you have to depend on careful staff to make sure that what gets written down is complete and accurate. And that's before we even get to full transcription.

This, my friends, is one of the major reasons why digital projects are so expensive. It's all about the people-power. So when you hear that our own Robin is in the middle of documenting one of the most complicated metadata projects we've ever had to deal with, I hope you'll feel a big pang of sympathy. If you see her, you might even ply her with chocolate. Or whiskey. Or both.

A hat-tip to my father, Alan, for pointing me to the video.


Beacon Street Blog