by Dr. Ashley Vance
Upon completion, the George C. Marshall Foundation digitization project will be an invaluable resource to researchers of all ages and backgrounds. This project will offer something that few other archives do—document-level searchability. Most archives offer researchers a box-level description. At best, a folder-level title list will provide a glimpse of what’s inside. The complexity and depth of Marshall’s papers make box-level generalizations unhelpful and folder titles too vague.
Why Document-Level Search Transforms Archival Research
For example, Box 80 contains correspondence with President Franklin D. Roosevelt from 1939 to 1943. (The years 1944–1945 are contained in a separate box.) A general description would mention memos, reports, and correspondence with Roosevelt. The documents are organized chronologically, so a folder title list would only provide timeframes. Yet Marshall and Roosevelt’s communications were incredibly complex. As the war progressed, the number of daily messages increased, and the topics varied wildly. The October 1942 folder includes topics like combat in Africa, relations with Charles de Gaulle, army regulations, German Army operations, planning for the invasion of France, Japanese strategy, American prisoners of war in the Philippines, press releases, Combined Chiefs of Staff conferences, and social invitations. And that is not an exhaustive list. No box- or folder-level description could accurately capture this complexity.
As a military historian and metadata specialist for this project, I have witnessed both the challenges and the potential of a document-level digitization project. Ironically, the greatest challenge in processing the files will be the greatest asset for future researchers. Every staff member working on this project is a trained historian in their own right, which allows for a much richer set of subject headings and file descriptions, the keys to searchability. Our experience and background knowledge allow us to create subject headings and descriptions that an untrained, outsourced contractor could not offer. These skills become especially important when key terms are not directly listed on a document or when they are listed on one document but not the next, even though both discuss the same topic.
How Trained Historians Create Smarter Metadata
For instance, if we are reviewing an army status report from North Africa in November 1942, we know to add Operation Torch to the subject heading list even if the operation is not named in the report. Some researchers will only search for Operation Torch and not include North Africa as a keyword. Others will do the opposite. Without both subject headings, a researcher may never come across the report when searching the online catalog.
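One way to picture that judgment is as a lookup that adds an implied heading when a document's place and date match a known event. The sketch below is only an illustration: the rule table, the `enrich_headings` function, and the date window are all assumptions made for this example, and the project's headings are applied manually by historians rather than by a script.

```python
from datetime import date

# Hypothetical rule table: (place heading, first day, last day, heading to add).
# The date window is an assumption chosen to mirror the November 1942 example.
IMPLIED_HEADINGS = [
    ("North Africa", date(1942, 11, 1), date(1942, 11, 30), "Operation Torch"),
]


def enrich_headings(headings: list[str], document_date: date) -> list[str]:
    """Return the headings plus any implied ones, without duplicates."""
    enriched = list(headings)
    for place, start, end, implied in IMPLIED_HEADINGS:
        in_window = start <= document_date <= end
        if place in headings and in_window and implied not in enriched:
            enriched.append(implied)
    return enriched


# A status report dated within the window gains the implied heading.
report_headings = enrich_headings(
    ["North Africa", "Status Reports"], date(1942, 11, 20)
)
```

With both headings attached, the report surfaces whether a researcher searches by location or by operation name.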
Rethinking Subject Headings for Better Discoverability
The gold standard of metadata categories is the Library of Congress subject headings. Unfortunately, these headings are not intuitive to many researchers. I have personally experienced the frustration caused by the standardized headings. On many occasions, I searched a digitized collection for a file I knew existed but never located it because I did not format the keyword search correctly or because my search terms were not included in the provided headings. This problem becomes increasingly complex when dealing with the decades before, during, and after World War II.
To highlight this issue, I located Michael Green’s 1995 monograph Patton’s Tank Drive: D-Day to Victory in the online catalogs of both the Library of Congress and the Marshall Foundation’s research library. Both catalogs list the same three subject headings for the 160-page monograph, and neither provides a book description.
Patton, George S. (George Smith), 1885–1945.
World War, 1939–1945—Tank Warfare.
World War, 1939–1945—United States.
The simple addition of a book summary would make the monograph more searchable because, fundamentally, keyword searches scour a website for matching terms. Depending on how they are coded, some catalog search engines will only pull from the subject headings and others will search the entire site. When the Marshall Foundation’s new digitized collection launches, the search engine will catch both the subject headings and the file descriptions.
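The difference between the two search behaviors can be sketched in a few lines. This is a minimal illustration, not the Marshall Foundation's search engine: the `CatalogRecord` class, the `matches` function, and the sample description are hypothetical, and the description is a placeholder written for this sketch rather than the actual catalog or Goodreads text.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """Hypothetical catalog entry; the field names are illustrative only."""
    title: str
    subject_headings: list[str] = field(default_factory=list)
    description: str = ""


def matches(record: CatalogRecord, query: str, search_description: bool = True) -> bool:
    """Return True if every query term appears in the searched fields."""
    searched = list(record.subject_headings)
    if search_description:
        searched.append(record.description)
    haystack = " ".join(searched).lower()
    return all(term in haystack for term in query.lower().split())


book = CatalogRecord(
    title="Patton's Tank Drive: D-Day to Victory",
    subject_headings=[
        "Patton, George S. (George Smith), 1885-1945.",
        "World War, 1939-1945 -- Tank Warfare.",
        "World War, 1939-1945 -- United States.",
    ],
    # Placeholder summary for this sketch, not the real catalog text.
    description="Follows the Third Army from the Normandy breakout to the Rhine.",
)

# A headings-only engine misses the query; one that also reads descriptions finds it.
headings_only = matches(book, "third army", search_description=False)
with_description = matches(book, "third army", search_description=True)
```

Here `headings_only` is `False` and `with_description` is `True`, because "Third Army" appears only in the placeholder description and not in any of the three subject headings.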
Using only the book description listed on goodreads.com, a new subject list for Patton’s Tank Drive could potentially include the 29 headings below. To be sure, the dramatically longer list may be visually cumbersome to some researchers. However, the inclusion of the units, operation names, locations, and planning categories will help ensure that researchers come across the monograph, regardless of how they search for it:
Example subject headings
- Patton, George S.
- Patton, G. S.
- Third Army
- 3rd Army
- Army G-2, Intelligence
- Office of Strategic Services
- O.S.S.
- Cross-Channel Invasion
- D-Day
- Normandy Breakout
- Normandy Invasion
- Operation Cobra
- Operation Neptune
- Operation Overlord
- Tank
- European Theater of Operations
- E.T.O.
- German Army
- Wehrmacht
- World War II
- World War II – European Theater
- World War II – Liberation
- World War II – Planning – Strategy
- World War II – Planning – Tactics
- France
- Paris, France
- Rhine River
- Rhineland, Germany
- Rhine Province, Germany
Given the physical location of Marshall’s papers in Lexington, Virginia, many researchers may not know the collection exists or may not have the ability to visit the quaint town that hosts a veritable treasure trove of historical documents. Historians of all types will benefit from the document-level descriptions and subheadings being manually applied to each file. Whether the person visiting the future online collection is an academic, a veteran, a history buff, or a high school student, the digitization of Marshall’s papers will provide an invaluable open-access resource for generations to come.