May 5, 2025

Digitization Preserves the Past

by Melissa Davis, Director of Library and Archives

Doctor Forrest Pogue and a team of researchers worked on the four-volume George C. Marshall biography for more than twenty-four years; the first volume, Education of a General, was published in 1963 and the last volume, Statesman, in 1987. Research required traveling to the archive holding a resource and spending days or weeks poring over materials, so research projects were limited to people who had financial backing or were lucky enough to live near the needed archival collection. If an archival resource remains in hard-copy form, research is still performed in this manner.


Why Digitization Matters

So how can we make the research process more accessible and practical at the George C. Marshall Foundation library? By scanning the extensive George C. Marshall Papers and uploading them to the library catalog. The process of digitization will make each of the approximately 165,000 documents, comprising about 250,000 pages, searchable and immediately available to anyone with an Internet connection from anywhere in the world.

This project is gigantic, and the scanning of each page is the easy part. Backstage Library Works of Pennsylvania transports the boxes containing the Marshall Papers documents to their facility in Bethlehem where staff scan each document using professionally accepted standards and state-of-the-art equipment. The boxes are then returned to the Marshall Foundation, and the real work begins!


What Metadata Makes Possible

For a document to be searchable within the library catalog, it needs to be part of a catalog record containing information specific to that document. This information is called “metadata,” and is defined by the National Information Standards Organization as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”

The metadata for each document in the Marshall Papers includes the individual scan number, correspondent names, date, location, summary of the document, and subject headings, which are search terms that allow the user to find the document. Subject headings come from a controlled vocabulary: a list of selected terms used for all documents connected with a topic. For the Marshall Papers project, we opted not to use the more traditional Library of Congress Subject Headings (LCSH), as their complexity can be an unnecessary barrier to searching. For example, the LCSH term for the 1944 invasion of France is “World War, 1939–1945 —D-Day, 1944 (Normandy invasion).” The controlled vocabulary we are creating uses the natural-language terms “D-Day,” “Operation Overlord,” “Operation Neptune,” “Cross-Channel Invasion,” and “Normandy Invasion,” all of which connect to the documents associated with the invasion. The names, location, and subject headings all link to other items on the same topic, which is of great assistance to the searcher.
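
For readers who like to see the idea in code, here is a minimal Python sketch of how a controlled vocabulary can let several natural-language terms lead to the same documents. It is an illustration only, not the catalog's actual implementation: the topic key, the field names, the scan numbers, and the sample records are all placeholders invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: every natural-language term
# points to one shared topic key (the key name is invented here).
VOCABULARY = {
    "d-day": "normandy-invasion-1944",
    "operation overlord": "normandy-invasion-1944",
    "operation neptune": "normandy-invasion-1944",
    "cross-channel invasion": "normandy-invasion-1944",
    "normandy invasion": "normandy-invasion-1944",
}


@dataclass
class Record:
    """One catalog record, using the metadata fields listed in the article."""
    scan_number: str
    correspondents: list
    date: str
    location: str
    summary: str
    subject_headings: list = field(default_factory=list)


def topic_for(term):
    """Return the shared topic key for a term, or None if it is not in the vocabulary."""
    return VOCABULARY.get(term.strip().lower())


def search(records, term):
    """Return every record whose subject headings share a topic with the search term."""
    topic = topic_for(term)
    if topic is None:
        return []
    return [
        r for r in records
        if any(topic_for(h) == topic for h in r.subject_headings)
    ]


# Placeholder records; the values are not taken from the Marshall Papers.
SAMPLE_RECORDS = [
    Record("scan-0001", ["Correspondent A"], "date unknown", "location unknown",
           "Placeholder summary", ["D-Day"]),
    Record("scan-0002", ["Correspondent B"], "date unknown", "location unknown",
           "Placeholder summary", []),
]

matches = search(SAMPLE_RECORDS, "Operation Neptune")
print([r.scan_number for r in matches])
```

In this sketch, a search for “Operation Neptune” finds a record tagged only with “D-Day,” because both terms resolve to the same topic.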

Thousands of documents from the Marshall Foundation archive, boxed up and ready to go.


From Scan to Search: Bringing It All Together

The first thirty-nine boxes of Marshall Papers that have been scanned so far cover the years 1938–1945, leading up to and during World War II, when General Marshall served as the Army Chief of Staff. The documents are primarily memoranda, reports, speeches, and correspondence. Many of the documents lack the attributes and context for catalog records, as parties involved in the communications were deliberately vague and used code words to protect secret information.

Several World War II historians have been hired to analyze each document to ensure that information specific to the document is collected. These historians—for this project called metadata technicians—work remotely, accessing the scanned documents through cloud storage. Their extensive knowledge of the war is necessary to fill in missing context, decipher code words, and create complete information for the catalog records. Some documents are a single page and take a few minutes of work; other documents are much longer and may take several hours to complete.

Hollinger boxes are stored in larger banker boxes for the journey to Pennsylvania.

At times, even the expertise of the historians is not enough to construct complete records. One issue is common last names: there were seventeen U.S. generals named Smith during the war. Nicknames frequently appear in documents; Marshall was known as “Flicker” to his childhood friends from Uniontown, and Admiral Harold Stark was addressed as “Betty,” his Naval Academy nickname. Nearly all women mentioned in the papers were called by their husbands’ names, such as Mrs. George C. Marshall rather than Katherine T. Marshall, so reparative cataloging restores the women’s first names.

We call these missing pieces “Not Founds,” and when they crop up, the technicians try to find them. Unsolved queries are shared with me at the library so that I can research and find the necessary information. The historians and I have solved more than 20,000 “Not Founds” in the past year working on the World War II section of the project, and our effort makes the documents much more user-friendly.

When the metadata is complete for the 1938–1945 section of the Marshall Papers, the scanned documents and metadata information collected will be sent to Soutron Global, the manager of the Marshall Foundation library catalog. The information will be connected to the documents by the individual scan number, and a record will be created and uploaded into the library catalog. The library record will include a searchable PDF and image. This type of catalog record is a best practice, as all the information about the document is viewable in the record.
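
The linking step can be pictured with a short Python sketch. This is only an illustration of joining metadata to scans on a shared scan number; it does not describe Soutron Global's actual process, and the field names and file names are assumptions made for the example.

```python
def build_catalog_records(metadata_rows, scan_files):
    """Attach each metadata row to its scanned PDF using the scan number.

    metadata_rows: list of dicts, each with a "scan_number" key (assumed name).
    scan_files: dict mapping scan number to a PDF file name (assumed layout).
    Returns (records, unmatched), where unmatched lists scan numbers
    that have metadata but no corresponding scan file.
    """
    records = []
    unmatched = []
    for row in metadata_rows:
        scan_number = row["scan_number"]
        pdf = scan_files.get(scan_number)
        if pdf is None:
            # Keep track of gaps instead of silently dropping them.
            unmatched.append(scan_number)
            continue
        # Copy the metadata and add the link to the scanned document.
        records.append({**row, "pdf": pdf})
    return records, unmatched


# Placeholder data, invented for illustration.
rows = [
    {"scan_number": "scan-0001", "summary": "Placeholder summary"},
    {"scan_number": "scan-0002", "summary": "Placeholder summary"},
]
files = {"scan-0001": "scan-0001.pdf"}

records, unmatched = build_catalog_records(rows, files)
print(records)
print(unmatched)
```

Returning the unmatched scan numbers separately is a design choice for the sketch: it makes missing scans visible so they can be followed up rather than lost.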

The project of scanning, creating metadata, and uploading to the library catalog each document in the George C. Marshall Papers will take about six or seven years but is well worth the effort. Library catalog users, whether 8th graders working on History Day projects, PhD students from overseas, researchers, or the curious who just want to learn more about George Marshall, will all benefit from the accessibility of each of the Marshall Papers.