Into the Bright Sunshine: Digitizing Hubert H. Humphrey's Speeches

That voice that could not be silenced in life--that famous, unmistakable voice, instantly recognized throughout the land--will not be silenced in death. Hubert Humphrey will go on talking down the ages, to us, to our descendents, as long as the Republic endures.
--Strom Thurmond quoted in Sheldon D. Engelmayer and Robert J. Wagman, Hubert Humphrey: The Man and His Dream 1911-1978 (New York: Methuen, 1978): rear jacket text.

Introduction

More than most, Hubert H. Humphrey was a man who articulated his political values and pressed forward his agenda through the medium of speechmaking. His speeches document the path of his career, his evolving political thought, the maturation and high-water mark of the liberal tradition in twentieth-century American politics, and the trajectory of the U.S. federal government during the latter half of the century. Because Humphrey had been memorialized as a voice that could not be silenced in death and because his politics continue to reverberate through our age, we thought that digitizing his speeches would be the best possible way to preserve and amplify his voice.

Between July 2012 and September 2013, with generous support from the National Historical Publications and Records Commission (NHPRC) of the National Archives, the Minnesota Historical Society digitized all of the speech texts and a sample of sound recordings from our collection of Hubert H. Humphrey's papers. Digitizing Humphrey's speeches was, in part, an experiment in applying low-cost, minimal processing techniques to the large-scale digitization of an entire archival series and, in part, an exploration of how to apply similar techniques to an analogous portion of audio recordings. The project resulted in extending digital access to more than 4,000 speech texts and over 100 sound files that represent one of the most powerful voices to shape American political and social history from the end of World War II through the end of the Vietnam War.

When we began this project, we thought that we had already developed high-volume, low-cost techniques in earlier projects that had digitized portions of the papers of James J. Hill, Louis W. Hill, Harold Stassen, and Walter F. Mondale. But we had never determined the costs associated with our process. The experimental part of our Humphrey project did result in scaling up digitization of an entire series of papers and also helped us determine the cost of each page. The results surprised us by confirming a greater economy than we had anticipated and by proving that we had developed a very efficient workflow for effective textual products.

However, we had no previous experience with sound recordings. Dazzled by groundbreaking projects such as the Sound Directions collaboration between Harvard and Indiana Universities and the Library of Congress National Jukebox, we were cautioned that digitizing sound recordings required expensive audio equipment, professional sound engineering knowledge, and considerable conservation expertise. Boiled down, the message was "do not try this at home." Undaunted, we were determined to explore our own in-house approach on a small scale. Our hope in digitizing a selection of Humphrey's recordings was that we would discover a sustainable model for the future. What we learned was that we could indeed digitize sound at home at a cost-effective scale. Sustaining that model is another challenge.

Methodology

Speech Texts

Workflow: The workflow for digitizing the speech texts included preparing each file by removing staples, paper clips, and duplicates; scanning the files; adding metadata; performing quality control checks; and storing the digital files.

Scanning: Imaging hardware included a Fujitsu fi-6230 sheet-fed scanner that was used for paper of good quality no larger than 8 1/2" x 14". An Epson Expression 10000 XL flatbed scanner was used for paper that was larger than 8 1/2" x 14" or of a fragile nature. Each file was scanned page by page, in 8-bit grayscale at 300 ppi into a PDF file.

Quality control: Once each text file was scanned, the resulting PDF files were viewed for image clarity, orientation, and completeness.

Metadata requirements: A very minimal set of descriptive and administrative metadata was added to the XMP properties of each PDF/A file following standards we had already established in earlier projects. The data elements included the title of each speech, Hubert Horatio Humphrey as the author, keywords that serve as a preferred citation to identify the collection title and the Minnesota Historical Society as holding repository, and a copyright statement about the digital manifestation.
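As a rough illustration of how small this metadata set is, the four elements can be modeled as a single record. This is a minimal sketch, not the project's actual tooling: the class name `SpeechPdfMetadata`, the mapping onto the XMP properties `dc:title`, `dc:creator`, `pdf:Keywords`, and `dc:rights`, and the sample keyword and rights strings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechPdfMetadata:
    """The four data elements described above (hypothetical model)."""
    title: str      # title of the speech
    keywords: str   # preferred citation: collection title and holding repository
    rights: str     # copyright statement about the digital copy
    author: str = "Hubert Horatio Humphrey"

    def __post_init__(self):
        # Every element is required; reject blank values early.
        for name in ("title", "keywords", "rights", "author"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be blank")

    def to_xmp_properties(self) -> dict:
        # Assumed mapping to common XMP property names; the report does not
        # say which XMP properties were actually used.
        return {
            "dc:title": self.title,
            "dc:creator": self.author,
            "pdf:Keywords": self.keywords,
            "dc:rights": self.rights,
        }


# Placeholder values for illustration only.
record = SpeechPdfMetadata(
    title="Speech on Civil Rights, July 14, 1948",
    keywords="Collection title (placeholder). Minnesota Historical Society.",
    rights="Copyright statement for the digital copy (placeholder).",
)
for prop, value in record.to_xmp_properties().items():
    print(f"{prop}: {value}")
```

Keeping the author as a default value reflects that it is the same for every file in the series, so only three fields vary from speech to speech.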

Given the minimal nature of the metadata we added, digital processing of the speech texts also included adding OCR to each file to enable full-text searching and voice-activated readers for the sight impaired. OCR was added using the application built into Adobe Acrobat Pro. Although we knew the OCR application would introduce some typographical errors and imperfections, no attempt was made to proof, edit, or revise the content of the OCR layer.

File format conversion: Once scanning, quality control, and metadata tasks were completed, the files were converted to the PDF/A-1b preservation format using the Preflight function in Adobe Acrobat Pro v. 11.

Preservation pathways: The PDF files made during this project were only intended to serve as access copies, and not as preservation copies, of the original papers. The files are stored on our web server and are backed up on a web mirror in the Society’s enterprise technology network as well as a department network. Because the entirety of each text file was digitized, the papers are now closed to general public access. Restricting future physical handling of the original papers by favoring access to the digital copies will help preserve the original texts.

Sound Recordings

Workflow: The workflow for digitizing the sound recordings included selecting a representative sample of speeches for digitization, playing back the sound recordings in real time while converting them to digital WAV files, adding BWAV metadata, producing derivative MP3 copies and adding MP3 metadata, storing the digital files, and adding links to an EAD inventory.

Selection criteria: ADD TEXT criteria

Audio conversion (equipment, procedures, file formats): ADD TEXT

Metadata requirements: ADD TEXT Sound file metadata guidelines

Preservation pathways: ADD TEXT

Signal Chain

Sound transfer studio

Jennifer Huebscher transfers a sound recording in the project's sound studio.

Public Access and Discovery

The textual PDF files and the audio MP3 files were added as linked digital archival objects to the inventories of each respective series of Humphrey's papers. Thumbnail images that represent either the first page of each speech file or an iconic pair of headphones were included to draw audience attention to the presence of digital reproductions. For users, this embedded content mirrors the actual experience of reaching into a box to view the contents of the files or listen to the recordings.

Results

To determine the unit costs associated with digitizing a page of text or a minute of sound, we tracked the time required to perform defined digitization tasks throughout this project. These tasks included only those technical activities that resulted in the production of a digital file.

Tasks associated with preparing the speech texts for scanning or EAD encoding were lumped together and included as support time. Because these tasks were often performed at the same time that digitization activities occurred, they were more difficult to measure as whole, discrete actions. Instead, we tracked only the time spent on these tasks when scanning activities were not also occurring.

For the audio files, the time required to create the EAD inventory was not tracked because the collection was being processed at the same time that the recordings were being digitized. Additionally, the task of creating an EAD inventory or encoding digital archival objects is aided by spreadsheet templates that we know from past experience have significantly minimized our encoding time. Asked to estimate the time required to add a digital sound recording to an EAD inventory, staff reported a minute or less per recording.

ADD TEXT - summarize cost findings and give alternative archival measure for per file [4,270 PDF files with an average number of 24 pages at $10.38 each] and per recording [86 BWAV and MP3 audio files with an average duration of 30 minutes at $11.52 each] costs

ADD TEXT - analysis of user pageviews of inventories

Speech Texts

Speech text digitization costs per page

Scan Time: Time in minutes required to scan all pages.

PDF/A Time: Time in minutes to proof for image clarity, orientation and completeness; to add OCR and descriptive metadata; and to perform PDF/A-1b file conversions.

Support Time: Time in minutes spent on physical processing tasks such as pulling staples and weeding duplicates and on EAD coding that was not tracked separately as discrete costs.

Total Digitization Time: Total digitization time in minutes (scan time plus PDF/A and support time).

Total Pages: The total number of pages that were digitized.

Average Time Per Page: Total digitization time divided by total pages.

Average Cost Per Page: Total digitization time multiplied by a salary of $0.3677 per minute divided by the total number of scanned pages.
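The definitions above reduce to a short calculation. The following is a minimal sketch of it, assuming the three tracked times are available as totals in minutes; the function name `page_costs` and the sample inputs are hypothetical, and only the $0.3677 per minute salary rate comes from this report.

```python
SALARY_PER_MINUTE = 0.3677  # dollars per minute, the rate stated in this report


def page_costs(scan_min, pdfa_min, support_min, total_pages):
    """Return (total minutes, minutes per page, dollars per page)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # Total digitization time: scan time plus PDF/A time plus support time.
    total_min = scan_min + pdfa_min + support_min
    time_per_page = total_min / total_pages
    cost_per_page = total_min * SALARY_PER_MINUTE / total_pages
    return total_min, time_per_page, cost_per_page


# Hypothetical inputs for illustration only; not the project's actual figures.
total, per_page_time, per_page_cost = page_costs(
    scan_min=600, pdfa_min=300, support_min=100, total_pages=1000
)
print(f"{total} min total, {per_page_time:.2f} min/page, ${per_page_cost:.4f}/page")
```

Because only staff time enters the formula, the result is a labor cost per page; equipment, software, and storage are outside what these definitions measure.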

Speech texts finding aid pageviews

Sound Recordings

Sound recording digitization costs per minute

Recording Duration: Total duration in minutes of all selected recordings.

QC & Metadata Time: Time in minutes to test all WAV files for audibility and completeness and to add BWAV metadata.

Derivative Time: Time in minutes to output derivative MP3 files.

Total Digitization Time: Total digitization time in minutes (recording duration plus QC & metadata time plus derivative time).

Total Minutes: Total time in minutes of all selected recordings (same as recording duration).

Average Time Per Minute: Total digitization time divided by total minutes.

Average Cost Per Minute: Total digitization time multiplied by a salary of $0.3677 per minute divided by the total recorded minutes.
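One consequence of these definitions is worth making explicit: because recordings are played back in real time, the recording duration is itself part of the digitization time, so the average time per recorded minute can never fall below one minute and the average cost can never fall below the salary rate. A minimal sketch of the calculation follows; the class name `AudioBatch` and the sample inputs are hypothetical, and only the $0.3677 per minute rate comes from this report.

```python
from dataclasses import dataclass

SALARY_PER_MINUTE = 0.3677  # dollars per minute, the rate stated in this report


@dataclass
class AudioBatch:
    recording_min: float    # total duration of the recordings, played in real time
    qc_metadata_min: float  # audibility/completeness checks plus BWAV metadata
    derivative_min: float   # time to output MP3 derivatives

    @property
    def total_digitization_min(self) -> float:
        return self.recording_min + self.qc_metadata_min + self.derivative_min

    @property
    def time_per_recorded_min(self) -> float:
        # Equals 1 + (QC/metadata + derivative time) / recording duration.
        return self.total_digitization_min / self.recording_min

    @property
    def cost_per_recorded_min(self) -> float:
        return self.total_digitization_min * SALARY_PER_MINUTE / self.recording_min


# Hypothetical inputs for illustration only; not the project's actual figures.
batch = AudioBatch(recording_min=300, qc_metadata_min=30, derivative_min=15)
print(f"{batch.time_per_recorded_min:.2f} staff minutes per recorded minute")
print(f"${batch.cost_per_recorded_min:.4f} per recorded minute")
```

Under these definitions, the only way to lower the per-minute cost toward the floor is to shrink the QC, metadata, and derivative steps relative to the playback time.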

Sound recordings finding aid pageviews

Sample Digital Files

As a small sample that you can read and listen to, here are the text and audio of two of Humphrey's most memorable and powerful speeches.

Speech on Civil Rights, July 14, 1948:

Text (PDF) | Audio (MP3)

Vice Presidential Acceptance Address, August 27, 1964:

Text (PDF) | Audio (MP3)

All of the text and audio digitized by this project are available in inventories to each series:

Speech Text Files | VP Speech Research & Misc. Files | Sound Recordings

Next Challenges

Transferring project expertise
Rationalizing workflows and procedures
Maintaining/upgrading studio space, equipment, and software
Solving discovery barriers to public access

Knowing what these challenges are, we know where we can make improvements and where we need to focus in the future.

Hubert H. Humphrey making his acceptance speech for the vice presidential nomination at the Democratic National Convention, Atlantic City, NJ, August 27, 1964.