Introducing the Structure of Different Document Types

As we know, you can break down your documents into categories, and further into types within each category. Letters, diaries, and financial records, for instance, are different types of inscribed sources. From here, you can begin considering the structural units of a document type. For example, letters may consist of several structural units, such as salutations, datelines, signatures, and postmarks; weather records may consist of daily, weekly, or monthly readings; and financial ledgers may consist of credits or expenses to an account.

As you break down each document type into its structural units, you can begin considering which of those elements are important for you to represent, especially given the goals of your project. You may also begin thinking about which structural elements should be formatted as faithfully as possible (such as line breaks in poems), which elements could be standardized when represented (such as the placement of a dateline at the beginning of every letter), and which elements could be omitted altogether (such as page breaks in a printed speech).
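One way to keep track of these decisions is to write them down as a small data structure. The following is only a minimal sketch in Python, not a standard or a required method: the names `Treatment`, `StructuralUnit`, `LETTER_POLICY`, and `units_to_represent` are hypothetical, and the treatment assigned to each unit of a letter is an illustrative assumption rather than a recommendation.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    """Three possible editorial treatments for a structural unit."""
    FAITHFUL = "reproduce as it appears in the source"
    STANDARDIZED = "represent in a consistent, normalized way"
    OMITTED = "leave out of the edition"


@dataclass
class StructuralUnit:
    name: str
    treatment: Treatment


# Hypothetical policy for one document type (letters).
# The choices below are assumptions made for illustration only.
LETTER_POLICY = [
    StructuralUnit("salutation", Treatment.FAITHFUL),
    StructuralUnit("dateline", Treatment.STANDARDIZED),
    StructuralUnit("signature", Treatment.FAITHFUL),
    StructuralUnit("postmark", Treatment.OMITTED),
]


def units_to_represent(policy):
    """Return the names of the units that will appear in the edition."""
    return [unit.name for unit in policy if unit.treatment is not Treatment.OMITTED]


print(units_to_represent(LETTER_POLICY))
```

Writing the policy down in one place, in whatever form suits your project, makes it easier to apply the same choices to every document of that type and to explain them to readers later.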

In this section, we’ll explore a few projects to observe how each represented the structural elements of their documents. Please open the link for each example on your own electronic device (e.g., a computer or tablet).

Examples of Document Types

Directions: Click on the arrows below to explore the following examples of different document types.

This first example is an inscribed, or written, text, but it’s not a letter. When you think of written text, the first document type that often comes to mind is a letter, and letters are indeed prevalent in much of documentary editing. However, let’s look at an example of written text that falls outside of that scope. As you explore this example, please note the utility of side-by-side transcriptions with corresponding images.

To access the log book, please visit Champlain Club and Yacht Log, Mount Desert Island.

Try to answer the following questions as you peruse the project:

  • What kind of transcription practices do the editors follow (e.g., diplomatic transcription or line-by-line transcription)?
  • Are the editors faithful to the original punctuation, or do they use modernized punctuation?
  • How do the editors portray tables and figures?
  • Do you notice any damage to the original documents? If so, what type of damage? 
  • Do you notice any missing information or “textual unknowns”? If so, how are they indicated in the transcription?

One final note about this example pertains to the publication choices made by the editors. The publishing apparatus is quite simple (i.e., published PDFs of the images and transcriptions) and extremely effective for presenting the log books as they were written. However, this publishing option does make specific concepts within the project difficult to search for. In other words, these PDFs are difficult to link to the semantic web. This means that readers will have difficulty conducting word searches for specific terms or concepts found within the documents. Because of these editorial decisions, this edition is an excellent example of how there is no “wrong” or “right” way to complete a documentary edition. It further demonstrates how the choices you make as an editor determine the way the content of your documents is presented. These choices, in turn, can influence the types of inquiries and searches available within the document, which can ultimately facilitate understanding and interpretation of your documents.

Sheet music presents an interesting multimodal opportunity for historical document editors. This particular musical text uniquely combines oral and textual elements that attempt to recover the history of African music in the Caribbean. The affordances of the digital medium allow readers not only to read but also to “listen” to the text as they interpret the musical notes. Be sure to view the musical performance included on the “About” page as you explore this website.

To access the sheet music, please visit Musical Passage.

Finally, as you review this piece of sheet music, consider the following questions:

  • How does the layering of the sheet music and sound recording in the user experience seem similar to the side-by-side text and transcription experience from the first example? How is it different?
  • What interactive elements exist within the digital representation of the text?

Along with sheet music, oral speeches also provide unique opportunities for historical document editors. This Freedom’s Ring edition of Martin Luther King, Jr.’s “I Have a Dream” speech is another powerful multimodal example that combines a variety of media types. More specifically, it includes an oral recording, a printed speech, and moving artistic imagery.

To access the speech, please visit Freedom’s Ring, MLK “I Have a Dream.”

As you listen to the speech, read the scrolling text, and observe the artistic presentation, consider the following questions:  

  • In this edition, documentary editing is blended with several other methods of representation and recovery (artistic imagery, moving images, etc.). How does the document stand out in this interdisciplinary display? 
  • How does the project handle statistical information and other graphical elements of the document?

The William Blake Archive is a long-standing digital documentary editing project. William Blake’s illuminated manuscripts are multimodal texts that offer unique challenges for transcribers and editors.

To access the complicated text, please visit The William Blake Archive.

First, explore the short text (8 pages) and note how the edition provides readers with multiple ways of interacting with the text and images.

Next, read this editorial note from the Blake Archive editors that explains why The Book of Thel is so important for the project. As you read through this editorial note, consider that updates to this text have been made in 1996, 1997, 2000, 2004, and 2007. This is another example of the layers of text that can envelop a document. In other words, the foundation of this edition is Blake’s text itself, which serves as the original document for analysis. However, the edition also features the many transcriptions and overlays of technical tools (from Java to XML) that render and recover the text for modern-day users. The Blake Archive’s decades-long history illustrates the ways in which the transcriptions we create as editors serve as texts themselves.

Now that you have a better understanding of how to identify various document types and interpret the influence they have on the reading experience, let’s apply what you’ve learned to a practice activity.

Course Glossary

  • A publisher based at or sponsored by a university.

  • The access file is a derivative of the master file, produced by converting the master file to a smaller file format. Access files are suitable for presentation to researchers.

  • The condition of source materials being physically available to users and intellectually understandable by users.

  • The use of descriptive, contextual, referential, or illustrative content or structure that supports the discoverability and accessibility of source materials. Annotation may take many forms (footnotes, source notes, metadata, glossaries, essays, indexes, keywords, images, maps, and more) and multiple forms of annotation may be used by a project.

  • The “supports” of any edition (other than the reading text itself) that are created for the purpose of providing additional clarifying information. Typically, this term is applied to textual and contextual notes, but it can also apply to introductions, headnotes, dictionaries, lists, indexes, and appendices as well as newer, innovative annotation types, such as data visualizations.

  • A collection of textual and non-textual artifacts in physical and/or digital form; records created or received by a person, family, or organization and preserved because of their continuing value. See also the definition provided by the Society of American Archivists:

  • A collection of “authority records” (usually in a database or in a structured data file like XML) with stable, reliable information about places, people, and other kinds of named entities.

  • A version of a person’s name that is used consistently whenever that individual is referenced, such as in annotation or metadata.

  • The back-end refers to the data (and/or database), site system, and structure underlying a digital project, whereas the front-end refers to the website’s style, appearance, and features (otherwise known as the user interface). Many websites rely on the communication between the back-end data and the web browser for displaying it.

  • The number of bits used to represent each pixel in an image.

  • An edition conceptualized with the goal of online publication, meaning that editorial policies are made considering the digital environment.

  • 1) The act of creating and maintaining a list that describes the content, structure, and/or administration for each source material within a collection. This can be created for the benefit of document control and/or discoverability of the materials.
    2) The act of making an edition discoverable within external infrastructures, such as library catalogs.

  • The act of seeking or identifying the location of source material for the purpose of acquisition, in some form of a print or digital copy, for research purposes.

  • An edition that includes both images and transcriptions. See in contrast to image-based editions and transcribed editions.

  • With respect to a collection of source materials, a comprehensive edition publishes all or nearly all of those materials. See in contrast to a selected edition.

  • A consistent or standardized way of describing data. For example, one practitioner working with poems may choose to describe the creators of these source materials as "Author," whereas a practitioner working with correspondence may choose to describe the creators of those source materials as "Sender." Regardless of what these practitioners choose, both have used a controlled vocabulary by standardizing how they describe the source material's creator. Practitioners can develop their own controlled vocabulary, use an existing controlled vocabulary, or a combination of both. Examples of existing controlled vocabularies can be found at the University of North Carolina Library:

  • Copyediting involves revising the text of your annotations and other apparatus to ensure that your work is clear and readable and that it conforms to conventional rules of grammar. See in contrast to proofreading.

  • An identifier placed on copies of a work to inform the world of copyright ownership.

  • The length of time that copyright applies to a work before it passes into the public domain.

  • A free license that provides all creators—from individuals to large institutions—with a standardized way to grant permission for public use of their creative work under copyright law. Learn more at the Creative Commons website:

  • A particular kind of annotation that records textual notes about the sources of the reading text and, in some cases, information about authoritative readings when multiple versions of a text exist.

  • A reconstructive form of editing that establishes an authoritative reading text based on a critical examination of existing witnesses (i.e., imposing change on a text through correction, emendation, or apparatus). See in contrast to documentary/historical editing.

  • The user interface for a content management system's backend.

  • A surrogate file created from the original master file.

  • The process of rotating an image that has been scanned crookedly.

  • The production of printed matter by means of a desktop computer and a page layout software that integrates text and graphics.

  • The act of using digital tools in the practice of editing source materials.

  • An edition published—and sometimes also prepared or edited—in a digital or online environment. A digital edition may be created instead of or in addition to a print edition.

  • The process of creating a high-quality digital copy of your source material.

  • A set of guidelines that governs the digitization of material and aligns them to industry specifications.

  • A literal transcription of a document, where all words, including those that were added or deleted, are represented. See in contrast to normalized transcription.

  • A folder containing files.

  • The condition of source materials being findable by users, such as through browsing, filtering, or searching.

  • A handwritten, printed, or oral type of source material. Documents may include letters, diaries, financial records, invitations, event flyers, newspaper articles, poems, speeches, interviews and more.

  • Editing documents (either private or public documents) with the goal of making them accessible and, in some cases, reproducing their content as closely as possible to their original form. This form of editing has often been distinguished from critical editing, which focuses on editing documents with the goal of establishing an authoritative text, yet many practitioners now agree that historical editing incorporates many aspects of critical editing as reflected in decisions about presentation, formatting, and annotation.

  • The process of writing down the policy decisions that you have made in order to share them with readers and ensure that you apply them consistently.

  • The application of a system for locating specific documents or groups of documents in your collection. Creating such a system involves defining what metadata needs to be collected for each document, and then consistently and accurately collecting that metadata.

  • A structure developed for creating meaningful divisions between documents. Closely related to document control, document organization refers to the framework by which documents are organized, whereas document control refers to the practical application of that framework.

  • The act of gathering, preparing, and presenting source materials in such a way as to increase their accessibility and discoverability. This work may involve a variety of activities, such as collection, selection, digitization, cataloging, versioning, transcription, annotation, and encoding, and may result in a variety of presentations or publication outputs.

  • A print, digital, or hybrid publication resulting from the act of editing source materials.

  • See practitioner.

  • The decisions practitioners make regarding how to represent source materials in their edition. Practitioners may make decisions related to the selection of materials for publication, the form and focus of transcription, the form(s) of annotation, the elements to be captured in metadata, the processes for quality control, and more. Practitioners may choose to document any or all of these decisions for use in sharing internally and/or for informing users of their edition.

  • Changing the reading of a text to correct an inaccuracy or to reflect a judgment about an author’s intentions.

  • An edition prepared using TEI-XML—a descriptive, standardized XML-based language developed and maintained by humanists. More information about the TEI and how to use it can be found at the scholarly, non-profit TEI Consortium:

  • The act of using a computing language, such as Markdown or XML, to represent or describe source material. See especially text encoding.

  • A note or essay that follows the presentation of a document or other source material. It is a form of annotation used for providing information about the source material and/or for creating connections to relevant resources.

  • A guided search and navigation feature that lets users filter search results by selecting a range of different attributes. For example, a faceted search of place names allows you to search a list of place names mentioned in a collection of documents. See the Wikipedia entry for more information:

  • The process of verifying the accuracy of information provided in annotation and citations.

  • A legal doctrine in the United States that permits the unlicensed use of copyright-protected works under certain circumstances. There are four factors that guide determination on whether the unlicensed use of a copyright-protected work is permissible or fair. These four factors are outlined at the U.S. Copyright Office Fair Use Index:

  • The amount of space a file consumes on a storage medium.

  • A note attached to a specific element in an essay or source material (such as a word, sentence, section of an image or recording, etc.). It is a form of annotation used for providing information about that specific element and/or for creating connections to relevant resources.

  • Similar to an authority file (or list), a gazetteer is a database that combines name authority files into a stable, reliable source about places, people, and other kinds of named entities to which websites can connect. Gazetteers are often used in Linked Open Data (LOD) projects, which connect to gazetteers in order to link their data to the “semantic web.” Examples include Wikidata, GeoNames, and VIAF.

  • A curated collection of resources, such as of biographies, key terms, images, or other content. It is a form of annotation used to collect any resources that may be referenced frequently and make them available in one easily-findable location.

  • A note or essay that precedes the presentation of source material or a collection of source materials. It is a form of annotation used for introducing or discussing the source material.

  • First coined in the 1960s by Ted Nelson, a hypertext is any text shown on a computer screen that can link out to other documents.

  • Created by Tim Berners-Lee in the late 1980s, HTML was the first official instantiation of a hypertext data model which became the de facto language for web writing and publishing in the World Wide Web.

  • An edition that presents images or facsimiles of the source materials. May also be referred to as a facsimile edition. See in contrast to transcribed editions and combined editions.

  • The level of detail portrayed in an image, measured in pixels per inch (PPI) or dots per inch (DPI).

  • While an index refers to a list at the end of a printed book that helps you to find the location of certain references, indexing in digital editions is a means of serializing data in a digital edition so that certain semantic elements (identifiers, people, places, dates) can be processed and accessible.

  • Usually in printed editions, the lemma signals the place in the reading text to which a note is referring.

  • An agreement to utilize or reproduce a creative work. May also be referred to as a “permission(s) agreement.”

  • Used for a typesetting machine that produces each line of type in the form of a solid metal slug.

  • The act of formatting text for HTML using a plain-text editor.

  • The master file is the original file, generally produced through scanning processes that attain a high-level specification. The master file is archived for long-term preservation.

  • The process by which communications (either verbal or textual) are delivered through a material such as a book or computer.

  • Essentially, data about data. It can be used to describe the content, physical or structural features, and/or administrative elements of data. In providing such descriptions, metadata supports the management and discoverability of data. See the University of North Carolina Library's definition of metadata for more information:

  • With respect to a collection of source materials, a modified comprehensive edition publishes all materials that fit within a defined category. For example, a practitioner creating a modified comprehensive edition might select all materials from a specific range of years, a certain format of materials (e.g., letters, speeches, oral interviews), or all materials from a specific geographical area. See in contrast to a selected edition and a comprehensive edition.

  • A transcription of a document where the substance of the content is retained, but some elements like spelling, punctuation, or contractions are changed with the intention of improving the readability of the text. See in contrast to diplomatic transcription.

  • A letter seeking to obtain permissions from the copyright holder to use, reproduce, or adapt a creative work.

  • Any individual who practices editing or recovery for the purpose of promoting the accessibility and discoverability of source materials; or any individual who engages in the discussion, development, or use of tools or methodologies relating to those practices.

  • The act of confirming the presentation of a text, whether transcription or annotation, immediately prior to publication by reviewing and making any necessary revisions. See in contrast to copyediting.

  • Where documents or data come from, which individuals or repositories have previously owned them, and how we end up accessing them (or how they have changed, through mediation).

  • Refers to creative works that are not protected by intellectual property laws, such as copyright, trademark, or patent laws. When a work is released into the public domain, the public, rather than an individual author or artist, owns the work as a collective entity. This means that anyone can use or adapt the work without obtaining permission, but no one can ever own it.

  • The act of inviting the public to substantially contribute to project work. In the practice of editing and recovery, may include involving the public in the conceptualization of the project, crowd-sourcing transcriptions or annotations, and more.

  • The act of reviewing editorially-produced content, like transcription or annotation, for the purpose of ensuring quality or accuracy. Practitioners may use one or more of a variety of processes for the purpose of reviewing their content, including copyediting, fact-checking, proofreading, tandem reading, and more.

  • The act of focusing research activities like archiving or editing on source materials that have previously been silenced, dismissed, neglected, or ignored in their collection, preservation, organization, description, or presentation.

  • A statement about the intellectual property rights regarding a resource, a legal document giving official permission to use a resource, or a statement about access rights.

  • With respect to a collection of source materials, a selected edition publishes only a subset of those materials. The practitioner decides what subset will be prepared and published within the edition and what materials fit within that subset. See in contrast to a comprehensive edition.

  • The process of deciding which source materials will be included in your publication.

  • The act of labeling texts with specific, meaningful categories for machine processing.

  • The ways that the organization and material forms of a book affect our interpretations and experiences of the text. See D. F. McKenzie’s Bibliography and the Sociology of Texts (1999).

  • Any handwritten, printed, oral, visual, kinetic, or physical item that a practitioner chooses to work with. Source materials may include diaries, letters, newspapers, poems, audio recordings, video recordings, novels, short stories, artwork, dances, objects, and more.

  • A note that describes a source material's provenance and/or creation. It is a form of annotation.

  • The quality of being based on or influenced by personal feelings, tastes, or opinions, or societal or historiographical practices and beliefs.

  • A written statement allowing users to request that an item be removed (e.g., from a public website) due to a possible copyright infringement.

  • The application of digital tools that mine and analyze text in the pursuit of finding new meanings or connections within the text.

  • The act of using the Text Encoding Initiative or TEI—a set of XML guidelines that have been developed to describe humanities texts—to edit source materials through encoding. More information about TEI and how to use it can be found at the TEI Consortium, a scholarly community that maintains the guidelines:

  • The core components of a textual artifact, including the material history of communications among humans and their underlying systems of publication and dissemination.

  • An edition that publishes transcriptions of documents. See in contrast to image-based editions and combined editions.

  • The act of interpreting and adapting source material to create a readable form or representation of it.

  • The particular design of letters, numbers, and symbols to be used for publication.

  • The degree to which a website's design presents clear pathways for users to navigate the site and uses features and functions that are practical and accessible.

  • How an individual (or user) interacts with a product (like websites) and how that interaction is shaped by the product's design. Sometimes referred to as UX.

  • A sequence of tasks concerning the movement of work through a stage or stages in the preparation of an edition. A practitioner may design and employ a variety of workflows to suit their needs, including cataloging or digitization workflows, a quality control or verification workflow, a publication workflow, and more.
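The glossary's description of a faceted search (filtering results by selecting from a range of attributes, such as place names) can be illustrated with a short Python sketch. This is only a rough illustration under stated assumptions: the records in `DOCUMENTS`, their field names, and the helper functions `facet_counts` and `filter_by_facets` are all hypothetical and do not describe how any particular edition or platform works.

```python
from collections import Counter

# Hypothetical metadata records; the ids, types, and places are invented
# for illustration and do not come from any real collection.
DOCUMENTS = [
    {"id": "doc-1", "type": "letter", "place": "Boston"},
    {"id": "doc-2", "type": "diary", "place": "Boston"},
    {"id": "doc-3", "type": "letter", "place": "Halifax"},
]


def facet_counts(docs, field):
    """Count how many documents share each value of a metadata field."""
    return Counter(doc[field] for doc in docs)


def filter_by_facets(docs, **selected):
    """Keep only the documents that match every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in selected.items())
    ]


# The counts are what a reader would see listed beside each facet value.
print(facet_counts(DOCUMENTS, "place"))

# Selecting two facets at once narrows the results further.
print(filter_by_facets(DOCUMENTS, type="letter", place="Boston"))
```

A filter like this only works when the metadata values are recorded consistently, which is one practical reason to use a controlled vocabulary and authority files.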