As we know, you can break down your documents into categories and, within each category, into types. Letters, diaries, and financial records, for example, are different types of inscribed sources. From here, you can begin considering the structural units of a document type. For example, letters may consist of several structural units, such as salutations, datelines, signatures, and postmarks; weather records may consist of daily, weekly, or monthly readings; and financial ledgers may consist of credits or expenses to an account.
As you break down each document type into its structural units, you can begin considering which of those elements are important for you to represent, especially given the goals of your project. You may also begin thinking about which structural elements should be formatted as faithfully as possible (such as line breaks in poems), which elements could be standardized when represented (such as the placement of a dateline at the beginning of every letter), and which elements could be omitted altogether (such as page breaks in a printed speech).
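One way to keep track of these decisions is to write them down as structured data. The following is a minimal sketch in Python, not a prescribed method: the class names, the three treatment labels, and the treatment assigned to each unit of a letter are all assumptions made for illustration. Your own project goals would determine the real choices.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    """How a structural unit is handled in the edition (hypothetical labels)."""
    FAITHFUL = "reproduce as closely as possible to the source"
    STANDARDIZED = "represent in one consistent form"
    OMITTED = "leave out of the representation"


@dataclass(frozen=True)
class StructuralUnit:
    """One structural unit of a document type and its chosen treatment."""
    name: str
    treatment: Treatment


# Illustrative policy for letters. The units come from the text above;
# the treatment given to each one is an assumption for this example.
LETTER_POLICY = [
    StructuralUnit("salutation", Treatment.FAITHFUL),
    StructuralUnit("dateline", Treatment.STANDARDIZED),
    StructuralUnit("signature", Treatment.FAITHFUL),
    StructuralUnit("postmark", Treatment.OMITTED),
]


def units_with(policy, treatment):
    """Return the names of the units assigned the given treatment."""
    return [unit.name for unit in policy if unit.treatment is treatment]


if __name__ == "__main__":
    # Print the units grouped by treatment, one line per treatment.
    for treatment in Treatment:
        print(f"{treatment.name}: {units_with(LETTER_POLICY, treatment)}")
```

Writing the policy down in one place, whether in code, a spreadsheet, or a prose document, makes it easier to apply the same treatment to every document of that type.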
In this section, we’ll explore a few projects to observe how each represented the structural elements of their documents. Please open the link for each example on your own electronic device (e.g., computer, tablet, etc.).
Examples of Document Types
Now that you have a better understanding of how to identify various document types and interpret the influence they have on the reading experience, let’s apply what you’ve learned to a practice activity.
Course Glossary
- Academic or University Press
A publisher based at or sponsored by a university.
- Access File
The access file is a derivative of the master file, produced by converting the master file to a smaller file format. Access files are suitable for presentation to researchers.
- Accessibility
The condition of source materials being physically available to users and intellectually understandable by users.
- Accession
The process of adding a new item to a collection.
- Annotation
The use of descriptive, contextual, referential, or illustrative content or structure that supports the discoverability and accessibility of source materials. Annotation may take many forms (footnotes, source notes, metadata, glossaries, essays, indexes, keywords, images, maps, and more) and multiple forms of annotation may be used by a project.
- Apparatus
The “supports” of any edition (other than the reading text itself) that are created for the purpose of providing additional clarifying information. Typically, this term is applied to textual and contextual notes, but it can also apply to introductions, headnotes, dictionaries, lists, indexes, and appendices as well as newer, innovative annotation types, such as data visualizations.
- Archive
A collection of textual and non-textual artifacts in physical and/or digital form; records created or received by a person, family, or organization and preserved because of their continuing value. See also the definition provided by the Society of American Archivists: https://dictionary.archivists.org/entry/archives.html.
- Authority File or List
A collection of “authority records” (usually in a database or in a structured data file like XML) with stable, reliable information about places, people, and other kinds of named entities.
- Authority Name
A version of a person’s name that is used every time that individual is referenced, such as in annotation or metadata.
- Back-end/Front-end
The back-end refers to the data (and/or database), site system, and structure underlying a digital project, whereas the front-end refers to the website’s style, appearance, and features (otherwise known as the user interface). Many websites rely on the communication between the back-end data and the web browser for displaying it.
- Bit Depth
The number of bits used to represent each pixel in an image.
- Born-Digital Edition
An edition conceptualized with the goal of online publication, meaning that editorial policies are made considering the digital environment.
- Cataloging
1) The act of creating and maintaining a list that describes the content, structure, and/or administration for each source material within a collection. This can be created for the benefit of document control and/or discoverability of the materials.
2) The act of making an edition discoverable within external infrastructures, such as library catalogs.
- Collection
The act of seeking or identifying the location of source material for the purpose of acquisition, in some form of a print or digital copy, for research purposes.
- Combined Edition
An edition that includes both images and transcriptions. See in contrast to image-based editions and transcribed editions.
- Comprehensive Edition
With respect to a collection of source materials, a comprehensive edition publishes all or nearly all of those materials. See in contrast to a selected edition.
- Controlled Vocabularies
A consistent or standardized way of describing data. For example, one practitioner working with poems may choose to describe the creators of these source materials as "Author," whereas a practitioner working with correspondence may choose to describe the creators of those source materials as "Sender." Regardless of what these practitioners choose, both have used a controlled vocabulary by standardizing how they describe the source material's creator. Practitioners can develop their own controlled vocabulary, use an existing controlled vocabulary, or combine the two. Examples of existing controlled vocabularies can be found at the University of North Carolina Library: https://guides.lib.unc.edu/c.php?g=8749&p=44502.
- Copyediting
Copyediting involves revising the text of your annotations and other apparatus to ensure that your work is clear and readable and that it conforms to conventional rules of grammar. See in contrast to proofreading.
- Copyright Notice
An identifier placed on copies of a work to inform the world of copyright ownership.
- Copyright Term/Duration
The length of time that copyright applies to a work before it passes into the public domain.
- Creative Commons
A free license that provides all creators—from individuals to large institutions—with a standardized way to grant permission for public use of their creative work under copyright law. Learn more at the Creative Commons website: https://creativecommons.org/.
- Critical Apparatus
A particular kind of annotation that records textual notes about the sources of the reading text and, in some cases, information about authoritative readings when multiple versions of a text exist.
- Critical Editing
A reconstructive form of editing that establishes an authoritative reading text based on a critical examination of existing witnesses (i.e., imposing change on a text through correction, emendation, or apparatus). See in contrast to documentary/historical editing.
- Dashboard
The user interface for a content management system's backend.
- Derivative File
A surrogate file created from the original master file.
- Deskewing
The process of rotating an image that has been scanned crookedly.
- Desktop Publishing
The production of printed matter by means of a desktop computer and page layout software that integrates text and graphics.
- Digital Editing
The act of using digital tools in the practice of editing source materials.
- Digital Edition
An edition published—and sometimes also prepared or edited—in a digital or online environment. A digital edition may be created instead of or in addition to a print edition.
- Digitization
The process of creating a high-quality digital copy of your source material.
- Digitization Standards
A set of guidelines that governs the digitization of material and aligns it with industry specifications.
- Diplomatic Transcription
A literal transcription of a document, where all words, including those that were added or deleted, are represented. See in contrast to normalized transcription.
- Directory
A folder containing files.
- Discoverability
The condition of source materials being findable by users, such as through browsing, filtering, or searching.
- Document
A handwritten, printed, or oral type of source material. Documents may include letters, diaries, financial records, invitations, event flyers, newspaper articles, poems, speeches, interviews and more.
- Documentary/Historical Editing
Editing documents (either private or public documents) with the goal of making them accessible and, in some cases, reproducing their content as closely as possible to their original form. This form of editing has often been distinguished from critical editing, which focuses on editing documents with the goal of establishing an authoritative text, yet many practitioners now agree that historical editing incorporates many aspects of critical editing as reflected in decisions about presentation, formatting, and annotation.
- Documentation
The process of writing down the policy decisions that you have made in order to share them with readers and ensure that you apply them consistently.
- Document Control
The application of a system for locating specific documents or groups of documents in your collection. Creating such a system involves defining what metadata needs to be collected for each document, and then consistently and accurately collecting that metadata.
- Document Organization
A structure developed for creating meaningful divisions between documents. Closely related to document control, document organization refers to the framework by which documents are organized, whereas document control refers to the practical application of that framework.
- Editing
The act of gathering, preparing, and presenting source materials in such a way as to increase their accessibility and discoverability. This work may involve a variety of activities, such as collection, selection, digitization, cataloging, versioning, transcription, annotation, and encoding, and may result in a variety of presentations or publication outputs.
- Edition
A print, digital, or hybrid publication resulting from the act of editing source materials.
- Editor
See practitioner.
- Editorial Policies
The decisions practitioners make regarding how to represent source materials in their edition. Practitioners may make decisions related to the selection of materials for publication, the form and focus of transcription, the form(s) of annotation, the elements to be captured in metadata, the processes for quality control, and more. Practitioners may choose to document any or all of these decisions for use in sharing internally and/or for informing users of their edition.
- Emendation
Changing the reading of a text to correct an inaccuracy or to reflect a judgment about an author’s intentions.
- Encoded Edition
An edition prepared using TEI-XML—a descriptive, standardized XML-based language developed and maintained by humanists. More information about the TEI and how to use it can be found at the scholarly, non-profit TEI Consortium: https://tei-c.org/.
- Encoding
The act of using a computing language, such as Markdown or XML, to represent or describe source material. See especially text encoding.
- Endnote
A note or essay that follows the presentation of a document or other source material. It is a form of annotation used for providing information about the source material and/or for creating connections to relevant resources.
- Faceted Searching
A guided search and navigation feature that lets users filter search results by selecting a range of different attributes. For example, a faceted search on place names lets you narrow a collection of documents to those that mention a selected place. See the Wikipedia entry for more information: https://en.wikipedia.org/wiki/Faceted_search.
- Fact Checking
The process of verifying the accuracy of information provided in annotation and citations.
- Fair Use
A legal doctrine in the United States that permits the unlicensed use of copyright-protected works under certain circumstances. There are four factors that guide determination on whether the unlicensed use of a copyright-protected work is permissible or fair. These four factors are outlined at the U.S. Copyright Office Fair Use Index: https://www.copyright.gov/fair-use/.
- File Size
The amount of space a file consumes on a storage medium.
- Finding Aid
A description that provides contextual and structural information about an archival resource.
- Footnote
A note attached to a specific element in an essay or source material (such as a word, sentence, section of an image or recording, etc.). It is a form of annotation used for providing information about that specific element and/or for creating connections to relevant resources.
- Gazetteer
Similar to an authority file (or list), a gazetteer is a database that combines name authority files into a stable, reliable source about places, people, and other kinds of named entities to which websites can connect. Gazetteers are often used in Linked Open Data (LOD) projects, which connect their projects to gazetteers to link them to the “semantic web.” Examples include Wikidata, Geonames, and VIAF.
- Glossary
A curated collection of resources, such as biographies, key terms, images, or other content. It is a form of annotation used to collect any resources that may be referenced frequently and make them available in one easily findable location.
- Headnote
A note or essay that precedes the presentation of source material or a collection of source materials. It is a form of annotation used for introducing or discussing the source material.
- Hypertext
A term coined in the 1960s by Ted Nelson, hypertext is any text shown on a computer screen that can link out to other documents.
- HyperText Markup Language (HTML)
Created by Tim Berners-Lee in the late 1980s, HTML was the first official instantiation of a hypertext data model, and it became the de facto language for web writing and publishing on the World Wide Web.
- Image-Based Edition
An edition that presents images or facsimiles of the source materials. May also be referred to as a facsimile edition. See in contrast to transcribed editions and combined editions.
- Image Resolution
The level of detail portrayed in an image, measured in pixels per inch (PPI) or dots per inch (DPI).
- Index(ing)
While an index refers to a list at the end of a printed book that helps you to find the location of certain references, indexing in digital editions is a means of serializing data so that certain semantic elements (identifiers, people, places, dates) can be processed and accessed.
- Lemma
Usually in printed editions, the lemma signals the place in the reading text to which a note is referring.
- License
An agreement to utilize or reproduce a creative work. May also be referred to as a “permission(s) agreement.”
- Linotype
A typesetting machine that produces each line of type in the form of a solid metal slug.
- Markdown
A lightweight markup language for formatting text in a plain-text editor, typically so that the text can be converted to HTML.
- Master File
The master file is the original file, generally produced through scanning processes that attain a high-level specification. The master file is archived for long-term preservation.
- Mediation
The process by which communications (either verbal or textual) are delivered through a material such as a book or computer.
- Metadata
Essentially, data about data. It can be used to describe the content, physical or structural features, and/or administrative elements of data. In providing such descriptions, metadata supports the management and discoverability of data. See the University of North Carolina Library's definition of metadata for more information: https://guides.lib.unc.edu/metadata/definition.
- Modified Comprehensive Edition
With respect to a collection of source materials, a modified comprehensive edition publishes all materials that fit within a defined category. For example, a practitioner creating a modified comprehensive edition might select all materials from a specific range of years, a certain format of materials (e.g., letters, speeches, oral interviews), or all materials from a specific geographical area. See in contrast to a selected edition and a comprehensive edition.
- Normalized Transcription
A transcription of a document where the substance of the content is retained, but some elements like spelling, punctuation, or contractions are changed with the intention of improving the readability of the text. See in contrast to diplomatic transcription.
- Permissions Letter
A letter seeking to obtain permissions from the copyright holder to use, reproduce, or adapt a creative work.
- Practitioner
Any individual who practices editing or recovery for the purpose of promoting the accessibility and discoverability of source materials; or any individual who engages in the discussion, development, or use of tools or methodologies relating to those practices.
- Proofreading
The act of confirming the presentation of a text, whether transcription or annotation, immediately prior to publication by reviewing and making any necessary revisions. See in contrast to copyediting.
- Provenance
Where documents or data come from, which individuals or repositories have previously owned them, and how we end up accessing them (or how they have changed, through mediation).
- Public Domain
Refers to creative works that are not protected by intellectual property laws, such as copyright, trademark, or patent laws. When a work is released into the public domain, the public, rather than an individual author or artist, owns the work as a collective entity. This means that anyone can use or adapt the work without obtaining permission, but no one can ever own it.
- Public Engagement
The act of inviting the public to substantially contribute to project work. In the practice of editing and recovery, public engagement may include involving the public in the conceptualization of the project, crowd-sourcing transcriptions or annotations, and more.
- Quality Control
The act of reviewing editorially-produced content, like transcription or annotation, for the purpose of ensuring quality or accuracy. Practitioners may use one or more of a variety of processes for the purpose of reviewing their content, including copyediting, fact-checking, proofreading, tandem reading, and more.
- Recovery
The act of focusing research activities like archiving or editing on source materials that have previously been silenced, dismissed, neglected, or ignored in their collection, preservation, organization, description, or presentation.
- Rights Statement
A statement about the intellectual property rights regarding a resource, a legal document giving official permission to use a resource, or a statement about access rights.
- Selected Edition
With respect to a collection of source materials, a selected edition publishes only a subset of those materials. The practitioner decides what subset will be prepared and published within the edition and what materials fit within that subset. See in contrast to a comprehensive edition.
- Selection
The process of deciding which source materials will be included in your publication.
- Semantic Tagging
The act of labeling texts with specific, meaningful categories for machine processing.
- Sociology of Text
The ways that the organization and material forms of a book affect our interpretations and experiences of the text. See D. F. McKenzie’s Bibliography and the Sociology of Texts (1999).
- Source Material
Any handwritten, printed, oral, visual, kinetic, or physical item that a practitioner chooses to work with. Source materials may include diaries, letters, newspapers, poems, audio recordings, video recordings, novels, short stories, artwork, dances, objects, and more.
- Source Note
A note that describes a source material's provenance and/or creation. It is a form of annotation.
- Subjectivity
The quality of being based on or influenced by personal feelings, tastes, or opinions, or societal or historiographical practices and beliefs.
- Takedown Notice
A written statement allowing users to request that an item be removed (e.g., from a public website) due to a possible copyright infringement.
- Text Analysis
The application of digital tools that mine and analyze text in the pursuit of finding new meanings or connections within the text.
- Text Encoding
The act of using the Text Encoding Initiative or TEI—a set of XML guidelines that have been developed to describe humanities texts—to edit source materials through encoding. More information about TEI and how to use it can be found at the TEI Consortium, a scholarly community that maintains the guidelines: https://tei-c.org/.
- Text Technology
The core components of a textual artifact, including the material history of communications among humans and their underlying systems of publication and dissemination.
- Transcribed Edition
An edition that publishes transcriptions of documents. See in contrast to image-based editions and combined editions.
- Transcription
The act of interpreting and adapting source material to create a readable form or representation of it.
- Typeface
The particular design of letters, numbers, and symbols to be used for publication.
- Usability
The degree to which a website's design presents clear pathways for users to navigate the site and uses features and functions that are practical and accessible.
- User Experience
How an individual (or user) interacts with a product (like websites) and how that interaction is shaped by the product's design. Sometimes referred to as UX.
- Workflow
A sequence of tasks concerning the movement of work through a stage or stages in the preparation of an edition. A practitioner may design and employ a variety of workflows to suit their needs, including cataloging or digitization workflows, a quality control or verification workflow, a publication workflow, and more.
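
Several glossary entries above (Bit Depth, Image Resolution, File Size, Master File, and Access File) are connected by simple arithmetic. The sketch below estimates the uncompressed size of a scanned image as pixels multiplied by bits per pixel. The function name, the page dimensions, and the scanning settings are hypothetical values chosen for illustration and are not recommended digitization standards. Real files are usually smaller than this estimate because most formats apply compression.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bit_depth):
    """Estimate uncompressed image size in bytes.

    width_in, height_in: physical size of the scanned item in inches
    ppi: image resolution in pixels per inch
    bit_depth: number of bits used to represent each pixel
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    # Round up to a whole number of bytes (8 bits per byte).
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # Hypothetical master file: 8 x 10 inch page, 300 PPI, 24-bit color.
    master = uncompressed_size_bytes(8, 10, 300, 24)
    # Hypothetical access file: same page, 150 PPI, 8-bit grayscale.
    access = uncompressed_size_bytes(8, 10, 150, 8)
    print(f"master: {master:,} bytes")  # master: 21,600,000 bytes
    print(f"access: {access:,} bytes")  # access: 1,800,000 bytes
```

Under these assumed settings, halving the resolution and reducing the bit depth shrinks the estimate by a factor of twelve. This illustrates why an access file can be much smaller than the master file it was derived from.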