Glossary

In: Direct Speech in Greek and Latin Epic

Editors:

Christopher W. Forstall

and

Berenice Verhelst

Pages: 457–467

DOI: https://doi.org/10.1163/9789004750227_021

Accuracy scores	the average degree of accuracy of an automated process of analysis, as measured based on verified sample data. See also, “recall and precision”
Ancient Greek and Latin Dependency Treebank (AGLDT)	first and largest dependency treebank (see also, “treebank”) for Greek and Latin Texts, including, e.g., the full text of Homer’s Iliad and Odyssey. Hosted and maintained by Perseus Digital Library. https://perseusdl.github.io/treebank_data/
Apostrophe	the address of a person not physically present, as for example when a narrator addresses a character in a poem as a form of metalepsis
Application Programming Interface (API)	interface of a computer program or database oriented not towards the human user, but towards other computer programs. It contains the basic information and definitions needed for communication between computer programs
Authorship attribution	branch of stylometry concerned with matching anonymous or pseudonymous texts to their true authors based on similarity of textual features
Bar chart	a graphical representation for categorical data, in which values are shown by the extent of vertical or horizontal bars
Betweenness centrality	quantitative measure for network graphs, representing a given node’s importance in linking other nodes. If one imagines every possible pair of nodes within the graph as connected by the shortest possible route, the number of such routes that pass through a particular node can be used to quantify this node’s centrality as a connector (or “bridge”) between parts of the graph
Box-and-whisker plot (also box plot)	a graphical representation of the spread of values in a data set. A collection of values is represented by a rectangular shape divided into two by a horizontal line (the “box”); above and below the box are vertical lines (the “whiskers”). The box represents the values from the 25th to the 75th percentile (i.e., the middle 50 % of all the data), the horizontal line dividing the box into two is the median (50th percentile). The whiskers can be defined in various ways. In the chapters of Berlincourt (chapter 13) and Verhelst and Forstall (chapter 18) in this volume, the authors use a conventional setting defining the whiskers as extending to 1.5 times the interquartile range (distance between 1st and 3rd quartile) beyond the box. Values beyond that range, the “outliers”, are represented as dots.
Bridge	in network graphs, bridges are connections between nodes that otherwise belong to different communities (see also, “network graphs”)
Canonical Text Services (CTS)	protocol for identifying and retrieving passages of literary text. Using a canonical reference system, this protocol allows for connecting databases, digital libraries, and more for automated queries (see also, “linked data”)
Chicago Homer	multilingual database and reading environment that makes the distinctive features of Early Greek epic accessible to readers with and without Greek. https://homer.library.northwestern.edu/
The Classical Language Toolkit (CLTK)	a Python library offering natural language processing (NLP) tools for the languages of pre-modern Eurasia. CLTK also hosts several text corpora, collected from various open-source digital libraries, both for Latin and Greek texts. http://cltk.org/
Classification	the process of assigning a given sample to one of a set of predetermined classes based on a set of features. For example, a sample consisting of epic language might be classified as “male” or “female”, or as “speech” or “narrator text”, according to features such as the frequency of certain words or grammatical forms. In computational studies, a “classifier” commonly refers to a machine learning model trained on pre-classified data, which can be used to predict to which class new samples belong
Cognitive Poetics	a school of literary criticism in which literature is approached through the lens of cognition (the mental processes of reading, understanding and remembering), drawing on insights from the fields of psychology and cognitive linguistics
Collective speech	speech represented as spoken by multiple speakers, often a crowd. While in some cases multiple speakers may realistically say the exact same thing simultaneously (e.g. a crowd shouting their assent or dissent), most often collective speeches represent the general sentiment of what multiple individuals within a crowd roughly say to one another (one speech representing many)
Community	in network graphs communities are defined as a subset of nodes, densely connected to each other and loosely connected to nodes in other communities in the same graph (see also, “network graphs”)
Corpus Linguistics	an empirical method for the study of language phenomena based on large text corpora which are analyzed with computational and statistical methods
Daphne treebanks	collection of Ancient Greek dependency treebanks (see also, “treebank”) curated by F. Mambrini. https://github.com/francescomambrini/Daphne
Delta measure	Also known as “Burrows’ Delta”. Stylometric tool for authorship attribution designed to measure the similarities between texts in a corpus by calculating the distance between them in a multidimensional vector space based on word frequencies
Diagnostic feature set	set of features used to predict whether an entity is likely to belong to a specific category. For example, a specific set of linguistic or lexical features may be used to predict whether a given passage is speech or narrative. See also, “classification”
Direct Speech in Greek Epic Poetry (DSGEP)	companion website and digital appendix to Verhelst’s 2017 book Direct Speech in Nonnus’ Dionysiaca, with data on all direct speech in Homer, Apollonius, Quintus and Nonnus’ Dionysiaca. The data presented in this database have now been integrated in the new DICES database. https://www.dsgep.ugent.be/
Dirichlet prior distributions	statistical method for calculating probabilities
Edge	link between nodes in a network graph (see also, “network graph”)
Eidolopoiia	rhetorical term for a speech by a deceased person
Embedded speech	direct speech embedded in direct speech. The speaker of the embedding speech acts as a secondary (or tertiary) narrator
Epithalamium	poetical and/or rhetorical genre comprising poems or speeches to be performed at a wedding
Ethopoiiai	rhetorical exercises in which the student must compose a fitting speech for a given character or character type in a given situation. Ethopoiiai were among the standard exercises or progymnasmata for young rhetoricians in an early phase of their training
F1-score	widely used metric to evaluate the accuracy of automated searches. The F1-score combines precision and recall (see also, “recall and precision”) into a single metric by giving both equal weight (harmonic mean).
Face	sociological term for the social image a person creates and maintains for themself in interaction with others
False negatives	see “recall and precision”
False positives	see “recall and precision”
Function words	words whose primary role is grammatical rather than lexical, such as articles, conjunctions, prepositions, pronouns and particles. Examples in Greek include καί and γάρ; in Latin, cum and et. Function words tend to make up a large proportion of the most frequent words in a text
General form of address	a form of address in the vocative which does not identify the addressee by name or by means of patronymics or ethnics. Instead, kinship and age terms are used (“father”), titles (“king”), terms of affection and esteem (“my dear”), insults (“dog”), or collective addresses (“friends”). Cf. “name-vocative”.
Gephi	open-source software for creating (network) graphs. https://gephi.org/
Gini importance	statistical measure evaluating the relative importance of a specific feature in a multi-feature classification experiment. It measures how much each feature contributes to reducing uncertainty (or “impurity”) in the predictions of the classifier (see also, “classification”)
Ground-truth dataset	a thoroughly verified dataset which serves as a benchmark for evaluating the results of an automated process.
Heatmap	a diagram where the color hue of each square represents the data values, e.g. to indicate the intensity of a relation between datapoints as in the chapters of Mambrini and Schirner (resp. indicating the amount of shared vocabulary and the frequency of co-occurring emotions)
Hypotaxis	syntactic subordination; especially the phenomenon of multiple clauses which are subordinated to one another in a complex nested structure. Subordinate clauses can, for example, be relative clauses or clauses introduced by subordinating conjunctions
Hypothetical speech	speech that is represented as hypothetical, potential or counterfactual, and not actually uttered by any character. Examples include speculating about an absent character’s reaction (X would have said Y) or predicting one’s own future reactions (when X happens, I will say Y). A well-known category of hypothetical speech is the so-called potential τις speech, a hypothetical reaction by an anonymous character. See also, “τις speech”
Idiolect	an individual person or character’s unique use of language
*If-not* situation	recurring narrative feature in Homeric (and later) epic, describing the hypothetical outcome of an action not taken: “then X would have happened, if Y had not intervened”
Interquartile range (IQR)	see “box-and-whisker plot”
Large language model (LLM)	computational model, trained on vast amounts of texts, designed for natural language processing tasks such as generating “new” texts. ChatGPT is a well-known general-purpose example, but LLMs can also perform more specialized tasks such as lemmatization (see also, “lemmatized text”)
LatinCy	An open language model for Latin. See “spaCy”
Lemmatized text	a lemmatized text is a text in which every word (or “token”) is linked to a corresponding lemma (dictionary headword). In Latin and Greek, for example, lemmata are often more useful than the original inflected forms for tasks such as calculating frequencies or detecting repetition
Lemma frequency	the sum frequency of all inflected forms of a given lemma (dictionary headword) within a text or passage
Lexeme	the basic unit of meaning in a language
Library of Latin Texts (LLT)	a digital library of Latin literature, hosted by Brepols Publishers, with advanced search functions and built-in analytical tools. The LLT is not open access but part of the paywalled section of the larger platform of Brepolis Databases. http://clt.brepolis.net/llta/pages/QuickSearch.aspx
Linear regression	mathematical model proposing a linear correlation between two measured variables. As one variable changes, the other is expected to change proportionally. The expected relationship between the two variables can be plotted as a straight line on a graph. The distance between this linear fit and actual individual observations represents variation not explained by the model
Linked Open Data (LOD)	machine-readable structured data designed for sharing online and linking to other datasets and released under an open license. Interoperability can be facilitated by means of non-proprietary data formats and a shared system of naming and request conventions, for example, those defined for classical text passages by the CTS protocol (see also, “Canonical Text Services”)
Logarithmic scale	method to visually represent numerical data spanning a broad range of values. Whereas on a linear scale each unit of distance corresponds to the same increment (e.g. 1, 2, 3, 4, …), on a logarithmic scale each unit of distance corresponds to multiplication by the base value (e.g. 1, 10, 100, 1000, …)
Log odds	a logarithmic representation of probability. For example, while the probability of a given word within a sample of text must fall within the range 0 to 1, the log odds ratio is scaled to the range -∞ to ∞, so that very uncommon words are represented by large negative numbers, and common words by large positive numbers. Weighted log odds applies a further adjustment to account for differences in the sizes of samples
MANTO	MANTO is an authoritative database of Greek myth, providing open access to metadata on mythological characters, places and source texts, as well as numerous types of relationships between the different entities. https://www.manto-myth.org/
Median	in calculating averages for a set of values, the median is the value representing the middle of the group: 50 % of all values are higher, 50 % are lower. The median provides a popular alternative to the mean (sum of all values divided by the number of values) and the mode (value that occurs most often)
Mertens-Pack³ (MP³)	Papyrological database. http://www.cedopalmp3.uliege.be/
Modularity	In network graphs modularity calculations are used as a measure for establishing “community” structures in larger and more complex graphs (see also, “community”). When calculating the modularity measure, the number of connections (“edges”) within a community are compared to the number of connections in an equivalent randomized network (see also, “network graph”)
Morphological tagging	the process, frequently performed by an NLP model, of providing a morphological analysis for every word in a text, so that in the resulting data the morphological features (such as case, number, tense, …) appear as annotations (tags) for each word
Name-vocatives	a form of address in the vocative, consisting of the name of the addressee. This category also includes patronymics and ethnics. Cf. “General form of address”.
Narrative level	the narrative level indicates whether a speech (or any other narrative feature) occurs at the level of the primary narrator, or embedded in language attributed to a secondary or tertiary narrator. In Homer’s Odyssey, the words of the narrator are level 0; when the narrator quotes the direct speech of Odysseus, this is level 1; when, in recounting his adventures, Odysseus himself quotes the words of the Cyclops, this is level 2
Natural Language Processing (NLP)	NLP is a subfield of computer science focusing on the computational analysis of human language. Examples of NLP include automated lemmatization and morphological tagging of Greek and Latin. NLP generally relies on machine learning to extract computational models of linguistic patterns from large text corpora. NLP tools offer possibilities for large scale automated text analysis and manipulation
N-gram	a sequence of a given number (n) of adjacent items in a particular order. For text-based computational analysis, these items can be (lemmatized) words, letters, etc. Depending on the value of n, n-grams can be used, for example, to detect longer or shorter units of repetition within a text
Network graph	graphical visualization of the relations or interconnections between a set of entities. Each entity is represented by a node. The connections between the entities are represented by edges
Node	see “network graph”
Noise	irrelevant examples that show up in an automated search, also “false positives”
The Oath in Archaic and Classical Greece Database	database recording and annotating Greek oaths until 322 BC across all genres and text types, including epigraphical evidence. https://www.nottingham.ac.uk/~brzoaths/database/
OdyCy	an open language model for Ancient Greek (see “spaCy”)
Open Greek and Latin	open-source resource featuring a large set of digital texts in Latin and Greek, reading tools and software. http://opengreekandlatin.org
Optical Character Recognition (OCR)	the automatic conversion of images of printed text (for example, scanned images of public domain books) into machine-readable text data
Oratio obliqua	indirect speech
Oratio recta	direct speech
Outlier	statistical term for a datapoint that differs significantly from the large majority of datapoints. It can be worthwhile investigating outliers because they may point at interesting, exceptional examples. On the other hand, outliers are often excluded from further statistical calculations because their exceptional features would distort the results. See also, “box-and-whisker plot”
Pars epica	a narrative part of a primarily hymnic or epideictic poem
Parsing	a parser uses an NLP model to extract linguistic features from text, typically providing information for each word (or token), such as the lemma (dictionary headword), the part of speech (POS), number, case, mood, etc
Part of speech (POS)	The intrinsic grammatical type of a word, such as noun, adjective, or verb. One of the tasks typically performed by a linguistic parser (see also, “parsing”) is annotating all tokens with POS tags
Passim	open-source software for automatically detecting repeated sequences within texts. https://github.com/dasmiq/passim
Permutation importance	statistical measure evaluating the relative importance of a specific feature in a multi-feature classification experiment. It measures how much the performance of a classifier worsens when the values of a given feature are rearranged at random. See also, “classification”.
Perseus Digital Library	open-source digital library containing, among many other things, an extensive collection of Greek and Latin literary texts. https://scaife.perseus.org/
Principal Component Analysis (PCA)	statistical method used to represent and analyze multi-variable data. PCA transforms and reorders multi-dimensional data to highlight the most salient dimensions of variance. For example, a collection of samples originally characterized by hundreds of individual word frequencies can be reduced to a two-dimensional visualization while retaining as much meaningful information as possible
Progymnasmata	Set of standard writing exercises practiced in antiquity by young rhetoricians in an early phase of their training. These exercises are described extensively in several Greek rhetorical handbooks of the first centuries AD. For an example, see “ethopoiiai”
Python	general-purpose computer programming language. The DICES client, spaCy, CLTK and various other digital resources used in this volume can be controlled using Python. The use of a common language makes it easier to automate complex tasks in which several tools must be used in combination
R	general purpose programming language popular for statistics and digital humanities
Recall and precision	when evaluating the accuracy of a classification experiment based on automated analysis, recall and precision are calculated based on the number of true positives (correctly retrieved as belonging to a target group), false positives (wrongly retrieved as belonging to a target group), true negatives (correctly rejected as not belonging to a target group) and false negatives (wrongly rejected as not belonging to a target group). The recall score gives the ratio of true positives in relation to the total number of relevant items. The precision score gives the ratio of the correctly identified items in relation to the total number of retrieved items
Reported speech	depending on the theoretical framework that is being used, this term has multiple meanings. 1. Speech within speech. Throughout the volume we prefer the term embedded speech for direct reported speech (see also, “embedded speech”). The term reported speech can also refer to speech quoted by a character in oratio obliqua (indirect speech) and is used in this way in the chapter of Cesca and Romanello. Sometimes a further distinction is made between (actually) reported speech (“what a character has said”) and hypothetical speech (“what a character might say/might have said”). 2. A mere mention of a speech (see also, “speech mention”), without it being quoted directly or paraphrased. It is used in this sense in the chapters of Oughton and Burns
Responsibility Exchange Theory (RET)	social psychology theory, proposed by Shereen Chaudhry and George Loewenstein, analyzing social patterns of conduct in assuming and attributing responsibility in collaborative situations
Rolling windows	A method for analyzing continuous data by aggregating overlapping samples (“windows”) of fixed size. For example, an epic poem may be divided into overlapping segments of five lines: the first sample comprises lines 1–5, the second, 2–6, and so on. This method produces a continuous rather than quantized metric and avoids accidentally overlooking features that occur at the boundaries of non-overlapping samples. This method is used in the chapters of Burns and Forstall and Verhelst.
Scatter plot	a representation comparing two variables (x-axis and y-axis) for a large number of data points, which are represented as dots
Secondary narrator	an embedded story (often in the form of direct speech) is told by a character who acts as secondary narrator and may include direct speech in his story (see also, “embedded speech”)
SpaCy	a general-purpose natural language processing package for Python, which must be used in conjunction with third-party, pre-trained language models specific to a given language and task. For Latin NLP, several chapters in this volume use the latinCy spaCy model trained by Burns et al.; for Greek NLP, the spaCy model used in this volume is odyCy by Kostkan and Kardos
Speech act	an act performed through speaking. Speech act theory distinguishes different actions performed through speech, e.g. to blame, to apologize. In this volume and the DICES database, the standard unit is not that of one speech act, but that of one direct speech (from the opening to the closing of quotation marks). One such speech can consist of multiple speech acts (shifting e.g. from a complaint to a request). Conversely, a single speech act can be represented by the narrator partly with direct speech, partly with indirect speech. For an approach to epic speech using speech act theory, see the chapters by Minchin and Beck
Speech cluster	a conversational cluster consisting of multiple speeches, most often in the form of a dialogue
Speech conclusion	language used by a narrator to conclude a direct speech and move back to a narrative mode (e.g. “This is what he said in tears. Then …”). Alternatively called “speech capping”
Speech framing language	Umbrella term for both speech conclusions and speech introductions
Speech introduction	language used by a narrator to introduce direct speech (alternatively called “attributive discourse”), usually identifying speaker and addressee, often also hinting at the intentions of the speaker
Speech mention	mere mention of a speech without its being quoted directly or paraphrased (also “reported speech”)
Speech Presentation in Homeric epic	companion website and database for the 2012 book by Deborah Beck with the same title, containing information about as well as the full text of all instances of speech representation in Homer, including indirect forms of speech representation. https://homeric-speech-beck.la.utexas.edu/
Speech type	while various classification systems for speeches co-exist and are referred to throughout the volume, “speech type” is used primarily to refer to a content-based typology of (epic) speeches such as (battle) exhortations, prayers or laments. The DICES database includes tags tentatively indicating applicable speech type classifications for all speeches in the corpus
Stacked column chart	graphical representation for categorical data. For each category, a number of different values is being compared, which are represented stacked onto each other as the building blocks of a tall column. The columns can show either relative or absolute values. In the example in the chapter of Berlincourt, a full column represents 100 % of all speeches occurring in an epic
Stylometry	the study of quantifiable aspects of style
Tesserae	web-based tool for detecting allusions in Greek and Latin literature on the basis of shared vocabulary (similar in functionality to “Passim”). Secondarily, Tesserae is also the source for one of the text corpora available through CLTK. https://tesserae.caset.buffalo.edu/
Theory of Mind (ToM)	psychological concept referring to the human ability to understand something of our own mental states, and the capacity to form intuitions about the intentions of others
Thesaurus Linguae Graecae (TLG)	digital library of Greek literature, hosted by the University of California, with advanced search functions and built-in analytical tools. Only part of the extensive TLG corpus is open access. The larger part is on the paywalled section of the site. https://stephanus.tlg.uci.edu/index.php
Tidylo	R library used to calculate weighted log odds. https://github.com/juliasilge/tidylo
τις speech	conventional term for a speech in Greek epic by an anonymous character (τις, “someone”), usually as a representative for a larger group. Further distinctions are made between real and hypothetical τις speeches. See also, “hypothetical speech”
Tokenized text	Tokenization is the process, typically performed by an NLP parser, of segmenting a text into a sequence of “tokens” for analysis. Tokens are often equivalent to words, but may also include numerals, punctuation marks, or combining forms such as the Latin enclitic ‑que, depending on the parser
Treebank	dataset consisting of syntactically annotated texts. Dependency treebanks, for example, record the dependency relations among the words of each sentence in a machine-readable, tree-like structure. While some language models can generate automatic syntactic annotations, treebanks are usually meant to be authoritative and are hand-corrected by human readers to ensure accuracy
Trismegistos (TM)	database for Greco-Roman epigraphical and papyrological materials. https://www.trismegistos.org
True negatives	see “recall and precision”
True positives	see “recall and precision”
Unseen data	data that was not part of the dataset used to build a certain metric
Violin plot	graphical visualization for comparing data distributions. The width of the curved lines corresponds with the density of datapoints in each region
Wikidata	open knowledge base storing structured data about the world. Part of the Wikimedia movement, it both supports and derives content from Wikipedia; like Wikipedia it is editable by the public. Wikidata stores sometimes extensive biographical details for authors as well as for historical and mythical epic characters, including known family relationships and alternative names. https://www.wikidata.org/
Z-score	standardized way of representing how much datapoints differ from the mean, e.g. allowing for a comparison of frequencies of word use in corpora of different sizes
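
Several of the quantitative terms defined above ("recall and precision", "F1-score", "rolling windows", "N-gram", "Z-score", "box-and-whisker plot") can be illustrated with a few lines of Python. The following is a minimal sketch only, not code taken from the volume or from any of the tools it describes. All function names are hypothetical. Two choices are assumptions made for the illustration: the z-score uses the population standard deviation, and the quartiles follow the default convention of Python's statistics.quantiles (other conventions give slightly different whisker limits).

```python
from statistics import mean, pstdev, quantiles


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from counts of true positives,
    false positives and false negatives."""
    precision = tp / (tp + fp)  # correct items / all retrieved items
    recall = tp / (tp + fn)     # correct items / all relevant items
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def rolling_windows(items, size):
    """Overlapping samples of fixed size: items 1-5, then 2-6, and so on."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]


def ngrams(tokens, n):
    """Every sequence of n adjacent tokens, in their original order."""
    return [tuple(window) for window in rolling_windows(tokens, n)]


def z_scores(values):
    """Distance of each value from the mean, in standard deviations
    (assumption: population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def whisker_limits(values):
    """Limits lying 1.5 times the interquartile range beyond the box;
    values outside these limits would be drawn as outliers."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

For example, rolling_windows(list(range(1, 8)), 5) divides seven line numbers into three overlapping five-line samples, and ngrams(["arma", "virumque", "cano"], 2) yields the two bigrams of that three-word sequence.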

Direct Speech in Greek and Latin Epic

Expanding the Methods and Canon

Series: The Language of Classical Literature, Volume: 43

E-Book ISBN: 9789004750227

Publisher: Brill

Print Publication Date: 03 Dec 2025

Subjects
- Classical Studies
  - Greek & Latin Literature
- Literature and Cultural Studies
  - Literary Theory
