Computational valency lexica and Homeric formularity

Barbara McGillivray; Martina Astrid Rodda

doi:10.1163/15699846-02402003

In: Journal of Greek Linguistics

Authors:

Barbara McGillivray, King’s College London, Department of Digital Humanities, London, UK
https://orcid.org/0000-0003-3426-8200

Martina Astrid Rodda, Merton College, Oxford, UK
https://orcid.org/0000-0002-5214-8037

Online Publication Date: 14 Nov 2024

Abstract

The wider availability of large-scale datasets and reproducible algorithms has boosted the application of NLP to living languages. On the other hand, dead languages benefit from the availability of curated resources both to offset the sparseness of available data and to make data accessible to researchers. We present here AGVaLex, a computational valency lexicon automatically extracted from the Ancient Greek Dependency Treebank. It contains quantitative corpus-driven morphological, syntactic and lexical information about verbs and their direct and indirect arguments and has a wide range of applications for the study of Ancient Greek. To illustrate these applications, we offer a case study that compares the semantic flexibility of transitive verb formulae in archaic Greek epic to a non-formulaic corpus, with the goal of detecting unique patterns of variation. We also illustrate the possibilities afforded by AGVaLex to scholars with a less extensive background in computational corpus-based research.

Keywords: Ancient Greek; verb valency; computational lexicon; formula studies; semantic flexibility; Distributional Semantics

1 Verbal valency and valency lexicons

The concept of verbal valency has received much attention in different linguistic traditions (cf., e.g., the overview in Zanchi 2018). Introduced by Lucien Tesnière in the context of Dependency Grammar in 1959, the term valency refers to the extent to which verbs determine the configuration of a consistent and predictable number of participants, which Tesnière referred to as actants. Actants are contrasted with circumstants, which are free modifiers of verbs. Actants are commonly termed arguments, a term that encompasses semantic and syntactic roles. The English ditransitive verb give, for example, requires three arguments: one expressing the person giving, one expressing the object given, and one expressing the recipient. If we know that an English sentence contains an active form of the verb give, we can expect to find its three arguments realized in the sentence, as we can see in (1) or (2). The transitive verb print, on the other hand, only requires two arguments (the person or object printing, and the object being printed), so we can expect to see two arguments if a sentence contains the verb print in its active form, as in (3). An adjunct like yesterday can occur with most verbs, and its presence cannot be predicted from the presence of a verb like give or print alone (see 1 and 3).

(1) I gave you the phone yesterday.

(2) He gave the receipts to the customers.

(3) They printed the paper yesterday.

Different linguistic subfields have referred to concepts related to valency with different terms, each of which has a slightly different scope. While ‘valency’ was first used in linguistics in the context of Dependency Grammar by Tesnière, subcategorization was introduced with phrase structure grammars in the generative linguistics tradition, and it has been widely adopted in Natural Language Processing research. Following McGillivray (2014: 31 ff.), we adopt here an operational definition of valency, based on corpus and distributional methods, and take a theory-agnostic view on this topic. Further, we describe AGVaLex, a corpus-driven valency lexicon for ancient Greek, and illustrate its value for historical linguistics scholarship through a case study on Homeric formulae. The lexicon was created automatically from the dependency syntax annotation of the Ancient Greek Dependency Treebank 2.0 (https://github.com/PerseusDL/treebank_data; Celano 2019) by adapting existing database queries written for Latin and described in McGillivray (2014: 31–60) and McGillivray & Vatri (2015).

Our focus on ancient languages, and particularly on ancient Greek, a ‘large-corpus language’ (Mayrhofer 1980; Untermann 1983), offers us the opportunity to test the effectiveness of corpus methods on a language for which no native speakers are available. This has received an increasing level of attention in recent years, in conjunction with the development and analysis of large-scale annotated corpora for corpus languages (see for example McGillivray 2014 for an overview on Latin).

The lexicon has a number of advantages and considerable reuse potential. Thanks to its automatic creation procedure, the lexicon can be regenerated if new annotated data become available or if the annotation is corrected, which enhances its potential applications in future research. Moreover, unlike traditional dictionaries and handmade valency lexicons, computational valency lexicons like the one we present here provide a quantitative and systematic account of the valency properties of verbs reflecting the corpus they are extracted from. They can tell us whether a verb, for example, is found with a particular argument pattern, and how many times this occurs in the corpus. They can also give us information about the distribution of these patterns across authors, genres and works, and whether there is a change over time. The lexicon provides information about the number and type of arguments of all verbs occurring in the treebank. Its entries are equipped with morpho-syntactic information, namely the case of nouns and the mood of verbs, the gender and number of nouns and adjectives, and the voice of verbs. This information is highly valuable to investigate a range of linguistic questions, from the analysis of word order patterns to the study of the constructions that occur with specific verbs. The lexicon’s valency patterns also display the lemmas of the arguments, which allows for lexical-semantic studies, for example to investigate the semantic fields of the subjects or objects of verbs and how they vary by author or work. At the same time, because of the automatic extraction procedure it was built with, the lexicon is bound to inherit any annotation errors that were present in the treebank.

1.1 Applications to Ancient Greek

Valency lexicons are an extremely useful tool in linguistic research on verbal valency patterns. For Ancient Greek, an important recent contribution on the topic is Keersmaekers (2020), who analyses language variation in a corpus of Greek papyri, focusing on complementation and the role of tense, aspect, and modality in verbal complements. This study, however, also showcases how much corpus pre-processing is necessary for this kind of scholarship; AGVaLex will offer an important tool for researchers wishing to conduct research on similar topics without collating their own corpus.

To illustrate this application, we propose a case study taken from Rodda (2021), on linguistic variation in archaic Greek epic poetry. The language of early Greek epic relies extensively on formulae, repeated constructions with limited syntactic and semantic flexibility; generations of researchers have investigated the precise extent of this flexibility and how it relates to issues of oral performance and language change (Rodda 2021 provides an extended bibliography; see particularly Hainsworth 1968 for an important example of this approach, and Friedrich 2019 for some criticism). Our study shows how the application of a well-developed pre-existing resource such as AGVaLex allows for new approaches to this crucial question in Homeric studies.

2 Previous work

Dictionaries typically display some information about verbal valency in their lexical entries. This is usually in the form of the grammatical case of arguments and the prepositions introducing the arguments themselves. For example, the dictionary entry for the verb αἱρέω ‘to take’ in the Brill Dictionary of Ancient Greek (Montanari 2015; from here on GE, i.e. Greek-English) reports that the verb occurs ‘with acc. […], with two acc. […], τινά τινος [i.e., with the accusative and genitive] […], with inf. […], with ptc.’, providing examples and translations for each construction. This information is normally followed by references and examples from texts illustrating the constructions. The number of examples shown is not proportional to the frequency of the constructions, and in many cases more uncommon constructions are given disproportionally more space in the entry. This is confirmed by the introduction to the Thesaurus Linguae Latinae (Bayerische Akademie der Wissenschaften 2002, hereafter TLL),^¹ for instance, and is common practice in other lexicographic resources.

Over time, dedicated valency lexicons have been created for specific languages. For example, Happ (1976) presents the only hand-made valency lexicon for Latin, which was derived from a manual analysis of 800 verbal occurrences in Cicero’s Orationes. Such resources offer high-quality information derived from a detailed manual analysis and are therefore very reliable. However, they suffer from the lack of completeness which we observed earlier, and which affects other handmade resources like traditional dictionaries.

Several large textual resources of Ancient Greek are available today, including full-text databases such as TLG (Thesaurus Linguae Graecae) and the Perseus Digital Library (Bamman & Crane 2011), manually annotated corpora such as the Ancient Greek Dependency Treebank (AGDT 2.0) and PROIEL (Pragmatic Resources of Old Indo-European Languages, Haug & Jøhndal 2008),^² and automatically annotated corpora such as the Diorisis Ancient Greek Corpus (Vatri & McGillivray 2018). Of the resources in this list, PROIEL and the AGDT 2.0 are syntactically annotated. AGDT 2.0 follows the Dependency Grammar annotation model of the Prague Dependency Treebank for Czech (PDT 3.0; Hajič et al. 1999), which was created in the tradition of the Functional Generative Description (Sgall et al. 1986). AGDT 2.0 follows a predicate-centric approach where each word corresponds to a node in the treebank. The texts are annotated in separate yet interconnected layers. The analytical layer, focusing on syntax, comprises dependency syntactic trees, is built on the morphological layer, and serves as the foundation for the tectogrammatical layer, where semantic information such as semantic role labeling, information structure, and annotations for anaphora/ellipsis resolution is included.

The increased availability of such large syntactically annotated corpora has made it possible to conduct large-scale analyses of historical languages (Haug 2015; Eckhoff et al. 2018; Biagetti et al. 2021), and in particular to develop methods for extracting valency information automatically, thus supporting the creation of corpus-driven computational resources aiming at a systematic account of the valency behaviour of the verbs in the corpora. Typically, such resources are drawn from corpora provided with morpho-syntactic annotation. As the annotation marks predicates and their arguments, it is then possible to automatically identify them and extract them in the form of a table or database. For an overview of computational valency lexicons and a discussion of valency vs subcategorization for Latin, see McGillivray (2014: 31 ff.), and for a description of such a lexicon for Latin see McGillivray et al. (2009), McGillivray & Passarotti (2012), and Passarotti et al. (2016). Computational valency lexicons have several advantages over their manual counterparts. Because they directly rely on corpus data, they can easily show quantitative information such as the frequencies of each pattern for each verb, and link those back to the original corpus occurrences. They can also be easily expanded as the corpora they are based on grow, because they have been created programmatically.

Only one corpus-based valency lexicon is currently available for Ancient Greek, as far as we are aware: HoDeL, the Homeric Dependency Lexicon (Zanchi et al. 2018; Zanchi 2021), a project run at the University of Pavia.^³ HoDeL was automatically extracted from the syntactically annotated portion of the AGDT containing the Homeric poems. As explained in its guidelines,^⁴ for every verb in the Homeric poems, HoDeL includes its arguments, i.e. those dependents that are tagged as subjects, objects, object complements, and predicate nominals. In the guidelines, the authors point out that AGDT does not contain referential null arguments and that there are several consistency issues that affect the annotation of the treebank. Therefore, HoDeL has been manually edited to correct some annotation errors in the corpus, particularly on lemmatisation.

The online tool Myria (https://relicta.org/pedalion/myria/), developed and maintained by Toon Van Hal and Alek Keersmaekers, is described as ‘a treebank-based vocabulary tool’. It is part of the Pedalion project (Keersmaekers et al. 2019), based at KU Leuven, which aims to improve the automatic syntactic analysis of Ancient Greek. Myria can display information about ca. 6000 Greek words occurring at least 50 times in a corpus of literary texts (8th century BC to 1st century AD) from the Perseus and First One Thousand Years of Greek projects. For each of these words, ‘collostructions’ are displayed: while the concept of collostruction is established in Construction Grammar as a framework exploring the interface between lexis and grammar (see e.g. Stefanowitsch 2013), it is not clear whether Myria uses the same definition.^⁵ The information in Myria is self-avowedly incomplete and being continuously updated.

3 The lexicon

AGVaLex was created from the Ancient Greek Dependency Treebank (AGDT 2.0; Celano 2019). AGDT 2.0 contains 557,922 tokens from the works listed in Table 1. AGVaLex is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License and is available on Figshare (McGillivray 2021).

Table 1

List of authors and works included in the AGDT 2.0 treebank and in AGVaLex

Author	Title
Aeschylus	Agamemnon, Eumenides, Libation Bearers, Persians, Prometheus Bound, Seven Against Thebes, Suppliant Women
Aesop	Aesop’s Fables 1.1–1.50
Athenaeus of Naucratis	Deipnosophistae
Diodorus Siculus	Bibliotheca Historica
Herodotus	Histories
Hesiod	Shield of Heracles, Theogony, Works and Days
Homer	Iliad, Odyssey
Lysias	Against Alcibiades 1 and 2, Against Pancleon, On the Murder of Eratosthenes
Plato	Euthyphro
Plutarch	Alcibiades, Lycurgus
Polybius	Histories
Pseudo Apollodorus	Bibliotheca
Pseudo Homer	Hymn to Demeter
Sophocles	Ajax, Antigone, Electra, Oedipus Tyrannus, Trachinae
Thucydides	History of the Peloponnesian War

The XML files of the so-called analytical layer of annotation of the treebank contain dependency-based syntactic trees. The treebank files were first converted into a tab-separated format via a Perl script, and then imported into a MySQL database; a series of MySQL query scripts then produced several database tables making up the lexicon. The scripts were adapted from the work done to create the Latin Dependency Treebank valency lexicon described in McGillivray (2014: 31–60). Specifically, we extracted all dependents of verbal forms labelled as ‘SBJ’ (subjects), ‘OCOMP’ (object complements), ‘PNOM’ (predicate nominals) and ‘OBJ’, which includes all other arguments, i.e. nouns and pronouns in the accusative, dative and genitive cases, prepositional phrases, infinitive verbs, and subordinate clauses that can function as verbal objects such as accusative + infinitive constructions. We excluded dependents labelled as ADV (adverbials), ATR (modifiers), and ATV/ATVV (non-governed complements, such as predicative noun phrases), as these are not considered part of the verbal valency. The lexicon contains both dependents which are direct children and indirect children of verbal forms via preposition (AUXP), conjunction (AUXC), coordination (COORD) and apposing (APOS) nodes. The specific handling of recursive relations such as those between predicates and their indirect children made the extraction of the frames non-trivial. It is important to note that, because it was extracted from an annotated corpus and because of the annotation of AGDT 2.0, the lexicon does not include referential null arguments, i.e. those arguments that are identifiable in the context but not lexically realised, and which were employed in ancient Indo-European languages (Luraghi 2003, Keydana & Luraghi 2012, Haug 2012, Sausa & Zanchi 2015).^⁶ This means that a valency frame with zero arguments in principle could indicate either an instance of an impersonal verb or an instance of an intransitive verb with a null subject.
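The traversal described above, with arguments reached either directly or through preposition, conjunction, coordination and apposition nodes, can be sketched in Python. This is only an illustration and not the Perl and MySQL pipeline used to build AGVaLex: the toy sentence, the function names, and the choice to record the lemma of the bridging preposition or conjunction are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Toy sentence in AGDT-style markup (invented for illustration, not treebank data).
SAMPLE = """<sentence id="toy">
  <word id="1" form="βαίνει" lemma="βαίνω" postag="v3spia---" head="0" relation="PRED"/>
  <word id="2" form="ἀνήρ" lemma="ἀνήρ" postag="n-s---mn-" head="1" relation="SBJ"/>
  <word id="3" form="εἰς" lemma="εἰς" postag="r--------" head="1" relation="AuxP"/>
  <word id="4" form="πόλιν" lemma="πόλις" postag="n-s---fa-" head="3" relation="OBJ"/>
  <word id="5" form="ταχέως" lemma="ταχέως" postag="d--------" head="1" relation="ADV"/>
</sentence>"""

ARGUMENT_LABELS = {"SBJ", "OBJ", "OCOMP", "PNOM"}    # kept as valency arguments
BRIDGE_LABELS = {"AUXP", "AUXC", "COORD", "APOS"}    # nodes we look through


def base_label(relation):
    """Normalise case and drop suffixes such as _CO and _AP."""
    return relation.upper().split("_")[0]


def collect_arguments(words, verb_id):
    """Return (label, lemma, introducer) for direct and bridged dependents of a verb."""
    children = {}
    for word in words:
        children.setdefault(word.get("head"), []).append(word)

    found = []

    def visit(node_id, introducer):
        for child in children.get(node_id, []):
            label = base_label(child.get("relation"))
            if label in ARGUMENT_LABELS:
                # Do not descend further: dependents of an argument are not
                # arguments of the verb we started from.
                found.append((label, child.get("lemma"), introducer))
            elif label in BRIDGE_LABELS:
                # Remember the preposition or conjunction; coordination and
                # apposition nodes pass the current introducer through.
                via = child.get("lemma") if label in {"AUXP", "AUXC"} else introducer
                visit(child.get("id"), via)
            # ADV, ATR, ATV etc. are ignored as non-valency dependents.

    visit(verb_id, None)
    return found


words = list(ET.fromstring(SAMPLE).iter("word"))
arguments = collect_arguments(words, "1")
for label, lemma, introducer in arguments:
    print(label, lemma, introducer)
```

In the toy sentence the subject is a direct child of the verb, the accusative object is reached through the preposition node, and the adverb is skipped.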

Figure 1

A selection of six entries from AGVaLex

Citation: Journal of Greek Linguistics 24, 2 (2024); 10.1163/15699846-02402003

Figure 1 displays six entries from AGVaLex. Each entry (or database record) corresponds to a verbal token occurrence in the AGDT and each column corresponds to each of eight different attributes of the token, which we can categorize into three main groups:

Metadata: the columns ‘author’, ‘title’, ‘subdoc’, and ‘sentence_id’ contain, respectively, the name of the author, the title of the work, the passage where the verb token occurs, and the identifier of the sentence in the treebank.
Verb token attributes: the columns ‘verb’ and ‘voice’ display the verb’s lemma and voice, respectively.
Argument patterns: the columns ‘frame’ and ‘frame_fillers’ contain the valency information, as explained in more detail below.

Let us consider the first entry in Figure 1. This entry corresponds to sentence 2901046 of the treebank, from Persians by Aeschylus, lines 703–706:

ἀλλ᾽ ἐπεὶ δέος παλαιὸν σοὶ φρενῶν ἀνθίσταται,

τῶν ἐμῶν λέκτρων γεραιὰ ξύννομ᾽ εὐγενὲς γύναι,

κλαυμάτων λήξασα τῶνδε καὶ γόων σαφές τί μοι

λέξον·

‘Since dread long ingrained in your mind restrains you, cease, noble woman, venerable partner of my bed, from your tears and laments, speak to me with all frankness.’
Smyth 1926

The annotation of the first part of this sentence in the treebank is shown below. Each token is indicated by the XML tag ⟨word⟩, whose attributes are id (identifier of the token in the sentence), cid (another identifier of the token), form (the form of the token in the sentence), lemma (the lemma of the token), postag (the part-of-speech tag of the token), head (the syntactic head of the token), relation (the syntactic relation that holds between the token and its head), and cite (the text passage). Each attribute is followed by its value between double quotes. For example, “1” is the value of the attribute “id” in the first line.

⟨word id="1" form="ἀλλ̓" lemma="ἀλλά" postag="d--------" head="25" relation="AuxY" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="2" form="ἐπεὶ" lemma="ἐπεί" postag="c--------" head="25" relation="AuxC" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="3" form="δέος" lemma="δέος" postag="n-s---nn-" head="7" relation="SBJ" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="4" form="παλαιὸν" lemma="παλαιός" postag="a-s---nn-" head="3" relation="ATR" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="5" form="σοὶ" lemma="σύ" postag="p-s----d-" head="7" relation="OBJ" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="6" form="φρενῶν" lemma="φρήν" postag="n-p---fg-" head="3" relation="ATR" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="7" form="ἀνθίσταται" lemma="ἀνθίστημι" postag="v3spie---" head="2" relation="ADV" cite="urn:cts:greekLit:tlg0085.tlg002:703"/⟩

⟨word id="8" form="," lemma="," postag="u--------" head="2" relation="AuxX" cite=""/⟩

According to the treebank annotation guidelines (Celano 2014), which follow the foundations of Dependency Grammar and display some difference with concepts in traditional grammars, the PRED (“predicate”) function is assigned to the verb of the main clause in a sentence, while any other verb receives a different label indicating its function in relation to its parent node, subjects are tagged as ‘SBJ’, other verb arguments are tagged as ‘OBJ’, predicate nominals are tagged as ‘PNOM’, and object complements are ‘OCOMP’. All these elements can depend on a coordination node (tagged as ‘COORD’) and therefore take the suffix ‘_CO’, or an apposition node (tagged as ‘APOS’) and then the suffix ‘_AP’.

More specifically, the label ‘SBJ’ is used to mark the syntactic subject of a clause. Subjects are typically nouns or pronouns in the nominative case, but can also be participles, infinitives, and substantive clauses. Moreover, AGDT distinguishes between OBJ (direct object) and other arguments. Direct objects typically include nouns, but also infinitive clauses and substantive clauses, for example. AGDT uses OCOMP for predicative complements that complete the meaning of the object. These can include nouns or adjectives that further describe the direct object and participles when used predicatively with certain verbs such as verbs of perception. Finally, the label ‘PNOM’ is used to annotate predicate nominatives, i.e. predicate nouns or adjectives in copular constructions. For a full explanation of the annotation, see Celano (2014). In the example sentence, the verb form ἀνθίσταται takes the subject δέος (tagged as ‘SBJ’) and the object σοὶ (tagged as ‘OBJ’), as indicated by the ‘head’ attribute which links each of these two nodes to the verbal form (tagged with the identifier ‘7’).
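The way the ‘head’ and ‘relation’ attributes link the two arguments to the verb can be checked with a few lines of Python on the excerpt given above. The snippet is a sketch under two assumptions: the eight tokens are wrapped in a ⟨sentence⟩ element added here only to obtain a well-formed document, and the morphological tag is read with the nine-position layout of the Perseus treebanks (voice in the sixth position, case in the eighth). The variable names are ours.

```python
import xml.etree.ElementTree as ET

# The eight tokens quoted above; the <sentence> wrapper is added for parsing only.
EXCERPT = """<sentence>
<word id="1" form="ἀλλ̓" lemma="ἀλλά" postag="d--------" head="25" relation="AuxY"/>
<word id="2" form="ἐπεὶ" lemma="ἐπεί" postag="c--------" head="25" relation="AuxC"/>
<word id="3" form="δέος" lemma="δέος" postag="n-s---nn-" head="7" relation="SBJ"/>
<word id="4" form="παλαιὸν" lemma="παλαιός" postag="a-s---nn-" head="3" relation="ATR"/>
<word id="5" form="σοὶ" lemma="σύ" postag="p-s----d-" head="7" relation="OBJ"/>
<word id="6" form="φρενῶν" lemma="φρήν" postag="n-p---fg-" head="3" relation="ATR"/>
<word id="7" form="ἀνθίσταται" lemma="ἀνθίστημι" postag="v3spie---" head="2" relation="ADV"/>
<word id="8" form="," lemma="," postag="u--------" head="2" relation="AuxX"/>
</sentence>"""

# Assumed decoding tables for the voice and case positions of the tag.
VOICE = {"a": "active", "m": "middle", "p": "passive", "e": "medio-passive"}
CASE = {"n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "v": "vocative"}
ARGUMENT_LABELS = {"SBJ", "OBJ", "OCOMP", "PNOM"}

words = ET.fromstring(EXCERPT).findall("word")

# The only token whose tag starts with "v" is the verb of the clause.
verb = next(w for w in words if w.get("postag").startswith("v"))
voice = VOICE[verb.get("postag")[5]]

# Arguments are the tokens whose head is the verb and whose relation is kept.
arguments = [
    (w.get("relation"), CASE[w.get("postag")[7]])
    for w in words
    if w.get("head") == verb.get("id") and w.get("relation") in ARGUMENT_LABELS
]

print(verb.get("id"), voice, arguments)
```

The two attributive tokens (ids 4 and 6) depend on the subject rather than on the verb, so they are not picked up.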

The AGDT has 548,782 word tokens (i.e. individual instances of words) of which 95,841 have been tagged with the part of speech ‘verb’ or ‘participle’; these correspond to 36,964 verb types (i.e. distinct verbs). AGVaLex was extracted from this treebank and contains 71,868 entries, one for each of the verb tokens occurring with at least one argument in this corpus. Table 2 displays some basic statistics of the lexicon.

Table 2

Basic statistics of AGVaLex

Entity	Count
Verb tokens (lexical entries)	71,868
Unique verb lemmas	5077
Unique frames	4116
Unique frames with lexical fillers	43624

The treebank contains texts of 15 authors and 31 works. Table 3 shows the number of lexical entries for each author.

Table 4 shows the 20 most frequent frames in the lexicon. The most frequent frame is the pattern ‘active_OBJ[accusative]’ which corresponds to constructions with accusative direct objects. Note that subjects in ancient Greek by default are not expressed if topical, so this frame includes those cases in which the predicate is, for example, inflected in the first person singular and the subject is not expressed lexically.

Table 3

Number of AGVaLex’s entries for each of the authors. Each entry corresponds to a verb token from the Ancient Greek Dependency Treebank.

Author	Number of lexicon entries
Aeschylus	6007
Aesop	826
Athenaeus	5766
Diodorus	3445
Herodotus	4784
Hesiod	2169
Homer	30567
Lysias	1176
Plato	745
Plutarch	2884
Polybius	3357
Pseudo-Apollodorus	150
Pseudo-Homer	450
Sophocles	6293
Thucydides	3249
TOTAL	71868

Table 4

Most frequent valency frames, with their frequency in the lexicon

Frame	Count
active_OBJ[accusative]	12557
active_OBJ[accusative],SBJ[nominative]	4512
active_SBJ[nominative]	4218
active_OBJ[infinitive]	2260
active_OBJ[dative]	1723
medio-passive_OBJ[accusative]	1624
middle_OBJ[accusative]	1543
medio-passive_SBJ[nominative]	1506
active_PNOM[nominative]	1234
active_OBJ[genitive]	1190
active_PNOM[nominative],SBJ[nominative]	1037
medio-passive_OBJ[dative]	829
active_OBJ_CO[accusative]	806
active_OBJ[infinitive],SBJ[nominative]	792
active_OBJ[dative],SBJ[nominative]	790
medio-passive_OBJ[infinitive]	750
active_OBJ[accusative],OBJ[dative]	744
active_(εἰς)OBJ[accusative]	682
active_OBJ[dative],OBJ[accusative]	596
middle_SBJ[nominative]	584
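The frame strings in Table 4 follow a regular notation: a voice, an underscore, and a comma-separated list of arguments, each made of an optional introducing word in parentheses, a relation label and a bracketed feature. The Python sketch below splits a frame into these parts. The grammar is inferred only from the twenty frames listed in the table, so it is an assumption and may not cover every frame in the lexicon; `parse_frame` is a hypothetical helper, not part of AGVaLex.

```python
import re

# One argument: optional "(introducer)", a label such as OBJ or OBJ_CO,
# and a bracketed feature such as a case or "infinitive".
ARGUMENT_RE = re.compile(
    r"^(?:\((?P<introducer>[^)]+)\))?(?P<label>[A-Z_]+)\[(?P<feature>[^\]]+)\]$"
)


def parse_frame(frame):
    """Split a frame string into (voice, [(label, feature, introducer), ...])."""
    # Only the first underscore separates the voice: labels like OBJ_CO keep theirs.
    voice, _, rest = frame.partition("_")
    arguments = []
    for chunk in rest.split(","):
        match = ARGUMENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognised argument: {chunk!r}")
        arguments.append((match["label"], match["feature"], match["introducer"]))
    return voice, arguments


for frame in (
    "active_OBJ[accusative],SBJ[nominative]",
    "medio-passive_OBJ[dative]",
    "active_OBJ_CO[accusative]",
    "active_(εἰς)OBJ[accusative]",
):
    print(parse_frame(frame))
```

Splitting on the first underscore only is what keeps the hyphenated voice ‘medio-passive’ and the coordinated label ‘OBJ_CO’ intact.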

3.1 Comparison with traditional lexicographical resources

A practical way to show the usefulness of the lexicon is to compare it with a commonly used scientific dictionary. We chose to compare it with the relatively recent GE (see Section 2), rather than the older LSJ (Liddell et al. 1996), as GE highlights valency information more clearly, especially for high frequency verbs. So, for instance, the various constructions for τίθημι ‘to put’ are provided in italics at the start of the dictionary entry, together with their most common translations in bold, before examples are provided in the body of the entry. Not all entries for verbs have an initial summary of their constructions, but even those which do not still highlight information about syntactic dependencies in italics throughout the body.

In order to compare the constructions listed in the dictionary with those in the lexicon, we chose a small set of 5 transitive verbs, from the larger dataset used in Section 4. These are very high frequency verbs with a wide range of constructions: αἱρέω ‘to take,’ δίδωμι ‘to give,’ φέρω ‘to bring/carry,’ βάλλω ‘to throw/hit,’ and τίθημι ‘to put.’ While this selection is not meant to be exhaustive or representative, these 5 verbs have two advantages: given how common they are, they all have a summary of constructions in GE, and the range of available constructions makes them a good test case for the coverage in both dictionaries and valency lexica.

For each of these verbs, we noted down the dependency information that is given in GE, without taking note of diathesis (active vs. middle vs. passive), as the dictionary does not always break down meanings by diathesis unless a specific passive or middle meaning is involved. We then searched AGVaLex for all dependencies that are recorded for each verb, and noted which ones do not appear in the dictionary, as well as where they are attested. We recorded the place of attestation because constructions that occur in a range of authors are arguably more likely to be recorded in a dictionary than constructions that are unique to one author, even in a partial sample like the one that forms the basis of AGVaLex.

The results of this comparison are summarised in Table 5. The final column in this table contains the number of ‘collostructions’ reported for the same verb in Myria (see Section 2). There is only limited overlap between the way Myria categorises collostructions and the data in AGVaLex: while both resources record information about the lexical fillers that occur with each verbal lemma in the corpus, AGVaLex does not provide any information about the lemma’s preference for a specific construction compared to expectations. Therefore, the numbers for Myria in Table 5 are only reported for reference.

Table 5

Comparison between GE, Myria, and AGVaLex on transitive verbs

Verb	Constructions recorded in GE	Constructions only in AGVaLex	Constructions only in AGVaLex that occur in more than one author	Constructions only in AGVaLex that occur at least 10 times	Collostructions in Myria
αἱρέω ‘to take’	5	31	1	1	10
δίδωμι ‘to give’	9	27	7	2	12
φέρω ‘to bear’	9	43	7	3	9
βάλλω ‘to throw’	1	69	15	7	8
τίθημι ‘to put’	9	60	12	5	12
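The three AGVaLex columns of Table 5 amount to a set difference followed by two filters. The sketch below shows that logic in Python on invented data; the construction labels, the rows and the function name are hypothetical, and the actual comparison was carried out by reading the GE entries, not by running this code.

```python
from collections import Counter, defaultdict


def compare_with_dictionary(dictionary_constructions, lexicon_rows, min_count=10):
    """Count lexicon-only constructions, overall and under the two filters of Table 5.

    lexicon_rows is an iterable of (construction, author) pairs, one per verb token.
    """
    counts = Counter()
    authors = defaultdict(set)
    for construction, author in lexicon_rows:
        counts[construction] += 1
        authors[construction].add(author)

    only_in_lexicon = set(counts) - set(dictionary_constructions)
    return {
        "only_in_lexicon": len(only_in_lexicon),
        "more_than_one_author": sum(1 for c in only_in_lexicon if len(authors[c]) > 1),
        "at_least_min_count": sum(1 for c in only_in_lexicon if counts[c] >= min_count),
    }


# Invented example: one construction known to the dictionary, two that are not.
toy_dictionary = {"OBJ[accusative]"}
toy_rows = (
    [("OBJ[accusative]", "Homer")] * 3
    + [("OBJ[genitive]", "Homer")] * 10
    + [("OBJ[dative]", "Homer"), ("OBJ[dative]", "Herodotus")]
)

summary = compare_with_dictionary(toy_dictionary, toy_rows)
print(summary)
```

Here the genitive construction passes the frequency filter but is confined to one author, while the dative construction is rare but shared by two authors.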

As Table 5 shows, while AGVaLex lists significantly more constructions than the dictionary, most of them are very rare and/or unique to one author, which makes them less relevant to a lexicographical resource that is meant to represent ‘standard’ Greek, with only limited reference to special usage. In addition to this, a large proportion of constructions that are unique to one author are unique to Homer, a phenomenon that sometimes has to do with Homeric syntax preserving traces of an archaic stage of development (see e.g. Hackstein 2010); for instance, as many as 48 of the 69 constructions listed for βάλλω involve prepositions such as πρός or κατά that at a later stage of the development of the Greek language are incorporated into the verb itself, creating individual compound verbs that are listed as separate dictionary entries (on similar changes in the argument structure of verbs from Homer to Classical Greek, see Luraghi 2020). The issue of preverbs and their lexicalisation has been explored through computational valency lexica for Latin (McGillivray 2014: ch. 6), but no discussion of the issue in Ancient Greek using similar methods exists.

That said, even for such a small sample as the one we tested, AGVaLex does sometimes bring useful additional information. For instance, we can hypothesise that a small cluster of constructions for δίδωμι ‘to give’ plus dative and infinitive, which only appears in Herodotus (two times), Hesiod (once), Homer (27 times) and Athenaeus (two times), is a feature of the Ionic dialect which is shared by all these authors.^⁷ Sometimes, the dictionary misses out on the fact that a verb can occur with a whole extra case: αἱρέω ‘to take’ occurs with an object in the genitive 15 times in Homer,^⁸ a construction that is not reported in GE. This sort of information is useful for traditional textual criticism, which often requires answering questions such as ‘can this verb occur with x case?’. AGVaLex offers a convenient database on which to search for answers to these questions, without having to manually check thousands of occurrences of one verb in a corpus of text.
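A question such as ‘can this verb occur with x case, and in which authors?’ translates into a simple filter over the ‘author’, ‘verb’ and ‘frame’ columns of the lexicon. The following Python sketch illustrates the idea on a handful of invented rows; the rows, the helper names and the exact-match strategy are assumptions, and the frame strings merely imitate the notation of Table 4.

```python
from collections import Counter

VERB = "αἱρέω"

# Invented rows using the column names of the lexicon; not real AGVaLex entries.
ROWS = [
    {"author": "Homer", "verb": VERB, "frame": "active_OBJ[genitive]"},
    {"author": "Homer", "verb": VERB, "frame": "middle_OBJ[genitive],SBJ[nominative]"},
    {"author": "Homer", "verb": VERB, "frame": "active_(ἐκ)OBJ[genitive]"},
    {"author": "Herodotus", "verb": VERB, "frame": "active_OBJ[accusative]"},
    {"author": "Homer", "verb": "δίδωμι", "frame": "active_OBJ[genitive]"},
]


def frame_arguments(frame):
    """Return the argument chunks of a frame, without the leading voice."""
    _, _, rest = frame.partition("_")
    return rest.split(",")


def attestations(rows, verb, argument):
    """Count, per author, the tokens of `verb` whose frame contains `argument`."""
    return Counter(
        row["author"]
        for row in rows
        if row["verb"] == verb and argument in frame_arguments(row["frame"])
    )


by_author = attestations(ROWS, VERB, "OBJ[genitive]")
print(dict(by_author))
```

Matching whole argument chunks rather than substrings keeps a bare genitive object apart from a genitive governed by a preposition, which would otherwise inflate the count.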

The results above, of course, are not meant to show that the valency lexicon is superior to a published dictionary. Each lexical resource has its own purpose, but the valency lexicon does prove its worth in a test of its completeness against a common lexicographical resource, as well as offering possibilities in relation to common philological aims like textual criticism, as detailed above.

In addition to the features described above, AGVaLex allows the user to retrieve summary data by construction (e.g., searching for all verbs that take the preposition ἐν ‘in’ plus the dative case), a type of search that cannot be performed even with an online dictionary such as GE, which has a limited user interface. On the other hand, AGVaLex is not suited to a language learner wanting to know about common constructions in an easily understandable way, and does not provide translations. The valency lexicon could also be used profitably to look for examples of specific structures for the purpose of writing a dictionary. As it also records the lexical information for nouns that enter in specific constructions with a verb, it is extremely useful for the type of semantic studies that will be described in Section 4.
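A search by construction, as opposed to a search by verb, can be pictured as an inverted index from argument patterns to the verbs that occur with them. The Python sketch below builds such an index over invented rows. The notation for a prepositional argument is modelled on the frame ‘active_(εἰς)OBJ[accusative]’ in Table 4; the rows, names and counts are hypothetical.

```python
from collections import Counter, defaultdict

# Pattern for "ἐν plus dative", written by analogy with Table 4 (an assumption).
EN_DATIVE = "(ἐν)OBJ[dative]"

# Invented (verb, frame) pairs; not real AGVaLex entries.
ROWS = [
    ("κεῖμαι", f"middle_{EN_DATIVE},SBJ[nominative]"),
    ("κεῖμαι", f"middle_{EN_DATIVE}"),
    ("τίθημι", f"active_OBJ[accusative],{EN_DATIVE}"),
    ("φέρω", "active_OBJ[accusative]"),
]


def verbs_by_argument(rows):
    """Map each argument pattern to a Counter of the verbs it occurs with."""
    index = defaultdict(Counter)
    for verb, frame in rows:
        _, _, rest = frame.partition("_")   # drop the voice
        for argument in rest.split(","):
            index[argument][verb] += 1
    return index


index = verbs_by_argument(ROWS)
en_dative_verbs = index[EN_DATIVE]
for verb, count in en_dative_verbs.most_common():
    print(verb, count)
```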

A note is in order on the Homeric Dependency Lexicon (HoDeL), the only other available verbal valency database for Ancient Greek, as described in Section 2. Since AGVaLex covers a much broader corpus than HoDeL, it makes little sense to directly compare the number of constructions retrievable by the two tools. A comparison between the search for constructions with αἱρέω in AGVaLex (Table 5) and the same search performed in HoDeL illustrates the difference between these two tools well. HoDeL retrieves 415 occurrences of αἱρέω, with 232 different arguments, and provides information about the case and relation of each of these arguments.

Figure 2

Screenshot from HoDeL illustrating an example of its use. The site was accessed on 08/04/2024.

Citation: Journal of Greek Linguistics 24, 2 (2024) ; 10.1163/15699846-02402003

The user can filter this data using the left-hand side menus in order to look at, for instance, all occurrences of the verb with three arguments, and explore which case these arguments appear in (see Figure 2); individual examples are represented through a dependency tree on the right. This visualisation makes HoDeL an excellent teaching tool, and its user-friendliness far exceeds the usual standards in the field. On the other hand, downloading data from HoDeL is effectively impossible, which makes it a less useful tool for studies that aim to cover all examples of a phenomenon, such as the case study in Section 4. Together with AGVaLex’s wider coverage, this shows how the two tools are complementary in their range and use cases.

4 Case study: semantic variation in TrV+Obj formulae

4.1 Aims and context of the study

The case study introduced here aims to assess the scope of semantic variation in a sample of epic formulae, and then to compare the results with a baseline corpus (for the importance of this step see Wulff 2008). We use Distributional Semantics to quantify semantic variation. The target of analysis is a sample of formulae made of a transitive verb and its direct object in the accusative (from here on, TrV+Obj formulae), selected exclusively on the basis of frequency. These are phrases of the type πάθεν ἄλγεα (pathen[aor.ind.act.3s] algea[acc.pl], ‘suffered ills’, Odyssey 1.4). We look at the semantic range of the objects of these phrases to answer questions about the semantic behaviour of these objects depending on formulaic status: does the existence of a formulaic phrase with a specific object promote the creation of quasi-synonymous or semantically related formulae (which would increase the average similarity of the objects), or does it have a pre-emptive effect, analogous to what we observe for idioms, where the existence of an idiom with a certain meaning actually discourages the creation of synonymous idiomatic phrases (Suttle & Goldberg 2011)?^⁹

The study of formulaic variation has been a major topic in Homeric studies since at least the 1960s (Hoekstra 1965, 1969; Hainsworth 1968; Postlethwaite 1979; Friedrich 2019). Formulae allow for a limited amount of linguistic variation, a trait which they share with idioms and other multi-word expressions in everyday language (Kiparsky 1976). Most recently, the behaviour of formulae has been described under the linguistic framework of Construction Grammar (Goldberg 1995): formulae are indissoluble pairs of form and function, and as such are characterised by restrictions as to their shape as well as their meaning (Bozzone 2014, 2024; Antović & Cánovas 2016).

We use Distributional Semantics to model the range of meaning of the formulae and non-formulaic material in this case study. As a corpus-based approach, Distributional Semantics is particularly suited to the study of dead languages such as Ancient Greek, where no speaker input can be sought. In Distributional Semantics, the meaning of a word is defined as a function of its collocates in a corpus: words that share a linguistic context are also related in meaning (Harris 1954; Fabre & Lenci 2015). Shared linguistic contexts are modelled mathematically via word vectors which encode the frequency of co-occurrences between each word in the corpus and each of the others (with the possible exception of semantically empty ‘stop-words’). These vectors form a distributional space model of meaning (DSM); the distance between the vector associated to each word and the vector associated to another represents the similarity between the words’ meanings.^¹⁰
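The core idea can be illustrated with a minimal Python sketch that builds raw co-occurrence vectors and compares them by cosine similarity. The toy sentences, the window size and the empty stop-word list are assumptions made for the example; the sketch uses unweighted counts and is not the procedure used to build the DSM in this study.

```python
import math
from collections import Counter, defaultdict

# Toy lemmatised "corpus"; the sentences are invented for illustration.
SENTENCES = [
    ["hero", "suffer", "pain"],
    ["hero", "suffer", "grief"],
    ["woman", "pour", "wine"],
    ["man", "pour", "water"],
]
STOP_WORDS = set()  # semantically empty words could be excluded here


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence if w not in STOP_WORDS]
        for i, target in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][words[j]] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(SENTENCES)
print(cosine_similarity(vectors["pain"], vectors["grief"]))  # identical contexts
print(cosine_similarity(vectors["pain"], vectors["wine"]))   # no shared contexts
```

In this toy space ‘pain’ and ‘grief’ occur in exactly the same contexts and so come out as maximally similar, while ‘pain’ and ‘wine’ share no context at all.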

For this case study, we use a DSM built from the Diorisis Ancient Greek corpus (Vatri & McGillivray 2018) using DISSECT (Dinu, Pham, & Baroni 2013). The DSM was optimised specifically for Ancient Greek (Rodda, Probert, & McGillivray 2019).

4.2 Data and methods

The data on TrV+Obj formulae was extracted by running a Python script^¹¹ on texts from the Ancient Greek and Latin Dependency Treebank (AGLDT: Bamman & Crane 2011), a syntactically parsed corpus that is part of the Perseus Project. The TrV+Obj pairs were extracted from the four main archaic Greek epic texts: Homer’s Iliad and Odyssey and Hesiod’s Theogony and Works and Days (from here on, ‘the epic corpus’). Two formular editions (Pavese & Venti 2000; Pavese & Boschetti 2003), which are designed to mark material in the target texts as formulaic or non-formulaic based on their frequency in the texts, were used to establish which of these automatically extracted phrases are properly formulaic, i.e. repeated in the traditional language; we opted to adopt the formular editions’ pre-existing definition of formularity, rather than introduce our own, in order to minimise researcher bias.^¹² Out of the 6764 formulaic TrV+Obj pairs that were thus extracted, only the objects of those verbs that occur at least 50 times in the epic corpus were selected, for a total of 26 verbs and 2703 tokens (ranging from 335 to 50 tokens per verb).
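The frequency-based selection step can be sketched as follows. This is not the script used for the study: the input format (a list of verb lemma and object lemma pairs) and the sample pairs are assumptions made for illustration, and the threshold is lowered from 50 to 3 so that the toy data produces a result.

```python
from collections import Counter

# Hypothetical (verb lemma, object lemma) tokens; real input would be the
# formulaic TrV+Obj pairs extracted from the treebank.
PAIRS = [("ἔχω", "ἵππος"), ("ἔχω", "θυμός"), ("ἔχω", "ἵππος"), ("πάσχω", "ἄλγος")]


def frequent_verbs(pairs, min_tokens):
    """Return {verb: token count} for verbs with at least `min_tokens` pairs."""
    counts = Counter(verb for verb, _ in pairs)
    return {verb: n for verb, n in counts.items() if n >= min_tokens}


# The study's cut-off is 50 tokens; 3 is used here only to suit the toy data.
selected = frequent_verbs(PAIRS, min_tokens=3)
kept_pairs = [(verb, obj) for verb, obj in PAIRS if verb in selected]
print(selected)
print(kept_pairs)
```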

The non-formulaic data for comparison was extracted from AGVaLex. All texts from the lexicon’s database were included apart from those which overlap with the epic corpus, i.e. the Iliad and the Odyssey. We looked up each of the 26 target verbs in the lexicon, and manually selected the accusative objects from the existing data.

The analysis below is not on tokens, but on types (for the reasons see Barðdal 2008), i.e. on unique object lexemes of each transitive verb. We therefore discarded any verbs that had fewer than 10 object types in either the epic corpus or the comparison corpus, which reduces the sample to 15 verbs. The final list of verbs, with their type frequency, is provided in Table 6.
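The move from tokens to types, and the filter on both corpora, can be sketched in Python as below. The sample pairs are invented, the function names are hypothetical, and the threshold is lowered from 10 to 2 to suit the toy data.

```python
from collections import defaultdict


def object_types(pairs):
    """Map each verb to the set of distinct object lemmas (types) it governs."""
    types = defaultdict(set)
    for verb, obj in pairs:
        types[verb].add(obj)
    return types


def verbs_with_enough_types(epic_pairs, baseline_pairs, min_types):
    """Keep verbs with at least `min_types` object types in BOTH corpora."""
    epic, base = object_types(epic_pairs), object_types(baseline_pairs)
    return sorted(
        v for v in epic.keys() & base.keys()
        if len(epic[v]) >= min_types and len(base[v]) >= min_types
    )


# Invented toy pairs; the study's threshold is 10 types, 2 is used here.
EPIC = [("χέω", "δάκρυ"), ("χέω", "ὕδωρ"), ("χέω", "δάκρυ"), ("λύω", "ἵππος")]
BASELINE = [("χέω", "οἶνος"), ("χέω", "ὕδωρ"), ("λύω", "ἵππος"), ("λύω", "ζώνη")]
print(verbs_with_enough_types(EPIC, BASELINE, min_types=2))
```

Here χέω has three object tokens but only two object types in the toy epic list, which is the distinction the analysis relies on; λύω is dropped because it has a single object type on the epic side.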

Table 6

Target verbs and their object types in the epic and baseline corpus, ordered by token frequency in the epic corpus (not by type frequency)

	Verb	Epic	Baseline
1	ἔχω ekhō ‘have’	91	485
2	αἱρέω haireō ‘take’	56	80
3	δίδωμι didōmi ‘give’	58	90
4	εἶπον eipon ‘say’	28	49
5	φέρω pherō ‘carry’	41	120
6	βάλλω ballō ‘throw’	49	24
7	εἶδον eidon ‘see’	43	87
8	τίθημι tithēmi ‘put’	45	101
9	οἶδα oida ‘know’	29	53
10	χέω kheō ‘pour’	13	13
11	ἄγω agō ‘lead’	33	92
12	λύω lyō ‘loosen’	14	31
13	τίκτω tiktō ‘give birth’	13	25
14	ἵημι hiēmi ‘send out’	19	13
15	ἀκούω akouō ‘hear’	11	48

For each verb, therefore, we have a list of object types in the epic corpus and one in the baseline corpus, for a total of 30 lists. To assess their semantic similarity, we measured the cosine distance between the objects in each list and their respective centroids in the semantic space (see again Rodda, Probert, & McGillivray 2019 for another example of this approach). This gives us 30 distributions of distances, which can be compared to each other or assessed for the influence of other factors.
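The centroid-based measure can be sketched as follows. The three-dimensional vectors are invented stand-ins for DSM vectors, and taking the centroid as the plain component-wise mean of the raw vectors is an assumption of this sketch rather than a documented detail of the study. Cosine similarity is simply one minus the cosine distance computed here.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def distances_to_centroid(vectors):
    """Distance of every vector in the list from the list's own centroid."""
    c = centroid(vectors)
    return [cosine_distance(v, c) for v in vectors]


# Invented 3-dimensional vectors standing in for the object types of one verb.
OBJECT_VECTORS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
distances = distances_to_centroid(OBJECT_VECTORS)
print(distances)
```

Each list of object types yields one such list of distances, and it is these distributions that are compared across the epic and baseline corpora.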

4.3 Results

To assess the relationship between formulaic and non-formulaic verb phrases, we compared the semantic range of objects in the epic corpus vs. the baseline corpus for each verb. The results of this comparison are detailed in the boxplot in Figure 3.^¹³

The distributions of distances for each pair were compared using the Kolmogorov-Smirnov test in R (R Core Team 2017). Two significance thresholds were set: p < 0.05 for high significance (**) and p < 0.1 for low significance (*). The results are summarised in Table 7.
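The test itself was run in R; the Python sketch below is not the authors' code. It shows, under that caveat, how the two-sample Kolmogorov-Smirnov statistic D is obtained (the largest gap between the two empirical cumulative distributions) and how the two thresholds translate into the markers used in Table 7. The function names are hypothetical, the computation of the p-value from D is left to a statistics package, and the three p-values are those reported in Table 7.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def significance_marker(p):
    """Markers as defined in the text: ** for p < 0.05, * for p < 0.1."""
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# p-values taken from Table 7 for three of the fifteen verbs.
P_VALUES = {"ἔχω": 0.084, "χέω": 0.034, "δίδωμι": 0.911}
markers = {verb: significance_marker(p) for verb, p in P_VALUES.items()}
print(markers)
```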

There are relatively few significant differences here, even with a higher than usual significance threshold. The four verbs that show a significant difference are ἔχω ekhō ‘have’ (our most frequent verb), χέω kheō ‘pour’, τίκτω tiktō ‘give birth’, and ἵημι hiēmi ‘send out’ (three verbs with much lower type and token frequency). For the first three, the median similarity is higher in the baseline than in the formulaic corpus, while ἵημι shows the opposite pattern; for all four verbs, the variance is higher in the formulaic corpus.

In other words, there is only a very limited effect of formularity on semantic range, but as far as an effect can be observed, it appears to go in the direction of constructional pre-emption: objects of formulaic phrases tend to show lower semantic similarity. This is somewhat surprising, as discussions of formulaic systems (starting with Parry 1930, 1932) have stressed the poetic utility of having a range of expressions that are similar in meaning but have different metrical shapes, a result which could be easily obtained by varying lexical items and using synonyms or near-synonyms. It is possible that the definition of formularity adopted in this study, which was based on simple repetition and did not take metre into account, does not capture subtleties in the actual relation between verbs and objects which could help explain our results. It is also possible that a different approach to the data analysis would reveal a different pattern—for instance, if we set out to look for individual clusters of closely-related words among the objects of a formulaic verb, rather than measure their semantic proximity to a centroid in the semantic space. As one reviewer suggested, the broad semantic range of many of the verbs in this study probably plays a role; it is suggestive that more semantically restricted verbs such as χέω or τίκτω produce statistically significant differences. All of these avenues remain open for further analysis. What we can say for certain is that there appears to be more to be explored when it comes to the semantics of verbal constructions in early Greek epic.

Figure 3

Box-and-whiskers plots of object similarities in non-formulaic (white) vs. formulaic (dotted) TrV+Obj pairs

Table 7

Comparison between formulaic and non-formulaic distributions of objects in the semantic space

Verb	Median similarity (formulaic)	Median similarity (baseline)	Variance (formulaic)	Variance (baseline)	Significance
ἔχω	0.427	0.482	0.0109	0.0100	* (p = 0.084)
αἱρέω	0.407	0.455	0.0172	0.0123	(p = 0.240)
δίδωμι	0.447	0.430	0.0077	0.0130	(p = 0.911)
εἶπον	0.324	0.329	0.0116	0.0188	(p = 0.538)
φέρω	0.422	0.468	0.0091	0.0086	(p = 0.414)
βάλλω	0.446	0.416	0.0139	0.0147	(p = 0.716)
εἶδον	0.371	0.445	0.0137	0.0103	(p = 0.106)
τίθημι	0.410	0.439	0.0133	0.0093	(p = 0.723)
οἶδα	0.414	0.394	0.0070	0.0227	(p = 0.537)
χέω	0.364	0.472	0.0096	0.0027	** (p = 0.034)
ἄγω	0.392	0.456	0.0095	0.0099	(p = 0.329)
λύω	0.314	0.470	0.0080	0.0072	(p = 0.168)
τίκτω	0.270	0.405	0.0122	0.0062	* (p = 0.052)
ἵημι	0.453	0.307	0.0104	0.0027	* (p = 0.053)
ἀκούω	0.413	0.375	0.0059	0.0118	(p = 0.787)

5 Discussion and conclusions

We have presented AGVaLex and illustrated, via some examples and the case study in Section 4, how it can be used to explore crucial issues in Ancient Greek linguistics. For example, we have shown how AGVaLex allows researchers to retrieve summary data by construction, which cannot be done with existing dictionaries, as well as to look for examples of specific structures, including lexical information for nouns in specific constructions with a verb. The case study focused on Homeric formularity, a topic which is primarily of interest to literary scholars, who are particularly likely to appreciate a pre-compiled dataset that can be directly applied to their work without the need for significant knowledge of the relevant databases and/or programming languages. The limited space devoted to the application of AGVaLex in Section 4 should not obscure the fact that the existence of the database enabled this research in the first place: gathering the data on TrV+Obj constructions in Homer and Hesiod required weeks of work, and without AGVaLex the same effort would have had to be scaled up to the entirety of the baseline corpus, a practically insurmountable task. While the results of the case study should be seen as preliminary when it comes to furthering our understanding of semantic variation in formulae, they show the promising value of the Distributional Semantics approach and of the use of a comparison database to assess how formulaic behaviour differs from non-formulaic usage.

A resource such as AGVaLex, if maintained and kept up to date, can enable research that would otherwise require more time and computational power than the average literature scholar can be expected to apply. As the availability of syntactically annotated corpora expands, these resources can be integrated into AGVaLex, ensuring the widest distribution of the data. AGVaLex fits in a broader trend of making Digital Humanities (DH) resources available to researchers outside of DH, which is also illustrated by resources such as HoDeL and Syntacticus; work on formularity in various Indo-European languages has already started taking advantage of the availability of these resources (Biagetti 2023; Brigada Villa et al. 2023), a trend that the current study hopes to further promote.

Author contributions

BMcG designed and implemented the scripts for the creation of the lexicon, and conducted the quantitative analysis of the lexicon described in section 3; she wrote sections 1, 1.1, and 3. The authors jointly wrote section 2. MAR conducted the comparative analysis of dictionaries, designed the case study and conducted its analysis, and wrote sections 3.1, 4, 4.1, 4.2, 4.3, and 5.

^¹ For an introduction to the methodology followed by the TLL, see https://www.thesaurus.badw.de/fileadmin/user_upload/Files/TLL/TLL_Flyer-2012_englisch.pdf. For a description of the typical structure of an entry in TLL see https://www.thesaurus.badw.de/en/hilfsmittel-fuer-benutzer/article-structure.html.

^² PROIEL has an associated online treebank query tool, Syntacticus (https://syntacticus.org/), which allows users to extract data from a corpus containing a small number of texts in Ancient Greek, Latin, Classical Armenian, Gothic, Old Church Slavonic, Old English, Old French, and Portuguese. Although the tool is very well designed, its coverage for Ancient Greek is minimal, and we do not discuss it further in this article.

^³ https://hodel.unipv.it/hodel-res/

^⁴ https://su-lab.unipv.it/tasf/wp-content/uploads/2021/01/HoDeL_guidelines.pdf

^⁵ The FAQ for Myria contains the following explanation: “What is a collostruction? … [sic]”.

^⁶ For example, in Odyssey 1.2, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν (“since he razed Troy’s sacred citadel”), the subject is not expressed within the sub-clause but can be identified as Odysseus from context.

^⁷ Athenaeus, who is not an Ionic author, contains many quotations from Ionic sources. This construction can coordinate with the more common dative + accusative construction, as shown in this example from Herodotus (Histories 1.54, i.e. subdoc 1.54 sentence_id 346 root_id 457705 in AGVaLex): Δελφοὶ δὲ ἀντὶ τούτων ἔδοσαν Κροίσῳ καὶ Λυδοῖσι προμαντηίην καὶ ἀτελείην καὶ προεδρίην, καὶ ἐξεῖναι τῷ βουλομένῳ αὐτῶν γίνεσθαι Δελφὸν ἐς τὸν αἰεὶ χρόνον.

^⁸ For instance in Homer, Il. 1.323 (subdoc 1.323 sentence_id 2274288 root_id 156142 in AGVaLex): χειρὸς ἑλόντ̓ ἀγέμεν Βρισηΐδα καλλιπάρῃον. It has been suggested to us that this construction could be a partitive genitive, which is not specifically selected by αἱρέω, which would explain why it is missing from the GE entry. GE, however, does not only report constructions that are specifically selected by each verb (other constructions for αἱρέω include ‘with accusative’ and ‘with a participle’), and the example cited does not appear to fall under the category of the partitive genitive, at least in traditional terms, but rather of the genitive used with verbs of contact. Luraghi & Conti 2014 argue that this use of the genitive is itself an example of a partitive construction, but the Homeric example above is difficult to explain under their framework, as it involves neither a part of the object in the genitive, nor a low degree of involvement of the object itself.

^⁹ For a formulation of this concept in Homeric studies, see Parry 1930.

^¹⁰ We have avoided a detailed discussion of Distributional Semantics here so as not to unnecessarily encumber the case study; interested readers can find accessible explanations in Sahlgren (2006) and Rodda (2021).

^¹¹ All scripts for Section 4 are available at https://github.com/MartinaAstridRodda/dphil-thesis.

^¹² Readers can find a useful discussion of definitions of formula, and their implications in a current cognitive linguistics perspective, in Bozzone (2023, 2024).

^¹³ All figures and tables in this section are reproduced from Rodda (2021).

References

Antović, Mihailo & Cristóbal Pagán Cánovas. 2016. Construction Grammar and Oral Formulaic Theory. Oral poetics and cognitive science, ed. by Mihailo Antović & Cristóbal Pagán Cánovas, 79–98. Berlin: De Gruyter.
Bamman, David & Gregory Crane. 2011. The Ancient Greek and Latin Dependency Treebanks. Language technology for cultural heritage: Selected papers from the LaTeCH workshop series, ed. by Caroline Sporleder, Antal Bosch, & Kalliopi Zervanou, 79–98. Berlin: Springer.
Barðdal, Jóhanna. 2008. Productivity: Evidence from case and argument structure in Icelandic. Amsterdam: John Benjamins.
Bayerische Akademie der Wissenschaften 2002. Thesaurus Linguae Latinae (CDROM). https://thesaurus.badw.de/en/project.html
Biagetti, Erica. 2023. Integrare Sanskrit WordNet e Vedic Treebank: Uno studio pilota sulla formularità del RigVeda tra semantica e sintassi. E pluribus unum. Prospettive sull’antico, ed. by Isabella Bossolino & Chiara Zanchi, 45–62. Pavia: Pavia University Press.
Biagetti, Erica, Chiara Zanchi, & Silvia Luraghi. 2021. Building new resources for historical linguistics. Pavia: Pavia University Press.
Bozzone, Chiara. 2014. Constructions: A new approach to formularity, discourse, and syntax in Homer. PhD dissertation, Indo-European Studies, UCLA. https://escholarship.org/uc/item/6kg0q4cx
Bozzone, Chiara. 2023. Chunks, collocations, and constructions: The Homeric formula in cognitive and linguistic perspective. New light on formulas in oral poetry and prose, ed. by David Sävborg & Bernt Ø. Thorvaldsen, 113–139. Turnhout: Brepols.
Bozzone, Chiara. 2024. Homer’s living language: Formularity, dialect, and creativity in oral-traditional poetry. Cambridge: Cambridge University Press.
Brigada Villa, Luca, Erica Biagetti, Riccardo Ginevra, & Chiara Zanchi. 2023. Combining WordNets with Treebanks to study idiomatic language: A pilot study on Rigvedic formulas through the lenses of the Sanskrit WordNet and the Vedic Treebank. Proceedings of the 12th Global Wordnet Conference, 133–139. Donostia—San Sebastian: Global Wordnet Association.
Celano, Giuseppe G.A. 2014. Guidelines for the annotation of the Ancient Greek Dependency Treebank 2.0. https://github.com/PerseusDL/treebank_data/edit/master/AGDT2/guidelines
Celano, Giuseppe G.A. 2019. The dependency treebanks for ancient Greek and Latin. Digital classical philology, ed. by Monica Berti, 279–298. Berlin: De Gruyter.
Dinu, Georgiana, Nghia The Pham, & Marco Baroni. 2013. DISSECT—DIStributional SEmantics Composition Toolkit. Proceedings of the 51st annual meeting of the Association for Computational Linguistics: System demonstrations, 31–36. Sofia: Association for Computational Linguistics.
Eckhoff, Hanne M., Silvia Luraghi, & Marco Passarotti. 2018. The added value of diachronic treebanks for historical linguistics. Diachronica 35.3.297–309.
Fabre, Cécile & Alessandro Lenci. 2015. Distributional Semantics today: Introduction to the special issue. Traitement automatique des langues 56.2.7–20.
Friedrich, Rainer. 2019. Postoral Homer: Orality and literacy in the Homeric epic. Stuttgart: Franz Steiner Verlag.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar approach to argument structure. Chicago: University of Chicago Press.
Hackstein, Olav. 2010. The Greek of epic. A companion to the Ancient Greek language, ed. by Egbert J. Bakker, 401–423. Oxford: Wiley-Blackwell.
Haug, Dag T.T. 2015. Treebanks in historical linguistics research. Perspectives on historical syntax, ed. by Carlotta Viti, 187–202. Amsterdam: John Benjamins.
Hainsworth, John B. 1968. The flexibility of the Homeric formula. Oxford: Clarendon Press.
Hajič, Jan, Jarmila Panevová, Eva Buráňová, Zdeňka Urešová, & Alevtina Bémová (in cooperation with) Jiří Kárník, Jan Štěpánek, & Petr Pajas. 1999. Annotations at analytical level: Instructions for annotators. https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html
Happ, Heinz. 1976. Grundfragen einer Dependenz-Grammatik des Lateinischen. Göttingen: Vandenhoeck & Ruprecht.
Harris, Zellig S. 1954. Distributional structure. Word 10.2–3.146–162.
Haug, Dag T.T. & Marius L. Jøhndal. 2008. Creating a parallel treebank of the Old Indo-European Bible translations. Proceedings of the second workshop on language technology for cultural heritage data (LaTeCH 2008), ed. by Caroline Sporleder & Kiril Ribarov, 27–34. Association for Computational Linguistics.
Haug, Dag T.T. 2012. Syntactic conditions on null arguments in the Indo-European Bible translations. Acta Linguistica Hafniensia 44.2.129–141.
Hoekstra, Arie. 1965. Homeric modifications of formulaic prototypes: Studies in the development of Greek epic diction. Amsterdam: N.V. Noord-Hollandsche Uitgevers Maatschappij.
Hoekstra, Arie. 1969. The sub-epic stage of the formulaic tradition: Studies in the Homeric hymns to Apollo, to Aphrodite and to Demeter. Amsterdam: North-Holland Publishing Company.
Keersmaekers, Alek, Wouter Mercelis, Colin Swaelens, & Toon van Hal. 2019. Creating, enriching and valorising treebanks of Ancient Greek: The ongoing Pedalion-project. Available at https://syntaxfest.github.io/syntaxfest19/proceedings/papers/paper_68.pdf .
Keersmaekers, Alek. 2020. A computational approach to the Greek papyri: Developing a corpus to study variation and change in the post-classical Greek complementation system. PhD thesis, KU Leuven. https://lirias.kuleuven.be/retrieve/590983 .
Keydana, Götz & Silvia Luraghi. 2012. Definite referential null objects in Vedic Sanskrit and Ancient Greek. Acta Linguistica Hafniensia 44.2.116–128.
Kiparsky, Paul. 1976. Oral poetry: Some linguistic and typological considerations. Oral literature and the formula, ed. by Benjamin A. Stolz & Richard S. Shannon, 73–125. Ann Arbor: University of Michigan.
Liddell, Henry G., Robert Scott, Henry S. Jones, & Roderick McKenzie, eds. 1996. A Greek-English lexicon. Oxford: Clarendon Press.
Luraghi, Silvia. 2003. Definite referential null objects in Ancient Greek. Indogermanische Forschungen 108.169–196.
Luraghi, Silvia. 2020. Experiential verbs in Homeric Greek. Leiden: Brill.
Luraghi, Silvia & Luz Conti. 2014. The ancient Greek partitive genitive in typological perspective. Partitive cases and related categories, ed. by Silvia Luraghi & Tuomas Huumo, 443–476. Berlin: De Gruyter.
Mayrhofer, Manfred. 1980. Zur Gestaltung des etymologischen Wörterbuchs einer ‘Großcorpus-Sprache’. Vienna: Österr. Akademie der Wissenschaften, Phil-Hist. Klasse.
McGillivray, Barbara, Marco Passarotti, & Paolo Ruffolo. 2009. The Index Thomisticus Treebank project: Annotation, parsing and valency lexicon. TAL—Traitement Automatique Des Langues 50.2.103–127.
McGillivray, Barbara & Marco Passarotti. 2012. Accessing and using a corpus-driven Latin valency lexicon. Latin linguistics in the early 21st century. Acts of the 16th international colloquium on Latin linguistics, Uppsala, June 6th–11th, 2011, ed. by Gerd V.M. Haverling. Uppsala: Uppsala Universitet.
McGillivray, Barbara. 2014. Methods in Latin Computational Linguistics. Leiden: Brill.
McGillivray, Barbara. 2021. Ancient Greek valency lexicon (AGVaLex). Figshare dataset. 10.6084/m9.figshare.14316251
McGillivray, Barbara & Alessandro Vatri. 2015. Computational valency lexica for Latin and Greek in use: A case study of syntactic ambiguity. Journal of Latin Linguistics 14.1.101–126.
Montanari, Franco. 2015. The Brill dictionary of Ancient Greek. English edition, ed. by Madeleine Goh & Chad Schroeder. Leiden: Brill.
Parry, Milman. 1930. Studies in the epic technique of oral verse-making I: Homer and Homeric style. Harvard Studies in Classical Philology 41.73–148.
Parry, Milman. 1932. Studies in the epic technique of oral verse-making II: The Homeric language as the language of an oral poetry. Harvard Studies in Classical Philology 43.1–50.
Passarotti, Marco, Berta González Saavedra & Christophe Onambele. 2016. Latin Vallex. A treebank-based semantic valency lexicon for Latin. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2599–2606. Portorož, Slovenia: European Language Resources Association (ELRA).
Pavese, Carlo O. & Federico Boschetti. 2003. A complete formular analysis of the Homeric poems. Amsterdam: Hakkert.
Pavese, Carlo O. & Paolo Venti. 2000. A complete formular analysis of the Hesiodic poems: Introduction and formular edition. Amsterdam: Hakkert.
Postlethwaite, Norman. 1979. Formula and formulaic: Some evidence from the Homeric hymns. Phoenix 33.1–18.
R Core Team. 2017. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/ .
Rodda, Martina A. 2021. A corpus study of formulaic variation and linguistic productivity in early Greek epic. PhD thesis, University of Oxford. https://ora.ox.ac.uk/objects/uuid:1e682001-b916-4322-adc3-52857d93b92b/files/d2514nk879
Rodda, Martina A., Philomen Probert, & Barbara McGillivray. 2019. Vector space models of ancient Greek word meaning, and a case study on Homer. TAL—Traitement Automatique Des Langues 60.3.
Sahlgren, Magnus. 2006. The word-space model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. SICS Dissertation Series 44. Stockholm: Dept. of Linguistics, Stockholm Univ.
Sausa, Eleonora & Chiara Zanchi. 2015. Non-accusative null objects in the Homeric Dependency Treebank. Proceedings of the workshop on corpus-based research in the Humanities, ed. by Marco Passarotti, Francesco Mambrini, & Caroline Sporleder, 107–116. Warsaw: Institute of Computer Science of the Polish Academy of Sciences.
Sgall, Petr, Eva Hajičová, & Jarmila Panevová. 1986. The meaning of the sentence in its semantic and pragmatic aspects. Dordrecht: D. Reidel.
Smyth, Herbert Weir. 1926. Aeschylus, with an English translation (in two volumes). 1. Persians. Cambridge, MA. Harvard University Press.
Stefanowitsch, Anatol. 2013. Collostructional analysis. The Oxford handbook of Construction Grammar, ed. by Thomas Hoffmann & Graeme Trousdale, 290–306. Oxford: Oxford University Press.
Suttle, Laura & Adele E. Goldberg. 2011. The partial productivity of constructions as induction. Linguistics 49.1237–1269.
Tesnière, Lucien. 1969. Éléments de syntaxe structurale. 2nd edition. Paris: Klincksieck.
Untermann, Jürgen. 1983. Indogermanische Restsprachen als Gegenstand der Indogermanistik. Le lingue indoeuropee di frammentaria attestazione. Die indogermanischen Restsprachen. Atti del convegno della Societa italiana di glottologia e della Indogermanische Gesellschaft. Udine, 22–24 settembre 1981, ed. by Edoardo Vineis, 11–28. Pisa: Società Italiana di Glottologia, Giardini.
Vatri, Alessandro & Barbara McGillivray. 2018. The Diorisis Ancient Greek corpus. Research Data Journal for the Humanities and Social Sciences 3.55–65.
Wulff, Stephanie. 2008. Rethinking idiomaticity: A usage-based approach. London: Continuum International Publishing.
Zanchi, Chiara, Eleonora Sausa, & Silvia Luraghi. 2018. HoDeL, a dependency lexicon for Homeric Greek: Issues and perspectives. Formal representation and the Digital Humanities, ed. by Paola Cotticelli-Kurras & Federico Giusfredi, 221–246. Cambridge: Cambridge Scholars Publishing.
Zanchi, Chiara. 2021. The Homeric Dependency Lexicon: What it is and how to use it. Journal of Greek Linguistics 21.263–297.

Title:: Computational valency lexica and Homeric formularity

Article Type:: Research Article

DOI:: https://doi.org/10.1163/15699846-02402003

Language:: English

Pages:: 264–289

Keywords:: Ancient Greek; verb valency; computational lexicon; formula studies; semantic flexibility; Distributional Semantics

In:: Journal of Greek Linguistics

In:: Volume 24: Issue 2

Received:: 19 Jun 2023

Accepted:: 09 Apr 2024

Publisher:: Brill

E-ISSN:: 1569-9846

Print ISSN:: 1566-5844

Subjects:: Indo-European Languages, Languages and Linguistics, Greek & Latin Literature, Classical Studies
