1 Introduction
This chapter adds to the theme of lexicographic databases and language processing pipelines by illustrating how the programming framework ‘Shiny’ can help lexicographers take direct control of the digital workflows that go from raw corpora and lexical databases all the way up to the publication of online dictionaries. More generally, it contributes to the field of electronic historical lexicography by making a case for increasing the involvement of lexicographers in the process of designing and prototyping digital dictionaries and lexicographic data pipelines, and by offering practical examples of how Shiny can facilitate this involvement.
In particular, this chapter reports on how this technology has been used in the development of two historical dictionaries, A Visual Dictionary and Thesaurus of Buddhist Sanskrit and A Visual Dictionary of Tibetan Verb Valency (Lugli et al. 2019–2022 and 2021). While the discussion focusses on evaluating the advantages and limitations of Shiny in the context of these two dictionary projects, the chapter aims to introduce the potential of this framework for historical digital lexicography in general.
The chapter first briefly presents Shiny and explains how its simplicity has widened access to web development, thus spurring the creation of a variety of academic web applications. It then reports on how this programming framework has been used within the two aforementioned dictionary projects (sections 3 and 4) and proceeds to consider to what extent this technology is sufficiently mature for the publication of full-scale online dictionaries (section 5). Finally, it concludes with a short outline of possible future developments for widening the use of Shiny in lexicography (section 6).
2 Shiny and Agile Development in Academic Web Applications
Shiny is an open-source framework for developing web applications in the programming language R (Chang et al. 2021). It was first released by RStudio (now Posit, posit.co) in 2012, with a view to allowing R users to create interactive applications using only R functions, without the need to master the complex ecosystem of programming languages and technologies typically required for web development. This drastically reduces the time and skills needed to prototype and develop a digital application. It also simplifies application maintenance, as a programmer only needs to keep track of updates related to R, while the Shiny maintainers take care of keeping Shiny compatible with the fast-evolving landscape of web technologies (e.g. JavaScript, HTML and CSS frameworks).
The process of deploying and maintaining online Shiny applications was further simplified in 2015, when RStudio introduced their own web-hosting service, shinyapps.io (
In the subsequent years, Shiny has kept evolving, spurring the creation of a host of satellite R packages aimed at improving its performance and scalability (see section 3), with the result that it is today possible to develop commercial-grade applications using this technology (e.g. Zyla et al. 2022). Still, the area that has probably benefitted the most from Shiny is non-commercial academic research, where this framework has been employed to create resources as diverse as medical imaging software, gene sequencing tools and corpus analysis apps (e.g. Badgeley et al. 2019; Malagoli-Tagliazucchi and Taccioli 2020 and
While the immediate appeal of Shiny in academia is the simplification it brings to the process of developing, deploying and maintaining interactive applications, its real impact lies elsewhere. By lowering the entry barrier to web development, Shiny has enabled researchers in a variety of fields to engage directly in the online presentation of their findings and methods. This has had at least three effects.
First, it has greatly widened and democratized online dissemination of research. Given that no extra costs are involved in creating Shiny apps, an increasing number of academic projects can now afford a fairly sophisticated web presence, regardless of funding.
Second, by limiting the need for software engineers and IT professionals, Shiny significantly reduces team size and speeds up the cycle between creating new knowledge and bringing it to the intended audiences online. Combined with the previous point, this was found to have had a positive impact on data-sharing and on the adoption of advanced methodologies, especially across disciplines that rely on extensive data analysis—a domain in which the R programming language excels (Kasprzak et al. 2021).
Third, by giving researchers direct control over the development of their digital research products, Shiny fosters creative solutions uniquely tailored to the problems relevant to specialised domains; that is, problems researchers are typically better qualified to address than web developers. The ease of prototyping with Shiny also allows experts to respond quickly to peers’ feedback and emerging needs in their fields. In other words, it helps render digital research projects more ‘agile’ (for definitions of agile development, see Laanti et al. 2013, Conboy and Fitzgerald 2004: 110 and Ambers 2007). It is this last aspect of Shiny that makes it especially promising as a tool for developing historical dictionaries, an area where this technology has yet to gain traction.
3 From Professional Software Development to Shiny. The View from a Sanskrit Dictionary Project
According to a 2019 eLexis survey (Kallas et al. 2019: 529), there is a tendency in lexicographic projects to outsource software development, including the design of dictionary interfaces. The downsides of this practice are reported to include high development costs, delays, difficulties in communicating lexicographic needs to developers and, more generally, problems in bridging the knowledge gap between technical and lexicographic staff.
In 2018, the Buddhist Translators Workbench, a Sanskrit dictionary project hosted by the Mangalam Research Center (
This was intrinsically problematic, because the project was not quite traditional in nature. Its primary goal was to facilitate translation of Buddhist Sanskrit vocabulary, especially by clarifying sense relationships in polysemic words and highlighting differences between near-synonyms. With hindsight, there were no grounds to assume that classic text-based dictionary entries that combine a narrative description of headwords’ meanings with a list of examples would prove the most effective way to reach this goal.
The project workflow, too, diverged from mainstream lexicography. At the time there was no pre-processed corpus available for Buddhist Sanskrit material and the lexicographic team was therefore unable to take advantage of most of the corpus techniques and automated workflows that characterise contemporary lexicography. After a few years of slow manual work based exclusively on close reading of unprocessed Sanskrit passages, it became evident that the survival of the project depended on finding a more sustainable workflow that integrated a degree of automation (Lugli 2019). It was expected that devising an effective semi-automated workflow would require extensive trial and error. Tasking the project engineer with the amount of prototyping this would entail was not a viable option. The engineering costs associated with simply maintaining the software infrastructure that had been created so far were already straining the project resources; further prototyping would have left no budget for lexicographic work. More importantly, automating dictionary entries without a pre-processed corpus was an uncommon challenge and required creative solutions. Such solutions seemed easier to achieve if the dictionary curators, who were well acquainted with the challenges posed by the language and its lexicographic documentation, could engage in prototyping directly, without going through the cumbersome process of communicating requirements to the software engineer and waiting for his output to evaluate whether an idea worked well in practice.
Shiny made this possible. Like many linguists, the dictionary curators were conversant in R through statistical training. Thanks to Shiny, they could write an interactive application using solely R. The prospect of drastically reducing the learning curve necessary to bring prototyping directly into the hands of the lexicographic team was not the only factor behind the adoption of Shiny as the new web-development framework for this dictionary project. R and Shiny are an especially good fit for data-intensive applications, and corpus dictionaries are fundamentally such, insofar as they bring to the public corpus data and the lexicographers’ analyses of those data (Lugli 2021). Finally, contrary to other tools that enable non-programmers to create web applications (including dictionaries, e.g. Měchura 2017, see next section), Shiny is very powerful and extremely flexible. It would thus allow the dictionary curators to prototype the dictionary interface as well as the data-pipeline necessary for the automatic population of our dictionary entries, with no engineering costs.
At first the Shiny app created in this project was intended for internal use only. It started as a simple application meant to facilitate the editorial review of sets of semantically annotated Sanskrit citations prepared by lexicographers. The annotations included semantic tags that specify the sense instantiated by a headword in a citation, as well as its syntactic dependencies, conceptual relations, semantic prosody and other linguistic information. The app displayed the annotated data in the form of interactive graphs accompanied by citations. This allowed the team to visualise all the annotations related to a headword and check for their consistency and accuracy. Gradually, this initial app grew to encompass features that could also prove useful for the dictionary target audience. In the course of one year it was developed to the point of serving as a prototype for an entirely new version of the dictionary (Lugli 2019). The development was driven by cycles of experimentation and feedback gathering from a pool of prospective users of the dictionary, which helped identify a set of charts and features that would address the needs of the dictionary audience while also being easy to automate. The Shiny dictionary prototype marked a clear departure from the previous version of the dictionary insofar as it brought about a shift from manually crafted narrative entries to automated data-visualisations based on corpus annotations (Lugli 2021). While the new dictionary prototype could have been built with any web development framework, the simplicity of Shiny was instrumental to its evolution as it afforded the team freedom to experiment directly with different solutions without the costly mediation of professional developers.
Following the development of the dictionary prototype, Shiny was used to develop a dictionary curation pipeline as well, in the form of an annotation tool. Until then, the lexicographic team had used custom-made software created by the project engineer, which assisted them in annotating un-pre-processed Sanskrit sentences with the semantic and syntactic information required for the dictionary. In 2018 the project team had started creating a pre-processed corpus of Buddhist Sanskrit texts that could be used to automate part of the annotations (Lugli et al. 2019 and 2022). This corpus, however, had not been proofread and contained many segmentation and lemmatisation errors. The Shiny tool was created to address this issue by integrating the annotation workflow necessary for creating dictionary entries with systematic proofreading of the automatically lemmatised corpus (see figures 6.1 and 6.2).
This enabled us to progress on our lexicographic work and improve our corpus and corpus processing tools at the same time. Thanks to Shiny’s simplicity, all this was achieved at no extra cost to the project and with relatively little expenditure of time and effort (the first iteration of our Shiny annotation pipeline was designed, coded and tested in less than a week).
Originally, the role of Shiny in this project was planned to be limited to prototyping. It was expected that the project engineer would then build a public-facing application based on the Shiny prototype using better-established web technologies (see section 3). While this has not been discarded as a possible future solution, for the time being the Shiny prototype has been published online as a first iteration of the dictionary (
So far, this prototype has received decidedly more positive feedback than the original dictionary entries had (for the original entries, see 10.5281/zenodo.3605420). There are several reasons for this. First, by providing an interactive interface to access the corpus and visualise the semantic annotations the lexicographers encoded in it, the Shiny dictionary offers a more engaging user-experience than the original text-based entries. It also proves more versatile. Instead of offering a fixed narrative about a word, it enables researchers to explore our corpus and annotations independently from the lexicographers’ interpretation and reach their own conclusions. This is more than a simple change in entry design. It stems from a re-conceptualisation of digital dictionaries as data-centred applications, where the lexicographers’ interpretation of each corpus attestation is made explicit and associated with a data-point that the users can actively explore, critique and re-use in their own analyses. This re-conceptualisation is the fruit of a synergy between Shiny’s proclivity for data-analytics and the direct involvement of Buddhist Sanskrit experts in the process of dictionary prototyping. Like most historical sources, Buddhist Sanskrit literature poses a host of interpretative problems, ranging from philological issues and lexical ambiguity to philosophical intricacies and doctrinal nebulosity. Deep awareness of these issues steered the process of prototyping towards the creation of a resource and workflow that could lead to more transparent and reproducible lexicographic analyses (Lugli 2021).



Figure 6.1
The ‘proof-reader’ view of the Shiny annotation tool



Figure 6.2
The ‘annotator’ view of the Shiny annotation tool
On a more granular level, the success of the Shiny Sanskrit dictionary stems from offering an array of features that are uniquely attuned to its audience. Thanks to Shiny’s simplicity it is possible to respond to feedback and feature requests very rapidly, often in a matter of hours. This brought on an explosion of functionalities. Only a minority of these survived in later prototypes, and still fewer will be kept in further iterations of the dictionary. Yet, no matter how ephemeral, most of the features prototyped proved valuable insofar as they prompted fresh reflection on what information to include in the dictionary, how to gather it efficiently and how to present it to the public.
In brief, rapid prototyping impacted the design of the Sanskrit dictionary entries in three main respects. First, it helped the team gain a fresh perspective on well-established lexicographic practices and, when necessary, abandon them to match the specific needs of the project. The presentation of senses in the entries is perhaps the most salient example of this. The original entries followed the widespread practice of describing each sense narratively in a way similar to traditional dictionary definitions. This resulted in rather verbose paragraphs that were not very engaging for dictionary users to read and extremely time consuming for lexicographers to produce. The plasticity of data-visualisations in Shiny facilitated experimentation with graphical representations of semantics and led to shifting from text to data-visualisations as the primary medium of communication. Currently, tree graphs are used to display word senses. Figure 6.3 reproduces one of these graphs, for the headword prajñapti.



Figure 6.3
Tree graph representation of the semantic spectrum of the word prajñapti in the Visual Dictionary and Thesaurus of Buddhist Sanskrit
Here senses and subsenses are expressed as English cognitive equivalents, grouped by semantic domain. Specifically, from left to right, the nodes in figure 6.3 represent the semantic domains (‘Existence’ and ‘Language’), the senses (‘designating for a use’ and ‘verbal expression’) and the subsenses of the word. Each sense and subsense is accompanied by information about its dispersion in the corpus, which, depending on the user’s preference, can be viewed as text or represented visually by the size of each node in the tree-graph. By condensing complex information about semantics and dispersion in a single interactive chart, the entries have become quicker and more engaging for users to peruse and much easier to automate. Once the semantic information has been encoded in the corpus, it can simply be visualised using out-of-the-box solutions provided by R packages (e.g. Glur 2020, Wickham 2016). By contrast, automatically generating elegant text-based definitions from semantic annotations would prove quite challenging (Lugli 2021).
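The step from semantic annotations to a tree of domains, senses and subsenses can be sketched as follows. This is only an illustration in Python, not the project's R code: the record fields, the sample citations (including the subsense labels and text identifiers) and the dispersion measure, taken here as the share of sampled texts in which a node is attested, are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical annotated citations; subsense labels and text ids are invented.
SAMPLE_CITATIONS = [
    {"domain": "Language", "sense": "verbal expression", "subsense": "term", "text": "T1"},
    {"domain": "Language", "sense": "verbal expression", "subsense": "term", "text": "T2"},
    {"domain": "Language", "sense": "verbal expression", "subsense": "statement", "text": "T1"},
    {"domain": "Existence", "sense": "designating for a use", "subsense": "arrangement", "text": "T3"},
]


def build_sense_tree(citations):
    """Nest citations as domain -> sense -> subsense and attach dispersion.

    Dispersion is approximated as the share of texts in the sample in which
    a node is attested (an assumption; other measures are possible).
    """
    all_texts = {c["text"] for c in citations}
    texts_by_path = defaultdict(set)
    for c in citations:
        path = (c["domain"], c["sense"], c["subsense"])
        # Register the text at every level of the path.
        for depth in range(1, len(path) + 1):
            texts_by_path[path[:depth]].add(c["text"])

    def node(path):
        children = sorted(
            p for p in texts_by_path
            if len(p) == len(path) + 1 and p[:len(path)] == path
        )
        return {
            "name": path[-1],
            "dispersion": len(texts_by_path[path]) / len(all_texts),
            "children": [node(p) for p in children],
        }

    roots = sorted(p for p in texts_by_path if len(p) == 1)
    return [node(p) for p in roots]


sense_tree = build_sense_tree(SAMPLE_CITATIONS)
```

A nested structure of this kind is what a tree-graph widget typically consumes, with the dispersion value available for sizing each node.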
Second, rapid prototyping with Shiny has helped the team creatively address well-known problems in Sanskrit lexicography. One such problem is the uncertain and hotly debated chronology of the sources, which makes it near impossible to provide diachronic information about word attestations (Lugli 2018). To allow for diachronic study of vocabulary while also accommodating competing datings of the sources in the corpus, the dictionary includes a metadata editor, which allows users of the dictionary to supply their own periodisation of the sources and visualise the semantic annotations according to that periodisation. The same feature also allows users to modify other metadata encoded in the corpus, such as genre and traditional affiliation. Depending on the users’ feedback, the option of adding metadata not included in our corpus may be offered in the future, to facilitate the exploration of research questions that were not foreseen when designing the corpus. While the metadata editor is meant for dictionary users, the system behind it also lends some flexibility to the periodisation and diachronic analysis curated by the lexicographers. As more manuscripts emerge and the fields of Buddhist and Sanskrit studies progress, new arguments are proposed for dating texts included in the corpus. Hence, the dictionary should be prepared to adapt to an evolving consensus in matters of chronology. The Shiny dictionary and the data architecture behind it allow for easily updating our periodisation and keeping the resource abreast of developments in the field (cf. Lugli 2018).
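The idea behind the metadata editor, namely that a user-supplied periodisation overrides the curated one before the annotations are regrouped, can be illustrated with a minimal sketch. It is written in Python for illustration and does not reproduce the dictionary's actual implementation; the text identifiers, period labels and function name are hypothetical.

```python
from collections import Counter

# Hypothetical curated periodisation and sense annotations.
CURATED_PERIODS = {"T1": "early", "T2": "middle", "T3": "late"}
SENSE_ANNOTATIONS = [
    ("T1", "verbal expression"),
    ("T2", "verbal expression"),
    ("T3", "designating for a use"),
]


def senses_by_period(annotations, curated_periods, user_periods=None):
    """Count sense attestations per period.

    Entries in user_periods take precedence over the curated dating, so the
    same annotations can be viewed under competing chronologies.
    """
    periods = {**curated_periods, **(user_periods or {})}
    counts = Counter()
    for text, sense in annotations:
        counts[(periods.get(text, "undated"), sense)] += 1
    return counts


# Curated view, then a view in which the user redates T2 as early.
curated_view = senses_by_period(SENSE_ANNOTATIONS, CURATED_PERIODS)
user_view = senses_by_period(SENSE_ANNOTATIONS, CURATED_PERIODS, {"T2": "early"})
```

Because the periodisation is kept apart from the annotations, updating a date changes every downstream chart without touching the annotated corpus.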
Finally, the ease of prototyping that Shiny affords has enabled the dictionary curators to explore innovative ways to achieve the specific aims of our project. The main goal of the resource is to aid the translation of Buddhist Sanskrit vocabulary, and an important aspect of this task is to provide guidance as to the differences and similarities between the many near-synonyms that abound in Buddhist Sanskrit literature. In the absence of a clear lexicographic model to follow, it was necessary to design a feature capable of contrasting semantically close words in a way that is both easy for our users to grasp and not too laborious for lexicographers to curate. To this end, after a short bout of experimentation a thesaurus view of the corpus data was devised, simply by exploiting the semantic annotations that were already encoded in the corpus. The thesaurus section of the Shiny dictionary now features tabs devoted to finding and comparing similar words. The comparison includes contrastive examples that illustrate the use of a chosen pair of near-synonyms in similar semantic settings; a tree graph showing points of convergence and divergence in the synonyms’ semantic spectra; and a barchart detailing the proportion of sentences in which each of the selected synonyms appears with a given linguistic feature, such as with a certain connotation or in a certain grammatical case or number. Figure 6.4 illustrates this last feature. It shows how the near-synonyms nāman and nāmadheya, both of which can be translated in English as ‘name’, have almost complementary connotational profiles, with nāmadheya being used mostly in positive contexts (tagged as ‘pos’ semantic prosody and painted darker in the figure), whereas nāman is mostly used in neutral or quasi-negative contexts (tagged ‘neu’ and ‘neu-neg’ and coloured with lighter shades in the figure). This derives from nāmadheya being mostly used to proclaim the name of Buddhas, whereas nāman often occurs in passages that denounce the alleged role of language in hindering enlightenment (Lugli 2010).
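The proportions behind such a comparative barchart are simple to derive once each sentence carries a prosody tag. The sketch below, in Python for illustration, shows one way to compute them; the tagged sentences are toy data invented for the example and do not reflect the actual corpus counts, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Toy (lemma, prosody tag) pairs; NOT the project's real figures.
TOY_TAGGED_SENTENCES = [
    ("nāmadheya", "pos"), ("nāmadheya", "pos"), ("nāmadheya", "pos"), ("nāmadheya", "neu"),
    ("nāman", "neu"), ("nāman", "neu"), ("nāman", "neu-neg"), ("nāman", "pos"),
]


def prosody_profile(tagged_sentences):
    """Return, for each lemma, the proportion of sentences per prosody tag."""
    counts = defaultdict(Counter)
    for lemma, prosody in tagged_sentences:
        counts[lemma][prosody] += 1
    return {
        lemma: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for lemma, tags in counts.items()
    }


profiles = prosody_profile(TOY_TAGGED_SENTENCES)
```

Using proportions rather than raw counts keeps the two bars comparable even when one lemma is far more frequent than the other.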



Figure 6.4
Comparative barchart of the semantic prosodies of the near-synonyms nāman and nāmadheya
The ability to compare near synonyms in this way was well received, and it was suggested that the dictionary should support comparison between any pair of words in the dictionary, regardless of whether the lexicographers annotated them as near-synonyms or not. This suggestion is aligned with the project philosophy that users should be free to pursue their own lines of inquiry, so the feature was added to the dictionary, with the additional capability of also comparing the distribution of each word and word-sense across the corpus. Thanks to the simplicity of prototyping in Shiny, it was possible to introduce in Sanskrit lexicography an entirely new and rather complex lexical comparison feature that addresses the needs of the project’s target audience and does not require either additional lexicographic labour or engineering costs.
While Shiny is in no way the sole means to achieve these lexicographic solutions, the simplicity of this technology and the fact that many lexicographers may already be familiar with R through their linguistics training make it a strong candidate for rapid dictionary prototyping, especially for resources that address unique problems and require innovative features. Historical dictionaries, especially those documenting low-resource languages, are likely to fall in this category, as their corpora pose more problems than those of major contemporary languages and often face specific challenges, such as dating, spelling variants and difficulty of interpretation.
This is why Shiny was the web-development framework of choice also in the context of another historical dictionary project, a diachronic lexicon of Tibetan verb valency (bit.ly/VisualDictionary-TibetanValency). This resource presented a completely different set of challenges, but seemed equally ripe for reaping the benefits of Shiny’s rapid prototyping capabilities.
4 From Lexonomy to Shiny. The View from a Tibetan Valency Dictionary Project
The main goal of the Tibetan verb valency project was to enrich a small diachronic corpus of sentences with verb argument structure and create an online dictionary based on them. Of the two, annotation was both the more time-consuming and the greater desideratum in the field of Tibetan linguistics. It was therefore given primacy over lexicographic development, which was allocated comparatively few resources. In need of a swift and easy way to build an online dictionary out of the syntactically annotated corpus, the project team turned to Lexonomy (
Lexonomy indeed proved very easy to use. It provided an intuitive XML editor through which users could specify the structure of their own entries and preview the result. Yet, its user-friendliness came at the cost of design flexibility and control over the data pipeline, two shortcomings that eventually led to the decision to switch to Shiny instead.



Figure 6.5
An entry of our Tibetan verb valency lexicon in Lexonomy
The project team tried Lexonomy from 2017 to 2019. In that period it offered very limited design options, with no easy way to customise menus or include data-visualisations. This posed a challenge for the project. Valency is complex and difficult to communicate engagingly in a dictionary entry. The text-only template offered by Lexonomy made for rather dull entries. The dictionary microstructure was organised by argument and provided frequency information and examples for each argumentation pattern governed by a verb, divided by period. The resulting Lexonomy entries were close to lists of examples and argument patterns (see figure 6.5), with minimal interactivity in the form of links between verbs with similar argumentation patterns—hardly an engaging digital dictionary!
To achieve a more dynamic presentation within the constraints of Lexonomy might have been possible, but further work with this software was not pursued because soon another difficulty arose.
A benefit of Lexonomy is its integration with the popular corpus tool Sketch Engine, which allows users to automatically populate a Lexonomy dictionary with corpus data taken from Sketch Engine. In the time period discussed here (2017–2019), however, users could not pull data from their corpora into Lexonomy directly. The team from Sketch Engine had to take care of it, for a reasonable fee. This hindered rapid prototyping. In order to generate a dictionary draft from the annotated corpus it was necessary to provide the Sketch Engine team with precise specifications detailing exactly what corpus information had to populate which slot in the Lexonomy entry template. This is obviously good practice in client-developer relations, but it stifles experimentation and slows down progress. When a first draft of the dictionary was generated according to the agreed specifications, entries that had looked relatively good when previewed in Lexonomy with mock data turned out to be problematic with real corpus data. To address this, the project team tweaked the entry structure in Lexonomy’s XML editor and submitted a new set of requirements to the Sketch Engine team for another draft. This process could probably have continued over several iterations until a satisfactory version of the dictionary was reached. However, this was not an appealing prospect. The inevitable time lag between sending in requirements and receiving a dictionary draft, combined with the limited flexibility that Lexonomy afforded at the time, reduced the lexicographers’ ability to explore design avenues thoroughly within the timeline dictated by public funding.
Lack of flexibility was especially problematic in regard to the data-pipeline. Initially, the Lexonomy template produced for this project only relied on a syntactically annotated corpus that had already been uploaded to Sketch Engine. Generating a dictionary draft for it was straightforward, because it conformed to the default Lexonomy workflow, which required corpus data to be stored in Sketch Engine. The corpus on Sketch Engine, however, did not contain all the information necessary for the dictionary; notably, it was not aligned with an English translation and had no semantic annotation, which precluded the possibility of automatically arranging the examples by sense and providing a translation for them. Curated data was available to enable the automation of these features, but it was not readily compatible with the Lexonomy ecosystem. To make available translations of the Tibetan examples in the dictionary, a parallel corpus was to be created in Sketch Engine from a dataset of parallel sentences that had been prepared by the project team. This took some time and seemed needlessly laborious, given that the aligned dataset was already available, but it was the addition of semantic data that eventually resulted in the abandonment of Lexonomy in favour of a more flexible pipeline.
Since systematic manual semantic annotation of the entire corpus was beyond the scope of the project, semantic data from external resources had to be accessed to enable the automatic population of dictionary senses. A first layer of semantic information was to be added by incorporating word-senses from a pre-existent Tibetan dictionary (Hill 2010). This solution would quickly augment the dictionary data with senses for all headwords, but it would not connect senses to specific argument structures and examples, which was deemed desirable. To achieve this, a small sample of corpus sentences was to be annotated with semantic information for the most frequent verbs. This complicated the data-pipeline. Now a semantic annotation tool and workflow were needed to integrate the dictionary data not only with information extracted from the syntactically annotated corpus on Sketch Engine, but also with semantically annotated sentences created in a separate tool as well as with word-senses derived from a pre-existent Tibetan dictionary. Additionally, the dictionary was to include short, automated text sections to summarise a headword’s senses, diachronic distribution, arguments and multi-word expressions. This would require integrating a simple text generation feature in the dictionary data-pipeline. While none of these tasks posed special technical challenges, implementing them through Lexonomy’s pre-packaged data-flow with the assistance of the Sketch Engine team seemed excessively laborious. Assembling the data-pipeline in-house by connecting a Shiny-based semantic annotation tool and all available data to a Shiny dictionary app proved a more efficient approach.
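A simple text generation step of the kind mentioned above can amount to little more than filling a sentence template from aggregated corpus counts. The following Python sketch illustrates the principle only; the function name, the wording of the template and the sample figures are assumptions and are not taken from the dictionary.

```python
def summarise_headword(headword, period_counts, senses):
    """Build a one-paragraph summary from per-period counts and a sense list.

    The template wording is a hypothetical example, not the dictionary's own.
    """
    total = sum(period_counts.values())
    if total == 0:
        return f"{headword} is not attested in the corpus."
    # Period with the highest number of occurrences.
    top_period = max(period_counts, key=period_counts.get)
    share = round(100 * period_counts[top_period] / total)
    sense_list = "; ".join(senses) if senses else "none"
    return (
        f"{headword} occurs {total} times in the corpus, "
        f"most frequently in the {top_period} period ({share}% of occurrences). "
        f"Recorded senses: {sense_list}."
    )


# Invented counts for a placeholder headword.
summary = summarise_headword(
    "VERB", {"Old": 2, "Classical": 6, "Contemporary": 2}, ["sense A", "sense B"]
)
```

Since the summary is regenerated from the data each time, it stays consistent with the charts whenever the corpus or its annotations are updated.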
Moreover, as with the Sanskrit dictionary described in the previous section, switching to Shiny opened up the possibility of creating a more engaging dictionary interface by bringing a number of new functionalities to the dictionary. These fall within three main areas.



Figure 6.6
English search of the Visual Dictionary of Tibetan Verb Valency
First, the Shiny dictionary offers more query options to users. While the Lexonomy dictionary only allowed searches by Tibetan headword, the Shiny version includes the option to search by English equivalent as well as by argument. This extends the functionalities of the dictionary. The ‘English search’ section doubles as a quasi-thesaurus, as it groups together all the Tibetan headwords related to the English word a user searches for. It also offers some onomasiological insight, by displaying information about the relative frequency and diachronic distribution of verbs that share the same English equivalent. Figure 6.6 exemplifies this feature, showing all the Tibetan verbs in the dictionary that match the meaning of the English verb ‘to read’. It also shows a barchart representing the distribution of these verbs in different diachronic layers of the corpus (from top to bottom the periods are Old, Classical and Contemporary; in the app this information is shown interactively upon hovering on the bars). It is important to note, however, that, while useful, this information is rather tentative, as the English equivalents have been automatically derived from a pre-existent dictionary (Hill 2010) without further human editing. Similarly tentative are the links to an external English patternbank, which are provided in another quasi-thesaurus section of the dictionary, the ‘semantics’ tab, which suggests near-synonyms for each sense of a headword. While English verb valency was well beyond the scope of the project, links to the Erlangen Valency Patternbank (
Second, in the Shiny dictionary the primary vehicle of information is data-visualisation, while in Lexonomy it was text. Drawing on the design of the Sanskrit dictionary described in the previous section, a tree graph was deployed to summarise valency information (this can be found under the ‘Valency overview’ tab of the online dictionary). This graph efficiently combines a breakdown of a headword’s arguments with information about the overall frequency and diachronic distribution of both the headword and its arguments. It also allows users to group the data by either valency or period, thus opening up different avenues of inquiry (figures 6.7 and 6.8).
More information regarding each verb’s arguments is conveyed via a combination of wordclouds and tables (under the ‘Explore arguments’ tab in the online dictionary). Both are clickable and thus provide a gateway to the examples that illustrate a chosen word-argument combination, which are listed in the ‘Examples’ section of the dictionary (see figure 6.9).
Interactive barcharts are used to display the diachronic distribution of syntactic and semantic data, whereas a cluster of static visualisations portrays the frequency of a headword in different layers of the corpus and relative to other headwords (under the ‘Frequency’ tab in the online dictionary). Overall, these data-visualisations convey information much more clearly and engagingly than Lexonomy’s text entries.
Third, the Shiny dictionary has more digestible text sections. Instead of the long lists of valency patterns and examples that constituted Lexonomy entries, Shiny entries split the text sections into two different tabs. The main lexicographic information for a headword is presented in the ‘Overview’ tab. This opens with a brief, automated summary of the frequency and diachronic distribution of a headword and then displays four panes detailing its senses, arguments and multi-word expressions as well as a handful of randomly sampled examples. Figure 6.10 shows the Overview tab for the verb ston.
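How an automated summary of this kind might be generated from corpus counts can be sketched as follows. This is an illustrative Python sketch only: the function name summarise, the wording of the output sentence and the counts are assumptions made for the example, and do not reproduce the dictionary's R code or data.

```python
def summarise(headword, counts_by_period):
    """Return a one-sentence summary of frequency and diachronic distribution."""
    total = sum(counts_by_period.values())
    if total == 0:
        return f"'{headword}' is not attested in the corpus."
    # Period with the highest number of occurrences.
    peak = max(counts_by_period, key=counts_by_period.get)
    share = round(100 * counts_by_period[peak] / total)
    return (
        f"'{headword}' occurs {total} times in the corpus, "
        f"most often in the {peak} period ({share}% of occurrences)."
    )


# Invented placeholder counts for a hypothetical headword.
summary = summarise("headword_x", {"Old": 2, "Classical": 6, "Contemporary": 2})
```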






Figure 6.8
Valency tree-graph grouped by argument
All the corpus examples for a headword are accessible via the ‘Examples’ tab, where they can be filtered according to a variety of criteria, including sense (for the cases where a sample sentence has been semantically annotated), argument, conjugated form and tense. It is also possible to sort examples according to different parameters. The default sorting is based on a ranking inspired by the GDEX paradigm (Kilgarriff et al. 2008). However, to allow users to view examples according to their own needs, it is possible to sort them in several ways, including by prioritising examples that are accompanied by a translation or examples that illustrate complex syntactic constructions, which typically rank low with the default sorting setting (Lugli 2021). Figure 6.10 illustrates a view of the Examples tab for the verb ston; the menus on the right are for filtering and sorting the examples according to a variety of criteria.
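The interplay of filtering and alternative sort orders described above can be sketched in a few lines. The sketch is in Python for illustration only (the dictionary is implemented in R with Shiny); the Example fields, the numeric score standing in for a GDEX-inspired ranking, and both function names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    text: str
    score: float                       # hypothetical GDEX-style ranking score
    tense: str
    sense: Optional[str] = None        # None when not semantically annotated
    translation: Optional[str] = None  # None when no translation is available


def filter_examples(examples, **criteria):
    """Keep only the examples whose attributes match every given criterion."""
    return [
        ex for ex in examples
        if all(getattr(ex, field) == value for field, value in criteria.items())
    ]


def sort_examples(examples, prefer_translated=False):
    """Sort by descending score; optionally list translated examples first."""
    def key(ex):
        # False sorts before True, so translated examples get False here.
        untranslated = ex.translation is None if prefer_translated else False
        return (untranslated, -ex.score)
    return sorted(examples, key=key)


# Placeholder examples; the texts and scores are invented for the sketch.
EXAMPLES = [
    Example("sentence A", score=0.9, tense="past"),
    Example("sentence B", score=0.4, tense="past", translation="gloss B"),
    Example("sentence C", score=0.7, tense="present", translation="gloss C"),
]

default_order = [ex.text for ex in sort_examples(EXAMPLES)]
translated_first = [ex.text for ex in sort_examples(EXAMPLES, prefer_translated=True)]
past_only = [ex.text for ex in filter_examples(EXAMPLES, tense="past")]
```

Keeping the default score as the secondary sort key means that an alternative ordering still falls back on the default ranking within each group.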






Overall, Shiny made it possible to create an engaging online dictionary on a limited budget and in a relatively short time (about one year elapsed from first prototype to publication). It also allowed the lexicographic team to create a pipeline for the complete automation of dictionary entries that blends data derived from different sources.
As with the Buddhist Sanskrit project, the Shiny prototype of our Tibetan verb valency lexicon has been published as a first iteration of the dictionary. This iteration is best characterised as a proof of concept, because of the extremely small size of its corpus. Should a larger and suitably annotated corpus become available, this could easily be connected with the existing data-pipeline and online dictionary infrastructure, thus seamlessly yielding an expanded second iteration of this resource. To what extent the current nimble Shiny set-up would prove an appropriate vehicle for a definitive, stable version of the dictionary is an open question.
5 Viability of Shiny for Publishing Historical Digital Dictionaries
There is little doubt that Shiny is a powerful tool for prototyping web applications, including dictionaries. Whether it is also a viable choice for bringing the finished product to the public is less clear-cut. There has been some debate in recent years about the suitability of Shiny for production (that is, for releasing a product to the public as opposed to developing and testing it internally). Major issues were raised around scalability, performance, and ease of testing and debugging (see e.g. Kasprzak et al. 2021, Konrad 2020). To address these concerns, improved versions of Shiny as well as a number of R packages and extensions have been created, so that now an expanding arsenal of tools is in place to professionalize Shiny apps (e.g. Chang, Csárdi and Wickham 2021, Schloerke et al. 2021, Fay et al. 2021 and Zyla et al. 2022).
This is a welcome development, but not without drawbacks. On the one hand, more sophisticated architectures and development tools can improve the performance of Shiny applications and thus make them a better-suited medium for online resources, including dictionaries. On the other, they tend to require more specialized skills and a significant time investment, which detracts from Shiny’s simplicity and agility and may re-introduce the need for engineers and IT professionals. Moreover, it should be noted that, no matter how technically sophisticated, Shiny applications may still not perform as sleekly as resources built with established web technologies in terms of loading time, speed and latency.
This is due to a number of factors. First and foremost, even if Shiny is created for the web, most of the R infrastructure is not. This creates problems at various levels. To begin with, most of the R packages on which typical Shiny apps rely, such as ggplot2 for creating data-visualisations, are unlikely to be optimized for online performance and therefore risk slowing down the application. More importantly, R is single-threaded. This means that it can only engage in one computation at a time, which risks creating bottlenecks when multiple users are simultaneously connected to the same app. These challenges have now been successfully met and, with the right set-up, it is possible to scale Shiny apps to thousands of concurrent users (e.g. Rodziewicz 2021). The right set-up, however, is not trivial. To serve multiple users simultaneously Shiny requires a rather complex server configuration. Out-of-the-box solutions are provided as commercial services, such as Posit’s
Using Shiny involves a trade-off. It sacrifices web performance in exchange for simplifying programming. This simplification is advantageous insofar as it gives lexicographers direct control over feature development and reduces engineering costs. Projects where resources are sufficient for professional web development and the lexicographic team is unwilling to take up R programming should probably not consider Shiny as an option for their dictionaries at all. Conversely, projects where budgets are tight and lexicographers are proficient in R may benefit greatly from using this framework. In these cases, Shiny is surely a viable option for prototyping, but careful thought should be given to its role in production. At least three options are available to publish digital dictionaries prototyped in Shiny. First, lexicographers could hire a software developer to build and maintain an application based on their Shiny prototype, but using mainstream web technology. This solution is likely to prove costly both to build and to maintain, but it ensures the professional quality of the final product. Second, it is possible to optimize the code and consolidate the architecture of a Shiny app, with or without engineering assistance, and transform it from a prototype into a production-grade application. This requires a substantial time commitment, but not necessarily a financial outlay, and would almost certainly improve on the performance of the Shiny prototype. Finally, lexicographers could simply publish their Shiny prototype as is and accept its performance limitations.
Decisions in this regard should be informed not only by the afore-mentioned considerations regarding the trade-off between simplicity and performance, but also by the lexicographers’ expectations of what the use and role of the dictionary will be within its target audience and field. Two factors in particular should be taken into account: the amount of traffic that the dictionary is likely to attract, and whether it is the online dictionary or rather its data that stands to have the greater impact on the target field. If high volumes of traffic are expected, advanced architectural solutions and careful code optimisation are advisable, lest the app become unacceptably slow under the load of many concurrent users. Yet, advanced solutions and optimization will inevitably divert resources away from lexicographic curation, by allocating either budget or time (or both) to software development. In the case of many low-resource historical languages the creation of curated lexicographic data may well be more important than a sleek and fast online app, especially given that the curated data can be made available for download independently of the app. In this case publishing a good-quality prototype seems reasonable.
This is, indeed, the avenue that was pursued by both the Sanskrit and the Tibetan dictionary projects discussed here. Both resources target low-resource languages for which the creation of curated lexical data is the primary desideratum. Both resources are also highly specialised and have limited target audiences. Hence, relatively low traffic is expected. Moreover, both resources are currently still in their proof-of-concept phase, with the Sanskrit dictionary still undergoing active development and the Tibetan one awaiting a larger suitably annotated corpus to become available.
Under these circumstances, it seemed appropriate to limit the cost and effort required to share the dictionaries online, and opt for deploying them as Shiny applications through the shinyapps.io hosting service, with minimal code optimization. So far, this solution has worked well. No noticeable drop in app performance has been observed with a few concurrent users, but neither online dictionary has been tested with large numbers of simultaneous connections. Both applications are slow by contemporary standards (1.6 seconds to interactivity with an internet speed of 71MB per second, whereas most internet resources would take less than a second), but a less than ideal loading time appears a reasonable price to pay for the convenience and savings of dispensing with complex software architectures.
Also, this set-up may not be a long-term solution. For the Sanskrit dictionary, once the entry design stabilizes and the lexicographic data is completed, a software engineer could be tasked with building a final version of our dictionary using more mature web technologies, as originally planned (Lugli 2019). After a few years of prototyping in Shiny, however, achieving a ‘final’ version appears less and less desirable. It is probable that, funding permitting, this resource will keep evolving as new sources and research angles emerge in the field of Buddhist Sanskrit. Coupled with the increasing number of solutions available to improve the scalability and robustness of Shiny apps, this points to a different production strategy for this resource: a deep restructuring of the existing dictionary application, possibly with the help of engineering consultants, so that it can be redeployed in the form of a more solidly built Shiny app (e.g. following the architectures proposed by Fay et al. 2021 or Zyla et al. 2022).
Finally, the server solution for the online deployment of the dictionary will largely depend on the options that will be available at the time. Shinyapps.io is extremely convenient, but somewhat limiting. A number of features, including custom URLs and the capacity to respond to increases in traffic with a larger number of web processes, come at an extra cost. Other hosting solutions offer more competitive deals, but presently require more complex configurations that are best tackled with the assistance of IT professionals. Hopefully, simpler solutions will become available with time and make the deployment of production-grade Shiny apps as effortless as publishing Shiny prototypes.
6 Future Work
For all its simplicity, Shiny still requires coding. This creates a barrier for many lexicographers who are not familiar with programming and cannot set aside the time to learn. A way to lower this barrier would be to offer a web application that allows users to prototype a Shiny-based dictionary through a graphical user interface, thus removing the need for lexicographers to actually write code. This would make rapid dictionary prototyping with Shiny accessible to anyone, and facilitate the development of creative lexicographic solutions to respond to the unique challenges individual lexicographic endeavours may face.
Inevitably, though, such a rapid-prototyping app would also take away much of Shiny’s flexibility, as only data-visualizations and layouts envisaged by the creators of the app would be available for lexicographers to experiment with. This is a common pitfall of ready-made solutions, such as Lexonomy (see section 4 above). Yet, there is scope for a Shiny-based dictionary prototyping app to go beyond this limitation and also have a didactic effect.
In creating a Shiny dictionary, the dictionary-prototyping app would output not only an interactive dictionary application, but also a file with the code powering that application. Given the simplicity of Shiny, this code would be concise and fairly easy to interpret, especially with the help of accompanying documentation. Thus scaffolded by the code written through the app, users could be encouraged to modify the code and customise their digital dictionaries beyond the constraints of the dictionary-prototyping application. Needless to say, such a dictionary-prototyping application could itself be a Shiny app, thus taking advantage of the simplicity of this framework and allowing its developers to respond rapidly to feedback and requests from lexicographers. In a recent small-scale survey conducted at the end of a practical lexicography workshop at the State University of Sao Paulo (Brazil), all 11 respondents stated that such a dictionary-prototyping app would help their lexicographic projects. This led to a funding bid to develop such an application. As a result, thanks to a grant from the USA’s National Endowment for the Humanities, work is currently underway to prototype a Shiny-based dictionary building tool (
7 Conclusions
Shiny is an R web-development framework that greatly simplifies the creation of interactive data-centric applications. Since online corpus-based dictionaries are fundamentally data-centric applications and given that many lexicographers are familiar with R programming through statistical analyses, Shiny can offer many dictionary curators a valuable opportunity to take full control over the design and prototyping of their resources. As exemplified by the Sanskrit and Tibetan dictionary projects discussed in this chapter, lexicographers’ engagement in the prototyping process has the immediate benefit of reducing development costs and making the creation of digital dictionaries more affordable and less dependent on major funding. It also holds the potential to lead to the creation of dictionaries more finely attuned to their field’s and audience’s specific needs, because it enables rapid experimentation to address problems specific to a domain of knowledge as well as users’ feedback and newly emerging research questions. This is especially important for historical lexicography projects, as their sources often pose unique challenges that call for creative solutions, which may be difficult to find when outsourcing digital development to professionals outside of the field.
Acknowledgments
A Visual Dictionary of Tibetan Verb Valency has been funded by UKRI (AH/P004644/1), and A Visual Dictionary and Thesaurus of Buddhist Sanskrit by the Mangalam Research Center. Initial planning of a Shiny-based dictionary prototyping app was made possible by a CAPES-PRINT grant (88887.582674/2020–00); the development of a prototype for such an app has been funded by the NEH through a Digital Advancement Grant level 1 (HAA-290402-23).
References
Badgeley, M., Liu, M., Glicksberg, B., Shervey, M., Zech, J., Shameer, K., Lehar, J., Oermann, E., McConnell, M., Snyder, Th., Dudley, J. (2019). CANDI: an R package and Shiny app for annotating radiographs and evaluating computer-aided diagnosis, Bioinformatics, 35(9): 1610–1612. http://dx.doi.org/10.1093/bioinformatics/bty855
Chang, W., Cheng, J., Allaire, J.J., Xie, Y., McPherson, J. (2021). Shiny: https://CRAN.R-project.org/package=shiny.
Lugli, L. (2021). Dictionaries as collections of data stories: an alternative post-editing model for historical corpus lexicography. In Iztok Kosem, et al.
Lugli, L., Galasek-Hul, B., Quiñones, L.G. (2019). Segmented Corpus of Buddhist Sanskrit (Proof of Concept). DOI: 10.5281/zenodo.3457821, https://bit.ly/BuddhistSanskritCorpus. Accessed in February 2022.