Publications

Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces

Published in arXiv, 2024

Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both full of intertextual references and are highly translated works. We provide a metric to characterize intertextuality at the corpus level and provide a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We go on to provide qualitative analysis of cases wherein human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. This provides support for established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.

Computational Discovery of Chiasmus in Ancient Religious Text

Published in arXiv, 2024

Chiasmus, a debated literary device in Biblical texts, has captivated mystics while sparking ongoing scholarly discussion. In this paper, we introduce the first computational approach to systematically detect chiasmus within Biblical passages. Our method leverages neural embeddings to capture lexical and semantic patterns associated with chiasmus, applied at multiple levels of textual granularity (half-verses, verses). We also involve expert annotators to review a subset of the detected patterns. Despite its computational efficiency, our method achieves robust results, with high inter-annotator agreement and system accuracy of 0.80 at the verse level and 0.60 at the half-verse level. We further provide a qualitative analysis of the distribution of detected chiasmi, along with selected examples that highlight the effectiveness of our approach.
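
As a rough illustration of how embeddings can expose a chiastic (A B B' A') arrangement, the sketch below scores a window of text units by the mean cosine similarity of its mirrored pairs. The scoring rule, the function names and the toy vectors are assumptions made for illustration; the abstract does not specify the scoring function or the embedding model actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def chiasmus_score(window):
    """Mean similarity of mirrored pairs (first/last, second/second-to-last, ...).

    `window` is a list of embedding vectors, one per text unit
    (e.g. a verse or half-verse). Hypothetical scoring rule, not the
    paper's: a high score means the outer units resemble each other
    and the inner units resemble each other, as in A B B' A'.
    """
    n = len(window)
    if n < 2:
        raise ValueError("need at least two units to form a mirrored pair")
    pairs = [(i, n - 1 - i) for i in range(n // 2)]
    return sum(cosine(window[i], window[j]) for i, j in pairs) / len(pairs)


# Toy 2-D "embeddings" standing in for real sentence vectors.
a, b = [1.0, 0.0], [0.0, 1.0]
mirrored = chiasmus_score([a, b, b, a])   # A B B A arrangement
parallel = chiasmus_score([a, a, b, b])   # A A B B arrangement
```

In this toy setup the mirrored window scores higher than the parallel one; with real embeddings a threshold or a comparison against non-chiastic baselines would be needed to decide what counts as a detection.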

Detecting Narrative Patterns in Biblical Hebrew and Greek

Published in Proceedings of the 1st Workshop on Machine Learning for Ancient Languages, 2024

We present a novel approach to extracting recurring narrative patterns, or type-scenes, in Biblical Hebrew and Biblical Greek with an information retrieval network. We use cross-references to train an encoder model to create similar representations for verses linked by a cross-reference. We then query our trained model with phrases informed by humanities scholarship and designed to elicit particular kinds of narrative scenes. Our models can surface relevant instances in the top-10 ranked candidates in many cases. Through manual error analysis and discussion, we address the limitations and challenges inherent in our approach. Our findings contribute to the field of Biblical scholarship by offering a new perspective on narrative analysis within ancient texts, and to computational modeling of narrative with a genre-agnostic approach for pattern-finding in long, literary texts.

Deep Learning Facilitates Alignment of Coordinate-Targeted Superresolution Microscopes

Published in Focus On Microscopy, 2021

In STED microscopy, an additional laser is used to deplete fluorescence and limit emission to a tightly confined, subdiffraction-sized volume around an intensity minimum. In theory, achievable resolution is unlimited and scales with the intensity of the depletion laser. In practice, aberrations, misalignments and scattering deflect light into the intensity minima, deplete the signal and degrade the signal-to-noise ratio [1]. STED microscopes are realigned regularly to maintain performance, a time-consuming task that requires an experienced expert. Recently, machine learning has been successfully combined with microscopy, e.g. for image processing or aberration correction [2]. Here, we demonstrate a neural net capable of recognizing and correcting common misalignments and aberrations. A training pair consists of (1) a weighted combination of Zernike polynomials and (2) images of the aberrated PSF. In contrast to [3], we create our training data in silico using vector diffraction theory [4]. By using all three orthogonal cross-sections of the PSF, we achieve better correction than [3]. Our workflow can be adapted to other intensity distributions simply by replacing the vortex pattern used for training data generation.

Recommended citation: Jahr, Wiebke; McGovern, Hope; Danzl, Johann Georg (2021). "Deep Learning Facilitates Alignment of Coordinate-Targeted Superresolution Microscopes." Focus On Microscopy. https://www.focusonmicroscopy.org/past/2021/PDF/1081_Jahr.pdf
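
The label half of a training pair, a weighted combination of Zernike polynomials, can be sketched as follows. This is a minimal illustration, assuming unnormalized Zernike polynomials indexed by (n, m) and uniformly sampled weights; the choice of modes, the normalization, the weight range and all function names are assumptions not stated in the abstract. The vector-diffraction simulation of the aberrated PSF is not covered.

```python
import math
import random


def zernike_radial(n, m, rho):
    """Radial part R_n^|m|(rho) of a Zernike polynomial (standard closed form)."""
    m = abs(m)
    if (n - m) % 2:
        return 0.0  # R is identically zero when n - |m| is odd
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + m) // 2 - k)
                    * math.factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike(n, m, rho, theta):
    """Unnormalized Zernike polynomial: cosine terms for m >= 0, sine terms for m < 0."""
    radial = zernike_radial(n, m, rho)
    if m >= 0:
        return radial * math.cos(m * theta)
    return radial * math.sin(-m * theta)


def aberration_phase(weights, rho, theta):
    """Pupil phase at (rho, theta) as a weighted sum of Zernike modes.

    `weights` maps (n, m) index pairs to coefficients.
    """
    return sum(w * zernike(n, m, rho, theta) for (n, m), w in weights.items())


def sample_weights(modes, scale, rng):
    """Draw one random coefficient per mode, uniform in [-scale, scale] (assumed range)."""
    return {mode: rng.uniform(-scale, scale) for mode in modes}


# Hypothetical mode set: tip, tilt, defocus, two astigmatisms, two comas, spherical.
MODES = [(1, 1), (1, -1), (2, 0), (2, 2), (2, -2), (3, 1), (3, -1), (4, 0)]
label = sample_weights(MODES, scale=1.0, rng=random.Random(0))
phase_at_edge = aberration_phase(label, rho=1.0, theta=0.0)
```

Evaluating `aberration_phase` on a grid over the unit disk gives the phase mask that a simulator would apply in the pupil plane before computing the PSF cross-sections.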

A Source-Criticism Debiasing Method for GloVe Embeddings

Published in ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI, 2021

It is well-documented that word embeddings trained on large public corpora consistently exhibit known human social biases. Although many methods for debiasing exist, almost all fixate on completely eliminating biased information from the embeddings and often diminish training set size in the process. In this paper, we present a simple yet effective method for debiasing GloVe word embeddings (Pennington et al., 2014) which works by incorporating explicit information about training set bias rather than removing biased data outright. Our method runs quickly and efficiently with the help of a fast bias gradient approximation method from Brunet et al. (2019). As our approach is akin to the notion of ‘source criticism’ in the humanities, we term our method Source-Critical GloVe (SC-GloVe). We show that SC-GloVe reduces the effect size on Word Embedding Association Test (WEAT) sets without sacrificing training data or TOP-1 performance.

Recommended citation: McGovern, Hope. (2021). "A Source-Criticism Debiasing Method for GloVe Embeddings." ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI. 1(1). https://arxiv.org/pdf/2106.13382.pdf
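
For readers unfamiliar with the WEAT effect size mentioned in the abstract, the sketch below computes it in its commonly used form: the difference between the mean associations of two target sets with two attribute sets, divided by the standard deviation of the associations over all target words. The toy vectors and function names are illustrative assumptions, and the use of the sample standard deviation is a choice made here; this is not the SC-GloVe method itself, only the evaluation statistic.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Effect size between target sets X, Y and attribute sets A, B.

    Each argument is a list of word vectors. Uses the sample standard
    deviation over all target words (an assumption of this sketch).
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy 2-D vectors standing in for real word embeddings.
X = [[1.0, 0.0], [0.9, 0.1]]   # targets leaning towards attribute set A
Y = [[0.0, 1.0], [0.1, 0.9]]   # targets leaning towards attribute set B
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
d = weat_effect_size(X, Y, A, B)
```

A debiasing method that succeeds on this statistic would move `d` towards zero when the same word sets are re-evaluated on the debiased embeddings.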