Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

New Resource Released

less than 1 minute read

Published:

I compiled and released an original-language corpus of the Greek New Testament on the Hugging Face Hub with word-level grammatical annotations, English glosses, and manuscript attestation. You can see it here.

Paper Accepted

less than 1 minute read

Published:

Our paper ‘Your Large Language Models are Leaving Fingerprints’ was accepted at the ‘Detecting AI-Generated Content’ Workshop at COLING 2025.

Invited Talk

less than 1 minute read

Published:

‘Will AI Change the World For Good?’ delivered at the FEUER Academic Speakers’ Network.

Invited Talk

less than 1 minute read

Published:

‘AI Innovation and the Limits of Humanity’ delivered as part of the Life in the Round speaker series at the historic Round Church in Cambridge. ~200 in attendance.

portfolio

publications

A Source-Criticism Debiasing Method for GloVe Embeddings

Published in ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI, 2021

It is well-documented that word embeddings trained on large public corpora consistently exhibit known human social biases. Although many methods for debiasing exist, almost all fixate on completely eliminating biased information from the embeddings and often diminish training set size in the process. In this paper, we present a simple yet effective method for debiasing GloVe word embeddings (Pennington et al., 2014) which works by incorporating explicit information about training set bias rather than removing biased data outright. Our method runs quickly and efficiently with the help of a fast bias gradient approximation method from Brunet et al. (2019). As our approach is akin to the notion of ‘source criticism’ in the humanities, we term our method Source-Critical GloVe (SC-GloVe). We show that SC-GloVe reduces the effect size on Word Embedding Association Test (WEAT) sets without sacrificing training data or TOP-1 performance.

Recommended citation: McGovern, Hope. (2021). "A Source-Criticism Debiasing Method for GloVe Embeddings." ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI. 1(1). https://arxiv.org/pdf/2106.13382.pdf
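The abstract above reports results as WEAT effect sizes. As a rough illustration of what that quantity measures, here is a minimal sketch of the standard WEAT effect-size computation. It is not the SC-GloVe method itself, the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since implementations differ on this point.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w to attribute set A minus that to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, divided by the
    standard deviation of the association over all target words.
    Assumption: sample standard deviation (ddof=1)."""
    s_x = np.array([association(x, A, B) for x in X])
    s_y = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([s_x, s_y])
    return float((s_x.mean() - s_y.mean()) / pooled.std(ddof=1))

# Toy 2-d "embeddings" (made up for illustration, not real GloVe vectors).
X = np.array([[1.0, 0.0], [0.9, 0.1]])   # target set 1
Y = np.array([[0.0, 1.0], [0.1, 0.9]])   # target set 2
A = np.array([[1.0, 0.0]])               # attribute set 1
B = np.array([[0.0, 1.0]])               # attribute set 2

effect = weat_effect_size(X, Y, A, B)
print(effect)
```

A positive value means the words in X sit closer to A (relative to B) than the words in Y do, and a debiasing method aims to move this value toward zero.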

Deep Learning Facilitates Alignment of Coordinate-Targeted Superresolution Microscopes

Published in Focus On Microscopy, 2021

In STED microscopy, an additional laser is used to deplete fluorescence and limit emission to a tightly confined, subdiffraction-sized volume around an intensity minimum. In theory, achievable resolution is unlimited and scales with the intensity of the depletion laser. In practice, aberrations, misalignments and scattering deflect light into the intensity minima, deplete the signal and deteriorate the signal-to-noise ratio [1]. STED microscopes are re-aligned regularly to maintain performance, which is a time-consuming task requiring an experienced expert. Recently, machine learning has been successfully combined with microscopy, e.g. for image processing or aberration correction [2]. Here, we demonstrate a neural net capable of recognizing and correcting common misalignments and aberrations. A training pair consists of (1) a weighted combination of Zernike polynomials and (2) images of the aberrated PSF. In contrast to [3], we create our training data in silico using vector diffraction theory [4]. By using all three orthogonal cross-sections of the PSF, we achieve better correction than the study in [3]. Our workflow can be adapted to other intensity distributions simply by replacing the vortex pattern used for training data generation.

Recommended citation: Jahr, Wiebke. McGovern, Hope. Danzl, Johann Georg (2020). "Deep Learning Facilitates Alignment of Coordinate-Targeted Superresolution Microscopes." Focus On Microscopy 2021. https://www.focusonmicroscopy.org/past/2021/PDF/1081_Jahr.pdf

Detecting Narrative Patterns in Biblical Hebrew and Greek

Published in Proceedings of the 1st Workshop on Machine Learning for Ancient Languages, 2024

We present a novel approach to extracting recurring narrative patterns, or type-scenes, in Biblical Hebrew and Biblical Greek with an information retrieval network. We use cross-references to train an encoder model to create similar representations for verses linked by a cross-reference. We then query our trained model with phrases informed by humanities scholarship and designed to elicit particular kinds of narrative scenes. Our models can surface relevant instances in the top-10 ranked candidates in many cases. Through manual error analysis and discussion, we address the limitations and challenges inherent in our approach. Our findings contribute to the field of Biblical scholarship by offering a new perspective on narrative analysis within ancient texts, and to computational modeling of narrative with a genre-agnostic approach for pattern-finding in long, literary texts.

Download here
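The abstract above describes querying a trained encoder and checking whether relevant verses appear among the top-10 ranked candidates. The ranking step can be sketched as a cosine-similarity search over verse embeddings. This is only an illustrative sketch: the function name `top_k` and the toy vectors are hypothetical, and the encoder itself (trained on cross-references in the paper) is assumed to have already produced the embeddings.

```python
import numpy as np

def top_k(query_vec, verse_vecs, k=10):
    """Rank verses by cosine similarity to the query embedding.
    Returns a list of (verse_index, score) pairs, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    V = verse_vecs / np.linalg.norm(verse_vecs, axis=1, keepdims=True)
    scores = V @ q                      # cosine similarity per verse
    order = np.argsort(-scores)[:k]     # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]

# Toy embeddings standing in for encoder outputs (made up for illustration).
verse_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_vec = np.array([1.0, 0.1])

ranked = top_k(query_vec, verse_vecs, k=2)
print(ranked)
```

In practice the verse embeddings would be computed once and cached, so each new query costs a single matrix-vector product.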

Computational Discovery of Chiasmus in Ancient Religious Text

Published in ArXiv, 2024

Chiasmus, a debated literary device in Biblical texts, has captivated mystics while sparking ongoing scholarly discussion. In this paper, we introduce the first computational approach to systematically detect chiasmus within Biblical passages. Our method leverages neural embeddings to capture lexical and semantic patterns associated with chiasmus, applied at multiple levels of textual granularity (half-verses, verses). We also involve expert annotators to review a subset of the detected patterns. Despite its computational efficiency, our method achieves robust results, with high inter-annotator agreement and system accuracy of 0.80 at the verse level and 0.60 at the half-verse level. We further provide a qualitative analysis of the distribution of detected chiasmi, along with selected examples that highlight the effectiveness of our approach.

Download here

Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces

Published in ArXiv, 2024

Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both full of intertextual references and are highly translated works. We provide a metric to characterize intertextuality at the corpus level and provide a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We go on to provide qualitative analysis of cases wherein human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. This provides support for established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.

Download here

talks

teaching
