The lambda notebook project aims to make this happen. It provides:
flexible typed metalanguage: a superset of first-order predicate logic, together with a rich type theory.
a framework for doing semantic composition on natural language syntax
an API for rendering the structured python objects in layers 1-2 in easy-to-read MathJax (web-oriented LaTeX code) and HTML.
2 Background: Two problems in formal semantics
Type-driven computation could be a lot easier to visualize and check.
Grammar fragments as in Montague Grammar: good idea in principle, hard to use in practice.
A fragment is a complete formalization of a sublanguage consisting of the key relevant phenomena for the problem at hand. (Potential problem-points italicized.)
Solution: a system for developing interactive fragments: the “Lambda Notebook” project.
Creator can work interactively with analysis – accelerate development, limit time spent on tedious details.
Reader can explore derivations in ways that are not possible in a typical paper format.
Creator and reader can be certain that derivations work, verified by the system.
Bring closer together formal semantics and computational modeling.
Inspirations and related projects:
Eijck and Unger (2010): implementation of compositional semantics in Haskell. No interface (beyond standard Haskell terminal); great if you like Haskell. There’s a rich tradition of programmer linguists writing fragments in Haskell following this book.
nltk.sem: implementation of the lambda calculus with a typed metalanguage, interface with theorem provers. No interactive interface.
Champollion, L., J. Tauberer, and M. Romero. 2007. “The Penn Lambda Calculator: Pedagogical Software for Natural Language Semantics.” In Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2007 Workshop, edited by T. Holloway King and E. M. Bender. https://www.ling.upenn.edu/~champoll/lambda-geaf.pdf.
2.1 The role of formalism & fragments in semantics
What does formal mean in semantics? What properties should a theory have?
Mathematically precise (lambda calculus, type theory, logic, model theory(?), …)
Complete (covers “all” the relevant data).
Predictive (like any scientific theory).
Consistent, or at least compatible (with itself, analyses of other phenomena, some unifying conception of the grammar).
Partee, Barbara H., and Herman L. W. Hendriks. 1997. “Montague Grammar.” In Handbook of Logic and Language, edited by Johan van Benthem and Alice G. B. ter Meulen, 5–91. Elsevier; MIT Press. https://doi.org/10.1016/B978-044481714-3/50004-7.
Paper with a fragment provides a working system. (Probably.)
Explicit outer bound for empirical coverage.
Integration with a particular theory of grammar. (To some extent.)
Explicit answer to many detailed questions not necessarily dealt with in the text.
Claim: fragments are a method of replicability, similar to a computational modeller providing their model.
To be clear, a fragment is neither necessary nor sufficient for having a good theory / analysis / paper…
Additional benefit: useful internal check for researcher.
“…But I feel strongly that it is important to try to [work with fully explicit fragments] periodically, because otherwise it is extremely easy to think that you have a solution to a problem when in fact you don’t.” (Partee 1979, p. 41)
2.2 The challenges of fragments
Part 1 of the above quote:
“It can be very frustrating to try to specify frameworks and fragments explicitly; this project has not been entirely rewarding. I would not recommend that one always work with the constraint of full explicitness.” (Ibid.)
Fragments can be tedious and time-consuming to write (not to mention hard).
Fragments as traditionally written are in practice not easy for a reader to use.
Dense/unapproachable. With exactness can come a huge chunk of hard-to-digest formalism. E.g. in Partee (1979), the fragment occupies about 10% of the paper.
Monolithic/non-modular. For the specified sublanguage, everything is specified. Outside the bounds of the sublanguage, nothing is specified. How does the theory fit in with others?
Exact opposite of the modern method – researchers typically hold most aspects of the grammar constant (implicitly) while changing a few key points. (See e.g., the introduction to Portner and Partee 2002 for discussion.)
Partee, Barbara H. 1979. “Constraining Transformational Montague Grammar: A Framework and a Fragment.” In Linguistics, Philosophy, and Montague Grammar, edited by S. Davis and M. Mithun, 51–101. University of Texas Press. https://doi.org/10.7560/746251-004.
Portner, Paul, and Barbara Partee, eds. 2002. Formal Semantics: The Essential Readings. Blackwell Publishing.
Summary: In practice, for neither the reader nor the writer of a fragment has the typical payoff exceeded the effort.
3 A solution: digital fragments
Eijck and Unger (2010)’s solution: we can (and perhaps should) specify a fragment in digital form.
Eijck, Jan van, and Christina Unger. 2010. Computational Semantics with Functional Programming. Cambridge University Press. https://doi.org/10.1017/CBO9780511778377.
They use Haskell. Type system of Haskell extremely well-suited to natural language semantics.
(Provocative statement) Interface, learning curve of Haskell not well suited to semanticists (or most people)? At a minimum: reading code is not always easy.
Benefits of digital fragments (in principle)
Interactive.
Easy to distribute, adapt, modify.
Possibility of modularity. (E.g. abstract a ‘library’ for compositional systems away from the analysis of a particular phenomenon.)
Bring the CogSci idea of a ‘computational model’ closer to the project of natural language semantics.
What sorts of things might we want in a fragment / system for fragments?
Typed lambda calculus.
Logic / logical metalanguage.
Model theory.
Framework for semantic composition.
The Lambda Notebook project aims to provide these tools in a usable, interactive format, built on top of Python.
3.1 Part 1: flexible typed metalanguage
The metalanguage infrastructure is a set of python classes that implement the building blocks of logical expressions, lambda terms, and various other formal objects, as well as complex formulas built from these pieces. This rests on an implementation of a framework for type systems that matches what semanticists tend to assume.
Preface a cell with %%lamb to enter metalanguage formulas directly. The following cell defines a variable x that has type e, and exports it to the notebook’s environment.
%%lamb reset
x = x_e
\({x}_{e}\:=\:{x}_{e}\)
x.type
\(e\)
This next cell defines some variables whose values are more complex objects – in fact, functions in the typed lambda calculus.
%%lamb
test1 = L p_t : L x_e : P_<e,t>(x) & p # based on a Partee et al example
test1b = L x_e : P_<e,t>(x) & Q_<e,t>(x)
t2 = Q_<e,t>(x_e)
These are now registered as variables in the python namespace and can be manipulated directly. A typed lambda calculus is fully implemented with all that that entails – e.g. the value of test1 includes the whole syntactic structure of the formula, its type, etc. and can be used in constructing new formulas. The following cells build a complex function-argument formula, and then perform the reduction.
Notice that beta reduction works properly, i.e. bound \(x\) in the function is renamed in order to avoid collision with the free x in the argument. (test1b is based on an example illustrating alpha conversion from Partee, Meulen, and Wall 1993.)
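The renaming step can be illustrated with a self-contained sketch. This is a toy encoding using plain Python tuples, not the lambda notebook's actual classes or algorithm: expressions are `('var', name)`, `('lam', name, body)`, or `('app', func, arg)`.

```python
def free_vars(e):
    """Collect the free variable names of an expression."""
    kind = e[0]
    if kind == 'var':
        return {e[1]}
    if kind == 'lam':
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])  # 'app' case

def fresh(name, avoid):
    """Return a variant of `name` not in `avoid`."""
    while name in avoid:
        name += "'"
    return name

def subst(e, name, val):
    """Substitute `val` for free occurrences of `name` in `e`, avoiding capture."""
    kind = e[0]
    if kind == 'var':
        return val if e[1] == name else e
    if kind == 'app':
        return ('app', subst(e[1], name, val), subst(e[2], name, val))
    bound, body = e[1], e[2]
    if bound == name:               # `name` is shadowed here; nothing to do
        return e
    if bound in free_vars(val):     # rename the bound variable to avoid capture
        new = fresh(bound, free_vars(val) | free_vars(body))
        body = subst(body, bound, ('var', new))
        bound = new
    return ('lam', bound, subst(body, name, val))

def beta(e):
    """One-step beta reduction of an application whose function is a lambda."""
    assert e[0] == 'app' and e[1][0] == 'lam'
    return subst(e[1][2], e[1][1], e[2])

# (L y : L x : y(x)) applied to a free variable x: bound x gets renamed to x'
f = ('lam', 'y', ('lam', 'x', ('app', ('var', 'y'), ('var', 'x'))))
print(beta(('app', f, ('var', 'x'))))
# → ('lam', "x'", ('app', ('var', 'x'), ('var', "x'")))
```

The key move is the `fresh`/rename step in `subst`: without it, the free `x` in the argument would be captured by the inner binder.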
Type checking of course is a part of all this. If the types don’t match, the computation will throw a TypeMismatch exception. The following cell uses python syntax to catch and print such errors.
with lamb.errors():
    test1(x) # function is type <t,<e,t>> so will trigger a type mismatch
TypeMismatch: (λ p_t: (λ x_e: (p_t & P_<e,t>(x_e))))/<t,<e,t>> and x_e conflict (Incompatible function-argument types <t,<e,t>> and e)
What is going on behind the scenes? The objects manipulated are recursively structured python objects (of class TypedExpr). Each layer of recursion stores information about the kind of metalanguage expression it is, as well as its parts.
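As a rough illustration of this kind of recursive structure, here is a toy version (hypothetical classes, far simpler than the real TypedExpr): each node stores its kind, its type, and its parts, and type checking happens when an application node is built.

```python
from dataclasses import dataclass

@dataclass
class FunType:
    """A functional type <left,right>, e.g. <e,t>."""
    left: object
    right: object
    def __repr__(self):
        return f"<{self.left},{self.right}>"

class TypeMismatch(Exception):
    pass

@dataclass
class Var:
    """A typed variable or constant, e.g. x_e or P_<e,t>."""
    name: str
    type: object

@dataclass
class App:
    """A function-argument expression; its parts are stored recursively."""
    func: object
    arg: object
    def __post_init__(self):
        # type checking is part of constructing the expression
        if not isinstance(self.func.type, FunType) or self.func.type.left != self.arg.type:
            raise TypeMismatch(f"{self.func.type} and {self.arg.type} conflict")
        self.type = self.func.type.right

p = Var('P', FunType('e', 't'))
x = Var('x', 'e')
print(App(p, x).type)   # t
```

Building `App(p, Var('q', 't'))` would raise `TypeMismatch`, analogous to the error shown above.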
Many straightforward expressions can be parsed. Most expressions are created using a call to TypedExpr.factory, which is abbreviated as “te” in the following examples. The %%lamb magic is calling this behind the scenes.
x = %te x_e
x
\({x}_{e}\)
Various convenience python operators are overloaded, including function calls. Here is an example repeated from earlier in two forms:
Building a function-argument expression builds a complex, unreduced expression. This can be explicitly reduced (note that the reduce_all() function would be used to apply reduction recursively):
catf(te("y_e")).reduce()
\({Cat}({y}_{e})\)
(catf(te("y_e")).reduce()).derivation
1. ${[\lambda{} x_{e} \: . \: {Cat}({x})]}({y}_{e})$
2. ${Cat}({y}_{e})$ (Reduction)
The metalanguage supports type polymorphism. For example, we can define a function whose input type is a type variable (X) and then combine that function with a concrete type, to force type narrowing:
%lamb ttest = L x_X : P_<X,t>(x)
%lamb tvar = y_t
ttest(tvar)
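The narrowing step can be sketched as a toy unification function. This is an illustration of the general technique only; the names and the encoding are assumptions, and the lambda notebook's actual type inference is richer.

```python
def unify(t1, t2, env=None):
    """Unify two toy types. Strings in 'XYZ' are type variables; functional
    types are pairs like ('e', 't') for <e,t>. Returns a substitution dict
    mapping variables to types, or None on failure."""
    env = dict(env or {})
    t1, t2 = env.get(t1, t1), env.get(t2, t2)
    if t1 == t2:
        return env
    for a, b in ((t1, t2), (t2, t1)):
        if isinstance(a, str) and a in 'XYZ':   # a type variable: bind it
            env[a] = b
            return env
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):                # unify component-wise
            env = unify(a, b, env)
            if env is None:
                return None
        return env
    return None  # two distinct concrete types: mismatch

# Applying L x_X : P_<X,t>(x) to y_t requires unifying <X,t> with <t,t>,
# which narrows X to t:
print(unify(('X', 't'), ('t', 't')))   # {'X': 't'}
```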
Other operators not illustrated here include set theoretic expressions, tools for building partial functions and denotations, a rich approach to “meta-meta-language”, restricted quantifiers, and more.
3.1.1 Model theory and evaluation
The lambda notebook supports model theory and evaluation of extensional formulas.
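To give a sense of what evaluation against a model involves, here is a toy sketch (this is not the lambda notebook's API; the model contents and formula encoding are invented for illustration): a model assigns extensions to constants, and evaluation recursively computes truth values.

```python
# A toy model: individuals as strings, one-place predicates as sets.
model = {
    'John':   'j',
    'Julius': 'u',
    'Cat':    {'u'},        # the set of cats
    'Gray':   {'u', 'j'},   # the set of gray things
}

def evaluate(expr, model):
    """Evaluate a tiny formula language against a model.
    Formulas: ('pred', P, term) and ('and', f1, f2)."""
    op = expr[0]
    if op == 'pred':
        # a predication is true iff the term's referent is in the extension
        return model[expr[2]] in model[expr[1]]
    if op == 'and':
        return evaluate(expr[1], model) and evaluate(expr[2], model)
    raise ValueError(f"unknown operator: {op}")

print(evaluate(('pred', 'Cat', 'Julius'), model))   # True
print(evaluate(('and', ('pred', 'Cat', 'John'),
                       ('pred', 'Gray', 'John')), model))   # False
```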
3.2 Part 2: composition systems for an object language
On top of the metalanguage are ‘composition systems’ for modeling (step-by-step) semantic composition in an object language such as English. This is the part of the lambda notebook that tracks and manipulates mappings between object language elements (words, trees, etc) and denotations in the metalanguage.
A composition system at its core consists of a set of composition rules; the following cell defines a simple composition system that will be familiar to anyone who has taken a basic course in compositional semantics. (See among others, Heim and Kratzer 1998; Coppock and Champollion 2024)
# none of this is strictly necessary, the built-in library already provides effectively this system
fa = lang.BinaryCompositionOp("FA", lang.fa_fun, reduce=True)
pm = lang.BinaryCompositionOp("PM", lang.pm_fun, commutative=True, reduce=True)
pa = lang.BinaryCompositionOp("PA", lang.pa_fun, allow_none=True)
demo_hk_system = lang.CompositionSystem(name="demo system", rules=[fa, pm, pa])
lang.set_system(demo_hk_system)
demo_hk_system
Composition system 'demo system'
Operations: {
    Binary composition rule FA, built on python function 'lamb.lang.fa_fun'
    Binary composition rule PM, built on python function 'lamb.lang.pm_fun'
    Binary composition rule PA, built on python function 'lamb.lang.pa_fun'
}
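The core idea of such a system, trying each rule on a pair of typed denotations and collecting whatever results type-check, can be sketched in a few lines. This is a hypothetical toy encoding (denotations as (formula-string, type) pairs), not the real lamb.lang implementation:

```python
def fa(a, b):
    """Function Application: apply whichever side is a function of the other.
    Functional types are pairs, e.g. ('e', 't') for <e,t>."""
    results = []
    for f, x in ((a, b), (b, a)):          # try both orders
        ftype = f[1]
        if isinstance(ftype, tuple) and ftype[0] == x[1]:
            results.append((f"{f[0]}({x[0]})", ftype[1]))
    return results

def pm(a, b):
    """Predicate Modification: both sides must be type <e,t>."""
    if a[1] == b[1] == ('e', 't'):
        # normalize conjunct order, so PM can be treated as commutative
        left, right = sorted([a[0], b[0]])
        return [(f"L x: {left}(x) & {right}(x)", ('e', 't'))]
    return []

def compose(a, b, rules=(fa, pm)):
    """Search over the available rules and collect all results."""
    return [r for rule in rules for r in rule(a, b)]

cat  = ("Cat",  ('e', 't'))
gray = ("Gray", ('e', 't'))
john = ("John", 'e')
print(compose(cat, john))   # FA applies: [('Cat(John)', 't')]
print(compose(gray, cat))   # PM applies, with conjuncts in normalized order
```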
Expressing denotations is done in a %%lamb cell, and almost always begins with lexical items. The following cell defines several lexical items that will be familiar from introductory exercises in the Heim and Kratzer (1998) textbook.
%%lamb
||cat|| = L x_e: Cat_<e,t>(x)
||gray|| = L x_e: Gray_<e,t>(x)
||john|| = John_e
||julius|| = Julius_e
||inP|| = L x_e : L y_e : In_<(e,e),t>(y, x) # `in` is a reserved word in python
||texas|| = Texas_e
||isV|| = L p_<e,t> : p # `is` is a reserved word in python
All object-language representations implement the interface lamb.lang.Composable, including lexical items as well as the complex results shown below. In type-driven mode, composition is triggered by using the ‘*’ operator on a Composable. This searches over the available composition operations in the system to see if any results can be had. Given their types, we expect inP and texas above to be able to compose using the FA rule:
Composition will find all possible paths, but in the current example there are no ambiguities. (Note: the metalanguage by default normalizes the order of conjuncts alphabetically, so the order in the output of PM is independent of what composes with what. This is why the operation is marked “commutative” when defined earlier – so the composition system knows it doesn’t need to bother with both orders.)
INFO (core): Coerced guessed type for 'Fond_t' into <e,t>, to match argument 'y_e'
INFO (core): Coerced guessed type for 'Fond_<e,t>(y_e)' into <e,t>, to match argument 'x_e'
There is support for traces and indexed pronouns, using a version of the Predicate Abstraction (PA) rule (based on the version in Coppock and Champollion 2024).
In many contexts it is natural to just use bottom-up composition, especially given the implementation of the PA rule. However, the lambda notebook supports arbitrarily-sequenced composition in tree structures, with deferred composition when bottom-up information is missing. This uses Tree objects with an interface designed after those in the nltk package. In these derivations, you can see some of the support for inference over variable types. This composition system more directly matches the classic one presented in Heim and Kratzer (1998).
Heim, Irene, and Angelika Kratzer. 1998. Semantics in Generative Grammar. Blackwell.
lang.set_system(lang.hk_system)
%%lamb
||gray|| = L x_e : Gray_<e,t>(x)
||cat|| = L x_e : Cat_<e,t>(x)
We can do composition in this tree fragment, but the leaf nodes (not being in the lexicon) are treated as placeholders. They are accordingly assigned polymorphic types.
(The type \(\forall X\) is a type that can be anything at all; this gets narrowed, by each possible composition rule that could apply, to a more specific but still potentially polymorphic type. PM in this system narrows to concrete property types, but each order of FA gives a polymorphic function.)
We can supply lexical entries to form a complete tree fragment. Composition can still happen in any order; by default it works top down. If we compose one step, the leaf node “gray” has not yet been looked up, but composing the whole tree is still possible.
Tree = lamb.utils.get_tree_class()
t = Tree("NP", ["gray", Tree("N", ["cat"])])
t
3.3 Part 3: human readable rendering of structured python objects
At this point in the demo, the output of the lambda notebook needs no introduction: it has been used at every stage of the document. The lambda notebook is designed from the ground up so that nearly every python class supports IPython/Jupyter’s rich outputs, rendering in human-readable form to a combination of LaTeX and HTML that can be displayed in Jupyter Notebooks (Kluyver et al. 2016) and on the web.
Kluyver, Thomas, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, et al. 2016. “Jupyter Notebooks – a Publishing Format for Reproducible Computational Workflows.” Edited by F. Loizides and B. Schmidt. IOS Press.
LaTeX math mode output in Jupyter is rendered via MathJax (and on google colab, via KaTeX).
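The underlying mechanism is IPython's rich-display protocol: a class can define a method such as `_repr_latex_` (or `_repr_html_`), and when an object is displayed, Jupyter prefers the returned string and renders it via MathJax instead of using the plain `repr`. A toy illustration of the protocol (hypothetical class, not the lambda notebook's actual implementation):

```python
class LambdaTerm:
    """A toy lambda term that renders itself as LaTeX in Jupyter."""
    def __init__(self, var, vartype, body):
        self.var, self.vartype, self.body = var, vartype, body

    def __repr__(self):
        # plain-text fallback, used e.g. in a terminal
        return f"(L {self.var}_{self.vartype} : {self.body})"

    def _repr_latex_(self):
        # Jupyter calls this automatically when the object is displayed,
        # and the front end renders the result with MathJax
        return rf"$\lambda {self.var}_{{{self.vartype}}} \,.\, {self.body}$"

t = LambdaTerm("x", "e", "Cat(x)")
print(repr(t))              # (L x_e : Cat(x))
print(t._repr_latex_())     # $\lambda x_{e} \,.\, Cat(x)$
```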
For example, here is a lambda expression from earlier:
pmw_test1 = %te L p_t : L x_e : P_<e,t>(x) & p
pmw_test1