August 22, 2023
This is the fifth post in a series on implementing a type checker with the Eqlog Datalog engine. In this post, we’ll extend our type system with generics, or polymorphism, and implement Hindley-Milner type inference.
You can find the other posts here:
In the last post, we completed our type checker for a simply typed language. This type checker accepts a program only if it can assign a uniquely determined, concrete type to every expression and variable in the program. But this can be overly restrictive; our type checker rejects some sensible programs. Consider this example:
function id(x) {
return x;
}
let a = id(5);
let b = id('xyz');
The problem here is a type conflict for the variable x
:
Our type checker infers that x
must have type
number
due to the call id(5)
, but that it must
also have type string
due to the call
id('xyz')
. On the other hand, if we removed the two calls
to id
, so that id
was never called, then our
type checker would reject the program because the type of x
would not be determined.
Our goal for this post is to implement polymorphism, or
generics, which will make the program above type check with
inferred polymorphic type id : (a) => a
where
a
is a type variable. At the two call sites of the
id
function, our new version of the type checker implicitly
instantiates the polymorphic type of id
by plugging in
number
and string
, respectively, for the type
variable a
.
Polymorphism makes it necessary for us to deal with type variables.
We represent these in our Eqlog program as a Type
elements
which cannot be obtained by applying one of the type constructors to
other Type
elements. Our previous checker for simple types
would report an error if it found such Type
elements after
Eqlog evaluation, but we will relax this check later on.
In addition to the monomorphic types or monotypes that we’ve
seen so far, we now also consider polymorphic types or
polytypes. Both monotypes and polytypes are given by an
underlying Type
element.
Sort GeneralType;
Func MonoType : Type -> GeneralType;
Func PolyType : Type -> GeneralType;
The difference between monotypes and polytypes lies in how we
interpret type variables occurring in the underlying Type
element:
A MonoType(ty)
is well-formed only if all type
variables occurring in ty
are part of the ambient type
context. Type contexts are sets of Type
elements that
parametrize a program fragment; more on this below. For example, monotypes occurring in top
level module scope may not contain any type variables at all, and within
the body of the id
function, the only type variable that a
monotype may refer to is the type of the argument
x
.
In PolyType(ty)
, the ty
element
represents a type scheme: We interpret type variables in
ty
that are not in the ambient type context as
bound or universally quantified. The set of bound type
variables in a polytype is often syntactically denoted like so:
id: forall a. (a) => a
The flavor of polymorphism we’re implementing here is usually called
prenex or rank 1 polymorphism. “Prenex” because types
can be (implicitly) quantified over only on the top level, i.e. in
front of the type. Thus forall a. (a) => a
is
valid, whereas
(forall a. a => a) -> (forall b. b => ())
is
invalid. And “rank 1” because our type system has rank 0 types
(monotypes) and rank 1 types (polytypes) which quantify over rank 0
types, but no more: We could also imagine rank 2 types that are allowed
to quantify over rank 1 types, rank 3 types which quantify over rank 2
types and so forth.
Since types of variables can now be polymorphic, we have to change
our VarTypeInX : X -> Type
function family to be valued
in GeneralType
:
Func VarTypeInStmts : StmtListNode -> GeneralType;
Func VarTypeInExpr : ExprNode -> GeneralType;
...
While the axioms propagating variable bindings through syntax nodes can stay the same, we have to decide for each AST node that introduces a variable binding whether the variable has monotype or polytype.
Variables introduced by function statements have polytypes.
Axiom
ConsStmtListNode(_, head, tail)
& FunctionStmtNode(head, func)
& Function(func, var, _, _ ,_)
& ty = PolyType(FunctionNodeType(func))
=>
VarTypeInStmts(var, tail) = ty
;
Variables introduced by let statements have monotypes.
Axiom
ConsStmtListNode(_, head, tail)
& LetStmtNode(head, var, ty_annot, expr)
& ty = SemanticOptType(ty_annot)
& expr_type = ExprType(expr)
& mono_expr_type = MonoType(expr_type)
=>
expr_type = ty
& VarTypeInStmts(var, tail) = mono_expr_type
;
We’re thus following Rust’s type system, where variables introduced
in let statements must have monotypes, too. TypeScript, on the other
hand, allows such variables to have polytypes. Since our let bindings
have monotypes, our expressions have monotypes as well. This allows us
to reuse most of the existing ExprType
machinery that we’ve
worked on in the last post, but with some straightforward adaptations we
could also implement TypeScript’s let bindings instead.
Function argument variables have monotypes.
Axiom
ConsArgListNode(_, var, otn, tail)
& ty = MonoType(SemanticOptType(otn))
=>
VarTypeInArgList(var, tail) = ty
;
This is a general limitation of rank 1 polymorphism: The types of function argument variables appear in the domain of a function type. So if function argument types were polytypes, then this function type would be rank 2.
Inside the argument list (and hence body) of a function literal, the function being defined has monotype.
Axiom
Function(func, var, arg, _ ,_)
& ty = MonoType(FunctionNodeType(func))
=>
VarTypeInArgList(var, arg) = ty
;
This means that recursive calls may not instantiate the function being defined with different type arguments.
Which type variables in a polytype are bound depends on the context of the polytype: If the polytype appears in the global module scope, then all type variables in it are bound. But if a polytype appears in the body of function, then only those type variables that are not already introduced by arguments of the containing function are bound. Consider the following convoluted way of implementing the identity function:
function id(x) {
function const_x(y) {
return x;
}return const_x(5);
}
Suppose that the type of x
is MonoType(a)
and that the type of y
is MonoType(b)
for type
variables a
and b
. Then the type of
id
is PolyType((a) => a)
, and since
id
appears in the global module scope, a
is
bound. The type of const_x
is
PolyType((b) => a)
, but since it appears in the body of
the id
function, so that the ambient type context contains
a
, only b
is bound.
Because of local functions such as the one above, we need to keep track of which type variables were already introduced in the body of a function, and we need to propagate this information into relevant subnodes. This is the purpose of type contexts, which we define in Eqlog as follows:
Sort TypeContext;
Pred TypeInContext : Type * TypeContext;
Pred TypesInContext : TypeList * TypeContext;
Think of TypeContext
elements as (non-extensional)
sets of Type
elements, and the TypeInContext
predicate as the membership relation. We add axioms (here omitted) so
that TypesInContext(tys, ctx)
holds if and only if every
type in the tys
type list is in ctx
.
Next we introduce total functions to associate type contexts to various syntax nodes:
Func ModuleTypeContext : ModuleNode -> TypeContext;
Func FunctionTypeContext : FunctionNode -> TypeContext;
Func StmtTypeContext : StmtNode -> TypeContext;
Func ExprTypeContext : ExprNode -> TypeContext;
...
Axiom mn : ModuleNode => ModuleTypeContext(mn)!;
Axiom fn : FunctionNode => FunctionTypeContext(fn)!;
...
The type context we associate to an AST node usually agrees with the type context of its parent node. For example, the axiom
Axiom
EqualsExprNode(expr, lhs, rhs)
& expr_ctx = ExprTypeContext(expr)
& lhs_ctx = ExprTypeContext(lhs)
& rhs_ctx = ExprTypeContext(rhs)
=>
expr_ctx = lhs_ctx
& lhs_ctx = rhs_ctx
;
propagates type contexts through equality comparisons.
Crucially, though, the type context associated to function definition literals is not the same as the type context of its parent node but an extension of it: A type context that contains all the types of the base type context it extends, but that may contain additional types. We axiomatize type context extensions as follows:
Pred ContextExtension : TypeContext * TypeContext;
Axiom
ContextExtension(base, ext)
& TypeInContext(sigma, base)
=>
TypeInContext(sigma, ext)
;
We can then enforce our constraints on the type contexts of functions as follows:
Axiom
FunctionStmtNode(stmt, func)
& ambient_ctx = StmtTypeContext(stmt)
& func_ctx = FunctionTypeContext(func)
=>
ContextExtension(ambient_ctx, func_ctx)
;
Within the scope of a function literal, we want to treat the types of function arguments as valid monotypes, so we add them to the type context of the function:
Axiom
ConsArgListNode(arg_list, _, head_ty_node, _)
& ctx = ArgListContext(arg_list)
& head_ty = SemanticOptType(head_ty_node)
=>
TypeInContext(head_ty, ctx)
;
Eqlog does not support negations, but even if it did, Datalog engines
can for fundamental reasons support negations only for rules that don’t
apply recursively. Because of this, we cannot check that a
Type
element occurring inside a polytype is a bound type
variable during Eqlog evaluation: That would require us to hypothesize
that a type element is not the result of applying a type
constructor, and that it is not in the ambient type context.
Fortunately, we will only need to check whether a type element is
unbound, i.e., that the type element is either a type variable
in the ambient type context, or that it is the result of applying a type
constructor to unbound types. To prepare for this, we close type
contexts under type constructors:
Axiom gamma: TypeContext & sigma = VoidType() => TypeInContext(sigma, gamma);
...
Axiom
kappa = FunctionType(dom, cod)
& TypesInContext(dom, gamma)
& TypeInContext(cod, gamma)
=>
TypeInContext(kappa, gamma)
;
We also want type contexts to be closed under inverses of
type constructors. This is relevant for functions with arguments whose
types involve a type variable that they are not equal to. For example,
consider the apply
function:
function apply (f, x) {
return f(x);
}
The apply
function should have type
((a) => b, a) => b
. Our rules for arguments of
functions imply that the types of f: (a) => b
and
x: a
are in the type context of the body of
apply
, but only the following rule, which closes type
contexts under inverses of the function type constructor, lets us
conclude that then also b
must be in the type context:
Axiom
TypeInContext(FunctionType(dom, cod), gamma)
=>
TypesInContext(dom, gamma)
& TypeInContext(cod, gamma)
;
With all rules about type contexts in place, we can now assume that a type element in a polytype scheme is unbound if and only if it is in the ambient type context.
Our function VarTypeInExpr : Var -> GeneralType
can
result into a polytype, whereas we still expect the function
ExprType : ExprNode -> Type
to result into a type in the
current context, i.e., into a monotype. This means that we need to
revisit our typing rules for variable usages.
In case the variable has monotype, we can continue to equate the type of the variable expression with the type of the variable:
Axiom
VariableExprNode(expr, var)
& VarTypeInExpr(var, expr) = MonoType(ty)
=>
ExprType(expr) = ty
;
However, if the variable has PolyType(ty)
, then we need
to instantiate the type scheme ty
first, i.e.,
replace the type variables in ty
by suitable types in the
current context. We introduce the following machinery for that
purpose:
Sort Instantiation;
Func Instantiate : Instantiation * Type -> Type;
Func InstantiateList : Instantiation * TypeList -> TypeList;
Each Instantiation
element represents a (partial,
non-extensional) map Type -> Type
, and the
Instantiate
function represents the application of such
maps to Type
elements. As usual, we add rules that enforce
that the InstantiateList
function is given by
Instantiate
on each element of a TypeList
.
We can now associate an Instantiation
element to every
usage of a variable with polytype and equate the expression type with an
instantiation of the polytype scheme.
Func ExprInstantiation : ExprNode -> Instantiation;
Axiom
VariableExprNode(expr, var)
& VarTypeInExpr(var, expr) = PolyType(_)
=>
ExprInstantiation(expr)!
;
Axiom
VariableExprNode(expr, var)
& VarTypeInExpr(var, expr) = PolyType(sigma)
& inst = ExprInstantiation(expr)
& expr_ty = ExprType(expr)
=>
Instantiate(inst, sigma) = expr_ty
;
The typing rules that we’ve added in the previous post constrain the expression types of usages of polymorphic variables based on where the variables appear. For example, if a variable is used in position of the function in an application expression, then the instantation of the variable’s type must be a function type. However, we haven’t encoded any constraints based on the type scheme of the variable’s polytype so far, which is our next task.
The instantiation of a function type should be a function type, the
instantiation of the string type should be the string type, and so
forth. In other words, Instantiate
should commute
with type constructors in its Type
argument:
// Instantiate commutes with the number type constructor.
Axiom
instance_number = Instantiate(_, NumberType())
=>
NumberType() = instance_number
;
// If a funnction type is instantiated, then also the domain and
// codomain are instantiated, and instantiate commutes with the
// function type constructor.
Axiom
Instantiate(inst, FunctionType(dom, cod))!
=>
InstantiateList(inst, dom)!
& Instantiate(inst, cod)!
;
Axiom
dom_instances = InstantiateList(inst, dom)
& cod_instance = Instantiate(inst, cod)
& func_instance = Instantiate(inst, FunctionType(dom, cod))
=>
FunctionType(dom_instances, cod_instance) = func_instance
;
...
Note that commutativity of the Instantiate(inst, -)
operation with type constructors for fixed inst
implies
that the operation is uniquely determined by its value on type
variables.
Unbound type variables cannot be freely instantiated because they are
fixed by the ambient type context of the polytype already. In case of
the convoluted identity function above, this means that every
instantiation of const_x : (b) => a
must map
a
to itself.
Since we’re not allowing higher-ranked types, we don’t allow returning values with polytypes from functions; even if we return a variable with polymorphic type, our type checker implicitly instantiates the variable’s type scheme into a monotype first. As a consequence, polytypes cannot escape the ambient type context in which they were defined: The type context of a usage site of a variable with polytype is always an iterated extension of the ambient type context of the where the polytype was defined. It follows that at every usage site of a polytype, a type element occuring in the polytype scheme is unbound if and only if it is in the ambient type context of a usage site.
Func InstantiationTarget : Instantiation -> TypeContext;
Axiom
VariableExprNode(expr, var)
& VarTypeInExpr(var, expr) = PolyType(_)
& instance = ExprInstantiation(expr)
& ctx = ExprTypeContext(expr)
=>
InstantiationTarget(instance) = ctx
;
Axiom
sigma_instance = Instantiate(instance, sigma)
& TypeInContext(sigma, InstantiationTarget(instance))
=>
sigma = sigma_instance
;
Our introduction of polymorphism makes it necessary to revisit two error categories: Undetermined type errors and conflicting type constraints.
Recall that in the last post we introduced a predicate
Pred DeterminedType : Type;
and added rules so that DeterminedType(ty)
holds if and
only if ty
does not contain any type variables. If our type
checker found a Type
element that did not satisfy
DeterminedType
after Eqlog evaluation, it would report an
error due to the undetermined type.
Now that we’ve added support for polymorphism, this check is overly restrictive, since type variables can now appear in polytypes and the definitions of polymorphic functions. However, we cannot remove the check entirely. Consider the following program:
function absurd() {
return absurd();
}
let x = absurd();
The type of absurd
is () => b
, so its
return type is entirely unrestricted. This means that also the type of
x
is undetermined (i.e., a type variable), but we require
that all let bindings must have monotype.
To adapt the DeterminedType
predicate to polymorphism,
we add the following rules mirroring our rules for type contexts: The
argument and return types of functions are always determined, and
determined types are closed also under inverses of type constructors
(for cases such as the apply
function above).
Axiom
ConsArgListNode(_, _, arg_type, _)
& ty = SemanticOptType(arg_type)
=>
DeterminedType(ty)
;
Axiom
Function(_, _, _, ret_type, _)
& ty = SemanticOptType(cod_type)
=>
DeterminedType(ty)
;
Axiom
DeterminedType(FunctionType(dom, cod))
=>
DeterminedTypes(dom)
& DeterminedType(cod)
;
Consider the following function definiton:
function foo (x) {
x(x);
}
Our type checker infers that x
must be have function
type (a) => b
. Since x
is applied to
itself, we infer a type equality a = (a) => b
. Every
instantiation of foo
in which the argument type is a
concrete type (i.e., it does not contain type variables) would result
into a type conflict. But if foo
is never used, then our
type checker will not report an error, because our rules for type
conflicts fire only for concrete types, not for type variables.
To detect errors such as the foo
function, we introduce
a predicate SmallerType : Type * Type
such that
SmallerType(sigma, tau)
holds if and only if
sigma
is structurally strictly smaller than
tau
.
// SmallerType is transitive.
Axiom
SmallerType(sigma, tau)
& SmallerType(tau, kappa)
=>
SmallerType(sigma, kappa)
;
// A function type is structurally greater than all of its domain types and
// its codomain type.
Axiom
kappa = FunctionType(sigmas, tau)
=>
SmallerTypes(sigmas, kappa)
& SmallerType(tau, kappa)
;
Here SmallerTypes : TypeList * Type
is a predicate that
holds if and only if each type in the first argument is a
SmallerType
than the second argument.
Now consider again the example of the function foo
above. Our rules imply that SmallerType(a, (a) => b)
holds, hence due to the type equality a = (a) => b
, we
have SmallerType(a, a)
. Thus, the rule
Axiom SmallerType(sigma, sigma) => ConflictingTypes();
lets us detect this and similar errors.