August 06, 2023
This is the fourth post in a series on implementing a type checker with the Eqlog Datalog engine. In this post, we impose constraints on types of expressions and functions. By the end, we’ll have a full type checker for simple types. You can find the other posts here:
As usual, the code we discuss in this post is available as a branch in the associated repository.
In the previous post on the Type
sort we introduced the
following:
BooleanType : Type
and
FunctionType : TypeList -> Type
.ConflictingTypes
predicate if we equate types that cannot be equal, for example if
BooleanType() = NumberType()
.Type
elements to various
AST nodes, in particular the ExprType : ExprNode -> Type
and FunctionNodeType : FunctionNode -> Type
functions.
As of now, there are no laws these functions: The output of these
functions is always an opaque Type
element.Our main task for this post is to impose constraints on the
ExprType
and FunctionNodeType
functions.
We begin with the constraints on the ExprType
function.
Here are the rules for constants:
Axiom FalseExprNode(expr) & et = ExprType(expr) => et = BooleanType();
Axiom TrueExprNode(expr) & et = ExprType(expr) => et = BooleanType();
Axiom
StringLiteralExprNode(expr, _)
& et = ExprType(expr)
=>
et = StringType()
;
// ...
And this is the rule for equality comparisons:
Axiom
EqualsExprNode(eq, lhs, rhs)
& eq_type = ExprType(eq)
& lhs_type = ExprType(lhs)
& rhs_type = ExprType(rhs)
=>
eq_type = BooleanType()
& lhs_type = rhs_type
;
Thus, if eq
is the expression lhs == rhs
,
then eq
is of boolean
type, and the types of
lhs
and rhs
agree.
Note that we do not “check”, “verify” or even “demand” that the types
of lhs
and rhs
agree — we simply assert that
the types are equal. So what happens if there is an equality
expression such as true == "asdf"
where the types of the
two operands do not match? As usual, Eqlog will enforce that our rules
hold and thus equate the types of true
and
"asdf"
. This means that now
BooleanType() == StringType()
holds, which will make the
rule
Axiom BooleanType() == StringType() => TypeConflict();
fire, so that TypeConflict
holds. Our Rust glue code
then finds that the TypeConflict
predicate is populated
after evaluation and reports an error.
Recall the VarTypeInExpr : Var * ExprNode -> Type
function: VarTypeInExpr(var, expr)
is defined if and only
if there is a variable in scope for expr
, and if so the
result of VarTypeInExpr
is the type of the variable. With
this function at hand, the typing rule of variable usage is as
follows:
Axiom
VariableExprNode(expr, var)
& VarTypeInExpr(var, expr) = sigma
=>
ExprType(expr) = sigma
;
Note that it is important that we only hypothesize, but not assert,
that VarTypeInExpr
is defined on var
and
expr
. If we moved the VarTypeInExpr
atom from
the premise to the conclusion, then every usage of a variable would
introduce a variable binding, which would make it impossible to
detect usages of undeclared variables.
The most complex axiom governing ExprType
is the rule
for function applications. If AppExprNode(expr, func, args)
holds, then expr
is an expression of the form
func(args)
. In this situation func
must be a
function type with domain and codomain matching the types of
args
and expr
. Here it comes in handy that
we’ve encoded injectivity of FunctionType
by adding inverse
functions DomainType
and CodomainType
: If
either of those functions is defined on a type kappa
, then
also the other function is defined, and
kappa = FunctionType(DomainType(kappa), CodomainType(kappa))
.
We can thus encode typing constraints arising from function application
as follows:
Axiom
AppExprNode(expr, func, args)
& res_ty = ExprType(expr)
& func_ty = ExprType(func)
& arg_tys = ExprTypes(args)
=>
arg_tys = DomainTypes(func_ty)
& res_ty = CodomainType(func_ty)
;
Some of the constraints on expressions also arise from usage in
statements. For example, the expression used in the condition for a
while
loop or an if
statement must be
Boolean.
Next we consider the constraints on
FunctionNodeType : FunctionNode -> Type
. To recap, the
FunctionNode
sort represents function literals, for example
this source code fragment:
function foo(x: number, y): boolean {
return x == y;
}
Subnodes of the function literals are available via the
Function
predicate:
Pred Function : FunctionNode * Var * ArgListNode * OptTypeNode * StmtListNode;
// Function(func_node, name, args, result_type, body)
Our first constraint on FunctionNodeType
is that it is
always a FunctionType
, which is equivalent to it having
DomainTypes
and a CodomainType
:
Axiom kappa = FunctionNodeType(_) => DomainTypes(kappa)! & CodomainType(kappa)!;
The domain type of a function literal is given by the types of its arguments:
Axiom
Function(func, _, args, _, _)
& dom = DomainTypes(FunctionNodeType(func))
=>
SemanticArgTypes(args) = dom
;
Recall from the last post that
SemanticArgTypes : ArgListNode -> TypeList
is given by
mapping the SemanticOptType
function on the type
annotations of each argument. In that post we also added a rule to
equate the SemanticOptType
of a argument variable
declaration with the VarTypeInArgList
for that
variable:
// Every function argument introduces a variable.
Axiom
ConsArgListNode(_, var, ty_annot, tail)
& ty = SemanticOptType(ty_annot)
=>
VarTypeInArgList(var, tail) = ty
;
Since we propagate the result of the VarTypeInArgList
function into the function body, this means that the type of an argument
as given by FunctionNodeType
is tied to the type of the
argument variable inside the function body.
The return type of a function should match an explicit return type annotation:
Axiom
Function(func, _, _, cod_annot, _)
& func_ty = FunctionNodeType(func)
& cod_ty = SemanticOptType(cod_annot)
=>
CodomainType(func_ty) = cod_ty
;
We need rules that tie the return type of a function to the type of expressions that are returned from the function body. For example, we should detect that the return type of the function
function foo () { return 5; }
is number
even though the return type annotation is
missing. If a function body contains multiple return statements with
expressions of different types, then we should report a type conflict
error, for example here:
function bar (c) {
if (c) {
return 5;
else {
} return 'xyz';
} }
The body of a function is a StmtListNode
element, so we
introduce a ReturnsType : StmtListNode -> Type
function
for this analysis: We should have
ReturnsType(stmts) = sigma
if and only if one of the
(sub)statements in stmts
is an explicit return statement
with an expression of type sigma
. The functionality axiom
for ReturnsType
enforces that there can be at most one such
type sigma
. This means that conflicting types in return
statements surface the same way as for expressions: They lead to an
equality such as BooleanType() == NumberType()
, which will
then trigger one of our axioms that populates the
ConflictingTypes
predicate.
Our AST uses different nodes for return statements with or without an expression:
return 5; // Returns the expression `5`.
return; // Equivalent to `return ();`.
The axiom relating ReturnsTypes
to return statements
with expressions is
Axiom
ConsStmtListNode(stmts, head, _)
& ReturnStmtNode(head, return_value)
& et = ExprType(return_value)
=>
ReturnsType(stmts) = et
;
and the axiom for return statements without expressions is similar.
We also need axioms that propagate ReturnsType
through
statement nodes:
// If the tail of a statement list that can return a type, then the full
// statement list can also return that type.
Axiom
ConsStmtListNode(stmts, _, tail)
& rt = ReturnsType(tail)
=>
ReturnsType(stmts) = rt
;
// An if statement can return a type if at least one of its two branches
// returns that type.
Axiom
ConsStmtListNode(stmts, head, _)
& IfStmtNode(head, _, true_branch, _)
& rt = ReturnsType(true_branch)
=>
ReturnsType(stmts) = rt
;
Axiom
ConsStmtListNode(stmts, head, _)
& IfStmtNode(head, _, _, false_branch)
& rt = ReturnsType(false_branch)
=>
ReturnsType(stmts) = rt
;
// ...
We can now enforce that the return type of a function matches the type of returned expressions like so:
Axiom
Function(func, _, _, _, body)
& ft = FunctionNodeType(func)
& rt = ReturnsType(body)
=>
rt = CodomainType(ft)
;
void
returnIf control flow reaches the end of a function body, then the function
implicitly returns ()
, so the return type of the function
must be void
. Unfortunately, as with most questions
concerning control flow, it is undecidable whether this rule applies to
a given function. But we can over-approximate an answer: There will be
cases where our analysis believes that control flow can reach the end of
a function when that is not actually the case. Our simple heuristic is
based on the following rules:
if
statement if it can
flow through at least one of the two branches.In particular, we don’t attempt to analyze branching conditions to infer whether some branch is unreachable.
Our analysis works with the
Pred CanProceedStmt : StmtNode;
Pred CanProceedStmts : StmtListNode;
predicates: We should have CanProceedStmts(stmt)
if
control can flow past stmt
, and
CanProceedStmts(stmts)
if control can flow past all the
statements in stmts
:
Axiom LetStmtNode(stmt, _, _, _) => CanProceedStmt(stmt);
Axiom ExprStmtNode(stmt, _) => CanProceedStmt(stmt);
// ... and so forth for most other statement nodes but not return statements.
Axiom
IfStmtNode(stmt, _, true_branch, _)
& CanProceedStmts(true_branch)
=>
CanProceedStmt(stmt)
;
// ... and similarly for `false_branch`.
Finally, we enforce an implicit void
return type on a
function if control can flow past its body:
Axiom
Function(func, _, _, _, body)
& CanProceedStmts(body)
& cod = CodomainType(FunctionNodeType(func))
=>
cod = VoidType()
;
Our type checker is now almost complete: In case there are
conflicting typing constraints on an expression or function, then the
ConflictingTypes
is populated. However, we have not
considered undetermined types yet. Keeping in mind that our
language does not have generics yet, which types should x
and id
have in the following program?
function id(x) {
return x;
}
To detect situations such as this one and report an error, we introduce predicates
Pred DeterminedType : Type;
Pred DeterminedTypes : TypeList;
which should hold only for types that are either base types or obtained from base types with type constructors:
Axiom sigma = VoidType() => DeterminedType(sigma);
Axiom sigma = BooleanType() => DeterminedType(sigma);
Axiom
sigma = FunctionType(dom, cod)
& DeterminedTypes(dom)
& DeterminedType(cod)
=>
DeterminedType(sigma)
;
...
If Eqlog supported negations, we could now add an axiom such as
Axiom NOT DeterminedType(_) => UndeterminedType();
and check after Eqlog evaluation whether
UndeterminedType
holds. As Eqlog does not have negations
(yet?), we have to encode this axiom directly in Rust:
fn has_undetermined_type(p: &Program) -> bool {
.iter_type().any(|sigma| !p.determined_type(sigma))
p}
This concludes the implementation of our first version our type
checker. Thanks for following along! But the series is not done yet: The
next post will introduce generics, and we’ll implement Hindley-Milner
type inference in Eqlog. With generics, the function id
above will have inferred type forall a. (x: a) => a
, so
we won’t report an error anymore.