August 05, 2023
This is the third post in a series on implementing a type checker with the Eqlog Datalog engine. This post deals with our semantic model of the type system, and we’ll finally see some of Eqlog’s distinctive features in action. You can find the other posts here:
As usual, the code we discuss in this post is available as a branch in the associated repository.
Our goal is to define a semantic model of our type system
that complements the syntactical TypeNode
AST
nodes we’ve already discussed in the first post on parsing. Our parser
generates fresh TypeNode
elements for each piece of source
code representing a type. As a consequence, there can be many different
TypeNode
elements representing the same type. For example,
the AST of the program
function foo (num): boolean {
let e: boolean = num == 5;
return e;
}
contains two different BooleanTypeNode
elements, one for
each occurrence of the string boolean
. For type checking
and inference we will eventually have to verify that certain types
match, so TypeNode
elements are unsuitable for this.
Instead, we introduce a new sort Type
to represent
semantic types. In order to be useful for type checking and
inference, semantic types should have the following properties:
Type
element that
represents boolean
; similarly for string
,
number
and so forth.Type
element,
e.g. TypeNode
, ExprNode
(the type of an
expression) and FunctionNode
(the type of a function
literal).Type
element.All of these requirements are functional in nature: For
example, requirement 2 says that there should be a total function
TypeNode -> Type
.
Eqlog has a Func
keyword that lets us declare (partial)
functions, for example in the following Eqlog program:
Sort M;
Func Mul : M * M -> M;
Eqlog represents functions as graph
relations, i.e., as sets of value-result tuples. Thus, the binary
Mul
function is represented in the same way as a ternary
predicate MulGraph : M * M * M
, and
MulGraph(a, b, c)
holds if and only if
Mul(a, b) = c
. However, not every relation corresponds to a
function: This is only so if the last component is uniquely determined
by the other components. For the ternary MulGraph
predicate, this means that the following axiom must hold:
Axiom MulGraph(x, y, z_0) & MulGraph(x, y, z_1) => z_0 = z_1;
Eqlog implicitly adds such functionality axioms for every function declaration.
The conclusion of a functionality axiom is an equality, and
equalities are not part of standard Datalog either. Eqlog implements
equalities as follows. Recall that Eqlog repeatedly matches the premise
of each axiom on current data in the model and then adds the conclusion
of the axiom for each match. An equality x = y
in the
premise restricts matches to those where x
and
y
are interpreted as the same element.
Equalities in the conclusion are more interesting: Eqlog models
contain union-find data structure to track equality among elements. When
Eqlog has matched the premise of an axiom and there is an equality
x = y
in the conclusion, then it merges the equivalence
classes of the elements that are matched for x
and
y
.
Crucially, Eqlog matches premises with respect to the equivalence
relations stored in these union-find data structures. For example, there
are two occurrences of the variable x
in the premise
MulGraph(x, y, z_0) & MulGraph(x, y, z_1)
of our
functionality axiom above. A match of this premise is allowed to
interpret these two occurrences by distinct (in the sense that they are
represented by different IDs) elements as long as the two elements are
in the same equivalence class. This means that more matches become
possible after an equality has been found.
For example, suppose we have a model of our theory above given by
four distinct elements a, b, c, d
and the following
MulGraph
entries:
MulGraph(a, b, c)
MulGraph(a, b, d)
MulGraph(c, a, b)
MulGraph(d, a, a)
Let’s walk through Eqlog evaluation. At first, the only non-trivial
(i.e. z_0 != z_1
) match for the premise of the
functionality axiom is (a, b, c)
, (a, b, d)
.
The conclusion of the functionality axiom then asserts
c = d
. This means that (c, a, b)
,
(d, a, a)
is a valid match now. From this it follows that
b = a
must hold. At this point there are only trivial
matches, where the conclusion already holds, so evaluation stops.
We return to defining our semantic model of types. Our type system has the following type constants, which we declare as nullary functions:
Func VoidType : Type;
Func BooleanType : Type;
Func NumberType : Type;
Func StringType : Type;
The functionality axiom for nullary functions asserts that there is
at most one such constant. For example, if
sigma = VoidType()
and tau = VoidType()
, then
sigma = tau
.
The only type constructor with non-trivial arguments in our toy language is the constructor of function types: It takes as parameters a list of types for the domain (the types of arguments) and a type for the codomain (the result type). To model a list of semantic types, we use a similar trick as for syntax nodes:
Sort TypeList;
Func NilTypeList : TypeList;
Func ConsTypeList : Type * TypeList -> TypeList;
However, here we’re using Eqlog functions instead of predicates to
represent empty lists and the cons operation. This has the effect that
Eqlog considers all NilTypeList()
elements as equal, and
that two type lists agree if they have equal heads and equal tails. The
analogous properties would be undesirable for AST nodes because in the
AST we want to distinguish NilTypeListNode
elements that
that appear at different locations in source code.
We can now declare the function type constructor:
Func FunctionType: TypeList * Type -> Type;
One of our requirements on the semantic Type
sort was
that there should be functions associating Type
elements to
various AST nodes. To that end, we declare the following functions:
Func SemanticType : TypeNode -> Type;
Func SemanticOptType : OptTypeNode -> Type;
Func SemanticArgTypes : ArgListNode -> TypeList;
// The types of expressions and function literals:
Func ExprType : ExprNode -> Type;
Func ExprTypes : ExprNodeList -> TypeList;
Func FunctionNodeType : FunctionNode -> Type;
However, Eqlog’s Func
keyword introduces
partial functions, whereas we required total
functions.
To enforce that these functions are total, we can use Eqlog’s
exclamation mark operator !
like so:
Axiom tn: TypeNode => SemanticType(tn)!;
Axiom otn: OptTypeNode => SemanticOptType(otn)!;
Axiom ag: ArgListNode => SemanticArgTypes(ag)!;
Axiom expr: ExprNode => ExprType(expr)!;
Axiom exprs: ExprListNode => ExprTypes(exprs)!;
Axiom func: FunctionNode => FunctionNodeType(func)!;
These axioms make use of another Eqlog construct that we haven’t seen
before: Sort quantifiers, for example
tn: TypeNode
, match every element of a given sort. Sort
quantifiers are needed in situations where a variable is not constrained
by a predicate or function in the premise.
You can read the exclamation mark operator as “is defined”. During
evaluation, when Datalog has matched the premise of an axiom and
encounters an atom t!
in the conclusion, it first checks
whether the expression t
and all its subexpressions are
already defined. If they are, then nothing needs to be done. If some
subexpression or t
itself is not defined yet, then Eqlog
adjoins fresh elements to the model to represent them. For example, the
axiom for SemanticType
causes Eqlog to adjoin a fresh
Type
element t
to the model for each
TypeNode
element tn
and insert the tuple
(tn, t)
into the graph of SemanticType
.
We have to be very careful about axioms involving the exclamation mark operator though, because such axioms can lead to non-termination. Consider for example this Eqlog program that encodes the natural numbers:
Sort N;
Func Z : N;
Func S : N -> N;
Axiom => Z()!;
Axiom n : N => S(n)!;
Evaluation on even the empty model structure does not terminate for
this Eqlog program, since Eqlog will adjoin more and more elements
Z(), S(Z()), S(S(Z())), ...
without halting. In our case,
termination is guaranteed because the number of Type
elements adjoined during evaluation is bounded by the number of AST node
elements, and AST node elements are only added to the model before Eqlog
evaluation, by the parser.
Our SemanticType
, SemanticOptType
and
SemanticArgTypes
functions associate semantic type (list)
elements to certain AST nodes. However, we haven’t set up any rules
governing these functions. For example, we need an axiom that enforces
that the semantic type associated to a BooleanTypeNode
is
the semantic BooleanType
. The following axiom accomplishes
this:
Axiom
BooleanTypeNode(tn)
& sigma = SemanticType(tn)
=>
BooleanType() = sigma
;
There are analogous axioms that encode the relationship between
NumberTypeNode
and NumberType
,
VoidTypeNode
and VoidType
and so forth. The
semantic type of a function type AST node depends recursively on the
semantic types associated to domain and codomain AST nodes:
Axiom
FunctionTypeNode(tn, args, codomain)
& dom_types = SemanticArgTypes(args)
& cod_type = SemanticType(codomain)
& function_type = SemanticType(tn)
=>
FunctionType(dom_types, cod_type) = function_type
;
The OptTypeNode
sort represents optional type node
elements. Our parser uses these nodes to represent type optional
annotations. The SemanticOptType
function is total and
gives us a Type
element from such nodes, regardless of
whether the OptTypeNode
corresponds to an actual
TypeNode
or not. One way to think about this is that the
resulting Type
element represents a type that is
constrained by the type annotation. If the type annotation is missing,
then the Type
element is entirely unconstrained, i.e., not
related to any type constructor. But on SomeOptTypeNode
elements, which are given by an actual type node,
SemanticOptType
should agree with
SemanticType
:
Axiom
SomeOptTypeNode(otn, tn)
& sigma = SemanticType(tn)
=>
SemanticOptType(otn) = sigma
;
The SemanticArgTypes : ArgListNode -> TypeList
should
be given by mapping the SemanticTypeOpt
function on
optional type annotations, which we enforce as follows:
Axiom
NilArgListNode(al)
& semantic_types = SemanticArgTypes(al)
=>
NilTypeList() = semantic_types
;
Axiom
ConsArgListNode(al, _, head_type, tail)
& semantic_head = SemanticOptType(head_type)
& semantic_tail = SemanticArgTypes(tail)
& semantic_types = SemanticArgTypes(al)
=>
semantic_types = ConsTypeList(semantic_head, semantic_tail)
;
In the previous post on variable binding, we set up predicates
VarInX : Var * X
for various AST node sorts X
,
for example ExprNode
, and we added axioms that enforce
VarInX(var, node)
holds precisely when the variable
var
is accessible within node
. For type
checking, this information is not enough: We need to know which type the
variable has.
Thus, we change the VarInX
predicates to functions
VarTypeInX : Var * X -> Type
: These functions should be
defined on a pair of (var, node)
if var
is in
scope for node
, and if so, then
VarTypeInX(var, node)
is the type of the variable. We also
update our rules that introduce variable bindings so they take type
annotations into consideration. For example, the axiom for function
argument variables is now as follows:
Axiom
ConsStmtListNode(stmts, _, tail)
& sigma = VarTypeInStmts(var, stmts)
=>
VarTypeInStmts(var, tail) = sigma
;
The axioms responsible for propagating existing variable bindings
also need updating so that they propagate the type of variables. For
example, the axiom that propagates variable bindings through
StmtNodeList
elements now looks like this:
Axiom
ConsStmtListNode(stmts, _, tail)
& sigma = VarTypeInStmts(var, stmts)
=>
VarTypeInStmts(var, tail) = sigma
;
In most type systems, type constructors have disjoint
ranges, and each type constructor is separately injective.
The disjoint range property means that types obtained using different
type constructors do not agree. In our case, this means for example that
the boolean
type is different from the string
type, and that the void
type is different from all function
types. Injectivity is only relevant for type constructors with arguments
(so no type constants such as number
). For our toy
language, injectivity thus only applies to function types, where it says
that if two functions types are equal, then they must have the same
domain types and the same codomain types.
Evaluation of our Eqlog program can currently not violate these
properties, but this will change in the next posts due to typing
constraints on expression or variables. For example, we will enforce
that the type of a variable that is used as condition for an
if
statement must be of type boolean
, and that
it must also be of type string
if it is initialized with a
string literal. We enforce these constraints by imposing equalities on
the ExprType
of expressions, which by the functionality
axioms can lead to a violation of the disjoint range or injectivity
properties. These violations only happen for ill-typed programs, so our
type checker should report a type error in these situations. Similarly
to the VariableShadowing
predicate that we used to report
variable shadowing, we introduce a predicate
Pred ConflictingTypes: ();
that our Eqlog program should populate if it detects a violation of injectivity or the disjoint ranges property.
Detecting a violation of the disjoint ranges property is fairly straightforward:
Axiom VoidType() = BooleanType() => ConflictingTypes();
Axiom VoidType() = NumberType() => ConflictingTypes();
Axiom VoidType() = StringType() => ConflictingTypes();
Axiom VoidType() = FunctionType(_, _) => ConflictingTypes();
Axiom BooleanType() = NumberType() => ConflictingTypes();
// ...
We could encode injectivity of the function type constructor like this:
Axiom
FunctionType(sigmas_0, tau_0) = FunctionType(sigma_1, tau_1)
=>
sigmas_0 = sigmas_1
& tau_0 = tau_1
;
Instead, we’ll go with an approach that will help us also later:
Axiomatizing inverse functions to FunctionType
in each
argument.
When we collect constraints on the types of expressions, we’ll
eventually have the need to assert that some type kappa
is
a function type, i.e., that there should exist
domain
and codomain
such that
kappa = FunctionType(domain, kappa)
. However, Eqlog does
intentionally not support existential quantification, since this would make Eqlog
evaluation semantically ill-behaved. To work around this, we
introduce functions DomainTypes : Type -> TypeList
and
CodomainType : Type -> Type
as inverse functions for the
two arguments of FunctionType
:
Axiom DomainTypes(tau)! => CodomainType(tau)!;
Axiom CodomainType(tau)! => DomainTypes(tau)!;
Axiom
kappa = FunctionType(sigmas, tau)
=>
DomainTypes(kappa) = sigmas
& CodomainType(kappa) = tau
;
Axiom
sigmas = DomainTypes(kappa)
& tau = CodomainType(kappa)
=>
FunctionType(sigmas, tau) = kappa
;
To assert that a given type is a function type, it now suffices to
assert that DomainTypes
or CodomainType
is
defined on it. Since functions are injective if and only if they have an
inverse function on their range, the presence of
DomainTypes
and CodomainType
makes the
explicit injectivity axiom above redundant.
Due to injectivity of
FunctionType : TypeList * Type -> Type
, it can happen
that we equate TypeList
elements during evaluation. If type
lists are equal, then they should contain the same types in the same
order. Our Eqlog program equates type lists with different lengths only
if the input source code contains a variable number mismatch, e.g. a
call of a function with the wrong number of arguments, so we want to
report a type error in this case. We can enforce these condition on type
lists like so:
Axiom NilTypeList() = ConsTypeList(_, _) => ConflictingTypes();
Axiom
ConsTypeList(head_0, tail_0) = ConsTypeList(head_1, tail_1)
=>
head_0 = head_1
& tail_0 = tail_1
;