A New Kind of Type System
Code Completion as Programmer GPS
To cope with large libraries of abstractions, programmers increasingly on code completion to explore the library by leveraging context, such as type information, of the edited code. Unfortunately, code completion is often limited, rather than enhanced, by type systems that obscure options that are not directly applicable to some value; e.g. some method of a class is not listed in the code completion menu because the target value has yet to extend that class. We should design our languages to support code completion; more specifically that types can enhance the code completion menus that help us write code rather than just ensuring that we write type safe code; i.e.
Ask not what your code can do for your types, but what your types can do for your code.
a.M is typically typed in an OO language for some method
M defined in class
C: the type checker emits an error if
a is not defined to already extend
C. What I instead propose is that if
a can extend
C, this can just be inferred rather than flagged as an error. Code completion for
a. can then list all members that could be accessed from
a if it could extend some more traits, not just the members supported by its currently known type.
See Escape from the Maze of Twisty Classes for more ideas on how types can augment code completion. This note supersedes the type system described in that paper with a better one that is more well-defined, supporting both general programming and encapsulation. This type system is named after the YinYang language that is being designed and implemented around it.
Names are for Programmers, not Compilers
Given a completely inferred type system, name binding cannot be relied on to resolve references to symbols, such as methods, fields, and so on, because types do not pre-exist to establish an appropriate name scope context, and anyways, as Edwards observes in his Subtext work: names are too rich in meaning to waste on talking to compilers. Instead, abstractions can be provided directly via the editor that deals with symbol presentation and input as a separate concern. This note assumes all public symbols have rich globally unique names. However, they may also have pragmatic short overloaded names that can be disambiguated by the editor, possibly with programmer intervention; e.g. consider:
def Foo(a,b) | return a + b
In this code,
+ could target ints, doubles, complex numbers, big integers, longs, floats, or anything else where addition makes sense. The editor will then require that the programmer to select one of the available
+ methods in a code completion menu, which will drive type inference on
b, and the type of
Foo. A lack of pre-existing type-based context to resolve name bindings is not unique to YinYang: the Hindley-Milner type inference algorithm must also deal with it. Because the type system is designed specifically to support code completion, however, explicit programmer intervention is easily leveraged to avoid arbitrary symbol selections.
Inference turns the typical constraint of what an object can do depends on the class it extends around into what class an object extends depends on what it does. This constraint can be generalized significantly by swapping classes for traits that are similar to classes but support mixin-style linearized multiple inheritance, so what traits an object extends depends on what it does. Given that traits are mixins, they support a ﬁner-grained modularization of functionality than classes but, because of inference, traits do not need to be extended explicitly and client programmers can often be oblivious to their existence. Niche functionality can be packaged up into traits that are then inferred for an object when their functionality is selected for the object. Consider:
trait Widget | ... trait Bordered | this : Widget | var Thickness : double trait Button | this : Widget | def Pressed : bool | | ... var w := new : Widget if (w.Pressed) | w.Thickness := 10
In this example, the object bound to
w is inferred to extend both
Button, which are completely compatible with each other. Notes on syntax: YinYang uses indentation, like Python, to indicate blocks, which are clarified here with bars (
|). The colon
: operator indicates explicit trait extension; e.g.
this : Widget in the above code means that
Bordered objects must extend the
Widget trait. Objects are created via a
new operator with an optional list of traits to be explicitly extended. Mutable
var variables are initialized and re-assigned with
:=, which captures the asymmetry of inference as described later.
Before traits can completely replace classes however, the flexibility of trait combinations must somehow be tempered: if all traits can be combined freely, then there is nothing to prevent the type system from inferring a value as extending both
double, and in fact, there is no such thing as a type error at all! Non-nonsensical trait combinations are prohibited by defining trait compatibility through the implementation of abstract members: in any given object, all abstract members introduced by a trait it extends must be implemented either directly or by another trait it extends exactly once. Consider:
trait Animal | - def Sound : String trait Cat | this : Animal | + def Sound := "meow" | def DoCatThing() | | ... trait Dog | this : Animal | + def Sound := "ruff" | def DoDogThing() | | ...
Note on syntax: The
- operator preceding a member definition indicates that it is abstract, while the
+ operator preceding a member definition indicates that the definition is implementing (not just overriding) an inherited member. In this example, a value cannot be both a
Dog and a
Cat given that any object assigned to that value would have to extend both, causing a conflict as both traits implement
Animal.Sound. The abstract member implemented by incompatible traits need not be executable; e.g. the
double traits implement the same abstract non-executable member related to representation so that they are incompatible.
Typing Based on Need and Assignment
Now let’s get to the fun part. YinYang tracks the type that a value requires, or needs, in contrast to Hindley-Milner which tracks the type that a value provides: more precisely for an assignment
a := b, YinYang propagates requirements from the assigned value
a to the assignee value
b; constrast this to Hindley-Milner systems where the type that
b provides would be propagated to
a instead. Given need-based typing, YinYang lacks principal types and does not rely on unification to implement type checking. Instead, types are represented in a graph with expression terms as nodes and assignment between these terms as directed edges. As an example, consider:
var b := new var c := new var a := b if ... do | a := c a.DoDogThing() // OK, a, b, and c are dogs c.DoCatThing() // error, c is already a dog!
Note on syntax: the
new operator creates new objects. In this example,
c are assigned to
a so that the
Dog trait requirement created by the
a.DoDogThing() call propagates to both of them.
As a result, when
c.DoCatThing() creates a
Cat trait requirement on
c, a type error is flagged since
c cannot be both a
All assignments are considered without respect to control flow; e.g. the order of the
a := b and
a := c assignments is ignored, as is the condition that guards the
a := c assignment.
By propagating requirements to assignees, YinYang is exploiting an interesting relationship between subtyping and assignment. In typical type systems, the assignment
a := b is only permitted if the type of
b is type compatible with
a; in an OO language we would say that the type of
b must be a subtype of the type of
a. However, in YinYang, the assignment
a := b creates the subtype relationship; essentially it declares that
b is a subtype of
a, meaning that subtyping and assignment have equivalent behaviors!
Slots I: A New Hope
Programs consist of multiple objects whose collective structure can determine the capabilities of each; e.g. a WPF
FrameworkElement object supports manual positioning with
Canvas attribute properties. Unfortunately, code completion in C# is not useful in discovering this functionality because the enabling methods are static in
Canvasand not instance methods of the element being positioned. Additionally, no type relationship in C# exists between an element object and the panel object that contains it—setting the position of an object that is not contained in a Canvas object is silently ignored!
Objects generally have well-deﬁned relationships with other objects; e.g. a panel contains a widget, or a line connects two points. Such relationships are often encoded through ﬁeld assignment and then ignored by the type system. Fields, and other similar kinds of slots, are included in the YinYang type system by functioning as inferrable generic type parameters. Consider:
trait Panel trait Canvas | this : Panel trait Widget | var ParentPanel : Panel trait WidgetInCanvas | ParentPanel : Canvas | def SetPosition(pos : Position) | | ... var p := new var w := new w.ParentPanel := p w.SetPosition(new(X := 20, Y := 30))
In this example,
w is inferred to extend
Widget because its
ParentPanel field is assigned, while
p likewise is inferred to extend
Panel because it is assigned to
ParentPanel. However, because
SetPosition is called on
w, not only does
w also extend
p also extends
ParentPanel to extending
Canvas. Slots are quite powerful in allowing type relationships to span a graph objects in ways that are similar and more flexible than virtual classes and other kinds of family-type polymorophism.
Slots II: Revenge of the Alias
However, the power of slots comes with much complexity as propagating traits between slots when their parents are assigned to each other is far from trivial! Consider:
trait Kennel | var Content trait Wild | this : Animal | def DoWildThing() | | ... var a var b := new var c := new b.Content.DoDogThing() c.Content.DoCatThing() a := b if (...) | a := c a.DoWildThing(); var d := new a.Content := d /* error! d cannot be both a Dog and Cat */
In this example, the
c values all extend the
Kennel trait, while
c.Content each extend
c are assigned to
a, what is then inferred type of
a.Content? Since YinYang typing is by need, one might expect it to be just
DoWildThing() is accessed on
a.Content. However, given the way assignment intrinsically works,
a.Content is an alias of
c.Content, so any assignment to
a.Content must propagate the trait requirements of both. Indeed, a type error must occur at the
a.Content := d assignment because
d cannot extend both
Unfortunately aliasing is also bidirectional! As a result, trait requirements also flow from
c.Content, both requiring the
Wild trait propagated from
a.Content. However, if
a.Content is an alias of
c.Content (and vice versa), does that mean that
c.Content are aliases of each other? No, of course not! Although both
c.Content are aliases of
a.Content, intuitively they are not aliases of it at the same time since
a can never be assigned to
c at the same time. In other words, aliasing is not exactly transitive, wreaking havok on the good simple graph properties that YinYang’s type system depends on on.
Slots III: Return of the Graph
I spent a lot of time trying to solve this problem. To make a long story much shorter, the insight here is that while aliasing is not exactly transitive, the alias relationships created by a parent term assignment on a child term can be split into two relationships where one travels along the normal direction of assignment while the other travels in the opposite direction.
Formally, terms are split in the assignment graph into hi and lo components each with their own edges and trait requirements; an explicit assignment
y := x then translates into
x@hi <: y@hi ++ y@lo (
<: indicates IsA propagation from right to left,
++ is union). Slot (and field) relationships are defined as follows between related values
y that share a slot
(1) x@hi <: y@hi ==> x.s@hi <: y.s@hi (2) x@hi <: y@hi ++ y@lo ==> y.s@lo <: x.s@hi ++ x.s@lo (3) x@lo <: y@hi ++ y@lo ==> x.s@lo <: y.s@hi ++ y.s@lo (4) x@lo <: y@hi ==> x.s@lo <: y.s@hi
Notes that an
<: edge to
y@lo can only exist with
y@hi, while relationship (4) is not triggered in code directly but comes into play when encapsulation is considered. Also, a slot
s defined in trait
t is only visible in some value
t is in
x@hi; explicit access to
s will add
x@hi automatically, as will the YinYang statement
x : t.
According to the aboverelationships, when an alias relationship is created by the assignment
a := b above in our example, traits propagated in both directions between
b.Content but those traits that propagate from
a.Content never propagate back down to other aliases induced by other assignments like
a := c given that
c.Content@hi <: a.Content@hi does not include
a.Content@lo! In fact, the lo requirements of a value only come into play on a direct assignment to the value; e.g. when an assignment is made to
a.Content, both its hi and lo traits propagate to the assignee, leading to the type incompatibility error on
d in the above code.
In general, trait incompatibilities in the
lo component of a term are not flagged as errors, and instead only incompatibilities in
hi are relevant and checked.
Phew! Although this scheme seems very complicated, and it is, it really does conform to our intuitions about aliasing. Of course, because intuitions can be wrong, incomplete, or misunderstood, some good theory needs to be developed to reason about whether this scheme is actually correct, and perhaps obtain a more simple explenation to describe what is going on here. Or perhaps such an explenation already exists and someone can point it out to me.
Encapsulation and Separate Compilation
Encapsulation is a very important feature that hides details from client programmers so implementations can evolve independently without breaking client code. In our case, encapsulation is also important for efficiency reasons as the type system described so far can easily degrade into an intractable global analysis without it; i.e. we must support separate compilation. Finally, for this type system to be usable, inference must compute types according to values that a programmer can see from their current scopes; ditto for describing type errors as no one wants to track down a type error that involves someone else’s code! Before we describe encapsulation, let’s first look at a more complete picture of what a YinYang program looks like:
trait TraitA | var PublicFieldB | private var PrivateFieldC | def PublicMethodD(e,f,g) | | var LocalH | | ... /* code */ ... | private def PrivateMethodK(l,m) | | var LocalN | | ... /* code */ ... trait TraitO | ... /* like TraitA */ ... def PublicProcedureP(q,r,s) | var LocalT | ... /* code */ ... def PublicProcedureU(v,w) | var LocalX | ... /* code */ ...
The three sorts of code-containing constructs in this example are:
- Traits that exist at a top-level and are used to build objects;
- Methods that are defined by traits; and
- Procedures that exist at the top-level along side traits.
No other construct nestings are currently possible; e.g. traits cannot be nested inside other traits. Only the members of a trait, which includes methods and fields, can be explicitly declared private. Traits, methods, and procedures each define an encapsulation scope, where method scopes are nested inside trait scopes. Only methods and procedures can contain code; code that exists at the trait level is implicitly located in a constructing method. Only code located directly in the scope of a method or a procedure can see the scope’s local terms, which include anonymous intermediate expression terms, while code in any scope can see procedures, any trait, and the public methods and fields of any trait. Additionally, the code of a method can see the private methods and fields of its containing trait. More specifically, multiple signatures are computed during inference to satisfy the needs of different scopes:
A trait’s private signature includes its private methods and fields, but not the local terms of its method’s code; this signature is used by code defined inside the trait’s methods that refers to the trait.
A trait’s public signature elides private methods and fields from its private signatures; this signature is used for code that refers to the trait outside of the trait’s definition.
A procedure’s public signature elides the procedure’s local variables and is used for wherever the procedure is called.
Also, each trait and procedure has an initial unit assignment graph that is formed directly by processing the code of a trait or procedure (aka units), which involves apply signatures for various procedures called or traits extended. Elision is then performed on the unit graph to compute a trait’s private signature or procedure’s public signature.
Signatures are graphs that preserve connectivity of the graph that they encapsulate while hiding terms that are not visible through them. Consider a simple top-level procedure:
def P(dict,key,cell) | var d := dict.Get(key) | dict.Set(key, cell.Value) | cell.Value := d
This code simply swaps a dictionary entry and a value in a cell. The public signature computed for
P will only refer to
cell as roots. Assuming reasonable public signatures for
Cell, the signature of
P is equivalent to a graph formed from the following simple assignments:
dict.KeyT := key dict.ElementT := cell.Value cell.Value := dict.Element
When a procedure is called, the procedure’s graph is copied into the calling context substituting the arguments of the call for the procedure’s argument roots (likewise for return values, if any). Since the signature graph is copied, top-level procedures are completely generic in YinYang, meaning their types can vary depending on how they are called. Trait signatures are also applied via graph copying, substituting the
this root for the value extending the trait; again, this means traits are also generic like procedures. Unfortunately, things are not so pretty for methods.
No Generic Methods for You
Procedures have signatures while methods defined inside traits do not; this is not a mistake: it is very hard, and perhaps even impossible, to support shared encapsulation and genericity at the same time. Consider:
trait T | private var V | def M0(a) | | this.V := a.V | def M1(b) | | b.M0(this) | def M2 | | return this.V | def M3(c) | | this.V := c.Count
Assuming methods had signatures, what would public signatures for
M2 even look like without referring to
V? We could try relating methods argument and results together, but graph copying, necessary for genericity, complicates this: how to attach the various calls in the resulting graph, especially when the relationships involve multiple objects? I could not figure it out, although not for lack of trying. I think I’m hitting fundamental limitations in type inference that have already been explored in Hindley-Milner and System F, but more investigation is necessary.
Lacking their own signatures, method arguments and return values in YinYang are instead modeled in trait signatures as slots accessed through the this root of the trait in the same way that fields are. As a result, the public signature of
T for the above code resembles the graph formed by the following assignments:
this.M2#ret := this.M3#c.count this.M2#ret := this.M0#a.M3#c.Count this.M1#b.M2#ret := this.M3#c.Count
As copying is not applied when a method is called, they are not generic; however, method calls still play a role in the defining trait’s overall genericity, which is sufficient for expressing various kinds of generic collections.
Procedures, which are completely generic, can then be used to express generic collection methods like Map or Filter by calling only the public methods of the collections they operate on. For reasons similar to that of methods, method and procedure function arguments are treated like trait methods and cannot be called generically.
Other Important Issues and Features
Alias and encapsulation concerns are so far the biggest innovations of the YinYang type system. Other interesting details are as follows.
- An object (or constant) that is hidden by a signature cannot be caused to require more traits through the uses of the signature. Any public values of the signature that the hidden object is assigned to cannot require traits incompatible with the object; i.e. they are restricted in what the can extend. Consider:
def Foo(a,b) | return a + b var c := Foo(10, 20) c.DoUltraIntStuff() // error, c cannot extend UltraInt
The last operation on
c is not permitted because the implementation of
+ defines a new and hidden object to capture the result of the addition.
Given all the graph copying going on, recursion is at least a problem with respect to efficiency, if not decidability. This problem is currently solved in an extremely restrictive way: for any value that appears in an assignment graph, a slot can never appear in that value twice; e.g.
a.t.s.tis equivalent in type to
a.tand all assignments to and from the former become assignments to and from the latter.
This restriction has a couple of major consequences: all recursively linked structures, such as trees, in YinYang must have uniform node types, which is not so bad that being precise with such types would be very unwieldy; and it is impossible to define lists of lists or dictionaries of dictionaries, which is actually pretty bad. I definitely need to think about this more.
Abstract methods are supported but must be defined code that never executes. A special type member exists as a slot that can be assigned to and from abstractly in defining abstract method definitions; e.g. here is the definition of an abstract
trait Dictionary | type ElementT | type KeyT | -def Get(k) | | KeyT := k | | return ElementT | -def Set(k,e) | | KeyT := k | | ElementT := e
An implementation of dictionary abstract methods must then completely preserve the connectivity defined by their abstract implementations, although they are free to add new relationships to values that are not visible in the public signature of the trait with the abstract method being extended. To preserve connectivity, type members of the extended trait must often be equated to encapsulated data in the implementing trait; e.g.
trait HashBucket | var KeyX | var ElementX trait HashDictionary | this : Dictionary | private var Buckets : ListB | Buckets.ElementL.KeyX :=: KeyT | Buckets.ElementL.ElementX :=: ElementT | +def Get(k) | | var idx := k.Hash % Buckets.Count | | if (Buckets[idx].KeyX == k) | | | return Buckets[idx].ElementX | | ....
Dictionary with an encapsulated list of hash entry buckets. Given that implementations of
Set manipulate these buckets, their key and element fields should map directly to
ElementT, which is accomplished with the
:=: operator that indicates an abstract bi-directional assignment.
- Although this type system is extremely flexible, dynamic casting still must be supported. There is a cool way to express this: a dynamic cast of some term
eto some trait
tpropagates a requirement that is only enforced for assignees that are known to extend
t. Such a scheme does not work for frozen terms, however. TBD, but this feature is probably not for V1.
TBD: a couple of sentences to close off the note.
blog comments powered by Disqus