
Quadratic Micropass Type Inference

Hacker News · March 27, 2026

For the proof of concept implementation, see this repository.

Introduction

Languages that allow plentiful type inference can often produce confusing error messages. Type inference makes assumptions about what the user intends types to be, and if there’s a type error, there’s a high risk that one of those assumptions was wrong. When the compiler then creates an error message, it will use types that may have been inferred from false assumptions.

I propose a new kind of type inference algorithm that always prioritises the type unifications the end-user is most likely to care about, independently of the order in which types appear in the source code.

The goal is that the end-user should not have to reverse-engineer what caused the compiler to infer an incorrect type; instead, the compiler works with the user’s pre-existing assumptions.

Background

Languages with simpler inference, such as Go, can infer inside-out: we can figure out the type of an expression/statement by looking at its kind plus the types of its subexpressions.

```go
func main() {
    x := someFunction() // the type of the right-hand side is always known
}

func someFunction() string { ... }
```

But this approach quickly falls apart.

```
fn main() {
    let f = |x| {
        // we don't know the type of `y` as the type of `x` is not yet known
        let y = (x, x);
        return y;
    };
    // we don't know the type of `f` either.

    f("Hello!"); // it's not until here that we know the types of x, y and f
}
```

A more powerful form of inference is type variable unification. With such a system we can leave types partially uninferred, and have later expressions/statements fill in the gaps.
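As a rough sketch of how such unification might work (my own Python illustration, not the article’s proof of concept; the `Unifier` class and its type encoding are hypothetical), type variables can be implemented as entries in a substitution map that get assigned when unified:

```python
# Hypothetical sketch of type variable unification (not the article's
# implementation). Type variables are strings like "'1"; concrete types
# are strings like "string" or tagged tuples like ("tuple", a, b).

class Unifier:
    def __init__(self):
        self.subst = {}  # type variable -> assigned type

    def is_var(self, t):
        return isinstance(t, str) and t.startswith("'")

    def resolve(self, t):
        # Follow the chain of assignments until we hit a concrete type
        # or a still-free type variable.
        while self.is_var(t) and t in self.subst:
            t = self.subst[t]
        if isinstance(t, tuple):
            # resolve element types, keeping the constructor tag at index 0
            return (t[0],) + tuple(self.resolve(x) for x in t[1:])
        return t

    def unify(self, a, b):
        a, b = self.resolve(a), self.resolve(b)
        if a == b:
            return
        if self.is_var(a):
            self.subst[a] = b  # type variables are re-assignable slots
        elif self.is_var(b):
            self.subst[b] = a
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and len(a) == len(b) and a[0] == b[0]):
            for x, y in zip(a[1:], b[1:]):
                self.unify(x, y)
        else:
            raise TypeError(f"cannot unify {a} with {b}")

u = Unifier()
u.unify("'2", ("tuple", "'1", "'1"))  # return y gives '2 = ('1, '1)
u.unify("'1", "string")               # f("Hello!") assigns '1 = string
print(u.resolve("'2"))                # ('tuple', 'string', 'string')
```

The two `unify` calls mirror the closure example below: the return statement ties `'2` to `('1, '1)`, and the later call site fills in `'1`.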

Let’s take the same example but use type variable unification this time. We start inferring types inside-out like in the first example, but for the types that are not yet known we insert a type variable (written ’N).

```
fn main() {
    let f: fn('1) -> '2 = |x| {
        let y: ('1, '1) = (x, x);
        return y; // '2 = ('1, '1)
    };

    ...
}
```

As the name “type variable” suggests, they are variables and thus can be re-assigned.

```
    ...
    f("Hello!"); // the literal unifies '1 with string, assigning '1 = string
}
```

After that type variable assignment the final types are what we’d expect.

```
fn main() {
    let f: fn(string) -> (string, string) = |x| {
        let y: (string, string) = (x, x);
        return y;
    };

    f("Hello!");
}
```

However, this approach creates a new problem. Since we still unify types inside-out, we can end up making a type variable assignment that, later on, turns out to have much more clearly been intended as something else.

Look at this example:

```
fn example(x) -> Point[int] {
    log(["origin: ", x]);
    return Point(x, x);
    //           ^  ^ error: expected int, got string
}
```

By following the type variable unification steps from before, x would be inferred as a string, since it’s used in a list together with a string, eventually producing the error message shown. But when a developer looks at this function, they would more likely expect x to be inferred as int, since it’s more clearly being used as an int in the return statement.

If we could instead unify types in the order the end-user deems most important, we could create error messages that correlate more strongly with the developer’s thought process.

Quadratic Micropass Type Inference

Core Concepts

Instead of one large unification pass traversing the code top to bottom and inside-out, inference is split into the following ordered passes.

  • known_applications

  • known_assignments

  • known_return_types

  • known_same_as_unifications

  • known_record_fields

  • resolve_records

  • less_known_functions

  • default_numbers

  • default_unknown_to_unit_or_lift

See Passes for a description of each pass.

With this ordering, the higher up the list a pass sits, the closer it is to the user’s thought process. Types are thereby inferred by what the user is likely to consider most important rather than by their position in the source code.

Each pass performs its minimal task, leaving most types untouched. After a pass completes, all passes before it are re-run, in the hope that the changes it made will let those earlier passes make further progress. With this design it’s logical that the lower down the list a pass sits, the more aggressive it is allowed to be with its inference, as less new type information will become available afterwards.
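The driver for this scheme could look roughly like the following (a hypothetical Python sketch, not the reference implementation; `run_micropasses` and the two stand-in passes are my own names). Each pass reports whether it changed anything, and earlier passes are re-run to a fixpoint after each pass completes:

```python
# Hypothetical sketch of the micropass driver. Each pass takes the
# inference state and returns True if it unified anything new.

def run_micropasses(passes, state):
    for i, p in enumerate(passes):
        p(state)
        # After the pass completes, re-run all passes before it until
        # none of them makes further progress. This nested re-running is
        # what gives the algorithm its quadratic flavour.
        progress = True
        while progress:
            progress = any(earlier(state) for earlier in passes[:i])

def known_assignments(s):
    # stand-in for a high-priority pass: infer y from x once x is known
    if s["x"] is not None and s["y"] is None:
        s["y"] = s["x"]
        return True
    return False

def default_numbers(s):
    # stand-in for a low-priority, aggressive pass: default x to int
    if s["x"] is None:
        s["x"] = "int"
        return True
    return False

state = {"x": None, "y": None}
run_micropasses([known_assignments, default_numbers], state)
print(state)  # {'x': 'int', 'y': 'int'}
```

Note that `known_assignments` can only make progress after `default_numbers` has run, which is exactly why the later pass triggers a re-run of the earlier one.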

Since each type variable may be unified many times by the same pass, error generation cannot reasonably happen during inference. Passes therefore ignore errors and only touch the types they know they can and should work with; error generation is instead delegated to type checking.

After type inference has finished, we perform a finalization step that creates a map from type variables to known static types. Since these no longer contain any type variables, we can perform type checking with plain equality operations.
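Under the same assumptions as before, the finalization step might be sketched like this (hypothetical Python; `finalize` and `check_equal` are my own names, not the article’s API):

```python
# Hypothetical finalization sketch: after inference, every type variable
# is resolved to a static type, so type checking is plain equality.

def finalize(subst):
    def resolve(t):
        while t in subst:  # follow the chain of variable assignments
            t = subst[t]
        return t
    return {var: resolve(var) for var in subst}

subst = {"'1": "string", "'2": "'1"}  # '2 was unified with '1
types = finalize(subst)               # {"'1": "string", "'2": "string"}

# Type checking needs no unification, just a comparison of static types.
def check_equal(expected, actual, types):
    return types.get(expected, expected) == types.get(actual, actual)

print(check_equal("'2", "string", types))  # True
```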

Examples

Let’s see what happens if we now try the snippet that generated the misleading error message again.

```
fn example(x) -> Point[int] {
    log(["origin: ", x]);
    //   ^ error: expected string, got int
    return Point(x, x);
    //           ^  ^ known_applications pass runs first,
    //                 assigning the type variable of `x` to `int`
}
```

Much better!

Let’s walk through a more feature-complete example step-by-step to see how this algorithm reaches its result.

```
fn main(a, b) -> _ {
    let point = Point(a, 200);
    let list = [point.x, 300];
    let record = { x = 100, y = 200 };
    return point.y == record.y;
}

struct Point[T] { x: T, y: T }
```

In this imaginary language, number literals can infer into any int size and are not assumed to be int. That’s why the first change doesn’t pin the 200 literal to int when it is applied to Point, but to {numeric}.

  • (known application) Point(a, 200): Point[_] -> Point[{numeric}]

  • (known assignment) point: _ -> Point[_]

  • (known assignment) list: _ -> [_]

  • (known assignment) x = 100: _ -> {numeric}

  • (known assignment) y = 200: _ -> {numeric}

  • (known assignment) main return type: _ -> bool

  • (known application) a: _ -> {numeric}

  • (known application) point.y: _ -> {numeric}

  • (known application) record.y: _ -> {numeric}

  • (known same as) point.x: _ -> {numeric}

  • (known assignment) list: [_] -> [{numeric}]

  • (resolve record) { x = 100, y = 200 }: _ -> Point[{numeric}]

  • (known assignment) record: _ -> Point[{numeric}]

  • (known record field) unifies record.y with y = 200

  • (default numbers) all {numeric} default to int

  • (default unit or lift) main declares type parameter T, b: _ -> T

Here’s a visualisation of those inference steps

Passes

  • known_applications

Infer the type of a parameter given to a function if the parameter’s type isn’t known but the expected type is.

  • known_assignments

Infer the type for the target of an assignment if the assigned value has a known type but the target does not.

This is also used when returning from a function, with the expected return type of the function being used as the target.

  • known_return_types

If the same function is called in multiple places and the return type of the called function is known, then for each call site where the returned type is not known, infer it to be the called function’s known return type.

  • known_same_as_unifications

Unify all types that are used in the same list or branching expression with each other. Such as [x, y] and if true { x } else { y }.

  • known_record_fields

If the type of a record is known, then for each instance where a field from that record is accessed, unify the type of the accessed field with the known field type from the record.

  • resolve_records

If the type of a record is not known, use its field names to figure out its type.

  • less_known_functions

If a type which has not been inferred is called as a function, infer it to be a function type with a signature matching its first call.

  • default_numbers

If a type is not known but must be numeric, default it to be the default integer size.

  • default_unknown_to_unit_or_lift

If a type is not known, infer it to be either the unit type, if it’s created from an expression, or an implicitly declared type parameter, if it comes from the function’s type signature.
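To make the flavour of these passes concrete, here is a hypothetical Python sketch of two of the simpler ones over a flat substitution map (the function names mirror the pass names, but the state representation and everything else is my own invention, not the proof of concept):

```python
# Hypothetical sketches of two of the simpler passes. `subst` maps
# variable names to their currently known types, if any.

def known_same_as_unifications(subst, list_elems):
    # Unify all element types of a list with each other, as in [x, y]:
    # the first known element type is copied to the unknown ones.
    known = next((subst[e] for e in list_elems if e in subst), None)
    if known is None:
        return False
    changed = False
    for e in list_elems:
        if e not in subst:
            subst[e] = known
            changed = True
    return changed

def default_numbers(subst, all_vars):
    # Any variable still only known to be numeric defaults to int.
    changed = False
    for v in all_vars:
        if subst.get(v) == "{numeric}":
            subst[v] = "int"
            changed = True
    return changed

subst = {"y": "{numeric}"}
known_same_as_unifications(subst, ["x", "y"])  # x picks up y's type
default_numbers(subst, ["x", "y"])             # {numeric} -> int
print(subst)  # {'y': 'int', 'x': 'int'}
```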

Benchmarks

```
for _ in 0..iterations {
    defaulting();
    list();
    if_expr();
    field_inference();
    article_feature_complete_example_go();
    article_showcase_example_go();
    static_function_application();
    generic_function_application();
}
```

Intel i7 13700k @ 5.3GHz

  • 100 iterations: 7.79ms

  • 50000 iterations: 2.78s

The benchmark is biased towards an OCaml-like language where projects consist of many smaller functions, which reduces the severity of the algorithm’s quadratic nature. But with cached translation units the algorithm performs well enough that it should be viable regardless.

This proof of concept implementation sacrifices some performance for the sake of code clarity.
