Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Borrows in generic code can violate uniqueness constraints #713

Closed
yorickpeterse opened this issue May 17, 2024 · 12 comments
Closed

Borrows in generic code can violate uniqueness constraints #713

yorickpeterse opened this issue May 17, 2024 · 12 comments
Assignees
Labels
bug Defects, unintended behaviour, etc compiler Changes related to the compiler
Milestone

Comments

@yorickpeterse
Copy link
Collaborator

yorickpeterse commented May 17, 2024

Please describe the bug

When using generic code, it's possible to borrow a generic type parameter that ends up being assigned a uni T value, which then allows the borrow to outlive the unique value and violate the uniqueness constraints.

Please list the exact steps necessary to reproduce the bug

Consider this code:

class Example[T] {
  let @value: Option[T]
  let @borrow: Option[ref T]

  fn mut set(value: T) {
    @borrow = Option.Some(ref value)
    @value = Option.Some(value)
  }
}

class async Main {
  fn async main {
    let ex = Example(value: Option.None, borrow: Option.None)

    ex.set(recover [10, 20])

    let x = ex.value := Option.None
  }
}

Here ex ends up being of type Example[uni Array[Int]], with the borrow field storing a ref Array[Int], violating the constraints. We can then exploit that using ex.value := Option.None, giving us an Option[uni Array[Int]] while the Example value still stores a borrow to the uni Array[Int].

I'm not sure yet what the solution here is. If we introduce something similar to the mut constraint for type parameters to allow unique values, then we basically end up not being able to use them anywhere or we won't be able to use borrows. Basically it ends up being a massive pain.

Operating system

Fedora 40

Inko version

main

Rust version

1.77.2

@yorickpeterse yorickpeterse added bug Defects, unintended behaviour, etc compiler Changes related to the compiler labels May 17, 2024
@yorickpeterse
Copy link
Collaborator Author

In theory something like this would work: when type-checking a method body, we set some sort of "borrowed" flag for type parameters whenever they're borrowed, either directly or by passing it to an argument of another method that borrows it. When we then assign a uni T to such a type parameter, we disallow this at the call site.

The problem with this approach is that it requires at least two passes over all method bodies: one to set the flag for direct borrows (e.g. through ref x expressions), and one to set the flag for indirect borrows. I'm also not sure this analysis would always be complete.

@yorickpeterse
Copy link
Collaborator Author

Here's another way this can be triggered: if you have some uni T and do that_value.some_field.iter, the returned Iter produces values of type ref Something, allowing you to violate the uniqueness constraints by storing those ref values somewhere.

@yorickpeterse
Copy link
Collaborator Author

Another similar case: we allow pattern matching against e.g. Option[uni ref Something], which then lets you introduce a borrow by binding the uni ref Something to a match variable:

match something {
  case Some(v) -> {
   move_unique_value_to_another_process(unique_value)
   do_something_with_the_uni_ref_we_still_have(v)
  }
  ...
}

To prevent this from happening we have to disallow binding uni ref T and uni mut T values to match variables, similar to how you can't assign them to locals.

@yorickpeterse
Copy link
Collaborator Author

To see how many places there are where we explicitly use ref x / mut x with x being a type parameter, I modified the compiler as follows:

diff --git a/compiler/src/type_check/expressions.rs b/compiler/src/type_check/expressions.rs
index 6573bb59..31567c34 100644
--- a/compiler/src/type_check/expressions.rs
+++ b/compiler/src/type_check/expressions.rs
@@ -3069,6 +3069,16 @@ impl<'a> CheckMethodBody<'a> {
     ) -> TypeRef {
         let expr = self.expression(&mut node.value, scope);
 
+        // TODO: remove
+        if expr.is_type_parameter(self.db()) {
+            self.state.diagnostics.warn(
+                DiagnosticId::InvalidType,
+                "ref with type parameter that might violate uni T",
+                self.file(),
+                node.location.clone(),
+            );
+        }
+
         if !expr.allow_as_ref(self.db()) {
             self.state.diagnostics.error(
                 DiagnosticId::InvalidType,
@@ -3112,6 +3122,16 @@ impl<'a> CheckMethodBody<'a> {
 
         let expr = self.expression(&mut node.value, scope);
 
+        // TODO: remove
+        if expr.is_type_parameter(self.db()) {
+            self.state.diagnostics.warn(
+                DiagnosticId::InvalidType,
+                "mut with type parameter that might violate uni T",
+                self.file(),
+                node.location.clone(),
+            );
+        }
+
         if !expr.allow_as_mut(self.db()) {
             self.state.diagnostics.error(
                 DiagnosticId::InvalidType,

Running the test suite then produces the following warnings:

src/std/array.inko:135:16 warning(invalid-type): ref with type parameter that might violate uni T
src/std/array.inko:635:15 warning(invalid-type): ref with type parameter that might violate uni T
src/std/array.inko:726:15 warning(invalid-type): mut with type parameter that might violate uni T
src/std/iter.inko:275:20 warning(invalid-type): ref with type parameter that might violate uni T

I was expected a lot more warnings, so perhaps requiring the equivalent of T: ref to allow borrowing might not be that bad after all.

A challenge here is when we deal with closures and captures. Take this code for example:

import std.stdio (STDOUT)
import std.string (ToString)

class Person {
  let @name: String
}

impl ToString for Person {
  fn pub to_string -> String {
    @name
  }
}

fn example[T: ToString](value: T) {
  fn {
    let out = STDOUT.new

    out.print(value.to_string)
    nil
  }
    .call
}

class async Main {
  fn async main {
    let alice = recover Person(name: 'Alice')

    example(alice)
  }
}

Here the closure captures value by borrowing, violating the uniqueness constraints of alice in the process. Requiring T: ref just to be able to capture it is likely going to make working with closures and generics an absolute pain in the ass, as pretty much every such case will need T: ref and through that forbid the use of uni T values.

@yorickpeterse yorickpeterse added this to the 0.17.0 milestone Aug 16, 2024
@yorickpeterse yorickpeterse self-assigned this Aug 16, 2024
@yorickpeterse
Copy link
Collaborator Author

Pony's approach to this is to introduce a variety of reference capabilities and ways to convert between them. However, the resulting complexity of this makes the language difficult to use, which is likely why it never really caught on.

@yorickpeterse
Copy link
Collaborator Author

We could potentially remove the need for explicit T: mut, T: move, and T: ref through inference. The rules would (roughly) be as follows:

Given a generic type T:

  • If it's passed to a ref, used as the value for a ref expression
  • If it's passed to a mut, used as the value for a mut expression, is the receiver of an fn mut method, or is captured by an fn closure, it's inferred as T: mut
  • If it's the receiver of an fn move, it's inferred as T: move

Closures here complicate things: since they may mutate the captured data we can't infer T: ref and instead must infer T: mut, which may be annoying. The actual inference in turn would require multiple passes over all methods, as inferring the rules for method B may affect its caller A.

@yorickpeterse
Copy link
Collaborator Author

If we go down the path of inference, we can avoid using multiple passes using an approach like this:

We maintain a map of Type Parameter => [Type Parameters] where the key is a type parameter used in an argument, and the values the type parameters passed to that argument. When inferring rules for a type parameter that's a key, we propagate those to the type parameters in the value array. So given code like this:

fn foo[A](a: A) {
  bar(a)
}

fn bar[B](b: B) {
  baz(b)
}

fn baz[C](c: C) {
  mut c
}

The mapping would be:

C => [B]
B => [A]

When processing mut c we note we're operating on C and flag it as C: mut. We then look it up in the map and find [B]. We then iterate over those values, flag them accordingly, and repeat this process until we run out of work.

The downside is that this won't work reliable when dealing with type parameters inside generic types, such as this:

fn foo[A](a: A) {
  bar(a)
}

fn bar[B](b: B) {
  baz([b])
}

fn baz[C](c: Array[C]) {
  mut c.pop.get
}

Here the issue is that we don't pass B directly to c. This means we'd have to traverse the type graph and build the mapping as we go (e.g. as part of type checking).

The other approach is to process methods multiple times and record method dependencies. That is, when we encounter a call to a method we've yet to process, we store the caller as a dependency of the callee (e.g. bar => [foo] in the above example). When we finish processing, we check for any methods that depend on it and reschedule them, removing them from the dependency list in the process. This requires some form of caching such that we can skip work in a rescheduled method (e.g. foo) that we've already performed.

@yorickpeterse
Copy link
Collaborator Author

I read through Functional Ownership through Fractional Uniqueness hoping it would perhaps provide some extra insight, but alas it felt like a paper with a lot of words but not a lot of actually useful content, though perhaps I just don't grok it.

@yorickpeterse
Copy link
Collaborator Author

Also read through Flexible recovery of uniqueness and immutability, but it basically just describes the Pony model using different terms and some extra fluff.

@yorickpeterse
Copy link
Collaborator Author

We'll need to fix this for #780 if we want to have any chance at allocating data on the stack, as doing so will introduce another scenario in which we may violate uni T constraints.

One issue with adding T: ref is that it results in T itself being rather useless, because you can only move it around. Not even fn methods could be used, because they too might borrow their receiver for a duration longer than the call.

We can handle most cases by disallowing uni ref T / uni mut T from appearing in argument and return types as a result of inference/type arguments. This however leaves us with a case where a method mutates its receiver in such a way that it stores another alias to an uni T value. This is something we can't detect on the caller's side.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Dec 2, 2024

For inference, I think we can take a simpler approach:

  • We add support for T: ref similar to T: mut and T: inline.
  • When a type parameter T defines one or more trait requirements that in turn define an fn or fn mut method, we automatically infer T as T: ref
  • We can do the same for T: mut if a trait requirement defines one or more fn mut methods
  • For the few cases where we want to borrow some T without additional requirements, one can use impl X if T: ref { ... }. Such cases will be rare

The nice thing about this approach is that we should be able to cut down the amount of explicit T: mut instances, as e.g. T: mut + Read can now be collapsed into T: Read, because Read defines fn mut read and thus implies T: mut.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Dec 3, 2024

Some additional observations:

  1. If a type ends up containing uni ref T or uni mut T values as a result of inference, we can detect and deny such types from being constructed in the first place
  2. If a method returns an uni ref T or uni mut T (or something wrapping such a type), we can detect this easily and disallow the call
  3. If a uni ref T/uni mut T escapes through an argument (i.e. it's pushed into an Array[uni ref Whatever]), the first observation should protect against this

These checks would need to be performed after type checking a method body, such that all types in the body are complete.

What this doesn't guard against is aliasing inside the method body though, i.e. something like this:

impl Array {
  fn example(arg: Something) {
    get(0).whatever(arg)
  } 
}

This however might in fact be fine: based on observation 1, if Something were to store an uni ref T we'd notice it by checking constructed types at the end of a method. We don't need to worry about the method spawning a process either, because ref T and mut T values aren't sendable, meaning we don't have to worry about sending an uni ref T to another process by accident.

The downside is that for each call we need to recursively check all argument and return types for the presence of a uni ref T or uni mut T. This is likely to be more expensive compared to checking if a type meets the T: ref requirement. With that said, we already sort of pay that cost for checking if types are fully inferred when lowering to MIR, so we might be able to just reuse that traversal.

@yorickpeterse yorickpeterse self-assigned this Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Defects, unintended behaviour, etc compiler Changes related to the compiler
Projects
None yet
Development

No branches or pull requests

1 participant