Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit source positions #52

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

OlivierNicole
Copy link

@OlivierNicole OlivierNicole commented Feb 21, 2025

Rendered version

Introduction of the RFC:

A new type of arrow is introduced. Syntactically, it looks like:

val f : loc:[%call_pos] -> ...

in a signature, and

let f = fun ~(loc : [%call_pos]) ... ->
  ...

in an implementation.

The label name doesn’t matter, but there must be a label. The function parameter thus defined has type Lexing.position and, if not specified at a call site, is automatically filled with the position of the call site.

This can serve several purposes:

  • When chaining monadic binds, the usual stack backtraces are very noisy and less useful. You can replace (>>=) : 'a t -> ('a -> 'b t) -> 'b t with (>>=) : loc:[%call_pos] -> 'a t -> ('a -> 'b t) -> 'b t to automatically track binding locations (and then implement a tracing facility).
  • It can be used to improve the usefulness of testing frameworks (see e.g. Best-effort automatic reporting of the location of Alcotest.check mirage/alcotest#366) or tracing systems (such as meio[^1]), or error reporting in embedded languages.

History

A first PR[^2] has been proposed ten years ago by @let-def. After some design amendments, that PR was stalled for a while, until renewed interested was expressed in 2023 by @TheLortex. The feature was then discussed during a developer meeting and gathered consensus on the principle[^3]. It was subsequently implemented at Jane Street and has been in use there for several months.

The goal of this RFC is therefore to present the details of the design in use at Jane Street to make sure there is agreement to port the feature upstream. Jane Street has asked Tarides for help in the summarizing work that led to this RFC.

@OlivierNicole
Copy link
Author

cc @goldfirere @MisterDA

@goldfirere
Copy link
Contributor

This feature has widespread use already within Jane Street. It was originally implemented by @vivianyyd, with much help getting it into production by @Enoumy.

Copy link

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I am in favour. Incidentally, we had such a facility in LexiFi's compiler fork since 2008 and recently we removed it in order to reduce the diff with upstream. Will be happy to get it back as an official feature! :)


### Controlling file paths in source positions

When implementing this feature, Jane Street also introduced a new compiler flag, `-directory`. When specified, `-directory some/path` causes the filename in automatically inserted positions to be prepended with `some/path`. Jane Street’s build system uses that flag to control the file paths reported by `[%call_pos]`. For example, this can be used to give more informative file paths than just a single `./filename.ml`, all the while ensuring a reproducible result and avoiding to include machine-specific paths like `/home/user/build/.../filename.ml` in the compiled program.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part seems orthogonal and indeed probably best left out of an initial PR. Also, note that in Dune, for example, all compilation invocations are made from the workspace root, so that the paths appearing in compiler locations are relative to the workspace root and not bare filenames.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found that the feature was not very useful without this -directory bit, hence its inclusion here. But maybe our Dune is different from the upstream Dune in this way and that upstream Dune will work without -directory.

@gasche
Copy link
Member

gasche commented Feb 22, 2025

As mentioned previously, there is broad maintainer consensus in favor of the feature. I suppose that the purpose of the RFC is more to discuss the proposed syntax and the practical details.

The proposed syntax looks fine to me.

The -directory bit (and the partial overlap with BUILD_PATH_PREFIX_MAP) stands out as worth investigating: if we implement the feature without this, how does it work in practice in multi-directories projects? Does it work well with Dune? What needs to be changed for it to work better?

I think that we may need a PR that implements the language-level feature to answer these questions, unless you can use the Jane Street compiler with upstream dune to make tests and report on their behavior.

@dbuenzli
Copy link
Contributor

The -directory bit (and the partial overlap with BUILD_PATH_PREFIX_MAP) stands out as worth investigating: if we implement the feature without this, how does it work in practice in multi-directories projects?

Note that it's hard to answer these questions without fixing BUILD_PATH_PREFIX_MAP first.

My gut feeling is that with a correctly working BUILD_PATH_PREFIX_MAP, you don't want to add yet another compiler flag to manipulates what gets into __FILE__. The specification of __FILE__ should simply be: the file path as specified on the cli as transformed by the BUILD_PATH_PREFIX_MAP active during the invocation.

@nojb
Copy link

nojb commented Feb 22, 2025

Does it work well with Dune?

As mentioned, all Dune compiler invocations are done from the workspace root, so compiler paths are paths relative to the workspace root as well, so I think the answer to this question is "yes".

ocamlbuild displays a similar behaviour, except that the paths are relative to the directory from which ocamlbuild is invoked.

@gasche
Copy link
Member

gasche commented Feb 22, 2025

Thanks! This suggest that we could forget about the -directory part of the RFC, unless I am missing something.

@goldfirere
Copy link
Contributor

Yes, this does suggest we could drop the -directory bit. I'm happy to. The worst case scenario is that we've miscalculated here and that directories won't always show up correctly in some scenarios, in which case we can take another stab at fixing this -- not so bad.

in an implementation.

The label name doesn’t matter, but there must be a label. The function parameter thus defined has type `Lexing.position` and, if not specified at a call site, is automatically filled with the position of the call site.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find RFC quite unclear as to what positions you concretely get in the resulting Lexing.position values and if it is possible to specify multiple implicit source positions. Basically the notion of "position of the call site" is quite unclear to me.

E.g. if I have:

let f ~(loc0 : [%call_pos]) a0 ~(loc1 : [%call_pos]) a1 = …

And call

let () = f "hey" "ho" 

what do loc0 and loc1 point to in f. This:

let () = f "hey" "ho" 
           ^     ^
           |     |
           loc0  loc1

or something else ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think loc0 and loc1 will be equal and refer to the location of the whole expression f "hey" "ho".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what do I get if I do:

let g = f "hey" 
let () = g "ho"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that loc0 will be the location of f "hey" and loc1 that of g "ho".

Copy link

@jberdine jberdine Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question I would have as a user of this feature is how the presence of an implicit source position parameter might affect code transformations the compiler might perform. For example, I would suppose with the example f above that f "hey" "ho" and (f "hey") "ho" might lead to different values of loc0 and loc1. This would make me worry that the implicit source position parameter might block something like contification. Such transformations aren't explained as part of the source language, so are probably out of scope for the RFC, but just as a data point, I would need to figure this out before using the feature.

Copy link
Contributor

@dbuenzli dbuenzli Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if it's possible to access the location of the application here without fuss (if that's where the magic happens), I would rather suggest to change the terminology to locations (i.e. [%call_loc]) and perhaps simply type those as string * int * int * int to be understood as fname, line_num, start_byte_offset, end_byte_offset with offset relative to the first byte of line_num1; in other words the (multiline aware) format of OCaml error messages.

Having the whole range of the application enables quite a few more tricks :–)

Footnotes

  1. It's not the best format as you have to count lines in the source string to find the line first byte position, but it has the same memory footprint, if people care about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming and the representation choice came from the fact that this is how the Lexing module presents the information, both in terms of content and name ("position"). I think we should consider a change to the stdlib if we are going to chance the specification of this feature similarly.

Copy link
Contributor

@dbuenzli dbuenzli Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm not complaining about the representation choice or information presentation, I'm complaining about the actual bits of information you report: one position versus two positions (a "text location").

In any case I'm not sure I see why a change to the stdlib is needed.

The whole idea of using the Lexing module (which is a runtime module for lexers generated by ocamllex) was a bit odd in the first place. For example the magic debugging variable confusingly named __POS_OF__ simply uses a structural type to report a location.

Copy link
Member

@gasche gasche Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: having two positions lets us denote a span/interval within the source, rather than just a point. This is the difference between Lexing.position, which is just one point, and Location.t which is an interval.

type t = Warnings.loc =
  { loc_start: position; loc_end: position; loc_ghost: bool }

I would also expect a source location/span/interval as opposed to a source position. (I have the impression that the choice of a source position goes back to ocaml/ocaml#126 .)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FTR Printexc does use a record type for reporting the same kind of information, see Printexc.location. Perhaps something to consider consolidating.

Note that the semantics is not better documented there though. There's a lot to unpack in the format of OCaml error messages –– namely byte start and end positions relative to the first byte of a line identified by one-based line number; with positions understood as defined in the introduction of the OCaml string module. It would be a good idea to once solve ocaml/ocaml#8978 and make all the mecanisms that report this kind of information refer to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants