-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicit source positions #52
base: master
Are you sure you want to change the base?
Implicit source positions #52
Conversation
This feature has widespread use already within Jane Street. It was originally implemented by @vivianyyd, with much help getting it into production by @Enoumy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I am in favour. Incidentally, we had such a facility in LexiFi's compiler fork since 2008 and recently we removed it in order to reduce the diff with upstream. Will be happy to get it back as an official feature! :)
|
||
### Controlling file paths in source positions | ||
|
||
When implementing this feature, Jane Street also introduced a new compiler flag, `-directory`. When specified, `-directory some/path` causes the filename in automatically inserted positions to be prepended with `some/path`. Jane Street’s build system uses that flag to control the file paths reported by `[%call_pos]`. For example, this can be used to give more informative file paths than just a single `./filename.ml`, all the while ensuring a reproducible result and avoiding to include machine-specific paths like `/home/user/build/.../filename.ml` in the compiled program. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part seems orthogonal and indeed probably best left out of an initial PR. Also, note that in Dune, for example, all compilation invocations are made from the workspace root, so that the paths appearing in compiler locations are relative to the workspace root and not bare filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We found that the feature was not very useful without this -directory
bit, hence its inclusion here. But maybe our Dune is different from the upstream Dune in this way and that upstream Dune will work without -directory
.
As mentioned previously, there is broad maintainer consensus in favor of the feature. I suppose that the purpose of the RFC is more to discuss the proposed syntax and the practical details. The proposed syntax looks fine to me. The I think that we may need a PR that implements the language-level feature to answer these questions, unless you can use the Jane Street compiler with upstream dune to make tests and report on their behavior. |
Note that it's hard to answer these questions without fixing My gut feeling is that with a correctly working |
As mentioned, all Dune compiler invocations are done from the workspace root, so compiler paths are paths relative to the workspace root as well, so I think the answer to this question is "yes".
|
Thanks! This suggest that we could forget about the |
Yes, this does suggest we could drop the |
in an implementation. | ||
|
||
The label name doesn’t matter, but there must be a label. The function parameter thus defined has type `Lexing.position` and, if not specified at a call site, is automatically filled with the position of the call site. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find RFC quite unclear as to what positions you concretely get in the resulting Lexing.position
values and if it is possible to specify multiple implicit source positions. Basically the notion of "position of the call site" is quite unclear to me.
E.g. if I have:
let f ~(loc0 : [%call_pos]) a0 ~(loc1 : [%call_pos]) a1 = …
And call
let () = f "hey" "ho"
what do loc0
and loc1
point to in f
. This:
let () = f "hey" "ho"
^ ^
| |
loc0 loc1
or something else ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think loc0
and loc1
will be equal and refer to the location of the whole expression f "hey" "ho"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what do I get if I do:
let g = f "hey"
let () = g "ho"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess that loc0
will be the location of f "hey"
and loc1
that of g "ho"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question I would have as a user of this feature is how the presence of an implicit source position parameter might affect code transformations the compiler might perform. For example, I would suppose with the example f
above that f "hey" "ho"
and (f "hey") "ho"
might lead to different values of loc0
and loc1
. This would make me worry that the implicit source position parameter might block something like contification. Such transformations aren't explained as part of the source language, so are probably out of scope for the RFC, but just as a data point, I would need to figure this out before using the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if it's possible to access the location of the application here without fuss (if that's where the magic happens), I would rather suggest to change the terminology to locations (i.e. [%call_loc]
) and perhaps simply type those as string * int * int * int
to be understood as fname, line_num, start_byte_offset, end_byte_offset
with offset relative to the first byte of line_num
1; in other words the (multiline aware) format of OCaml error messages.
Having the whole range of the application enables quite a few more tricks :–)
Footnotes
-
It's not the best format as you have to count lines in the source string to find the line first byte position, but it has the same memory footprint, if people care about that. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming and the representation choice came from the fact that this is how the Lexing
module presents the information, both in terms of content and name ("position"). I think we should consider a change to the stdlib if we are going to chance the specification of this feature similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I'm not complaining about the representation choice or information presentation, I'm complaining about the actual bits of information you report: one position versus two positions (a "text location").
In any case I'm not sure I see why a change to the stdlib is needed.
The whole idea of using the Lexing
module (which is a runtime module for lexers generated by ocamllex
) was a bit odd in the first place. For example the magic debugging variable confusingly named __POS_OF__
simply uses a structural type to report a location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: having two positions lets us denote a span/interval within the source, rather than just a point. This is the difference between Lexing.position
, which is just one point, and Location.t
which is an interval.
type t = Warnings.loc =
{ loc_start: position; loc_end: position; loc_ghost: bool }
I would also expect a source location/span/interval as opposed to a source position. (I have the impression that the choice of a source position goes back to ocaml/ocaml#126 .)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FTR Printexc
does use a record type for reporting the same kind of information, see Printexc.location
. Perhaps something to consider consolidating.
Note that the semantics is not better documented there though. There's a lot to unpack in the format of OCaml error messages –– namely byte start and end positions relative to the first byte of a line identified by one-based line number; with positions understood as defined in the introduction of the OCaml string module. It would be a good idea to once solve ocaml/ocaml#8978 and make all the mecanisms that report this kind of information refer to it.
Rendered version
Introduction of the RFC: