Fortran scanner misses comments in some cases #4454
To follow up on the final paragraph of the issue: in order to strip comments, we'd need to understand the syntax involved. The complication, as always, is context: if the comment mark is inside a string (though Fortran does not use that terminology), it should not be considered a comment indicator. From the standard:
Note that the existing regex does not attempt to recognize the fixed-form "C or * in column 1" comment convention.
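Recognizing whole-line fixed-form comments is simple on its own; the hard part is combining it with the existing regex. A minimal sketch, assuming the Fortran 90+ fixed-form rules: `C`, `c` or `*` in column 1 makes the line a comment, a line containing only blanks is a comment line, and `!` as the first nonblank character starts a comment unless it is in column 6 (the continuation column). The function name is hypothetical, not part of SCons, and it only classifies whole lines; it does not handle a trailing `!` comment after code.

```python
def is_fixed_form_comment_line(line: str) -> bool:
    """Return True if `line` is a whole-line comment in fixed-form source.

    Hypothetical helper for illustration; not part of SCons.
    """
    # C, c or * in column 1 marks the entire line as a comment.
    if line[:1] in ("C", "c", "*"):
        return True
    stripped = line.lstrip()
    # A line containing only blanks is treated as a comment line.
    if not stripped:
        return True
    # "!" as the first nonblank character starts a comment, except in
    # column 6 (index 5), where a nonblank character marks a continuation.
    first_col = len(line) - len(stripped)
    return stripped[0] == "!" and first_col != 5
```

A scanner could drop such lines before applying its `USE`/`INCLUDE` regex, but only when it knows the file is fixed form.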
And for the scanner issue, which involves inline statement termination, here's the wording:
This issue documents a limitation of the Fortran Scanner. The scanner's own comments describe the limitation, but not why it exists.
In searching for `USE` and `INCLUDE` statements when scanning Fortran source, the scanner uses regexes, which cannot deal with a line that combines multiple semicolon-separated statements with comment marks. These lines from the unit test (`SCons/Scanner/FortranTests.py`) all give the wrong result:

The regex treats the semicolon as the start of a new piece of text to scan, so in each case whatever comes before it has no effect. The regexes are applied in multiline mode, FWIW. It doesn't matter where the comment mark is or how much whitespace there is; the comment appears to "end" at the semicolon. The fifth example should (apparently) be considered a syntax error (and thus ignored?), but since scanning starts on the blank after the semicolon, there is no complaint.
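To make the symptom concrete, here is a deliberately simplified, hypothetical pattern (much simpler than the scanner's actual regex) that, in multiline mode, treats either the start of a line or a semicolon as the start of a statement. Because nothing in it knows about `!`, a `USE` that follows a semicolon inside a comment is still reported:

```python
import re

# Simplified stand-in for the scanner's USE regex (hypothetical, not SCons code):
# a statement starts at the beginning of a line or right after a semicolon.
USE_RE = re.compile(r"(?:^|;)\s*USE\s+(\w+)", re.IGNORECASE | re.MULTILINE)

source = """\
      x = 1 ! comment; USE not_really_used
      USE really_used
"""

print(USE_RE.findall(source))  # ['not_really_used', 'really_used']
```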
Python's `re` module allows only fixed-width look-behind patterns, so there is no legal way to express "semicolon if not preceded by `!` and possibly some other stuff"; attempting it produces an error from the `re` module. There also doesn't seem to be a simple way to express "if the line begins with a comment, don't do anything more with it". It's not hard to write a bit of regex that says "ignore from this character to the end of the line", but interleaving that with the already fairly complex regex in use is something else.

From some digging around the internet, it seems possible to encode this without a look-behind, at the cost of a considerably more complex regex pattern. That might be something to explore; we seem to be lacking that level of expertise.
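The look-behind restriction is easy to confirm. One commonly used way around it is to make the regex also match the things it should skip (character literals and `!` comments) and keep only the captured group from the alternative that matters. Since `finditer` never starts a new match inside text that an earlier match consumed, a `;` inside a comment or literal can no longer start a statement. The sketch below uses a hypothetical, much simplified `USE` pattern (not the scanner's real regex) and ignores fixed-form column-1 comments, continuation lines, and forms such as `USE, INTRINSIC ::`.

```python
import re


def variable_lookbehind_compiles() -> bool:
    """Check whether re accepts a variable-width look-behind ("; not after !...")."""
    try:
        re.compile(r"(?<!!.*);")
    except re.error:
        return False  # re requires look-behind patterns of fixed width
    return True


# Match what to skip as well as what to find; only the last branch captures.
USE_OR_SKIP_RE = re.compile(
    r"""
      '[^'\n]*'                  # single-quoted character literal: skip
    | "[^"\n]*"                  # double-quoted character literal: skip
    | ![^\n]*                    # ! comment up to end of line: skip
    | (?:^|;)\s*USE\s+(\w+)      # USE statement: capture the module name
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)


def find_used_modules(source: str) -> list[str]:
    """Return module names from USE statements, ignoring comments and literals."""
    return [m.group(1) for m in USE_OR_SKIP_RE.finditer(source) if m.group(1)]
```

For example, `find_used_modules("x = 1 ! c; USE foo")` finds nothing, while `find_used_modules("USE a; USE b")` finds both modules. The cost is the one mentioned above: every construct that can hide a `!` or `;` must be spelled out as its own "skip" alternative.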
The non-stdlib `regex` module can reportedly do look-behind with a variable-length pattern; a simple attempt to code it for that module gave no error, but didn't help.

Probably we should find a place to document this and suggest that the easiest workaround, if it causes problems (it's not clear there's a real-world problem here, just a broken test when a change was made in the scanner module), is to "not do that". So instead of:
do:
One suggestion from Discord was to pre-process the file to remove comments before scanning; it's not clear how easy this is. Can comment marks appear in a line inside some other construct, such that they are not considered comment indicators?
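As the follow-up comment notes, a `!` inside a character literal is not a comment indicator, and likewise a `;` there does not end the statement, so a pre-processor has to track quotes rather than cut at the first `!`. A minimal sketch for free-form source, assuming the simplest rules: `'` or `"` opens a literal that runs to the matching delimiter, `!` outside a literal starts a comment to end of line, and `;` outside a literal separates statements. The function names are hypothetical, not SCons APIs, and the sketch ignores `&` continuation lines, fixed-form source, and `USE, INTRINSIC ::`.

```python
import re


def strip_comment_and_split(line: str) -> list[str]:
    """Split one free-form Fortran line into statements, dropping any ! comment.

    '!' and ';' inside a character literal ('...' or "...") are ordinary text.
    """
    statements: list[str] = []
    current: list[str] = []
    quote = ""  # delimiter of the literal we are inside; "" means outside
    for ch in line.rstrip("\n"):
        if quote:
            current.append(ch)
            if ch == quote:
                quote = ""  # a doubled '' or "" just re-opens on the next char
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "!":
            break  # the rest of the line is a comment
        elif ch == ";":
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))
    # Drop empty statements left by stray or trailing semicolons.
    return [s.strip() for s in statements if s.strip()]


# Hypothetical, simplified USE matcher applied to one isolated statement.
USE_STMT_RE = re.compile(r"USE\s+(\w+)", re.IGNORECASE)


def used_modules(source: str) -> list[str]:
    """Collect module names from USE statements after stripping comments."""
    names = []
    for line in source.splitlines():
        for stmt in strip_comment_and_split(line):
            m = USE_STMT_RE.match(stmt)
            if m:
                names.append(m.group(1))
    return names
```

Because each statement is isolated before matching, the `USE` regex no longer needs to know anything about comments or semicolons; the trade-off is a Python-level character loop instead of a single regex pass over the file.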