Skip to content

Commit

Permalink
Let Location::str() handle \r correctly (#127)
Browse files Browse the repository at this point in the history
The observed problem was that, on Windows, error messages had a strange `\r` in reported source locations that was inconsistent with both other platforms, and other parts of the same output string (you would see `\n` once then `\r\n` further along the same string).

The trouble was traced back to `source.h`, which was separating lines based on only the `\n` character, and assuming that line breaks were only 1 char wide. This PR rewrites the 3 line/columns handling functions that did this, so that they will accept `\r\n`, `\n`, and `\r` as line delimiters while reporting correct line lengths (and so forth) on all platforms.

The main change of substance is that the lines field of `SourceDef` changes from a vector of `size_t` to a vector of `pair<size_t, size_t>`, because it is no longer possible to derive where a line ends from where another line begins (we might have to add/subtract either 1 or 2 chars). Rather than try to detect some file-wide line break width (and possibly have problems due to corrupted files mixing line ending types), this rewrite just looks at which line ending variant each line had and records span information to match.

Additionally, this commit adds exhaustive generative testing of this changed feature.
The test code is commented to explain logically how it builds all possible permutations of line endings and line contents up to a certain length (albeit using just a as contents since this code is not sensitive to string contents).
As configured in the `CMakeLists.txt`, it runs 729 test cases up to length 6 (6 chars, 6 line endings, and everything in between).
  • Loading branch information
fhackett authored Aug 22, 2024
1 parent 7ddc757 commit 467157a
Show file tree
Hide file tree
Showing 5 changed files with 534 additions and 16 deletions.
79 changes: 65 additions & 14 deletions include/trieste/source.h
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,13 @@ namespace trieste
}

private:
// Semantics note:
// The code here only looks for \n and is not intended to be
// platform-sensitive. Effectively, sources operate in binary mode and leave
// encoding issues to the language implementation. There are however some
// cosmetic fixes in error printing, such as in Location::str(), which
// ensure that control characters don't leak into Trieste's output in that
// case.
void find_lines()
{
// Find the lines.
Expand Down Expand Up @@ -157,29 +164,73 @@ namespace trieste
return {};

std::stringstream ss;
auto write_chars_skipping_r = [&ss](const std::string_view& str) -> void {
for (char ch : str)
{
if (ch != '\r')
{
ss << ch;
}
}
};
auto write_indexed_skipping_r =
[&ss](const std::string_view& str, auto fn) -> void {
size_t idx = 0;
for (char ch : str)
{
if (ch != '\r')
{
ss << fn(idx);
}
++idx;
}
};

auto [line, col] = linecol();
auto [linepos, linelen] = source->linepos(line);

if (view().find_first_of('\n') != std::string::npos)
{
auto cover = std::min(linelen - col, len);
std::fill_n(std::ostream_iterator<char>(ss), col, ' ');
std::fill_n(std::ostream_iterator<char>(ss), cover, '~');

auto [line2, col2] = source->linecol(pos + len);
auto [linepos2, linelen2] = source->linepos(line2);
linelen = (linepos2 - linepos) + linelen2;

ss << std::endl << source->view().substr(linepos, linelen) << std::endl;

std::fill_n(std::ostream_iterator<char>(ss), col2, '~');
auto line_view_first = source->view().substr(linepos, linelen);
size_t col_last;
std::string_view interim_view;
std::string_view line_view_last;
{
auto [line2, col2] = source->linecol(pos + len);
auto [linepos2, linelen2] = source->linepos(line2);
line_view_last = source->view().substr(linepos2, linelen2);
col_last = col2;

// Find the lines in between first and last to insert, if there are
// any such lines. If the lines are adjacent, this creates a 1 char
// line view with the new line between the two.
size_t interim_pos = linepos + linelen;
interim_view =
source->view().substr(interim_pos, linepos2 - interim_pos);
}

write_indexed_skipping_r(line_view_first, [&ccol = col](size_t idx) {
return idx < ccol ? ' ' : '~';
});
ss << std::endl;
write_chars_skipping_r(line_view_first);
write_chars_skipping_r(interim_view);
write_chars_skipping_r(line_view_last);
ss << std::endl;
write_indexed_skipping_r(
line_view_last.substr(0, col_last), [&](size_t) { return '~'; });
ss << std::endl;
}
else
{
ss << source->view().substr(linepos, linelen) << std::endl;
std::fill_n(std::ostream_iterator<char>(ss), col, ' ');
std::fill_n(std::ostream_iterator<char>(ss), len, '~');
auto line_view = source->view().substr(linepos, linelen);
write_chars_skipping_r(line_view);
ss << std::endl;

assert(pos >= linepos);
write_indexed_skipping_r(
line_view.substr(0, pos - linepos + len),
[&ccol = col](size_t idx) { return idx < ccol ? ' ' : '~'; });
ss << std::endl;
}

Expand Down
5 changes: 4 additions & 1 deletion parsers/yaml/event_writer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -423,7 +423,10 @@ namespace trieste
auto current = node->at(i)->location().view();
auto next = node->at(i + 1)->location().view();
os << escape_chars(current, escape);
if (!std::isspace(current.front()) && !std::isspace(next.front()))
// an empty string view does not start with a space
if (
(current.empty() || !std::isspace(current.front())) &&
(next.empty() || !std::isspace(next.front())))
{
os << " ";
}
Expand Down
2 changes: 1 addition & 1 deletion parsers/yaml/to_json.cc
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ namespace
[](Match& _) {
Location loc = _(Key)->location();
auto view = loc.view();
if (view.front() == '"' && view.back() == '"')
if (!view.empty() && view.front() == '"' && view.back() == '"')
{
loc.pos += 1;
loc.len -= 2;
Expand Down
7 changes: 7 additions & 0 deletions test/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,10 @@ if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT TRIESTE_SANITIZE)
endif()

add_test(NAME trieste_intrusive_ptr_test COMMAND trieste_intrusive_ptr_test WORKING_DIRECTORY $<TARGET_FILE_DIR:trieste_intrusive_ptr_test>)

add_executable(trieste_source_test
source_test.cc
)
target_link_libraries(trieste_source_test trieste::trieste)

add_test(NAME trieste_source_test COMMAND trieste_source_test --depth 6 WORKING_DIRECTORY $<TARGET_FILE_DIR:trieste_source_test>)
Loading

0 comments on commit 467157a

Please sign in to comment.