Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_entry_matching_value does not handle ignore-case properly. #648

Open
haiqi96 opened this issue Jan 2, 2025 · 6 comments
Open

get_entry_matching_value does not handle ignore-case properly. #648

haiqi96 opened this issue Jan 2, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 2, 2025

Bughttps://github.com/y-scope/clp/issues/new/choose

Our current implementation of get_entry_matching_value() assumes only 1 value in the dictionary will match the input string.

This assumption is correct when match is case_sensitive (since the the input value won't contain wildcard).
However, when ignore_case is switched on, it's possible more than one dictioanry variables will match the input value. (for example, all of "VAR1", "var1" and "vAr1") matches input "var1", but only one of them will be returned.
As a result, CLP won't return matching messages with the other variables.

I believe the proper fix to this issue is to ether:

  1. update the return type of the method, so it returns a vector of entry. For case-sensitive case, the vector will always contain only one entry (or 0 if no matches). For ignore-case, the vector could have more than one elements.
  2. Create two different methods, one for case-senstive match, one for non-case sensitive match so they can have their dedicated interface.

CLP version

329edf6

Environment

unrelated to OS version.

Reproduction steps

  1. Compress the attached log.
  2. run clg search with ./cmake-build-release/clg -i output/ "My custom log two var1 num"

test.log

@haiqi96 haiqi96 added the bug Something isn't working label Jan 2, 2025
@aestriplex
Copy link

aestriplex commented Jan 15, 2025

@haiqi96 I saw this issue with no one assigned, and thought I would try to develop a fix (this one). I chose to keep the boolean parameter ignore_case. Currently, all tests pass, but I haven't implemented the ones specific to my changes yet. Should I put them in the file test-EncodedVariableInterpreter.cpp, or is it better if I create a dedicated file?

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 16, 2025

@aestriplex Thank you very much for developing the fix. The vector apporach makes sense to me.
However, there's one bug in the implementation at line. The code now adds multiple variables to the logtype and subqueries but it is only supposed to add one variable. (since it only tries to match one)

One potential way to fix this is to use an imprecise var instead of a precise var. Unfortunately I am very busy this week so won't have time to dive down into details. Can you investigate first and let me know if you have any question or concern?

@aestriplex
Copy link

Thanks @haiqi96. Yes, I will investigate and try to fix it on my own. I'll let you know

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 18, 2025
@aestriplex
Copy link

@haiqi96 I changed the implementation of the encode_and_search_dictionary method. Since I noticed that the SubQuery's add_imprecise_dict_var expects two unordered sets, I changed the return type of my previous fix to get_entry_matching_value from vector to unordered set. This way we can avoid verbose casting in encode_and_search_dictionary.
Both

LogTypeDictionaryEntry::add_dict_var(logtype);

and

sub_query.add_imprecise_dict_var(encoded_vars, entries);

are called just once now.
All the unit tests (that did not test this specific feature) still work, and I executed your "Reproduction steps" as well, here's the result

teo@Mac build % ./clg -i /out_log "My custom log two var1 num"
/log/test.log:2024-12-03 12:06:37.096 My custom log two var1 num 12(StaticText)
/log/test.log:2024-12-03 12:06:37.096 My custom log two VAR1 num 12(StaticText)
/log//test.log:2024-12-03 12:06:37.096 My custom log two VaR1 num 12(StaticText)

Tomorrow I'll implement a new section in test-EncodedVariableInterpreter.cpp containing some test cases for my changes

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 21, 2025
@aestriplex
Copy link

@haiqi96
I should probably have put two more test cases like

REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", true).empty());
REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", false).empty());

If it is okay with you, I will open a pull request with this issue linked

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 21, 2025

@haiqi96 I should probably have put two more test cases like

REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", true).empty());
REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", false).empty());
If it is okay with you, I will open a pull request with this issue linked

Sorry, was dragged into some work today, I will closely look at your change tomorrow, as long as no unexpected task comes up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants