Allow for regex in dictionaries #3632

Open
spetrosi opened this issue Feb 3, 2025 · 6 comments


spetrosi commented Feb 3, 2025

I am considering using codespell to search for non-inclusive terms in our code. codespell has a built-in dictionary for this at https://github.com/codespell-project/codespell/blob/main/codespell_lib/data/dictionary_usage.txt.
However, codespell cannot match dictionary words as substrings of longer words. For example, consider this line:
timemaster is a program that uses ptp4l and phc2sys in combination with chronyd or ntpd to synchronize the system clock to NTP and PTP time masters.

codespell reports the use of masters but ignores timemaster.
I wonder whether it would be possible to allow regular expressions in the dictionary, so that I could add .*master.* to a dictionary and have codespell catch every string that matches that regex.

spetrosi changed the title from "Add an option to search for substrings" to "Allow for regex in dictionaries" on Feb 3, 2025

DimitriPapadopoulos (Collaborator) commented

Codespell processes words. The regex that splits text into words is documented here:

parser.add_argument(
    "-r",
    "--regex",
    action="store",
    type=str,
    help="regular expression that is used to find words. "
    "By default any alphanumeric character, the "
    "underscore, the hyphen, and the apostrophe are "
    "used to build words. This option cannot be "
    "specified together with --write-changes.",
)
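
To illustrate why timemaster slips through, here is a minimal Python sketch of word-based checking. It is not codespell's actual implementation: the pattern `[\w\-']+` only approximates the default described in the help text above, and `DICTIONARY`, `LINE`, and `find_flagged` are hypothetical names made up for this example.

```python
import re

# Assumption: an approximation of the default word pattern described in the
# --regex help text (alphanumerics, underscore, hyphen, apostrophe).
WORD_RE = re.compile(r"[\w\-']+")

# Hypothetical two-entry dictionary standing in for dictionary_usage.txt.
DICTIONARY = {"master": "primary", "masters": "primaries"}

LINE = (
    "timemaster is a program that uses ptp4l and phc2sys in combination "
    "with chronyd or ntpd to synchronize the system clock to NTP and PTP "
    "time masters."
)


def find_flagged(line, word_re=WORD_RE):
    """Split the line into words with word_re, then keep dictionary hits."""
    return [w for w in word_re.findall(line) if w.lower() in DICTIONARY]


# Whole words only: "timemaster" is a single word and is not in the dictionary.
print(find_flagged(LINE))  # ['masters']

# A narrower word pattern makes every occurrence of "master" its own word.
print(find_flagged(LINE, re.compile("master")))  # ['master', 'master']
```

The second call is what passing a custom pattern to `--regex` amounts to in this sketch: the dictionary lookup is unchanged, only the definition of a word is.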


spetrosi commented Feb 3, 2025

@DimitriPapadopoulos I didn't understand exactly what --regex does when I read about it today. Do you know how I can use this option to get what I need?


spetrosi commented Feb 3, 2025

The best way to use --regex that I found is like this:

$ codespell tasks/main.yml --builtin usage --regex "master|slave|blacklist|dummy|whitelist" -C0
  • with --regex I list all the words that I want to search for, separated by the regex alternation operator |
  • -C0 prints the matching line, because the word found may differ from what codespell outputs.

I can also collect all the words from the usage dictionary into a variable and pass it to --regex:

$ myregexvar=$(sed 's/->.*//g' /home/spetrosi/.local/lib/python3.12/site-packages/codespell_lib/data/dictionary_usage.txt | paste -sd '|' -)
$ codespell tasks/main.yml --builtin usage --regex "$myregexvar" -C0
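
The same pattern can be assembled in Python, which makes it easier to skip blank lines and escape regex metacharacters. This is only a sketch: it assumes each dictionary line has the form `term->replacement`, and `build_word_regex` and the sample entries are hypothetical, not part of codespell.

```python
import re


def build_word_regex(dictionary_lines):
    """Join the left-hand side of each 'term->replacement' line with '|'."""
    terms = []
    for line in dictionary_lines:
        # Everything before the first "->" is the term to search for.
        term = line.split("->", 1)[0].strip()
        if term:  # skip blank lines so no empty alternative is produced
            terms.append(re.escape(term))
    return "|".join(terms)


# Hypothetical sample entries; a real run would read the dictionary file.
sample = [
    "blacklist->blocklist",
    "master->primary, leader",
    "",
    "whitelist->allowlist",
]

pattern = build_word_regex(sample)
print(pattern)
print(re.findall(pattern, "timemaster reads the whitelist"))
```

The resulting string could then be passed, quoted, as the argument to `--regex`.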

This works, but it is not a clean solution. The original request in this issue was the ability to add regular expressions to dictionaries: for example, to add /.*master.*/ to my dictionary so that codespell searches for matches of that regex rather than for a literal word.
I do not have permissions to re-open this issue.

DimitriPapadopoulos (Collaborator) commented

The answer is that it's currently not possible to add regexes to dictionaries. I can reopen this as an enhancement request, no problem.

But then, given the requirement of matching .*master.*, why not use plain old grep/sed?


spetrosi commented Feb 3, 2025

I thought it was possible in some way because you closed the issue as completed. Having it open as an enhancement request works for me. Will it be hard to implement?

I would rather not use grep, because I want to keep all the functionality around the check: providing dictionaries, choosing which files to test and which to ignore, using inline comments to ignore a specific rule, etc.
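
For comparison, here is a rough Python sketch of what a grep-style substring scan with one of those extras (an inline ignore comment) might look like. It is an illustration of the trade-off, not a codespell feature: `scan_text` and the `scan:ignore` marker are hypothetical, and file selection and dictionary loading are left out.

```python
import re

# Hypothetical inline marker; lines containing it are skipped entirely.
IGNORE_MARKER = "scan:ignore"


def scan_text(text, patterns, ignore_marker=IGNORE_MARKER):
    """Return (line_number, matched_text) for every regex hit in text."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if ignore_marker in line:
            continue  # honour the inline ignore comment
        for rx in compiled:
            for match in rx.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


sample_text = (
    "timemaster syncs the clock to PTP time masters\n"
    "git checkout master  # scan:ignore\n"
    "nothing to report here\n"
)

# \w*master\w* reports the whole word that contains the substring.
for lineno, word in scan_text(sample_text, [r"\w*master\w*"]):
    print(f"line {lineno}: {word}")
```

Unlike a dictionary lookup, this reports any word containing the substring, which is the behaviour requested at the top of this issue.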

DimitriPapadopoulos (Collaborator) commented

I think it will be very hard to implement. Using grep is your best bet.
