-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: rewrite manifest2po in Python #21532
base: main
Are you sure you want to change the base?
Conversation
Extracting translatable strings from manifests is easy enough to do in pure Python and removes an dependency on node_modules. The differences in the Python implementation versus the JavaScript version: - no longer relies on global state to gather the msgid's - doesn't encode duplicate filenames for the same msgid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I like the structure in general. See the usual bag of small nags.
import json | ||
import pathlib | ||
import re | ||
from typing import Any, DefaultDict, Dict, Iterable, List, Set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the build system, so we don't have to worry about compatibility with old Python versions:
- for
Dict
,List
, andSet
, use the built-in equivalents (dict
,list
,set
). - for
Iterable
usecollections.abc.Iterable
- for
DefaultDict
usecollections.defaultdict
- ...I'm sure you have a good reason for
Any
, so I'll keep reading ;)
import re | ||
from typing import Any, DefaultDict, Dict, Iterable, List, Set | ||
|
||
PO_HEADER = """msgid "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this an r"""
string to avoid needing to double-escape the \\n
.
""" | ||
|
||
|
||
def get_docs_strings(docs: List[Dict[str, str]]) -> Iterable[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more like a Iterable[Mapping[str, str]]
but it's not so important...
|
||
|
||
def get_menu_strings(menu: Dict[str, Any]) -> Iterable[str]: | ||
for _, entry in menu.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use .values()
to just get the entries.
yield match | ||
|
||
|
||
def get_menu_strings(menu: Dict[str, Any]) -> Iterable[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not crazy about the Any
here and the no checking of the actual values of the things you're accessing, but I guess there's room for pragmatism here...
|
||
|
||
def get_manifest_strings(manifest: Dict[str, Any]) -> Iterable[str]: | ||
if 'menu' in manifest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Walrus is your friend.
if menu := manifest.get('menu'):
yield from get_menu_strings(menu)
fp.write(f""" | ||
#: {manifest_filenames} | ||
msgid "{msgid}" | ||
msgstr "" | ||
""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it might be easier to read if you just print the lines one at a time. Definitely a matter of taste, though.
parser.add_argument('files', nargs='+', help='One or more input files', type=pathlib.Path, metavar='FILE') | ||
|
||
args = parser.parse_args() | ||
strings: DefaultDict[str, Set[str]] = collections.defaultdict(set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use this syntax instead:
strings = collections.defaultdict[str, set[str]](set)
args = parser.parse_args() | ||
strings: DefaultDict[str, Set[str]] = collections.defaultdict(set) | ||
|
||
file: pathlib.Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause file
inside of the loop to be pathlib.Path
instead of Any
?
# There are variables which when not substituted can cause JSON.parse to fail | ||
# Dummy replace them. None variable is going to be translated anyway | ||
safe_data = re.sub(r"@.+?@", "1", data) | ||
manifest = json.loads(safe_data) | ||
for msgid in get_manifest_strings(manifest): | ||
strings[msgid].add(str(file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last traces of that got removed in 473db0e.
Extracting translatable strings from manifests is easy enough to do in pure Python and removes an dependency on node_modules.
The differences in the Python implementation versus the JavaScript version:
Next up would be porting
pkg/lib/html2po
to Python, this would allow us to drophtml-parser
from node_modules.