Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: rewrite manifest2po in Python #21532

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jelly
Copy link
Member

@jelly jelly commented Jan 17, 2025

Extracting translatable strings from manifests is easy enough to do in pure Python and removes an dependency on node_modules.

The differences in the Python implementation versus the JavaScript version:

  • no longer relies on global state to gather the msgid's
  • doesn't encode duplicate filenames for the same msgid

Next up would be porting pkg/lib/html2po to Python, this would allow us to drop html-parser from node_modules.

Extracting translatable strings from manifests is easy enough to do in
pure Python and removes an dependency on node_modules.

The differences in the Python implementation versus the JavaScript
version:
- no longer relies on global state to gather the msgid's
- doesn't encode duplicate filenames for the same msgid
@jelly jelly added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Jan 17, 2025
Copy link
Member

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I like the structure in general. See the usual bag of small nags.

import json
import pathlib
import re
from typing import Any, DefaultDict, Dict, Iterable, List, Set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the build system, so we don't have to worry about compatibility with old Python versions:

  • for Dict, List, and Set, use the built-in equivalents (dict, list, set).
  • for Iterable use collections.abc.Iterable
  • for DefaultDict use collections.defaultdict
  • ...I'm sure you have a good reason for Any, so I'll keep reading ;)

import re
from typing import Any, DefaultDict, Dict, Iterable, List, Set

PO_HEADER = """msgid ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an r""" string to avoid needing to double-escape the \\n.

"""


def get_docs_strings(docs: List[Dict[str, str]]) -> Iterable[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more like a Iterable[Mapping[str, str]] but it's not so important...



def get_menu_strings(menu: Dict[str, Any]) -> Iterable[str]:
for _, entry in menu.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use .values() to just get the entries.

yield match


def get_menu_strings(menu: Dict[str, Any]) -> Iterable[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not crazy about the Any here and the no checking of the actual values of the things you're accessing, but I guess there's room for pragmatism here...



def get_manifest_strings(manifest: Dict[str, Any]) -> Iterable[str]:
if 'menu' in manifest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walrus is your friend.

if menu := manifest.get('menu'):
    yield from get_menu_strings(menu)

Comment on lines +106 to +110
fp.write(f"""
#: {manifest_filenames}
msgid "{msgid}"
msgstr ""
""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it might be easier to read if you just print the lines one at a time. Definitely a matter of taste, though.

parser.add_argument('files', nargs='+', help='One or more input files', type=pathlib.Path, metavar='FILE')

args = parser.parse_args()
strings: DefaultDict[str, Set[str]] = collections.defaultdict(set)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use this syntax instead:

strings = collections.defaultdict[str, set[str]](set)

args = parser.parse_args()
strings: DefaultDict[str, Set[str]] = collections.defaultdict(set)

file: pathlib.Path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause file inside of the loop to be pathlib.Path instead of Any?

Comment on lines +94 to +99
# There are variables which when not substituted can cause JSON.parse to fail
# Dummy replace them. None variable is going to be translated anyway
safe_data = re.sub(r"@.+?@", "1", data)
manifest = json.loads(safe_data)
for msgid in get_manifest_strings(manifest):
strings[msgid].add(str(file))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last traces of that got removed in 473db0e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-test For doc/workflow changes, or experiments which don't need a full CI run,
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants