add logging, update readme and test files (#8)
* initial check-in of working version.

* add file containing dependencies to install

* updated lint rules
* add mention of regex to README

* Move process impl into the base class FileProcessor.

* use file_extension as key to look up the right FileWriter class

* add command line arg processing

* move FileReader responsibility out of the FileProcessor

* expose FileReader and message linter

* modify FileProcessor to accept FileReader and FileWriter

* add more command line usages

* rename to message_lint (from str_res_lint)

* update README to reflect the renaming to message_lint (from str_res_lint)

* remove static method (build_output_folder) from FileWriter

* add test files


* change --dest option to --output_folder

* clean up table. remove need for --file option. rename --dest option to `output_folder`

* fix indentation issue

* add unit tests for filereader and filewriter.

* remove commented code

* add examples to help reader learn how to use message_lint

* correct / clean up some of the rules

* add overview diagram

* add diagram

* Isolate version number. Also, add logging

* updated rules. check for empty message resources.

* correct rule for empty messages

* add "Getting Started" section
* add table of issues the tool will find and how to resolve each issue

* minor edits

* move main function out of bin/message_lint and put it in its own folder. that way, it can be unit tested.

* add placeholder for unit test

* update diagram adding mention of product source content

* add two simple test (source content) files

* clarify mention of supporting plural noun forms

* commented out debug print statements. Will remove them at some point.

* add mention of regex

* move fileprocessor under message_lint app folder

* move fileprocessor out of utils to the app folder

* add type hints

* move linter to app folder

* add command line option "verbose"

* update test files with improved example messages with L12y issues

* remove print statements

* update version
ehom authored Jan 30, 2023
1 parent 8ca5ec1 commit 6609de5
Showing 21 changed files with 437 additions and 206 deletions.
143 changes: 101 additions & 42 deletions README.md
@@ -1,44 +1,83 @@
# message_lint

`message_lint` checks each message for localizability (L12y) issues.

## What it looks for

| `message_lint` checks each message if it... | L12y Issues |
|-----------------------------------------------------------------|--------------------------------|
| begins with `,` or `.` | Text Fragments |
| begins with one of the following: `and` `or` | |
| ends with `,` | |
| ends with one of the following: `the` `to` `by` `on` `and` `or` | |
| | |
| contains `{placeholder}` preceded by | Articles before placeholders |
| `a` `an` `a(n)` or `the` | |
| | |
| contains `{placeholder}%` `{0}%` | Percentage Formatting |
| | |
| contains one of the following: `http://` `https://` | URIs/URLs embedded in messages |
| `<a href="...">...</a>` | |
| | |
| contains `{placeholder}` followed by: | Lack of Pluralization |
| `year` `month` `week` `day` | |
| `hour` `min` `sec` | |
| `groups` `issues` `users` `people` `other` `boards` `spaces` | |
| | |
| contains placeholder that uses a number `{[0-9]+}` | Non-named placeholders |
| | |
| contains any of the following: | use of ASCII Punctuation Chars |
| `'` apostrophe (U+0027) | |
| `"` double quote (U+0022) | |
| `...` 3 periods (U+002E) | |

## Install dependencies
`message_lint` is for software developers or product localization managers who want to
find out if their product source content contains any localizability (L12y) issues
before it goes for localization.


This command line tool can read in:
* `react-intl` message resource (JSON) and
* Java properties files

## A bird's-eye view of `message_lint`

![Alt text here](images/message_lint_diagram.svg)

## What common L12y issues does `message_lint` look for?

| L12y Issue | How to resolve |
|--------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| text fragments | each message should be a complete sentence/phrase |
| articles before placeholders | the value that goes into the placeholder should include the article |
| percentage formatting | the value that goes into the placeholder should include the percent sign either prefixed or suffixed |
| embedded URLs/URIs -- this is really not part of the user-facing content. Updating a URL should not trigger a localization workflow. | the URLs/URIs should be external to the messages. They come into the message through the placeholders. |
| inadequate plural nouns | use the ICU message format to support multiple plural noun forms |
| use of ASCII punctuation | replace apostrophe with Unicode Right Single Quotation Mark (U+2019) |
| | replace double quotation marks with Unicode Left and Right Double Quotation Marks (U+201C, U+201D) |
| | replace ellipses "..." with Unicode Horizontal Ellipsis (U+2026) |
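
The punctuation replacements in the table above could be applied with a few lines of Python. This is only a sketch: the function name `to_typographic` is hypothetical and not part of `message_lint`, and it assumes double quotes come in matched pairs within a message.

```python
import re


def to_typographic(message: str) -> str:
    """Replace ASCII punctuation with the Unicode equivalents listed above."""
    # Three periods -> HORIZONTAL ELLIPSIS (U+2026)
    message = message.replace("...", "\u2026")
    # Apostrophe -> RIGHT SINGLE QUOTATION MARK (U+2019)
    message = message.replace("'", "\u2019")
    # Paired double quotes -> LEFT/RIGHT DOUBLE QUOTATION MARK (U+201C, U+201D).
    # An unpaired quote is left untouched so a human can review it.
    message = re.sub(r'"([^"]*)"', "\u201c\\1\u201d", message)
    return message


print(to_typographic('Don\'t close "Settings"...'))
```

A blanket replacement like this is best reviewed by hand, since an apostrophe can carry special meaning in some message formats.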

## How does it find these issues?

`message_lint` uses regular expressions (regex) to find these issues.

| `message_lint` checks each message if it... | L12y Issues |
|-----------------------------------------------------------------|----------------------------------------------|
| begins with `,` or `.` | Text Fragments |
| begins with one of the following: `and` `or` | |
| ends with `,` | |
| ends with one of the following: `the` `to` `by` `on` `and` `or` | |
| | |
| contains `{placeholder}` preceded by | Articles before placeholders |
| `a` `an` `a(n)` or `the` | |
| | |
| contains `{placeholder}%` `{0}%` | Percentage Formatting |
| | |
| contains one of the following: `http://` `https://` | URIs/URLs embedded in messages |
| `<a href="...">...</a>` | |
| | |
| contains `{placeholder}` followed by: | Inadequate support for multiple plural nouns |
| `year` `month` `week` `day` | |
| `hour` `min` `sec` | |
| `groups` `issues` `users` `people` `other` `boards` `spaces` | |
| | |
| contains placeholder that uses a number `{[0-9]+}` | Non-named placeholders |
| | |
| contains any of the following: | use of ASCII Punctuation Chars |
| `'` apostrophe (U+0027) | |
| `"` double quote (U+0022) | |
| `...` 3 periods (U+002E) | |
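
To make the table above concrete, here is a minimal sketch of regex-based checks for a few of the listed issues. The patterns and the names `CHECKS` and `check_message` are illustrative assumptions written for this example; they are not the rules shipped with `message_lint`.

```python
import re

# (issue name, pattern) pairs; hypothetical, modelled on the table above
CHECKS = [
    ("Text Fragments", re.compile(r"^\s*[,.]")),
    ("Text Fragments", re.compile(r"\b(the|to|by|on|or|and)\s*$")),
    ("Articles before placeholders", re.compile(r"\b(a|an|the)\s+\{\w+\}")),
    ("Percentage Formatting", re.compile(r"\{\w+\}\s*%")),
    ("Non-named placeholders", re.compile(r"\{\d+\}")),
]


def check_message(message: str) -> list:
    """Return the names of the issues whose pattern matches the message."""
    issues = []
    for issue, pattern in CHECKS:
        if pattern.search(message) and issue not in issues:
            issues.append(issue)
    return issues


for text in ["Delete the {item}", "{0}% done", "Send to", "Saved."]:
    print(repr(text), "->", check_message(text))
```

Each check is independent, so one message can be reported under several issues at once.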


## Why should you check your source content?

Checking your source content for L12y issues will save time and money if you do it early
in the product development life cycle (PDLC).

Localization specialists will be grateful too :smile:

## Getting Started

### Install dependencies

Run the following at the command line:

% `cd message_lint`

% `pip install -r requirements.txt`

## First, do this
### First, take a look at the command line help for `message_lint`

% `message-lint/bin/message_lint --help`
*message_lint %* `bin/message_lint --help`

```
usage: message_lint [-h] [-o OUTPUT_FOLDER] [-v] files [files ...]
@@ -59,20 +98,40 @@ Thanks for using message_lint!
%
```

## Now try your files
### Now try out the test files we provided

You must pass JSON message files and Java (message) Properties files to `message_lint`.

The lint reports for `test.json` will be located in the same directory under `message_lint_reports`
By default, the lint reports will be generated and located in the same directory
as the test_files under `message_lint_reports`.

The lint reports will only be generated in `.json` format.

Here are some example command lines you can try out:

#### Example 1.1

When you run the following command line, `message_lint` examines each message in `test_files\test.json`
and generates a report of any localizability issues it finds. By default, the lint reports are written to a
`message_lint_reports` folder in the same directory as the test files.

*message_lint %* `bin/message_lint test_files\test.json`

#### Example 1.2

You can also pass it more than one file.

*message_lint %* `bin/message_lint test_files\test.json test_files\test.properties`

A lint report will be generated for each message resource file.

Here are some example command lines:
#### Example 2

% `message_lint/bin/message_lint test.json test.properties`
You can specify a custom output folder where the lint reports will go.

You can also pass it an output folder where the lint reports will go. The lint reports for this next command will be
located in `output\message_lint_reports`
*message_lint %* `bin/message_lint test.json test.properties --output_folder ..\output`

% `message-lint/bin/message_lint test.json test.properties --output_folder ..\output`
The lint reports for this command will be located in `output\message_lint_reports`.
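
The folder rule described in these examples can be sketched with `pathlib`. The helper name `report_folder` is hypothetical and this is not the tool's own code; it only assumes that reports go into a `message_lint_reports` folder, either next to the input file or under the folder passed with `--output_folder`.

```python
from pathlib import Path
from typing import Optional

REPORT_FOLDER = "message_lint_reports"


def report_folder(source_file: str, output_folder: Optional[str] = None) -> Path:
    """Return the folder in which the lint report for source_file would be placed."""
    # Without --output_folder, fall back to the directory of the input file.
    base = Path(output_folder) if output_folder else Path(source_file).parent
    return base / REPORT_FOLDER


print(report_folder("test_files/test.json"))
print(report_folder("test.json", "../output"))
```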


---
---
52 changes: 7 additions & 45 deletions bin/message_lint
@@ -3,52 +3,11 @@
import os
import sys
import argparse
import pathlib

fpath = os.path.join(os.path.dirname(__file__), '..')
sys.path.append(fpath)

import utils


def build_file_path(filename, target_path, extra_folder=None) -> str:
    """ build a file_path """
    file_path = os.path.abspath(filename)
    p = pathlib.Path(file_path)
    src_path = p.parents[0]
    filename = p.name
    print(src_path, filename)

    if target_path is None:
        target_path = src_path
    else:
        target_path = os.path.abspath(target_path)

    if extra_folder is not None:
        target_path = os.path.join(target_path, extra_folder, filename)
        print("path of target folder:", target_path)

    os.makedirs(target_path, exist_ok=True)
    file_path = os.path.join(target_path, filename)
    print("path of target file:", file_path)

    return file_path


def main(args):
    print(args.files)
    print(args.output_folder)

    for file in args.files:
        reader = utils.FileReader.get(file)

        # build file path for the output folder
        file_path = build_file_path(file, args.output_folder, extra_folder="message_lint_reports")

        writer = utils.FileWriter.get(file_path)

        utils.FileProcessor(reader, writer).execute()

import message_lint

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
@@ -65,7 +24,10 @@ if __name__ == "__main__":
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="%(prog)s 0.1.0")

        version="%(prog)s {version}".format(version=message_lint.__version__))
    parser.add_argument(
        "--verbose",
        action="store_true",
        required=False)
    arguments = parser.parse_args(args=None if sys.argv[1:] else ["--help"])
    main(arguments)
    message_lint.main(arguments)
1 change: 1 addition & 0 deletions images/message_lint_diagram.svg
3 changes: 3 additions & 0 deletions message_lint/__init__.py
@@ -0,0 +1,3 @@
from message_lint.main import main

__version__ = "0.1.1"
56 changes: 56 additions & 0 deletions message_lint/fileprocessor.py
@@ -0,0 +1,56 @@
from .linter import lint
from pprint import PrettyPrinter

pp = PrettyPrinter(
    indent=2,
    width=100,
    compact=True
)


class FileProcessor:
    def __init__(self, reader, writer, logger):
        self.reader = reader
        self.writer = writer
        self.logger = logger
        self.content = {}

    def execute(self) -> dict:
        try:
            self.content = self.reader.read()
            # pp.pprint(self.content)
        except FileNotFoundError:
            print("Error: File Not Found: {0}".format(self.reader.filename))
            return {}

        # lookup-table that maps a message to its findings
        findings = {}
        for message_id, message in self.content.items():
            if message is None:
                continue

            self.logger.log_info("Processing...\"{0}\": \"{1}\"".format(message_id, message))

            if type(message) is dict and message['message'] is not None:
                message = message['message']
            elif type(message) is not str:
                continue
            else:  # type(message) is str
                pass

            findings[message_id] = {
                "message": message,
                "linted": []
            }

            found_something = lint(message)

            if len(found_something):
                print(">>> '{0}': \"{1}\"".format(message_id, message))
                for something in found_something:
                    findings[message_id]["linted"].append(something['desc'])
                    print(">>> {0}".format(something['desc']))
                print('~' * 10)
        self.writer.write(findings)
        return findings
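
The loop above accepts either a plain string or a dict carrying a `message` key. A minimal sketch of that normalization step follows. It is a variant, not the commit's code: the helper name `extract_text` is hypothetical, and it uses `dict.get` so that a dict without a `message` key is skipped rather than raising `KeyError`.

```python
from typing import Optional


def extract_text(entry) -> Optional[str]:
    """Return the message text of a resource entry, or None if there is none."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict):
        # dict.get returns None when the "message" key is absent
        text = entry.get("message")
        return text if isinstance(text, str) else None
    # Any other type (None, numbers, lists) is not a lintable message
    return None


print(extract_text("Hello"))
print(extract_text({"message": "Hello", "description": "greeting"}))
print(extract_text({"description": "no message here"}))
```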

1 change: 1 addition & 0 deletions message_lint/linter/__init__.py
@@ -0,0 +1 @@
from message_lint.linter.str_lint import lint
35 changes: 16 additions & 19 deletions utils/lint_rules.json → message_lint/linter/lint_rules.json
@@ -3,7 +3,6 @@
"regexp": [
"([\\s]{2,})"
],
"outputFile": "extraneous_spaces",
"desc": "Extraneous Spaces detected"
},
{
@@ -12,63 +11,56 @@
"(\\,)$",
"^\\s*(and|or)\\b",
"\\b(the|to|by|on|or|and)\\b$",
"^{\\w+}$"
"^{\\w*\\}$"
],
"outputFile": "fragments",
"desc": "Sentence Fragments"
},
{
"regexp": [
"(the\\b\\{\\w+\\})",
"(a\\b\\{\\w+\\})",
"(an\\b\\{\\w+\\})",
"(a\\(n\\)\\b\\{\\w+\\})"
"(the\\s*\\{\\w*\\})",
"(a\\s*\\{\\w*\\})",
"(an\\s*\\{\\w*\\})",
"(a\\(n\\)\\s*\\{\\w*\\})"
],
"outputFile": "articles",
"desc": "definite and indefinite articles"
"desc": "definite and indefinite articles before placeholders"
},
{
"regexp": [
"(\\{\\w+\\}\\s*%)"
"(\\{\\w*\\}\\s*%)"
],
"outputFile": "percentage",
"desc": "percentage format"
},
{
"regexp": [
"\\'\\{\\'"
],
"outputFile": "placeholder_quotations",
"desc": "Incorrect placeholder quoting."
},
{
"regexp": [
"\\{\\w+\\}\\s*(year|month|week|day|hour|min|sec|groups|issues|users|people|other|boards|spaces)"
"\\{\\w+\\}\\s*(year|month|week|day|hour|min|sec)",
"\\{\\w+\\}\\s*(groups|issues|users|people|other|boards|spaces)"
],
"outputFile": "plural_nouns",
"desc": "Plural Nouns"
},
{
"regexp": [
"(http|https)://",
"(<a\\s*.*>\\s*.*<\\/a>)"
],
"outputFile": "url_uri",
"desc": "String Resource contains URIs/URLs"
},
{
"regexp": [
"\\{\\d+\\}",
"\\{\\s*\\}"
],
"outputFile": "numbered_placeholders",
"desc": "String Resource contains numbered placeholders like \u2019{0}\u2019. Please use variable names in placeholders. "
"desc": "Message contains numbered placeholders like \u2019{0}\u2019. Please use variable names in placeholders. "
},
{
"regexp": [
"\\{\\s*\\d\\s*,\\s*choice.+\\}"
],
"outputFile": "choice_formatted",
"desc": "Find string resources using the choice format"
},
{
@@ -77,7 +69,12 @@
"\\\"",
"[\\.]{3}"
],
"outputFile": "ascii_punct",
"desc": "ASCII punctuation in use. Best Practice is to use their Unicode equivalents."
},
{
"regexp": [
"^\\s*$"
],
"desc": "empty string"
}
]
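
A rules file shaped like the one above (a list of objects, each with a `regexp` list and a `desc`) could be loaded and applied roughly as follows. This is a sketch under assumptions: `load_rules` is a hypothetical helper, the two rules are copied from the diff for illustration, and this `lint` takes the rules as a second argument, whereas the `lint` imported in `fileprocessor.py` is called with the message only.

```python
import json
import re

# Two rules in the same shape as lint_rules.json (raw string keeps the JSON escapes)
RULES_JSON = r'''
[
  {"regexp": ["\\{\\d+\\}", "\\{\\s*\\}"], "desc": "numbered placeholders"},
  {"regexp": ["^\\s*$"], "desc": "empty string"}
]
'''


def load_rules(text: str) -> list:
    """Parse the JSON rules and precompile every regular expression."""
    rules = json.loads(text)
    for rule in rules:
        rule["compiled"] = [re.compile(p) for p in rule["regexp"]]
    return rules


def lint(message: str, rules: list) -> list:
    """Return one {'desc': ...} finding per rule with at least one matching pattern."""
    return [
        {"desc": rule["desc"]}
        for rule in rules
        if any(p.search(message) for p in rule["compiled"])
    ]


rules = load_rules(RULES_JSON)
print(lint("Hello {0}", rules))
print(lint("   ", rules))
print(lint("Hello {name}", rules))
```

Keeping the patterns in data rather than code means a rule can be added or corrected without touching the linter itself.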
File renamed without changes.