feat: new extractor (#88)
 - improved performance
 - parsing all files not just those importing from `@tolgee/*`
 - default namespace option
 - option to not be so strict with namespace detectability
stepan662 authored Jul 2, 2024
1 parent 9d79095 commit f6b7fd3
Showing 135 changed files with 4,809 additions and 4,801 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/test.yml
@@ -5,7 +5,7 @@ on:
pull_request:

jobs:
lint:
eslint:
name: Eslint
runs-on: ubuntu-latest
steps:
@@ -22,7 +22,7 @@ jobs:
run: npm ci

- name: Run eslint
run: npm run lint
run: npm run eslint

test-unit:
name: Unit Tests
78 changes: 49 additions & 29 deletions HACKING.md
@@ -1,38 +1,47 @@
# Hacking

Here is some internal information that might be useful for new contributors trying to understand the codebase and
get some work done.

## Toolchain

To work on this project, you only need Node 16+ (and Docker to run tests). We use `npm` to manage dependencies,
and [prettier](https://github.com/prettier/prettier) to format our code.

## Scripts

These are the runnable scripts with `npm run`:

General:
- `run-dev`: Run the CLI (with `ts-node`). Use `--` to pass arguments to the CLI rather than NPM: \
`npm run run-dev -- extract print --extractor react src/**/*.tsx`
- `build`: Build the CLI.
- `prettier`: Run Prettier.
- `lint`: Run Prettier but does not update files (report-only).
- `schema`: Generate REST API schemas (see [REST Client](#rest-client))

- `run-dev`: Run the CLI (with `ts-node`). Use `--` to pass arguments to the CLI rather than NPM: \
`npm run run-dev -- extract print --extractor react src/**/*.tsx`
- `build`: Build the CLI.
- `prettier`: Run Prettier.
- `eslint`: Run ESLint (report-only; fails on any warning).
- `schema`: Generate REST API schemas (see [REST Client](#rest-client))

Tests:
- `test`: Run all tests (Unit & E2E). Will start (and stop) the E2E Tolgee test instance
- `test:unit`: Run unit tests only.
- `test:e2e`: Run e2e tests only. Will start (and stop) the E2E Tolgee test instance

- `test`: Run all tests (Unit & E2E). Will start the E2E Tolgee test instance
- `test:unit`: Run unit tests only.
- `test:e2e`: Run e2e tests only. Will start the E2E Tolgee test instance

E2E test instance:
- `tolgee:start`: Start the E2E testing instance. Will be available on port 22222.
- `tolgee:stop`: Stop the E2E testing instance.

- `tolgee:start`: Start the E2E testing instance. Will be available on port 22222.
- `tolgee:stop`: Stop the E2E testing instance.

## Code & internals overview

### Command parsing

The CLI uses [commander.js](https://github.com/tj/commander.js) to handle the whole command parsing & routing logic.
As the way we deal with arguments is more complex than what the library can do by itself, we have some extra validation
logic.

### Config loading & validation

We use [cosmiconfig](https://github.com/davidtheclark/cosmiconfig) to handle the loading of the `.tolgeerc` file.
There is also a module that manages the authentication token store (`~/.tolgee/authentication.json`). These modules
can be found in `src/config`.
@@ -41,31 +50,42 @@ The `.tolgeerc` file is loaded at program startup, and the tokens (which depend
custom validation logic.

### REST Client
The REST Client to interact with the Tolgee API is a light abstraction that uses types generated from our OpenAPI
specifications. Feel free to add new methods in the client if you need them. It can be found in `src/client`.

`ApiClient` uses `openapi-typescript` to generate the TypeScript schema and `openapi-fetch` for fetching, so it is a fully typed client. Endpoints that use `multipart/form-data` are a bit problematic (check `ImportClient.ts`).

### Extractor
The Tolgee Extractor/Code Analyzer is one of the biggest components of the CLI. Tolgee uses TextMate grammars to
parse source code files, and then uses state machines powered by [XState](https://github.com/statelyai/xstate) to
perform the actual extraction.

The Tolgee Extractor/Code Analyzer is one of the biggest components of the CLI. It has the following layers:

1. TextMate grammars parse source code files and generate tokens.
2. Mappers (`generalMapper`, `jsxMapper`, `vueMapper`) rename tokens to general, typed Tolgee tokens.
   1. Because tokens are abstracted to general ones, we can reuse many pieces of logic across different file types.
3. Mergers allow merging multiple tokens into one. This has two use cases:
   1. Simplifying tokens (e.g. three tokens specifying a string can be merged into one).
   2. Generating trigger tokens (e.g. `<T` is merged into `trigger.t.component`). These triggers are then mapped to custom rules.
4. A very simple semantic tree is then constructed, in which we identify blocks, expressions and objects. When there is a trigger, a custom rule is applied, and there are special node types for important pieces (like `KeyInfoNode` and `NamespaceInfoNode`).
5. The last step is generating a report from the semantic tree. We check whether the values are static or dynamic, and because we keep the structure of blocks, we know which `useTranslate` belongs to which `t` function.
   1. The tree can be manipulated before the report is generated (with the `treeTransform` function). This is used for `vue` and `svelte`, e.g. so that the `script` tags are hoisted to the top.
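
To make the mapper and merger layers more concrete, here is a minimal, dependency-free sketch. It is not the extractor's actual code: the `Token` and `MergeRule` shapes, the helper names `mapTokens` and `mergeTokens`, and the grammar scope names are assumptions made for illustration. Only the `trigger.t.component` token name is taken from the description above.

```typescript
// Hypothetical token shape; the real extractor's types may differ.
type Token = { type: string; text: string };

// Hypothetical merge rule: a run of token types that collapses into one token.
type MergeRule = {
  pattern: string[];
  result: string;
  accept?: (matched: Token[]) => boolean; // optional extra check on the matched run
};

// Mapper: rename grammar-specific scopes to general token types.
function mapTokens(tokens: Token[], scopeMap: Record<string, string>): Token[] {
  return tokens.map((t) => ({ type: scopeMap[t.type] || t.type, text: t.text }));
}

// Return the tokens matched by `rule` at position `start`, or undefined.
function matchRule(tokens: Token[], start: number, rule: MergeRule): Token[] | undefined {
  if (rule.pattern.length === 0) return undefined;
  const run = tokens.slice(start, start + rule.pattern.length);
  const typesMatch =
    run.length === rule.pattern.length &&
    run.every((t, k) => t.type === rule.pattern[k]);
  if (!typesMatch) return undefined;
  if (rule.accept && !rule.accept(run)) return undefined;
  return run;
}

// Merger: walk the tokens and replace every matching run with a single token.
function mergeTokens(tokens: Token[], rules: MergeRule[]): Token[] {
  const out: Token[] = [];
  let i = 0;
  while (i < tokens.length) {
    let merged = false;
    for (const rule of rules) {
      const run = matchRule(tokens, i, rule);
      if (run) {
        out.push({ type: rule.result, text: run.map((t) => t.text).join("") });
        i += run.length;
        merged = true;
        break;
      }
    }
    if (!merged) {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out;
}

// Illustrative input: scope names on the left are made up for this sketch.
const scopeMap: Record<string, string> = {
  "punctuation.tag.begin": "tag.begin",
  "entity.name.tag": "tag.name",
  "punctuation.string.begin": "string.start",
  "string.content": "string.body",
  "punctuation.string.end": "string.end",
};

const rawTokens: Token[] = [
  { type: "punctuation.tag.begin", text: "<" },
  { type: "entity.name.tag", text: "T" },
  { type: "punctuation.string.begin", text: '"' },
  { type: "string.content", text: "hello" },
  { type: "punctuation.string.end", text: '"' },
];

const rules: MergeRule[] = [
  // Trigger token: `<` followed by the tag name `T`.
  {
    pattern: ["tag.begin", "tag.name"],
    result: "trigger.t.component",
    accept: (m) => m[1].text === "T",
  },
  // Simplification: three string tokens become one.
  { pattern: ["string.start", "string.body", "string.end"], result: "string" },
];

const mergedTokens = mergeTokens(mapTokens(rawTokens, scopeMap), rules);
console.log(mergedTokens);
```

Because the rules only see the general token types, the same rule list could in principle be reused for any file type whose mapper produces those types, which is the reuse benefit described in layer 2.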

#### Adding new TextMate grammars

To add new TextMate grammars, **do not do it manually**! Modify the `scripts/grammars.js` file following these
steps:

- Add the URL to the grammar file to the `Grammars` dictionary.
- Add applicable licensing information to the `GrammarsLicense` dictionary.
- If you need to transform the TextMate grammar:
- In the `Transformers` object, add a function that'll receive the raw TextMate grammar
- Make sure to add a comment to the file stating the file is modified, a link to the original, and a reason for
the transformation
- *Hint*: Look at how the transformation for `TypeScriptReact` is done.
- In `src/extractor/tokenizer.ts`:
- Add a new entry to the `Grammar` enum
- Add a new entry to the `GrammarFiles` dict
- Add new cases in the `extnameToGrammar` function

----
- Add the URL to the grammar file to the `Grammars` dictionary.
- Add applicable licensing information to the `GrammarsLicense` dictionary.
- If you need to transform the TextMate grammar:
- In the `Transformers` object, add a function that'll receive the raw TextMate grammar
- Make sure to add a comment to the file stating the file is modified, a link to the original, and a reason for
the transformation
- _Hint_: Look at how the transformation for `TypeScriptReact` is done.
- In `src/extractor/tokenizer.ts`:
- Add a new entry to the `Grammar` enum
- Add a new entry to the `GrammarFiles` dict
- Add new cases in the `extnameToGrammar` function
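
As a rough illustration of the three `src/extractor/tokenizer.ts` changes listed above, the sketch below adds a made-up `EXAMPLE` grammar. Only the names `Grammar`, `GrammarFiles` and `extnameToGrammar` come from this document; the enum members, scope names, file names and the function's exact signature are assumptions, so check the real file before copying anything.

```typescript
// Sketch only: members, scope names and file names are hypothetical.
enum Grammar {
  TYPESCRIPT_TSX = "source.tsx",
  EXAMPLE = "source.example", // 1. new entry in the `Grammar` enum
}

// 2. new entry in the `GrammarFiles` dict (grammar -> bundled grammar file)
const GrammarFiles: Record<Grammar, string> = {
  [Grammar.TYPESCRIPT_TSX]: "TypeScriptReact.tmLanguage",
  [Grammar.EXAMPLE]: "Example.tmLanguage",
};

// 3. new case in `extnameToGrammar` (file extension -> grammar)
function extnameToGrammar(extname: string): Grammar | undefined {
  switch (extname) {
    case ".ts":
    case ".tsx":
    case ".js":
    case ".jsx":
      return Grammar.TYPESCRIPT_TSX;
    case ".example":
      return Grammar.EXAMPLE;
    default:
      return undefined;
  }
}

console.log(extnameToGrammar(".example"), GrammarFiles[Grammar.EXAMPLE]);
```

Typing the dictionary as `Record<Grammar, string>` makes the compiler complain if an enum entry is added without a matching grammar file.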

---

Feel free to join the [Slack channel](https://tolg.ee/slack) if you have questions!

Happy hacking 🐀
22 changes: 5 additions & 17 deletions package-lock.json


4 changes: 1 addition & 3 deletions package.json
@@ -18,7 +18,7 @@
"test:package": "node scripts/validatePackage.js",
"tolgee:start": "node scripts/startDocker.js",
"tolgee:stop": "docker stop tolgee_cli_e2e",
"lint": "eslint --ext .ts --ext .js --ext .cjs ./src ./scripts vitest.config.ts",
"eslint": "eslint --max-warnings 0 --ext .ts --ext .js --ext .cjs ./src ./scripts vitest.config.ts",
"prettier": "prettier --write ./src ./scripts vitest.config.ts",
"run-dev": "cross-env NODE_OPTIONS=\"--import=./scripts/registerTsNode.js\" node ./src/cli.ts",
"schema": "openapi-typescript http://localhost:22222/v3/api-docs/All%20Internal%20-%20for%20Tolgee%20Web%20application --output src/client/internal/schema.generated.ts",
@@ -37,10 +37,8 @@
"json5": "^2.2.3",
"jsonschema": "^1.4.1",
"openapi-fetch": "^0.9.7",
"undici": "^5.22.1",
"vscode-oniguruma": "^1.7.0",
"vscode-textmate": "^9.0.0",
"xstate": "^4.38.1",
"yauzl": "^2.10.0"
},
"devDependencies": {
24 changes: 18 additions & 6 deletions schema.json
@@ -5,23 +5,35 @@
"description": "Project ID. Only required when using a Personal Access Token.",
"type": ["number", "string"]
},
"extractor": {
"description": "A path to a custom extractor to use instead of the default one.",
"type": "string"
},
"apiUrl": {
"description": "The url of Tolgee API.",
"type": "string"
},
"format": {
"$ref": "#/$defs/format"
},
"extractor": {
"description": "A path to a custom extractor to use instead of the default one.",
"type": "string"
},
"patterns": {
"description": "File glob patterns to your source code, used for keys extraction.",
"type": "array",
"items": {
"type": "string"
}
},
"format": {
"$ref": "#/$defs/format"
"strictNamespace": {
"description": "Require namespace to be reachable, turn off if you don't use namespaces. (Default: true)",
"type": "boolean"
},
"defaultNamespace": {
"description": "Default namespace used in extraction if not specified otherwise.",
"type": "string"
},
"parser": {
"description": "Override parser detection.",
"enum": ["react", "vue", "svelte"]
},
"push": {
"type": "object",
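
The extraction-related properties in this schema hunk (`patterns`, `strictNamespace`, `defaultNamespace`, `parser`) can be read as the following TypeScript shape. This is a hand-written sketch for illustration, not the generated `src/schema.d.ts`; the `ExtractionOptions` name and the `checkExtractionOptions` helper are hypothetical.

```typescript
// Allowed values of the `parser` property, as listed in the schema enum.
type Parser = "react" | "vue" | "svelte";

// Hypothetical name; mirrors the extraction-related schema properties.
interface ExtractionOptions {
  patterns?: string[];
  strictNamespace?: boolean; // schema description says the default is true
  defaultNamespace?: string;
  parser?: Parser;
}

const PARSERS: Parser[] = ["react", "vue", "svelte"];

// Hypothetical helper: returns a list of problems, empty when the input is valid.
function checkExtractionOptions(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (raw.patterns !== undefined) {
    const ok =
      Array.isArray(raw.patterns) &&
      raw.patterns.every((p: unknown) => typeof p === "string");
    if (!ok) errors.push("patterns must be an array of strings");
  }
  if (raw.strictNamespace !== undefined && typeof raw.strictNamespace !== "boolean") {
    errors.push("strictNamespace must be a boolean");
  }
  if (raw.defaultNamespace !== undefined && typeof raw.defaultNamespace !== "string") {
    errors.push("defaultNamespace must be a string");
  }
  if (raw.parser !== undefined && PARSERS.indexOf(raw.parser as Parser) === -1) {
    errors.push("parser must be one of: " + PARSERS.join(", "));
  }
  return errors;
}

const validOptions: ExtractionOptions = {
  patterns: ["src/**/*.tsx"],
  strictNamespace: false,
  defaultNamespace: "common",
  parser: "react",
};

console.log(checkExtractionOptions({ ...validOptions }));
console.log(checkExtractionOptions({ parser: "angular" }));
```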
2 changes: 1 addition & 1 deletion scripts/configType.mjs
@@ -2,6 +2,6 @@ import { writeFileSync } from 'fs';
import { compileFromFile } from 'json-schema-to-typescript';

// compile from file
compileFromFile('schema.json', {}).then((ts) =>
compileFromFile('schema.json', { additionalProperties: false }).then((ts) =>
writeFileSync('./src/schema.d.ts', ts)
);
10 changes: 5 additions & 5 deletions scripts/validatePackage.js
@@ -59,29 +59,29 @@ console.log('OK: tolgee help works');
// 2. ensure `tolgee extract` works
// this test is to ensure textmate grammars have been imported and work
console.log('TEST: tolgee extract print works');
const TEST_EXTRACTOR_FILE = join(PACKAGE_DEST, 'test.js');
const TEST_EXTRACTOR_FILE = join(PACKAGE_DEST, 'test.tsx');
await writeFile(
TEST_EXTRACTOR_FILE,
`import '@tolgee/react'\nReact.createElement(T, { keyName: 'owo' })`
);
const tolgeeExtract = execOrError(
'npx --no tolgee extract print --patterns test.js',
'npx --no tolgee extract print --patterns test.tsx',
{
cwd: PACKAGE_DEST,
}
);
ok(tolgeeExtract.toString().includes('1 key found in test.js:'));
ok(tolgeeExtract.toString().includes('1 key found in test.tsx:'));
console.log('OK: tolgee extract print works');

// 3. ensure `tolgee-cli/extractor` types are importable
console.log('TEST: tolgee-cli/extractor types are importable');
const TEST_TYPE_FILE = join(PACKAGE_DEST, 'test.ts');
const TEST_TYPE_FILE = join(PACKAGE_DEST, 'test.tsx');
await writeFile(
TEST_TYPE_FILE,
`import type { ExtractionResult } from '@tolgee/cli/extractor'`
);
execOrError('npm i typescript', { cwd: PACKAGE_DEST });
const tsc = execOrError('npx --no tsc -- --noEmit --lib es2022 test.ts', {
const tsc = execOrError('npx --no tsc -- --noEmit --lib es2022 test.tsx', {
cwd: PACKAGE_DEST,
});
ok(!tsc.length);
47 changes: 20 additions & 27 deletions src/cli.ts
@@ -12,10 +12,15 @@ import {
API_KEY_OPT,
API_URL_OPT,
CONFIG_OPT,
DEFAULT_NAMESPACE,
EXTRACTOR,
FILE_PATTERNS,
FORMAT_OPT,
STRICT_NAMESPACE,
PARSER,
PROJECT_ID_OPT,
STRICT_NAMESPACE_NEGATION,
VERBOSE,
} from './options.js';
import {
API_KEY_PAK_PREFIX,
@@ -135,52 +140,40 @@ const preHandler = (config: Schema) =>
}

// Apply verbosity
setDebug(prog.opts().verbose);
setDebug(Boolean(prog.opts().verbose));
};

const program = new Command('tolgee')
.version(VERSION)
.configureOutput({ writeErr: error })
.description('Command Line Interface to interact with the Tolgee Platform')
.option('-v, --verbose', 'Enable verbose logging.');

.description('Command Line Interface to interact with the Tolgee Platform');
// get config path to update defaults
const configPath = getSingleOption(CONFIG_OPT, process.argv);

async function loadConfig(program: Command) {
const tgConfig = await loadTolgeeRc(configPath);

if (tgConfig) {
[program, ...program.commands].forEach((cmd) =>
cmd.options.forEach((opt) => {
const key = opt.attributeName();
const value = (tgConfig as any)[key];
if (value) {
const parsedValue = opt.parseArg
? opt.parseArg(value, undefined)
: value;
cmd.setOptionValueWithSource(key, parsedValue, 'config');
}
})
);
}

return tgConfig ?? {};
}

async function run() {
try {
const config = await loadConfig(program);
program.hook('preAction', preHandler(config));

// Global options
program.addOption(VERBOSE);
program.addOption(CONFIG_OPT);
program.addOption(API_URL_OPT.default(DEFAULT_API_URL));
program.addOption(API_URL_OPT.default(config.apiUrl ?? DEFAULT_API_URL));
program.addOption(API_KEY_OPT);
program.addOption(PROJECT_ID_OPT.default(-1));
program.addOption(FORMAT_OPT.default('JSON_TOLGEE'));
program.addOption(EXTRACTOR);
program.addOption(FILE_PATTERNS);

const config = await loadConfig(program);
program.hook('preAction', preHandler(config));
program.addOption(PROJECT_ID_OPT.default(config.projectId ?? -1));
program.addOption(FORMAT_OPT.default(config.format ?? 'JSON_TOLGEE'));
program.addOption(EXTRACTOR.default(config.extractor));
program.addOption(FILE_PATTERNS.default(config.patterns));
program.addOption(PARSER.default(config.parser));
program.addOption(STRICT_NAMESPACE.default(config.strictNamespace ?? true));
program.addOption(STRICT_NAMESPACE_NEGATION);
program.addOption(DEFAULT_NAMESPACE.default(config.defaultNamespace));

// Register commands
program.addCommand(Login);
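
In the new version of `run()`, values from the config file are passed as option defaults (for example `config.apiUrl ?? DEFAULT_API_URL`). The practical effect appears to be a three-level precedence: an explicit CLI flag wins, then the config file, then the built-in default. The dependency-free sketch below illustrates that precedence only; `resolveOption`, `FileConfig`, `CliFlags` and the placeholder URL are assumptions, and it does not use commander.

```typescript
// Hypothetical shapes for this sketch; not the CLI's real types.
interface FileConfig {
  apiUrl?: string;
  projectId?: number | string;
  format?: string;
  strictNamespace?: boolean;
}
type CliFlags = FileConfig;

// Placeholder value; the real DEFAULT_API_URL is defined elsewhere in the CLI.
const DEFAULT_API_URL = "https://tolgee.example.invalid";

// CLI flag > config file > built-in default.
function resolveOption<T>(cli: T | undefined, config: T | undefined, fallback: T): T {
  if (cli !== undefined) return cli;
  if (config !== undefined) return config;
  return fallback;
}

const fileConfig: FileConfig = { projectId: 42, strictNamespace: false };
const cliFlags: CliFlags = { strictNamespace: true };

const effective = {
  apiUrl: resolveOption<string>(cliFlags.apiUrl, fileConfig.apiUrl, DEFAULT_API_URL),
  projectId: resolveOption<number | string>(cliFlags.projectId, fileConfig.projectId, -1),
  format: resolveOption<string>(cliFlags.format, fileConfig.format, "JSON_TOLGEE"),
  strictNamespace: resolveOption<boolean>(
    cliFlags.strictNamespace,
    fileConfig.strictNamespace,
    true
  ),
};

console.log(effective);
```

Here `projectId` comes from the config file, `strictNamespace` from the CLI flag, and `apiUrl` and `format` fall back to the built-in defaults.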