Commit

update readmes
pokornyd committed Jan 13, 2025
1 parent 64c0b11 commit 7f66ffa
Showing 3 changed files with 130 additions and 59 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -142,7 +142,7 @@ dist
.svelte-kit

### VisualStudioCode ###
.vscode/*
.vscode/
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
Expand Down
28 changes: 20 additions & 8 deletions README.md
@@ -23,7 +23,7 @@ Install the package via npm

### Parsing rich text HTML to a JSON tree

The tool provides environment-aware `parseHtml` function to transform HTML into an array of simplified JSON trees. Any valid HTML is parsed, including all attributes. Together with built-in transformation methods, this is a suitable option for processing HTML and rich text from external sources, to make it compatible with Kontent.ai rich text format. See dedicated [JSON transformer docs](docs/index.md) for further information.
The tool provides environment-aware (browser or Node.js) `parseHtml` function to transform HTML into an array of simplified JSON trees. Any valid HTML is parsed, including all attributes. Together with built-in transformation methods, this tool is a suitable option for processing HTML and rich text from external sources, to make it compatible with Kontent.ai rich text format. See dedicated [JSON transformer docs](docs/index.md) for further information.

### Portable text resolution

@@ -44,31 +44,37 @@ Combined with a suitable package for the framework of your choice, this makes fo
#### Custom portable text blocks

Besides default blocks for common elements, Portable Text supports custom blocks, which can represent other entities. Each custom block should extend `ArbitraryTypedObject` to ensure `_key` and `_type` properties are present. Key should be a unique identifier (e.g. guid), while type should indicate what the block represents. Value of `_type` property is used for subsequent override and resolution purposes.
Besides default blocks for common elements, Portable Text supports custom blocks, which can represent other entities. Each custom block should extend `ArbitraryTypedObject` to ensure `_key` and `_type` properties are present. Key should be a unique identifier (e.g. guid), while type should indicate what the block represents. Value of `_type` property is used for mapping purposes in subsequent resolution.
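
As a rough sketch of what such a custom block might look like: the `CalloutBlock` type, its `text` field and the `isCallout` guard below are hypothetical names used only for illustration, and `TypedBlock` is a local stand-in for `ArbitraryTypedObject` so that the snippet is self-contained. In real code you would import the type rather than redeclare it.

```ts
// Local stand-in for ArbitraryTypedObject (assumption: only _key and _type matter here).
type TypedBlock = {
  _key: string;
  _type: string;
  [key: string]: unknown;
};

// Hypothetical custom block representing a callout box.
type CalloutBlock = TypedBlock & {
  _type: "callout";
  text: string;
};

// Type guard keyed on the _type value, mirroring how resolution maps on _type.
const isCallout = (block: TypedBlock): block is CalloutBlock =>
  block._type === "callout";

const blocks: TypedBlock[] = [
  { _key: "a1", _type: "callout", text: "Remember to publish." },
  { _key: "b2", _type: "block" },
];

// Only blocks whose _type is "callout" remain, typed as CalloutBlock[].
const callouts = blocks.filter(isCallout);
console.log(callouts.map((callout) => callout.text));
```

Keeping `_type` as a string literal in the custom type lets the compiler narrow blocks by type before they are handed to a resolver.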

**This package comes with built-in custom block definitions for representing Kontent.ai rich text entities:**

##### Component/linked item
##### Component/linked item **PortableTextComponentOrItem**

https://github.com/kontent-ai/rich-text-resolver-js/blob/6fe68490a32bb304d141cff741fb7e57001550eb/showcase/showcase.ts#L3-L11

##### Image
##### Image **PortableTextImage**

https://github.com/kontent-ai/rich-text-resolver-js/blob/6fe68490a32bb304d141cff741fb7e57001550eb/showcase/showcase.ts#L13-L22

> [!TIP]
> For image resolution, you may use `resolveImage` helper function. You can provide it either with a custom resolution method or use provided default implementations.
> Package provides helpers for image resolution:
> * React: `ImageComponent` component, accepting `PortableTextImage` as a prop.
> * HTML: `resolveImage` function, accepting `PortableTextImage` and an optional custom resolver.
> * Vue: `resolveImageVue` function, accepting `PortableTextImage`, Vue render function and an optional custom resolver.
##### Item link
##### Item link **PortableTextItemLink**

https://github.com/kontent-ai/rich-text-resolver-js/blob/6fe68490a32bb304d141cff741fb7e57001550eb/showcase/showcase.ts#L24-L31

##### Table
##### Table **PortableTextTable**

https://github.com/kontent-ai/rich-text-resolver-js/blob/6fe68490a32bb304d141cff741fb7e57001550eb/showcase/showcase.ts#L33-L59

> [!TIP]
> For table resolution, you may use `resolveTable` helper function. You can provide it either with a custom resolution method or use a default implementation from a resolution package of your choice (such as `toHTML` or `toPlainText`).
> Package provides helpers for table resolution:
> * React: `TableComponent` component, accepting `PortableTextTable` as a prop.
> * HTML: `resolveTable` function, accepting `PortableTextTable` and an optional custom resolver.
> * Vue: `resolveTableVue` function, accepting `PortableTextTable`, Vue render function and an optional custom resolver.
## Examples

@@ -77,6 +83,12 @@ https://github.com/kontent-ai/rich-text-resolver-js/blob/6fe68490a32bb304d141cff
Package exports a `traversePortableText` method, which accepts an array of `PortableTextObject` and a callback function. The method recursively traverses all nodes and their subnodes, optionally modifying them with the provided callback:

```ts
import {
PortableTextObject,
transformToPortableText,
traversePortableText,
} from "@kontent-ai/rich-text-resolver";

const input = `<figure data-asset-id="guid" data-image-id="guid"><img src="https://asseturl.xyz" data-asset-id="guid" data-image-id="guid" alt=""></figure>`;

// Adds height parameter to asset reference and changes _type.
Expand Down
159 changes: 109 additions & 50 deletions docs/index.md
@@ -1,6 +1,6 @@
# JSON Transformers

This module provides an environment-aware `parseHtml` function to convert an HTML string into an array of nodes. The JSON structure can subsequently be transformed using one of the provided transformation methods, either to a modified HTML string or a completely different structure, both in synchronous and asynchronous manner.
This module provides an environment-aware (browser or Node.js) `parseHtml` function to convert an HTML string into an array of nodes. The JSON structure can subsequently be transformed using one of the provided transformation methods, either to a modified HTML string or a completely different structure, both in synchronous and asynchronous manner.

This toolset can be particularly useful for transforming rich text or HTML content from external sources into a valid Kontent.ai rich text format in migration scenarios.

@@ -38,12 +38,12 @@ The resulting array can be transformed using one of the functions included in th

### HTML Transformation

To transform the `DomNode` array back to HTML, you can use `nodesToHtml` function or its async variant `nodesToHtmlAsync`. The function accepts the parsed array and a `transformers` object, which defines custom transformation for each HTML node. Text nodes are transformed automatically. A wildcard `*` can be used to define fallback transformation for all tags not explicitly defined. If no explicit or wildcard transformation is provided, default resolution is used.
To transform the `DomNode` array back to HTML, you can use `nodesToHtml` function or its async variant `nodesToHtmlAsync`. The function accepts the parsed array and a `transformers` object, which defines custom transformation for each HTML node. Text nodes are transformed automatically. A wildcard `*` can be used to define fallback transformation for remaining tags. If no explicit or wildcard transformation is provided, default resolution is used.
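
The lookup order described above (explicit tag transformer, then wildcard, then default resolution) can be sketched roughly as follows. This is a simplified, self-contained illustration and not the library's implementation: `SketchNode`, `toHtml` and `defaultResolve` are hypothetical names, and the real `DomNode` type may be shaped differently.

```ts
// Hypothetical, simplified node shapes; the library's DomNode type may differ.
type TextNode = { type: "text"; content: string };
type TagNode = {
  type: "tag";
  tagName: string;
  attributes: Record<string, string>;
  children: SketchNode[];
};
type SketchNode = TextNode | TagNode;

type Transformer = (node: TagNode, children: string) => string;
type TransformerMap = Partial<Record<string, Transformer>>;

// Default resolution in this sketch: re-emit the tag with its attributes unchanged.
const defaultResolve: Transformer = (node, children) => {
  const attrs = Object.entries(node.attributes)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");
  return `<${node.tagName}${attrs}>${children}</${node.tagName}>`;
};

const toHtml = (nodes: SketchNode[], transformers: TransformerMap): string =>
  nodes
    .map((node) => {
      if (node.type === "text") return node.content; // text nodes pass through
      const children = toHtml(node.children, transformers);
      // explicit tag transformer first, then wildcard, then default
      const transform =
        transformers[node.tagName] ?? transformers["*"] ?? defaultResolve;
      return transform(node, children);
    })
    .join("");

const tree: SketchNode[] = [
  {
    type: "tag",
    tagName: "p",
    attributes: { style: "color:red" },
    children: [
      { type: "text", content: "Hello " },
      {
        type: "tag",
        tagName: "b",
        attributes: {},
        children: [{ type: "text", content: "World!" }],
      },
    ],
  },
];

// No transformers: every tag falls through to default resolution.
const restored = toHtml(tree, {});
// <p style="color:red">Hello <b>World!</b></p>

// "b" has an explicit transformer, "p" is handled by the wildcard.
const cleaned = toHtml(tree, {
  b: (_, children) => `<strong>${children}</strong>`,
  "*": (node, children) => `<${node.tagName}>${children}</${node.tagName}>`,
});
// <p>Hello <strong>World!</strong></p>
```

The sketch ignores void elements such as `img` and attribute escaping, which a real serializer would need to handle.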

#### Basic
Basic example of HTML transformation, removing HTML attribute `style` and transforming `b` tag to `strong`:
```ts
import { nodesToHtml } from 'your-json-transformer';
import { nodesToHtml, NodeToStringMap, parseHtml } from '@kontent-ai/rich-text-resolver';

const rawHtml = `<p style="color:red">Hello <b>World!</b></p>`;
const parsedNodes = parseHtml(rawHtml);
@@ -55,79 +55,78 @@ const transformers: NodeToStringMap = {
};

// restores original HTML with attributes
let htmlOutput = nodesToHtml(parsedNodes, {});
console.log(htmlOutput); // <p style="color:red">Hello <b>World!</b></p>
const defaultOutput = nodesToHtml(parsedNodes, {});
console.log(defaultOutput); // <p style="color:red">Hello <b>World!</b></p>

// b is converted to strong, wildcard transformation omits attributes from remaining nodes
htmlOutput = nodesToHtml(parsedNodes, transformers);
console.log(htmlOutput); // <p>Hello <strong>World!</strong></p>
const customOutput = nodesToHtml(parsedNodes, transformers);
console.log(customOutput); // <p>Hello <strong>World!</strong></p>
```

#### Advanced
For more complex scenarios, optional context and its handler can be passed to the transformation function as third and fourth parameters respectively. Context can be accessed when defining node transformations. If a handler is provided, it clones the context before it's modified and passed to child node processing, thus maintaining correct context state for each node.

Example showcasing asynchronous transformation of `<img>` tag to `<figure>`, simultaneously uploading the image to Kontent.ai asset library, using SDK `ManagementClient` provided as context:
For more complex scenarios, optional context and its handler can be passed to the top level transformation function (`nodesToHtml` or its async variant) as third and fourth parameters respectively.

The context can then be accessed in individual transformations, defined in the `transformers` object. If you need to dynamically update the context, you may optionally provide a context handler, which accepts current node and context as parameters and passes a cloned, modified context for child node processing, ensuring each node gets valid contextual data.
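
To illustrate the idea of a context handler in isolation, here is a minimal sketch that does not use the library at all. `TreeNode`, `visit` and `incrementDepth` are hypothetical names, and the traversal is an assumption about the general technique (derive a fresh context per node, never mutate the one shared with siblings), not a description of the package internals.

```ts
// Hypothetical minimal node shape, for illustration only.
type TreeNode = { name: string; children: TreeNode[] };

type Context = { depth: number };
type ContextHandler = (node: TreeNode, context: Context) => Context;

// Visits each node; the handler derives the context for the node and its subtree
// without mutating the context that sibling nodes will receive.
const visit = (
  nodes: TreeNode[],
  context: Context,
  handler: ContextHandler,
  seen: string[] = [],
): string[] => {
  for (const node of nodes) {
    const updated = handler(node, context); // new object, original untouched
    seen.push(`${node.name}@${updated.depth}`);
    visit(node.children, updated, handler, seen);
  }
  return seen;
};

// Handler returns a copy with an incremented depth instead of editing in place.
const incrementDepth: ContextHandler = (_, context) => ({
  ...context,
  depth: context.depth + 1,
});

const sampleTree: TreeNode[] = [
  { name: "div", children: [{ name: "span", children: [] }] },
  { name: "p", children: [] },
];

const visited = visit(sampleTree, { depth: 0 }, incrementDepth);
console.log(visited); // ["div@1", "span@2", "p@1"]
```

Because the handler returns a new object, the `p` sibling still starts from the initial depth even though the `div` subtree incremented it twice.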

##### Transforming img tag and creating an asset in the process (no handler)

In Kontent.ai rich text, images are represented by a `<figure>` tag, with `data-asset-id` attribute referencing an existing asset in the asset library. Transforming an `img` tag is therefore a two-step process:

1. Load the binaries from `src` attribute and create an asset in Kontent.ai asset library
2. Use the asset ID from previous step to reference the asset in the transformed `<figure>` tag.

For that matter, we will use `nodesToHtmlAsync` method and pass an instance of JS SDK `ManagementClient` as context, to perform the asset creation. Since we don't need to modify the client in any way, we can omit the context handler for this example.

```ts
import axios from "axios";
import { ManagementClient } from "@kontent-ai/management-sdk";
import {
parseHtml,
AsyncNodeToStringMap,
nodesToHtmlAsync,
} from "@kontent-ai/rich-text-resolver";

const input = `<p><img src="https://website.com/image.jpg" alt="some image"></p>`;
const input = `<img src="https://website.com/image.jpg" alt="some image">`;
const nodes = parseHtml(input);

// type parameter specifies context type, in this case ManagementClient
const transformers: AsyncNodeToStringMap<ManagementClient> = {
img: async (node, _, client) => {
return await new Promise<string>((resolve, reject) => {
// context (client) can be accessed as a third parameter in each transformation
img: async (node, _, client) =>
await new Promise<string>(async (resolve, reject) => {
if (!client) {
reject("Client is not provided");
return;
}

const src: string = node.attributes.src;

axios.get(src, { responseType: "arraybuffer" }).then(async (response) => {
const base64 = Buffer.from(response.data, "binary").toString("base64");
const filename = src.split("/").pop()!;
const uploadResponse = await client
.uploadBinaryFile()
.withData({
binaryData: base64,
contentLength: base64.length,
contentType:
response.headers["content-type"] || "application/octet-stream",
filename,
})
.toPromise()
.then((res) => res.data);

const fileReference = {
id: uploadResponse.id,
type: "internal" as const,
};

const image = await client
.addAsset()
.withData(() => ({
file_reference: fileReference,
title: filename,
const fileName = src.split("/").pop() || "untitled_file";

// SDK provides a helper method for creating an asset from URL
const assetId = await client
.uploadAssetFromUrl()
.withData({
binaryFile: {
filename: fileName,
},
fileUrl: src,
asset: {
title: fileName,
descriptions: [
{
language: { codename: "default" },
description: node.attributes.alt ?? "Auto-generated asset",
description: node.attributes.alt,
},
],
}))
.toPromise()
.then((res) => res.data);

resolve(`<figure data-asset-id="${image.id}"></figure>`);
});
});
},
},
})
.toPromise()
.then((res) => res.data.id) // get asset ID from the response
.catch((err) => reject(err));

// return transformed tag, referencing the newly created asset
resolve(`<figure data-asset-id="${assetId}"></figure>`);
}),
};

// nodesToHtmlAsync returns a promise, so await it before logging the result
const richText = await nodesToHtmlAsync(
@@ -139,8 +138,67 @@ const richText = nodesToHtmlAsync(
})
);

console.log(richText);
// <p><figure data-asset-id="cc8f13a2-e0fb-468b-ba18-344c6e2ecb66"></figure></p>
console.log(richText);
// <figure data-asset-id="cc8f13a2-e0fb-468b-ba18-344c6e2ecb66"></figure>
```

##### Removing nested divs and spans (with context handler)

Assume we have a scenario where we want to transform external HTML to Kontent.ai rich text. The HTML may contain divs and spans, which are not valid rich text tags. Furthermore, these tags can be nested on multiple levels, so a simple transformation `div/span → p` may not suffice, as it could result in nested `p` tags, which is not valid HTML.

In this case, we can store depth as a context and increment it via handler anytime we access a nested div/span. We will then define transformers for top level divs and spans to be converted to `p`. Remaining nested invalid tags will be removed.

> [!WARNING]
> The example below is primarily intended as a showcase of context handling during transformation. Unwrapping divs and spans in this manner may still result in invalid HTML. While more complex transformation logic can be defined to fit your requirements, we advise you to split the original HTML into multiple elements and, for rich text processing, isolate the content originally created in a rich text editor, as it may prove easier to transform.
```ts
import {
nodesToHtml,
DomNode,
NodeToStringMap,
parseHtml,
} from "@kontent-ai/rich-text-resolver";

type DepthContext = {
divSpanDepth: number;
};

const input = `
<div>Top level
<span> some text <div>nested <span>deep</span></div></span>
</div>
<div>Another top-level div <span>with text</span></div>
`;

const parsedNodes = parseHtml(input);

// handler increments depth whenever we encounter a div or span tag node.
const depthHandler = (node: DomNode, context: DepthContext): DepthContext =>
node.type === "tag" && (node.tagName === "div" || node.tagName === "span")
? { ...context, divSpanDepth: context.divSpanDepth + 1 } // return new context with incremented depth
: context; // return the same context if not div/span

const transformers: NodeToStringMap<DepthContext> = {
// we'll only define transformations for 'div' and 'span'. Default resolution will transform remaining tags.
div: (_, children, context) =>
// topmost div is at depth=1, as context is updated before processing.
context?.divSpanDepth === 1 ? `<p>${children}</p>` : children,

// same for span
span: (_, children, context) =>
context?.divSpanDepth === 1 ? `<p>${children}</p>` : children,
};

const output = nodesToHtml(
parsedNodes,
transformers,
{ divSpanDepth: 0 }, // initial context
depthHandler
);

console.log(output);
// <p>Top level some text nested deep</p><p>Another top-level div with text</p>

```

### Generic transformation
@@ -150,6 +208,7 @@ Should you need to transform the nodes to a different structure, rather than HTM
Snippet showcasing use of `transformNodes` to convert the `DomNode` array into Portable Text, as used internally in this module. Full source code in [the corresponding TS file](../src/transformers/portable-text-transformer/portable-text-transformer.ts).

```ts
// context stores current list type and list item depth
type ListContext = {
depth: number;
type: "number" | "bullet" | "unknown";
Expand Down
