Semantic Modeling Language, or SML for short, encompasses over a decade of hands-on development, solving use cases for hundreds of customers across industries such as finance, healthcare, retail, manufacturing, CPG, and more. SML covers more than just tabular use cases. At its core, it is a multidimensional semantic modeling language that supports metrics, dimensions, hierarchies, semi-additive measures, many-to-many relationships, cell-based expressions, and much more.
SML delivers on the following requirements:
- Object-oriented: SML is an object-oriented language that promotes composability and inheritance. This allows semantic objects to be shared within other semantic objects and across organizations, supporting easy and consistent model-building.
- Comprehensive: SML is based on more than a decade of modeling experience across various industry verticals and use cases. SML handles multi-dimensional constructs and serves as a superset of all other existing semantic modeling languages.
- Familiar: SML is based on YAML, a widely adopted, human-readable, industry-standard syntax.
- CI/CD Friendly: SML is code, so it is compatible with Git and CI/CD practices for version control, automated deployment, and software lifecycle management.
- Extensible: SML syntax can be enhanced to support additional properties and features.
- Open: SML is Apache open-sourced to support community innovation and is free to use in any application or use case.
Open-sourcing SML aims to promote the building of reusable models and semantic objects. We are making the SML specification available for public consumption and collaboration. Soon, we will add software tools to make serializations and translations from various semantic dialects easier.
We are or will be open-sourcing the following:
- A YAML-based Language Specification: The SML specification is documented and encompasses tabular and multidimensional constructs.
- Pre-built Semantic Models: The GitHub repository contains pre-built semantic models incorporating standard data models such as TPC-DS, common training models such as Wide World Importers and AdventureWorks, and marketplace models such as Snowplow and CRISP. We expect to add semantic models for SaaS applications such as Salesforce, Google Analytics, and Jira soon.
- Helper Classes (coming soon): We will release helper classes that will facilitate the programmatic reading and writing of SML syntax.
- Semantic Translators (coming soon): We will release converters for migrating other semantic modeling languages to SML, including dbt Labs' semantic layer and Power BI. Shortly afterward, we expect to release a variety of converters to support legacy (e.g., MicroStrategy, Business Objects, Cognos) and modern (e.g., Looker) semantic modeling tools.
The following is an example of an SML model object:
unique_name: Internet Sales
object_type: model
label: Internet Sales
visible: true
relationships:
  - unique_name: factinternetsales_Date_Dimension_Order
    from:
      dataset: factinternetsales
      join_columns:
        - orderdatekey
    to:
      dimension: Date Dimension
      level: DayMonth
    role_play: "Order {0}"
dimensions:
  - Color Dimension
  - Size Dimension
  - Style Dimension
  - Weight
metrics:
  - unique_name: orderquantity
    folder: Sales Metrics
  - unique_name: salesamount
    folder: Sales Metrics
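To make the structure of the example concrete, here is a minimal Python sketch that walks the same model and collects the objects it refers to. The model is written out as the plain dictionary a YAML parser would be expected to produce for it. The function names `referenced_objects` and `role_played_name` are hypothetical and not part of any SML tooling. The reading of `{0}` in `role_play` as a placeholder for an attribute name is an assumption, and `Year` is an illustrative attribute name that does not appear in the example.

```python
# The Internet Sales model from the example above, written as the plain
# Python structure a YAML parser would be expected to produce for it.
model = {
    "unique_name": "Internet Sales",
    "object_type": "model",
    "label": "Internet Sales",
    "visible": True,
    "relationships": [
        {
            "unique_name": "factinternetsales_Date_Dimension_Order",
            "from": {
                "dataset": "factinternetsales",
                "join_columns": ["orderdatekey"],
            },
            "to": {"dimension": "Date Dimension", "level": "DayMonth"},
            "role_play": "Order {0}",
        }
    ],
    "dimensions": [
        "Color Dimension",
        "Size Dimension",
        "Style Dimension",
        "Weight",
    ],
    "metrics": [
        {"unique_name": "orderquantity", "folder": "Sales Metrics"},
        {"unique_name": "salesamount", "folder": "Sales Metrics"},
    ],
}


def referenced_objects(model):
    """Collect the names of the objects a model refers to, grouped by type."""
    refs = {"dataset": set(), "dimension": set(), "metric": set()}
    for rel in model.get("relationships", []):
        refs["dataset"].add(rel["from"]["dataset"])
        refs["dimension"].add(rel["to"]["dimension"])
    refs["dimension"].update(model.get("dimensions", []))
    refs["metric"].update(m["unique_name"] for m in model.get("metrics", []))
    return {kind: sorted(names) for kind, names in refs.items()}


def role_played_name(relationship, attribute_name):
    """Fill the role_play template for one attribute.

    ASSUMPTION: "{0}" stands for the name of an attribute of the target
    dimension. Relationships without role_play keep the name unchanged.
    """
    template = relationship.get("role_play", "{0}")
    return template.format(attribute_name)


refs = referenced_objects(model)
# "Year" is a hypothetical attribute name used only for illustration.
order_year = role_played_name(model["relationships"][0], "Year")
```

A listing like `refs` is the kind of input a tool could use to check that every dataset, dimension, and metric named in a model is actually defined somewhere in the repository.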
The following graphic illustrates the key SML objects and their relationships:
erDiagram
CATALOG }|..|{ MODEL : has
CATALOG }|..|{ PACKAGE : "may have"
MODEL ||--|{ DIMENSION : references
MODEL ||--|{ METRIC : references
MODEL ||--|{ METRIC_CALC : references
MODEL ||--|{ PACKAGE : "may reference"
MODEL ||--|{ DATASET : references
DIMENSION ||--|{ DATASET : references
MODEL ||--|{ ROW_SECURITY : "may reference"
DIMENSION ||--|{ ROW_SECURITY : "may reference"
METRIC ||--|{ DATASET : references
METRIC_CALC ||--|{ METRIC : "may reference"
METRIC_CALC ||--|{ DIMENSION : "may reference"
DATASET ||--|{ CONNECTION : references
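The references in the diagram can also be read as a dependency graph between object types. The sketch below encodes the edges exactly as drawn and derives two things from them: everything a given object type can reach, and one possible order in which a tool could resolve objects so that each is processed after the objects it references. The names `REFERENCES` and `reachable` are hypothetical, and the idea of a dependency-first resolution order is an illustration, not a documented SML behavior.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key references (or may reference) the object types in its value,
# following the edges of the diagram above.
REFERENCES = {
    "CATALOG": {"MODEL", "PACKAGE"},
    "MODEL": {"DIMENSION", "METRIC", "METRIC_CALC", "PACKAGE",
              "DATASET", "ROW_SECURITY"},
    "DIMENSION": {"DATASET", "ROW_SECURITY"},
    "METRIC": {"DATASET"},
    "METRIC_CALC": {"METRIC", "DIMENSION"},
    "DATASET": {"CONNECTION"},
}


def reachable(start):
    """Return every object type reachable from start by following references."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for target in REFERENCES.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


# TopologicalSorter treats each value set as the node's predecessors, so
# static_order() lists referenced object types before the types that use them.
resolution_order = list(TopologicalSorter(REFERENCES).static_order())
```

For example, `reachable("METRIC_CALC")` includes `CONNECTION`, which reflects that a calculation ultimately rests on physical data through the metrics and dimensions it references.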
The following sections describe the different SML object types as well as the properties available for each:
- Catalog - Defines the control file for an SML repository. It contains all repository-level definitions.
- Package - Defines references to additional Git repositories whose objects can be used in the current repository.
- Model - Defines the logical, business-friendly representation on top of the physical data.
- Dimension - Defines the logical collection of attributes and hierarchies for supporting drill-down.
- Row Security - Defines row-level data access rules for users and groups.
- Metric - Defines a numeric value representing a summarized (or aggregated) column.
- Calculation - Defines an expression to combine, evaluate, or manipulate other metrics defined in the model.
- Dataset - Defines columns on a physical table or query. Columns can be defined as SQL expressions.
- Connection - Defines a database and schema for connecting datasets to the physical data platform.
- Composite Model - Defines a model made up of multiple other models.
The GitHub repository contains the following pre-built semantic models:
- Internet Sales - a simple, single-fact model derived from the fictitious AdventureWorks retail dataset.
- Wide World Importers - a more complex, multi-fact model representing a fictional wholesale and distribution company.
- TPC-DS - a complex, multi-fact model that encodes the TPC-DS benchmark model in SML.
- TPC-H - a complex, multi-fact model that encodes the TPC-H benchmark model in SML.
- AdventureWorks2012 - the standard Microsoft SSAS tutorial in SML.
- Snowplow Digital Analytics Model - Snowplow empowers organizations to create a scalable, first-party data foundation so marketing and data teams can effectively analyze and tackle Customer 360 use cases.
- CRISP CPG Retail and Distributor Data Model - Crisp connects to over 40 leading U.S. retailers and distributors.