Make it easy to define/code/review a complex pipeline of transformation rules.
- The rules define a DAG/Directed-Acyclic-Graph, going from raw-data to complex measures. Intermediate nodes could also be used as measures.
- The DAG should be easily readable and modifiable but a human, not necessarily a developer.
- The DAG can express simple operations like
SUM
orPRODUCT
, and complex operations likeGROUPBY
.
Humans are generally happier when things goes faster. Adhoc
enables split-second queries over the underlying table. As the slow-part of a full AdhocQuery
is generally the
table pre-aggregation, we do not target absolute performance in Adhoc
. In other words, we prefer things to remains slower, as long as it enables
this project to remains simpler, given a query is generally slow due to the underlying table.
- SQLServer. We rely on many concepts from SQLServer to define our own abstractions.
- Apache Beam. Though Beam seems less flexible to access intermediate results as intermediate measures.
- MongoDB Aggregation Pipeline.
- DAX enables complex queries in Microsoft eco-system.
- SquashQL is an SQL query-engine for OLAP, with a strong emphasis on its typescript UI.
- Atoti PostProcessors is the standard Atoti way of building complex tree of measures on top of Atoti cubes.
RAM: any JVM can run Adhoc, as Adhoc does not store data: it queries on-the-fly the underlying/external tables. CPU: any JVM can run Adhoc. If multiple cores are available, Adhoc will takes advantage of them. But even a single-core JVM can run Adhoc queries smoothly.
- Ensure you have JDK 21 available
- Add a (
maven
/gradle
) dependency toeu.solven.adhoc:adhoc:0.0.1
(yet to be released. You have to compile the project locally for now) - Define an
IAdhocTableWrapper
: it defines how Adhoc can access your data
Assuming your data is queryable with JooQ:
myTableWrapper = new AdhocJooqTableWrapper(AdhocJooqTableWrapperParameters.builder()
.dslSupplier(DuckDbHelper.inMemoryDSLSupplier())
.table(yourJooqTableLike)
.build());
For local .parquet
files, it can be done with:
myTableWrapper = new AdhocJooqTableWrapper(AdhocJooqTableWrapperParameters.builder()
.dslSupplier(DuckDbHelper.inMemoryDSLSupplier())
.table(DSL.table(DSL.unquotedName("read_parquet('myRootFolder/2025-*-BaseFacts_*.parquet', union_by_name=True)")))
.build());
- Define a
MeasureBag
: it defines the measures and the links between them
An early-stage measureBag could look like:
Aggregator k1Sum = Aggregator.builder().name("k1").aggregationKey(SumAggregator.KEY).build();
Aggregator k2Sum = Aggregator.builder().name("k2").aggregationKey(SumAggregator.KEY).build();
Combinator k1PlusK2AsExpr = Combinator.builder()
.name("k1PlusK2AsExpr")
.underlyings(Arrays.asList("k1", "k2"))
.combinationKey(ExpressionCombination.KEY)
.combinationOptions(ImmutableMap.<String, Object>builder().put("expression", "IF(k1 == null, 0, k1) + IF(k2 == null, 0, k2)").build())
.build();
AdhocMeasureBag measureBag = AdhocMeasureBag.builder().build();
measureBag.addMeasure(k1Sum);
measureBag.addMeasure(k2Sum);
measureBag.addMeasure(k1PlusK2AsExpr);
- Defines an Adhoc Engine: it know how to execute a query given the measure relationships
AdhocQueryEngine aqe = AdhocQueryEngine.builder().eventBus(AdhocTestHelper.eventBus()).measureBag(measureBag).build();
- Define your query
ITabularView result = aqe.execute(AdhocQuery.builder().measure(k1SumSquared.getName()).debug(true).build(), jooqDb);
MapBasedTabularView mapBased = MapBasedTabularView.load(result);
Assertions.assertThat(mapBased.keySet().map(AdhocSliceAsMap::getCoordinates).toList())
.containsExactly(Map.of());
Assertions.assertThat(mapBased.getCoordinatesToValues())
.containsEntry(Map.of(), Map.of(k1SumSquared.getName(), (long) Math.pow(123 + 234, 2)));
- Execute your query
An AdhocQuery
is similar to a SELECT ... WHERE ... GROUP BY ...
SQL statement. It is defined by:
- a list of
groupBy
columns. - a set of
filter
clauses. - a list of measures, being either aggregated or transformed measures.
graph TB
sum -- haircut --> delta
sum -- haircut --> gamma
sum.FR -- country=France --> sum
ratio.FR --> sum.FR
ratio.FR --> sum
Adhoc is not a database, it is a query engine. It knows how to execute complex KPI queries, typically defined as complex graph of logics. The leaves of these graphes are pre-aggregated measures, to be provided by external tables.
Typical tables are:
- CSV or Parquet files: Adhoc recommends querying local/remote CSV/Parquet files through DuckDb, with the JooqSqlTable.
- Any SQL table: you should rely on JooqSqlTable, possibly requiring a Professional or Enterprise JooQ license.
- ActivePivot/Atoti
- Your own Database implementing
IAdhocTableWrapper
Given tables may hold similar data but with different column names. A IAdhocTableWrapper
enables coding once per table such a mapping.
A default IAdhocTableWrapper
assumes IAdhocQuery
columns matches the IAdhocTableWrapper
columns.
In case of a table with JOIN
s, one would often encounter ambiguities when querying a field. For instance when:
- querying a field used in a JOIN definition: the same name may appear in multiple tables
- querying joined tables with
*
, but tables have conflicting field names.
In such a case, one can resolve ambiguities by resolving them in a IAdhocTableWrapper
. For instance:
MapTableTranscoder.builder()
.queriedToUnderlying("someColumn", "someTable.someColumn")
.build()
Tables may not all accept query with similar types. Typically, one may filter a column with an enum
while given enum
type
may be unknown to the table.
This can be managed with a ICustomTypeManager
, which will handle type-transcoding on a per-column per-value basis.
A measure can be:
- an aggregated measure (a column aggregated by an aggregation function)
- an transformed measure (one or multiple measures are mixed together, possibly with additional
filter
and/orgroupBys
).
A set of measures defines a Directed-Acyclic-Graph, where leaves are pre-aggregated measures and nodes are transformed measures. The DAG is typically evaluated on a per-query basis, as the AdhocQuery groupBy
and filter
has to be combined with the own measures groupBys
and filters
.
Measures are evaluated for a slice, defined by the groupBy
and the filter
of its parent node. The root node have they groupBy
and filter
defined by the AdhocQuery.
- Combinator neither change the
groupBy
nor thefilter
. - Filtrator adds a
filter
, AND-ed with node ownfilter
. - Bucketor adds a
groupBy
, UNION-ed with node owngroupBy
.
Aggregations are used to reduce input data up to the requested (by groupBys
) granularity. Multiple aggregation functions may be applied over the same column.
See https://support.microsoft.com/en-us/office/aggregate-function-43b9278e-6aa7-4f17-92b6-e19993fa26df
ExpressionAggregation
enable custom expression for table processing.
For instance, in DuckDB, one can use the syntax SUM("v") FILTER color = 'red'
. It can be used as aggregator:
Aggregator.builder()
.name("v_RED")
.aggregationKey(ExpressionAggregation.KEY)
.column("max(\"v\") FILTER(\"color\" in ('red'))")
.build();
On top of aggregated-measures, one can define transformators.
- Combinator: the simplest transformation evaluate a formula over underlying measures. (e.g.
sumMeasure=a+b
). - Filtrator: evaluate underlying measure with a coordinate when the filter is enforced. The node
filter
is AND-ed with themeasure
filter. Hence, if the query filterscountry=France
and the filtrator filterscountry=Germany
, then the result is empty. - Bucketor: evaluates the underlying measures with an additional groupBy, then aggregates up to the node granularity.
- Dispatchor: given an cell, it will contribute into multiple cells. Useful for
many-to-many
orrebucketing
.
In Analysis-Services, Many-to-Many is a feature enabling a fact (i.e. an input row) to contribute into multiple coordinate of a given column.
For instance, in a flatten GeographicalZone
column (e.g. having flattened a hierarchical Region->Country->City
), a single Paris
fact would contribute into Paris
, France
and Europe
.
This can be achieved in Adhoc with a Dispatchor
.
A full example is visible in TestManyToManyAdhocQuery
.
references:
Sometimes, one wants to filter the visible members along some columns, without filtering the actual query. Typically, one may want
to query the ratio France/Europe
by filtering the France
country, without restricting Europe
to France
-only. For now, this can not be easily done.
The underlying issue is that one mah have a column filtering Country-with_firstLetterIsForG
. Assuming we have a measure returning currentCountry/Europe
where currentCountry
is the country on the Country
column, if we filter Country-with_firstLetterIsForG=true
in the query, should we show
France/(France+Germany)
or France/Europe
?
- We may introduce a special
groupBy
, where we would express we groupBycountry
but only showingCountry-with_firstLetterIsForG=true
- We may introduce a special
filter
, stating thatCountry-with_firstLetterIsForG=true
is aVisual
filter. It resonates with https://learn.microsoft.com/en-us/sql/mdx/visualtotals-mdx?view=sql-server-ver16
Typical errors with Adhoc are :
- Issue referencing underlying Table columns
- Unexpected data returned by underlying Table
Tools to investigate these issues are:
- Enable
debug
in your query:AdhocQuery.builder()[...].debug(true).build()
- Enable
explain
in your query:AdhocQuery.builder()[...].explain(true).build()
Debug will enable various additional [DEBUG]
logs with INFO
logLevel. It may also conduct additional operations (like executing some sanity checks), or enforcing some ordering to facilitate some investigations. It may lead to very poor performances.
Explain will provide additional information about the on-going query. It will typically log the query executed to the underlying table.
SQL integration is provided with the help of JooQ. To query a complex star/snowflake schema (i.e. with many/deep joins), one should provide a TableLike
expressing these JOIN
s.
For instance:
Table<Record> fromClause = DSL.table(DSL.name(factTable))
.as("f")
.join(DSL.table(DSL.name(productTable)).as("p"))
.using(DSL.field("productId"))
.join(DSL.table(DSL.name(countryTable)).as("c"))
.using(DSL.field("countryId"));
See eu.solven.adhoc.column.IMissingColumnManager.onMissingColumn(String)
Right-management is typically implemented by an AND
operation combining the user filter
and a filter based on user-rights.
This can be achieved through IAdhocImplicitFilter
. A Spring-Security example is demonstrated in TestImplicitFilter_SpringSecurity
.
- [Perf] Concurrency: for instance when processing the DAG given the results from the
TableQuery
- [CodeGen] BDD: Automated generation of Scenario given an
IAdhocQuery
- [Feature] Introduce the concept of multiLevel hierarchies, hence implicitly the concept of slicing hierarchies. For now, each hierarchy is optional: no hierarchy is required in groupBy (or implicit on some default member).
- [Resiliency] Resiliency on querySteps errors: a failing measure should not break the whole query.
- [Explain] EXPLAIN should go through CompositeCube and down to
IAdhocTableWrapper
own EXPLAIN - [DuckDB] Detect multiple tableQueries having very similar filters, and do a single query with multiple
FILTER
aggregates. (May happen when many queries are filtrator). - [Documentation]
MeasuresSetFromResource
should enable generation of Graphiviz.dot
files - [Perf]
MultiTypeStorage
should not rely onhash-based
structure when Adhoc guarantee to publish distinct keys.