Releases: bmeares/Meerschaum
🚸 v2.7.10 Add persistent webterms, limit concurrency for verify pipes.
v2.7.9 – v2.7.10
- **Add persistent Webterm sessions.**
  On the Web Console, the Webterm will attach to a persistent terminal for the current session's user.

- **Reconnect Webterms after client disconnect.**
  If a Webterm socket connection is broken, the client logic will attempt to reconnect and attach to the `tmux` session.

- **Add `tmux` sessions to Webterms.**
  Webterm sessions now connect to `tmux` sessions (tied to the user accounts). Set `system:webterm:tmux:enabled` to `false` to disable `tmux` sessions (see the config sketch after this list).

- **Limit concurrent connections during `verify pipes`.**
  To keep from exhausting the SQL connection pool, limit the number of concurrent intra-chunk connections.

- **Return the precision and scale from a table's columns and types.**
  Reading a table's columns and types with `meerschaum.utils.sql.get_table_columns_types()` now returns the precision and scale for `NUMERIC` (`DECIMAL`) columns.
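The following is a minimal sketch of reading the new `tmux` toggle with `meerschaum.config.get_config()`; the key path comes from the note above, and the printed default value is an assumption.

```python
from meerschaum.config import get_config

# Read whether tmux-backed Webterm sessions are enabled.
# Key path from the release note above: system:webterm:tmux:enabled
tmux_enabled = get_config('system', 'webterm', 'tmux', 'enabled')
print(tmux_enabled)
# True  (assumed default; set the key to false to disable tmux sessions)
```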
⚡️ v2.7.8 Memory improvements, add precision and scale support to numerics.
v2.7.8
- **Add support for user-supplied precision and scale for `numeric` columns.**
  You may now manually specify a numeric column's precision and scale:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'numeric', 'precision_scale',
      instance='sql:local',
      dtypes={'val': 'numeric[5,2]'},
  )
  pipe.sync([{'val': '123.456'}])
  print(pipe.get_data())
  #       val
  # 0  123.46
  ```

- **Serialize `numeric` columns to exact values during bulk inserts.**
  Decimal values are serialized when inserting into `NUMERIC` columns during bulk inserts.

- **Return a generator when fetching with `SQLConnector`.**
  To alleviate memory pressure, skip loading the entire dataframe when fetching.

- **Add `json_serialize_value()` to handle custom dtypes.**
  When serializing documents, pass `json_serialize_value` as the default handler:

  ```python
  import json
  from decimal import Decimal
  from datetime import datetime, timezone
  from meerschaum.utils.dtypes import json_serialize_value

  print(json.dumps(
      {
          'bytes': b'hello, world!',
          'decimal': Decimal('1.000000001'),
          'datetime': datetime(2025, 1, 1, tzinfo=timezone.utc),
      },
      default=json_serialize_value,
      indent=4,
  ))
  # {
  #     "bytes": "aGVsbG8sIHdvcmxkIQ==",
  #     "decimal": "1.000000001",
  #     "datetime": "2025-01-01T00:00:00+00:00"
  # }
  ```

- **Fix an issue with the `WITH` keyword in pipe definitions for MSSQL.**
  Previously, pipes which used the keyword `WITH` but not as a CTE (e.g. to specify an index) were incorrectly parsed.
⚡️ v2.7.7 Index performance improvements, add drop indices and index pipes, and more.
v2.7.7
- **Add actions `drop indices` and `index pipes`.**
  You may now drop and create indices on pipes with the actions `drop indices` and `index pipes` or the pipe methods `drop_indices()` and `create_indices()`:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe('demo', 'drop_indices', columns=['id'], instance='sql:local')
  pipe.sync([{'id': 1}])
  print(pipe.get_columns_indices())
  # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}

  pipe.drop_indices()
  print(pipe.get_columns_indices())
  # {}

  pipe.create_indices()
  print(pipe.get_columns_indices())
  # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}
  ```

- **Remove `CAST()` to datetime when selecting from a pipe's definition.**
  For some databases, casting to the same dtype causes the query optimizer to ignore the datetime index.

- **Add `INCLUDE` clause to datetime index for MSSQL.**
  This is to coax the query optimizer into using the datetime axis.

- **Remove redundant unique index.**
  The two competing unique indices have been combined into a single index (for the key `unique`). The unique constraint (when `upsert` is true) shares the name but has the prefix `UQ_` in place of `IX_`.

- **Add pipe parameter `null_indices`.**
  Set the pipe parameter `null_indices` to `False` for a performance improvement in situations where null index values are not expected (see the sketch after this list).

- **Apply backtrack minutes when fetching integer datetimes.**
  Backtrack minutes are now applied to pipes with integer datetime axes.
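Below is a minimal sketch of opting out of null index handling on a new pipe, assuming `null_indices` is stored under `pipe.parameters` like other pipe parameters; the keys and column names are hypothetical.

```python
import meerschaum as mrsm

# Hypothetical pipe whose index columns are never null:
# `null_indices: false` lets syncs skip the null-handling logic.
pipe = mrsm.Pipe(
    'demo', 'null_indices',
    instance='sql:local',
    columns={'datetime': 'ts', 'id': 'station_id'},
    parameters={'null_indices': False},
)
```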
🔧 v2.7.6 Make temporary table names configurable.
v2.7.6
- **Make temporary table names configurable.**
  The values for temporary SQL tables may be set in `MRSM{system:connectors:sql:instance:temporary_target}`. The new default prefix is `'_'`, and the new default transaction length is 4. The values have been re-ordered to target, transaction ID, then label.

- **Add connector completions to `copy pipes`.**
  When copying pipes, the connector keys prompt will offer auto-complete suggestions.

- **Fix stale job results.**
  When polling for job results, the job result is dropped from the in-memory cache to avoid overwriting the on-disk result.

- **Format row counts and seconds into human-friendly text.**
  Row counts and sync durations are now formatted into human-friendly representations.

- **Add digits to `generate_password()`.**
  Random strings from `meerschaum.utils.misc.generate_password()` may now contain digits (see the sketch after this list).
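For reference, a minimal usage sketch of `generate_password()`; the sample output is illustrative, not deterministic.

```python
from meerschaum.utils.misc import generate_password

# As of v2.7.6, the generated string may contain digits as well as letters.
print(generate_password())
# e.g. 'a7kq2zpr0bxw'  (random; shown for illustration only)
```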
✅ v2.7.5 Enforce TZ-aware columns as UTC, add dynamic queries.
v2.7.3 – v2.7.5
- **Allow for dynamic targets in SQL queries.**
  Include a pipe definition in double curly braces (à la Jinja) to substitute a pipe's target into a templated query:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe('demo', 'template', target='foo', instance='sql:local')
  _ = pipe.register()

  downstream_pipe = mrsm.Pipe(
      'sql:local', 'template',
      instance='sql:local',
      parameters={
          'sql': "SELECT *\nFROM {{Pipe('demo', 'template', instance='sql:local')}}"
      },
  )

  conn = mrsm.get_connector('sql:local')
  print(conn.get_pipe_metadef(downstream_pipe))
  # WITH "definition" AS (
  # SELECT *
  # FROM "foo"
  # )
  # SELECT *
  # FROM "definition"
  ```

- **Add `--skip-enforce-dtypes`.**
  To override a pipe's `enforce` parameter, pass `--skip-enforce-dtypes` to a sync.

- **Add bulk inserts for MSSQL.**
  To disable this behavior, set `system:connectors:sql:bulk_insert:mssql` to `false`. Bulk inserts for PostgreSQL-like flavors may now be disabled as well.

- **Fix altering multiple column types for MSSQL.**
  When a table has multiple columns to be altered, each column will have its own `ALTER TABLE` query.

- **Skip enforcing custom dtypes when `enforce=False`.**
  To avoid confusion, special Meerschaum data types (`numeric`, `json`, etc.) are not coerced into objects when `enforce=False`.

- **Fix timezone-aware casts.**
  A bug has been fixed where it was possible to mix timezone-aware and -naive casts in a single query.

- **Explicitly cast timezone-aware datetimes as UTC in SQL syncs.**
  By default, timezone-aware columns are now cast as time zone UTC in SQL. This may be skipped by setting `enforce` to `False`.

- **Added virtual environment inter-process locks.**
  Competing processes now cooperate during virtual environment verification, which protects installed packages.
✨ v2.7.2 Add bytes, enforce, allow autoincrementing datetime index, improve MSSQL indices.
v2.7.0 – v2.7.2
- **Introduce the `bytes` data type.**
  Instance connectors which support binary data (e.g. `SQLConnector`) may now take advantage of the `bytes` dtype. Other connectors (e.g. `ValkeyConnector`) may use `meerschaum.utils.dtypes.serialize_bytes()` to store binary data as a base64-encoded string.

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'bytes',
      instance='sql:memory',
      dtypes={'blob': 'bytes'},
  )
  pipe.sync([
      {'blob': b'hello, world!'},
  ])

  df = pipe.get_data()
  binary_data = df['blob'][0]
  print(binary_data.decode('utf-8'))
  # hello, world!

  from meerschaum.utils.dtypes import serialize_bytes, attempt_cast_to_bytes
  df['encoded'] = df['blob'].apply(serialize_bytes)
  df['decoded'] = df['encoded'].apply(attempt_cast_to_bytes)
  print(df)
  #                blob               encoded           decoded
  # 0  b'hello, world!'  aGVsbG8sIHdvcmxkIQ==  b'hello, world!'
  ```

- **Allow for pipes to use the same column for `datetime`, `primary`, and `autoincrement=True`.**
  Pipes may now use the same column as the `datetime` axis and `primary` with `autoincrement` set to `True`:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'datetime_primary_key', 'autoincrement',
      instance='sql:local',
      columns={
          'datetime': 'Id',
          'primary': 'Id',
      },
      autoincrement=True,
  )
  ```

- **Only join on `primary` when present.**
  When the index `primary` is set, use the column as the primary joining index. This will improve performance when syncing tables with a primary key.

- **Add the parameter `enforce`.**
  The parameter `enforce` (default `True`) toggles data type enforcement behavior. When `enforce` is `False`, incoming data will not be cast to the desired data types. For static datasets where the incoming data is always expected to be of the correct dtypes, it is recommended to set `enforce` to `False` and `static` to `True`:

  ```python
  from decimal import Decimal
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'enforce',
      instance='sql:memory',
      enforce=False,
      static=True,
      autoincrement=True,
      columns={
          'primary': 'Id',
          'datetime': 'Id',
      },
      dtypes={
          'Id': 'int',
          'Amount': 'numeric',
      },
  )
  pipe.sync([
      {'Amount': Decimal('1.11')},
      {'Amount': Decimal('2.22')},
  ])

  df = pipe.get_data()
  print(df)
  ```

- **Create the `datetime` axis as a clustered index for MSSQL, even when a `primary` index is specified.**
  Specifying a `datetime` and `primary` index will create a nonclustered `PRIMARY KEY`. Specifying the same column as both `datetime` and `primary` will create a clustered primary key (tip: this is useful when `autoincrement=True`).

- **Increase the default chunk interval to 43200 minutes.**
  New hypertables will use a default chunk interval of 30 days (43200 minutes).

- **Virtual environment bugfixes.**
  Existing virtual environment packages are backed up before re-initializing a virtual environment. This fixes the issue of disappearing dependencies.

- **Store `numeric` as `TEXT` for SQLite and DuckDB.**
  Due to limited precision, `numeric` columns are now stored as `TEXT`, then parsed into `Decimal` objects upon retrieval.

- **Show the Webterm by default when changing instances.**
  On the Web Console, changing the instance select will make the Webterm visible.

- **Improve dtype inference.**
🎨 v2.6.17 Enhance pipeline editing, fix dropping pipes with custom schema.
v2.6.17
- **Add relative deltas to `starting in` scheduler syntax.**
  You may specify a delta in the job scheduler `starting` syntax:

  ```
  mrsm sync pipes -s 'daily starting in 30 seconds'
  ```

- **Fix `drop pipes` for pipes on custom schemas.**
  Pipes created under a specific schema are now correctly dropped.

- **Enhance editing pipeline jobs.**
  Pipeline jobs now provide the job label as the default text to be edited. Pipeline arguments are now placed on a separate line to improve legibility.

- **Disable the progress timer for jobs.**
  The `sync pipes` progress timer will now be hidden when running through a job.

- **Unset `MRSM_NOASK` for daemons.**
  Now that jobs may accept user input, the environment variable `MRSM_NOASK` is no longer needed for jobs run as daemons (executor `local`).

- **Replace `cx_Oracle` with `oracledb`.**
  The Oracle SQL driver is no longer required now that the default Python binding for Oracle is `oracledb`.

- **Fix Oracle auto-incrementing for good.**
  At long last, the mystery of Oracle auto-incrementing identity columns has been laid to rest.
🐛 v2.6.16 Fix inplace syncs without a datetime column.
v2.6.15 – v2.6.16
- **Fix inplace syncs without a `datetime` axis.**
  A bug introduced by a performance optimization has been fixed. Inplace pipes without a `datetime` axis will skip searching for date bounds. Setting `upsert` to `true` will bypass this bug on previous releases.

- **Skip invoking `get_sync_time()` for pipes without a `datetime` axis.**
  Invoking an instance connector's `get_sync_time()` method will now only occur when `datetime` is set.

- **Remove `guess_datetime()` check from `SQLConnector.get_sync_time()`.**
  Because sync times are only checked for pipes with a dedicated `datetime` column, the `guess_datetime()` check has been removed from the `SQLConnector.get_sync_time()` method.

- **Skip persisting default `target` to parameters.**
  The default target table name will no longer be persisted to `parameters`. This helps avoid accidentally setting the wrong target table when copying pipes.

- **Default to "no" for syncing data when copying pipes.**
  The action `copy pipes` will no longer sync data by default, instead requiring an explicit yes to begin syncing.

- **Fix the "Update query" button behavior on the Web Console.**
  Existing but null keys are now accounted for when updating a SQL pipe's query.

- **Fix another Oracle autoincrement edge case.**
  Resetting the autoincrementing primary key value on Oracle will now behave as expected.
⚡️ v2.6.14 Speed up dtype enforcement, fix Oracle auto-incrementing IDs.
v2.6.10 – v2.6.14
- **Improve datetime timezone-awareness enforcement performance.**
  Datetime columns are only parsed for timezone awareness if the desired awareness differs. This drastically speeds up sync times.

- **Switch to `tz_localize()` when stripping timezone information.**
  The previous method of using a lambda to replace individual `tzinfo` attributes did not scale well. Using `tz_localize()` can be vectorized and greatly speeds up syncs, especially with large chunks (see the pandas sketch after this list).

- **Add `enforce_dtypes` to `Pipe.filter_existing()`.**
  You may optionally enforce dtype information during `filter_existing()`. This may be useful when implementing custom syncs for instance connectors. Note this may impact memory and compute performance.

  ```python
  import meerschaum as mrsm
  import pandas as pd

  pipe = mrsm.Pipe('a', 'b', instance='sql:local')
  pipe.sync([{'a': 1}])

  df = pd.DataFrame([{'a': '2'}])

  ### `enforce_dtypes=True` will suppress the differing dtypes warning.
  unseen, update, delta = pipe.filter_existing(df, enforce_dtypes=True)
  print(delta)
  ```

- **Fix `query_df()` for null parameters.**
  This is useful when you use `query_df()` with only `select_columns` or `omit_columns` (see the sketch after this list).

- **Fix autoincrementing IDs for Oracle SQL.**

- **Enforce security settings for creating jobs.**
  Jobs and remote actions will only be accessible to admin users when running with `--secure` (`system:permissions:actions:non_admin` in config).
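For context, here is a minimal pandas sketch of the vectorized approach described in the `tz_localize()` note above; it illustrates the pandas API, not Meerschaum's exact internals.

```python
import pandas as pd

ser = pd.Series(pd.to_datetime([
    '2025-01-01T00:00:00+00:00',
    '2025-01-02T00:00:00+00:00',
]))

# Vectorized: strip timezone info from the whole series at once
# instead of replacing each element's tzinfo in a lambda.
naive = ser.dt.tz_localize(None)
print(naive.dtype)
# datetime64[ns]
```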
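There is also a small sketch of the `query_df()` fix below; the module path `meerschaum.utils.dataframe` is an assumption, while the `select_columns` / `omit_columns` parameters come from the note above.

```python
import pandas as pd
from meerschaum.utils.dataframe import query_df

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])

# Previously this could fail when no filter params were given;
# now passing only `select_columns` (or `omit_columns`) works.
print(query_df(df, select_columns=['a']))
```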
⚡️ v2.6.7 Fix dtypes for new indices, improve metadata caching.
v2.6.6 – v2.6.8
- **Improve metadata performance when syncing.**
  Syncs via the `SQLConnector` now cache schema and index metadata, speeding up transactions.

- **Fix upserts for MySQL / MariaDB.**
  Upserts in MySQL and MariaDB now use `ON DUPLICATE` instead of `REPLACE INTO`.

- **Fix dtype detection for index columns.**
  A bug where new index columns were incorrectly created as `INT` has been fixed.

- **Delete old keys when dropping Valkey pipes.**
  Dropping a pipe from Valkey now clears all old index keys.

- **Fix timezone-aware enforcement bugs.**