Releases: bmeares/Meerschaum
🔐 v2.8.4 Improve API allowed instance keys settings.
v2.8.4
- **Allow for pattern matching in `allowed_instance_keys`.**
  You may now generalize the instances exposed by the API by using Unix-style patterns in the list `system:api:permissions:instances:allowed_instance_keys`:

  ```json
  {
    "api": {
      "permissions": {
        "instances": {
          "allowed_instance_keys": ["valkey:*", "*_dev"]
        }
      }
    }
  }
  ```
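For illustration, the Unix-style matching above behaves like Python's standard `fnmatch` module (a conceptual sketch with a hypothetical helper, not Meerschaum's internal implementation):

```python
from fnmatch import fnmatch

def is_instance_allowed(keys: str, allowed_patterns: list) -> bool:
    """Return whether the given instance keys match any allowed pattern."""
    return any(fnmatch(keys, pattern) for pattern in allowed_patterns)

allowed = ['valkey:*', '*_dev']
print(is_instance_allowed('valkey:main', allowed))   # True
print(is_instance_allowed('sql:etl_dev', allowed))   # True
print(is_instance_allowed('sql:main', allowed))      # False
```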
- **Return pipe attributes for the route `/pipes/{connector}/{metric}/{location}`.**
  The API routes `/pipes/{connector}/{metric}/{location}` and `/pipes/{connector}/{metric}/{location}/attributes` both return pipe attributes.
- **Check entire batches for `verify rowcounts`.**
  The command `verify rowcounts` will now check batch boundaries before checking row-counts for individual chunks. This should moderately increase performance.
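The batch-boundary optimization can be sketched in plain Python: if the total row-count across a batch's range already matches the remote total, the per-chunk comparisons for that batch may be skipped entirely. The helper below is purely illustrative (hypothetical names, not Meerschaum's internals):

```python
def chunks_to_verify(batches, local_counts, remote_counts):
    """
    Return the chunk bounds whose row-counts still need individual checks.
    `local_counts` and `remote_counts` map chunk bounds to row-counts.
    """
    to_check = []
    for batch in batches:
        # First compare the row-count over the whole batch's range.
        local_total = sum(local_counts[chunk] for chunk in batch)
        remote_total = sum(remote_counts[chunk] for chunk in batch)
        if local_total == remote_total:
            continue  # The entire batch matches; skip its chunks.
        # Only mismatched batches fall back to per-chunk comparisons.
        to_check.extend(
            chunk for chunk in batch
            if local_counts[chunk] != remote_counts[chunk]
        )
    return to_check

batches = [((0, 10), (10, 20)), ((20, 30), (30, 40))]
local = {(0, 10): 5, (10, 20): 5, (20, 30): 4, (30, 40): 5}
remote = {(0, 10): 5, (10, 20): 5, (20, 30): 5, (30, 40): 5}
print(chunks_to_verify(batches, local, remote))  # [(20, 30)]
```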
- **Kill orphaned child processes when the parent job is killed.**
  Jobs created with pipeline arguments should now kill associated child processes.
- **Add `--skip-hooks`.**
  The flag `--skip-hooks` prevents any sync hooks from firing when syncing pipes.
- **Remove datetime rounding from `parse_schedule()`.**
  Scheduled actions now behave as expected: the current timestamp is no longer rounded to the nearest minute, which was causing issues with the `starting in` delay feature.
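The effect of the fix can be illustrated with plain `datetime` arithmetic: the first run is now offset from the exact current timestamp rather than a value rounded to the minute (an illustrative sketch, not the actual `parse_schedule()` code):

```python
from datetime import datetime, timedelta, timezone

def first_run_after_delay(now: datetime, delay: timedelta) -> datetime:
    """Compute the first scheduled run from the exact current timestamp."""
    # Previously the timestamp was rounded to the nearest minute,
    # shifting short delays like `starting in 30 seconds`.
    return now + delay

now = datetime(2025, 1, 1, 12, 0, 45, tzinfo=timezone.utc)
print(first_run_after_delay(now, timedelta(seconds=30)))
# 2025-01-01 12:01:15+00:00
```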
- **Fix `allowed_instance_keys` enforcement.**
🧹 v2.8.3 Clean up API endpoints.
v2.8.3
- Increase username limit to 60 characters.
- Add chunk retries to `Pipe.verify()`.
- Add instance keys to remaining pipes endpoints.
- Misc bugfixes.
⚡️ v2.8.2 Add batches to `verify pipes`, allow for multiple instances from WebAPI, and memory improvements.
v2.8.0 – v2.8.2
- **Add batches to `Pipe.verify()`.**
  Verification syncs now run in sequential batches so that they may be interrupted and resumed. See `Pipe.get_chunk_bounds_batches()` for more information:

  ```python
  from datetime import timedelta
  import meerschaum as mrsm

  pipe = mrsm.Pipe('demo', 'get_chunk_bounds', instance='sql:local')
  bounds = pipe.get_chunk_bounds(
      chunk_interval=timedelta(hours=10),
      begin='2025-01-10',
      end='2025-01-15',
      bounded=True,
  )
  batches = pipe.get_chunk_bounds_batches(bounds, workers=4)
  mrsm.pprint(
      [
          tuple(
              (str(bounds[0]), str(bounds[1]))
              for bounds in batch
          )
          for batch in batches
      ]
  )
  # [
  #     (
  #         ('2025-01-10 00:00:00+00:00', '2025-01-10 10:00:00+00:00'),
  #         ('2025-01-10 10:00:00+00:00', '2025-01-10 20:00:00+00:00'),
  #         ('2025-01-10 20:00:00+00:00', '2025-01-11 06:00:00+00:00'),
  #         ('2025-01-11 06:00:00+00:00', '2025-01-11 16:00:00+00:00')
  #     ),
  #     (
  #         ('2025-01-11 16:00:00+00:00', '2025-01-12 02:00:00+00:00'),
  #         ('2025-01-12 02:00:00+00:00', '2025-01-12 12:00:00+00:00'),
  #         ('2025-01-12 12:00:00+00:00', '2025-01-12 22:00:00+00:00'),
  #         ('2025-01-12 22:00:00+00:00', '2025-01-13 08:00:00+00:00')
  #     ),
  #     (
  #         ('2025-01-13 08:00:00+00:00', '2025-01-13 18:00:00+00:00'),
  #         ('2025-01-13 18:00:00+00:00', '2025-01-14 04:00:00+00:00'),
  #         ('2025-01-14 04:00:00+00:00', '2025-01-14 14:00:00+00:00'),
  #         ('2025-01-14 14:00:00+00:00', '2025-01-15 00:00:00+00:00')
  #     )
  # ]
  ```
- **Add `--skip-chunks-with-greater-rowcounts` to `verify pipes`.**
  The flag `--skip-chunks-with-greater-rowcounts` will compare a chunk's rowcount with the rowcount of the remote table and skip the chunk if its rowcount is greater than or equal to the remote count. This is only applicable for connectors which implement `remote=True` support for `get_sync_time()`.
- **Add `verify rowcounts`.**
  The action `verify rowcounts` (same as passing `--check-rowcounts-only` to `verify pipes`) will compare row-counts for a pipe's chunks against remote rowcounts. This is only applicable for connectors which implement `get_pipe_rowcount()` with support for `remote=True`.
- **Add `remote` to `Pipe.get_sync_time()`.**
  For pipes which support it (i.e. the `SQLConnector`), the option `remote` returns the sync time of a pipe's fetch definition, like the option `remote` in `Pipe.get_rowcount()`.
- **Allow for the Web API to serve pipes from multiple instances.**
  You can disable this behavior by setting `system:api:permissions:instances:allow_multiple_instances` to `false`. You may also explicitly allow which instances may be accessed by the Web API by setting the list `system:api:permissions:instances:allowed_instance_keys` (defaults to `["*"]`).
- **Fix memory leak when retrying failed chunks.**
  Failed chunks were kept in memory to be retried later. In resource-intensive syncs with large chunks and high failure rates, this prevented large objects from being freed, hogging memory.
- **Add negation to job actions.**
  Prefix a job name with an underscore to select all other jobs. This is useful for filtering out noise in `show logs`.
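The negation behavior can be sketched as a simple filter (hypothetical helper, not the actual job-selection code):

```python
def select_jobs(name: str, job_names: list) -> list:
    """Select jobs by name, negating when the name starts with an underscore."""
    if name.startswith('_'):
        # `_noisy-job` selects every job except `noisy-job`.
        return [job for job in job_names if job != name[1:]]
    return [job for job in job_names if job == name]

jobs = ['etl', 'noisy-job', 'reports']
print(select_jobs('_noisy-job', jobs))  # ['etl', 'reports']
print(select_jobs('etl', jobs))         # ['etl']
```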
- **Add `Pipe.parent`.**
  As a quality-of-life improvement, the attribute `Pipe.parent` will return the first member of `Pipe.parents` (if available).
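The behavior resembles a simple `parent` property over a `parents` list; a minimal self-contained sketch (not Meerschaum's actual `Pipe` class):

```python
class Node:
    """Minimal stand-in for an object with a `parents` list and a `parent` shortcut."""

    def __init__(self, parents=None):
        self.parents = parents or []

    @property
    def parent(self):
        """Return the first parent, if any."""
        return self.parents[0] if self.parents else None

upstream = Node()
downstream = Node(parents=[upstream])
print(downstream.parent is upstream)  # True
print(Node().parent)                  # None
```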
- **Use the current instance for new tabs in the Webterm.**
  Clicking "New Tab" will open a new `tmux` window using the currently selected instance on the Web Console.
- **Other Webterm quality-of-life improvements.**
  Added a size toggle button to allow the Webterm to take up the entire page.
- **Additional refactoring work.**
  The API endpoints code has been cleaned up.
- **Added system configurations.**
  New options have been added to the `system` configuration, such as `max_response_row_limit`, `allow_multiple_instances`, and `allowed_instance_keys`.
🚸 v2.7.10 Add persistent webterms, limit concurrency for verify pipes.
v2.7.9 – v2.7.10
- **Add persistent Webterm sessions.**
  On the Web Console, the Webterm will attach to a persistent terminal for the current session's user.
- **Reconnect Webterms after client disconnect.**
  If a Webterm socket connection is broken, the client logic will attempt to reconnect and attach to the `tmux` session.
- **Add `tmux` sessions to Webterms.**
  Webterm sessions now connect to `tmux` sessions (tied to the user accounts). Set `system:webterm:tmux:enabled` to `false` to disable `tmux` sessions.
- **Limit concurrent connections during `verify pipes`.**
  To keep from exhausting the SQL connection pool, the number of concurrent intra-chunk connections is now limited.
- **Return the precision and scale from a table's columns and types.**
  Reading a table's columns and types with `meerschaum.utils.sql.get_table_columns_types()` now returns the precision and scale for `NUMERIC` (`DECIMAL`) columns.
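For illustration, pulling precision and scale out of a SQL type string could look like the following (a hypothetical regex-based sketch; the real function reads database metadata):

```python
import re

def parse_precision_scale(db_type: str):
    """Parse `NUMERIC(precision, scale)` from a SQL type string."""
    match = re.match(
        r'(?:NUMERIC|DECIMAL)\s*\((\d+)\s*,\s*(\d+)\)',
        db_type,
        re.IGNORECASE,
    )
    if not match:
        return None, None
    return int(match.group(1)), int(match.group(2))

print(parse_precision_scale('NUMERIC(10, 2)'))  # (10, 2)
print(parse_precision_scale('decimal(5,0)'))    # (5, 0)
print(parse_precision_scale('INT'))             # (None, None)
```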
⚡️ v2.7.8 Memory improvements, add precision and scale support to numerics.
v2.7.8
- **Add support for user-supplied precision and scale for `numeric` columns.**
  You may now manually specify a numeric column's precision and scale:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'numeric', 'precision_scale',
      instance='sql:local',
      dtypes={'val': 'numeric[5,2]'},
  )
  pipe.sync([{'val': '123.456'}])
  print(pipe.get_data())
  #       val
  # 0  123.46
  ```
- **Serialize `numeric` columns to exact values during bulk inserts.**
  Decimal values are serialized when inserting into `NUMERIC` columns during bulk inserts.
- **Return a generator when fetching with `SQLConnector`.**
  To alleviate memory pressure, skip loading the entire dataframe when fetching.
- **Add `json_serialize_value()` to handle custom dtypes.**
  When serializing documents, pass `json_serialize_value` as the default handler:

  ```python
  import json
  from decimal import Decimal
  from datetime import datetime, timezone
  from meerschaum.utils.dtypes import json_serialize_value

  print(json.dumps(
      {
          'bytes': b'hello, world!',
          'decimal': Decimal('1.000000001'),
          'datetime': datetime(2025, 1, 1, tzinfo=timezone.utc),
      },
      default=json_serialize_value,
      indent=4,
  ))
  # {
  #     "bytes": "aGVsbG8sIHdvcmxkIQ==",
  #     "decimal": "1.000000001",
  #     "datetime": "2025-01-01T00:00:00+00:00"
  # }
  ```
- **Fix an issue with the `WITH` keyword in pipe definitions for MSSQL.**
  Previously, pipes which used the keyword `WITH` but not as a CTE (e.g. to specify an index) were incorrectly parsed.
⚡️ v2.7.7 Index performance improvements, add drop indices and index pipes, and more.
v2.7.7
- **Add actions `drop indices` and `index pipes`.**
  You may now drop and create indices on pipes with the actions `drop indices` and `index pipes` or the pipe methods `drop_indices()` and `create_indices()`:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe('demo', 'drop_indices', columns=['id'], instance='sql:local')
  pipe.sync([{'id': 1}])
  print(pipe.get_columns_indices())
  # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}

  pipe.drop_indices()
  print(pipe.get_columns_indices())
  # {}

  pipe.create_indices()
  print(pipe.get_columns_indices())
  # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}
  ```
- **Remove `CAST()` to datetime when selecting from a pipe's definition.**
  For some databases, casting to the same dtype causes the query optimizer to ignore the datetime index.
- **Add an `INCLUDE` clause to the datetime index for MSSQL.**
  This is to coax the query optimizer into using the datetime axis.
- **Remove a redundant unique index.**
  The two competing unique indices have been combined into a single index (for the key `unique`). The unique constraint (when `upsert` is true) shares the name but has the prefix `UQ_` in place of `IX_`.
- **Add the pipe parameter `null_indices`.**
  Set the pipe parameter `null_indices` to `False` for a performance improvement in situations where null index values are not expected.
- **Apply backtrack minutes when fetching integer datetimes.**
  Backtrack minutes are now applied to pipes with integer datetime axes.
🔧 v2.7.6 Make temporary table names configurable.
v2.7.6
- **Make temporary table names configurable.**
  The names of temporary SQL tables may be configured under `MRSM{system:connectors:sql:instance:temporary_target}`. The new default prefix is `'_'`, and the new default transaction ID length is 4. The name segments have been re-ordered to target, transaction ID, then label.
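The resulting naming scheme might be sketched as follows (hypothetical helper and format; consult `MRSM{system:connectors:sql:instance:temporary_target}` for the authoritative values):

```python
import secrets

def temp_table_name(
    target: str,
    label: str,
    prefix: str = '_',
    transaction_length: int = 4,
) -> str:
    """Build a temporary table name ordered as prefix + target, transaction ID, label."""
    transaction_id = secrets.token_hex(8)[:transaction_length]
    return f"{prefix}{target}_{transaction_id}_{label}"

name = temp_table_name('weather', 'backtrack')
print(name)  # e.g. '_weather_3f9c_backtrack'
```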
- **Add connector completions to `copy pipes`.**
  When copying pipes, the connector keys prompt will offer auto-complete suggestions.
- **Fix stale job results.**
  When polling for job results, the job result is dropped from the in-memory cache to avoid overwriting the on-disk result.
- **Format row counts and seconds into human-friendly text.**
  Row counts and sync durations are now formatted into human-friendly representations.
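For a sense of the formatting, here is a self-contained sketch of human-friendly row-count and duration strings (illustrative only; Meerschaum's actual formatting may differ):

```python
def humanize_rows(rowcount: int) -> str:
    """Format a row count into a short human-friendly string."""
    for threshold, suffix in ((1_000_000_000, 'B'), (1_000_000, 'M'), (1_000, 'K')):
        if rowcount >= threshold:
            return f"{rowcount / threshold:.1f}{suffix}"
    return str(rowcount)

def humanize_seconds(seconds: float) -> str:
    """Format a duration in seconds into minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"

print(humanize_rows(1_234_567))  # 1.2M
print(humanize_seconds(125.0))   # 2m 5s
```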
- **Add digits to `generate_password()`.**
  Random strings from `meerschaum.utils.misc.generate_password()` may now contain digits.
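A comparable generator using only the standard library's `secrets` module (a sketch of the general approach, not the actual `generate_password()` implementation):

```python
import secrets
import string

def generate_password(length: int = 12) -> str:
    """Return a random string drawn from letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(secrets.choice(alphabet) for _ in range(length))

password = generate_password(16)
print(len(password))  # 16
```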
✅ v2.7.5 Enforce TZ-aware columns as UTC, add dynamic queries.
v2.7.3 – v2.7.5
- **Allow for dynamic targets in SQL queries.**
  Include a pipe definition in double curly braces (à la Jinja) to substitute a pipe's target into a templated query:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe('demo', 'template', target='foo', instance='sql:local')
  _ = pipe.register()

  downstream_pipe = mrsm.Pipe(
      'sql:local', 'template',
      instance='sql:local',
      parameters={
          'sql': "SELECT *\nFROM {{Pipe('demo', 'template', instance='sql:local')}}"
      },
  )

  conn = mrsm.get_connector('sql:local')
  print(conn.get_pipe_metadef(downstream_pipe))
  # WITH "definition" AS (
  #     SELECT *
  #     FROM "foo"
  # )
  # SELECT *
  # FROM "definition"
  ```
- **Add `--skip-enforce-dtypes`.**
  To override a pipe's `enforce` parameter, pass `--skip-enforce-dtypes` to a sync.
- **Add bulk inserts for MSSQL.**
  To disable this behavior, set `system:connectors:sql:bulk_insert:mssql` to `false`. Bulk inserts for PostgreSQL-like flavors may now be disabled as well.
- **Fix altering multiple column types for MSSQL.**
  When a table has multiple columns to be altered, each column will have its own `ALTER TABLE` query.
- **Skip enforcing custom dtypes when `enforce=False`.**
  To avoid confusion, special Meerschaum data types (`numeric`, `json`, etc.) are not coerced into objects when `enforce=False`.
- **Fix timezone-aware casts.**
  A bug has been fixed where it was possible to mix timezone-aware and -naive casts in a single query.
- **Explicitly cast timezone-aware datetimes as UTC in SQL syncs.**
  By default, timezone-aware columns are now cast to UTC in SQL. This may be skipped by setting `enforce` to `False`.
- **Added virtual environment inter-process locks.**
  Competing processes now cooperate during virtual environment verification, which protects installed packages.
✨ v2.7.2 Add bytes, enforce, allow autoincrementing datetime index, improve MSSQL indices.
v2.7.0 – v2.7.2
- **Introduce the `bytes` data type.**
  Instance connectors which support binary data (e.g. `SQLConnector`) may now take advantage of the `bytes` dtype. Other connectors (e.g. `ValkeyConnector`) may use `meerschaum.utils.dtypes.serialize_bytes()` to store binary data as a base64-encoded string:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'bytes',
      instance='sql:memory',
      dtypes={'blob': 'bytes'},
  )
  pipe.sync([
      {'blob': b'hello, world!'},
  ])

  df = pipe.get_data()
  binary_data = df['blob'][0]
  print(binary_data.decode('utf-8'))
  # hello, world!

  from meerschaum.utils.dtypes import serialize_bytes, attempt_cast_to_bytes
  df['encoded'] = df['blob'].apply(serialize_bytes)
  df['decoded'] = df['encoded'].apply(attempt_cast_to_bytes)
  print(df)
  #                blob               encoded           decoded
  # 0  b'hello, world!'  aGVsbG8sIHdvcmxkIQ==  b'hello, world!'
  ```
- **Allow pipes to use the same column for `datetime`, `primary`, and `autoincrement=True`.**
  Pipes may now use the same column as the `datetime` axis and the `primary` key with `autoincrement` set to `True`:

  ```python
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'datetime_primary_key', 'autoincrement',
      instance='sql:local',
      columns={
          'datetime': 'Id',
          'primary': 'Id',
      },
      autoincrement=True,
  )
  ```
- **Only join on `primary` when present.**
  When the index `primary` is set, use the column as the primary joining index. This will improve performance when syncing tables with a primary key.
- **Add the parameter `enforce`.**
  The parameter `enforce` (default `True`) toggles data type enforcement behavior. When `enforce` is `False`, incoming data will not be cast to the desired data types. For static datasets where the incoming data is always expected to have the correct dtypes, it is recommended to set `enforce` to `False` and `static` to `True`:

  ```python
  from decimal import Decimal
  import meerschaum as mrsm

  pipe = mrsm.Pipe(
      'demo', 'enforce',
      instance='sql:memory',
      enforce=False,
      static=True,
      autoincrement=True,
      columns={
          'primary': 'Id',
          'datetime': 'Id',
      },
      dtypes={
          'Id': 'int',
          'Amount': 'numeric',
      },
  )
  pipe.sync([
      {'Amount': Decimal('1.11')},
      {'Amount': Decimal('2.22')},
  ])

  df = pipe.get_data()
  print(df)
  ```
- **Create the `datetime` axis as a clustered index for MSSQL, even when a `primary` index is specified.**
  Specifying both a `datetime` and a `primary` index will create a nonclustered `PRIMARY KEY`. Specifying the same column as both `datetime` and `primary` will create a clustered primary key (tip: this is useful when `autoincrement=True`).
- **Increase the default chunk interval to 43200 minutes.**
  New hypertables will use a default chunk interval of 30 days (43200 minutes).
- **Virtual environment bugfixes.**
  Existing virtual environment packages are backed up before re-initializing a virtual environment. This fixes the issue of disappearing dependencies.
- **Store `numeric` as `TEXT` for SQLite and DuckDB.**
  Due to limited precision, `numeric` columns are now stored as `TEXT`, then parsed into `Decimal` objects upon retrieval.
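The `TEXT` round trip can be demonstrated with the standard `sqlite3` module: storing the string form preserves digits that a float would lose (a standalone illustration of the approach):

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo (val TEXT)')  # numeric stored as TEXT

value = Decimal('1.000000000000000000001')  # beyond float precision
conn.execute('INSERT INTO demo (val) VALUES (?)', (str(value),))

(text_value,) = conn.execute('SELECT val FROM demo').fetchone()
restored = Decimal(text_value)
print(restored == value)       # True
print(float(restored) == 1.0)  # True: a float would have lost the digits
conn.close()
```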
- **Show the Webterm by default when changing instances.**
  On the Web Console, changing the instance select will make the Webterm visible.
- **Improve dtype inference.**
🎨 v2.6.17 Enhance pipeline editing, fix dropping pipes with custom schema.
v2.6.17
- **Add relative deltas to `starting in` scheduler syntax.**
  You may specify a delta in the job scheduler `starting` syntax:

  ```bash
  mrsm sync pipes -s 'daily starting in 30 seconds'
  ```
- **Fix `drop pipes` for pipes on custom schemas.**
  Pipes created under a specific schema are now correctly dropped.
- **Enhance editing pipeline jobs.**
  Pipeline jobs now provide the job label as the default text to be edited. Pipeline arguments are now placed on a separate line to improve legibility.
- **Disable the progress timer for jobs.**
  The `sync pipes` progress timer will now be hidden when running through a job.
- **Unset `MRSM_NOASK` for daemons.**
  Now that jobs may accept user input, the environment variable `MRSM_NOASK` is no longer needed for jobs run as daemons (executor `local`).
- **Replace `Cx_Oracle` with `oracledb`.**
  The Oracle SQL driver is no longer required now that the default Python binding for Oracle is `oracledb`.
- **Fix Oracle auto-incrementing for good.**
  At long last, the mystery of Oracle auto-incrementing identity columns has been laid to rest.