Releases: bmeares/Meerschaum

🚸 v2.7.10 Add persistent webterms, limit concurrency for verify pipes.

12 Jan 20:58
39b41eb

v2.7.9 – v2.7.10

  • Add persistent Webterm sessions.
    On the Web Console, the Webterm will attach to a persistent terminal for the current session's user.

  • Reconnect Webterms after client disconnect.
    If a Webterm socket connection is broken, the client logic will attempt to reconnect and attach to the tmux session.

  • Add tmux sessions to Webterms.
    Webterm sessions now connect to tmux sessions (tied to the user accounts).
    Set system:webterm:tmux:enabled to false to disable tmux sessions.
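    For example, you might set the following under the system config, e.g. via mrsm edit config system (a sketch; the YAML nesting is inferred from the key path above):

    webterm:
      tmux:
        enabled: false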

  • Limit concurrent connections during verify pipes.
    To keep from exhausting the SQL connection pool, the number of concurrent intra-chunk connections is now limited.

  • Return the precision and scale from a table's columns and types.
    Reading a table's columns and types with meerschaum.utils.sql.get_table_columns_types() now returns the precision and scale for NUMERIC (DECIMAL) columns.

⚡️ v2.7.8 Memory improvements, add precision and scale support to numerics.

11 Jan 03:55
f6bb6f1

v2.7.8

  • Add support for user-supplied precision and scale for numeric columns.
    You may now manually specify a numeric column's precision and scale:

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'numeric', 'precision_scale',
        instance='sql:local',
        dtypes={'val': 'numeric[5,2]'},
    )
    pipe.sync([{'val': '123.456'}])
    print(pipe.get_data())
    #       val
    # 0  123.46
  • Serialize numeric columns to exact values during bulk inserts.
    Decimal values are serialized when inserting into NUMERIC columns during bulk inserts.

  • Return a generator when fetching with SQLConnector.
    To alleviate memory pressure, fetching now returns a generator instead of loading the entire dataframe into memory.

  • Add json_serialize_value() to handle custom dtypes.
    When serializing documents, pass json_serialize_value as the default handler:

    import json
    from decimal import Decimal
    from datetime import datetime, timezone
    from meerschaum.utils.dtypes import json_serialize_value
    
    print(json.dumps(
        {
            'bytes': b'hello, world!',
            'decimal': Decimal('1.000000001'),
            'datetime': datetime(2025, 1, 1, tzinfo=timezone.utc),
        },
        default=json_serialize_value,
        indent=4,
    ))
    # {
    #     "bytes": "aGVsbG8sIHdvcmxkIQ==",
    #     "decimal": "1.000000001",
    #     "datetime": "2025-01-01T00:00:00+00:00"
    # }
  • Fix an issue with the WITH keyword in pipe definitions for MSSQL.
    Previously, pipes whose definitions used the keyword WITH but not as a CTE (e.g. to specify an index hint) were incorrectly parsed.
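    For illustration, a definition along these lines (connector keys, table, and index name are hypothetical) previously confused the CTE parsing:

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'sql:mssql', 'with_hint',
        instance='sql:mssql',
        parameters={
            'sql': "SELECT * FROM [dbo].[readings] WITH (INDEX(IX_readings_dt))",
        },
    )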

⚡️ v2.7.7 Index performance improvements, add drop indices and index pipes, and more.

09 Jan 00:46
79c48a0

v2.7.7

  • Add actions drop indices and index pipes.
    You may now drop and create indices on pipes with the actions drop indices and index pipes or the pipe methods drop_indices() and create_indices():

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe('demo', 'drop_indices', columns=['id'], instance='sql:local')
    pipe.sync([{'id': 1}])
    print(pipe.get_columns_indices())
    # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}
    
    pipe.drop_indices()
    print(pipe.get_columns_indices())
    # {}
    
    pipe.create_indices()
    print(pipe.get_columns_indices())
    # {'id': [{'name': 'IX_demo_drop_indices_id', 'type': 'INDEX'}]}
  • Remove CAST() to datetime when selecting from a pipe's definition.
    For some databases, casting to the same dtype causes the query optimizer to ignore the datetime index.

  • Add INCLUDE clause to datetime index for MSSQL.
    This is to coax the query optimizer into using the datetime axis.

  • Remove redundant unique index.
    The two competing unique indices have been combined into a single index (for the key unique). The unique constraint (when upsert is true) shares the name but has the prefix UQ_ in place of IX_.

  • Add pipe parameter null_indices.
    Set the pipe parameter null_indices to False for a performance improvement in situations where null index values are not expected.
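    A minimal sketch (assuming null_indices is accepted like other pipe parameters):

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'no_nulls',
        instance='sql:local',
        columns={'datetime': 'dt', 'id': 'id'},
        null_indices=False,
    )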

  • Apply backtrack minutes when fetching integer datetimes.
    Backtrack minutes are now applied to pipes with integer datetime axes.
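    For reference, backtrack minutes live under the fetch parameters (a sketch with hypothetical keys and an integer datetime axis):

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'sql:source', 'backtrack_demo',
        instance='sql:local',
        parameters={
            'columns': {'datetime': 'counter'},
            'dtypes': {'counter': 'int'},
            'fetch': {'backtrack_minutes': 1440},
        },
    )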

🔧 v2.7.6 Make temporary table names configurable.

07 Jan 22:32
ff6bbbe

v2.7.6

  • Make temporary table names configurable.
    The naming scheme for temporary SQL tables may be set in MRSM{system:connectors:sql:instance:temporary_target}. The new default prefix is '_', and the new default transaction ID length is 4. The name components have been re-ordered to target, transaction ID, then label.

  • Add connector completions to copy pipes.
    When copying pipes, the connector keys prompt will offer auto-complete suggestions.

  • Fix stale job results.
    When polling for job results, the job result is dropped from in-memory cache to avoid overwriting the on-disk result.

  • Format row counts and seconds into human-friendly text.
    Row counts and sync durations are now formatted into human-friendly representations.

  • Add digits to generate_password().
    Random strings from meerschaum.utils.misc.generate_password() may now contain digits.
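    For example (assuming the function takes a desired length):

    from meerschaum.utils.misc import generate_password
    
    # Random alphanumeric string; digits may now appear.
    print(generate_password(12))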

✅ v2.7.5 Enforce TZ-aware columns as UTC, add dynamic queries.

30 Dec 03:44
e350002

v2.7.3 – v2.7.5

  • Allow for dynamic targets in SQL queries.
    Include a pipe definition in double curly braces (à la Jinja) to substitute a pipe's target into a templated query.

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe('demo', 'template', target='foo', instance='sql:local')
    _ = pipe.register()
    
    downstream_pipe = mrsm.Pipe(
        'sql:local', 'template',
        instance='sql:local',
        parameters={
            'sql': "SELECT *\nFROM {{Pipe('demo', 'template', instance='sql:local')}}"
        },
    )
    
    conn = mrsm.get_connector('sql:local')
    print(conn.get_pipe_metadef(downstream_pipe))
    # WITH "definition" AS (
    #     SELECT *
    #     FROM "foo"
    # )
    # SELECT *
    # FROM "definition"
  • Add --skip-enforce-dtypes.
    To override a pipe's enforce parameter, pass --skip-enforce-dtypes to a sync.
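    For example (the connector keys below are hypothetical):

    mrsm sync pipes -c sql:source --skip-enforce-dtypes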

  • Add bulk inserts for MSSQL.
    To disable this behavior, set system:connectors:sql:bulk_insert:mssql to false. Bulk inserts for PostgreSQL-like flavors may now be disabled as well.

  • Fix altering multiple column types for MSSQL.
    When a table has multiple columns to be altered, each column will have its own ALTER TABLE query.

  • Skip enforcing custom dtypes when enforce=False.
    To avoid confusion, special Meerschaum data types (numeric, json, etc.) are not coerced into objects when enforce=False.

  • Fix timezone-aware casts.
    A bug which allowed timezone-aware and timezone-naive casts to be mixed in a single query has been fixed.

  • Explicitly cast timezone-aware datetimes as UTC in SQL syncs.
    By default, timezone-aware columns are now cast as time zone UTC in SQL. This may be skipped by setting enforce to False.

  • Added virtual environment inter-process locks.
    Competing processes now cooperate for virtual environment verification, which protects installed packages.

✨ v2.7.2 Add bytes, enforce, allow autoincrementing datetime index, improve MSSQL indices.

27 Dec 21:07
60e985a

v2.7.0 – v2.7.2

  • Introduce the bytes data type.
    Instance connectors which support binary data (e.g. SQLConnector) may now take advantage of the bytes dtype. Other connectors (e.g. ValkeyConnector) may use meerschaum.utils.dtypes.serialize_bytes() to store binary data as a base64-encoded string.

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'bytes',
        instance='sql:memory',
        dtypes={'blob': 'bytes'},
    )
    pipe.sync([
        {'blob': b'hello, world!'},
    ])
    
    df = pipe.get_data()
    binary_data = df['blob'][0]
    print(binary_data.decode('utf-8'))
    # hello, world!
    
    from meerschaum.utils.dtypes import serialize_bytes, attempt_cast_to_bytes
    df['encoded'] = df['blob'].apply(serialize_bytes)
    df['decoded'] = df['encoded'].apply(attempt_cast_to_bytes)
    print(df)
    #                blob               encoded           decoded
    # 0  b'hello, world!'  aGVsbG8sIHdvcmxkIQ==  b'hello, world!'
  • Allow for pipes to use the same column for datetime, primary, and autoincrement=True.
    Pipes may now use the same column as both the datetime axis and the primary key, with autoincrement set to True.

    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'datetime_primary_key', 'autoincrement',
        instance='sql:local',
        columns={
            'datetime': 'Id',
            'primary': 'Id',
        },
        autoincrement=True,
    )
  • Only join on primary when present.
    When the index primary is set, it is used as the primary joining index. This improves performance when syncing tables with a primary key.

  • Add the parameter enforce.
    The parameter enforce (default True) toggles data type enforcement behavior. When enforce is False, incoming data will not be cast to the desired data types. For static datasets where the incoming data is always expected to have the correct dtypes, it is recommended to set enforce to False and static to True.

    from decimal import Decimal
    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'enforce',
        instance='sql:memory',
        enforce=False,
        static=True,
        autoincrement=True,
        columns={
            'primary': 'Id',
            'datetime': 'Id',
        },
        dtypes={
            'Id': 'int',
            'Amount': 'numeric',
        },
    )
    pipe.sync([
        {'Amount': Decimal('1.11')},
        {'Amount': Decimal('2.22')},
    ]) 
    
    df = pipe.get_data()
    print(df)
  • Create the datetime axis as a clustered index for MSSQL, even when a primary index is specified.
    Specifying a datetime and primary index will create a nonclustered PRIMARY KEY. Specifying the same column as both datetime and primary will create a clustered primary key (tip: this is useful when autoincrement=True).

  • Increase the default chunk interval to 43200 minutes.
    New hypertables will use a default chunk interval of 30 days (43200 minutes).

  • Virtual environment bugfixes.
    Existing virtual environment packages are backed up before re-initializing a virtual environment. This fixes the issue of disappearing dependencies.

  • Store numeric as TEXT for SQLite and DuckDB.
    Due to these flavors' limited NUMERIC precision, numeric columns are now stored as TEXT, then parsed into Decimal objects upon retrieval.
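    A quick round-trip sketch on the in-memory SQLite instance (sql:memory):

    from decimal import Decimal
    import meerschaum as mrsm
    
    pipe = mrsm.Pipe(
        'demo', 'numeric_text',
        instance='sql:memory',
        dtypes={'val': 'numeric'},
    )
    pipe.sync([{'val': Decimal('1.000000000000000001')}])
    
    df = pipe.get_data()
    print(type(df['val'][0]), df['val'][0])
    # <class 'decimal.Decimal'> 1.000000000000000001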

  • Show the Webterm by default when changing instances.
    On the Web Console, changing the instance select will make the Webterm visible.

  • Improve dtype inference.

🎨 v2.6.17 Enhance pipeline editing, fix dropping pipes with custom schema.

11 Dec 01:05
a289921

v2.6.17

  • Add relative deltas to starting in scheduler syntax.
    You may specify a delta in the job scheduler starting syntax:

    mrsm sync pipes -s 'daily starting in 30 seconds'
    
  • Fix drop pipes for pipes on custom schemas.
    Pipes created under a specific schema are now correctly dropped.

  • Enhance editing pipeline jobs.
    Pipeline jobs now provide the job label as the default text to be edited. Pipeline arguments are now placed on a separate line to improve legibility.

  • Disable the progress timer for jobs.
    The sync pipes progress timer will now be hidden when running through a job.

  • Unset MRSM_NOASK for daemons.
    Now that jobs may accept user input, the environment variable MRSM_NOASK is no longer needed for jobs run as daemons (executor local).

  • Replace cx_Oracle with oracledb.
    The Oracle SQL driver is no longer required now that the default Python binding for Oracle is oracledb.

  • Fix Oracle auto-incrementing for good.
    At long last, the mystery of Oracle auto-incrementing identity columns has been laid to rest.

🐛 v2.6.16 Fix inplace syncs without a datetime column.

26 Nov 04:27
2fa0c35

v2.6.15 – v2.6.16

  • Fix inplace syncs without a datetime axis.
    A bug introduced by a performance optimization has been fixed. Inplace pipes without a datetime axis will skip searching for date bounds. Setting upsert to true will bypass this bug for previous releases.

  • Skip invoking get_sync_time() for pipes without a datetime axis.
    Invoking an instance connector's get_sync_time() method will now only occur when datetime is set.

  • Remove guess_datetime() check from SQLConnector.get_sync_time().
    Because sync times are only checked for pipes with a dedicated datetime column, the guess_datetime() check has been removed from the SQLConnector.get_sync_time() method.

  • Skip persisting default target to parameters.
    The default target table name will no longer be persisted to parameters. This helps avoid accidentally setting the wrong target table when copying pipes.

  • Default to "no" for syncing data when copying pipes.
    The action copy pipes will no longer sync data by default, instead requiring an explicit yes to begin syncing.

  • Fix the "Update query" button behavior on the Web Console.
    Existing but null keys are now accounted for when updating a SQL pipe's query.

  • Fix another Oracle autoincrement edge case.
    Resetting the autoincrementing primary key value on Oracle will now behave as expected.

⚡️ v2.6.14 Speed up dtype enforcement, fix Oracle auto-incrementing IDs.

13 Nov 05:47
33a3828

v2.6.10 – v2.6.14

  • Improve datetime timezone-awareness enforcement performance.
    Datetime columns are only parsed for timezone awareness if the desired awareness differs. This drastically speeds up sync times.

  • Switch to tz_localize() when stripping timezone information.
    The previous method of using a lambda to replace individual tzinfo attributes did not scale well. Using tz_localize() can be vectorized and greatly speeds up syncs, especially with large chunks.
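    For illustration, the vectorized accessor in plain pandas (not a Meerschaum API):

    import pandas as pd
    
    series = pd.to_datetime(pd.Series(['2025-01-01 12:30:00+00:00']))
    # Drop tzinfo across the whole column at once instead of per-row lambdas.
    print(series.dt.tz_localize(None))
    # 0   2025-01-01 12:30:00
    # dtype: datetime64[ns]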

  • Add enforce_dtypes to Pipe.filter_existing().
    You may optionally enforce dtype information during filter_existing(). This may be useful when implementing custom syncs for instance connectors. Note this may impact memory and compute performance.

    import meerschaum as mrsm
    import pandas as pd
    
    pipe = mrsm.Pipe('a', 'b', instance='sql:local')
    pipe.sync([{'a': 1}])
    
    df = pd.DataFrame([{'a': '2'}])
    
    ### `enforce_dtypes=True` will suppress the differing dtypes warning.
    unseen, update, delta = pipe.filter_existing(df, enforce_dtypes=True)
    print(delta)
  • Fix query_df() for null parameters.
    This is useful when calling query_df() with only select_columns or omit_columns.
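    A sketch of the columns-only usage (the import path is assumed):

    import pandas as pd
    from meerschaum.utils.dataframe import query_df
    
    df = pd.DataFrame([{'a': 1, 'b': 2}])
    # No filter parameters passed, only column selection.
    print(query_df(df, select_columns=['a']))
    #    a
    # 0  1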

  • Fix autoincrementing IDs for Oracle SQL.

  • Enforce security settings for creating jobs.
    Jobs and remote actions will only be accessible to admin users when running with --secure (system:permissions:actions:non_admin in config).
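    For example (a sketch; the flag is passed when starting the web API):

    mrsm start api --secure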

⚡️ v2.6.7 Fix dtypes for new indices, improve metadata caching.

11 Nov 00:08
e317081

v2.6.6 – v2.6.8

  • Improve metadata performance when syncing.
    Syncs via the SQLConnector now cache schema and index metadata, speeding up transactions.

  • Fix upserts for MySQL / MariaDB.
    Upserts in MySQL and MariaDB now use ON DUPLICATE instead of REPLACE INTO.

  • Fix dtype detection for index columns.
    A bug where new index columns were incorrectly created as INT has been fixed.

  • Delete old keys when dropping Valkey pipes.
    Dropping a pipe from Valkey now clears all old index keys.

  • Fix timezone-aware enforcement bugs.