Releases: bmeares/Meerschaum

v1.5.10

27 Feb 06:20

v1.5.8 – v1.5.10

  • Infer JSON columns from the first non-null value.
    When determining complex columns (dictionaries or lists), the first non-null value of the dataframe is checked rather than the first row only. This accounts for documents which contain variable keys in the same sync, e.g.:

    import meerschaum as mrsm
    pipe = mrsm.Pipe('a', 'b')
    pipe.sync([
        {'a': {'b': 1}},
        {'c': {'d': 2}},
    ])

  • Fix a bug when reconstructing JSON columns.
    When rebuilding JSON values after merging, each value is first checked to confirm it is in fact a string (sometimes NULLs slip in).

  • Increase the timeout when determining Python versions.
    This fixes some difficult-to-reproduce bugs on Windows.
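
The first-non-null inference described in the first item above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Meerschaum's implementation: the helper name `infer_json_columns` is hypothetical, and it scans a list of documents rather than a dataframe.

```python
from typing import Any, Dict, List


def infer_json_columns(docs: List[Dict[str, Any]]) -> List[str]:
    """Return the columns whose first non-null value is a dict or list."""
    first_non_null: Dict[str, Any] = {}
    for doc in docs:
        for col, val in doc.items():
            # Remember only the first non-null value seen for each column.
            if val is not None and col not in first_non_null:
                first_non_null[col] = val
    return [
        col
        for col, val in first_non_null.items()
        if isinstance(val, (dict, list))
    ]


docs = [
    {'a': {'b': 1}},
    {'c': {'d': 2}},
    {'a': None, 'e': 3},
]
json_cols = infer_json_columns(docs)
print(json_cols)
# ['a', 'c']
```

Checking only the first row would have missed column `c`, which does not appear until the second document.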

v1.5.7

13 Feb 23:17
ca90650

v1.5.7

  • Replace ast.literal_eval() with json.loads() when filtering JSON columns.
    This patch replaces the use of str and ast.literal_eval() with json.dumps() and json.loads() to preserve accuracy.

  • Fix a subtle bug with subprocesses.
    The function run_python_package() now better handles environment passing and raises a more verbose warning when something goes wrong.

  • Allow columns with 'create' in the name.
    A security measure previously disallowed certain keywords when sanitizing input. Now columns are allowed to contain certain keywords.
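
As a general illustration of why the JSON pair is the safer round trip (this uses only the standard library and is not the Meerschaum code itself), the snippet below shows that `str()` output is not valid JSON and that `ast.literal_eval()` cannot read JSON literals such as `null` or `true`. The sample value is made up for the demonstration.

```python
import ast
import json

value = {'a': None, 'b': True, 'c': [1.5, 'x']}

# json.dumps() and json.loads() round-trip the value exactly.
encoded = json.dumps(value)
assert json.loads(encoded) == value

# str() produces a Python repr, which is not valid JSON.
python_repr = str(value)
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# ast.literal_eval() cannot read JSON literals such as null or true.
try:
    ast.literal_eval(encoded)
    literal_eval_reads_json = True
except (ValueError, SyntaxError):
    literal_eval_reads_json = False

print(repr_is_json, literal_eval_reads_json)
# False False
```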

v1.5.6

16 Jan 21:22
a3202fe
Compare
Choose a tag to compare

v1.5.3 – v1.5.6

  • Pipes now support syncing dictionaries and lists.
    Complex columns (dicts or lists) will now be preserved:

    import meerschaum as mrsm
    pipe = mrsm.Pipe('a', 'b')
    pipe.sync([{'a': {'b': 1}}])
    df = pipe.get_data()
    print(df['a'][0])
    # {'b': 1}

    You can also force strings to be parsed by setting the data type to json:

    import meerschaum as mrsm
    pipe = mrsm.Pipe(
        'foo', 'bar',
        columns = {'datetime': 'id'},
        dtypes = {'data': 'json', 'id': 'Int64'},
    )
    docs = [{'id': 1, 'data': '{"foo": "bar"}'}]
    pipe.sync(docs)
    df = pipe.get_data()
    print(df['data'][0])
    # {'foo': 'bar'}

    For PostgreSQL-like databases (e.g. TimescaleDB), this is stored as JSONB under the hood. For all others, it's stored as the equivalent of TEXT.

  • Fixed determining the version when installing plugins.
    Like the required list, the __version__ string must be explicitly set in order for the correct version to be determined.

  • Automatically cast postgres to postgresql.
    When a SQLConnector is built with a flavor of postgres, it will be automatically set to postgresql.
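
To make the `json` data type from the first item above concrete, here is a rough sketch of parsing string values for columns declared as `json`. The function name `parse_json_columns` is hypothetical and the logic is an assumption about the general idea, not the library's internals. Non-string values (including nulls) are left untouched.

```python
import json
from typing import Any, Dict, List


def parse_json_columns(
    docs: List[Dict[str, Any]],
    dtypes: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Parse string values into objects for columns typed as 'json'."""
    json_cols = [col for col, typ in dtypes.items() if typ == 'json']
    parsed = []
    for doc in docs:
        new_doc = dict(doc)
        for col in json_cols:
            val = new_doc.get(col)
            # Only strings are decoded; dicts, lists, and None pass through.
            if isinstance(val, str):
                new_doc[col] = json.loads(val)
        parsed.append(new_doc)
    return parsed


docs = [{'id': 1, 'data': '{"foo": "bar"}'}, {'id': 2, 'data': None}]
dtypes = {'data': 'json', 'id': 'Int64'}
print(parse_json_columns(docs, dtypes))
# [{'id': 1, 'data': {'foo': 'bar'}}, {'id': 2, 'data': None}]
```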

v1.5.2

09 Jan 09:53
c773463

v1.5.0 – v1.5.2

  • Pipes may now use integers for the datetime column.
    If you use an auto-incrementing integer as your primary key, you may now use that column as your pipe's datetime column. Just specify the dtype as Int64:

    import meerschaum as mrsm
    pipe = mrsm.Pipe(
        'foo', 'bar',
        instance = 'sql:memory',
        columns = {
            'datetime': 'id',
        },
        dtypes = {
            'id': 'Int64',
        },
    )
    pipe.sync([{'id': 1, 'foo': 'bar'}])
    pipe.sync([{'id': 2, 'foo': 'baz'}])

    This applies the same incremental range filtering logic as is normally done on the datetime axis.

  • Allow for multiple plugins directories.
    You may now set multiple directories for MRSM_PLUGINS_DIR. All of the plugins contained in each directory will be symlinked together into a single plugins namespace. To do this, just set MRSM_PLUGINS_DIR to a JSON-encoded list:

    export MRSM_PLUGINS_DIR='["./plugins_1", "./plugins_2"]'

  • Better Windows support.
    At long last, the color issues plaguing Windows users have finally been resolved. Additionally, support for background jobs has been fixed on Windows, though the daemonization library I use is pretty hacky and doesn't make for the smoothest experience. But at least it works now!

  • Fixed unsafe TAR extraction.
    A PR about unsafe use of tar.extractall() brought this issue to light.

  • Fixed the blank logs bug in show logs.
    Backtracking a couple lines before following the rest of the logs has been fixed.

  • Requirements may include brackets.
    Python packages listed in a plugin's requirements list may now include brackets (e.g. meerschaum[api]).

  • Enforce 1000 row limit in SQLConnector.to_sql() for SQLite.
    When inserting rows, the chunksize of 1000 is enforced for SQLite (was previously enforced only for reading).

  • Patch parameters from --params in edit pipes and register pipes.
    When editing or registering pipes, the value of --params will now be patched into the pipe's parameters. This should be very helpful when scripting.

  • Fixed edit users.
    This really should have been fixed a long time ago. The action edit users was broken due to a stray import left over from a major refactor.

  • Fixed a regex bug when cleaning up packages.

  • Removed show gui and show modules.
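
For the multiple plugins directories item above, a value like the one in the `export` example could be read roughly as follows. This is only a sketch: the helper name `parse_plugins_dirs` is hypothetical, and treating any non-list value as a single path is an assumption about backwards compatibility.

```python
import json
from pathlib import Path
from typing import List


def parse_plugins_dirs(raw: str) -> List[Path]:
    """Parse a plugins directory setting into a list of paths."""
    raw = raw.strip()
    if raw.startswith('['):
        # A JSON-encoded list of directories.
        paths = json.loads(raw)
    else:
        # Assumption: anything else is one plain directory path.
        paths = [raw]
    return [Path(p) for p in paths]


dirs = parse_plugins_dirs('["./plugins_1", "./plugins_2"]')
print([p.name for p in dirs])
# ['plugins_1', 'plugins_2']
```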

v1.4.14

05 Dec 04:28
c3094a7

v1.4.14

  • Added flag temporary to Pipe (and --temporary).
    Pipes built with temporary=True will not create instance tables (pipes, users, and plugins) or be able to modify registration. This is particularly useful when creating pipes from existing tables when automatic registration is not desired.

    import meerschaum as mrsm
    import pandas as pd
    conn = mrsm.get_connector('sql:temp', uri='postgresql://user:pass@localhost:5432/db')
    
    ### Simulating an existing table.
    table_name = 'my_table'
    conn.to_sql(
        pd.DataFrame([{'id_column': 1, 'value': 1.0}]),
        name = table_name,
    )
    
    ### Create a temporary pipe with the existing table as its target.
    pipe = mrsm.Pipe(
        'foo', 'bar',
        target = table_name,
        temporary = True,
        instance = conn,
        columns = {
            'id': 'id_column',
        },
    )
    
    docs = [
        {
            "id_column": 1,
            "value": 123.456,
            "new_column": "hello, world!",
        },
    ]
    
    ### Existing table `my_table` is synced without creating other tables
    ### or affecting pipes' registration.
    pipe.sync(docs)

  • Fixed a potential security issue with public instance tables.
    The API now refuses to sync or serve data if the target is a protected instance table (pipes, users, or plugins).

  • Added not-null check to pipe.get_sync_time().
    The datetime column should never contain null values, but just in case, pipe.get_sync_time() now passes a not-null check to params for the datetime column.

  • Removed prompt for value from pipe.bootstrap().
    The prompt for an optional value column has been removed from the bootstrapping wizard because pipe.columns is now largely used as a collection of indices rather than the original purpose of meta-columns.

  • Pass --debug and other flags in copy pipes.
    Command line flags are now passed to the new pipe when copying an existing pipe.
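
The protected-table guard described above might look something like the sketch below. The names `PROTECTED_TABLES`, `is_protected_target`, and `serve_data` are hypothetical, and the case-insensitive comparison is an assumption; only the three table names come from the release notes.

```python
# The instance tables named in the release notes.
PROTECTED_TABLES = frozenset({'pipes', 'users', 'plugins'})


def is_protected_target(target: str) -> bool:
    """Return True if the target names a protected instance table."""
    # Assumption: compare case-insensitively and ignore stray whitespace.
    return target.strip().lower() in PROTECTED_TABLES


def serve_data(target: str) -> str:
    """Refuse to serve protected tables; otherwise pretend to serve rows."""
    if is_protected_target(target):
        raise PermissionError(f"Refusing to serve protected table '{target}'.")
    return f"serving rows from '{target}'"


print(serve_data('my_table'))
# serving rows from 'my_table'
```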

v1.4.13

22 Nov 01:56
edc4d23

v1.4.12 – v1.4.13

  • Fixed an issue when syncing empty DataFrames (#95).
    When syncing an empty list of documents, Pipe.filter_existing() would trigger pulling the entire table into memory. This patch first checks whether the dataframe is empty.

  • Allow the datetime column to be omitted in the bootstrap wizard.
    Now that the datetime index is optional, the bootstrapping wizard allows users to skip this index.

  • Fixed a small issue when syncing to MySQL.
    Due to the addition of MySQL 5.7 support in v1.4.11, a slight edge case arose which broke SQL definitions. This patch fixes MySQL behavior when a WHERE clause is present in the definition.
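
The empty-dataframe check from the first item above amounts to an early return before any existing rows are fetched. The following is a simplified sketch with lists of documents standing in for dataframes; the function signature and the `fetch_existing` callback are hypothetical and do not mirror `Pipe.filter_existing()`.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def filter_existing(
    new_docs: List[Doc],
    fetch_existing: Callable[[], List[Doc]],
) -> List[Doc]:
    """Return only the documents that are not already stored."""
    # Nothing to compare: return early so the table is never fetched.
    if not new_docs:
        return []
    existing = fetch_existing()
    return [doc for doc in new_docs if doc not in existing]


calls = []


def fetch_existing() -> List[Doc]:
    """Stand-in for an expensive read; records each call."""
    calls.append(1)
    return [{'id': 1}]


print(filter_existing([], fetch_existing), len(calls))
# [] 0
print(filter_existing([{'id': 1}, {'id': 2}], fetch_existing), len(calls))
# [{'id': 2}] 1
```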

v1.4.11

19 Nov 04:18
a9200a7

v1.4.11

  • Add support for older versions of MySQL.
    The WITH keyword for CTE blocks was not introduced until MySQL 8.0. This patch uses the older syntax for older versions of MySQL and MariaDB. MySQL 5.7 was added to the test suite.

  • Allow for any iterable in items_str().
    If an iterable other than a list is passed to items_str(), it will convert to a list before building the string:

    from meerschaum.utils.misc import items_str
    print(items_str({'apples': 1, 'bananas': 2}, quotes=False))
    # apples and bananas

  • Fixed an edge case with datetime set to None.
    This patch will ignore the datetime index even if it was set explicitly to None.

  • Added Pipe.children.
    To complement Pipe.parents, setting the parameters key children to a list of pipes' keys will be treated the same as Pipe.parents:

    import meerschaum as mrsm
    pipe = mrsm.Pipe(
        'a', 'b',
        parameters = {
            'children': [
                {
                    'connector': 'a',
                    'metric': 'b',
                    'location': 'c',
                },
            ]
        }
    )
    print(pipe.children)
    # [Pipe('a', 'b', 'c')]

  • Added support for type:label syntax in mrsm.get_connector().
    The factory function mrsm.get_connector() expects the type and label as two arguments, but this patch allows for passing a single string with both arguments:

    import meerschaum as mrsm
    print(mrsm.get_connector('sql:local'))
    # sql:local

  • Fixed more edge case bugs.
    For example, converting to Int64 sometimes breaks with older versions of pandas. This patch adds a workaround.
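
The type:label syntax above boils down to splitting a single string on its first colon. Here is a minimal sketch of that idea; the helper name `split_connector_keys` is hypothetical, and the default label of `main` when no colon is present is an assumption.

```python
from typing import Tuple


def split_connector_keys(keys: str, default_label: str = 'main') -> Tuple[str, str]:
    """Split 'type:label' into its two parts."""
    if ':' in keys:
        # Split once so that labels may themselves contain colons.
        type_, label = keys.split(':', 1)
    else:
        # Assumption: fall back to a default label when only a type is given.
        type_, label = keys, default_label
    return type_, label


print(split_connector_keys('sql:local'))
# ('sql', 'local')
```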

v1.4.10

18 Nov 03:45
656279f

v1.4.10

  • Fixed an issue with syncing background jobs.
    The --name flag of background jobs was colliding with the name keyword argument of SQLConnector.to_sql().

  • Fixed a datetime bounding issue when datetime index is omitted.
    If the minimum datetime value of the incoming dataframe cannot be determined, do not bound the get_data() request.

  • Keep existing parameters when registering plugin pipes.
    When a pipe is registered with a plugin as its connector, the return value of the register() function will be patched with the existing in-memory parameters.

  • Fixed a data type syncing issue.
    In cases where fetched data types do not match the data types in the pipe's table (e.g. automatic datetime columns), a bug has been patched to ensure the correct data types are enforced.

  • Added Venv to the root namespace.
    Now you can access virtual environments directly from mrsm:

    import meerschaum as mrsm
    
    with mrsm.Venv('noaa'):
        import pandas as pd
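
Patching the return value of register() with the existing in-memory parameters, as described in the third item above, can be pictured as a recursive dictionary merge. The sketch below assumes that nested dictionaries are merged key by key and that the in-memory values win on conflict; the function name `apply_patch` and the sample parameters are hypothetical.

```python
import copy
from typing import Any, Dict


def apply_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with patch merged in recursively."""
    result = copy.deepcopy(base)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(result.get(key), dict):
            # Merge nested dictionaries instead of overwriting them.
            result[key] = apply_patch(result[key], val)
        else:
            result[key] = copy.deepcopy(val)
    return result


# Hypothetical values: what a plugin's register() returned...
registered = {'columns': {'datetime': 'timestamp'}, 'fetch': {'backtrack_minutes': 1440}}
# ...and what the pipe already held in memory.
in_memory = {'columns': {'id': 'station'}, 'tags': ['weather']}

print(apply_patch(registered, in_memory))
# {'columns': {'datetime': 'timestamp', 'id': 'station'}, 'fetch': {'backtrack_minutes': 1440}, 'tags': ['weather']}
```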

v1.4.9

13 Nov 04:59
e3c3cf3

v1.4.9

  • Fixed in-place syncs for aggregate queries.
    In-place SQL syncs which use aggregation functions are now handled correctly. This version addresses differences in column types between backtrack and new data. For example, the following query will now be correctly synced:

    WITH days_src AS (
      SELECT *, DATE_TRUNC('day', "datetime") AS days
      FROM plugin_stress_test
    )
    SELECT days, AVG(val) AS avg_value
    FROM days_src
    GROUP BY days

  • Activate virtual environments for custom instance connectors.
    All pipe methods now activate virtual environments for custom instance connectors.

  • Improved database connection performance.
    Cold connections to a SQL database have been sped up by replacing sqlalchemy_utils with handwritten logic (JSON for PostgreSQL-like and SQLite).

  • Fixed an issue with virtual environment verification in a portable environment.
    The portable build has been updated to Python 3.9.15, and this patch includes a check to determine the known site-package path for a virtual environment of None instead of relying on the default user site-packages directory.

  • Fixed some environment warnings when starting the API.

v1.4.8

04 Nov 04:21
e783609

v1.4.5 – v1.4.8

  • Bugfixes and stability improvements.
    These versions included several bugfixes, such as patching --skip-check-existing for in-place syncs and fixing the behavior of --params (build_where()).