26 Sep 04:31

bmeares

922bca3

v2.0.1

Fix syncing bools within in-place SQL pipes.
SQL pipes may now sync bools in-place. For database flavors which lack native BOOLEAN support (e.g. sqlite, oracle, mysql), the boolean columns must be stated in pipe.dtypes.
Fix an issue with multiple users managing jobs.
Extra validation was added to the web UI to allow for multiple users to interact with jobs.
Fix a minor formatting bug with highlight_pipes().
Improved validation logic was added to prevent incorrectly prepending the Pipe( prefix.
Hold back pydantic to <2.0.0
Pydantic 2 is supported in all features except --schedule. Until rocketry supports Pydantic 2, it will be held back.
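
To illustrate why the bool columns must be declared for flavors without native BOOLEAN support, here is a minimal standard-library sketch using SQLite. It is not Meerschaum's implementation: `declared_dtypes` and `enforce_bools()` are hypothetical stand-ins for `pipe.dtypes` and the dtype enforcement step.

```python
import sqlite3

# Hypothetical stand-in for pipe.dtypes: columns declared as booleans.
declared_dtypes = {'id': 'int', 'is_blue': 'bool'}

conn = sqlite3.connect(':memory:')
# SQLite stores booleans as integers, so the column is declared INTEGER.
conn.execute('CREATE TABLE bools (id INTEGER, is_blue INTEGER)')
conn.execute('INSERT INTO bools VALUES (?, ?)', (1, True))

cursor = conn.execute('SELECT id, is_blue FROM bools')
columns = [desc[0] for desc in cursor.description]
raw_rows = [dict(zip(columns, values)) for values in cursor.fetchall()]
conn.close()


def enforce_bools(rows, dtypes):
    """Cast columns declared as 'bool' back to Python booleans (hypothetical helper)."""
    return [
        {
            col: (bool(val) if dtypes.get(col) == 'bool' and val is not None else val)
            for col, val in row.items()
        }
        for row in rows
    ]


rows = enforce_bools(raw_rows, declared_dtypes)
print(raw_rows)
# [{'id': 1, 'is_blue': 1}]
print(rows)
# [{'id': 1, 'is_blue': True}]
```

Without the declaration, the value read back from the database is indistinguishable from an ordinary integer column.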

24 Sep 22:59

bmeares

v2.0.0

1d21364

v2.0.0

Breaking Changes

Removed redundant Pipe.sync_time property.
Use pipe.get_sync_time() instead.
Removed SQLConnector.get_pipe_backtrack_minutes().
Use pipe.get_backtrack_interval() instead.
Replaced pipe.parameters['chunk_time_interval'] with pipe.parameters['verify']['chunk_minutes']
For better security and cohesiveness, the TimescaleDB chunk_time_interval value is now derived from the standard chunk_minutes value. This also means pipes with integer date axes will be created with a new default chunk interval of 1440 (was previously 100,000).
Moved choose_subaction() into meerschaum.actions.
This function is for internal use and as such should not affect any users.

Features

Added verify pipes and --verify.
The command mrsm verify pipes or mrsm sync pipes --verify will resync pipes' chunks with different rowcounts to catch any backfilled data.

import meerschaum as mrsm
foo = mrsm.Pipe(
    'foo', 'src',
    target = 'foo',
    columns = {'datetime': 'dt'},
    instance = 'sql:local'
)
docs = [
    {'dt': '2023-01-01'},
    {'dt': '2023-01-02'},
]
foo.sync(docs)

pipe = mrsm.Pipe(
    'sql:local', 'verify', 
    columns = {'datetime': 'dt'},
    parameters = {
        'query': f'SELECT * FROM "{foo.target}"'
    },
    instance = 'sql:local',
)
pipe.sync(docs) 

backfilled_docs = [
    {'dt': '2022-12-30'},
    {'dt': '2022-12-31'},
]
foo.sync(backfilled_docs)
mrsm.pprint(pipe.verify())
assert foo.get_rowcount() == pipe.get_rowcount()
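
The idea behind a verification sync (resyncing only the chunks whose rowcounts differ) can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `count_rows()` and `chunks_to_resync()` are hypothetical helpers, and chunks are assumed to be half-open intervals.

```python
from datetime import datetime, timedelta


def count_rows(timestamps, begin, end):
    """Count timestamps in the half-open interval [begin, end)."""
    return sum(1 for ts in timestamps if begin <= ts < end)


def chunks_to_resync(source, target, chunk_bounds):
    """Return the bounds of chunks whose rowcounts differ between source and target."""
    return [
        (begin, end)
        for begin, end in chunk_bounds
        if count_rows(source, begin, end) != count_rows(target, begin, end)
    ]


# The source received two backfilled rows which the target has not seen yet.
source = [
    datetime(2022, 12, 30), datetime(2022, 12, 31),
    datetime(2023, 1, 1), datetime(2023, 1, 2),
]
target = [datetime(2023, 1, 1), datetime(2023, 1, 2)]

day = timedelta(days=1)
start = datetime(2022, 12, 30)
bounds = [(start + i * day, start + (i + 1) * day) for i in range(4)]

stale = chunks_to_resync(source, target, bounds)
for begin, end in stale:
    print(begin, '-', end)
# 2022-12-30 00:00:00 - 2022-12-31 00:00:00
# 2022-12-31 00:00:00 - 2023-01-01 00:00:00
```

Only the two backfilled days are flagged, so the chunks that already match are left untouched.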

Added deduplicate pipes and --deduplicate.
Running mrsm deduplicate pipes or mrsm sync pipes --deduplicate will iterate over pipes' entire intervals, chunking at the configured chunk interval (see pipe.get_chunk_interval() below) and clearing + resyncing chunks with duplicate rows.

If your instance connector implements deduplicate_pipe() (e.g. SQLConnector), then this method will override the default pipe.deduplicate().
```
import meerschaum as mrsm
pipe = mrsm.Pipe(
    'demo', 'deduplicate',
    columns = {'datetime': 'dt'},
    instance = 'sql:local',
)
docs = [
    {'dt': '2023-01-01'},
    {'dt': '2023-01-01'},
]
pipe.sync(docs)
print(pipe.get_rowcount())
# 2
pipe.deduplicate()
print(pipe.get_rowcount())
# 1
```
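
As a rough picture of what deduplicating a single chunk involves, the sketch below drops repeated documents while keeping the first occurrence. `deduplicate_docs()` is a hypothetical helper written for illustration; the actual `pipe.deduplicate()` and `deduplicate_pipe()` implementations may work quite differently (e.g. directly in SQL).

```python
import json


def deduplicate_docs(docs):
    """Drop repeated documents, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for doc in docs:
        # Serialize with sorted keys so that key order does not affect equality.
        key = json.dumps(doc, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


docs = [
    {'dt': '2023-01-01'},
    {'dt': '2023-01-01'},
]
print(len(docs))
# 2
print(len(deduplicate_docs(docs)))
# 1
```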

Added pyarrow support.
The dtypes enforcement system was overhauled to add support for pyarrow data types.

import meerschaum as mrsm
import pandas as pd

df = pd.DataFrame(
    [{'a': 1, 'b': 2.3}]
).convert_dtypes(dtype_backend='pyarrow')

pipe = mrsm.Pipe(
    'demo', 'pyarrow',
    instance = 'sql:local',
    columns = {'a': 'a'},
)
pipe.sync(df)

Added bool support.
Pipes may now sync DataFrames with booleans (even on Oracle and MySQL):

import meerschaum as mrsm
pipe = mrsm.Pipe(
    'demo', 'bools',
    instance = 'sql:local',
    columns = {'id': 'id'},
)
pipe.sync([{'id': 1, 'is_blue': True}])
assert 'bool' in pipe.dtypes['is_blue']

pipe.sync([{'id': 1, 'is_blue': False}])
assert pipe.get_data()['is_blue'][0] == False

Added preliminary dask support.
For example, you may now return Dask DataFrames from your plugins and pass them into pipe.sync(), and pipe.get_data() now accepts the flag as_dask.

import meerschaum as mrsm
pipe = mrsm.Pipe(
    'dask', 'demo',
    columns = {'datetime': 'dt'},
    instance = 'sql:local',
)
pipe.sync([
    {'dt': '2023-01-01', 'val': 1},
    {'dt': '2023-01-02', 'val': 2},
    {'dt': '2023-01-03', 'val': 3},
])
ddf = pipe.get_data(as_dask=True)
print(ddf)
# Dask DataFrame Structure:
#                            dt             val
# npartitions=4
#                datetime64[ns]  int64[pyarrow]
#                           ...             ...
#                           ...             ...
#                           ...             ...
#                           ...             ...
# Dask Name: from-delayed, 5 graph layers

print(ddf.compute())
#           dt  val
# 0 2023-01-01    1
# 0 2023-01-02    2
# 0 2023-01-03    3

pipe2 = mrsm.Pipe(
    'dask', 'insert',
    columns = pipe.columns,
    instance = 'sql:local',
)
pipe2.sync(ddf)
assert pipe.get_data().to_dict() == pipe2.get_data().to_dict()

pipe.sync([{'dt': '2023-01-01', 'val': 10}])
pipe2.sync(pipe.get_data(as_dask=True))
assert pipe.get_data().to_dict() == pipe2.get_data().to_dict()

Added chunk_minutes to pipe.parameters['verify'].
Like pipe.parameters['fetch']['backtrack_minutes'], you may now specify the default chunk interval to use for verification syncs and iterating over the datetime axis.

import meerschaum as mrsm
pipe = mrsm.Pipe(
    'a', 'b',
    instance = 'sql:local',
    columns = {'datetime': 'dt'},
    parameters = {
        'verify': {
            'chunk_minutes': 2880,
        }
    },
)
pipe.sync([
    {'dt': '2023-01-01'},
    {'dt': '2023-01-02'},
    {'dt': '2023-01-03'},
    {'dt': '2023-01-04'},
    {'dt': '2023-01-05'},
])
chunk_bounds = pipe.get_chunk_bounds(bounded=True)
for chunk_begin, chunk_end in chunk_bounds:
    print(chunk_begin, '-', chunk_end)

# 2023-01-01 00:00:00 - 2023-01-03 00:00:00
# 2023-01-03 00:00:00 - 2023-01-05 00:00:00

Added --chunk-minutes, --chunk-hours, and --chunk-days.
You may override a pipe's chunk interval during a verification sync with --chunk-minutes (or --chunk-hours or --chunk-days).
```
mrsm verify pipes --chunk-days 3
```
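
One way to think about the three flags is that they all resolve to a single interval in minutes. The helper below is a hypothetical sketch of that normalization (the function name, its keyword arguments, and the choice to add the values together are assumptions, not Meerschaum's argument handling).

```python
from datetime import timedelta


def resolve_chunk_minutes(chunk_minutes=None, chunk_hours=None, chunk_days=None):
    """Collapse the three optional values into minutes, or None if none were given."""
    if chunk_minutes is None and chunk_hours is None and chunk_days is None:
        return None
    interval = timedelta(
        minutes=chunk_minutes or 0,
        hours=chunk_hours or 0,
        days=chunk_days or 0,
    )
    return int(interval.total_seconds() // 60)


print(resolve_chunk_minutes(chunk_days=3))
# 4320
print(resolve_chunk_minutes(chunk_hours=12))
# 720
```

A return value of `None` would signal that the pipe's own chunk interval should be used.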

Added pipe.get_chunk_interval() and pipe.get_backtrack_interval().
Return the timedelta (or int for integer datetimes) from verify:chunk_minutes and fetch:backtrack_minutes, respectively.

import meerschaum as mrsm
dt_pipe = mrsm.Pipe(
    'demo', 'intervals', 'datetime',
    instance = 'sql:local',
    columns = {'datetime': 'dt'},
)
print(dt_pipe.get_chunk_interval())
# 1 day, 0:00:00

int_pipe = mrsm.Pipe(
    'demo', 'intervals', 'int',
    instance = 'sql:local',
    columns = {'datetime': 'dt'},
    dtypes = {'dt': 'int'},
)
print(int_pipe.get_chunk_interval())
# 1440
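
The difference between the two return types can be mimicked with a small standalone function. This is a sketch of the behavior described above, assuming the default of 1440 minutes mentioned in these notes; `chunk_interval()` and its `integer_axis` flag are hypothetical names.

```python
from datetime import timedelta


def chunk_interval(chunk_minutes=1440, integer_axis=False):
    """Return a timedelta for datetime axes or a plain int for integer axes."""
    return chunk_minutes if integer_axis else timedelta(minutes=chunk_minutes)


print(chunk_interval())
# 1 day, 0:00:00
print(chunk_interval(integer_axis=True))
# 1440
```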

Added pipe.get_chunk_bounds().
Return a list of begin and end values to use when iterating over a pipe's datetime axis.

from datetime import datetime
import meerschaum as mrsm
pipe = mrsm.Pipe(
    'demo', 'chunk_bounds',
    instance = 'sql:local',
    columns = {'datetime': 'dt'},
    parameters = {
        'verify': {
            'chunk_minutes': 1440,
        }
    },
)
pipe.sync([
    {'dt': '2023-01-01'},
    {'dt': '2023-01-02'},
    {'dt': '2023-01-03'},
    {'dt': '2023-01-04'},
])

open_bounds = pipe.get_chunk_bounds()
for i, (begin, end) in enumerate(open_bounds):
    print(f"Chunk {i}: ({begin}, {end})")

# Chunk 0: (None, 2023-01-01 00:00:00)
# Chunk 1: (2023-01-01 00:00:00, 2023-01-02 00:00:00)
# Chunk 2: (2023-01-02 00:00:00, 2023-01-03 00:00:00)
# Chunk 3: (2023-01-03 00:00:00, 2023-01-04 00:00:00)
# Chunk 4: (2023-01-04 00:00:00, None)

closed_bounds = pipe.get_chunk_bounds(bounded=True)
for i, (begin, end) in enumerate(closed_bounds):
    print(f"Chunk {i}: ({begin}, {end})")

# Chunk 0: (2023-01-01 00:00:00, 2023-01-02 00:00:00)
# Chunk 1: (2023-01-02 00:00:00, 2023-01-03 00:00:00)
# Chunk 2: (2023-01-03 00:00:00, 2023-01-04 00:00:00)

sub_bounds = pipe.get_chunk_bounds(
    begin = datetime(2023, 1, 1),
    end = datetime(2023, 1, 3),
)
for i, (begin, end) in enumerate(sub_bounds):
    print(f"Chunk {i}: ({begin}, {end})")

# Chunk 0: (2023-01-01 00:00:00, 2023-01-02 00:00:00)
# Chunk 1: (2023-01-02 00:00:00, 2023-01-03 00:00:00)
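
The bounds printed above can be reproduced with a short standalone function. The sketch assumes chunks start at the minimum value and advance by the chunk interval until the maximum is reached; `chunk_bounds()` here is a hypothetical helper, and edge cases (such as intervals that do not divide the range evenly) may be handled differently by the library.

```python
from datetime import datetime, timedelta


def chunk_bounds(min_dt, max_dt, interval, bounded=False):
    """Build (begin, end) pairs covering [min_dt, max_dt) in steps of interval."""
    bounds = []
    begin = min_dt
    while begin < max_dt:
        bounds.append((begin, begin + interval))
        begin += interval
    if not bounded:
        # Add open-ended chunks before the first and after the last boundary.
        bounds = [(None, min_dt)] + bounds + [(begin, None)]
    return bounds


day = timedelta(minutes=1440)
closed = chunk_bounds(datetime(2023, 1, 1), datetime(2023, 1, 4), day, bounded=True)
for i, (begin, end) in enumerate(closed):
    print(f"Chunk {i}: ({begin}, {end})")
# Chunk 0: (2023-01-01 00:00:00, 2023-01-02 00:00:00)
# Chunk 1: (2023-01-02 00:00:00, 2023-01-03 00:00:00)
# Chunk 2: (2023-01-03 00:00:00, 2023-01-04 00:00:00)
```

Passing `bounded=False` prepends `(None, min_dt)` and appends a final open-ended chunk, matching the shape of the unbounded output shown earlier.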

Added --bounded to verification syncs.
By default, verify pipes is unbounded, meaning it will sync values beyond the existing minimum and maximum datetime values. Running a verification sync with --bounded will bound the search to the existing datetime axis.
```
mrsm sync pipes --verify --bounded
```
Added pipe.get_num_workers().
Return the number of concurrent threads to be used with this pipe (with respect to its instance connector's thread safety).

Added select_columns and omit_columns to pipe.get_data().
In situations where not all columns are required, you can now specify which columns to include (select_columns), which columns to filter out (omit_columns), or both. You may pass a list of columns or a single column, and the value '*' for select_columns will be treated as None (i.e. SELECT *).

import meerschaum as mrsm
pipe = mrsm.Pipe('a', 'b', 'c', instance='sql:local')
pipe.sync([{'a': 1, 'b': 2, 'c': 3}])

pipe.get_data(['a', 'b'])
#    a  b
# 0  1  2

pipe.get_data('*', 'b')
#    a  c
# 0  1  3

pipe.get_data(None, ['a', 'c'])
#    b
# 0  2

pipe.get_data(omit_columns=['b', 'c'])
#    a
# 0  1

pipe.get_data(select_columns=['c', 'a'])
#    c  a
# 0  3  1
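
The selection rules above (a single column or a list, '*' meaning everything, omissions applied after selection) can be expressed over plain dictionaries. `filter_columns()` is a hypothetical helper for illustration only; the real `pipe.get_data()` may instead push the selection into the query.

```python
def filter_columns(docs, select_columns=None, omit_columns=None):
    """Apply select/omit semantics to a list of dictionaries."""
    if isinstance(select_columns, str):
        # '*' selects everything; any other string is a single column.
        select_columns = None if select_columns == '*' else [select_columns]
    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]
    omit = set(omit_columns or [])

    filtered = []
    for doc in docs:
        keys = select_columns if select_columns is not None else list(doc)
        filtered.append({key: doc[key] for key in keys if key in doc and key not in omit})
    return filtered


docs = [{'a': 1, 'b': 2, 'c': 3}]
print(filter_columns(docs, ['a', 'b']))
# [{'a': 1, 'b': 2}]
print(filter_columns(docs, '*', 'b'))
# [{'a': 1, 'c': 3}]
print(filter_columns(docs, select_columns=['c', 'a']))
# [{'c': 3, 'a': 1}]
```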

Replace daemoniker with python-daemon.
python-daemon is a well-maintained and well-behaved daemon process library. However, this migration removes Windows support for background jobs (which was never really fully supported already, so no harm there).
Added pause jobs.
In addition to start jobs and stop jobs, the command pause jobs will s...

29 Aug 13:20

bmeares

v1.7.4

0f97fbc

v1.7.4

v1.7.3 – v1.7.4

Fix an issue with the local stack healthcheck.
Due to some edge cases, the local stack docker-compose.yaml file would not be correctly formatted until edit config had been executed. This patch ensures the files are synced with each invocation of stack.
Fix an issue when running the local stack with non-default ports.
Initializing a local stack with a different database port (e.g. 5433) now routes correctly within the Docker Compose network (the connection is now patched to the internal port 5432).
Fix upgrade mrsm behavior.
Recent changes to stack broke the automatic stack pull within mrsm upgrade mrsm.

12 Aug 14:21

bmeares

v1.7.2

f8b810f

v1.7.2

Fix role "root" does not exist from stack logs.
Although the healthcheck was working as expected, the log output was filled with Error FATAL: role "root" does not exist. These errors have been fixed.
Fix MRSM_CONFIG behavior when running start api --production.
Starting the Web API through gunicorn (i.e. --production) now respects MRSM_CONFIG. This is useful for running stack up with non-default credentials.
Added --insecure as an alias for --no-auth.
To complement the newly added --secure flag, starting the Web API with --insecure will bypass authentication.
Bump default TimescaleDB version to PG15.
The default TimescaleDB version for the Meerschaum stack is now latest-pg15-oss.
Pass sysargs to docker compose via stack
This patch allows for jumping into the api container:
```
mrsm stack exec api bash
```
Added the API endpoint /healthcheck.
This is used to determine reachability and the health of the local stack.

11 Aug 04:29

bmeares

v1.7.1

e95ae71

v1.7.1

v1.7.0 – v1.7.1

Remove get_backtrack_data() for instance connectors.
If provided, this method will still override the new generic implementation.
Add --keyfile and --certfile support.
When starting the Web API, you may now run via HTTPS with --keyfile and --certfile. Older releases required the keys to be set in MRSM_CONFIG. This also brings SSL support for --production (Gunicorn).
Add the Webterm to the Web Console.
At long last, the Webterm is embedded within the Web Console and is accessible from the Web API at the endpoint /webterm. You must provide your active, authorized session ID to access the Webterm.
Add --secure to start api.
Starting the Web API with --secure will now disallow actions from non-administrators. This is recommended for shared deployments.
Fixed the registration page on the Web API.
Users should now be able to create accounts from Dockerized deployments.
Held back dash-extensions
The recent 1.0.2+ releases have shipped some broken changes, so dash-extensions is held back to 1.0.1 until newer releases have been tested.
Allow for digits in environment connectors.
Connectors defined as environment variables may now have digits in the type.
```
export MRSM_A2B_TEST='{"foo": "bar"}'
```
Fixed stack on Windows.
Fixed a false error with background jobs.
Increased the minimum password length to 5.
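
For the environment connectors note above, the sketch below shows how a variable name such as `MRSM_A2B_TEST` might be split into a connector type and label when digits are allowed in the type. The `MRSM_<TYPE>_<LABEL>` pattern, the regular expression, and the lowercasing are assumptions made for illustration, and the sketch does not exclude reserved variables such as `MRSM_CONFIG`-style settings beyond what the pattern happens to reject.

```python
import json
import re

# Assumed naming pattern: MRSM_<TYPE>_<LABEL>, where TYPE may contain digits.
CONNECTOR_ENV_PATTERN = re.compile(r'^MRSM_([A-Z0-9]+)_([A-Z0-9_]+)$')


def parse_connector_env(name, value):
    """Return (type, label, attributes) for a connector variable, else None."""
    match = CONNECTOR_ENV_PATTERN.match(name)
    if match is None:
        return None
    conn_type, label = (group.lower() for group in match.groups())
    return conn_type, label, json.loads(value)


print(parse_connector_env('MRSM_A2B_TEST', '{"foo": "bar"}'))
# ('a2b', 'test', {'foo': 'bar'})
```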

14 Jul 15:38

bmeares

v1.6.19

71d596f

v1.6.19

Add Pydantic v2 support
The only feature which requires Pydantic v1 is the --schedule flag, which will throw a warning with a hint to install an older version. The underlying libraries for this feature should have Pydantic v2 support merged soon.
Bump dependencies.
This patch bumps the minimum required versions for typing-extensions, rich, prompt-toolkit, rocketry, uvicorn, websockets, and fastapi and loosens the minimum version of pydantic.
Fix shell formatting on Windows 10.
Some edge case issues have been patched for older versions of Windows.

16 Jun 21:32

bmeares

v1.6.15

7c07b2b

v1.6.15

Sync chunks in the copy pipes action.
This will help with pipes that are too large to fit in memory.

05 Jun 03:35

bmeares

v1.6.14

38cf2c0

v1.6.14 (v1.5.23 for Python 3.7)

v1.6.14

Added healthchecks to mrsm stack up.
The internal Docker Compose file for mrsm stack was bumped to version 3.9, and secrets were replaced with environment variable references.
Fixed --no-auth when starting the API.
The command mrsm start api --no-auth now correctly handles sessions.

02 Jun 18:28

bmeares

v1.6.13

50c6f66

v1.6.13 (v1.5.22 for Python 3.7)

v1.6.13

Remove \\u0000 from strings when inserting into PostgreSQL.
Replace both \0 and \\u0000 with empty strings when streaming rows into PostgreSQL.
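
PostgreSQL text values cannot contain NUL characters, which is why they are removed before insertion. The replacement described above amounts to something like the following sketch (`strip_nul()` is a hypothetical helper, not the function used in the patch):

```python
def strip_nul(value):
    """Remove NUL characters and literal '\\u0000' escapes from strings."""
    if not isinstance(value, str):
        return value
    return value.replace('\0', '').replace('\\u0000', '')


row = {'id': 1, 'note': 'abc\0def\\u0000ghi'}
clean_row = {key: strip_nul(val) for key, val in row.items()}
print(clean_row)
# {'id': 1, 'note': 'abcdefghi'}
```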

Thank you @SheepIsGoat for your contribution!

Contributors

SheepIsGoat

30 May 23:44

bmeares

v1.6.12

13aae02

v1.6.12 (v1.5.21 for Python 3.7)

v1.6.12

Allow nested chunk generators.
This patch more gracefully handles labels for situations with nested chunk generators and adds an explicit test for this scenario.

import meerschaum as mrsm
pipe = mrsm.Pipe('foo', 'bar', instance='sql:memory')

docs = [{'color': 'red'}, {'color': 'green'}]
num_chunks = 3
num_batches = 2

def build_chunks():
    return (
        [
            {'chunk_ix': chunk_ix, **doc}
            for doc in docs
        ]
        for chunk_ix in range(num_chunks)
    )

batches = (
    (
        [
            {'batch_ix': batch_ix, **doc}
            for doc in chunk
        ]
        for chunk in build_chunks()
    )
    for batch_ix in range(num_batches)
)

pipe.sync(batches)
print(pipe.get_data())
#     batch_ix  chunk_ix  color
# 0          0         0    red
# 1          0         0  green
# 2          0         1    red
# 3          0         1  green
# 4          0         2    red
# 5          0         2  green
# 6          1         0    red
# 7          1         0  green
# 8          1         1    red
# 9          1         1  green
# 10         1         2    red
# 11         1         2  green
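
Conceptually, syncing nested generators means walking them depth-first until concrete chunks (lists of documents) are reached. The recursive helper below sketches that traversal; `flatten_chunks()` is hypothetical and only illustrates the idea, not how `pipe.sync()` processes generators internally.

```python
from collections.abc import Iterator


def flatten_chunks(chunks):
    """Yield concrete chunks from arbitrarily nested generators, depth-first."""
    for chunk in chunks:
        if isinstance(chunk, Iterator):
            # A generator of chunks: descend into it before moving on.
            yield from flatten_chunks(chunk)
        else:
            yield chunk


nested = (
    (
        [{'batch_ix': batch_ix, 'chunk_ix': chunk_ix}]
        for chunk_ix in range(3)
    )
    for batch_ix in range(2)
)
flat = list(flatten_chunks(nested))
print(len(flat))
# 6
print(flat[0], flat[-1])
# [{'batch_ix': 0, 'chunk_ix': 0}] [{'batch_ix': 1, 'chunk_ix': 2}]
```

Lists are iterable but are not iterators, so they are treated as leaf chunks rather than being descended into.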

Releases: bmeares/Meerschaum
