Comparing changes

base repository: fox-it/flow.record
base: 3.9
head repository: fox-it/flow.record
compare: main

Commits on Mar 16, 2023

  1. Move to tox4 and pure pyproject packaging (#60)

    (DIS-1750)
    pyrco authored Mar 16, 2023
    Copy the full SHA
    cb20bb9 View commit details

Commits on Mar 17, 2023

  1. Always write Avro header (#62)

    * Always write Avro header
    
    * Added avro and test extras to pyproject.toml
    
    * Also skip lz4 and zstandard tests when running under PyPy due to incompatibilities
    
    * Remove avro[snappy] from test due to missing wheels
    
    * Add unit test
    
    ---------
    
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    Schamper and yunzheng authored Mar 17, 2023
    Copy the full SHA
    d182309 View commit details

Commits on Apr 13, 2023

  1. Fix invalid expression when parsing datetime type in Avro adapter (#63)

    Co-authored-by: Jan Starke <jan.starke@t-systems.com>
    janstarke and Jan Starke authored Apr 13, 2023
    Copy the full SHA
    059c300 View commit details

Commits on Apr 26, 2023

  1. Copy the full SHA
    1db1743 View commit details

Commits on May 8, 2023

  1. Copy the full SHA
    1de369f View commit details

Commits on May 15, 2023

  1. Improve the path field type (#66)

    The path class will now allow being instantiated using multiple path
    parts, making it behave more like its parent pathlib.PurePath class.
    
    The inference on whether to return a windows_path or posix_path class
    instance is now done based on the first path part it encounters that is
    an instance of pathlib.PurePath.
    
    For path parts that are a custom subclass of pathlib.PurePath (so not an
    instance of PureWindowsPath or PurePosixPath), the presence of a '\' as
    either separator or alternative separator will result in returning a
    windows_path class instance.
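
The inference rule for standard `pathlib` classes can be sketched with the standard library alone. This is a minimal illustration, not the flow.record implementation: `infer_path_class` and `make_path` are hypothetical helpers, the POSIX fallback for string-only input is an assumption, and the separator check for custom `PurePath` subclasses is left out.

```python
from pathlib import PurePosixPath, PureWindowsPath


def infer_path_class(*parts, default=PurePosixPath):
    """Return the path class implied by the first pathlib instance in parts."""
    for part in parts:
        if isinstance(part, PureWindowsPath):
            return PureWindowsPath
        if isinstance(part, PurePosixPath):
            return PurePosixPath
    # Assumption: plain strings carry no flavour, so fall back to a default.
    return default


def make_path(*parts, default=PurePosixPath):
    """Join multiple parts using the inferred path class."""
    cls = infer_path_class(*parts, default=default)
    return cls(*(str(part) for part in parts))


windows = make_path(PureWindowsPath("C:\\Windows"), "System32")
posix = make_path("/usr", PurePosixPath("lib"))
```

Here the first `PurePath` instance among the parts decides the flavour, regardless of its position.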
    
    (DIS-1977)
    pyrco authored May 15, 2023
    Copy the full SHA
    4b51ec6 View commit details

Commits on May 16, 2023

  1. Update README.md

    Add Requirements section with URL to the supported Python versions.
    
    Update links to main documentation.
    
    (DIS-1986)
    martinvanhensbergen authored May 16, 2023
    Copy the full SHA
    ee0cb69 View commit details

Commits on May 23, 2023

  1. Add documentation testing tooling (#69)

    * Add documentation testing tooling
    
    Add tox commands to generate API documentation for previewing in browser and automatic checking of broken URLs.
    
    (DIS-1888)
    
    * Remove unnecessary newline from tox.ini
    
    Make the formatting consistent with the other sections.
    martinvanhensbergen authored May 23, 2023
    Copy the full SHA
    a1a94b2 View commit details

Commits on May 31, 2023

  1. Skip incompatible unit tests on Windows (#71)

    (DIS-1765)
    cecinestpasunepipe authored and pyrco committed May 31, 2023
    Copy the full SHA
    6aa3aa5 View commit details

Commits on Jun 16, 2023

  1. Add useful trove classifiers to pyproject.toml (#74)

    (DIS-1502)
    pyrco authored Jun 16, 2023
    Copy the full SHA
    6c1da47 View commit details

Commits on Jun 27, 2023

  1. Add publishing step (#75)

    Miauwkeru authored Jun 27, 2023
    Copy the full SHA
    549d118 View commit details

Commits on Jul 21, 2023

  1. Add hash method to datetime fieldtype (#77)

    * Add hash method to datetime fieldtype
    
    (DIS-2036)
    Poeloe authored Jul 21, 2023
    Copy the full SHA
    d52f2b5 View commit details

Commits on Aug 1, 2023

  1. Add --skip flag to rdump (#76)

    Adds the ability to skip a number of records when reading or writing records using rdump.
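
Skipping a number of records from an iterator can be sketched with `itertools.islice`. This is only an illustration of the idea; `skip_records` is a hypothetical helper and not necessarily how rdump implements `--skip`.

```python
from itertools import islice


def skip_records(records, skip=0):
    """Yield records from an iterable after discarding the first `skip` items."""
    return islice(records, skip, None)


remaining = list(skip_records(iter(["r0", "r1", "r2", "r3"]), skip=2))
```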
    
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    MaxGroot and yunzheng authored Aug 1, 2023
    Copy the full SHA
    464914e View commit details

Commits on Aug 24, 2023

  1. Make datetime fieldtypes timezone aware (#78)

    - Record datetime fields are now offset-aware by default
    - Naive datetime fields are converted to UTC
    - Support for packing/unpacking aware datetimes
    
    Note that comparing with naive datetime objects will now fail, which is in line with default Python behaviour.
    
    To ensure uniform output, datetime fields are displayed in UTC by default.
    To use a different display timezone, set the environment variable `FLOW_RECORD_TZ`. Examples:
    
    - `FLOW_RECORD_TZ=UTC` to display datetime fields in UTC, this is the default
    - `FLOW_RECORD_TZ=Europe/Amsterdam` to display datetime fields in local time of the Netherlands
    - `FLOW_RECORD_TZ=NONE` to disable the datetime display normalisation
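
The behaviour described above can be approximated with the standard library. The sketch below is not the flow.record code: `as_aware` and `display` are hypothetical helpers, and it assumes that a naive value is interpreted as already being in UTC. Only the environment variable name `FLOW_RECORD_TZ` and its example values come from the commit message.

```python
import os
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def as_aware(dt):
    """Interpret a naive datetime as UTC; leave aware values untouched."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


def display(dt, tz_name=None):
    """Render an aware datetime in the configured display timezone."""
    tz_name = tz_name or os.environ.get("FLOW_RECORD_TZ", "UTC")
    if tz_name.upper() == "NONE":
        return dt.isoformat()  # no normalisation, keep the original offset
    tz = timezone.utc if tz_name.upper() == "UTC" else ZoneInfo(tz_name)
    return dt.astimezone(tz).isoformat()


naive = datetime(2023, 8, 24, 12, 0, 0)
aware = as_aware(naive)
local = datetime(2023, 8, 24, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))

try:
    naive < aware  # ordering naive against aware raises TypeError in Python
    ordering_failed = False
except TypeError:
    ordering_failed = True
```

Note that an equality check between a naive and an aware datetime does not raise; it simply evaluates to `False`.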
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    yunzheng and Schamper authored Aug 24, 2023
    Copy the full SHA
    7cc4440 View commit details
  2. Copy the full SHA
    ddea907 View commit details

Commits on Aug 25, 2023

  1. Add path type __eq__ and __repr__ QOL changes (#79)

    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    Schamper and yunzheng authored Aug 25, 2023
    Copy the full SHA
    b2818b0 View commit details
  2. Add GitHub workflow to test extra compatibility (#81)

    As flow.record still targets Python 3.7 as a minimum, we also test:
    
    - Python 3.7
    - Python 3.8
    - Windows, Python 3.9+
    yunzheng authored Aug 25, 2023
    Copy the full SHA
    8a1c74f View commit details
  3. Copy the full SHA
    6358ba3 View commit details

Commits on Sep 13, 2023

  1. Support file-like inputs for RecordReader (#59)

    Peek into the file to find the right adapter by checking the file magic
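
Peeking at a file-like object without consuming it can be sketched as follows. This is an assumption-laden illustration rather than the adapter lookup used by flow.record: `sniff_format` and the `MAGIC_TO_FORMAT` table are hypothetical, and only a few well-known magic values (gzip, bzip2, Avro) are listed.

```python
import io

# Well-known file signatures; the mapping itself is illustrative only.
MAGIC_TO_FORMAT = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"Obj\x01": "avro",
}


def sniff_format(fp):
    """Return (format_name or None, stream) without consuming any bytes."""
    if not hasattr(fp, "peek"):
        # Wrap streams that cannot peek so the header stays readable afterwards.
        fp = io.BufferedReader(fp)
    head = fp.peek(4)[:4]
    for magic, name in MAGIC_TO_FORMAT.items():
        if head.startswith(magic):
            return name, fp
    return None, fp


payload = b"\x1f\x8b\x08\x00compressed-bytes"
name, stream = sniff_format(io.BytesIO(payload))
```

The caller should continue reading from the returned stream, since the original object may have been wrapped.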
    
    ---------
    
    Co-authored-by: Max Groot <max.groot@fox-it.com>
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    4 people authored Sep 13, 2023
    Copy the full SHA
    2e2eb62 View commit details

Commits on Oct 11, 2023

  1. Add behaviour to always use datetime.UTC if there is no zoneinfo available (#86)

    ---------
    
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    Miauwkeru and yunzheng authored Oct 11, 2023
    Copy the full SHA
    ccfa214 View commit details

Commits on Oct 13, 2023

  1. Drop Python 3.7 support (#88)

    Python 3.7 has been EOL since 27 June 2023.
    The minimum supported Python version for flow.record is now 3.8.
    
    Windows tests are now run via the main GitHub dissect-workflow-template.
    yunzheng authored Oct 13, 2023
    Copy the full SHA
    f0a2608 View commit details

Commits on Oct 16, 2023

  1. Speedup parsing of datetime fieldtypes initialization by string (#87)

    This change mainly removes the use of expensive regexes and exception handling when parsing datetime strings.
    It speeds up parsing significantly on Python versions below 3.11. 
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    yunzheng and Schamper authored Oct 16, 2023
    Copy the full SHA
    53c744b View commit details
  2. Copy the full SHA
    ecbd912 View commit details

Commits on Oct 26, 2023

  1. Fix str() and repr() in selectors (#93)

    (DIS-2565)
    Schamper authored Oct 26, 2023
    Copy the full SHA
    fdcecba View commit details
  2. Update elastic.py adapter (#92)

    Fix AttributeError if an invalid URI is given, add `verify_certs` flag to optional arguments.
    Also adds `hash_record` argument for making every document unique.
    0xbart authored Oct 26, 2023
    Copy the full SHA
    6144cf4 View commit details

Commits on Oct 27, 2023

  1. Fix RecordReader not reading from stdin by default (#94)

    Calling `RecordReader()` without arguments should always default to stdin
    yunzheng authored Oct 27, 2023
    Copy the full SHA
    9cad89f View commit details

Commits on Nov 15, 2023

  1. Add SQLite adapter (#90)

    This adds support for reading from and writing to SQLite databases via `sqlite://`.
    Columns are dynamically added if a RecordDescriptor changes for the table.
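
Adding columns on demand can be sketched with the standard `sqlite3` module. This is a simplified illustration, not the adapter's code: `ensure_columns`, the `id` primary key, and the table and column names are hypothetical.

```python
import sqlite3


def ensure_columns(con, table, fields):
    """Create the table if needed and add any columns that are still missing."""
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY)')
    existing = {row[1] for row in con.execute(f'PRAGMA table_info("{table}")')}
    for name, sql_type in fields:
        if name not in existing:
            con.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')


con = sqlite3.connect(":memory:")
ensure_columns(con, "rec", [("hostname", "TEXT")])
# A changed descriptor with an extra field only adds the new column.
ensure_columns(con, "rec", [("hostname", "TEXT"), ("port", "INTEGER")])
columns = [row[1] for row in con.execute('PRAGMA table_info("rec")')]
```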
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    yunzheng and Schamper authored Nov 15, 2023
    Copy the full SHA
    67f71d5 View commit details

Commits on Nov 17, 2023

  1. Copy the full SHA
    64263bc View commit details

Commits on Nov 20, 2023

  1. Refactor and improve CSV adapter (#96)

    * Refactor and improve CSV adapter
    
    This change allows the CSV adapter to:
     - read reserved fields (e.g. _generated, _source, etc.)
     - use `normalize_fieldname` to normalize field names in flow.record
     - deduce the format of a CSV file automatically by using `csv.Sniffer`
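
Format detection with `csv.Sniffer` can be sketched as below. The helper `read_csv_rows`, the sample size, and the restricted delimiter set are assumptions made for the example and are not taken from the adapter.

```python
import csv
import io


def read_csv_rows(text, sample_size=1024):
    """Detect the dialect from a sample, then parse all rows as dictionaries."""
    dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return dialect.delimiter, list(reader)


delimiter, rows = read_csv_rows("a;b;c\n1;2;3\n4;5;6\n")
```

Restricting the candidate delimiters keeps the sniffer from picking an unlikely character on small samples.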
    yunzheng authored Nov 20, 2023
    Copy the full SHA
    676d61c View commit details

Commits on Dec 20, 2023

  1. Use reprlib to limit the warning message (#101)

    The warning message could become very long (a single line) when msgpack
    parsing encounters extra data, for example due to a corrupt file or incorrect use.
    
    By using reprlib we limit the length of the warning message.
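
A minimal sketch of limiting a message with `reprlib` follows. The message text, the limit of 100 characters, and the helper `format_extra_data_warning` are hypothetical; only the use of `reprlib` comes from the commit message.

```python
import reprlib

_limited = reprlib.Repr()
_limited.maxother = 100   # cap for types without a dedicated handler, such as bytes
_limited.maxstring = 100  # cap for str values

PREFIX = "Unexpected extra data: "  # hypothetical message text


def format_extra_data_warning(extra):
    """Build a one-line warning whose payload part is at most 100 characters."""
    return PREFIX + _limited.repr(extra)


message = format_extra_data_warning(b"\x00" * 10_000)
```

`reprlib` keeps the start and end of the representation and replaces the middle with `...`.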
    yunzheng authored Dec 20, 2023
    Copy the full SHA
    a9808ec View commit details

Commits on Jan 4, 2024

  1. Add Python 3.12 compatibility for path fieldtype (#91)

    ---------
    
    Co-authored-by: pyrco <105293448+pyrco@users.noreply.github.com>
    Co-authored-by: Schamper <1254028+Schamper@users.noreply.github.com>
    3 people authored Jan 4, 2024
    Copy the full SHA
    58d5915 View commit details
  2. Copy the full SHA
    9670a38 View commit details

Commits on Jan 23, 2024

  1. Copy the full SHA
    9a6829b View commit details

Commits on Feb 1, 2024

  1. Copy the full SHA
    09ed812 View commit details
  2. Copy the full SHA
    67a36e8 View commit details

Commits on Feb 19, 2024

  1. Add DuckDB adapter (#97)

    This adds DuckDB reader and writer support. Because DuckDB is mostly
    compatible with the SQLite API, we just subclass from the existing
    SQLite adapter with minimal changes.
    
    Changes made to the SQLite adapter and tests:
    
     * backtick quoting does not work in DuckDB so we use double quotes now
     * DuckDB has strict typing so some tests are not applicable and skipped
     * `executescript()` does not exist in DuckDB so we avoid using it
     * Switched SQLite to `isolation_level=None` for manual transactions
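
Two of the points above, double-quoted identifiers and manual transactions with `isolation_level=None`, can be illustrated with `sqlite3` alone. This is a sketch under assumptions: `quote_identifier` and the table name are hypothetical and DuckDB itself is not used here.

```python
import sqlite3


def quote_identifier(name):
    """Quote an identifier with double quotes, the standard SQL form."""
    return '"' + name.replace('"', '""') + '"'


# isolation_level=None puts sqlite3 in autocommit mode, so transactions
# only exist where BEGIN/COMMIT/ROLLBACK are issued explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
table = quote_identifier("my table")
con.execute(f"CREATE TABLE {table} (value INTEGER)")

con.execute("BEGIN")
con.executemany(f"INSERT INTO {table} VALUES (?)", [(1,), (2,)])
con.execute("ROLLBACK")  # discards both rows

con.execute("BEGIN")
con.execute(f"INSERT INTO {table} VALUES (?)", (3,))
con.execute("COMMIT")

rows = con.execute(f"SELECT value FROM {table}").fetchall()
```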
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    yunzheng and Schamper authored Feb 19, 2024
    Copy the full SHA
    2f52023 View commit details

Commits on Feb 20, 2024

  1. Move lru_cache definitions to __init__ (#109)

    Applying the lru_cache decorator to a method that takes `self` also stores `self` in the cache,
    which keeps the instance alive. The cached functions are therefore now created in the class's __init__.
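
The difference between the two approaches can be demonstrated with a small experiment. The class names and the `parse` method are hypothetical and only serve to show the lifetime effect.

```python
import gc
import weakref
from functools import lru_cache


class SharedCache:
    @lru_cache(maxsize=128)  # one cache on the class; keys include `self`
    def parse(self, value):
        return value.upper()


class PerInstanceCache:
    def __init__(self):
        # The cache lives on the instance and is released together with it.
        self.parse = lru_cache(maxsize=128)(self._parse)

    def _parse(self, value):
        return value.upper()


def is_freed_after_use(cls):
    """Report whether an instance can be garbage collected after a cached call."""
    obj = cls()
    obj.parse("a")
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None


shared_freed = is_freed_after_use(SharedCache)
per_instance_freed = is_freed_after_use(PerInstanceCache)
```

The per-instance variant still forms a reference cycle between the instance and its cached bound method, so it is reclaimed by the cyclic garbage collector rather than by reference counting.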
    
    (DIS-2913)
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    Miauwkeru and Schamper authored Feb 20, 2024
    Copy the full SHA
    abae08c View commit details
  2. Copy the full SHA
    523b96c View commit details

Commits on Mar 7, 2024

  1. Make records hashable (#107)

    ---------
    
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    JSCU-CNI and yunzheng authored Mar 7, 2024
    Copy the full SHA
    a3b5310 View commit details

Commits on Mar 15, 2024

  1. Copy the full SHA
    5933d34 View commit details

Commits on Mar 27, 2024

  1. Copy the full SHA
    4d267dd View commit details

Commits on Mar 28, 2024

  1. Copy the full SHA
    256733d View commit details

Commits on Apr 5, 2024

  1. Add ignore_fields_for_comparison() context manager (#115)

    It behaves the same as set_ignored_fields_for_comparison() but only for
    the duration of the context manager.
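
The relationship between a global setter and a scoped context manager can be sketched like this. It is not the flow.record implementation: the names `IGNORED_FIELDS`, `set_ignored_fields`, `ignore_fields`, and `records_equal` are hypothetical, and plain dictionaries stand in for records.

```python
from contextlib import contextmanager

IGNORED_FIELDS = set()  # hypothetical module-level state


def set_ignored_fields(fields):
    """Replace the global set of field names ignored when comparing."""
    global IGNORED_FIELDS
    IGNORED_FIELDS = set(fields)


@contextmanager
def ignore_fields(fields):
    """Ignore the given fields only inside the `with` block."""
    previous = IGNORED_FIELDS
    set_ignored_fields(fields)
    try:
        yield
    finally:
        set_ignored_fields(previous)  # restore even if the block raised


def records_equal(a, b):
    """Compare two dict-based records, skipping ignored field names."""
    keys = (set(a) | set(b)) - IGNORED_FIELDS
    return all(a.get(key) == b.get(key) for key in keys)


first = {"hostname": "srv01", "_generated": 1}
second = {"hostname": "srv01", "_generated": 2}

with ignore_fields(["_generated"]):
    equal_inside = records_equal(first, second)
equal_outside = records_equal(first, second)
```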
    yunzheng authored Apr 5, 2024
    Copy the full SHA
    5b9e62a View commit details

Commits on Apr 11, 2024

  1. Copy the full SHA
    c1c3abf View commit details
  2. Copy the full SHA
    9eb1557 View commit details

Commits on Apr 12, 2024

  1. Add support for Splunk HTTP Event Collector (#85)

    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    3 people authored Apr 12, 2024
    Copy the full SHA
    4a47670 View commit details

Commits on May 3, 2024

  1. Add a command type (#118)

    This command type splits an executable (path) from its arguments (list).
    Windows-style commands are detected on a best-effort basis,
    because Windows executables handle their own argument parsing.
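
One way to split a command line along these lines is sketched below. The detection heuristic, the regular expression, and the helper `split_command` are assumptions for illustration and do not reflect how the command type is implemented. In particular, this sketch keeps the Windows argument string unsplit, does not handle executables with spaces in their path, and expects a non-empty command.

```python
import re
import shlex

# Hypothetical heuristic: drive letter, UNC prefix, or %VARIABLE% at the start.
_WINDOWS_HINT = re.compile(r"^(?:[A-Za-z]:\\|\\\\|%\w+%)")


def split_command(command):
    """Split a command line into (executable, list of arguments)."""
    if _WINDOWS_HINT.match(command):
        # Windows programs parse their own arguments, so keep the rest raw.
        executable, _, rest = command.partition(" ")
        return executable, ([rest] if rest else [])
    executable, *args = shlex.split(command)
    return executable, args


posix_cmd = split_command("/bin/ls -la /tmp")
windows_cmd = split_command(r"C:\Windows\System32\cmd.exe /c dir C:\Users")
```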
    
    (DIS-2977)
    Miauwkeru authored May 3, 2024
    Copy the full SHA
    e0586ef View commit details

Commits on May 15, 2024

  1. Add metadata fields to elastic adapter (#121)

    This adds metadata fields to the elastic adapter and repairs `elastic+[PROTOCOL]://` behaviour.
    It also enables users to authenticate to Elasticsearch with an API key.
    
    You can now write arbitrary metadata to the `document._source._record_metadata` dict using the following syntax:
    
    rdump -w "elastic+https://localhost:9200?_meta_foo=bar"
    
    This will result in the following `_record_metadata` dict:
    
    {
        ...
        "foo": "bar"
    }
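
Extracting the `_meta_` parameters from such a URI can be sketched with `urllib.parse`. The helper `extract_metadata` is hypothetical and shows only the query-string handling, not the adapter itself; the extra `other=1` parameter is made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

META_PREFIX = "_meta_"


def extract_metadata(uri):
    """Collect query parameters that start with `_meta_`, minus the prefix."""
    query = urlsplit(uri).query
    return {
        key[len(META_PREFIX):]: value
        for key, value in parse_qsl(query)
        if key.startswith(META_PREFIX)
    }


metadata = extract_metadata("elastic+https://localhost:9200?_meta_foo=bar&other=1")
```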
    
    ---------
    
    Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
    JSCU-CNI and yunzheng authored May 15, 2024
    Copy the full SHA
    43a5656 View commit details

Commits on May 16, 2024

  1. Add support for empty value in path fieldtype (#122)

    * Add support for empty value in path fieldtype
    
    Normally an empty path would be normalized to a "." (dot) character.
    This change allows you to initialize a path field with an empty string.
    This is useful to represent a path that is empty.
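
The normalisation mentioned above is standard `pathlib` behaviour, and one way to preserve an empty value is to remember it separately. `PathValue` below is a hypothetical wrapper used only to illustrate the idea; the actual fieldtype subclasses the path classes instead.

```python
from pathlib import PurePosixPath


class PathValue:
    """Wrap a path while remembering whether it was created from ''."""

    def __init__(self, raw):
        self.empty = raw == ""
        self.path = PurePosixPath(raw)

    def __str__(self):
        return "" if self.empty else str(self.path)


normalised = str(PurePosixPath(""))  # pathlib turns an empty path into "."
preserved = str(PathValue(""))
regular = str(PathValue("/etc/passwd"))
```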
    
    Fixes DIS-2557
    
    * Also initialise subclass
    
    * Move empty_path attribute to __new__
    
    * Fix comparison between windows and posix classes
    yunzheng authored May 16, 2024
    Copy the full SHA
    0865b50 View commit details

Commits on May 20, 2024

  1. Fix ValueError: I/O operation on closed file during tests (#123)

    During tests, `pytest` sometimes mocks and swaps the `stdout` file object. This could confuse `is_stdout()` into returning `False`, after which `flow.record` internals closed the file.
    
    This is now fixed by adding two custom methods for getting the stdio streams:
    
     * `flow.record.utils.get_stdout()`
     * `flow.record.utils.get_stdin()`
    
    These methods are the preferred way to get the stdio streams as they also set an extra attribute on the returned file object that is checked by `is_stdout()`.
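
The marker-attribute approach can be sketched as follows. The function names, the attribute name `_is_stdio_stream`, and `close_unless_stdout` are hypothetical stand-ins and do not mirror the signatures in `flow.record.utils`.

```python
import contextlib
import io
import sys

_MARKER = "_is_stdio_stream"  # hypothetical attribute name


def get_marked_stdout():
    """Return the current stdout and tag it so it can be recognised later."""
    fp = sys.stdout
    setattr(fp, _MARKER, True)
    return fp


def is_marked_stdout(fp):
    """Check the tag instead of comparing against sys.stdout by identity."""
    return getattr(fp, _MARKER, False)


def close_unless_stdout(fp):
    """Close ordinary files but leave a tagged stdio stream open."""
    if not is_marked_stdout(fp):
        fp.close()


swapped = io.StringIO()  # stands in for a stream swapped in by a test runner
with contextlib.redirect_stdout(swapped):
    out = get_marked_stdout()
close_unless_stdout(out)  # still recognised after stdout was restored

regular = io.StringIO()
close_unless_stdout(regular)
```

Because the tag travels with the file object, the check keeps working after `sys.stdout` has been replaced again.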
    
    ---------
    
    Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
    yunzheng and Schamper authored May 20, 2024
    Copy the full SHA
    a8fd59d View commit details
Showing with 5,454 additions and 1,748 deletions.
  1. +6 −0 .git-blame-ignore-revs
  2. +24 −0 .github/pull_request_template.md
  3. +30 −2 .github/workflows/dissect-ci.yml
  4. +3 −1 .gitignore
  5. +1 −1 MANIFEST.in
  6. +8 −2 README.md
  7. +35 −31 flow/record/__init__.py
  8. +18 −28 flow/record/adapter/__init__.py
  9. +12 −5 flow/record/adapter/archive.py
  10. +37 −26 flow/record/adapter/avro.py
  11. +16 −8 flow/record/adapter/broker.py
  12. +45 −21 flow/record/adapter/csvfile.py
  13. +56 −0 flow/record/adapter/duckdb.py
  14. +113 −16 flow/record/adapter/elastic.py
  15. +22 −11 flow/record/adapter/jsonfile.py
  16. +44 −10 flow/record/adapter/line.py
  17. +17 −8 flow/record/adapter/mongo.py
  18. +12 −5 flow/record/adapter/split.py
  19. +250 −32 flow/record/adapter/splunk.py
  20. +247 −0 flow/record/adapter/sqlite.py
  21. +22 −13 flow/record/adapter/stream.py
  22. +18 −12 flow/record/adapter/text.py
  23. +114 −27 flow/record/adapter/xlsx.py
  24. +380 −209 flow/record/base.py
  25. +4 −0 flow/record/exceptions.py
  26. +447 −197 flow/record/fieldtypes/__init__.py
  27. +2 −0 flow/record/fieldtypes/credential.py
  28. +5 −4 flow/record/fieldtypes/net/__init__.py
  29. +41 −21 flow/record/fieldtypes/net/ip.py
  30. +35 −41 flow/record/fieldtypes/net/ipv4.py
  31. +2 −0 flow/record/fieldtypes/net/tcp.py
  32. +2 −0 flow/record/fieldtypes/net/udp.py
  33. +28 −22 flow/record/jsonpacker.py
  34. +35 −28 flow/record/packer.py
  35. +113 −146 flow/record/selector.py
  36. +77 −60 flow/record/stream.py
  37. +18 −15 flow/record/tools/geoip.py
  38. +32 −12 flow/record/tools/rdump.py
  39. +65 −32 flow/record/utils.py
  40. +3 −0 flow/record/whitelist.py
  41. +123 −7 pyproject.toml
  42. +0 −14 setup.cfg
  43. +0 −29 setup.py
  44. +44 −0 tests/_utils.py
  45. +24 −0 tests/docs/Makefile
  46. +34 −0 tests/docs/conf.py
  47. +8 −0 tests/docs/index.rst
  48. +4 −2 tests/selector_explain_example.py
  49. +6 −4 tests/standalone_test.py
  50. +31 −0 tests/test_adapter_line.py
  51. +30 −0 tests/test_adapter_text.py
  52. +70 −0 tests/test_avro.py
  53. +58 −0 tests/test_avro_adapter.py
  54. +5 −3 tests/test_compiled_selector.py
  55. +80 −0 tests/test_csv_adapter.py
  56. +6 −4 tests/test_deprecations.py
  57. +59 −0 tests/test_elastic_adapter.py
  58. +33 −17 tests/test_fieldtype_ip.py
  59. +580 −120 tests/test_fieldtypes.py
  60. +48 −5 tests/test_json_packer.py
  61. +25 −38 tests/test_json_record_adapter.py
  62. +23 −19 tests/test_multi_timestamp.py
  63. +19 −15 tests/test_packer.py
  64. +278 −31 tests/test_rdump.py
  65. +218 −38 tests/test_record.py
  66. +143 −100 tests/test_record_adapter.py
  67. +10 −8 tests/test_record_descriptor.py
  68. +89 −39 tests/test_regression.py
  69. +78 −55 tests/test_selector.py
  70. +389 −55 tests/test_splunk_adapter.py
  71. +396 −0 tests/test_sqlite_duckdb_adapter.py
  72. +60 −0 tests/test_xlsx_adapter.py
  73. +0 −58 tests/utils_inspect.py
  74. +44 −41 tox.ini
6 changes: 6 additions & 0 deletions .git-blame-ignore-revs
@@ -0,0 +1,6 @@
# Formatting commits. You can ignore them during git-blame with `--ignore-rev` or `--ignore-revs-file`.
#
# $ git config --add 'blame.ignoreRevsFile' '.git-blame-ignore-revs'
#
# Change linter to Ruff (#158)
c67f778c653c295ec26146cf6422d3b06ac640e8
24 changes: 24 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,24 @@
<!--
Thank you for submitting a Pull Request. Please:
* Read our commit style guide:
Commit messages should adhere to the following points:
* Separate subject from body with a blank line
* Limit the subject line to 50 characters as much as possible
* Capitalize the subject line
* Do not end the subject line with a period
* Use the imperative mood in the subject line
* The verb should represent what was accomplished (Create, Add, Fix etc)
* Wrap the body at 72 characters
* Use the body to explain the what and why vs. the how
For an example, look at the following link:
https://docs.dissect.tools/en/latest/contributing/style-guide.html#example-commit-message
* Include a description of the proposed changes and how to test them.
* After creation, associate the PR with an issue, under the development section.
Or use closing keywords in the body during creation:
E.G:
* close(|s|d) #<nr>
* fix(|es|ed) #<nr>
* resolve(|s|d) #<nr>
-->
32 changes: 30 additions & 2 deletions .github/workflows/dissect-ci.yml
@@ -1,7 +1,35 @@
name: Dissect CI
on: [push, pull_request, workflow_dispatch]
on:
push:
branches:
- main
tags:
- '*'
pull_request:
workflow_dispatch:

jobs:
jobs:
ci:
uses: fox-it/dissect-workflow-templates/.github/workflows/dissect-ci-template.yml@main
secrets: inherit

publish:
if: ${{ github.ref_name == 'main' || github.ref_type == 'tag' }}
needs: [ci]
runs-on: ubuntu-latest
environment: dissect_publish
permissions:
id-token: write
steps:
- uses: actions/download-artifact@v4
with:
name: packages
path: dist/
# According to the documentation, it automatically looks inside the `dist/` folder for packages.
- name: Publish package distributions to Pypi
uses: pypa/gh-action-pypi-publish@release/v1

trigger-tests:
needs: [publish]
uses: fox-it/dissect-workflow-templates/.github/workflows/dissect-ci-demand-test-template.yml@main
secrets: inherit
4 changes: 3 additions & 1 deletion .gitignore
@@ -6,6 +6,8 @@ dist/
*.pyc
__pycache__/
.pytest_cache/
.tox/

flow/record/version.py
tests/docs/api
tests/docs/build
.tox/
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,2 +1,2 @@
exclude .gitignore
exclude .github
recursive-exclude .github/ *
10 changes: 8 additions & 2 deletions README.md
@@ -8,6 +8,12 @@ Records can be read and transformed to other formats by using output adapters, s
For more information on how Dissect uses this library, please see [the
documentation](https://docs.dissect.tools/en/latest/tools/rdump.html#what-is-a-record).

## Requirements

This project is part of the Dissect framework and requires Python.

Information on the supported Python versions can be found in the Getting Started section of [the documentation](https://docs.dissect.tools/en/latest/index.html#getting-started).

## Installation

`flow.record` is available on [PyPI](https://pypi.org/project/flow.record/).
@@ -96,12 +102,12 @@ tox
```

For a more elaborate explanation on how to build and test the project, please see [the
documentation](https://docs.dissect.tools/en/latest/contributing/developing.html#building-testing).
documentation](https://docs.dissect.tools/en/latest/contributing/tooling.html).

## Contributing

The Dissect project encourages any contribution to the codebase. To make your contribution fit into the project, please
refer to [the style guide](https://docs.dissect.tools/en/latest/contributing/style-guide.html).
refer to [the development guide](https://docs.dissect.tools/en/latest/contributing/developing.html).

## Copyright and license

66 changes: 35 additions & 31 deletions flow/record/__init__.py
@@ -1,8 +1,12 @@
from __future__ import annotations

import gzip
import os
from pathlib import Path

from flow.record.base import (
IGNORE_FIELDS_FOR_COMPARISON,
RECORD_VERSION,
RECORDSTREAM_MAGIC,
DynamicDescriptor,
FieldType,
GroupedRecord,
@@ -15,8 +19,12 @@
RecordWriter,
dynamic_fieldtype,
extend_record,
ignore_fields_for_comparison,
iter_timestamped_records,
open_path,
open_path_or_stream,
open_stream,
set_ignored_fields_for_comparison,
stream,
)
from flow.record.jsonpacker import JsonRecordPacker
@@ -32,66 +40,62 @@
)

__all__ = [
"IGNORE_FIELDS_FOR_COMPARISON",
"RECORDSTREAM_MAGIC",
"RECORD_VERSION",
"DynamicDescriptor",
"FieldType",
"Record",
"GroupedRecord",
"RecordDescriptor",
"JsonRecordPacker",
"PathTemplateWriter",
"Record",
"RecordAdapter",
"RecordArchiver",
"RecordDescriptor",
"RecordDescriptorError",
"RecordField",
"RecordReader",
"RecordWriter",
"RecordOutput",
"RecordPrinter",
"RecordPacker",
"JsonRecordPacker",
"RecordStreamWriter",
"RecordPrinter",
"RecordReader",
"RecordStreamReader",
"open_path",
"stream",
"RecordStreamWriter",
"RecordWriter",
"dynamic_fieldtype",
"DynamicDescriptor",
"PathTemplateWriter",
"RecordArchiver",
"RecordDescriptorError",
"record_stream",
"extend_record",
"ignore_fields_for_comparison",
"iter_timestamped_records",
"open_path",
"open_path_or_stream",
"open_stream",
"record_stream",
"set_ignored_fields_for_comparison",
"stream",
]


class View:
fields = None

def __init__(self, fields):
self.fields = fields

def __iter__(self, fields):
pass


class RecordDateSplitter:
basepath = None
out = None

def __init__(self, basepath):
self.basepath = basepath
def __init__(self, basepath: str | Path):
self.basepath = Path(basepath)
self.out = {}

def getstream(self, t):
def getstream(self, t: tuple[int, int, int]) -> RecordStreamWriter:
if t not in self.out:
path = os.path.join(self.basepath, "-".join(["{:2d}".format(v) for v in t]) + ".rec.gz")
path = self.basepath.joinpath("-".join([f"{v:2d}" for v in t]) + ".rec.gz")
f = gzip.GzipFile(path, "wb")
rs = RecordStreamWriter(f)
self.out[t] = rs
return self.out[t]

def write(self, r):
def write(self, r: Record) -> None:
t = (r.ts.year, r.ts.month, r.ts.day)
rs = self.getstream(t)
rs.write(r)
rs.fp.flush()

def close(self):
def close(self) -> None:
for rs in self.out.values():
rs.close()
46 changes: 18 additions & 28 deletions flow/record/adapter/__init__.py
@@ -1,63 +1,53 @@
from __future__ import annotations

__path__ = __import__("pkgutil").extend_path(__path__, __name__) # make this namespace extensible from other packages
import abc
from typing import TYPE_CHECKING

if TYPE_CHECKING:
from collections.abc import Iterator

def with_metaclass(meta, *bases):
"""Create a base class with a metaclass. Python 2 and 3 compatible."""

# This requires a bit of explanation: the basic idea is to make a dummy
# metaclass for one level of class instantiation that replaces itself with
# the actual metaclass.
class metaclass(type):
def __new__(cls, name, this_bases, d):
return meta(name, bases, d)

@classmethod
def __prepare__(cls, name, this_bases):
return meta.__prepare__(name, bases)

return type.__new__(metaclass, "temporary_class", (), {})
from flow.record.base import Record


class AbstractWriter(with_metaclass(abc.ABCMeta, object)):
class AbstractWriter(metaclass=abc.ABCMeta):
@abc.abstractmethod
def write(self, rec):
def write(self, rec: Record) -> None:
"""Write a record."""
raise NotImplementedError

@abc.abstractmethod
def flush(self):
def flush(self) -> None:
"""Flush any buffered writes."""
raise NotImplementedError

@abc.abstractmethod
def close(self):
def close(self) -> None:
"""Close the Writer, no more writes will be possible."""
raise NotImplementedError

def __del__(self):
def __del__(self) -> None:
self.close()

def __enter__(self):
def __enter__(self) -> AbstractWriter: # noqa: PYI034
return self

def __exit__(self, *args):
def __exit__(self, *args) -> None:
self.flush()
self.close()


class AbstractReader(with_metaclass(abc.ABCMeta, object)):
class AbstractReader(metaclass=abc.ABCMeta):
@abc.abstractmethod
def __iter__(self):
def __iter__(self) -> Iterator[Record]:
"""Return a record iterator."""
raise NotImplementedError

def close(self):
def close(self) -> None: # noqa: B027
"""Close the Reader, can be overriden to properly free resources."""
pass

def __enter__(self):
def __enter__(self) -> AbstractReader: # noqa: PYI034
return self

def __exit__(self, *args):
def __exit__(self, *args) -> None:
self.close()
17 changes: 12 additions & 5 deletions flow/record/adapter/archive.py
@@ -1,6 +1,13 @@
from __future__ import annotations

from typing import TYPE_CHECKING

from flow.record.adapter import AbstractReader, AbstractWriter
from flow.record.stream import RecordArchiver

if TYPE_CHECKING:
from flow.record.base import Record

__usage__ = """
Record archiver adapter, writes records to YYYY/mm/dd directories (writer only)
---
@@ -12,27 +19,27 @@
class ArchiveWriter(AbstractWriter):
writer = None

def __init__(self, path, **kwargs):
def __init__(self, path: str, **kwargs):
self.path = path

path_template = kwargs.get("path_template")
name = kwargs.get("name")

self.writer = RecordArchiver(self.path, path_template=path_template, name=name)

def write(self, r):
def write(self, r: Record) -> None:
self.writer.write(r)

def flush(self):
def flush(self) -> None:
# RecordArchiver already flushes after every write
pass

def close(self):
def close(self) -> None:
if self.writer:
self.writer.close()
self.writer = None


class ArchiveReader(AbstractReader):
def __init__(self, path, **kwargs):
def __init__(self, path: str, **kwargs):
raise NotImplementedError