Partitioneer is a Python library for managing date-partitioned Parquet data files. It provides functions for writing data to partitions, reading data back with optional filters and a date range, and looking up the first and latest partition dates.
You can install Partitioneer using pip:
pip install partitioneer
To write data to partitioned Parquet files:
from partitioneer import write_data_to_partitions
import pandas as pd
df = pd.DataFrame(...) # Your data
write_data_to_partitions(
    df,
    base_path="/path/to/data",
    date_col="date_column",
    override_existing=False,
)
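To make the idea of a date-partitioned layout concrete, here is a minimal standard-library sketch that groups records by date and computes a target path for each partition. It is not Partitioneer's implementation: the `date=YYYY-MM-DD` directory naming, the `data.parquet` file name, and the `plan_partitions` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_partitions(records, base_path, date_col):
    """Group records by date and map each group to a partition path.

    Hypothetical helper: the directory and file naming are assumptions,
    not Partitioneer's actual on-disk layout.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[date_col]].append(record)
    return {
        str(PurePosixPath(base_path) / f"date={day}" / "data.parquet"): rows
        for day, rows in sorted(groups.items())
    }


records = [
    {"date_column": "2024-01-01", "value": 10},
    {"date_column": "2024-01-02", "value": 20},
    {"date_column": "2024-01-01", "value": 30},
]
plan = plan_partitions(records, "/path/to/data", "date_column")
for path, rows in plan.items():
    print(path, len(rows))
```

Each distinct date ends up with its own file, which is what lets a reader skip whole partitions when only a date range is requested.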
To read data from partitioned Parquet files:
from partitioneer import read_data_from_partitions, PartitionFilter
df = read_data_from_partitions(
    base_path="/path/to/data",
    filters=[
        PartitionFilter("category", "in", ["A", "B"]),
        PartitionFilter("value", "greater_than", 100),
    ],
    add_partition_date=True,
    start_date="2024-01-01",
    end_date="2024-12-31",
)
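Each filter above reads as a (column, operator, value) triple. The sketch below shows one way such triples could be evaluated against plain dictionaries. It is an illustration only: `OPERATORS` and `apply_filters` are hypothetical names, only the `in` and `greater_than` operator names come from the example above, and combining filters with AND is an assumption about the semantics.

```python
import operator

# Operator names taken from the example above; any others would be guesses.
OPERATORS = {
    "in": lambda value, allowed: value in allowed,
    "greater_than": operator.gt,
}


def apply_filters(rows, filters):
    """Keep rows that satisfy every (column, op, value) filter (AND assumed)."""
    return [
        row
        for row in rows
        if all(OPERATORS[op](row[col], val) for col, op, val in filters)
    ]


rows = [
    {"category": "A", "value": 150},
    {"category": "B", "value": 50},
    {"category": "C", "value": 500},
]
kept = apply_filters(
    rows,
    [("category", "in", ["A", "B"]), ("value", "greater_than", 100)],
)
print(kept)
```

Under these assumptions only the first row survives: the second fails the value threshold and the third fails the category check.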
To get the latest or first partition date:
from partitioneer import get_latest_partition_date, get_first_partition_date
latest_date = get_latest_partition_date("/path/to/data")
first_date = get_first_partition_date("/path/to/data")
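Conceptually, finding the first or latest partition date means listing the partition directories and parsing a date from each name. The following standard-library sketch shows that idea under the assumption that partitions are stored as `date=YYYY-MM-DD` subdirectories; both the naming scheme and the `list_partition_dates` helper are hypothetical and may differ from what Partitioneer does internally.

```python
import tempfile
from datetime import date
from pathlib import Path

PREFIX = "date="  # assumed partition directory prefix


def list_partition_dates(base_path):
    """Return sorted dates parsed from `date=YYYY-MM-DD` subdirectories."""
    dates = []
    for child in Path(base_path).iterdir():
        if child.is_dir() and child.name.startswith(PREFIX):
            try:
                dates.append(date.fromisoformat(child.name[len(PREFIX):]))
            except ValueError:
                continue  # ignore directories whose suffix is not a valid date
    return sorted(dates)


# Build a throwaway directory tree to demonstrate the lookup.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("date=2024-01-01", "date=2024-03-15", "date=not-a-date", "_tmp"):
        (Path(tmp) / name).mkdir()
    dates = list_partition_dates(tmp)
    first_date, latest_date = dates[0], dates[-1]

print(first_date, latest_date)
```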
To build the package:
python setup.py sdist bdist_wheel
To upload to PyPI:
pip install twine
twine upload dist/*
Automated build and publish script:
python setup.py sdist bdist_wheel
pip install twine
twine upload dist/* --username __token__ --password <add_pypi_token_here>
rm -r ./build
rm -r ./dist
rm -r ./partitioneer.egg-info