Intake-DuckDB


DuckDB Plugin for Intake

Installation

From PyPI

pip install intake-duckdb

Or conda-forge

conda install -c conda-forge intake-duckdb

Usage

Load an entire table into a dataframe

import intake

source = intake.open_duckdb("path/to/dbfile", "tablename")
df = source.read()

Or run a custom SQL query written in valid DuckDB syntax

source = intake.open_duckdb("path/to/dbfile", "SELECT col1, col2 FROM tablename")
df = source.read()

You can also iterate over a table in chunks

source_chunked = intake.open_duckdb("path/to/dbfile", "tablename", chunks=10)
source_chunked.discover()
for chunk in source_chunked.read_chunked():
    # do something
    ...
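
To make the chunking idea concrete, here is a minimal sketch of how a table could be split into offset/limit windows. It assumes that `chunks` means the number of partitions rather than the number of rows per chunk, which is an assumption worth checking against the plugin documentation. The helper `chunk_bounds` is hypothetical and is not part of intake-duckdb.

```python
import math


def chunk_bounds(n_rows: int, chunks: int) -> list[tuple[int, int]]:
    """Split n_rows into at most `chunks` (offset, limit) windows.

    Hypothetical helper for illustration only; not the plugin's code.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    # Rows per window, rounded up so every row is covered.
    size = math.ceil(n_rows / chunks)
    bounds = []
    offset = 0
    while offset < n_rows:
        limit = min(size, n_rows - offset)
        bounds.append((offset, limit))
        offset += limit
    return bounds


# Each pair could back a query such as
# "SELECT * FROM tablename LIMIT <limit> OFFSET <offset>".
windows = chunk_bounds(10, 3)
```

Under that assumption, the last window may be shorter than the others, and fewer than `chunks` windows are produced when the rows do not divide evenly.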

DuckDB catalog: create an Intake catalog from a DuckDB backend

cat = intake.open_duckdb_cat("path/to/dbfile")

# list the sources in 'cat'
list(cat)

df = cat["tablename"].read()
df_chunks = list(cat["tablename"](chunks=10).read_chunked())

Run DuckDB queries on other Intake sources (that produce pandas DataFrames) within the same catalog

# cat.yaml
sources:
  csv_source:
    args:
      urlpath: https://data.csv
    description: Remote CSV source
    driver: csv

  duck_source:
    args:
      targets:
        - csv_source
      sql_expr: SELECT col FROM csv_source LIMIT 10
    description: Source referencing other sources in catalog
    driver: duckdb_transform

# Load the catalog, then read the result of the DuckDB query as a DataFrame
cat = intake.open_catalog("cat.yaml")
duck_source = cat.duck_source.read()