🍻 Open Brewery DB Dataset

This is the open-source dataset behind Open Brewery DB. The data is served by a REST API built with Ruby on Rails.

🎯 Purpose

Provide an approval-based pipeline to update the dataset and API.

🗄 Data Formats

🚀 Getting Started

  1. git clone git@github.com:openbrewerydb/openbrewerydb.git
  2. cd openbrewerydb && npm install

⚙️ Scripts

The following npm scripts help maintain and manage the dataset:

Data Management

  • npm run validate

    • Validates all CSV files against the JSON Schema
    • Checks for required fields and data format consistency
    • Reports any validation errors that need attention
  • npm run csv:combine

    • Combines all individual CSV files from country/state-region folders into a single breweries.csv
    • Useful when you've made changes to individual state files and need to update the main dataset
  • npm run csv:split

    • Splits the main breweries.csv into separate files by country/state-region
    • Helps maintain organized, manageable data files for each region
    • Creates directories if they don't exist
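
To make the combine/split round trip concrete, here is a minimal Python sketch of the same idea. It is not the project's npm implementation. The file layout `data/<country>/<state_province>.csv`, the lowercase-with-underscores naming, and the function names are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def _slug(value: str) -> str:
    """Lowercase a folder/file name and replace spaces with underscores."""
    return value.strip().lower().replace(" ", "_")


def split_csv(main_csv: Path, data_dir: Path) -> list[Path]:
    """Write one CSV per (country, state_province) under data_dir."""
    with main_csv.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        groups = defaultdict(list)
        for row in reader:
            groups[(_slug(row["country"]), _slug(row["state_province"]))].append(row)

    written = []
    for (country, state), rows in sorted(groups.items()):
        target = data_dir / country / f"{state}.csv"
        target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written


def combine_csv(data_dir: Path, main_csv: Path) -> int:
    """Concatenate every regional CSV under data_dir into main_csv; return row count."""
    rows, fieldnames = [], None
    for path in sorted(data_dir.glob("*/*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            fieldnames = fieldnames or reader.fieldnames
            rows.extend(reader)
    with main_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames or [])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Splitting and then combining should give back the same rows, which is a quick way to check that no brewery was lost along the way.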

Data Generation

  • npm run generate:ids

    • Creates unique OBDB IDs for each brewery based on name and city
    • Automatically updates breweries.csv with new IDs
    • Ensures no duplicate IDs exist in the dataset
  • npm run generate:json

    • Converts breweries.csv into a JSON format (breweries.json)
    • Useful for applications that prefer working with JSON data
    • Maintains data consistency across formats
  • npm run generate:sql

    • Creates a PostgreSQL SQL file from breweries.csv
    • Includes table creation and data insertion statements
    • Useful for loading the dataset into a PostgreSQL database
  • npm run generate:stats

    • Generates comprehensive dataset statistics
    • Shows brewery counts by state/city
    • Displays brewery type distribution
    • Reports data completeness metrics
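
As an illustration of what "IDs based on name and city, with no duplicates" can look like, here is a small Python sketch. The actual OBDB ID format is not described in this README, so the slug format, the numeric suffix for collisions, and the function names below are all hypothetical.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """ASCII-fold, lowercase, and join alphanumeric runs with hyphens."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")


def assign_ids(rows: list[dict]) -> list[dict]:
    """Fill in a missing 'id' for each row without ever reusing an existing one."""
    used = {row["id"] for row in rows if row.get("id")}
    for row in rows:
        if row.get("id"):
            continue  # keep IDs that were assigned earlier
        # Hypothetical format: slug of "<name> <city>"
        base = slugify(f"{row['name']} {row['city']}")
        candidate, suffix = base, 2
        while candidate in used:  # disambiguate identical name + city
            candidate = f"{base}-{suffix}"
            suffix += 1
        row["id"] = candidate
        used.add(candidate)
    return rows
```

Existing IDs are left untouched in this sketch, so rerunning it on an already processed file changes nothing.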

Contributor Management

  • npm run contributors:add

    • Interactive CLI tool to add new contributors
    • Prompts for contributor information and contribution type
    • Updates .all-contributorsrc file
  • npm run contributors:check

    • Verifies if any contributors are missing from the list
    • Helps maintain accurate recognition of all contributors
  • npm run contributors:generate

    • Updates the Contributors section in README.md
    • Generates contributor table with avatars and contribution types

Workflow

  • npm run workflow:maintain
    • Comprehensive maintenance workflow that:
      1. Validates all CSV files
      2. Combines all CSV files
      3. Generates new IDs if needed
      4. Creates JSON and SQL files
      5. Splits back into individual state files
    • Run this after making any dataset updates
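
The ordering of these steps matters: validation runs first so that bad rows never reach the generated files. The sketch below shows one way to chain the scripts from Python and stop at the first failure. The script names come from the sections above, but which scripts `workflow:maintain` actually calls internally is an assumption, as are the helper names.

```python
import subprocess
from typing import Callable, Sequence

# Step order taken from the list above; the exact scripts that
# workflow:maintain invokes internally are an assumption.
MAINTENANCE_STEPS = (
    "validate",
    "csv:combine",
    "generate:ids",
    "generate:json",
    "generate:sql",
    "csv:split",
)


def run_npm_script(script: str) -> int:
    """Run `npm run <script>` and return its exit code."""
    return subprocess.run(["npm", "run", script]).returncode


def maintain(steps: Sequence[str] = MAINTENANCE_STEPS,
             runner: Callable[[str], int] = run_npm_script) -> list[str]:
    """Run each step in order, stopping at the first failure.

    Returns the steps that completed successfully.
    """
    completed = []
    for step in steps:
        if runner(step) != 0:
            print(f"Step '{step}' failed; skipping the remaining steps.")
            break
        completed.append(step)
    return completed
```

Passing the runner as a parameter keeps the ordering logic testable without invoking npm.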

🤝 Contributing

For information on contributing to this project, please see the contributing guide and our code of conduct.

  1. Fork the repository
  2. Add or update breweries in the CSV files (editable in Excel or Google Sheets)
  3. Submit a Pull Request

Tips

First and foremost, don't worry about messing up! 🙂 Thank you so much for contributing! 🙌

  • CSVs are organized by data/[country]/[state_province]
  • Required fields/columns: name, brewery_type, city, state_province, and country
  • When adding a brewery, do not include an id. This will be created after review.
  • Please either add to breweries.csv (preferred if adding breweries for a new country) or the individual state/province CSV file. Adding to both at the same time may introduce duplicates/errors.
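
If you want to sanity-check a row before opening a pull request, the tips above can be expressed as a short Python check. This is only a pre-submit sketch, not the project's JSON Schema validation (`npm run validate` remains the authoritative check); the function name is hypothetical.

```python
# Required columns as listed in the tips above.
REQUIRED_FIELDS = ("name", "brewery_type", "city", "state_province", "country")


def check_new_row(row: dict) -> list[str]:
    """Return a list of problems found in a brewery row proposed for addition."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing required field: {field}")
    # New entries should leave the id empty; it is created after review.
    if (row.get("id") or "").strip():
        problems.append("new rows should not include an id")
    return problems
```

An empty list means the row passes these basic checks.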

👾 Community

📫 Feedback

If you have any feedback, please email me.

Cheers! 🍻

📊 Project Status

  • Status: Active
  • Last Dataset Update: 2024
  • Maintenance: Actively maintained through community contributions
  • Dataset Size: 8,000+ breweries
  • Coverage: United States, with growing international data

🔧 Requirements

  • Node.js v18 or higher
  • npm package manager
  • Git

📚 Data Schema

Each brewery entry contains the following fields:

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| id | String | Unique identifier | Yes |
| name | String | Name of the brewery | Yes |
| brewery_type | String | Type of brewery (micro, regional, brewpub, etc.) | Yes |
| street | String | Street address | No |
| city | String | City | Yes |
| state_province | String | State/Province | Yes |
| postal_code | String | Postal code | Yes |
| country | String | Country | Yes |
| longitude | String | Decimal longitude coordinate | No |
| latitude | String | Decimal latitude coordinate | No |
| phone | String | Phone number | No |
| website_url | String | Website URL | No |

📖 Usage Examples

Python

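import pandas as pd

# Read the CSV; dtype=str keeps every column as text, matching the schema,
# so postal codes keep leading zeros and phone numbers are not parsed as numbers
breweries_df = pd.read_csv('breweries.csv', dtype=str)

# Filter by state
california_breweries = breweries_df[breweries_df['state_province'] == 'California']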

JavaScript/Node.js

const fs = require('fs');

// Read JSON
const breweries = JSON.parse(fs.readFileSync('breweries.json', 'utf8'));

// Filter by type
const microBreweries = breweries.filter(b => b.brewery_type === 'micro');

SQL

-- After importing breweries.sql
SELECT name, city, state_province
FROM breweries
WHERE brewery_type = 'brewpub'
ORDER BY state_province, city;

🔄 Versioning

The dataset is updated regularly through community contributions. Each update goes through the following process:

  1. Community members submit new breweries or updates via pull requests
  2. Changes are reviewed and validated
  3. Upon approval, changes are merged and new dataset files are generated
  4. The API is automatically updated with the new data

Latest dataset version: 2024.1

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Mike Putnam: 🔣
  • Andrew A. Barber: 🔣
  • Jason Allen: 🔣
  • Juicob: 🔣
  • Will Karnasiewicz: 🔣
  • Dylan T. Vavra: 🔣
  • Madison Martinez: 🔣
  • Daniel Eremchuk: 🔣
  • Alex Chong: 🔣
  • Matt S: 🔣
  • Samuel Rusher: 🔣
  • Evan Caraway: 🔣
  • Tyler K Kuromiya Parker: 🔣
  • Chris Mears: 💬 💻 🔣 🚧 📆 🔧
  • donkeyslaps: 🔣
  • Pranav Davar: 🔧
  • Alexandre Hernandes Barrozo: 🔣
  • Resten: 🔣
  • Matt Higgins: 🔣
  • Alex Justesen: 🔣
  • Craig Kelly: 🔣
  • Krzysztof Rewak: 🔣
  • John Baumert: 🔣
  • Charlie Cox: 🔣
  • Miles Kane: 🔣
  • Anthony Laflamme: 💻
  • Georg Engelsmann: 🔣
  • Clinton Williams: 🔣
  • Brent Busby: 🔣
  • kenster89: 🔣
  • Adilet Sarsembayev: 🔣
  • b-mc2: 🔣
  • Nicole: 🔣
  • Nicholas Hance: 🔣
  • Joachim Nilsson: 🔣
  • Alejandro Lopez Rocha: 🔣
  • zshapleigh: 🔣
  • Praval Visvanath: 🔣
  • JohnHenry: 🔣
  • Alfredo Garcia: 🔣
  • Qerewe: 🔣
  • Nathan Peters: 🔣
  • Erich Cervantez: 🔣
  • Ronald Sahagun: 🔣

This project follows the all-contributors specification. Contributions of any kind welcome!

📊 Statistics

Last updated: 2024-11-01

Overview

  • Total Breweries: 8,355
  • Data Completeness: 78.0%

🏛 Top 10 States by Brewery Count

| State | Count |
| --- | --- |
| California | 918 |
| Washington | 486 |
| Colorado | 448 |
| New York | 419 |
| Michigan | 375 |
| Texas | 352 |
| Pennsylvania | 345 |
| Florida | 312 |
| North Carolina | 307 |
| Ohio | 303 |

🍺 Brewery Types Distribution

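| Type | Count | Percentage |
| --- | --- | --- |
| micro | 4,305 | 51.5% |
| brewpub | 2,500 | 29.9% |
| planning | 684 | 8.2% |
| regional | 225 | 2.7% |
| closed | 216 | 2.6% |
| contract | 192 | 2.3% |
| large | 90 | 1.1% |
| proprietor | 69 | 0.8% |
| bar | 37 | 0.4% |
| taproom | 20 | 0.2% |
| nano | 13 | 0.2% |
| beergarden | 3 | 0.0% |
| location | 1 | 0.0% |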
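
A distribution like the one above can be recomputed from the dataset in a few lines. The following is a sketch of the calculation, not the code behind `npm run generate:stats`; the function name is hypothetical, and it assumes each row is a dict with a `brewery_type` key (for example, rows from `csv.DictReader` or from breweries.json).

```python
from collections import Counter


def type_distribution(rows: list[dict]) -> list[tuple[str, int, float]]:
    """Return (brewery_type, count, percentage) sorted by descending count."""
    counts = Counter(row["brewery_type"] for row in rows)
    total = sum(counts.values())
    return [
        (brewery_type, count, round(100 * count / total, 1))
        for brewery_type, count in counts.most_common()
    ]
```

As a check against the table, 4,305 micro breweries out of 8,355 total works out to 51.5%.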

🌆 Top 10 Cities by Brewery Count

| City | Count |
| --- | --- |
| Denver, Colorado | 92 |
| San Diego, California | 91 |
| Portland, Oregon | 85 |
| Seattle, Washington | 80 |
| Chicago, Illinois | 64 |
| Austin, Texas | 49 |
| Houston, Texas | 40 |
| San Francisco, California | 39 |
| Minneapolis, Minnesota | 38 |
| Cincinnati, Ohio | 34 |

📋 Data Completeness by Field

| Field | Completeness |
| --- | --- |
| name | 100.0% |
| brewery_type | 100.0% |
| city | 100.0% |
| state_province | 100.0% |
| postal_code | 100.0% |
| country | 100.0% |
| address_1 | 91.0% |
| phone | 90.0% |
| website_url | 86.0% |
| longitude | 72.0% |
| latitude | 72.0% |
| address_2 | 1.0% |
| address_3 | 0.0% |
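
Per-field completeness is the share of rows with a non-empty value in that field. The sketch below shows one way to compute it. How the stats script defines the overall "Data Completeness" figure is not stated here, so treating it as the unweighted mean of the per-field percentages is an assumption, and both function names are hypothetical.

```python
def field_completeness(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Percentage of rows with a non-empty value, per field."""
    total = len(rows)
    result = {}
    for field in fields:
        # Missing keys, None, and whitespace-only strings all count as empty.
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        result[field] = round(100 * filled / total, 1) if total else 0.0
    return result


def overall_completeness(per_field: dict[str, float]) -> float:
    """Unweighted mean of the per-field percentages (an assumed definition)."""
    return round(sum(per_field.values()) / len(per_field), 1) if per_field else 0.0
```

For example, four rows of which only one has a phone number give 25.0% completeness for `phone`.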