
Commit

Spell TL;DR consistently in blog posts
szarnyasg committed Nov 29, 2023
1 parent ac5cb07 commit d6ea72a
Showing 26 changed files with 27 additions and 29 deletions.
2 changes: 1 addition & 1 deletion _posts/2021-01-25-full-text-search.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5._
+_TL;DR: DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5._

Searching through textual data stored in a database can be cumbersome, as SQL does not provide a good way of formulating questions such as "Give me all the documents about __Mallard Ducks__": string patterns with `LIKE` will only get you so far. Despite SQL's shortcomings here, storing textual data in a database is commonplace. Consider the table `products (id INT, name VARCHAR, description VARCHAR`) - it would be useful to search through the `name` and `description` columns for a website that sells these products.

2 changes: 1 addition & 1 deletion _posts/2021-05-14-sql-on-pandas.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames._
+_TL;DR: DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames._

Recently, an article was published [advocating for using SQL for Data Analysis](https://hakibenita.com/sql-for-data-analysis). Here at team DuckDB, we are huge fans of [SQL](https://en.wikipedia.org/wiki/SQL). It is a versatile and flexible language that allows the user to efficiently perform a wide variety of data transformations, without having to care about how the data is physically represented or how to do these data transformations in the most optimal way.

2 changes: 1 addition & 1 deletion _posts/2021-06-25-querying-parquet.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format._
+_TL;DR: DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format._

Apache Parquet is the most common "Big Data" storage format for analytics. In Parquet files, data is stored in a columnar-compressed binary format. Each Parquet file stores a single table. The table is partitioned into row groups, which each contain a subset of the rows of the table. Within a row group, the table data is stored in a columnar fashion.

2 changes: 1 addition & 1 deletion _posts/2021-08-27-external-sorting.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB, a free and Open-Source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory._
+_TL;DR: DuckDB, a free and Open-Source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory._

Database systems use sorting for many purposes, the most obvious purpose being when a user adds an `ORDER BY` clause to their query.
Sorting is also used within operators, such as window functions.
2 changes: 1 addition & 1 deletion _posts/2021-10-13-windowing.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB, a free and Open-Source analytical data management system, has a state-of-the-art windowing engine
+_TL;DR: DuckDB, a free and Open-Source analytical data management system, has a state-of-the-art windowing engine
that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages._

Window functions (those using the `OVER` clause) are important tools for analysing data series,
2 changes: 1 addition & 1 deletion _posts/2021-10-29-duckdb-wasm.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser.
+_TL;DR: [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser.
It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js.
You can try it in your browser at [shell.duckdb.org](https://shell.duckdb.org) or on [Observable](https://observablehq.com/@cmudig/duckdb)._

2 changes: 1 addition & 1 deletion _posts/2021-11-12-moving-holistic.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-_TLDR: DuckDB, a free and Open-Source analytical data management system, has a windowing API
+_TL;DR: DuckDB, a free and Open-Source analytical data management system, has a windowing API
that can compute complex moving aggregates like interquartile ranges and median absolute deviation
much faster than the conventional approaches._

2 changes: 1 addition & 1 deletion _posts/2021-12-03-duck-arrow.md
@@ -7,7 +7,7 @@ excerpt_separator: <!--more-->

---

-*TLDR: The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs.*
+_TL;DR: The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs._

This post is a collaboration with and cross-posted on [the Arrow blog](https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/).
<!--more-->
2 changes: 1 addition & 1 deletion _posts/2022-01-06-time-zones.md
@@ -5,7 +5,7 @@ author: Richard Wesley
excerpt_separator: <!--more-->
---

-*TLDR: The DuckDB ICU extension now provides time zone support.*
+_TL;DR: The DuckDB ICU extension now provides time zone support._

Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary.
The most well supported library for locale-specific operations is the [International Components for Unicode (ICU)](https://icu.unicode.org).
2 changes: 1 addition & 1 deletion _posts/2022-03-07-aggregate-hashtable.md
@@ -5,7 +5,7 @@ author: Hannes Mühleisen and Mark Raasveldt
excerpt_separator: <!--more-->
---

-*TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.*
+_TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups._


Grouped aggregations are a core data analysis command. It is particularly important for large-scale data analysis (“OLAP”) because it is useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized parallel aggregation capability for fast and scalable summarization.
2 changes: 1 addition & 1 deletion _posts/2022-05-27-iejoin.md
@@ -5,7 +5,7 @@ author: Richard Wesley
excerpt_separator: <!--more-->
---

-*TL;DR: DuckDB has fully parallelised range joins that can efficiently join millions of range predicates.*
+_TL;DR: DuckDB has fully parallelised range joins that can efficiently join millions of range predicates._

Range intersection joins are an important operation in areas such as
[temporal analytics](https://www2.cs.arizona.edu/~rts/tdbbook.pdf),
2 changes: 1 addition & 1 deletion _posts/2022-07-27-art-storage.md
@@ -11,7 +11,7 @@ excerpt_separator: <!--more-->
/>


-*TLDR: DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing database loading times (up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap.*
+_TL;DR: DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing database loading times (up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap._

<!--more-->

2 changes: 1 addition & 1 deletion _posts/2022-09-30-postgres-scanner.md
@@ -6,7 +6,7 @@ excerpt_separator: <!--more-->
---


-*TLDR: DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data.*
+_TL;DR: DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data._

<!--more-->

2 changes: 1 addition & 1 deletion _posts/2022-10-12-modern-data-stack-in-a-box.md
@@ -11,7 +11,7 @@ excerpt_separator: <!--more-->
width=200
/>

-*TLDR: A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of [DuckDB](https://duckdb.org/), [Meltano](https://meltano.com/), [dbt](https://www.getdbt.com/), and [Apache Superset](https://superset.apache.org/).*
+_TL;DR: A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of [DuckDB](https://duckdb.org/), [Meltano](https://meltano.com/), [dbt](https://www.getdbt.com/), and [Apache Superset](https://superset.apache.org/)._

This post is a collaboration with Jacob Matson and cross-posted on [dataduel.co](https://www.dataduel.co/modern-data-stack-in-a-box-with-duckdb/).

2 changes: 1 addition & 1 deletion _posts/2022-10-28-lightweight-compression.md
@@ -10,7 +10,7 @@ excerpt_separator: <!--more-->
width=200px
/>

-*TLDR: DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression.*
+_TL;DR: DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression._

When working with large amounts of data, compression is critical for reducing storage size and egress costs. Compression algorithms typically reduce data set size by **75-95%**, depending on how compressible the data is. Compression not only reduces the storage footprint of a data set, but also often **improves performance** as less data has to be read from disk or over a network connection.

4 changes: 1 addition & 3 deletions _posts/2023-02-24-jupysql.md
@@ -22,9 +22,7 @@ jupyter:
name: python3
---

-## TLDR
-
-[JupySQL](https://github.com/ploomber/jupysql) provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger than memory datasets in matplotlib.
+_TL;DR: [JupySQL](https://github.com/ploomber/jupysql) provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger than memory datasets in matplotlib._

<!--more-->
## Introduction
2 changes: 1 addition & 1 deletion _posts/2023-03-03-json.md
@@ -5,7 +5,7 @@ author: Laurens Kuiper
excerpt_separator: <!--more-->
---

-*TL;DR: We've recently improved DuckDB's JSON extension so JSON files can be directly queried as if they were tables.*
+_TL;DR: We've recently improved DuckDB's JSON extension so JSON files can be directly queried as if they were tables._

<img src="/images/blog/jason-duck.jpg" alt="JSON is not scary anymore! Jason IS scary though, even as a duck." width=180/>

2 changes: 1 addition & 1 deletion _posts/2023-04-14-h2oai.md
@@ -5,7 +5,7 @@ author: Tom Ebergen
excerpt_separator: <!--more-->
---

-*TL;DR: We've resurrected the H2O.ai database-like ops benchmark with up to date libraries and plan to keep re-running it.*
+_TL;DR: We've resurrected the H2O.ai database-like ops benchmark with up to date libraries and plan to keep re-running it._


[Skip directly to the results](#results)
2 changes: 1 addition & 1 deletion _posts/2023-04-21-swift.md
@@ -5,7 +5,7 @@ author: Tristan Celder
excerpt_separator: <!--more-->
---

-*TL;DR: DuckDB now has a native Swift API. DuckDB on mobile here we go!*
+_TL;DR: DuckDB now has a native Swift API. DuckDB on mobile here we go!_


Today we’re excited to announce the [DuckDB API for Swift](https://github.com/duckdb/duckdb-swift). It enables developers on Swift platforms to harness the full power of DuckDB using a native Swift interface with support for great Swift features such as strong typing and concurrency. The API is available not only on Apple platforms, but on Linux too, opening up new opportunities for the growing Swift on Server ecosystem.
2 changes: 1 addition & 1 deletion _posts/2023-04-28-spatial.md
@@ -5,7 +5,7 @@ author: Max Gabrielsson
excerpt_separator: <!--more-->
---

-*TL;DR: DuckDB now has an official [Spatial extension](https://github.com/duckdb/duckdb_spatial) to enable geospatial processing*
+_TL;DR: DuckDB now has an official [Spatial extension](https://github.com/duckdb/duckdb_spatial) to enable geospatial processing._


Geospatial data has become increasingly important and prevalent in modern-day applications and data engineering workflows, with use-cases ranging from location-based services to environmental monitoring.
2 changes: 1 addition & 1 deletion _posts/2023-07-07-python-udf.md
@@ -12,7 +12,7 @@ excerpt_separator: <!--more-->
width=100
/>

-*TLDR: DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB's fast execution model, SQL and data safety.*
+_TL;DR: DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB's fast execution model, SQL and data safety._

User Defined Functions (UDFs) enable users to extend the functionality of a Database Management System (DBMS) to perform domain-specific tasks that are not implemented as built-in functions. For instance, users who frequently need to export private data can benefit from an anonymization function that masks the local part of an email while preserving the domain. Ideally, this function would be executed directly in the DBMS. This approach offers several advantages:

2 changes: 1 addition & 1 deletion _posts/2023-08-04-adbc.md
@@ -12,7 +12,7 @@ excerpt_separator: <!--more-->
width=100
/>

-*TLDR: DuckDB has added support for [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/0.5.1/index.html), an API standard that enables efficient data ingestion and retrieval from database systems, similar to [Open Database Connectivity (ODBC)](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application.*
+_TL;DR: DuckDB has added support for [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/0.5.1/index.html), an API standard that enables efficient data ingestion and retrieval from database systems, similar to [Open Database Connectivity (ODBC)](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application._

Database interface standards allow developers to write application code that is independent of the underlying database management system (DBMS) being used. DuckDB has supported two standards that have gained popularity in the past few decades: [the core interface of ODBC](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/interface-conformance-levels?view=sql-server-ver16) and [Java Database Connectivity (JDBC)](https://en.wikipedia.org/wiki/Java_Database_Connectivity). Both interfaces are designed to fully support database connectivity and management, with JDBC being catered for the Java environment. With these APIs, developers can query DBMS agnostically, retrieve query results, run prepared statements, and manage connections.

4 changes: 2 additions & 2 deletions _posts/2023-08-23-even-friendlier-sql.md
@@ -3,14 +3,14 @@
layout: post
title: "Even Friendlier SQL with DuckDB"
author: Alex Monahan

---

<img src="/images/blog/ai_generated_star_trek_rubber_duck.png"
alt="Looks like a Duck ready to boldly go where databases have not gone before"
width=200
/>

-TLDR; DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before!
+_TL;DR: DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before!_

Who says that SQL should stay frozen in time, chained to a 1999 version of the specification? As a comparison, do folks remember what JavaScript felt like before Promises? Those didn’t launch until 2012! It’s clear that innovation at the programming syntax layer can have a profoundly positive impact on an entire language ecosystem.

2 changes: 1 addition & 1 deletion _posts/2023-09-15-asof-joins-fuzzy-temporal-lookups.md
@@ -4,7 +4,7 @@ title: "DuckDB's AsOf Joins: Fuzzy Temporal Lookups"
author: Richard Wesley
---

-*TLDR: DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics.*
+_TL;DR: DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics._

Do you have time series data that you want to join,
but the timestamps don't quite match?
2 changes: 1 addition & 1 deletion _posts/2023-10-27-csv-sniffer.md
@@ -12,7 +12,7 @@ excerpt_separator: <!--more-->
width="300"
/>

-*TLDR: DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a nice and pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV dialect options, column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format.*
+_TL;DR: DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a nice and pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV dialect options, column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format._

There are many different file formats that users can choose from when storing their data. For example, there are performance-oriented binary formats like Parquet, where data is stored in a columnar format, partitioned into row-groups, and heavily compressed. However, Parquet is known for its rigidity, requiring specialized systems to read and write these files.

2 changes: 1 addition & 1 deletion _posts/2023-11-03-db-benchmark-update.md
@@ -6,7 +6,7 @@ excerpt_separator: <!--more-->
---


-*TL;DR: the H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size.*
+_TL;DR: the H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size._


[Skip directly to the results](#results)
