Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary Column Can't Be String #124

Closed
spencerfinnell opened this issue Sep 21, 2021 · 9 comments
Closed

Primary Column Can't Be String #124

spencerfinnell opened this issue Sep 21, 2021 · 9 comments
Assignees
Labels
component-query The Query class workflow-needs-research Needs to be researched
Milestone

Comments

@spencerfinnell
Copy link

When setting a primary column if the type of column is not an int, tinyint, etc, queries will not work. get_item_ids() always uses wp_parse_id_list which forces an integer value. This will destroy non-integer results.

Use case: Stripe object IDs are strings and it would be ideal to use those as the primary column when tracking the objects internally.

@spencerfinnell
Copy link
Author

spencerfinnell commented Sep 21, 2021

This also happens again in set_items() and shape_item_id().

@JJJ
Copy link
Collaborator

JJJ commented Sep 21, 2021

Most of Berlin is designed with the idea that the primary column is always numeric and auto-incrementing.

Stripe object IDs are strings and it would be ideal to use those as the primary column when tracking the objects internally.

Why is that ideal? I think I would still want/suggest a traditional numeric auto-incrementing id column as the primary key, and then add the Stripe object ID as the next indexed string column. What do you think?

Using auto-incrementing IDs in SQL brings some benefits:

  • Retrieves data faster/easier because each primary key is a simple, sequential integer (rather than a complex mix of other string values).
  • Avoid breaking stuff if other values change, as would happen if your primary key was a compound of other column values.
  • Ensure the uniqueness of each key without locking or race conditions, as auto-incremented IDs automatically avoid repetition and duplication.
  • Improve system performance by being a compact data type, allowing for optimized code paths that avoid skipping between columns to create keys.
  • Relationship tables (like meta data, etc...) make it easier to identify row connections with an int than a complex string.

An obvious point against numeric/autoincrement is what happened at Mt Gox back in the day.

@JJJ JJJ added component-query The Query class workflow-needs-research Needs to be researched labels Sep 21, 2021
@spencerfinnell
Copy link
Author

spencerfinnell commented Sep 22, 2021

@JJJ That all seems reasonable. I guess my main thinking was that I will always know the Stripe object ID so I can use it immediately on update and delete calls without first finding the relevant row via the Stripe object ID so I can then use auto incremented primary key.

i.e

$coupon = get_coupon_by( 'object_id', $object_id );
$db->delete_item( $coupon->id );

vs.

$db->delete_item( $object_id );

Is there enough caching in play where that doesn't matter? Or are the performance implications of a string based primary key worse?

@alexstandiford
Copy link
Collaborator

alexstandiford commented Sep 22, 2021

You can also set the stripe ID as a key in your table schema, so you'd get similar speed benefits if it were the primary key.

If you want to be able to delete a record using the Stripe object ID, I would consider extending the delete method to support the stripe ID, or create a a different method specifically for that.

I know it seems wasteful to have to fetch the record from the DB first, but that happens inside Berlin regardless, and it's caching mechanism is smart enough to only query once. It shouldn't have any significant impact on performance, but you still get the benefits mentioned by @JJJ

@ashleyfae
Copy link
Contributor

You could just use wpdb directly to delete in that instance. Their delete method supports a where clause.

$wpdb->delete( $table_name, array( 'object_id' => $object_id ), array( '%s' ) );

Translates into:

DELETE FROM my_table WHERE object_id = 'sub_123';

@alexstandiford
Copy link
Collaborator

alexstandiford commented Sep 22, 2021

What @ashleyfae is suggesting might be better, but it doesn't clear the item's cache. Make sure you take a look at the delete item method to see how it handles updating the cache for the record otherwise you'll end up with unexpected results should you make another query for that record in the same request.

@spencerfinnell
Copy link
Author

Thanks for the tips/info @ashleyfae and @alexstandiford. As with most things, after rethinking things a bit, I think I can get by with a "normal" primary key without issue. It would be nice to document this expected behavior or throw an error when things are misconfigured, similar to some of the object properties. I had to dig down a bit to see where/why things were failing.

@JJJ
Copy link
Collaborator

JJJ commented Sep 22, 2021

It would be nice to document this expected behavior or throw an error when things are misconfigured

Noted. Great feedback. 👍

JJJ added a commit that referenced this issue Jun 26, 2022
* Move filters into methods and their own section
* Improve docs, and add missing docs
* Default values for item_names to prevent fatals
* Add setup() method and move set_ methods out of __construct() and into it
* Minimize touches to this->columns for future Schema/Structure work
* Remove assumptions that primary column must be an int that uses absint/intval - see #124
* Add some todo's for MySQL 8 improvements
* Improve support for CURRENT_TIMESTAMP in relevant columns
* Add get_columns_field_by() to retrieve a single field from a matching array of values to a single key - primarily used for getting an array of column patterns when querying, to return an array of formats for sprintf()
* Improve readability of do_action_ref_array() calls
* Clean-up get_item_ids()
* Rename parse_where() to parse_query_vars()
* Add parse_where() and parse_join() – likely get renamed in the future
* Use wp_parse_list() instead of wp_parse_id_list() - likely needs its own handler
* Prevent fatals from return values of get_search_sql()
* Abstract repeated code into new get_in_sql() method to escape/prepare/format IN (%s) SQL
* Introduce parse_query_var() and use it in place of repeated query_vars[] touches - attempts to internally parse comma separated strings (might remove)
* Introduce undocumented $column->by check to allow a column to not be queried directly by its name
* Refactor parse_query_vars() to improve its internal patterns, for future abstraction
* Fallback in parse_fields() and parse_groupby() to prevent fatal errors
* Introduce parse_single_orderby, parse_limits, parse_join, and parse_where - refactor parse_orderby
* Some minor clean-up to shape_items()
* Bail early in get_item_fields() to avoid trying to filter empty fields
* Introduce validate_item_field() to call $column->validate(), and use it inside shape_item_id() and more. This centralizes validation and ensures they always return the same results.
* Update add_item() and update_item() to skip database if $save fails validation
* Update copy_item() to shape the item ID, as it is not done inside of get_item_raw()
* Update delete_item() to match other item changes above
* All _item() functions use get_columns_field_by() to get patterns to send into wpdb queries for proper formatting (including delete_all_item_meta) - see #137
* Update validate_item() to use validate_item_field()
* Use shape_item_id() in update_item_cache() - also use is_scalar() in place of is_numeric() when making assumptions about the shape of the primary column
* Stop shaping the ID inside of get_non_cached_ids(), as item IDs are (or should be) previously shaped
* Gut get_results() and make it use the query() method - this needs more work
JJJ added a commit that referenced this issue Jun 27, 2022
* Column: improvements to $pattern
* Always a string (no false)
* Add link to PHP docs
* Update description to remove "string replace"
* Set UUID to %s
* Base: prevent double prefixing in apply_prefix()
* Column: correct inline doc
* Column: rename private references from pattern to format.
* Autoloader: minor code cleanup
* Column: various improvements:
* Tons of inline & block docs
* Smarter default class values
* Column::args are saved during parse_args() for later reuse
* Prefer get_object_vars() over another array of args
* Add "extra" support to special_args()
* Add is_ methods for some other types
* Add is_extra() method for comparing extra values
* Add sanitize_extra() for allowing specific values
* Improve fallback support in sanitize_pattern()
* Improve fallback support in sanitize_validation()
* Remove function_exists check for gmdate() from validate_datetime()
* Legitimize validate_numeric() and use where appropriate
* Improve get_create_string() with support for binary, more types, null, etc...
* Base: introduce sanitize_column_name()
* Allows upper-case letters in table and column names.
* Swaps out sanitize_key() usage for a custom preg_replace: '/^[a-zA-Z0-9_\-]+$/'
* Table: use Base::sanitize_column_name()
* Also fix return value inline comment in count()
* Column: introduce validate() and validate_int()
* validate() centralizes column value validation into the most logical location, and validate_int() allows for falling back to $default in a way that intval obviously could not.
* Base: update regex.
* Base: add stash_args() method
* Also avoid errors in apply_prefix() if not a string.
* Column: use stash_args()
* Also bail early if no arguments to parse.
* Schema: add support for indexes.
* Move filters into methods and their own section
* Improve docs, and add missing docs
* Default values for item_names to prevent fatals
* Add setup() method and move set_ methods out of __construct() and into it
* Minimize touches to this->columns for future Schema/Structure work
* Remove assumptions that primary column must be an int that uses absint/intval - see #124
* Add some todo's for MySQL 8 improvements
* Improve support for CURRENT_TIMESTAMP in relevant columns
* Add get_columns_field_by() to retrieve a single field from a matching array of values to a single key - primarily used for getting an array of column patterns when querying, to return an array of formats for sprintf()
* Improve readability of do_action_ref_array() calls
* Clean-up get_item_ids()
* Rename parse_where() to parse_query_vars()
* Add parse_where() and parse_join() – likely get renamed in the future
* Use wp_parse_list() instead of wp_parse_id_list() - likely needs its own handler
* Prevent fatals from return values of get_search_sql()
* Abstract repeated code into new get_in_sql() method to escape/prepare/format IN (%s) SQL
* Introduce parse_query_var() and use it in place of repeated query_vars[] touches - attempts to internally parse comma separated strings (might remove)
* Introduce undocumented $column->by check to allow a column to not be queried directly by its name
* Refactor parse_query_vars() to improve its internal patterns, for future abstraction
* Fallback in parse_fields() and parse_groupby() to prevent fatal errors
* Introduce parse_single_orderby, parse_limits, parse_join, and parse_where - refactor parse_orderby
* Some minor clean-up to shape_items()
* Bail early in get_item_fields() to avoid trying to filter empty fields
* Introduce validate_item_field() to call $column->validate(), and use it inside shape_item_id() and more. This centralizes validation and ensures they always return the same results.
* Update add_item() and update_item() to skip database if $save fails validation
* Update copy_item() to shape the item ID, as it is not done inside of get_item_raw()
* Update delete_item() to match other item changes above
* All _item() functions use get_columns_field_by() to get patterns to send into wpdb queries for proper formatting (including delete_all_item_meta) - see #137
* Update validate_item() to use validate_item_field()
* Use shape_item_id() in update_item_cache() - also use is_scalar() in place of is_numeric() when making assumptions about the shape of the primary column
* Stop shaping the ID inside of get_non_cached_ids(), as item IDs are (or should be) previously shaped
* Gut get_results() and make it use the query() method - this needs more work
* Query: Introduce filter_search_columns()
* Switch it to using apply_filters_ref_array() - minor back-compat break
* Add direct Schema support
* Get columns directly from Schema
* Deprecate $columns var
* Introduce set_query_clause_defaults() for allowing the query & request clauses to be updated easier
* Add keys to query & request clauses
* Refactor the way that counts & searches are parsed
* Always include columns when count & groupby are used together
* Pass query_vars into more parse_ methods to further abstract their usages for future un-privating
* Override query_vars in parse_query() when counting
* Introduce parse_count()
* Rename parse_where/join to _clauses() suffix
* Pass arguments into default_item(), and use array_combine()
* Swap some var orders in prime_item_caches()
* All: update @copyright and README
@JJJ JJJ added this to the 2.1.0 milestone Jun 27, 2022
@JJJ JJJ self-assigned this Jun 27, 2022
@JJJ
Copy link
Collaborator

JJJ commented Jun 27, 2022

PR #140 merged a few enhancements for 2.1.0:

  • Primary column no longer forced to be an int
  • Primary column of ID no longer forced to id (yay wp_users:ID)
  • Column names no longer forced to be lowercase (yay wp_comments:comment_author_IP)
  • Table names no longer forced to be lowercase (eh, fine)

See #130 for discussion and work towards improving delete_item() 👍

@JJJ JJJ closed this as completed Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component-query The Query class workflow-needs-research Needs to be researched
Projects
None yet
Development

No branches or pull requests

4 participants