-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: upgrade pgai extension to 0.7.0 #291
Conversation
|
67622e8
to
892220a
Compare
Dockerfile
Outdated
@@ -74,13 +74,15 @@ RUN set -ex; \ | |||
python3-dev \ | |||
apache-arrow-dev \ | |||
py3-pip; \ | |||
# Upgrade pip to 24.3.1 to match the version in the ha timescaledb-docker image. | |||
python3 -m pip install --upgrade pip==24.3.1 --break-system-packages; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without --break-system-packages
, it complains about being under a managed python venv. Not sure if this is the really idiomatic way to go or rather I should activate the venv somehow from here.
@@ -5,7 +5,7 @@ ORG=timescaledev | |||
PG_VER=pg16 | |||
PG_VER_NUMBER=$(shell echo $(PG_VER) | cut -c3-) | |||
PG_MAJOR_VERSION=$(shell echo $(PG_VER_NUMBER) | cut -d. -f1) | |||
ifeq ($(shell test $(PG_MAJOR_VERSION) -ge 17; echo $$?),0) | |||
ifeq ($(shell test $(PG_MAJOR_VERSION) -ge 16; echo $$?),0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the implications on this change? Without it, errors in pg16 build appear such as:
#16 53.42 In file included from /tmp/pip-install-ylxps7gq/pyarrow_996747a81b7645e4b8ae6232db0fd67b/pyarrow/src/arrow/python/arrow_to_pandas.cc:54:
#16 53.42 /tmp/pip-install-ylxps7gq/pyarrow_996747a81b7645e4b8ae6232db0fd67b/pyarrow/src/arrow/python/decimal.h:66:33: error: 'Decimal32' has not been declared
I've tried running the following test in order to understand why exactly this change suggested fixes the issue but didn't have any success yet:
- Test 1: Upgrade only to alpine 3.21, keep same clang version.
- Both Failing since clang19 is somehow required - Details.
- Test 2. Keep same alpine version, upgrade clang version (to 18, since 19 is not available in alpine 3.20)
I still don't understand why the compilation of pgvector asks for a clang version different than the actual specified in the Makefile 🤔. Is it Postgres who's expecting such a clang version? It seems so since the error message comes from executing a Makefile from Postgres source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we find a better alternative, upgrading to Alpine 3.21 seems like a plan. However, I believe we are not using such a version in pg16 because of a reason.
Perhaps @svenklemm can share more insights 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing alpine version will require some adjustments in timescaledb CI but I think upgrading alpine is lesser evil than not having to adjust our ci. We pinned the alpine version to make backwards compat checks easier. This was causing problems cause the different alpine versions have their ssl stuff in different packages. But latest timescaledb versions are not affected anyway as we changed the linking to no longer link directly against ssl but indirectly through postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, and for future readers, the reason why the error about libarrow is gone when upgrading to Alpine v3.21 is because the libarrow package on such an Alpine version contains headers that are compatible with the required pgai pyarrow dep (requires 19.0.0), meanwhile those headers of the libarrow package installed in Alpine v3.20 aren't compatible.
Thanks @JamesGuthrie for pointing that out 🙏 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@svenklemm are you ok if we set the upgrade of alpine version here in this PR (as it is now) or want me do create a separate PR for that instead and merge it before merging this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine to do in this one
Dockerfile
Outdated
git clone --branch ${PGAI_VERSION} https://github.com/timescale/pgai.git /build/pgai; \ | ||
cd /build/pgai; \ | ||
# note: this is a hack. pyarrow will be built from source, so must be pinned to this arrow version \ | ||
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements.txt; \ | ||
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements-lock.txt; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We moved from having a requirements.txt
file to requirements-lock.txt
file with locked dependencies (built by uv
).
Even though without this change the install()
from build.py
will consider deps in requirements.txt
to be additionally installed (See install_old_py_deps(), this change is necessary for consistency sake; If tomorrow we stop considering installing old dependencies from requirements.txt
file, this will fail.
73b034b
to
515201a
Compare
f30d35d
to
a84c66a
Compare
Dockerfile
Outdated
if [ "$TARGETARCH" == "386" ]; then \ | ||
# note: pinned because pandas 2.2.0-2.2.3 on i386 is affected by https://github.com/pandas-dev/pandas/issues/59905 \ | ||
echo pandas==2.1.4 >> ./projects/extension/requirements.txt; \ | ||
sed -i '/dependencies= \[/a \ "pandas==2.1.4",' projects/extension/pyproject.toml; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up adding the dependency to the pyproject.toml
file as pip install
is not considering requirements-lock.txt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative option (which doesn't require hacking around in the pyproject.toml) is to create a contstraints.txt
file and get pip to use that (by setting the PIP_CONSTRAINT
env var):
echo "pandas==2.1.4" > constraints.txt
export PIP_CONSTRAINT=$(pwd)/constraints.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative option (which doesn't require hacking around in the pyproject.toml) is to create a
contstraints.txt
file and get pip to use that (by setting thePIP_CONSTRAINT
env var):echo "pandas==2.1.4" > constraints.txt export PIP_CONSTRAINT=$(pwd)/constraints.txt
TIL! 🧠
I actually like your solution with constraints more than the hack because it exactly does what we need.
Changing it right now. Thanks for such a great suggestion @JamesGuthrie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YIL (yesterday I learned!) I also like it more than what we did before.
a84c66a
to
c558cf5
Compare
git clone --branch ${PGAI_VERSION} https://github.com/timescale/pgai.git /build/pgai; \ | ||
cd /build/pgai; \ | ||
# note: this is a hack. pyarrow will be built from source, so must be pinned to this arrow version \ | ||
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements.txt; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed anymore since we get the headers from the apache-arrow-dev
package and install from source using them. Am I right or perhaps missing something @JamesGuthrie ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still needed. Adding this to the requirements ensures that we install the version of pyarrow which corresponds to the version of libarrow which is available on the current platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still needed. Adding this to the requirements ensures that we install the version of pyarrow which corresponds to the version of libarrow which is available on the current platform.
Good point 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with these changes but we should make pyarrow global very soon
c558cf5
to
1b86dad
Compare
Signed-off-by: Sergio Moya <[email protected]>
1b86dad
to
d5ce28e
Compare
No description provided.