Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: upgrade pgai extension to 0.7.0 #291

Merged
merged 1 commit into from
Jan 22, 2025
Merged

feat: upgrade pgai extension to 0.7.0 #291

merged 1 commit into from
Jan 22, 2025

Conversation

smoya
Copy link
Contributor

@smoya smoya commented Jan 20, 2025

No description provided.

@CLAassistant
Copy link

CLAassistant commented Jan 20, 2025

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@smoya smoya force-pushed the feat/upgradePgai branch 7 times, most recently from 67622e8 to 892220a Compare January 20, 2025 15:52
Dockerfile Outdated
@@ -74,13 +74,15 @@ RUN set -ex; \
python3-dev \
apache-arrow-dev \
py3-pip; \
# Upgrade pip to 24.3.1 to match the version in the ha timescaledb-docker image.
python3 -m pip install --upgrade pip==24.3.1 --break-system-packages; \
Copy link
Contributor Author

@smoya smoya Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without --break-system-packages, it complains about being under a managed python venv. Not sure if this is the really idiomatic way to go or rather I should activate the venv somehow from here.

@@ -5,7 +5,7 @@ ORG=timescaledev
PG_VER=pg16
PG_VER_NUMBER=$(shell echo $(PG_VER) | cut -c3-)
PG_MAJOR_VERSION=$(shell echo $(PG_VER_NUMBER) | cut -d. -f1)
ifeq ($(shell test $(PG_MAJOR_VERSION) -ge 17; echo $$?),0)
ifeq ($(shell test $(PG_MAJOR_VERSION) -ge 16; echo $$?),0)
Copy link
Contributor Author

@smoya smoya Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the implications on this change? Without it, errors in pg16 build appear such as:

#16 53.42   In file included from /tmp/pip-install-ylxps7gq/pyarrow_996747a81b7645e4b8ae6232db0fd67b/pyarrow/src/arrow/python/arrow_to_pandas.cc:54:
#16 53.42   /tmp/pip-install-ylxps7gq/pyarrow_996747a81b7645e4b8ae6232db0fd67b/pyarrow/src/arrow/python/decimal.h:66:33: error: 'Decimal32' has not been declared

I've tried running the following test in order to understand why exactly this change suggested fixes the issue but didn't have any success yet:

  • Test 1: Upgrade only to alpine 3.21, keep same clang version.
    • Both Failing since clang19 is somehow required - Details.
  • Test 2. Keep same alpine version, upgrade clang version (to 18, since 19 is not available in alpine 3.20)
    • PG16: make: clang-15: No such file or directory . Details.
    • PG17: Error on apk add postgres: 1.128 postgresql17-plpython3 . Details.

I still don't understand why the compilation of pgvector asks for a clang version different than the actual specified in the Makefile 🤔. Is it Postgres who's expecting such a clang version? It seems so since the error message comes from executing a Makefile from Postgres source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we find a better alternative, upgrading to Alpine 3.21 seems like a plan. However, I believe we are not using such a version in pg16 because of a reason.

Perhaps @svenklemm can share more insights 🙏

Copy link
Member

@svenklemm svenklemm Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing alpine version will require some adjustments in timescaledb CI but I think upgrading alpine is lesser evil than not having to adjust our ci. We pinned the alpine version to make backwards compat checks easier. This was causing problems cause the different alpine versions have their ssl stuff in different packages. But latest timescaledb versions are not affected anyway as we changed the linking to no longer link directly against ssl but indirectly through postgres.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, and for future readers, the reason why the error about libarrow is gone when upgrading to Alpine v3.21 is because the libarrow package on such an Alpine version contains headers that are compatible with the required pgai pyarrow dep (requires 19.0.0), meanwhile those headers of the libarrow package installed in Alpine v3.20 aren't compatible.

Thanks @JamesGuthrie for pointing that out 🙏 💯

Copy link
Contributor Author

@smoya smoya Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svenklemm are you ok if we set the upgrade of alpine version here in this PR (as it is now) or want me do create a separate PR for that instead and merge it before merging this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine to do in this one

Dockerfile Outdated
git clone --branch ${PGAI_VERSION} https://github.com/timescale/pgai.git /build/pgai; \
cd /build/pgai; \
# note: this is a hack. pyarrow will be built from source, so must be pinned to this arrow version \
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements.txt; \
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements-lock.txt; \
Copy link
Contributor Author

@smoya smoya Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved from having a requirements.txt file to requirements-lock.txt file with locked dependencies (built by uv).
Even though without this change the install() from build.py will consider deps in requirements.txt to be additionally installed (See install_old_py_deps(), this change is necessary for consistency sake; If tomorrow we stop considering installing old dependencies from requirements.txt file, this will fail.

@smoya smoya force-pushed the feat/upgradePgai branch 3 times, most recently from 73b034b to 515201a Compare January 21, 2025 11:08
@smoya smoya force-pushed the feat/upgradePgai branch 2 times, most recently from f30d35d to a84c66a Compare January 21, 2025 17:14
Dockerfile Outdated
if [ "$TARGETARCH" == "386" ]; then \
# note: pinned because pandas 2.2.0-2.2.3 on i386 is affected by https://github.com/pandas-dev/pandas/issues/59905 \
echo pandas==2.1.4 >> ./projects/extension/requirements.txt; \
sed -i '/dependencies= \[/a \ "pandas==2.1.4",' projects/extension/pyproject.toml; \
Copy link
Contributor Author

@smoya smoya Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up adding the dependency to the pyproject.toml file as pip install is not considering requirements-lock.txt.

cc @JamesGuthrie @jgpruitt @Askir

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative option (which doesn't require hacking around in the pyproject.toml) is to create a contstraints.txt file and get pip to use that (by setting the PIP_CONSTRAINT env var):

echo "pandas==2.1.4" > constraints.txt
export PIP_CONSTRAINT=$(pwd)/constraints.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative option (which doesn't require hacking around in the pyproject.toml) is to create a contstraints.txt file and get pip to use that (by setting the PIP_CONSTRAINT env var):

echo "pandas==2.1.4" > constraints.txt
export PIP_CONSTRAINT=$(pwd)/constraints.txt

TIL! 🧠

I actually like your solution with constraints more than the hack because it exactly does what we need.
Changing it right now. Thanks for such a great suggestion @JamesGuthrie

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YIL (yesterday I learned!) I also like it more than what we did before.

git clone --branch ${PGAI_VERSION} https://github.com/timescale/pgai.git /build/pgai; \
cd /build/pgai; \
# note: this is a hack. pyarrow will be built from source, so must be pinned to this arrow version \
echo pyarrow==$(pkg-config --modversion arrow) >> ./projects/extension/requirements.txt; \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore since we get the headers from the apache-arrow-dev package and install from source using them. Am I right or perhaps missing something @JamesGuthrie ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still needed. Adding this to the requirements ensures that we install the version of pyarrow which corresponds to the version of libarrow which is available on the current platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still needed. Adding this to the requirements ensures that we install the version of pyarrow which corresponds to the version of libarrow which is available on the current platform.

Good point 💯

Copy link
Contributor

@cevian cevian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with these changes but we should make pyarrow global very soon

@smoya smoya merged commit d7dbb90 into main Jan 22, 2025
6 checks passed
@smoya smoya deleted the feat/upgradePgai branch January 22, 2025 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants