Categorical order in hierarchical axis not respected #6452

Open
1 task done
flying-sheep opened this issue Nov 15, 2024 · 3 comments
Labels
type: enhancement Minor feature or improvement to an existing feature

Comments

@flying-sheep

flying-sheep commented Nov 15, 2024

ALL software version info

(this library, plus any other relevant software, e.g. bokeh, python, notebook, OS, browser, etc should be added within the dropdown below.)

Software Version Info
Using Python 3.12.7 environment at /home/phil/.local/share/hatch/env/virtual/dask millions of cells/cMqPOBoB/dask millions of cells
Package            Version
------------------ -----------
asttokens          2.4.1
bleach             6.2.0
bokeh              3.6.1
certifi            2024.8.30
charset-normalizer 3.4.0
colorcet           3.1.0
comm               0.2.2
contourpy          1.3.1
cycler             0.12.1
debugpy            1.8.8
decorator          5.1.1
executing          2.1.0
fonttools          4.55.0
holoviews          1.20.0
idna               3.10
ipykernel          6.29.5
ipython            8.29.0
ipywidgets         8.1.5
jedi               0.19.2
jinja2             3.1.4
jupyter-client     8.6.3
jupyter-core       5.7.2
jupyterlab-widgets 3.0.13
kiwisolver         1.4.7
linkify-it-py      2.0.3
markdown           3.7
markdown-it-py     3.0.0
markupsafe         3.0.2
matplotlib         3.9.2
matplotlib-inline  0.1.7
mdit-py-plugins    0.4.2
mdurl              0.1.2
mizani             0.13.0
nest-asyncio       1.6.0
numpy              2.1.3
packaging          24.2
pandas             2.2.3
panel              1.5.4
param              2.1.1
parso              0.8.4
patsy              1.0.1
pexpect            4.9.0
pillow             11.0.0
platformdirs       4.3.6
plotnine           0.14.1
prompt-toolkit     3.0.48
psutil             6.1.0
ptyprocess         0.7.0
pure-eval          0.2.3
pygments           2.18.0
pyparsing          3.2.0
python-dateutil    2.9.0.post0
pytz               2024.2
pyviz-comms        3.0.3
pyyaml             6.0.2
pyzmq              26.2.0
requests           2.32.3
scipy              1.14.1
six                1.16.0
stack-data         0.6.3
statsmodels        0.14.4
tornado            6.4.1
tqdm               4.67.0
traitlets          5.14.3
typing-extensions  4.12.2
tzdata             2024.2
uc-micro-py        1.0.3
urllib3            2.2.3
wcwidth            0.2.13
webencodings       0.5.1
widgetsnbextension 4.0.13
xyzservices        2024.9.0

Description of expected behavior and the observed behavior

When a Bars element has multiple key dimensions (kdims), the category order of an ordered categorical column is ignored on the hierarchical axis:

Complete, minimal, self-contained example code that reproduces the issue

import pandas as pd
import holoviews as hv
hv.extension('bokeh')

cells_dtype = pd.CategoricalDtype(pd.array(["~1M", "~10M", "~100M"], dtype="string"), ordered=True)

df = pd.DataFrame(dict(
    cells=cells_dtype.categories.astype(cells_dtype),
    time=pd.array([2.99, 18.5, 835.2]),
    function=pd.array(["read", "read", "read"]),
))

hv.Bars(df, ["function", "cells"], ["time"])

Stack traceback and/or browser JavaScript console output

>>> df["cells"]
0      ~1M
1     ~10M
2    ~100M
Name: cells, dtype: category
Categories (3, string): [~1M < ~10M < ~100M]
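
For reference, a small pandas/NumPy sketch of the two orderings in play. It does not involve HoloViews at all; it only shows that sorting these labels as plain strings gives a different result from the declared category order. Whether the plot falls back to exactly this lexical order is an assumption.

```python
import numpy as np
import pandas as pd

cells_dtype = pd.CategoricalDtype(["~1M", "~10M", "~100M"], ordered=True)
cells = pd.Series(["~1M", "~10M", "~100M"], dtype=cells_dtype)

# Sorting the categorical Series follows the declared category order.
categorical_order = cells.sort_values().tolist()

# Sorting the same labels as plain strings is lexicographic:
# "0" sorts before "M", so "~100M" and "~10M" come before "~1M".
lexical_order = np.sort(np.array(cells.tolist())).tolist()

print(categorical_order)  # ['~1M', '~10M', '~100M']
print(lexical_order)      # ['~100M', '~10M', '~1M']
```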

Screenshots or screencasts of the bug in action

Image

  • I may be interested in making a pull request to address this
@hoxbro
Member

hoxbro commented Nov 15, 2024

Sounds reasonable. I think the correct place to implement this is around here:

for group in grouped:
    vals = group.dimension_values(ydim, False)
    if len(vals) == 1:
        orderings[vals[0]] = [vals[0]]
    else:
        for i in range(len(vals)-1):
            p1, p2 = vals[i:i+2]
            orderings[p1] = [p2]
    if sort:
        if vals.dtype.kind in ('i', 'f'):
            sort = (np.diff(vals)>=0).all()
        else:
            sort = np.array_equal(np.sort(vals), vals)
if sort or one_to_one(orderings, ycoords):
    ycoords = np.sort(ycoords)
elif not is_cyclic(orderings):
    coords = list(itertools.chain(*sort_topologically(orderings)))
    ycoords = coords if len(coords) == len(ycoords) else np.sort(ycoords)
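
A rough sketch of what the final ordering step could look like if the category order were available at that point. `order_coords` and its `categories` parameter are hypothetical names used for illustration, not part of HoloViews. The idea is only that a known category order takes precedence over the plain sort, and that values outside the categories fall back to sorted order at the end.

```python
def order_coords(ycoords, categories=None):
    """Hypothetical helper: order coordinates by `categories` when given.

    Without categories this is a plain sort of the unique values, which is
    roughly what the np.sort fallback does in the snippet above.
    """
    values = list(dict.fromkeys(ycoords))  # unique values, first-seen order
    if categories is None:
        return sorted(values)
    present = set(values)
    known = set(categories)
    # Keep only categories that actually occur, in their declared order.
    ordered = [c for c in categories if c in present]
    # Values that are not declared categories go last, sorted.
    leftovers = sorted(v for v in values if v not in known)
    return ordered + leftovers


print(order_coords(["~10M", "~100M", "~1M"]))
# ['~100M', '~10M', '~1M']
print(order_coords(["~10M", "~100M", "~1M"], ["~1M", "~10M", "~100M"]))
# ['~1M', '~10M', '~100M']
```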

hoxbro added the "type: enhancement" (Minor feature or improvement to an existing feature) label on Nov 15, 2024
@flying-sheep
Author

You don’t use typing so I’m going to ask for a bit more info before I try my hand at this.

Does dimension_values return a pandas array/series/index complete with ExtensionDtypes or a plain numpy array where all pandas information is destroyed?

Don’t get me wrong: If it’s the latter, I understand, it’s better to be independent of pandas to support alternatives like polars, but that’d mean that the information about categories and their order has to be passed down somewhere.
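
To illustrate the concern, here is a pandas-only sketch of what is lost when a categorical column is converted to a plain NumPy array, and how the order can be restored if the categories are carried along separately. It says nothing about what dimension_values actually returns; that remains the open question.

```python
import numpy as np
import pandas as pd

categories = ["~1M", "~10M", "~100M"]
cells = pd.Series(
    pd.Categorical(["~10M", "~1M", "~100M"], categories=categories, ordered=True)
)

# Converting to NumPy keeps the labels but drops the categorical dtype,
# and with it the category order.
as_numpy = np.asarray(cells)
has_categories = isinstance(as_numpy.dtype, pd.CategoricalDtype)

# If the categories are passed along separately, the order can be rebuilt.
restored = pd.Categorical(as_numpy, categories=cells.cat.categories, ordered=True)
restored_order = restored.sort_values().tolist()

print(has_categories)  # False
print(restored_order)  # ['~1M', '~10M', '~100M']
```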

@philippjfr
Member

> You don’t use typing so I’m going to ask for a bit more info before I try my hand at this.

Really mostly an artifact of most of the code preceding the introduction of typing. Wish we'd change that, but that's obviously a huge lift.

> Does dimension_values return a pandas array/series/index complete with ExtensionDtypes or a plain numpy array where all pandas information is destroyed?

We could use dataset.interface.dtype(dataset, dimension) to check if it's a category or extension dtype and then look up the category order on that.
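
A minimal sketch of the dtype check described above, assuming that for a pandas-backed dataset the dtype lookup hands back the pandas dtype of the column. The sketch does not call HoloViews; it takes the dtype straight from a DataFrame, and `category_order` is a hypothetical helper name.

```python
import pandas as pd


def category_order(dtype):
    """Hypothetical helper: return the category labels in their declared
    order if `dtype` is a pandas categorical dtype, otherwise None."""
    if isinstance(dtype, pd.CategoricalDtype):
        return list(dtype.categories)
    return None


cells_dtype = pd.CategoricalDtype(["~1M", "~10M", "~100M"], ordered=True)
df = pd.DataFrame({
    "cells": pd.Series(["~1M", "~10M", "~100M"], dtype=cells_dtype),
    "time": [2.99, 18.5, 835.2],
})

print(category_order(df["cells"].dtype))  # ['~1M', '~10M', '~100M']
print(category_order(df["time"].dtype))   # None
```

The `ordered` flag on the dtype is available as `dtype.ordered` if unordered categoricals should be treated differently.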
