BUG: series.reindex(mi) behaves different for series with Index and MultiIndex #60923

Open
3 tasks done
ssche opened this issue Feb 13, 2025 · 3 comments
Labels
Bug; Index (related to the Index class or subclasses); MultiIndex; Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@ssche
Contributor

ssche commented Feb 13, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

  • Create a series with a plain Index, and a MultiIndex to reindex against later
>>> import numpy as np
>>> import pandas as pd
>>> series = pd.Series(
...   [26.7300, 24.2550],
...   index=pd.Index([81, 82], name='a')
... )
>>> series
a
81    26.730
82    24.255
dtype: float64
>>> series.index
Index([81, 82], dtype='int64', name='a')
>>> other_index = pd.MultiIndex(
...   levels=[
...     pd.Index([81, 82], name='a'),
...     pd.Index([np.nan], name='b'),
...     pd.Index([
...       '2018-06-01', '2018-07-01'
...     ], name='c')
...   ],
...   codes=[
...     [0, 0, 1, 1],
...     [0, 0, 0, 0],
...     [0, 1, 0, 1]
...   ],
...   names=['a', 'b', 'c']
... )
>>> other_index
MultiIndex([(81, nan, '2018-06-01'),
            (81, nan, '2018-07-01'),
            (82, nan, '2018-06-01'),
            (82, nan, '2018-07-01')],
           names=['a', 'b', 'c'])
  • Reindex to the MultiIndex (other_index), which extends series.index by two more levels.
  • Unfortunately, the reindex sets all values of the original series to NaN. This can be worked around by first turning series.index into a 1-level MultiIndex.
>>> series.reindex(other_index) # every value of the series becomes NaN
a   b    c         
81  NaN  2018-06-01   NaN
         2018-07-01   NaN
82  NaN  2018-06-01   NaN
         2018-07-01   NaN
dtype: float64
  • Apply to_mi(...) to turn series.index into a 1-level MultiIndex.
  • Rerun reindex on the new series with the MultiIndex; the values are now kept and filled as expected.
>>> def to_mi(series):
...   if isinstance(series.index, pd.MultiIndex):
...     series_mi = series.index
...   else:
...     level_names = [series.index.name]
...     level_values = [series.index]
...     series_mi = pd.MultiIndex.from_arrays(level_values, names=level_names)
...   series_with_mi = pd.Series(series.values, index=series_mi, name=series.name)
...   return series_with_mi
... 
>>> series_mi = to_mi(series)
>>> series_mi
a 
81    26.730
82    24.255
dtype: float64
>>> series_mi.index
MultiIndex([(81,),
            (82,)],
           names=['a'])
>>> series_mi.reindex(other_index)
a   b    c         
81  NaN  2018-06-01    26.730
         2018-07-01    26.730
82  NaN  2018-06-01    24.255
         2018-07-01    24.255
dtype: float64

Issue Description

In the example above, series.reindex(other_index) sets all values to NaN when the series has a plain Index. However, when the series index is converted to a 1-level MultiIndex before the reindex, the values are kept and filled as expected.

In my opinion, it shouldn't matter whether the series has a 1-level MultiIndex or an Index when it is reindexed; the outcomes should be the same.
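
The expectation above can be phrased as a small check. This is only a sketch: the helper names `as_single_level_mi` and `reindex_is_consistent` are made up for illustration, and the all-NaN level b is left out for brevity. On the versions described in this report the check is expected to print False; it would print True once both paths agree.

```python
import pandas as pd


def as_single_level_mi(series: pd.Series) -> pd.Series:
    """Return a copy of `series` whose plain Index is wrapped in a 1-level MultiIndex."""
    if isinstance(series.index, pd.MultiIndex):
        return series.copy()
    mi = pd.MultiIndex.from_arrays([series.index], names=[series.index.name])
    return pd.Series(series.to_numpy(), index=mi, name=series.name)


def reindex_is_consistent(series: pd.Series, target: pd.MultiIndex) -> bool:
    """True if reindexing gives the same result for Index and 1-level MultiIndex."""
    from_index = series.reindex(target)
    from_mi = as_single_level_mi(series).reindex(target)
    return bool(from_index.equals(from_mi))


series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
target = pd.MultiIndex.from_product(
    [[81, 82], ["2018-06-01", "2018-07-01"]], names=["a", "c"]
)
print(reindex_is_consistent(series, target))
```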

As a further discussion point (here or elsewhere), this issue (and others) also raises the question of why a distinction between Index and MultiIndex is necessary (I suspect there are historical reasons). I would imagine that many issues (and much code) would go away if MultiIndex were used exclusively (even for 1-dimensional indices).

Expected Behavior

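The levels that the series index lacks (compared to other_index) are added, and the values of the original series are used to fill the new entries by matching on the shared level a. In other words, series.reindex(other_index) should give the same result as the 1-level MultiIndex variant: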

>>> series_mi.reindex(other_index)
a   b    c         
81  NaN  2018-06-01    26.730 # from index <81> of `series` (`series_mi`)
         2018-07-01    26.730 # from index <81> of `series` (`series_mi`)
82  NaN  2018-06-01    24.255 # from index <82> of `series` (`series_mi`)
         2018-07-01    24.255 # from index <82> of `series` (`series_mi`)
dtype: float64
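
Until the two paths agree, the expected result can also be produced without relying on how reindex treats a MultiIndex target. The following is a minimal sketch, not a proposed fix: `broadcast_to_levels` is a hypothetical helper, it assumes the series has a unique, named, single-level index whose name is one of the target's level names, and the all-NaN level b is omitted for brevity.

```python
import pandas as pd


def broadcast_to_levels(series: pd.Series, target: pd.MultiIndex) -> pd.Series:
    """Repeat each value of `series` for every target row with the same label.

    Hypothetical helper: assumes `series.index` is a unique single-level index
    whose name matches one level of `target`.
    """
    # Labels of the shared level, one per row of the target (with repeats).
    shared_labels = target.get_level_values(series.index.name)
    # Plain Index -> plain Index lookup; labels absent from `series` give NaN.
    values = series.reindex(shared_labels).to_numpy()
    return pd.Series(values, index=target, name=series.name)


series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
target = pd.MultiIndex.from_product(
    [[81, 82], ["2018-06-01", "2018-07-01"]], names=["a", "c"]
)
result = broadcast_to_levels(series, target)
print(result)
```

The lookup goes through `get_level_values`, so only plain Index objects are compared and the Index/MultiIndex distinction never comes into play.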

Installed Versions

INSTALLED VERSIONS

commit : 3979e95
python : 3.11.11
python-bits : 64
OS : Linux
OS-release : 6.12.11-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 3.0.0.dev0+1909.g3979e954a3.dirty
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.1.3
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : 1.4.2
fastparquet : 2024.11.0
fsspec : 2025.2.0
html5lib : 1.1
hypothesis : 6.125.2
gcsfs : 2025.2.0
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : 3.10.0
numba : 0.61.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 19.0.0
pyreadstat : 1.2.8
pytest : 8.3.4
python-calamine : None
pytz : 2025.1
pyxlsb : 1.0.10
s3fs : 2025.2.0
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : 3.10.2
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.2
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None

@ssche added the Bug, Index, MultiIndex, and Needs Triage labels on Feb 13, 2025
@rhshadrach
Member

Thanks for the report! Is it possible to shorten your example? Making it as short as possible to demonstrate the behavior would certainly be appreciated.

@ssche
Contributor Author

ssche commented Feb 13, 2025

Thanks for the feedback. I managed to shorten the example and provided additional comments.

@micheleuap

micheleuap commented Feb 17, 2025

Adding to this because I've seen this situation before.

The point is that, when calling reindex, it makes a difference whether the calling object has a regular index or an "identical" single-level multi-index.

import pandas as pd

series = pd.Series([1, 2], index=pd.Index(["x", "y"], name="lvl1"))

# the same series as above, with the index converted to multi-index
series_mi = series.copy()
series_mi.index = pd.MultiIndex.from_frame(series.index.to_frame())

# a multi-index with a level not present in the index of the series
idx = pd.MultiIndex.from_product([["x", "y"], ["m", "n"]], names=["lvl1", "lvl2"])

# this returns nans
series.reindex(idx)

# this returns a series with values [1,1,2,2]
series_mi.reindex(idx)
