DOC: No warning in set_index() that previous index column is removed. #60973

Open
1 task done
ncotie opened this issue Feb 20, 2025 · 3 comments · May be fixed by #60990
Labels: Docs; Indexing (Related to indexing on series/frames, not to indexes themselves); Needs Discussion (Requires discussion from core team before further action)

Comments


ncotie commented Feb 20, 2025

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html

Documentation problem

set_index(), when applied to a DataFrame which already has a data column (non-default) assigned as index, will delete this data column from the DataFrame when assigning another data column to be the index.
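A minimal sketch of the behaviour described above. The frame and its column names (`id`, `name`, `score`) are made up for illustration and do not come from the report.

```python
import pandas as pd

# Hypothetical example frame; column names are assumptions for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [10, 20, 30]})

by_id = df.set_index("id")          # "id" moves from the columns into the index
by_name = by_id.set_index("name")   # "name" becomes the index; the old "id" index is discarded

print(by_name.columns.tolist())     # ['score']
print(by_name.index.name)           # name
```

After the second call, `id` is neither a column nor part of the index.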

While I find this behaviour inappropriate, I understand that reset_index() should be used before set_index(), in which case the original index column may be preserved.
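The reset_index() workaround could look roughly like this, again using a made-up frame with assumed column names (`id`, `name`, `score`).

```python
import pandas as pd

# Hypothetical example frame; column names are assumptions for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [10, 20, 30]})

by_id = df.set_index("id")

# reset_index() turns the "id" index back into an ordinary column first,
# so it survives the following set_index() call.
by_name = by_id.reset_index().set_index("name")

print(by_name.columns.tolist())     # ['id', 'score']
```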

The problem is that the documentation for set_index() does not mention this at all, so the user is left to discover the problem and then the way to avoid it.

Suggested fix for documentation

Add a note to the set_index() documentation clarifying that when a data column is already serving as the index, setting a different data column as the index discards the previous index column unless reset_index() is called first.

@ncotie added the Docs and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 20, 2025
@SaraInCode

take

@SaraInCode SaraInCode linked a pull request Feb 23, 2025 that will close this issue
4 tasks
@rhshadrach
Member

Thanks for the report!

> The problem is that the documentation for set_index() does not mention this at all, so the user is left to discover the problem and then the way to avoid it.

The documentation states:

> The index can replace the existing index or expand on it.

Doesn't this count as a mention?
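For reference, a small sketch of the "replace" versus "expand" distinction in that sentence, using the `append` parameter of set_index(). The frame and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame; column names are assumptions for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [10, 20, 30]})
by_id = df.set_index("id")

# Default (append=False): the new index replaces the existing one.
replaced = by_id.set_index("name")

# append=True: the new column is added to the existing index, giving a MultiIndex.
expanded = by_id.set_index("name", append=True)

print(list(replaced.index.names))   # ['name']
print(list(expanded.index.names))   # ['id', 'name']
```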

@rhshadrach added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Needs Discussion (Requires discussion from core team before further action) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Feb 25, 2025
ncotie (Author) commented Feb 25, 2025

Well, in my reading, 'replacing the existing index' means substituting it in its function as index, but isn't the same as 'causes the existing index data to be deleted entirely'.

What I'm getting at is that I (and I suppose I'm not alone in this) wouldn't see it as normal that changing the role of index from one data column to another would delete data. Taken to the extreme, one could delete the DataFrame entirely by changing the index column from one to the next, until there were none left.
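The "taken to the extreme" case can be sketched as follows, with a made-up three-column frame (the names `a`, `b`, `c` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical example frame; column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

step1 = df.set_index("a")      # columns: b, c   index: a
step2 = step1.set_index("b")   # columns: c      index: b   ("a" is gone)
step3 = step2.set_index("c")   # columns: none   index: c   ("b" is gone)

print(list(step3.columns))     # []
print(step3.index.tolist())    # [5, 6]
```

Only the values of the last column chosen as index remain, and they live in the index rather than in a data column.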
