Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: deepcopy variables in clone of tax_benefit_system #1319

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

sylvainipp
Copy link

Technical changes

  • Change the copy of the variables in tax_benefit_system.clone() to a deepcopy

@sylvainipp
Copy link
Author

I think this is mainly used in survey-manager, @clallemand do you think it is appropriate ? I need it when creating a reform based on inflate_parameters (an openfisca-survey-manager function), otherwise I have a problem with variables losing their end_date.

@benjello
Copy link
Member

benjello commented Feb 7, 2025

A simple copy creates a new object but references the same memory addresses for nested objects. Changes to nested objects in the copy will affect the original.

Maybe you changed the variable on the original (or the copy) and got side effects.

But I agree, I cannot see situation where we want attributes changes transmitted between the copy and the original of a tbs.

@clallemand
Copy link
Contributor

I have no objection !

@sylvainipp
Copy link
Author

Thanks for your feedback ! The idea is that it is still possible to make the change in variables before cloning, and after it is possible to change only one of them. I don't think this transmission of changes is intentionally used by anyone, so I think it can be a minor version change even if it is technically breaking.

setup.py Outdated
@@ -70,7 +70,7 @@

setup(
name="OpenFisca-Core",
version="43.2.7",
version="43.2.8",
Copy link
Member

@benjello benjello Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a minor change not a bugfix (semver).

@sylvainipp
Copy link
Author

Hi @bonjourmauko, there are some errors in the tests due to default value of arrays, do you think it is linked to your recent changes ? I can't figure why it appears here and not in the previous PR (maybe new dependencies ?), but as you worked recently on it I wondered if you had an easy fix.

@bonjourmauko
Copy link
Member

Hi @sylvainipp, my first impression is that there is a casting problem only afecting conda. As testing in conda was recently introduced, it may well be the case that the error is pre-existing.

@@ -117,7 +117,7 @@ def _int_to_index(

"""
indices = enum_class.indices
values = numpy.array(value, copy=False)
values = numpy.asarray(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has no link with the original modification, it was to fix one of the errors in the conda test (here : https://github.com/openfisca/openfisca-core/actions/runs/13202963909/job/36859430121)

@@ -171,7 +171,7 @@ def _str_to_index(
array([1, 1], dtype=uint8)

"""
values = numpy.array(value, copy=False)
values = numpy.asarray(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem

@bonjourmauko
Copy link
Member

bonjourmauko commented Feb 13, 2025

The changes look ok. The copy=false is to reuse the memoryview when possible, to save on ram. The best way to test if copying has a noticeable impact on performance, is to run a simulation with a huge dataset with both.

(Be aware that there is no right answer, and it depends a lot on the specs of the machine runing it.)

For the test, I can see that it is wrong, because we should not mix DTypes within an array. This part looks like a hack:

choicelist = [array('first_parent', dtype='<U12'), array('second_parent', dtype='<U13'), array('child', dtype='<U5'), 0]
condlist   = [array([ True, False]), array([False,  True]), array([False, False])]

Have you tried replacing 0 with an str default value? Or with an array or either intxx or str?

I bet there is a platform-level (low level) casting done differently within conda, or something of the sort.

Or! Another option that I can think of, is that deepcopy is bypassing de/indexation of EnumArray, and that for that reason you're ending up with a mixed DType array.

In either case, we should never have as a result a mixed DType array.

Let me know how it goes :)

@sylvainipp
Copy link
Author

Thank you for the feedback ! The code lines definitely look like the source of the error that concerns the absence of common dtype, my problem is that I can't find those lines, even with a search in vscode. Could you please tell me where it is written so I can try to fix it ?

@bonjourmauko
Copy link
Member

tests/core/parameters_fancy_indexing/test_fancy_indexing.py:177
tests/core/test_dump_restore.py:19

@bonjourmauko
Copy link
Member

Looking at the message given by NumPy, it looks like we're on the right track about the cause:

TypeError: Choicelist and default value do not have a common dtype: The DType <class 'numpy.dtypes._PyLongDType'> could not be promoted by <class 'numpy.dtypes.StrDType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes._PyLongDType'>)

@sylvainipp
Copy link
Author

Thanks ! I don't understand how it is possible but I see other lines at those place :
tests/core/parameters_fancy_indexing/test_fancy_indexing.py:177 : assert_near(P.single.owner[zone], [100, 200, 200, 100])
tests/core/test_dump_restore.py:19 simulation_dumper.dump_simulation(simulation, directory)
I'm sorry if I miss something obvious but I search this 0 that is not of the good dtype and I don't manage to find it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants