Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Hook functionality #33

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

aneuway2
Copy link

  • hook Callable now added to flatten and unflatten
  • Fixed tests so that they point at local code instead of the installed package (which would be the last stable version)
  • Update readme for hook functionality
  • Update gitignore for PyCharm IDE

Andrew Bowman added 3 commits September 25, 2020 11:02
- `hook` Callable now added to flatten and unflatten
- Fixed tests so that they point at local code instead of the installed package (which would be the last stable version)
- Update readme for hook functionality
- Update gitignore for PyCharm IDE
@aneuway2
Copy link
Author

@ianlini lmk if theres anything you think should be changed on this or would need to be considered...otherwise this should be good to merge :-)

Copy link
Owner

@ianlini ianlini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. Your idea is cool but the word "hook" is too ambiguous here.

Correct me if I am wrong. We originally support customized key generating method (reducer and splitter), but the values cannot be changed. Therefore, you are proposing this hook for changing the values.

I think this is not related to the flatten or unflatten process, but might be good to have for saving memory usage or computing time. We can do all the things we want in one dict creation process instead of having some intermediate dict and doing some post processing. If this is the purpose, and think there are some other ways that are more flexible and pythonic.

For example, we can make a generator Flatten.items():

{key: hook(key, value) for key, value in Flatten(d).items()}

I think this will make the code much more readable. I'm actually planning to make a Flatten class like this, so please tell me your suggestion.

README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
@aneuway2
Copy link
Author

aneuway2 commented Oct 1, 2020

Thanks for your contribution. Your idea is cool but the word "hook" is too ambiguous here.

Correct me if I am wrong. We originally support customized key generating method (reducer and splitter), but the values cannot be changed. Therefore, you are proposing this hook for changing the values.

Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ?

def reducer(key, value):

I used hook as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.

I think this is not related to the flatten or unflatten process, but might be good to have for saving memory usage or computing time. We can do all the things we want in one dict creation process instead of having some intermediate dict and doing some post processing.

You've hit the nail on the head with saving memory usage. I'd prefer to do value processing as we do the packing and unpacking to save usage there.

If this is the purpose, and think there are some other ways that are more flexible and pythonic.

For example, we can make a generator Flatten.items():

{key: hook(key, value) for key, value in Flatten(d).items()}

I think this will make the code much more readable. I'm actually planning to make a Flatten class like this, so please tell me your suggestion.

I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)

Context on my current use case

My use case looks like this and is like vlookups (in excel) for api calls

  1. input flat dict
  2. if there are values that match some regex, look for an existing value in a separate "memory" dict and replace it with that value
  3. Run API calls with the unflattened dict
  4. Add memory values to the dict from API calls that come back
  5. Repeat steps 2-4 until there arent any values to look for any more

And I have a working class that looks something like this (simplified for clarity):

memory = {}
class JobProcessor:
    def __init__(self, flat_dict):
        self.completed_job = False
        self.has_lookup_value = False
        self.job = unflatten(flat_dict, splitter=self._splitter, hook=self.hook)
        self.info = self.job.get("dbg_info", {})

    def _splitter(self, flat_key):
        flat_key = re.sub(r"\[([0-9]+)\]", r".\g<1>", flat_key)  # dict[0] --> dict.0
        return flat_key.split(".")

    def _reducer(self, k1, k2):
        if k1 is None:
            return k2
        elif isinstance(k2, int) or k2.isdigit():
            return f"{k1}[{k2}]"
        else:
            return f"{k1}.{k2}"

    def hook(self, value, key):

        if not self.completed_job:
            matches_replaced = []
            rule = r"(some regex here)"
            search = re.findall(rule, value)
            for match in search:
                lookup_key = "(follows regex match groups/patterns which has the flat key in it)"
                value_in_memory = memory.get(lookup_key, {})
                if value_in_memory:
                    value = re.sub(rule, value_in_memory, value, count=1)
                    matches_replaced.append(True)
                else:
                    matches_replaced.append(False)
            if False in matches_replaced:
                self.has_lookup_value = True  # we skip this job until all values replaced (i.e not matching the regex)
        if self.completed_job:
            lookup_key = "(follows regex match groups/patterns that uses the key)"
            memory[lookup_key] = value

@ianlini
Copy link
Owner

ianlini commented Oct 4, 2020

I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)

In your code, you still need to create the key and value. Yielding items should be very fast, so I think it won't be the bottleneck in most cases.

@ianlini
Copy link
Owner

ianlini commented Oct 4, 2020

Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ?
def reducer(key, value):
I used hook as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.

Combining with splitter/reducer will be more complicated. Most of the time, people don't need this function, so they don't want to return the identical value. It's hard to make one of the return value optional.

It's also hard to explain how hook can be used because it is not abstract enough. People need to know a lot of implementation details to write the code correctly. For example, if I have a dict

d = {"a": {"b": {"c": "value}}}

People need to know that hook will be called like this:

hook(("c",), "value")
hook(("b", "c"), "value")
hook(("a", "b", "c"), "value")

If I want to hook something when the key is ("b", "c") with at least one preceding keys, I will write a hook function that will do something when the key is ("b", "c") with exactly one preceding key.

Other design issues:

  1. It's non-necessary to call hook so many times
  2. It's not intuitive to define the keys that we should do something at. I would rather use something like jsonpath to do this.
  3. If someday I change my implementation of flatten to reduce ("a", "b") first, and then ("a", "b", "c"), it will affect the definition of hook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants