WIP: Pre-review for v1.3.0 release #31

Open: wants to merge 173 commits into master


@ankostis commented Oct 11, 2019

Continuation of #30, containing review-fixes in huyng#1.

  • ENH: heavily reinforced exception annotations ("jetsam"):
    • FIX: (8f3ec3a) outer graphs/ops do not override the inner cause.
    • ENH: retrofitted exception-annotations as a single dictionary, to print it in one shot
      (8f3ec3a & 8d0de1f)
    • enh: more data collected in that dictionary
  • REFACT(net): rename Delete-->Evict, remove Placeholder from nadanodes, privatize node-classes.
  • ENH(sideffects): make them always DIFFERENT from regular DATA, so the two can co-exist.
  • TCs: pytest now checks that the sphinx site builds without any warnings.
  • FIX(plan): multithreading was broken due to the shared `plan.executed`; moved it to the new class `Solution`.
  • graphop in docs
  • typo(test): overriden-->overriDDen
  • DROP sideffects
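The `plan.executed` fix above can be illustrated with a minimal sketch, assuming a toy plan that is just a tuple of `(name, needs, provides, function)` steps. The plan stays immutable and shared, while all mutable per-run state lives in a fresh object created on every call, so concurrent calls cannot clobber each other. The `Solution` name follows the bullet above, but its fields and the `Plan.execute()` method are hypothetical, not graphkit's actual classes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from operator import mul, sub
from typing import Any, Callable, Dict, List, Set, Tuple

# A step is (name, needs, provides, function); hypothetical simplification.
Step = Tuple[str, Tuple[str, ...], str, Callable[..., Any]]


@dataclass
class Solution:
    """Per-call mutable state: computed values and executed step names."""
    values: Dict[str, Any]
    executed: Set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Plan:
    """Immutable plan, safe to share between threads."""
    steps: Tuple[Step, ...]

    def execute(self, inputs: Dict[str, Any], outputs: List[str]) -> Dict[str, Any]:
        # A fresh Solution per call: nothing mutable is shared between threads.
        sol = Solution(values=dict(inputs))
        for name, needs, provides, fn in self.steps:
            sol.values[provides] = fn(*(sol.values[n] for n in needs))
            sol.executed.add(name)
        return {k: sol.values[k] for k in outputs}


plan = Plan(steps=(
    ("mul1", ("a", "b"), "ab", mul),
    ("sub1", ("a", "ab"), "a_minus_ab", sub),
    ("abspow1", ("a_minus_ab",), "abs_a_minus_ab_cubed", lambda x: abs(x) ** 3),
))

# Same shape as the multithreading reproduction used later in this thread.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(
        lambda _: plan.execute({"a": 2, "b": 5}, ["a_minus_ab", "abs_a_minus_ab_cubed"]),
        range(100),
    ))
```

Every call returns `{'a_minus_ab': -8, 'abs_a_minus_ab_cubed': 512}`, because no thread ever sees another thread's partial values.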

were writing in text mode in PY3, and failing with an encoding error.
Tokens work as usual while solving the DAG, but
they are never assigned any values to/from the operation's functions.
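The sideffect-token behavior just described can be sketched as follows, assuming a hypothetical `sideffect` marker class and a hypothetical `call_op()` helper (not graphkit's API): the tokens stay in `needs`/`provides` for DAG solving, but are filtered out both when building the function's arguments and when mapping its results back to names.

```python
class sideffect(str):
    """Hypothetical marker for a dependency that orders the DAG but carries no value."""
    __slots__ = ()


def call_op(fn, needs, provides, solution):
    # Sideffects only take part in dependency resolution: they are filtered
    # out before calling, so they never become function arguments...
    args = [solution[n] for n in needs if not isinstance(n, sideffect)]
    results = fn(*args)
    if not isinstance(results, tuple):
        results = (results,)
    # ...and they are skipped when mapping returned values back to names.
    real_provides = [p for p in provides if not isinstance(p, sideffect)]
    solution.update(zip(real_provides, results))


solution = {"a": 3, "b": 4}
call_op(
    lambda a, b: a + b,
    needs=["a", sideffect("b_was_logged"), "b"],
    provides=["c", sideffect("c_was_logged")],
    solution=solution,
)
```

Afterwards `solution` holds only `a`, `b` and `c == 7`; no value was looked up or stored for either sideffect.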

+ TC included.
+ Docs updated.
+ Added a `modifiers` superclass to simplify identification code.
+ refactored FunctionalOperation._compute().
+ _norm_kwargs:
  + use isinstance() instead of type-equality checks,
    to support the new modifier classes;
  + avoid excessive dict searches with local vars.
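A minimal sketch of why a common `modifier` superclass plus `isinstance()` helps, assuming modifiers are `str` subclasses (the class bodies and the `split_needs()` helper are hypothetical): one check finds any modifier, and subclasses are still recognized, which a `type(n) == optional` comparison would miss.

```python
class modifier(str):
    """Hypothetical common base for all dependency modifiers."""
    __slots__ = ()


class optional(modifier):
    __slots__ = ()


class sideffect(modifier):
    __slots__ = ()


def split_needs(needs):
    """Separate plain needs from modified ones with isinstance() checks."""
    plain, optionals, sideffects = [], [], []
    for n in needs:
        # isinstance() also accepts future subclasses, unlike type(n) == optional.
        if isinstance(n, optional):
            optionals.append(n)
        elif isinstance(n, sideffect):
            sideffects.append(n)
        else:
            plain.append(n)
    return plain, optionals, sideffects


needs = ["a", optional("b"), sideffect("log"), "c"]
plain, opts, ses = split_needs(needs)
is_modified = [isinstance(n, modifier) for n in needs]  # [False, True, True, False]
```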
not really needed; better to be explicit about which modifier is searched.
receiving partial inputs, needed for other operations.
+ The x2 TCs added just before are now passing.
NOTE: dicts are not ordered in Python < 3.6,
so this commit does not improve determinism
on those Python versions.

+ build: add `boltons` dependency for `IndexedSet`.
+ doc: mark every set usage that affects determinism.
+ e.g. see reproducibility problem in yahoo#14.
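The determinism concern above comes from iterating plain `set`s: for strings, their order can change between interpreter runs because of hash randomization. The PR uses boltons' `IndexedSet` for an insertion-ordered set; the sketch below substitutes `dict.fromkeys()` (ordered by the language since Python 3.7) to avoid the extra dependency, so it shows the idea rather than the PR's code.

```python
def ordered_unique(items):
    """Deduplicate, keeping first-seen order (dict order is guaranteed since Python 3.7)."""
    return list(dict.fromkeys(items))


def ordered_union(*iterables):
    """Union of several iterables, in first-seen order."""
    return ordered_unique(x for it in iterables for x in it)


unique_nodes = ordered_unique(["c", "a", "b", "a", "c"])  # ['c', 'a', 'b']
merged = ordered_union(["sub1", "mul1"], ["mul1", "abspow1"])  # ['sub1', 'mul1', 'abspow1']
```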
needed to refactor the new pruning algorithm.

- 2 TCs are expected to fail for the yet-to-be-solved yahoo#24 & yahoo#25 bugs.
override intermediate data.

More changes only for newer pruning TCs:
+ refact(test): rename graph-->netop vars for results of compose(),
  to avoid `graph.net.graph`.
+ Explain failure modes in v1.2.4 & this merged branch (yahoo#19 + yahoo#23).
not after compose().

+ All TCs pass ok.
+ NOTE this is not yet what is described in yahoo#21.
to pass the new TC checking that DeleteInst instructions vary when inputs change.

- x4 TCs still failing; they need a revamp of the DAG solution.
+ Read the next doc-only commit to understand changes.
+ Renamed:
  + net.steps --> net.execution_plan.
  + (old)compile() --> _build_execution_plan()
  + _find_necessary_steps() --> (new)compile() + _solve_dag()
    compile() became the master function invoking _solve_dag() &
    _build_execution_plan(), and does the caching.
+ refact(compute()): extract common tasks from sequential/parallel.
+ refact show_layers() to allow full printing, also returning a string
  (not just printing), and using custom classes for representation.
+ Raise AssertionError when an invalid class is in the plan;
  it's a logic error, not a language type-error.
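The split described above (`compile()` as the caching entry point over `_solve_dag()` and `_build_execution_plan()`) can be sketched as below. This is a toy under stated assumptions: ops are given as `name -> (needs, provides)` in dependency order, and `solve_calls` plus `check_plan()` are hypothetical helpers added only to show cache hits and the AssertionError policy.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

OpSpec = Tuple[List[str], List[str]]  # (needs, provides)


class Network:
    """Toy network: ops are declared in dependency (topological) order."""

    def __init__(self, ops: Dict[str, OpSpec]):
        self.ops = ops
        self._cache: Dict[Tuple[FrozenSet[str], FrozenSet[str]], List[str]] = {}
        self.solve_calls = 0  # counts cache misses, for illustration only

    def _solve_dag(self, inputs: Set[str], outputs: Set[str]) -> Set[str]:
        """Return the names of the ops needed to produce `outputs` from `inputs`."""
        self.solve_calls += 1
        needed: Set[str] = set()
        wanted = outputs - inputs
        # Walk consumers before producers, so one pass collects all ancestors.
        for name in reversed(list(self.ops)):
            needs, provides = self.ops[name]
            if wanted & set(provides):
                needed.add(name)
                wanted |= set(needs) - inputs
        return needed

    def _build_execution_plan(self, needed: Set[str]) -> List[str]:
        """Order the needed ops as declared."""
        return [name for name in self.ops if name in needed]

    def compile(self, inputs, outputs) -> List[str]:
        """Solve the DAG and build the plan once per (inputs, outputs) pair."""
        key = (frozenset(inputs), frozenset(outputs))
        if key not in self._cache:
            needed = self._solve_dag(set(inputs), set(outputs))
            self._cache[key] = self._build_execution_plan(needed)
        return self._cache[key]


def check_plan(plan) -> None:
    """Reject plan steps of unexpected classes."""
    for step in plan:
        if not isinstance(step, str):
            # Only a bug in the planner could produce such a step, so this is
            # a logic error (AssertionError), not a caller's TypeError.
            raise AssertionError(f"Invalid step in plan: {step!r}")


net = Network({
    "mul1": (["a", "b"], ["ab"]),
    "sub1": (["a", "ab"], ["a_minus_ab"]),
    "abspow1": (["a_minus_ab"], ["abs_a_minus_ab_cubed"]),
})
plan = net.compile(["a", "b"], ["a_minus_ab"])  # ['mul1', 'sub1']
same_plan = net.compile(["b", "a"], ["a_minus_ab"])  # served from the cache
```

Keying the cache on frozensets makes the order in which inputs and outputs are passed irrelevant.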
Probably an unreported bug in v1.2.4 for `_neccessary_steps_cache`.
+ Pruning behaves correctly also when outputs given;
  this happens by breaking incoming provide-links
  to any given intermediate inputs.
+ Unsatisfied-detection now also includes operations left without outputs
  due to the broken links (above).
+ Remove some unneeded "glue" from unsatisfied-detection code,
  leftover from previous compile() refactoring.
+ Renamed satisfiable --> satisfied.
+ Improved the error message raised when an unknown output is requested.
+ x3 TCs PASS, x1 in yahoo#24 and the first x2 in yahoo#25.
- 1x TC in yahoo#25 still FAILs, and needs "pinning" of given inputs
  (the operation MUST and MUST NOT run in these cases).
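The pruning fix described above ("breaking incoming provide-links to any given intermediate inputs") can be sketched on a plain edge set, assuming data and operation nodes share one namespace and edges run `data -> op` for needs and `op -> data` for provides. The `prune()` function and graph encoding are hypothetical; satisfiability checks are left out.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source, target): data->op for needs, op->data for provides

EDGES: Set[Edge] = {
    ("a", "mul1"), ("b", "mul1"), ("mul1", "ab"),
    ("a", "sub1"), ("ab", "sub1"), ("sub1", "a_minus_ab"),
    ("a_minus_ab", "abspow1"), ("abspow1", "abs_a_minus_ab_cubed"),
}


def prune(edges: Set[Edge], inputs: Iterable[str], outputs: Iterable[str]) -> Set[str]:
    """Return the nodes needed to compute `outputs`, given `inputs`."""
    given = set(inputs)
    # Break the incoming provide-links of every given input, so its producers
    # are not pulled in (and cannot override the given value).
    preds: Dict[str, Set[str]] = {}
    for src, dst in edges:
        if dst not in given:
            preds.setdefault(dst, set()).add(src)
    # Everything still upstream of the requested outputs must be kept.
    keep: Set[str] = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(preds.get(node, ()))
    return keep


# With the intermediate "ab" given, "mul1" (and "b") are pruned away.
kept = prune(EDGES, {"a", "ab"}, ["a_minus_ab"])
```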
numpy was used just for its `assert_raises`.
+ FEAT(base): `jetsam()` machinery to collect from locals().
+ enh(net): jetsam in compute().
+ enh: shorten annotation-names.
+ doc(net): explain why eviction cannot clean the whole solution
  (drop old todo).
+ refact(func): homogenize `resultS` in _compute() with other ops.
+ DROP(DOC): cloned debugging sections were
  not removed from composition.rst.
+ doc(debug): minor enhancements.
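The `jetsam()` machinery can be sketched as a helper that salvages selected locals into a single dictionary on the exception, using `setdefault()` so that outer graphs/ops never override what an inner frame recorded (the FIX noted at the top of this PR). The signature, the `jetsam` attribute name and the `run_op()`/`run_graph()` callers are all hypothetical.

```python
def jetsam(ex: BaseException, locs: dict, *names: str, **renames: str) -> BaseException:
    """Attach selected locals to `ex` as one dict, keeping inner annotations."""
    notes = getattr(ex, "jetsam", None)
    if notes is None:
        notes = ex.jetsam = {}
    for name in names:
        notes.setdefault(name, locs.get(name))
    for key, var in renames.items():
        # setdefault: an outer frame never overrides what an inner one recorded.
        notes.setdefault(key, locs.get(var))
    return ex


def run_op(op_name: str, x: int) -> float:
    y = x - x
    try:
        return 10 / y
    except Exception as ex:
        raise jetsam(ex, locals(), "y", operation="op_name")


def run_graph(graph_name: str, x: int) -> float:
    try:
        return run_op("div1", x)
    except Exception as ex:
        # Also offers "operation", but the inner value "div1" is kept.
        raise jetsam(ex, locals(), "x", operation="graph_name")


try:
    run_graph("graph", 3)
except ZeroDivisionError as ex:
    print(ex.jetsam)  # {'y': 0, 'operation': 'div1', 'x': 3}
```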
FAILs due to missing image in composition.rst.
... shortcut instead of demanding the usual singular lists.

+ change some sample calls in the docs.
+ ENH(net): add forgotten `outputs` jetsam earlier in compute().
+ refact: had to rename JETSAMs to take advantage of 1-1 mappings.
+ refact: updated jetsam usage to reduce source-lines.
+ renamed jetsam(arg-names).
+ refact(net): use same loop-var in add_op(),
  to avoid copy-paste mistakes, like above.

@huyng (Contributor) commented Oct 14, 2019

I haven't merged the previous commits because they introduced that multithreading bug.

This PR is growing quite large and is combining several issues.

It would be better to break this up into several PRs:

  1. fixes for BUG: overriding intermediate data when no outputs asked #25 and BUG: pruning when output given unjustly drops ancestors #24 while not introducing the multithreading bug
  2. additional plotting & debugging niceties as long as it doesn't introduce too much complexity into the code or reduce performance of graph execution.

@ankostis (Author)

The fix for #24 & #25 is #26.
Please merge this asap.

@ankostis (Author) commented Oct 14, 2019

I will try to fix the multithreading bug here, and backport to #26.

@huyng (Contributor) commented Oct 14, 2019

PR #26 is still causing the multithreading error with the following code. If I merge it, it will break our master branch, which I'm trying to avoid. We'll have to wait until the multithreading issue is resolved. I'd recommend that you work off of your PR #26 branch to resolve the multithreading issue before moving forward with other changes, as it will make the review process easier to follow (since that PR deals primarily with issues #24 and #25).

```python
from multiprocessing.dummy import Pool
from operator import mul, sub
from graphkit import compose, operation

# Computes |a|^p.
def abspow(a, p):
    c = abs(a) ** p
    return c

# Compose the mul, sub, and abspow operations into a computation graph.
graph = compose(name="graph")(
    operation(name="mul1", needs=["a", "b"], provides=["ab"])(mul),
    operation(name="sub1", needs=["a", "ab"], provides=["a_minus_ab"])(sub),
    operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"], params={"p": 3})(abspow)
)

pool = Pool(10)
graph.set_execution_method("parallel")
pool.map(lambda i: graph({'a': 2, 'b': 5}, ["a_minus_ab","abs_a_minus_ab_cubed"]), range(100))
```

```
~/Huyng/graphkit/graphkit/network.py in <lambda>(op)
    501
    502             done_iterator = pool.imap_unordered(
--> 503                                 lambda op: (op,op._compute(cache)),
    504                                 upnext)
    505             for op, result in done_iterator:

~/Huyng/graphkit/graphkit/functional.py in _compute(self, named_inputs, outputs)
     17
     18     def _compute(self, named_inputs, outputs=None):
---> 19         inputs = [named_inputs[d] for d in self.needs if not isinstance(d, optional)]
     20
     21         # Find any optional inputs in named_inputs.  Get only the ones that

~/Huyng/graphkit/graphkit/functional.py in <listcomp>(.0)
     17
     18     def _compute(self, named_inputs, outputs=None):
---> 19         inputs = [named_inputs[d] for d in self.needs if not isinstance(d, optional)]
     20
     21         # Find any optional inputs in named_inputs.  Get only the ones that

KeyError: 'a_minus_ab'
```

@ankostis (Author)

Exactly!

I will work on the tip though, to ensure my changes remain future-proof, and will backport them to #26.

because the subgraph was taken on plain-string outputs.

+ minor: update error-message terminology.
to allow the same name to be used as regular and sideffect data node.

+ ENH: sideffect-strclass now contains its class-name.
+ DOC: extensive updates, complying with one-line docstring summaries;
  changes also on optionals.
+ ENH(TCs): +x2 elaborate TCs, check SEs differ from data.
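The change that lets the same name serve as both a regular data node and a sideffect node can be sketched as a `str` subclass whose value embeds its class name, so it never compares or hashes equal to the plain name. The exact `sideffect(<name>)` encoding below is an assumption; the source only says the string now contains the class name.

```python
class sideffect(str):
    """Hypothetical sketch: a str whose value embeds its class name."""
    __slots__ = ()

    def __new__(cls, name: str) -> "sideffect":
        return super().__new__(cls, f"sideffect({name})")


se = sideffect("a")
solution = {"a": 1, se: None}  # regular data and sideffect co-exist as keys
```

Because `se` is still a `str`, it can be stored and hashed like any other node name, while `se != "a"` keeps the two nodes apart.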
but cannot establish call correctness, relying on recent TCs.
due to shared `Plan.executed`.

+ fix: enable forgotten check in overrides-TC.

@ankostis (Author)

Just pushed a fix here for the new "multithreading" TC.

And my mistake about #26: it never shared the `executed` variable, so it passes the new "multithreading" TC; you may merge it.

Note that the plotting of `executed` has not been removed,
although it never receives any set now.

@ankostis (Author) commented Jun 10, 2021

For reference, I continued the implementation of this PR in the pygraphkit/graphtik project, which has since diverged, as described in pygraphkit#1.
