Pyan3: Python 3 support

Technologicat · Nov 13, 2017 · 8646855
1 parent 26faced
commit 8646855

Showing 10 changed files with 2,162 additions and 909 deletions.
diff --git a/README.md b/README.md
@@ -1,86 +1,107 @@
-pyan - Static Analysis of function and method call dependencies
-===============================================================
+# Pyan3: Offline call graph generator for Python 3
 
-`pyan` is a Python module that performs static analysis of Python code
-to determine a call dependency graph between functions and methods.
-This is different from running the code and seeing which functions are
-called and how often; there are various tools that will generate a call graph
-in that way, usually using debugger or profiling trace hooks - for example:
-https://pycallgraph.readthedocs.org/
+Generate approximate call graphs for Python programs.
 
-This code was originally written by Edmund Horner, and then modified by Juha Jeronen.
-See the notes at the end of this file for licensing info, the original blog posts,
-and links to their repositories.
+Pyan takes one or more Python source files, performs a (rather superficial) static analysis, and constructs a directed graph of the objects in the combined source, and how they define or use each other. The graph can be output for rendering, mainly by GraphViz.
 
-Command-line options
---------------------
+*And now it is available for Python 3!*
 
-*Output format* (one of these is required)
+[![Example output](graph0.png "Example: GraphViz rendering of Pyan output (click for .svg)")](graph0.svg)
 
-- `--dot` Output to GraphViz
-- `--tgf` Output in Trivial Graph Format
+**Defines** relations are drawn with *dotted gray arrows*.
 
-*GraphViz only options*
+**Uses** relations are drawn with *black solid arrows*.
 
-- Color nodes automatically (`-c` or `--colored`).
-  A HSL color model is used, picking the hue based on the top-level namespace (effectively, the module).
-  The colors start out light, and then darken for each level of nesting.
-  Seven different hues are available, cycled automatically.
-- Group nodes in the same namespace (`-g` or `--grouped`, `-e` or `--nested-groups`).
-  GraphViz clusters are used for this. The namespace name is used as the cluster label.
-  Groups can be created as standalone (`-g` or `--grouped`, always inside top-level graph)
-  or nested (`-e` or `--nested-groups`). The nested mode follows the namespace structure of the code.
+**Nodes** are always filled, and made translucent to clearly show any arrows passing underneath them. This is especially useful for large graphs with GraphViz's `fdp` filter. If colored output is not enabled, the fill is white.
 
-*Generation options*
+In **node coloring**, the [HSL](https://en.wikipedia.org/wiki/HSL_and_HSV) color model is used. The **hue** is determined by the *top-level namespace* the node is in. The **lightness** is determined by *depth of namespace nesting*, with darker meaning more deeply nested. Saturation is constant. The spacing between different hues depends on the number of files analyzed; better results are obtained for fewer files.
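The coloring rule described above can be sketched with the standard-library `colorsys` module. This is only an illustration of the idea, not Pyan's actual code: the function name `node_color`, the lightness range 0.9 to 0.4, and the `max_depth` cap are all assumptions chosen for the example.

```python
import colorsys


def node_color(namespace_index, n_namespaces, depth, max_depth=5, saturation=1.0):
    """Return an "#rrggbb" color for a node (hypothetical helper).

    Hue comes from the top-level namespace, lightness from nesting depth,
    and saturation is held constant, as in the README's description.
    """
    # Hues are spread evenly over the color wheel, so the spacing between
    # them shrinks as the number of top-level namespaces (files) grows.
    hue = namespace_index / n_namespaces
    # Deeper nesting gives a darker color; the 0.9..0.4 range is assumed.
    frac = min(depth, max_depth) / max_depth
    lightness = 0.9 - 0.5 * frac
    # Note the argument order: colorsys uses HLS, not HSL.
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    for depth in range(3):
        print(depth, node_color(0, 3, depth))
```

With three namespaces the hues land 120 degrees apart; with thirty they are only 12 degrees apart, which is why fewer files give more distinguishable colors.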
 
-- Disable generation of links for “defines” relationships (`-n` or `--no-defines`).
-  This can make the resulting graph look much clearer, when there are a lot of “uses” relationships.
-  This is especially useful for layout with `fdp`.
-  To enable (the default), use `-u` or `--defines`
-- Disable generation of links for “uses” relationships (`-N` or `--no-uses`).
-  Can be useful for visualizing just where functions are defined.
-  To enable (the default), use `-u` or `--uses`
+**Groups** are filled with translucent gray to avoid clashes with any node color.
 
-*General*
+The nodes can be **annotated** by *filename and source line number* information.
 
-- `-v` or `--verbose` for verbose output
-- `-h` or `--help` for help
+## Note
 
-Drawing Style
--------------
+The static analysis approach Pyan takes is different from running the code and seeing which functions are called and how often. There are various tools that will generate a call graph that way, usually using a debugger or profiling trace hooks, such as [Python Call Graph](https://pycallgraph.readthedocs.org/).
 
-The “defines” relations are drawn with gray arrows,
-so that it’s easier to visually tell them apart from the “uses” relations
-when there are a lot of edges of both types in the graph.
+In Pyan3, the analyzer was ported from `compiler` ([good riddance](https://stackoverflow.com/a/909172)) to a combination of `ast` and `symtable`, and slightly extended.
 
-Nodes are always filled (white if color disabled), and made translucent to clearly show arrows passing underneath them.
-This is useful for large graphs with the fdp filter.
 
-Original blog posts
--------------------
+# Usage
 
-- https://ejrh.wordpress.com/2011/12/23/call-graphs-in-python/
-- https://ejrh.wordpress.com/2012/01/31/call-graphs-in-python-part-2/
-- https://ejrh.wordpress.com/2012/08/18/coloured-call-graphs/
+See `pyan --help`.
 
+Example:
 
-Original source repositories
-----------------------------
+`pyan *.py --uses --no-defines --colored --grouped --annotated --dot >myuses.dot`
 
-- Edmund Horner's original code is now best found in his github repository at:
-  https://github.com/ejrh/ejrh/blob/master/utils/pyan.py.
-- Juha Jeronen's repository is at:
-  https://yousource.it.jyu.fi/jjrandom2/miniprojects/blobs/master/refactoring/
-- Daffyd Crosby has also made a repository with both versions, but with two files and no history:
-  https://github.com/dafyddcrosby/pyan
-- Since both original repositories have lots of other software,
-  I've made this clean version combining their contributions into my own repository just for pyan.
-  This contains commits filtered out of their original repositories, and reordered into a logical sequence:
-  https://github.com/davidfraser/pyan
+Then render using your favorite GraphViz filter, mainly `dot` or `fdp`:
 
-Licensing
----------
+`dot -Tsvg myuses.dot >myuses.svg`
 
-This code is made available under the GNU GPL, v2. See the LICENSE.md file,
-or consult https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html for more information.
+#### Troubleshooting
+
+If GraphViz complains about *trouble in init_rank*, try adding `-Gnewrank=true`, as in:
+
+`dot -Gnewrank=true -Tsvg myuses.dot >myuses.svg`
+
+Usually either the old or the new ranking works; this is a long-standing GraphViz issue with complex graphs.
+
+
+# Features
+
+*Items tagged with ☆ are new in Pyan3.*
+
+**Graph creation**:
+
+ - Nodes for functions and classes
+ - Edges for defines
+ - Edges for uses
+ - Grouping to represent defines, with or without nesting
+ - Coloring of nodes by top-level namespace
+   - Unlimited number of hues ☆
+
+**Analysis**:
+
+ - Name lookup across the given set of files
+ - Nested function definitions
+ - Nested class definitions ☆
+ - Assignment tracking with lexical scoping  
+   - E.g. if `self.a = MyFancyClass()`, the analyzer knows that any references to `self.a` point to `MyFancyClass`
+   - All binding forms are supported (assign, augassign, for, comprehensions, generator expressions) ☆
+ - Simple item-by-item tuple assignments like `x,y,z = a,b,c` ☆
+ - Chained assignments `a = b = c` ☆
+ - Local scope for lambda, listcomp, setcomp, dictcomp, genexpr ☆
+   - Keep in mind that list comprehensions gained a local scope (being treated like a function) only in Python 3. Thus, Pyan3, when applied to legacy Python 2 code, will give subtly wrong results if the code uses list comprehensions.
+ - Source filename and line number annotation ☆
+   - The annotation is appended to the node label. If grouping is off, the namespace is included in the annotation. If grouping is on, only the source filename and line number are included, because the group title already shows the namespace.
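To make the assignment-tracking item concrete, here is a minimal sketch built on the standard `ast` module. It is not Pyan's analyzer: the class `SelfAttrTracker` and the sample source are made up for illustration, and the sketch ignores scoping entirely. It only records assignments of the form `self.<name> = SomeClass(...)`.

```python
import ast

# Hypothetical input, mirroring the README's `self.a = MyFancyClass()` example.
SOURCE = '''
class MyFancyClass:
    def greet(self):
        pass

class Owner:
    def __init__(self):
        self.a = MyFancyClass()

    def run(self):
        self.a.greet()
'''


class SelfAttrTracker(ast.NodeVisitor):
    """Record which name is called to produce each `self.<attr>` value."""

    def __init__(self):
        self.bindings = {}

    def visit_Assign(self, node):
        value = node.value
        # Only handle right-hand sides that are plain calls like `Name(...)`.
        if isinstance(value, ast.Call) and isinstance(value.func, ast.Name):
            for target in node.targets:
                # Match targets of the form `self.<attr>`; `self` is matched
                # by its literal name, which is a simplification.
                if (isinstance(target, ast.Attribute)
                        and isinstance(target.value, ast.Name)
                        and target.value.id == "self"):
                    self.bindings["self." + target.attr] = value.func.id
        self.generic_visit(node)


tracker = SelfAttrTracker()
tracker.visit(ast.parse(SOURCE))
print(tracker.bindings)  # {'self.a': 'MyFancyClass'}
```

Given such a table, a later reference to `self.a.greet` could be resolved to `MyFancyClass.greet`. A real analyzer would also have to key the table by the enclosing class so that different classes do not share bindings.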
+
+## TODO
+
+ - This version is currently missing the PRs from [David Fraser's repo](https://github.com/davidfraser/pyan).
+
+The analyzer **does not currently support**:
+
+ - Nested attribute accesses like `self.a.b` (will detect as reference to `*.b` of an unknown object `self.a`).
+ - Tuples/lists as first-class values (will ignore any assignment of a tuple/list to a single name).
+ - Starred assignment `a,*b,c = d,e,f,g,h` (will detect some item from the RHS).
+ - Additional unpacking generalizations ([PEP 448](https://www.python.org/dev/peps/pep-0448/), Python 3.5+).
+ - Type hints ([PEP 484](https://www.python.org/dev/peps/pep-0484/), Python 3.5+).
+ - Methods whose first argument is not named `self` (use of `self` is detected by the literal name `self`, not by capturing the name of the first argument of a method definition).
+ - Async-specific handling (async definitions are detected, but passed through to the corresponding non-async analyzers; they could be annotated).
+ - Cython (Cython-specific code could be stripped or commented out in a preprocessing step and the result treated as Python, taking care to keep line numbers right).
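For the `self` limitation, the alternative the list hints at, capturing the name of the first argument of each method, could look roughly like the following. This is a sketch under assumptions, not part of Pyan: the helper `first_arg_names` is hypothetical, and it makes no attempt to skip `@staticmethod` or `@classmethod` definitions.

```python
import ast


def first_arg_names(source):
    """Map "Class.method" to the name of the method's first positional argument."""
    names = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional-only arguments (Python 3.8+) come before the
                # regular ones; fall back to an empty list on older versions.
                positional = list(getattr(item.args, "posonlyargs", [])) + item.args.args
                if positional:
                    names[node.name + "." + item.name] = positional[0].arg
    return names


if __name__ == "__main__":
    sample = (
        "class C:\n"
        "    def m(this, x):\n"
        "        return this.x\n"
        "    def n(self):\n"
        "        pass\n"
    )
    print(first_arg_names(sample))
```

An analyzer could consult such a table when visiting attribute accesses inside a method, treating `this.x` in `C.m` the same way it treats `self.x` elsewhere.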
+
+# Authors
+
+Original [pyan.py](https://github.com/ejrh/ejrh/blob/master/utils/pyan.py) by Edmund Horner. [Original post with explanation](http://ejrh.wordpress.com/2012/01/31/call-graphs-in-python-part-2/).
+
+[Coloring and grouping](https://ejrh.wordpress.com/2012/08/18/coloured-call-graphs/) for GraphViz output by Juha Jeronen.
+
+[Git repository cleanup](https://github.com/davidfraser/pyan/) by David Fraser.
+
+This Python 3 port and refactoring to separate modules by Juha Jeronen.
+
+# License
+
+[GPL v2](LICENSE.md), as per [comments here](https://ejrh.wordpress.com/2012/08/18/coloured-call-graphs/).
 
diff --git a/graph0.png b/graph0.png