Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generates algorithmic environment from algpseudocode #152

Merged
merged 14 commits into from
Dec 10, 2022

Conversation

ZibingZhang
Copy link
Contributor

@ZibingZhang ZibingZhang commented Dec 6, 2022

Overview

Generates algorithmic environment from algpseudocode.

Details

Creates a new class algorithm_codegen that should generate $\LaTeX$ code in the following form:

Example

    def collatz(n):
        iterations = 0
        while n > 1:
            if n % 2 == 0:
                n = n / 2
            else:
                n = 3 * n + 1
            iterations = iterations + 1
        return iterations

image

\begin{algorithmic} ... \end{algorithmic}

Things to take into consideration:

  • The configs won't be the same for this and function_codegen, at least for the time being. Namely use_signature will be disregarded if set.
  • For the time being, this functionality is exposed through get_latex(style=Style.ALGORITHM, but there should be a way to use it as @latexify.algorithm. But what should the return type be? str? LatexifiedFunction?
  • There are multiple algorithmic environments, how should be specify which ones to target? Which ones should be supported?

References

Described by this issue: #57

Blocked by

None

@ZibingZhang
Copy link
Contributor Author

@odashi let me know what you think about this approach. Looking for a bit of early feedback before continuing.

@odashi
Copy link
Collaborator

odashi commented Dec 7, 2022

@ZibingZhang Could you open the pull request instead of leaving it as a draft? I'm using PR's status to manage my taskboard and sometimes draft PRs are overlooked.

@odashi
Copy link
Collaborator

odashi commented Dec 7, 2022

Thanks! I will take a look at the code later today. Some comments about the description:

else n % 2 == 1:

This may be a typo.

@latexify.algorithm

If we provide this function, it must provide a way to correctly print the expression onto Jupyter. Since algorithm environmets are not supported by Jupyter (MathJax), we need to simulate it by ourselves.

This means that the algorithm version of "LatexifiedFunction" needs two codegens for __str__ and _repr_latex_, but this is far beyond the scope of this PR IMO.

Which environment

algorithmic is fine I think. It is okay to start considering to support other envs when they are actually requested.

At this point, I guessed the enum value should be ALGORITHMIC rather than ALGORITHM to avoid confusion in the future.

@ZibingZhang
Copy link
Contributor Author

ZibingZhang commented Dec 7, 2022

used internally only when we need to delegate the codegen logic to it

I'm not sure if this is possible, due to the recursive nature of ASTs. If you look at the first commit I had something like

def visit_SomeNode(node):
    return self._function_codegen.visit(node)

But something of this nature won't work since recursive calls will end up in the FunctionCodegen methods instead of back in the AlgorithmCodegen methods.

I think to avoid the weird is, relationship, they both need to inherit from some private base class which implements the codegen of basic expressions. I'm not sure if we can use composition here, at least I'm not able to think of a way to get it working.

@ZibingZhang ZibingZhang marked this pull request as ready for review December 7, 2022 06:33
@odashi
Copy link
Collaborator

odashi commented Dec 7, 2022

I'm not sure if this is possible

It should work as far as the subtree processing is completely delegated. Since both NodeVisitor and our FunctionCodegen are stateless and don't change the given AST, we can invoke visit as a usual function.

Unfortunately there is no private inheritance in Python, and many techniques available in other languages (such as C++) can't be used.

@ZibingZhang
Copy link
Contributor Author

ZibingZhang commented Dec 7, 2022

It should work as far as the subtree processing is completely delegated. Since both NodeVisitor and our FunctionCodegen are stateless and don't change the given AST, we can invoke visit as a usual function.

Consider visit_Attribute:

class ACodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        vstr = self.visit(node.value)
        astr = self._identifier_converter.convert(node.attr)[0]
        return vstr + "." + astr
    
    def visit_If(self, node: ast.If) -> str:
        raise Error("Unsupported")

class BCodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        return self._a_codegen.visit(node)

    def visit_If(self, node: ast.If) -> str:
        return "..."

If we have some structure like:

{ "attribute": { "value": { "if": "..." } }

The call stack would look like

BCodegen.visit_Attribute 
-> ACodegen.visit_Attribute
-> ACodegen.visit_If
-> Error

whereas the desired behavior would be

BCodegen.visit_Attribute 
-> ACodegen.visit_Attribute
-> BCodegen.visit_If

Basically any visit_Node function with a self.visit wouldn't work properly when delegated to because self refers to the delegated class.

edit: I guess this could be solved like so?

class BCodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        return ACodegen.visit_Attribute(self, node)

but at that point you may as well just use inheritance, since you're relying on ACodegen and BCodegen having the same type, and instance / class variables.

@odashi
Copy link
Collaborator

odashi commented Dec 7, 2022

Before continuing discussion, we need to understand how the Python AST is constructed. Overall, all ASTs look like as follows:

[mod subtree] -> [stmt subtree] -> [expr subtree]

where stmt subtree never generate mod and expr never generate stmt. (the official guidance)

In FunctionCodegen and AlgorithmCodegen, we need to handle stmt and expr to generate the final LaTeX, but once the codegen walks into expr, the subroutine processes only expr and it never meet any stmt branch in this recursion. That means we can completely separate expr processing into another library, and this is why I proposed ExpressionCodegen in #57.

In this PR, AlgorithmCodegen need to implement every visitor for stmt nodes by itself, but can delegate every processing for expr to another. Since the current codebase implements every rule in FunctionCodegen, we can simply delegate everything to it, and we don't need additional information to invoke expr visitors in FunctionCodegen.

@odashi
Copy link
Collaborator

odashi commented Dec 7, 2022

The contrast between stmt and expr is important I guess, because codegen rules to control the overall style (the appearance of the overall algorithm) is basically implemented by only stmt rules.

@ZibingZhang
Copy link
Contributor Author

That makes sense ty for explaining!

The only issue I can maybe see down the line if AlgorithmCodegen decides that an expression should take one form but FunctionCodegen decides it should take another, but I can't think of any examples off the top of my head, so maybe it's a non-issue.

@ZibingZhang
Copy link
Contributor Author

visit_comprehension has to be in ExpressionCodegen for some reason (didn't really look into it), but it's an ast.stmt.

Do you think it's possible to merge this early somehow (i.e. without completing AlgorithmicCodegen). Either by

  1. merging as is (following review & edits) and following up with a PR to expand upon the algorithmic codegen while adding tests
  2. splitting this PR into 2, first part just factoring out ExpressionCodegen, next part the AlgorithmicCodegen implementation

Just worried about trying to deal with future merge conflicts, since this change is changing a fundamental part of the pipeline, and is touching a bunch of files.

@ZibingZhang ZibingZhang requested a review from odashi December 7, 2022 15:21
@odashi
Copy link
Collaborator

odashi commented Dec 8, 2022

but it's an ast.stmt

I think it's AST, not stmt. This is because the class is used to aggregate a set of rules (in this case, a sub-expression of a comprehension in some other expr), and appears only within a limited context. This kind of subtree can't be treated as an expr in the AST, e.g. a + <comprehension> is illegal.

We should be able to treat this kind of subtrees safely as a part of either stmt or expr because they shouldn't violate the entire rule of the AST: stmt -> expr. In this case it is okay to assume that this is expr.

@odashi
Copy link
Collaborator

odashi commented Dec 8, 2022

ExpressionCodegen should be separated into other pull request since this is an independent feature.

@ZibingZhang
Copy link
Contributor Author

ExpressionCodegen should be separated into other pull request since this is an independent feature.

Sounds good. Will tackle this within the next few days!

@odashi
Copy link
Collaborator

odashi commented Dec 8, 2022

Thanks! It looks your repository breaks the blame history when splitting FunctionCodegen. Since it makes debugging hard, make sure that the history is preserved appropriately:
https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file

Copy link
Collaborator

@odashi odashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dummy, please re-request afterwards

node = ast.parse(
textwrap.dedent(
"""
while True:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed whitespacing issues in function_codegen_test.py to match this style, don't know what I was doing when I wrote those tests...

@ZibingZhang ZibingZhang requested a review from odashi December 10, 2022 08:56

from typing import Any, Callable

from latexify import frontend
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file should be a unit test frontend_{algorithmic?}_test.py because frontend is not intended to be used directly.
Or it'd be okay to use just exported functions and Style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it, but function_expansion_test and regression_test basically use frontend directly, as they use frontend.function.

Similarly, this file uses function.get_latex. Both of these functions are exported in __init__.py, so it feels somewhat like an integration test, because we're testing end-to-end behavior from function to generated $LaTeX$ output

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns that the issue comes from the old implementation of these tests, will consider it later.

src/integration_tests/algorithmic_style_test.py Outdated Show resolved Hide resolved
src/integration_tests/algorithmic_style_test.py Outdated Show resolved Hide resolved
src/integration_tests/algorithmic_style_test.py Outdated Show resolved Hide resolved
"""Visit a While node."""
if node.orelse:
raise exceptions.LatexifyNotSupportedError(
"Codegen does not support while statements with an else clause."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: Users don't usually understand what "codegen" (or any other inner routines) is, so simplifying the message is more user-friendly.

Suggested change
"Codegen does not support while statements with an else clause."
"While statement with the else clause is not supported."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, was trying to model it off the error message in function_codegen about only having assign nodes in multiline functions. Unsure how to change that message as well to be more user friendly

src/latexify/codegen/algorithmic_codegen_test.py Outdated Show resolved Hide resolved
src/latexify/codegen/algorithmic_codegen_test.py Outdated Show resolved Hide resolved
src/latexify/frontend.py Outdated Show resolved Hide resolved
@@ -173,7 +190,7 @@ def expression(
This function is a shortcut for `latexify.function` with the default parameter
`use_signature=False`.
"""
kwargs["use_signature"] = kwargs.get("use_signature", False)
kwargs["style"] = Style.EXPRESSION
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer need to invoke function, just invoking get_latex directly is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expression returns a LatexifiedFunction, while get_latex returns str. I think we still need the extra work that function provides to wrap the result of get_latex, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I was a bit confused! I will consider these implementations later.

Copy link
Collaborator

@odashi odashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good to merge, thanks!


from typing import Any, Callable

from latexify import frontend
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns that the issue comes from the old implementation of these tests, will consider it later.

@odashi odashi merged commit 93068ef into google:main Dec 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants