Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fix long output prints #1513

Conversation

OlawumiSalaam
Copy link
Contributor

Thank you for taking your time to contribute to Ersilia, just a few checks before we proceed

  • Have you followed the guidelines in our Contribution Guide
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Description

This PR addresses an issue where large outputs clutter the terminal, making it difficult to interpret results effectively. Specifically, some models, such as descriptors, produce outputs with thousands of columns which floods the terminal and reduce readability.

Changes to be made
Added truncate_output Function:
Handles truncation for large outputs, summarizing them with ellipses (...) and providing context about omitted parts.
Ensures clarity and concise representation of outputs.
Modified Output Handling:
Applied truncate_output to both regular and generator outputs.
Ensured the truncated version is passed to echo or print_result_table as applicable.

Steps taken to implement the fix
Created a utility function, truncate_output, within the output handling module.
Updated the generator loop and regular output sections to conditionally truncate results based on size.
Verified that truncated outputs integrate seamlessly with echo and print_result_table functions.
Status

  • Implementation complete.
  • Tested some models: eos4wt0, eos4u6p, eos7w6n, eos4avb, eos3b5e, and eos69p9.
  • All tests pass successfully when run locally.

To do

Review and address feedback

Is this pull request related to any open issue? If yes, replace issueID below with the issue ID

fixes #1455

@@ -10,6 +10,42 @@
from . import ersilia_cli


def truncate_output(output, max_items=10, max_chars=500):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this function to ersilia/utils/terminal.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay. please what is the logic behind that for learning purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function you have created is terminal utility, right? So putting it in its own module makes the code clean and managable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Abellegese that is indeed the case, @OlawumiSalaam

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Abellegese Thank you very much for this. I am still learning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats very much appreciated @OlawumiSalaam. Keep learning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DhanshreeA
I have moved the truncate_output function to ersilia/utils/terminal.py as suggested and updated the imports.

else:
echo("Something went wrong", fg="red")
else:
if as_table:
print_result_table(result)
if isinstance(result, dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this? I wouldn't touch this functionality just yet, we need to deal with print_result_table separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DhanshreeA
This is explanation of my logic, if it helps. Let me know if you want me to modify anything

if isinstance(result, types.GeneratorType):
    for result in mdl.run(input=input, output=output, batch_size=batch_size):
        if result is not None:
            truncated = truncate_output(result)  # Truncate the output
            if as_table:
                print_result_table(truncated)  # Print truncated result as table
            else:
                echo(truncated)  # Print truncated raw output
        else:
            echo("Something went wrong", fg="red")  # Print error if result is None
else:
    #truncated = truncate_output(result)  # Optional: Truncate non-generator output
    if as_table:
        print_result_table(result)  # Print the entire result as a table
    else:
        try:
            echo(truncated)  # Print truncated result
        except Exception as e:
            echo(f"Error: {e}", fg="red")  # Print error if echo fails
            print_result_table(truncated)  # Fallback: Print truncated result as table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OlawumiSalaam I mean just try to understand, in this PR you have changed 1 file, run command. I am only seeing you trunctated results in per generator, the other case is not addressed. The print output as table (specifically fixed in my PR) handle printing results from three file types, csv, json and hdf5 also a normal json displayed in the terminal. Maybe you need to create a more general trunctor to handle truncating this files output so that the print table function can display them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DhanshreeA @Abellegese
If I understand properly, your suggestion is to extend the functionality of the truncate to handle the other three file types, csv, json and hdf5 so the print table function can display them to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @OlawumiSalaam. But the files shouldn't be modified because users want them.

@DhanshreeA
Copy link
Member

@OlawumiSalaam can we write a small unit test for the truncate_output method in a test/test_utils.py file?

@OlawumiSalaam
Copy link
Contributor Author

OlawumiSalaam commented Jan 17, 2025

@OlawumiSalaam can we write a small unit test for the truncate_output method in a test/test_utils.py file?

I have written the unit test. Please review and Please let me know if there is anything else to adjust. Thanks

Copy link
Contributor

@Abellegese Abellegese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also not clear what the code is testing. So currently you applied truncation for the case the mdl.run returns generator. This generator always returns a json based output from the conventional runner, given as below:

[
{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "outcome": [
            261.313
        ]
    }
}
{
    "input": {
        "key": "HEFNNWSXXWATRW-UHFFFAOYSA-N",
        "input": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
        "text": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
    },
    "output": {
        "outcome": [
            206.28499999999997
        ]
    }
}
]

In the output: { outcome section there are several datastructure currently supported by Ersilia such as given below. In the above output case, the datastructure is Single.

{
    "Single": lambda x: isinstance(x, list) and len(x) == 1,
    "List": lambda x: isinstance(x, list)
    and len(x) > 1
    and all(isinstance(item, (str, int, float)) for item in x),
    "Flexible List": lambda x: isinstance(x, list)
    and all(isinstance(item, (str, int, float)) for item in x),
    "Matrix": lambda x: isinstance(x, list)
    and all(
        isinstance(row, list)
        and all(isinstance(item, (str, int, float)) for item in row)
        for row in x
    ),
    "Serializable Object": lambda x: isinstance(x, dict),
}

So I believe you should somehow make the truncation function to be able to work with them. Also in the unittest it there should be a test to test this structures based output.

Comment on lines 194 to 201
elif isinstance(output, str):
if len(output) > max_chars:
suffix = " ... (truncated)"
truncated = output[
: max_chars - len(suffix) - 1
].rstrip() # Adjust for space
return f"{truncated} {suffix}"
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OlawumiSalaam can you explain the case this code truncates or gets used?

Copy link
Contributor Author

@OlawumiSalaam OlawumiSalaam Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Abellegese
Thank you for your detailed feedback.
This is my first task where I am diving deeper into the Ersilia codebase, and I am still in the process of fully understanding the purpose and structure of each part of the code. Initially, I wasn’t certain whether all models produced the same type of output, so I aimed to create a generic truncation function that could handle common data structures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand you clearly now, i should modify the function to handle the supported data structures (e.g., Single, List, Matrix, Serializable Object) as shared from the feedback.

@DhanshreeA
Copy link
Member

@OlawumiSalaam This test is failing on your branch:

Traceback (most recent call last):
File "/usr/share/miniconda/envs/test/bin/ersilia", line 8, in
sys.exit(cli())
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/click/core.py", line 1161, in call
return self.main(*args, **kwargs)
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/runner/work/ersilia/ersilia/ersilia/cli/commands/init.py", line 31, in wrapper
return func(*args, **kwargs)
File "/home/runner/work/ersilia/ersilia/ersilia/cli/commands/run.py", line 79, in run
echo(truncated)
File "/home/runner/work/ersilia/ersilia/ersilia/cli/echo.py", line 85, in echo
text = emoji.emojize(text)
File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/emoji/core.py", line 219, in emojize
return pattern.sub(replace, string)
TypeError: expected string or bytes-like object

truncated is supposed to return text output. I believe that's not what's happening with your recent changes causing the echo function to fail.

@OlawumiSalaam
Copy link
Contributor Author

@DhanshreeA I will push the updated changes

@OlawumiSalaam
Copy link
Contributor Author

@DhanshreeA @Abellegese
Thank you for you help. I have addresses the feedbacks. Please kindly review the updated task. Thank you.

@OlawumiSalaam
Copy link
Contributor Author

Hi @DhanshreeA
I am struggling with my current task but I have updated the PR based on your feedback and Abel's feedback. Abel was on a call with me yesterday and he was very helpful. Although, I am working on the unit test to test the functionality. Work is a bit slow due to internet issues. I have pushed what I did so far. I want to let you know what is happening at my end.
As soon as I finish the unit test, I will update you and push it to the PR
Thank you

@DhanshreeA
Copy link
Member

@OlawumiSalaam I think the confusion should resolve once I merge @Abellegese's PR since both of you are working on terminal utilities, specifically those that go within the run command.

@DhanshreeA
Copy link
Member

@OlawumiSalaam could you look into the conflicts on this PR? I would first recommend checking out the master branch, seeing how the code works especially with the --as_table functionality now working, and then taking it from there. Specifically, I would suggest doing the following:

  1. Run a few different models as ersilia run -i "<some input>" --as_table, or even as `ersilia run -i "" -o "output.csv" --as_table"
  2. You can experiment both with a couple of different models, such as eos3b5e, eos7w6n, eos5guo etc, and with different output formats (csv, json, h5)
  3. Then figure out how the truncate_output function would fit into this picture, especially in the case of when an output file is requested, this could mean, using your function INSIDE the print_result_table function, as opposed to passing it truncated values from your function.

@OlawumiSalaam
Copy link
Contributor Author

@DhanshreeA
I have seen the way print_result_table works on the terminal. even though a table is rendered, the terminal is still clustered and not very easy to follow. When an output file is specified (e.g., -o "output.csv"): Should truncation still apply here? Or should it only apply to the terminal display?

@GemmaTuron
Copy link
Member

GemmaTuron commented Jan 22, 2025

Hi @DhanshreeA and @OlawumiSalaam

I think there is a misunderstanding here. This is not what I meant by fixing the long output printing. I do not want to modify anything on the CLI, only the logger.DEBUG
I am afraid we are introducing bugs here that we do not want. The terminal should always print the entire output upon the "run" command when no output is specified. Please do not modify anything about how the terminal is currently showing the outputs, not even -as-table or anything like that

@OlawumiSalaam
Copy link
Contributor Author

@GemmaTuron
This is well understood.

@DhanshreeA
Copy link
Member

Closing this PR as we were working on the wrong thing all along. @OlawumiSalaam will create a new PR.

@DhanshreeA DhanshreeA closed this Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Bug: Ersilia debug logs should limit printing model result
4 participants