-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean Chains Stack Traces and consolidate logging config. Fixes BT-13465 BT-13378 #1353
Conversation
425c2fb
to
0878078
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
) | ||
local_config_handler.LocalConfigHandler.set_dynamic_config( | ||
definitions.DYNAMIC_CHAINLET_CONFIG_KEY, json.dumps(chainlet_to_predict_url) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, this doesn't have to have all the predict URLs of dependencies populated before mounting and running? I noticed you put entrypoint_artifact
at the end of the chainlet_artifacts
list but what about the topology of other Chainlets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chains are always DAGs by construction and _get_ordered_dependencies
gives you a topological sorting.
That's also how it worked before we had dynamic config map, because then all URLs were directly baked into the generated truss config.
@@ -117,6 +121,8 @@ async def run_remote( | |||
), | |||
simple_default_arg: list[str] = ["a", "b"], | |||
) -> tuple[int, str, int, shared_chainlet.SplitTextOutput, list[str]]: | |||
logging.info("User log root.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you intend to leave these in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeas, that's in the integration test chain, I like to have this around when debugging/changing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
return not any(path in msg for path in excluded_paths) | ||
|
||
|
||
class _AccessJsonFormatter(jsonlogger.JsonFormatter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like DISABLE_JSON_LOGGING is false by default right? Does that mean that these access logs will be logged as json by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works differently: there is always access (and default) logging.
DISABLE_JSON_LOGGING
switches all log formatters from JSON (previous static behavior) to plain formatting (new, optional, non-default behavior).
For example if you run local docker integration tests, the JSON formatter escapes newlines and stack traces (like in the PR description) are basically unreadable, you need to copy the string into a python interpreter and print it in a formatted way in order to see anything - that's why I added this option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks
truss/templates/shared/log_config.py
Outdated
if record.name == "uvicorn.access" and record.args and len(record.args) == 5: | ||
client_addr, method, raw_path, version, status = record.args | ||
path_decoded = urllib.parse.unquote(str(raw_path)) | ||
new_message = f"Incoming {client_addr} - {method} {path_decoded} HTTP/{version} {status}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we need to make this change? I think if we're touching this at all we should probably exclude the IP Address either, that's been there for a long time and is not meaningful to userse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that the two formatters were inconsistent, it's now Handled request from
for both.
The for chainlets there are access logs and logs from httpx/aiohttp for outgoing requests, but the logs lines of each are not very self-explanatory in terms of whether it's outgoing or incoming. So to clearly disambiguate this, I modified the access log. Also it is not really clear from the access log if this is logged before or after the request is done (somewhat implied by having status code though).
Also the escaped paths looked crappy /v1/models/model%3Apredict
...
I'm open to changing the wording and fields though - for example we could add trace IDs (might be more useful than client addr). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, these are internal IPs so I think we should drop them (fine to do in follow-up). Trace IDs are at least useful to us, so makes sense to include
7165440
to
ac6ecca
Compare
🚀 What
Chains uses custom stack trace filtering to include only user-code and remove all framework (chains, truss, fastapi) frames from the stack (which can be quite bloated). However, this was only effective for the error response, the exception logging in truss server did not apply the same filters and lead to huge stack traces. They were noisy (-> annoying) and users didn't find the culprit (e.g. not seeing 502 errors from RPCs) or used catch all try-excepts that would prevent the log pollution but also make debugging impossible.
Therefore there is high value in adding a special case in the truss server. Even thought this is at the moment only possible with a sketchy ad-hoc import. After TaT and having both truss and chains code available in a common lib, it can be improved.
New logs (error chained over 3 hops):
Old log (same error):
This change also contains some changes to the uvicorn logging config and other touchups:
Optionally disable json logging in local docker runs (via env var). In particular JSON contained in the logs was totally unreadable locally when escaped.
Make clear in the logs what is a handled HTTP request (there can be confusion if HTTP clients on the server log outgoing requests).
There was some confusion around where logging is setup: in the server main file, inside the server and via a config dict passesd to uvicorn: it's now all done via the same config passed to uvicorn, but in order to get pre-startup logging, it has to be additionally configured in the server's
__init__
- but both times using the identical way of setting it up.Simplify chains docker deploy sequence.
💻 How
🔬 Testing