How about something like this for overload protection? [For review] #114
Conversation
Looks interesting, but wouldn't that kind of backpressure consume more memory on the traced node?
Also, how much work would it be to have the eb jar read a startup cmdline flag and then just output the contents to a shell? We could store the ModFuncs being traced from the GUI in the .erlyberly file, so that the cmdline startup option knows what to trace.
Hey @aboroska, what feature does this add which is not in the original overload_protection branch? I am cautious about queueing trace logs on the remote side, like we used to do. If there is an overload problem it should fail fast, stop dbg and ditch the message queue to preserve the node as much as possible. Queueing them may take more resources from a node already in trouble. I am planning on putting load shedding in the UI so it cannot be overloaded so easily either.
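As a minimal sketch of that fail-fast idea (stop dbg, then ditch whatever is sitting in the collector's mailbox), assuming a hypothetical module and function names; this is not existing erlyberly code:

```erlang
-module(eb_fail_fast).
-export([fail_fast/0]).

%% Stop dbg tracing entirely, then throw away any trace messages already
%% queued in this process's mailbox, freeing the memory they hold.
fail_fast() ->
    dbg:stop_clear(),
    drain_mailbox().

drain_mailbox() ->
    receive
        _Msg -> drain_mailbox()
    after 0 ->
        ok
    end.
```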
@ruanpienaar It is easy to trace to a port using dbg but I don't know how overload protection would work. That is probably where redbug would be best.
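For anyone following along, this is roughly what tracing to a file port with dbg looks like from a plain Erlang shell; the traced function and file name are placeholders, and none of this is erlyberly code:

```erlang
%% Illustration only: write trace messages straight to a file port.
dbg:tracer(port, dbg:trace_port(file, "/tmp/eb_trace.log")).
dbg:p(all, [call]).
dbg:tpl(lists, seq, x).
%% ... generate some calls, then stop tracing and read the log back:
dbg:stop_clear().
dbg:trace_client(file, "/tmp/eb_trace.log").
```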
There are two aspects of the overload, and this PR addresses both:

1. The GUI, and the channel to the GUI, can get overloaded. Sending only a certain number of traces per tick solves that.
2. Flooding the node being debugged. Suspending the trace when too many logs have arrived helps here.

Additionally, this PR addresses one more thing:

3. Knowing which trace went out of control. It does not drop the messages that caused the flood.

The original overload_protection branch before this PR tries to answer number 2 only, if I read it right.

On the memory issue with queueing: my intention was that this queue would be short enough. The length could even be set automatically, based on the available memory on the node. It should be a buffer that absorbs intermittent trace spikes. If the load stays high, the queue quickly fills up and tracing stops. The goal is to send as much to the GUI as possible and only throttle the sending when there are spikes. The length of the queue should be adjusted in such a way (maybe based on node memory) that it does not take away too many resources. Those logs might park in the mailbox of the process anyway. Note that the queue module has an efficient implementation; it likely uses just two lists internally.

If the GUI was polling, to better regulate its load (probably a more robust solution for 1), the node would still queue the logs. BTW, polling would require only a small modification to this PR on the Erlang side and obviously more in the Java part. Shall we go down the GUI polling route?
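A rough sketch of the bounded, memory-aware buffer described in the previous comment; the module name, the 1% / 1 KB heuristics and the API are illustrative assumptions, not the PR code:

```erlang
-module(eb_log_buffer).
-export([new/0, in/2, out/1]).

new() ->
    %% Allow the buffer to grow to roughly 1% of currently allocated memory,
    %% assuming ~1 KB per trace log (a crude, illustrative heuristic).
    Max = max(100, erlang:memory(total) div 100 div 1000),
    {Max, 0, queue:new()}.

%% Enqueue a log while there is room; a full buffer signals that tracing
%% should be suspended rather than consuming more memory.
in(Log, {Max, Len, Q}) when Len < Max ->
    {ok, {Max, Len + 1, queue:in(Log, Q)}};
in(_Log, Buf) ->
    {suspend, Buf}.

%% Dequeue the oldest log, if any, so accumulated logs can still be sent
%% to the GUI in small chunks.
out({Max, Len, Q}) ->
    case queue:out(Q) of
        {{value, Log}, Q2} -> {Log, {Max, Len - 1, Q2}};
        {empty, _}         -> empty
    end.
```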
Could we poll the trace results via the simple OTP node interface in eb?
Concerning the UI polling: this is how erlyberly used to work and it was a pain on the Java side. It introduces latency, so I don't want to go down that path.
What about a less "intense" UI version? Start observer, choose a node, trace pattern and process details; it's pretty quick. Even the "save traces to file" feature is super useful. I cannot really comment on how to speed up JavaFX, I'll try and look into this.
I haven't compared observer tracing to erlyberly tracing. If erlyberly is a lot slower it is probably because of the three levels of Observable list (list -> filtered list -> sorted list) and the way that properties are used. Does observer show the arguments and results in the table? Converting those to strings is probably where most time is spent. That said, I haven't measured it at all. Measuring this and improving it is a goal in this piece of work; I don't want to code around it until it is tackled directly. It might be a non-issue in the future.
Storing every trace log as a string means that all data effectively has two copies, assuming that strings and binaries make up most of the data in messages.
I'll get you an example from observer.
I'm going to put this PR on hold until I get round to looking at overload issues in the UI. The UI will need to load shed, whatever we do on the remote end. I think it will be able to drop messages it can't handle quickly, but I'll need to measure.

In the original overload_protection branch, when tracing is suspended the collector process should send the messages on the queue to the UI, so that at least the traces we have already "paid for" get through, even if the UI then decides to drop them because it is overloaded.

We're getting a bit sidetracked talking about observer; eb needs to be the Wireshark for Erlang rather than competing with existing tools. I want to keep the number of "modes" that eb can be in relatively small so there is less to test.
This solution does not use an extra helper process, to avoid the copying of logs that message passing would incur. Unfortunately, not using an auxiliary process makes the code slightly more complex.
When the absolute threshold (LOGS_SUSPEND_THRESHOLD) is reached, the tracer is suspended, but the accumulated logs are still sent in processable chunks (LOGS_PER_TICK) so the cause of the trace overload can be analysed. Otherwise we might not know which traces caused the flood.
There is a tick every LOGS_LOAD_CHECK_TICK_MS milliseconds, and at most LOGS_PER_TICK logs are sent to the GUI between two ticks. Extra incoming logs are removed from the mailbox and queued.
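A rough sketch of the tick mechanism described above: the three constants are the ones named in this PR (with arbitrary values here), while the module, the message shape and the helper functions are illustrative assumptions rather than the actual patch.

```erlang
-module(eb_trace_throttle).
-export([start/0]).

%% Constants named in the PR; values are arbitrary for the sketch.
-define(LOGS_PER_TICK, 100).
-define(LOGS_LOAD_CHECK_TICK_MS, 500).
-define(LOGS_SUSPEND_THRESHOLD, 10000).

start() ->
    spawn(fun() ->
        erlang:send_after(?LOGS_LOAD_CHECK_TICK_MS, self(), tick),
        loop(queue:new(), 0, false)
    end).

loop(Queue, Len, Suspended) ->
    receive
        tick ->
            %% Tick: forward at most LOGS_PER_TICK queued logs to the GUI,
            %% then schedule the next tick.
            {Q2, Len2} = send_chunk(Queue, Len, ?LOGS_PER_TICK),
            erlang:send_after(?LOGS_LOAD_CHECK_TICK_MS, self(), tick),
            loop(Q2, Len2, Suspended);
        {trace_log, Log} when Len < ?LOGS_SUSPEND_THRESHOLD ->
            %% Below the threshold: take the log out of the mailbox and queue it.
            loop(queue:in(Log, Queue), Len + 1, Suspended);
        {trace_log, Log} when not Suspended ->
            %% Threshold reached: suspend tracing, but keep the queued logs so
            %% the cause of the flood can still be inspected.
            suspend_tracing(),
            loop(queue:in(Log, Queue), Len + 1, true);
        {trace_log, _Log} ->
            %% Already suspended and still over the threshold: drop the log to
            %% bound memory (an illustrative choice, not necessarily the PR's).
            loop(Queue, Len, Suspended)
    end.

send_chunk(Queue, Len, 0) ->
    {Queue, Len};
send_chunk(Queue, Len, N) ->
    case queue:out(Queue) of
        {{value, Log}, Q2} -> send_to_gui(Log), send_chunk(Q2, Len - 1, N - 1);
        {empty, _}         -> {Queue, Len}
    end.

%% Placeholders: in erlyberly these would stop dbg and send to the GUI node.
suspend_tracing() -> io:format("tracing suspended~n").
send_to_gui(Log)  -> io:format("-> GUI: ~p~n", [Log]).
```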
Tested and it seems to work. I left the io:formats in and set the hardcoded limits low to help testing. In the future the limits could perhaps be made configurable.
My "test strategy" was the following: start an Elixir shell, trace the Enum module, hit enter a few times in the Elixir shell.
Thoughts?