How about something like this for overload protection? [For review] #114
Conversation
Looks interesting, but wouldn't that kind of backpressure consume more memory on the traced node?
Also, how much work would it be to have the eb jar read a startup cmdline flag and then just output the contents to a shell? We could store the ModFuncs being traced from the GUI in the .erlyberly file, so that the cmdline startup option knows what to trace.
Hey @aboroska, what feature does this add which is not in the original overload_protection branch? I am cautious about queueing trace logs on the remote side, like we used to do. If there is an overload problem it should fail fast, stop dbg and ditch the message queue to preserve the node as much as possible. Queueing them may take more resources from a node already in trouble. I am planning on putting load shedding in the UI so it cannot be overloaded so easily either.
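As a minimal sketch of that fail-fast idea (stop dbg, then ditch whatever is sitting in the collector's mailbox), assuming a hypothetical module and function names; this is not existing erlyberly code:

```erlang
-module(eb_fail_fast).
-export([fail_fast/0]).

%% Stop dbg tracing entirely, then throw away any trace messages already
%% queued in this process's mailbox, freeing the memory they hold.
fail_fast() ->
    dbg:stop_clear(),
    drain_mailbox().

drain_mailbox() ->
    receive
        _Msg -> drain_mailbox()
    after 0 ->
        ok
    end.
```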
@ruanpienaar It is easy to trace to a port using dbg but I don't know how overload protection would work. That is probably where redbug would be best.
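For anyone following along, this is roughly what tracing to a file port with dbg looks like from a plain Erlang shell; the traced function and file name are placeholders, and none of this is erlyberly code:

```erlang
%% Illustration only: write trace messages straight to a file port.
dbg:tracer(port, dbg:trace_port(file, "/tmp/eb_trace.log")).
dbg:p(all, [call]).
dbg:tpl(lists, seq, x).
%% ... generate some calls, then stop tracing and read the log back:
dbg:stop_clear().
dbg:trace_client(file, "/tmp/eb_trace.log").
```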
There are two aspects of the overload, and this PR addresses both:

1. The GUI, and the channel to the GUI, can get overloaded. Sending only a certain number of traces per tick solves that.
2. Flooding the node being debugged. Suspending the trace when too many logs have arrived helps here.

Additionally, this PR addresses one more thing:

3. Knowing which trace went out of control. It does not drop the messages that caused the flood.

The original overload_protection branch before this PR tries to answer number 2 only, if I read it right.

On the memory issue with queueing: my intention was that this queue would be short enough. The length could even be set automatically, based on the available memory on the node. It should be a buffer that absorbs intermittent trace spikes. If the load stays high, the queue quickly fills up and tracing stops. The goal is to send as much to the GUI as possible and only throttle the sending when there are spikes. The length of the queue should be adjusted in such a way (maybe based on node memory) that it does not take away too many resources. Those logs might park in the mailbox of the process anyway. Note that the queue module has an efficient implementation; it likely uses just two lists internally.

If the GUI was polling, to better regulate its load (probably a more robust solution for 1), the node would still queue the logs. BTW, polling would require only a small modification to this PR on the Erlang side and obviously more in the Java part. Shall we go down the GUI polling route?
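A rough sketch of the bounded, memory-aware buffer described in the previous comment; the module name, the 1% / 1 KB heuristics and the API are illustrative assumptions, not the PR code:

```erlang
-module(eb_log_buffer).
-export([new/0, in/2, out/1]).

new() ->
    %% Allow the buffer to grow to roughly 1% of currently allocated memory,
    %% assuming ~1 KB per trace log (a crude, illustrative heuristic).
    Max = max(100, erlang:memory(total) div 100 div 1000),
    {Max, 0, queue:new()}.

%% Enqueue a log while there is room; a full buffer signals that tracing
%% should be suspended rather than consuming more memory.
in(Log, {Max, Len, Q}) when Len < Max ->
    {ok, {Max, Len + 1, queue:in(Log, Q)}};
in(_Log, Buf) ->
    {suspend, Buf}.

%% Dequeue the oldest log, if any, so accumulated logs can still be sent
%% to the GUI in small chunks.
out({Max, Len, Q}) ->
    case queue:out(Q) of
        {{value, Log}, Q2} -> {Log, {Max, Len - 1, Q2}};
        {empty, _}         -> empty
    end.
```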
Could we poll the trace results via the simple OTP node interface in eb?
Concerning the UI polling: this is how erlyberly used to work and it was a pain on the Java side. It introduces latency, so I don't want to go down that path.
What about a less "intense" UI version? Start observer, choose a node, trace pattern and process details; it's pretty quick. Even the "save traces to file" feature is super useful. I cannot really comment on how to speed up JavaFX, I'll try and look into this.
I haven't compared observer tracing to erlyberly tracing. If erlyberly is a lot slower it is probably because of the three levels of Observable list (list -> filtered list -> sorted list) and the way that properties are used. Does observer show the arguments and results in the table? Converting those to strings is probably where most time is spent. That said, I haven't measured it at all. Measuring this and improving it is a goal in this piece of work; I don't want to code around it until it is tackled directly. It might be a non-issue in the future.
Storing every trace log as a string means that all data effectively has two copies, assuming that strings and binaries make up most of the data in messages.
I'll get you an example from observer.
I'm going to put this PR on hold until I get round to looking at overload issues in the UI. The UI will need to load shed, whatever we do on the remote end. I think it will be able to drop messages it can't handle quickly, but I'll need to measure.

In the original overload_protection branch, when tracing is suspended the collector process should send the messages on the queue to the UI, so that at least the traces we have already "paid for" get through, even if the UI then decides to drop them because it is overloaded.

We're getting a bit sidetracked talking about observer; eb needs to be the Wireshark for Erlang rather than competing with existing tools. I want to keep the number of "modes" that eb can be in relatively small so there is less to test.
This solution does not use an extra helper process, to avoid the copying of logs that message passing would incur. Unfortunately, not using an auxiliary process makes the code slightly more complex.
When the absolute threshold (LOGS_SUSPEND_THRESHOLD) is reached, the tracer is suspended, but the accumulated logs are still sent in processable chunks (LOGS_PER_TICK) so the cause of the trace overload can be analysed. Otherwise we might not know which traces caused the flood.
There is a tick every LOGS_LOAD_CHECK_TICK_MS milliseconds, and at most LOGS_PER_TICK logs are sent to the GUI between two ticks. Extra incoming logs are removed from the mailbox and queued.
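A rough sketch of the tick mechanism described above: the three constants are the ones named in this PR (with arbitrary values here), while the module, the message shape and the helper functions are illustrative assumptions rather than the actual patch.

```erlang
-module(eb_trace_throttle).
-export([start/0]).

%% Constants named in the PR; values are arbitrary for the sketch.
-define(LOGS_PER_TICK, 100).
-define(LOGS_LOAD_CHECK_TICK_MS, 500).
-define(LOGS_SUSPEND_THRESHOLD, 10000).

start() ->
    spawn(fun() ->
        erlang:send_after(?LOGS_LOAD_CHECK_TICK_MS, self(), tick),
        loop(queue:new(), 0, false)
    end).

loop(Queue, Len, Suspended) ->
    receive
        tick ->
            %% Tick: forward at most LOGS_PER_TICK queued logs to the GUI,
            %% then schedule the next tick.
            {Q2, Len2} = send_chunk(Queue, Len, ?LOGS_PER_TICK),
            erlang:send_after(?LOGS_LOAD_CHECK_TICK_MS, self(), tick),
            loop(Q2, Len2, Suspended);
        {trace_log, Log} when Len < ?LOGS_SUSPEND_THRESHOLD ->
            %% Below the threshold: take the log out of the mailbox and queue it.
            loop(queue:in(Log, Queue), Len + 1, Suspended);
        {trace_log, Log} when not Suspended ->
            %% Threshold reached: suspend tracing, but keep the queued logs so
            %% the cause of the flood can still be inspected.
            suspend_tracing(),
            loop(queue:in(Log, Queue), Len + 1, true);
        {trace_log, _Log} ->
            %% Already suspended and still over the threshold: drop the log to
            %% bound memory (an illustrative choice, not necessarily the PR's).
            loop(Queue, Len, Suspended)
    end.

send_chunk(Queue, Len, 0) ->
    {Queue, Len};
send_chunk(Queue, Len, N) ->
    case queue:out(Queue) of
        {{value, Log}, Q2} -> send_to_gui(Log), send_chunk(Q2, Len - 1, N - 1);
        {empty, _}         -> {Queue, Len}
    end.

%% Placeholders: in erlyberly these would stop dbg and send to the GUI node.
suspend_tracing() -> io:format("tracing suspended~n").
send_to_gui(Log)  -> io:format("-> GUI: ~p~n", [Log]).
```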
Tested and it seems to work. I left the io:formats in and set the hardcoded limits low to help testing. In the future the limits could perhaps be made configurable.
My "test strategy" was the following: start an Elixir shell, trace the Enum module, hit enter a few times in the Elixir shell.
Thoughts?