-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf data idle time #393
Comments
Hi @prim! Thanks for providing clear repro instructions along with what you expect the output to be. I'll take a look next time I sit down to work on speedscope |
I see the same thing, here's a thread with just 189 samples over two minutes:
But it looks like it ran the entire time: Perf also records either on a frequency or at an interval of event count, so you don't necessarily have a fixed rate of samples:
But for |
@prim @DanielThomas I started trying to investigate this by updating the test environment to be able to generate the You can see the progress and the output I'm getting here: #401 (comment) So, at the moment, I'm stuck. Ideally, if you can identify what I'm doing wrong and figure out how to fix the dockerized output generation, that would be great. Failing that, if you can provide the full output of one of your profiles with the incorrect display in speedscope, that would be a different way of me making progres. Thanks! |
Here's a system wide profile that demonstrates the problem: The
|
Hmm, okay interesting. I think I understand the high level problem, but want to write it out to make sure I understand. When a thread is idle because the OS scheduler isn't scheduling it either because it's sleeping or blocked on a certain kind of syscall like This is problematic because speedscope uses these samples to infer what's happening in continuous time. This isn't specific to Lines 512 to 548 in 6493c5f
I'm not sure what the appropriate resolution here is. There are, I believe, two separate issues.
|
I haven't checked the documentation or code, the following are my guesses. The output of perf script is one by one sampling data. The first line of each sampling data consists of thread name, thread ID, timestamp, sampling interval, and sampling events, followed by N lines of function call stack.
Therefore, we can know that the sampling interval of @DanielThomas 's output is 0.20408163 seconds, which is 5 samples per second. I use this little script below to process the output of the perf script and let speedscope render what I think is correct. speedscope also has problems when thread names contain spaces. # encoding: utf8
def split_line(line):
line = line.strip()
line = line.replace("\t", "")
for i in xrange(8):
line = line.replace(" ", " ")
ps = line.rsplit(" ", 4)
return ps
def format_time(begin, now):
n = int(now - begin)
minute = n / 60
second = n % 60
return "%02d:%02d" % (minute, second)
class Thread(object):
def __init__(self, id, interval):
self.id = id
self.interval = interval
self.except_amount = int(1 / interval)
self.stats = {}
self.min_ts = None
self.max_ts = None
self.ts_list = []
self.cpu_usage = {}
def handle(self, ps):
ts = float(ps[2][:-1])
second = int(ts)
self.stats[second] = self.stats.get(second, 0) + 1
if self.min_ts is None:
self.min_ts = ts
self.max_ts = ts
def output(self):
print "thread", self.id
print "last seconds", self.max_ts - self.min_ts
M = self.stats
e = self.except_amount
for sec, n in sorted(M.items()):
r = n * 1.0 / e
# print sec, n, e, r
# if r > 0.3 and 0:
# print self.id, format_time(self.min_ts, sec), n, e, r
self.cpu_usage[sec] = r
class PerfScript(object):
def __init__(self, file_path):
self.file_path = file_path
self.thread_names = {}
self.replace_space_in_filename()
self.init()
def replace_space_in_filename(self):
m = {}
with open(self.file_path, "r") as rfile:
for line in rfile:
if "cpu-clock" in line:
ps = split_line(line)
filename = ps[0]
if " " in filename:
m[filename] = filename.replace(" ", "-")
with open(self.file_path, "rb") as rfile:
content = rfile.read()
for k, v in m.items():
content = content.replace(k, v)
with open(self.file_path, "wb") as wfile:
wfile.write(content)
def init(self):
with open(self.file_path, "r") as rfile:
for line in rfile:
if "cpu-clock" in line:
ps = split_line(line)
thread_id = ps[1]
dt = int(ps[3]) / 1000000000.
self.event_ps = ps
self.interval = dt
self.thread_names[thread_id] = ps[0]
def with_idle(self, out_file_path): [13/1933]
lines = []
dt = self.interval
thread_last_ts = {}
with open(self.file_path, "r") as rfile:
for line in rfile:
if "cpu-clock" in line:
ps = split_line(line)
ts = float(ps[2][:-1])
thread_id = ps[1]
last_ts = thread_last_ts.get(thread_id)
if last_ts is not None:
while True:
diff = ts - last_ts
# print "T", diff, dt, last_ts, ts
if diff > 0.0 and diff > 2 * dt:
last_ts += dt
# print "A", last_ts
self.event_ps[0] = self.thread_names[thread_id]
self.event_ps[1] = thread_id
self.event_ps[2] = str(last_ts) + ":"
lines.append(" ".join(self.event_ps) + "\n")
# lines.append("\t 0x0 [unknown] ([unknown])\n")
lines.append("\n")
else:
break
last_ts = ts
thread_last_ts[thread_id] = last_ts
lines.append(line)
with open(out_file_path, "w+") as wfile:
wfile.write("".join(lines))
def stat(self):
threads = {}
with open(self.file_path, "r") as rfile:
for line in rfile:
if "cpu-clock" in line:
ps = split_line(line)
thread_id = ps[1]
thread = threads.get(thread_id)
if thread is None:
thread = threads[thread_id] = Thread(thread_id, self.interval)
thread.handle(ps)
for k, v in threads.items():
v.output()
ts = set()
for k, v in threads.items():
ts = ts.union(v.cpu_usage.keys())
for k in sorted(ts):
s = 0
for _, v in threads.items():
s += v.cpu_usage.get(k, 0)
print "total cpu", k, s
continue
if s < 0.3:
continue
print "total cpu", k, s
for _, v in threads.items():
print v.id, v.cpu_usage.get(k, 0)
print
p = PerfScript("perf.script")
p.with_idle("perf.script.idle")
p.stat() I'm not sure if other sampling profile tools will not do sampling when the program is idle like perf. In fact, I'm trying to use speedscope to analyze CPU glitches in my program and locate the cause of high CPU consumption in a moment. Like the |
Rather than estimate the sampling frequency, I'd just look at the smallest period seen between samples on any individual thread and use as the minimum time resolution for all threads. Off-CPU time is an entirely different concern, just having on-CPU time work hugely helpful on it's own. |
Oh, I was wrong. 20408163 refers to 0.020408163 seconds. |
Perf has the ability to report context switches too, via If you additionally do |
speedscope provides a great time order view.
But for perf data, when the program is idle, speedscope will think that the time is consumed by the next sampled call stack. This is wrong, the program should be idle at this time.
The result now looks like this:
A more correct result should look like this:
The text was updated successfully, but these errors were encountered: