Initial commit
mbaldessari committed Dec 16, 2014
0 parents commit c3d8b96
Showing 27 changed files with 2,320 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
*.pyc
*.pdf
*.pyo
build_python
build/
dist/
myfiles/
outputs/
rpm-build.sh
*.swp
272 changes: 272 additions & 0 deletions COPYING

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,5 @@
include COPYING
include src/pcp2pdf.bash
include src/pcp2pdf.conf
include src/pcp2pdf.1
include src/pcplogo.png
10 changes: 10 additions & 0 deletions PKG-INFO
@@ -0,0 +1,10 @@
Metadata-Version: 1.0
Name: pcp2pdf
Version: 0.1
Summary: Convert PCP archive files to pdf
Home-page: https://github.com/mbaldessari/pcp2pdf
Author: Michele Baldessari
Author-email: [email protected]
License: GPLv2
Description: UNKNOWN
Platform: UNKNOWN
6 changes: 6 additions & 0 deletions README.rst
@@ -0,0 +1,6 @@
pcp2pdf
========

Creates a PDF report out of PCP archive files collected via pmlogger

Here is a demo pdf: http://acksyn.org/software/pcp2pdf/output.pdf
122 changes: 122 additions & 0 deletions TODO
@@ -0,0 +1,122 @@
- Rename all the fonts sections/names to proper sensible names

- Refactor PcpStyle class so that no style setting/action/method is
done directly in PcpStats

- Right now we use a fairly big DPI in order to make the images a bit higher quality.
This takes more time and consumes a lot more memory. Is there no better way?

- Interval estimation seems to be wrong when interval is extremely low (see tests/naslog.0)

- Add events support so that we have an automatic marker when an event is present
in an archive

- Go through all the FIXMEs

- Performance of the parsing step:
While tinkering with pcp2pdf, one of my goals is to be able
to render a fairly big archive which also includes per-process metrics.
On one of my servers such a daily archive file is around 800MB or so.

Currently my archive parsing looks more or less like this ~250 liner:
https://gist.github.com/mbaldessari/30dc7ae2fe46d9b804f2

I basically use a function that returns a big dictionary in the following
form:
{ metric1: {'indom1': [(ts0, ts1, .., tsN), (v0, v1, .., vN)],
....
'indomN': [(ts0, ts1, .., tsN), (v0, v1, .., vN)]},
metric2: {'indom1': [(ts0, ts1, .., tsX), (v0, v1, .., vX)],
....
'indomN': [(ts0, ts1, .., tsX), (v0, v1, .., vX)]}...}

Profiling is giving me the following numbers:
"""
Parsing files: 20140908.0 - 764.300262451 MB
Before parsing: usertime=0.066546 systime=0.011918 mem=12.34375 MB
After parsing: usertime=1161.825736 systime=2.364544 mem=1792.53125 MB
"""
So roughly 1170 seconds, i.e. ~20 mins

Profiling of parse():
"""
725026682 function calls in 1169.003 seconds

Ordered by: cumulative time
List reduced from 72 to 15 due to restriction <15>

ncalls tottime percall cumtime percall filename:lineno(function)
1 124.550 124.550 1169.003 1169.003 ./fetch.py:140(parse)
29028435 111.320 0.000 693.559 0.000 ./fetch.py:113(_extract_value)
57876970 134.777 0.000 539.856 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:379(get_vlist)
146339015 367.384 0.000 367.384 0.000 /usr/lib64/python2.7/ctypes/__init__.py:496(cast)
28848535 34.114 0.000 312.698 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:384(get_inst)
57876970 89.000 0.000 254.070 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:374(get_vset)
29028435 168.361 0.000 179.190 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1724(pmExtractValue)
29028435 61.598 0.000 141.777 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:364(get_valfmt)
233809633 33.780 0.000 33.780 0.000 {_ctypes.POINTER}
58013698 9.081 0.000 9.081 0.000 {method 'append' of 'list' objects}
519120 4.551 0.000 8.556 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1243(pmLookupDesc)
36257575 8.167 0.000 8.167 0.000 {_ctypes.byref}
1442 6.801 0.005 6.803 0.005 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1578(pmFetch)
518760 4.574 0.000 4.884 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1172(pmNameID)
518760 1.280 0.000 2.924 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:369(get_numval)

real 19m31.860s
user 19m24.391s
sys 0m2.566s
"""

While 20 minutes to parse such a big archive might be relatively ok, I was wondering
what options I have to save time. The ones I can currently think of are:

1) Split the time interval parsing over multiple CPUs. I can divide the archive in subintervals (one per
cpu) and have each CPU do its own subinterval parsing and I stitch everything together at the end.
This is the approach I currently use to create the graph images that go in the pdf (as matplotlib
isn't the fastest thing on the planet)

2) Implement a function in the C python bindings which returns a dictionary as described above.
This would save me all the ctypes/__init__ costs and probably I would shave some time off as there
would be less python->C function calls.

3) See if I can use Cython tricks to speed up things

4) Ignore this issue as with a big archive, creating the actual graph images and the pdf will
also take a fairly long time

NB: I've tried using pmFetchArchive() but a) there was no substantial
difference and b) pmFetchArchive() does not allow interpolation so
users could not specify a custom time interval
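
Option 1 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pcp2pdf code: parse_subinterval() is a hypothetical stand-in that fabricates samples instead of reading a PCP archive, and split_interval(), stitch() and parallel_parse() are names invented for this sketch. Each worker returns a dictionary in the same {metric: {indom: [(timestamps), (values)]}} form described earlier, and the results are concatenated in chronological order.

```python
from multiprocessing import Pool


def split_interval(start, end, parts):
    """Split [start, end] into `parts` contiguous (sub_start, sub_end) pairs."""
    step = (end - start) / float(parts)
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def parse_subinterval(bounds):
    """Hypothetical stand-in for the real archive parser.

    Returns {metric: {indom: [(timestamps...), (values...)]}} for one
    subinterval; here it simply fabricates one sample per whole second.
    """
    sub_start, sub_end = bounds
    stamps = tuple(range(int(sub_start), int(sub_end)))
    values = tuple(float(t) for t in stamps)
    return {'demo.metric': {'inst0': [stamps, values]}}


def stitch(chunks):
    """Concatenate per-subinterval dictionaries, assumed to be in time order."""
    merged = {}
    for chunk in chunks:
        for metric, indoms in chunk.items():
            for indom, (stamps, values) in indoms.items():
                series = merged.setdefault(metric, {}).setdefault(indom, [(), ()])
                series[0] += tuple(stamps)
                series[1] += tuple(values)
    return merged


def parallel_parse(start, end, workers=4):
    """Parse each subinterval in its own process and merge the results."""
    pool = Pool(workers)
    try:
        # map() preserves input order, so chunks come back sorted by time
        chunks = pool.map(parse_subinterval, split_interval(start, end, workers))
    finally:
        pool.close()
        pool.join()
    return stitch(chunks)
```

Calling parallel_parse(start_time, end_time) from a script's main block would return the merged dictionary. A real implementation would also have to decide which subinterval owns a sample that falls exactly on a boundary, so that it is not counted twice.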

- Stable block device naming. Currently we either put sda,sdb,... or dm-0, dm-1 in
the indoms for certain pmdas. As neither is stable, could we try to fix
it by using the WWID when available?

- Write a custom qa test that creates a pdf with certain metrics (exercising most of the
options). The test could run strings on the pdf and check that all the expected metric
names are present (and that the excluded ones are not)

- Currently we need a running pmcd instance to try and fetch help text.
Is there a smarter way?

- Work on moving some of the basic archive parsing functionality
in the pcp python bindings themselves

- Add a default set of custom graphs that are always automatically included
This could be done with the concept of 'profiles'. So something like:
pcp2pdf --profile='storage' would focus only on the stuff relevant for storage
performance. This would include relevant storage metrics and also some custom
metrics that correlate storage metrics with other relevant metrics

- Verify that all the maths in rate conversion are always correct. Or check if there is
a way to disable the rate conversion or do it automatically via PCP libs
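
As a reference point for such a check, here is a minimal sketch of the counter rate conversion, (value(T) - value(T-1)) / (T - T-1), written independently of the PCP libraries. The function name rate_convert() is invented for this example, timestamps are assumed to be seconds, and counter wraparound is deliberately not handled.

```python
def rate_convert(timestamps, values):
    """Turn counter samples into per-second rates.

    Returns one (timestamp, rate) pair per consecutive pair of samples,
    computed as (value(T) - value(T-1)) / (T - T-1).
    """
    rates = []
    for i in range(1, len(values)):
        delta_t = timestamps[i] - timestamps[i - 1]
        if delta_t <= 0:
            # Skip duplicate or out-of-order samples rather than divide by zero
            continue
        rate = (values[i] - values[i - 1]) / float(delta_t)
        rates.append((timestamps[i], rate))
    return rates
```

Comparing the output of a function like this against the values plotted for a small, hand-checked archive would be one way to verify the maths.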

- In the progress bar add a % of completion (we know start_time and end_time, so it should be easy)
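
A minimal sketch of that percentage, assuming the current sample timestamp is available while parsing (percent_complete() is a name invented for this example):

```python
def percent_complete(start_time, end_time, current_time):
    """Return progress through [start_time, end_time] as a value in 0..100."""
    span = end_time - start_time
    if span <= 0:
        # Degenerate or empty time window: nothing left to do
        return 100.0
    fraction = (current_time - start_time) / float(span)
    # Clamp so that samples slightly outside the window do not break the bar
    return 100.0 * min(1.0, max(0.0, fraction))
```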

- Run the code through pylint and pyflakes

- Add test cases that exercise all possible command arguments, use some of the
archive files under qa/

- Have the possibility to use a config file to customize how the pdf looks

- Add bash autocompletion
7 changes: 7 additions & 0 deletions setup.cfg
@@ -0,0 +1,7 @@
[global]
quiet=1

[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
65 changes: 65 additions & 0 deletions setup.py
@@ -0,0 +1,65 @@
#!/usr/bin/python

try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup


def discover_and_run_tests():
    import os
    import sys
    import unittest

    # get setup.py directory
    setup_file = sys.modules['__main__'].__file__
    setup_dir = os.path.abspath(os.path.dirname(setup_file))

    # use the default shared TestLoader instance
    test_loader = unittest.defaultTestLoader

    # use the basic test runner that outputs to sys.stderr
    test_runner = unittest.TextTestRunner()

    # automatically discover all tests (TestLoader.discover needs python 2.7)
    if sys.version_info < (2, 7):
        # string exceptions are not valid; raise a real exception instead
        raise RuntimeError("Must use python 2.7 or later")
    test_suite = test_loader.discover(setup_dir)

    # run the test suite
    test_runner.run(test_suite)

from setuptools.command.test import test

class DiscoverTest(test):
    def finalize_options(self):
        test.finalize_options(self)
        self.test_args = []
        self.test_suite = True

    def run_tests(self):
        discover_and_run_tests()

config = {
    'name': 'pcp2pdf',
    'version': '0.1',
    'author': 'Michele Baldessari',
    'author_email': '[email protected]',
    'url': 'https://github.com/mbaldessari/pcp2pdf',
    'license': 'GPLv2',
    'package_dir': {'': 'src'},
    'packages': ['pcp2pdf'],
    'scripts': ['src/bin/pcp2pdf'],
    'data_files': [('/etc/bash_completion.d', ['src/pcp2pdf.bash']),
                   ('/etc/pcp/pcp2pdf/', ['src/pcp2pdf.conf']),
                   ('/usr/share/pcp2pdf/', ['src/pcplogo.png'])],
    'cmdclass': {'test': DiscoverTest},
    'classifiers': [
        "Development Status :: 3 - Alpha",
        "Topic :: Utilities",
        "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
        "Programming Language :: Python",
    ],
}

setup(**config)
16 changes: 16 additions & 0 deletions src/bin/pcp2pdf
@@ -0,0 +1,16 @@
#!/usr/bin/python
# pcp2pdf - a script to create a PDF report from PCP archive files
#
# Copyright (C) 2014 Michele Baldessari
# Author(s): Michele Baldessari <[email protected]>
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version. See http://www.gnu.org/copyleft/gpl.html for
# the full text of the license.

from pcp2pdf.__main__ import main

if __name__ == "__main__":
    main()
147 changes: 147 additions & 0 deletions src/pcp2pdf.1
@@ -0,0 +1,147 @@
'\"macro stdmacro
.\"
.\" Copyright (c) 2014 Red Hat.
.\"
.\" This program is free software; you can redistribute it and/or modify it
.\" under the terms of the GNU General Public License as published by the
.\" Free Software Foundation; either version 2 of the License, or (at your
.\" option) any later version.
.\"
.\" This program is distributed in the hope that it will be useful, but
.\" WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
.\" or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
.\" for more details.
.\"
.\"
.TH PCP2PDF 1 "PCP" "Performance Co-Pilot"
.SH NAME
\f3pcp2pdf\f1 \- create a pdf report from a PCP archive
.SH SYNOPSIS
\f3pcp2pdf\f1
[\f3\-gnrV?\f1]
[\f3\-i\f1 \f2include metrics\f1]
[\f3\-e\f1 \f2exclude metrics\f1]
[\f3\-c\f1 \f2custom graphs\f1]
[\f3\-o\f1 \f2output filename\f1]
[\f3\-S\f1 \f2start time\f1]
[\f3\-T\f1 \f2finish time\f1]
[\f3\-t\f1 \f2time interval\f1]
[\f3\-l\f1 \f2custom label\f1]
[\f3\-m\f1 \f2max indoms\f1]
\f3\-a\f1 \f2archive\f1
.SH DESCRIPTION
.B pcp2pdf
creates a PDF file containing a graph plotting each metric contained in
the PCP archive.
.PP
.B pcp2pdf
only operates on archive files and cannot use live data.
.PP
.B pcp2pdf
needs a locally running
.I pmcd
in order to fetch the help description of each metric. Without it the
description help texts won't be present.
.PP
By default
.B pcp2pdf
will graph all the metrics contained in an archive file. Use the
.B \-\-include
and
.B \-\-exclude
options to manipulate which metrics should be present in the output.
.PP
Other options control the specific information to be reported.
.TP 5
.BI "\-i|\-\-include " metrics
Includes metrics which match the specified regular expression.
For example:
.I \-\-include 'network.*'
will include only metrics starting with 'network.'. The option can be specified
multiple times. If only
.B \-\-include
is specified, only the matching metrics will
be included in the output. If both
.B \-\-include
and
.B \-\-exclude
are specified, first all excluded metrics are evaluated, and then the ones
explicitly included.
.TP
.BI "\-e|\-\-exclude " metrics
Excludes metrics which match the specified regular expression. For example:
.B \-\-exclude 'proc.*'
will exclude all metrics starting with 'proc.'.
The option can be specified multiple times. If only
.B \-\-exclude
is specified, all metrics are shown except the specified ones.
.TP
.BI "\-o|\-\-output"
Output file name (default:
.IR output.pdf ).
.TP
.BI "\-c|\-\-custom " custom_graphs
Add custom graphs with multiple metrics. For example:
.I \-\-custom 'traffic:network.tcp.outrsts:.*,network.tcp.ofoqueue.*,network.interfaces.out.*:eth[0-9]'
will create a 'traffic' page with each matched metric and the corresponding matched indom on a
single graph. The general syntax is the following:
.I \-\-custom '<label>:<metric1_regex>:<indom1_regex>,...<metricN_regex>:<indomN_regex>'
The option can be specified multiple times. This makes it easy to try and correlate metrics that normally
would not appear on the same graph. The different metrics' values need to be in similar scales or the
graph will not be very useful.
.TP
.BI "\-r|\-\-raw"
Disable the rate conversion for all the metrics that have the
.I PM_SEM_COUNTER
semantic associated with them. By default those are converted via the
following formula:
.I (value(T) \- value(T\-1)) / (T \- T\-1)
By setting this option the aforementioned conversion will not take place.
.TP
.BI "\-l|\-\-label " labels
Adds one or more labels to a graph at specified time.
For example:
.I \-\-label 'foo:2014-01-01 13:45:03' \-\-label 'bar:2014-01-02 13:15:15'
will add two extra labels on every graph at those times.
This is usually useful for correlation analysis. The time format is
specified in the
.BR PCPIntro (1)
man page.
.TP
.BI "\-g|\-\-groupindom"
Adds pages where each page has all the metrics belonging to the same indom.
This option is a work in progress.
.TP
.BI "\-S|\-\-start " time
Sets the start of the time window. See
.BR PCPIntro (1)
for the accepted
.I time
formats.
.TP
.BI "\-T|\-\-finish " time
Sets the end of the time window. See
.BR PCPIntro (1)
for the accepted
.I time
formats.
.TP
.BI "\-t|\-\-interval " delta
Sets the sampling interval. See
.BR PCPIntro (1)
for the accepted
.I time
interval formats.
.TP
.BI "\-a|\-\-archive " file
Sets the PCP archive name to be parsed
.TP
.BI "\-n|\-\-nohistogram"
Disable the frequency histogram graphs (enabled by default)
.TP
.BI "\-V|\-\-version"
Display version number and exit
.TP
.BI "\-?|\-\-help"
Show the usage message and exit
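
The include/exclude semantics described in the man page above can be sketched in Python as follows. This is an illustration of the documented behaviour only; the actual pcp2pdf implementation is not shown in this diff and may differ. The function name select_metrics() and the sample metric names are assumptions made for this example, and re.match() is used because the man page speaks of metrics "starting with" a pattern.

```python
import re


def select_metrics(all_metrics, includes=(), excludes=()):
    """Apply include/exclude regexes in the order the man page describes."""
    def matches(name, patterns):
        # re.match anchors at the start of the metric name
        return any(re.match(pattern, name) for pattern in patterns)

    if includes and not excludes:
        # Only --include given: keep just the matching metrics
        return [m for m in all_metrics if matches(m, includes)]
    # Excludes are evaluated first, then explicit includes are added back.
    # With no options at all, every metric is kept.
    return [m for m in all_metrics
            if not matches(m, excludes) or matches(m, includes)]
```

For example, with the metrics network.tcp.in, proc.nprocs and disk.all.read, passing excludes=['proc.*'] together with includes=['proc.nprocs'] keeps all three, because proc.nprocs is explicitly re-included after the exclusion.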