StatGen | Log Parsing, Statistical Analysis, Performance Graphs


*Note: I have built lots of custom tools over the past few years that do things similar to this. This was my attempt (summer '06) to create a reusable core for building more extensive tools on top of. I have been using various parts of the code for real work since then, but never put together a complete release for distribution. If anyone is interested in the codebase, please contact me.


StatGen

StatGen is a toolset and API written in Python.
It is used for:

  • parsing log files
  • running statistical calculations on numeric sequences and time series
  • generating performance graphs

How It Works

StatGen is built with the following components:

  • Python
  • Microsoft Log Parser 2.2 (win32 binary)
  • Matplotlib (Python plotting library; depends on Python Numeric)

Installation/Setup

  1. install Python
  2. install Python Numeric
  3. install Matplotlib
  4. unzip StatGen.zip

Scripting With The Python API:

Import The StatGen Module

To begin working with the API, import the 'statgen' module into your namespace. Assuming your script is in the same directory as statgen.py (or statgen.py is otherwise on your Python path), you just do:

import statgen


LogParser Objects

Most of your interactions with the API are done through the LogParser class (inside statgen.py). The class wraps MS Log Parser 2.2, which is invoked when a LogParser object is instantiated.

The constructor takes two arguments: an input type (in this example, the local Event Log: EVT) and a SQL query. See the MS Log Parser documentation for details on the possible input types and query syntax.

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')

This is a pretty lame query; hopefully you are logging something worth analyzing. But it returns data every time it is run and needs no setup, so it makes a good example.

Query tips:

  • use TimeGenerated as the first column in all of your queries
  • use 'order by TimeGenerated' in all queries so the results can be converted directly to a time series
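StatGen's own wrapper code isn't shown here, so the following is only a rough sketch of how a wrapper like this could work. It assumes LogParser.exe is on the PATH, that it is run with `-i:<input type> -o:CSV <query>`, and that its console output is a header line, the data rows, a blank line, and a "Statistics:" block (the layout suggested by the headers/footers output in the next example). The helper names `run_logparser` and `split_output` are hypothetical, not part of StatGen.

```python
import subprocess


def run_logparser(input_type, query):
    """Run LogParser.exe and return its console output as text.

    Hypothetical helper: assumes MS Log Parser 2.2 is installed and
    LogParser.exe is on the PATH (Windows only).
    """
    cmd = ['LogParser.exe', '-i:' + input_type, '-o:CSV', query]
    return subprocess.check_output(cmd, universal_newlines=True)


def split_output(text):
    """Split Log Parser console output into (headers, rows, footers).

    Hypothetical helper: assumes one header line, then the data rows,
    then a blank line followed by a 'Statistics:' block of
    'Name: value' lines.
    """
    lines = text.splitlines()
    headers = lines[:1]
    rows, footers = [], []
    in_stats = False
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped:
            in_stats = True  # the first blank line ends the data rows
            continue
        if in_stats:
            # skip the 'Statistics:' title and its dashed underline
            if stripped == 'Statistics:' or set(stripped) == {'-'}:
                continue
            footers.append(stripped)
        else:
            rows.append(line)
    return headers, rows, footers
```

On a Windows machine with Log Parser installed, `split_output(run_logparser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated'))` should give back roughly the three pieces that the headers, footers, and data set examples print.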


Example: Headers, Footers

Here we print the result set column headers and footers returned from an MS Log Parser query. This is a good first check that our setup is working and that our query is correct.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
print lp.headers
print lp.footers

Output:

>>
['TimeGenerated,EventID']
['Elements processed: 66', 'Elements output:    66', 'Execution time:     0.00 seconds']
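As the output shows, the headers and footers come back as plain strings. A small sketch of turning them into something easier to check in a script, assuming they have exactly the shape printed above; `footer_stats()` is a hypothetical helper, not part of StatGen:

```python
def footer_stats(footers):
    """Turn Log Parser footer lines into a dict of name -> value strings.

    Hypothetical helper (not part of StatGen): splits each line on its
    first colon and strips the surrounding whitespace.
    """
    stats = {}
    for line in footers:
        name, _, value = line.partition(':')
        stats[name.strip()] = value.strip()
    return stats


# Literal values copied from the example output.
headers = ['TimeGenerated,EventID']
footers = ['Elements processed: 66', 'Elements output:    66',
           'Execution time:     0.00 seconds']

columns = headers[0].split(',')  # the header is one comma-separated string
stats = footer_stats(footers)
print(columns)
print(stats['Elements output'])
```

Comparing `int(stats['Elements output'])` with the number of rows you actually got back is a cheap sanity check.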


Example: Data Set

Here we print the data set that was returned from an MS Log Parser query. We use the 'print_dataset()' method. We could have also used the dataset() method to get a reference to the data set directly and then printed it.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
lp.print_dataset()

Output:

>>
2006-06-06 10:24:58,6006
2006-06-06 10:25:24,4201
2006-06-06 10:25:28,6005
2006-06-06 10:25:28,6009
2006-06-06 10:25:48,35
2006-06-06 10:27:12,7035
...


Example: Data Row

Here we print a single row from the data set returned by an MS Log Parser query. Judging by the output, row indexes are zero-based: index 5 is the sixth row of the data set.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
print lp.dataset_row(5)

Output:

>>
2006-06-06 10:27:12,7035
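The printed row looks like a single comma-separated string (an inference from the output, not something StatGen's docs confirm). A sketch of splitting such a row into typed fields, assuming the timestamp format shown; `parse_row()` is a hypothetical helper:

```python
from datetime import datetime


def parse_row(row):
    """Split one 'TimeGenerated,EventID' row into (datetime, float).

    Hypothetical helper: assumes the timestamp format seen in the
    output and a numeric second field.
    """
    stamp, value = row.split(',', 1)
    return datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S'), float(value)


when, event_id = parse_row('2006-06-06 10:27:12,7035')
print(when)
print(event_id)
```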


Example: Data Column

Here we print a column from the data set returned by an MS Log Parser query. Column indexes are zero-based, so column 1 is EventID, the second column in the query. We use a 'for' loop to iterate over the column and print each item to stdout.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
for item in lp.dataset_column(1):
   print item

Output:

>>
6006
4201
6005
6009
35
7035
...
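For comparison, here is a plain-Python sketch of the same column extraction over the first six rows of the sample data set, assuming rows are comma-separated strings. `column()` is a hypothetical stand-in, not StatGen's dataset_column():

```python
# First six rows, copied from the data set example output.
rows = [
    '2006-06-06 10:24:58,6006',
    '2006-06-06 10:25:24,4201',
    '2006-06-06 10:25:28,6005',
    '2006-06-06 10:25:28,6009',
    '2006-06-06 10:25:48,35',
    '2006-06-06 10:27:12,7035',
]


def column(rows, index):
    """Return field `index` (zero-based) of each row, converted to float.

    Hypothetical helper; only meaningful for numeric columns.
    """
    return [float(row.split(',')[index]) for row in rows]


event_ids = column(rows, 1)
for item in event_ids:
    print(item)
```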


Example: Average - Data Column

Here we use avg() to compute the average of a column (column 1, EventID) from the data set returned by an MS Log Parser query.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
print lp.avg(1)

Output:

>>
6439.18461538
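avg() presumably returns the arithmetic mean of the column's values; StatGen's code isn't shown, so that is an assumption. A plain-Python equivalent, applied only to the first six EventIDs from the data set example (so the result differs from the 6439.18461538 above, which covers the whole data set):

```python
def avg(values):
    """Arithmetic mean of a non-empty sequence (hypothetical helper)."""
    return sum(values) / float(len(values))


event_ids = [6006.0, 4201.0, 6005.0, 6009.0, 35.0, 7035.0]
print(avg(event_ids))
```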


Example: Time Series

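Here we slice the data set into a time series of 10-second intervals, then call dump_series() to print it to stdout. Each line of output is one interval holding the values that fall within it; intervals with no events are empty.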

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
ts = lp.timeseries(10)
ts.dump_series()

Output:

>>
[6006.0]
[]
[4201.0]
[6005.0, 6009.0]
[]
[35.0]
...
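This is also why the query tips ask for TimeGenerated as the first column, sorted: the series is built from the timestamps. A minimal sketch of one way to do the slicing, assuming intervals start at the first timestamp and are half-open (that assumption reproduces the first six intervals printed above from the sample rows). `bucket()` is hypothetical, not StatGen's timeseries():

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'


def bucket(points, interval):
    """Group time-sorted (timestamp, value) pairs into fixed-width intervals.

    Hypothetical sketch: interval n covers
    [start + n*interval, start + (n+1)*interval) seconds,
    where start is the first timestamp.
    """
    if not points:
        return []
    start = points[0][0]
    series = []
    for stamp, value in points:
        slot = int((stamp - start).total_seconds() // interval)
        while len(series) <= slot:
            series.append([])  # empty interval: no events in this window
        series[slot].append(value)
    return series


rows = ['2006-06-06 10:24:58,6006', '2006-06-06 10:25:24,4201',
        '2006-06-06 10:25:28,6005', '2006-06-06 10:25:28,6009',
        '2006-06-06 10:25:48,35', '2006-06-06 10:27:12,7035']
points = []
for row in rows:
    stamp, value = row.split(',')
    points.append((datetime.strptime(stamp, FMT), float(value)))

series = bucket(points, 10)
for values in series[:6]:
    print(values)
```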


Example: Time Series Calculation

Here we calculate the average ('AVG') of each interval in our time series, then call dump_calced_series() to print the results to stdout. Empty intervals are reported as 0.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
ts = lp.timeseries(10)
ts.calc_series('AVG')
ts.dump_calced_series()

Output:

>>
6006.0
0
4201.0
6007.0
0
35.0
...
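A sketch of the per-interval calculation under the same assumptions as the output above: each interval is reduced to one number, and empty intervals become 0. The string-to-function table and `calc_series()` here are hypothetical; only 'AVG' appears in the StatGen examples, so any other calculation names would be guesses:

```python
def avg(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / float(len(values))


# Hypothetical lookup table from calculation name to function.
CALCS = {'AVG': avg}


def calc_series(series, name):
    """Apply the named calculation to each interval; empty intervals give 0."""
    func = CALCS[name]
    return [func(values) if values else 0 for values in series]


# The first six intervals from the time series example.
series = [[6006.0], [], [4201.0], [6005.0, 6009.0], [], [35.0]]
for result in calc_series(series, 'AVG'):
    print(result)
```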


Example: Graphing Time Series Calculations (to an image)

Here we run a calculation against each interval in our time series, then call graph_series_image() to create a graph of the results and save it as a PNG image.

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
ts = lp.timeseries(10)
ts.calc_series('AVG')
ts.graph_series_image()

Output:

time series graph
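StatGen's graphing code isn't shown, so this is only a guess at the general approach: a minimal Matplotlib sketch that plots one calculated value per interval and writes a PNG, using the non-interactive Agg backend. The function name, axis labels, and output file name are all hypothetical:

```python
import matplotlib
matplotlib.use('Agg')  # render to files only; no GUI needed
import matplotlib.pyplot as plt


def graph_series_image(values, interval, filename):
    """Plot one value per interval and save the figure as a PNG.

    Hypothetical sketch; StatGen's own styling and file name are unknown.
    """
    x = [i * interval for i in range(len(values))]  # seconds since the first interval
    fig, ax = plt.subplots()
    ax.plot(x, values)
    ax.set_xlabel('elapsed time (s)')
    ax.set_ylabel('AVG')
    fig.savefig(filename, format='png')
    plt.close(fig)


graph_series_image([6006.0, 0, 4201.0, 6007.0, 0, 35.0], 10, 'timeseries.png')
```

Using the 'TkAgg' backend and calling plt.show() instead of savefig() would open an interactive window with Matplotlib's zoom/pan toolbar, which is roughly the kind of viewing the Tk panel example describes.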

Example: Graphing Time Series Calculations (to a Tk GUI panel)

Here we run a calculation against each interval in our time series, then call graph_series_tk() to display a graph of the results in a Tk GUI panel. The panel has features for more advanced viewing (zoom, etc.).

#!/usr/bin/env python

import statgen

lp = statgen.LogParser('EVT', 'select TimeGenerated, EventID from System order by TimeGenerated')
ts = lp.timeseries(10)
ts.calc_series('AVG')
ts.graph_series_tk()

Output:

time series graph

Copyright © 2006-2007 Corey Goldberg