On Dolphins, Panda's and Bugs

Like any good OpenSource project the MySQL Bugs website is open for anyone to search through. This ofcourse doesn't include the security bugs.

There is a second collection of bugs in the My Oracle Support and these bugs are only accesseble by customers with a support contract. Even when I have access to MOS I still prefer to use the community bugs site. For service requests etc. I would use MOS.

The openness of the bugs database is one of the topic the IOUG MySQL Council discusses with Oracle.

The bugs database has more to offer than just information about initial bugs:

Bugs Statistics: This has a big matrix with components and states, some per developer stats and some totals (Did you known the fastest bug closure was 9 seconds?).
Bugs Tide This gives you per month statistics.

For both there are some filter option for version and whether to include feature requests.

You might want to read life cycle of a MySQL bug by Valeriy Kravchuk if you're not familiar with the different states of MySQL bugs.

The bugs website offers search results not only in the HTML reports as shown on the website but also offers RSS and CSV format so it can be consumed by computer programs.

To demostrate this I wrote this script. It fetches the CSV for my bugs and generates some graphs. As you might (or might not) notice this doesn't include my older bug reports as I have two different reporter-id's as I had different email accounts for my MySQL account and my My Oracle account before MySQL/Sun was merged with Oracle.

This page is made with IPython Notebook, which is a tool which allows you to combine code, output, graphs and text. You can also view this notebook here.

%pylab inline

Populating the interactive namespace from numpy and matplotlib

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from StringIO import StringIO
from datetime import datetime

This is the URL as found in the bugs website. Feel free to modify the parameters.

bugs_url = "http://bugs.mysql.com/search-csv.php?status=all&severity=all&mine=9242646"

Fetch the data from the URL with the Requests library
Convert the CSV data to a Pandas DataFrame (a DataFrame is a special kind of table)

bugs_req = requests.get(bugs_url)
csv_file = StringIO()
csv_file.write(bugs_req.text)
csv_file.seek(0)
bugsdf = pd.read_csv(csv_file)

Let's have a look at the data.

bugsdf.head()

Now we have the data, let's make a horizontal barplot for the number of bugs per category (the Type column).

We change the size of the figure as the default is too small to be readable.

fig = plt.figure(figsize=(8,10), dpi=100)
bugsdf.Type.value_counts(ascending=True).plot(kind='barh')

<matplotlib.axes.AxesSubplot at 0x394de90>

The Version column has a the text for the version. Let's grab the first three characters to get the major version of for the bug. This is not really perfect as it will only return 5.1 if the string is '5.1.30, 5.5.16', but it's good enough for now.

The function will be mapped to the Version column of the dataframe. And we will save the result in a new column called major_version.

def getversion(inputver):
    if isinstance(inputver, float):
        return inputver
    return inputver[:3]
bugsdf['major_version'] = bugsdf.Version.map(getversion)

Besides Pandas we can use matplotlib's pyplot, which is a bit like MATLAB.

Let's create a heatmap for bug status and category

compstat = bugsdf.groupby('major_version').Type.value_counts().unstack().T
fig = plt.figure(figsize=(15,10), dpi=100)
plt.pcolor(compstat, vmin=0, vmax=5, cmap='Blues')
plt.yticks(np.arange(0.5, len(compstat.index), 1), compstat.index)
plt.xticks(np.arange(0.5, len(compstat.columns), 1), compstat.columns)

([<matplotlib.axis.XTick at 0x3c067d0>,
  <matplotlib.axis.XTick at 0x3d87410>,
  <matplotlib.axis.XTick at 0x3f2d550>,
  <matplotlib.axis.XTick at 0x3f2da50>,
  <matplotlib.axis.XTick at 0x3f2df50>,
  <matplotlib.axis.XTick at 0x3f15490>,
  <matplotlib.axis.XTick at 0x3f15990>,
  <matplotlib.axis.XTick at 0x3f15e90>,
  <matplotlib.axis.XTick at 0x40983d0>,
  <matplotlib.axis.XTick at 0x4098b10>,
  <matplotlib.axis.XTick at 0x3d8f2d0>],
 <a list of 11 Text xticklabel objects>)

Now we can create a heatmap which compares major versions and components.

fig = plt.figure(figsize=(8,10), dpi=100)
plt.pcolor(compstat, vmin=0, vmax=10, cmap='Blues')
plt.yticks(np.arange(0.5, len(compstat.index), 1), compstat.index)
plt.xticks(np.arange(0.5, len(compstat.columns), 1), compstat.columns)

([<matplotlib.axis.XTick at 0x3f15bd0>,
  <matplotlib.axis.XTick at 0x3f176d0>,
  <matplotlib.axis.XTick at 0x42bcad0>,
  <matplotlib.axis.XTick at 0x42bcfd0>,
  <matplotlib.axis.XTick at 0x4459d50>,
  <matplotlib.axis.XTick at 0x4455710>,
  <matplotlib.axis.XTick at 0x4451310>,
  <matplotlib.axis.XTick at 0x42cdc50>,
  <matplotlib.axis.XTick at 0x42c9810>,
  <matplotlib.axis.XTick at 0x42c51d0>,
  <matplotlib.axis.XTick at 0x42bdb10>],
 <a list of 11 Text xticklabel objects>)

Now we have the major version, let's filter on MySQL 5.6 and then graph the different values for the Status field.

bugsdf[bugsdf.major_version == '5.6'].Status.value_counts().plot(kind='bar')

<matplotlib.axes.AxesSubplot at 0x4451e10>

Or maybe check the status for all versions.

bugsdf.groupby('major_version').Status.value_counts().unstack().plot(kind='barh', stacked=True)

<matplotlib.axes.AxesSubplot at 0x446f790>

That's not really helpful, let's remove the Closed ones.

bugsdf[bugsdf.Status != 'Closed'].groupby('major_version').Status.value_counts().unstack().plot(kind='bar', stacked=True)

<matplotlib.axes.AxesSubplot at 0x4464650>

The Entered and Modified fields are not yet in a 'real' date format. So these must be coverted before we can use them.

bugsdf['Entered'] = pd.to_datetime(bugsdf.Entered)
bugsdf['Modified'] = pd.to_datetime(bugsdf.Modified)

Which are the oldest open bugs?

The - before the bugsdf.Status.isin reverses the result, so it behave like 'is not in'.

bugsdf[-bugsdf.Status.isin(['Closed', 'Duplicate', 'Won\'t fix', 'Can\'t repeat'])].sort(columns='Entered')[:5]

Now let's find bugs which are open and not modified in a some time.

bugsdf[-bugsdf.Status.isin(['Closed', 'Duplicate', 'Won\'t fix', 'Can\'t repeat'])] \
      [bugsdf.Modified < datetime(2013, 6, 1)] \
      [bugsdf.Modified > datetime(1970, 1, 1)]

Besides searching for bugs by reporter you can also use this to search on other conditions.

	ID	Entered	Modified	Type	Status	Severity	Version	OS	Summary	Assign First	Assign Last
0	68132	2013-01-21 13:53:28	2013-03-26 16:14:20	Server: Docs	Closed	S3	5.6, 5.5.29	Any	Documentation about verification with GnuPG is...	Philip	Olson
1	68199	2013-01-28 08:23:59	2013-05-13 13:10:00	bugs.mysql.com	Closed	S3	5.7	Any	Month-by-month (tide) stats for 5.7 absent	Sveta	Smirnova
2	68200	2013-01-28 08:54:53	2013-03-10 16:03:34	Server: Privileges	Closed	S3	5.6.9-rc	Any	Password logging doesn't work as documented	NaN	NaN
3	68536	2013-03-01 10:29:46	2013-07-31 13:24:26	MySQL Workbench	Closed	S3	5.2.47	Any	Better support IPv6 addresses for new connections	NaN	NaN
4	68571	2013-03-05 09:08:23	2014-02-26 09:41:24	Monitoring: Advisors/Rules	Verified	S3	NaN	Microsoft Windows (Win7 SP1)	False positive for 32-binary check on win64	NaN	NaN

	ID	Entered	Modified	Type	Status	Severity	Version	OS	Summary	Assign First	Assign Last	major_version
4	68571	2013-03-05 09:08:23	2014-02-26 09:41:24	Monitoring: Advisors/Rules	Verified	S3	NaN	Microsoft Windows (Win7 SP1)	False positive for 32-binary check on win64	NaN	NaN	NaN
5	68574	2013-03-05 13:42:26	2014-02-23 11:25:48	Server: InnoDB	Verified	S3	5.5.30, 5.6.16	Microsoft Windows (win7 sp1 ent)	No instrumentation for InnoDB files in P_S on ...	Marc	Alff	5.5
19	68925	2013-04-11 05:57:29	2013-04-11 06:15:40	Client	Verified	S3	5.6.10, 5.5.29	Any	Compatibility issue with mysql history ("\040"...	NaN	NaN	5.6
22	69147	2013-05-05 10:07:16	2013-07-28 14:43:57	Server: I_S	Verified	S4	5.7	Any	Variable defaults in information_schema	NaN	NaN	5.7
33	69223	2013-05-14 06:22:48	2013-06-24 20:03:16	Server: DDL	Verified	S4	5.6.10	Any	Give a warning on CREATE TABLE without PRIMARY...	NaN	NaN	5.6

	ID	Entered	Modified	Type	Status	Severity	Version	OS	Summary	Assign First	Assign Last	major_version
19	68925	2013-04-11 05:57:29	2013-04-11 06:15:40	Client	Verified	S3	5.6.10, 5.5.29	Any	Compatibility issue with mysql history ("\040"...	NaN	NaN	5.6
35	69226	2013-05-14 10:31:09	2013-05-14 16:09:09	Server: Options	Verified	S3	5.6.11	Any	Status variable for SSL/TLS implementation	NaN	NaN	5.6
36	69314	2013-05-24 21:33:48	2013-05-25 07:55:24	Server: Options	Verified	S3	5.7.1	Any	ignore-db-dir option doesn't work for database...	NaN	NaN	5.7

Daniël's Database Blog

Monday, June 23, 2014

On Dolphins, Panda's and Bugs

On Dolphins, Panda's and Bugs

No comments:

Post a Comment