in070304
-----Original Message-----
From: John Murtari [mailto:jmurtari@thebook.com]
Sent: Saturday, February 24, 2007 3:15 PM
To: Bennett, Barbara; ANSI, ATTN PSA Dept.
Cc: Keith W. Hare; Deutsch, Donald
Subject: Public Review Comments on Proposed SQL Standard: ISO/IEC FCD
9075-2
DATE: February 24, 2007
REGARDING:
Public Review Comment on Proposed SQL standard: ISO/IEC FCD 9075-2
Title: Information technology - Database Languages - SQL - Part
2:
Foundation (SQL/Foundation)
TO: "Barbara Bennett" <bbennett@itic.org>, "ANSI, ATTN PSA
Dept." <psa@ansi.org>
CC: "Keith W. Hare" <Keith@jcc.com>, "Don Deutsch"
<Donald.deutsch@oracle.com>
SUBMITTED BY: John Murtari, jmurtari@thebook.com,
Software
Workshop Inc., 55 E. Genesee St., Baldwinsville, NY 13027
Office:
315-635-1968, x-211
Part I: SUMMARY OF PROPOSED CHANGE
That the SQL standard be extended with the addition of an optional
language statement to display increased statistical data regarding
database server activity. This change to be referred to as
'Extended Statistics'.
1. Justification:
The advent of the Web has seen an explosion of SQL database usage in a
user community which has less training as DBAs and has high performance
requirements. At present statistical reporting is a very implementation
specific area. If statistics are available they are usually at
the server level, with very little support for reporting activity at
the individual database or user. Some implementations require the
DBA configure complex modules to achieve usage recording.
The goal of this change is to provide DBAs and also application
developers with an easy way to monitor database server activity. To
improve user satisfaction with their SQL database by making it easier
to monitor performance and identify bad application usage patterns and
other bottlenecks.
2. Syntax Summary/Examples:
The summary syntax for the proposed new statement is:
SHOW STATISTICS ( * | Var list) FROM Class [WHERE Var ( '<' |
'>' | '=' ) num]
[LIKE 'Instance pattern'] [ORDER BY Var]
[HISTORY] [LIMIT num]
In usage a DBA could enter:
SHOW STATISTICS select, update, insert FROM user WHERE
select > 100 HISTORY LIMIT 24
and see a display of all user activity for the last 24 hours in hourly
increments with the number of queries of each type issued by each
system user.
An implementation specific configuration element would allow the DBA to
call for collection of specified data, e.g.
statistics_class_list="
user,max-50,time-60,units-m,(queries,select,update,bytes_sent),
db,max-50,time-10,units-h,(bytes_sent,aborted_clients,qcache_hits),
host,max-5,time-3,units-d,(select,update,connects,bytes_sent)"
In the above we want the server to track information on three different
Classes: user, db, and host.
For each Class, the 'max-' precedes the number of Instances of that
Class to be supported. The 'time-' precedes the number of
historical time units to store and 'units-' designates the period of
interest:
'm' minutes, 'h' hours, and 'd' days.
3. Alternatives:
The present standard supports the INFORMATION_SCHEMA as a way to
support meta data collection from the server.
standard: SELECT SCHEMA_NAME AS `Database` FROM
INFORMATION_SCHEMA.SCHEMATA
[WHERE SCHEMA_NAME LIKE 'wild']
MySQL alternate: SHOW DATABASE [LIKE 'wild']
This may present a valid approach, although with a goal of ease of use,
the amount of syntax required may be intimidating. Perhaps either
implementation could be allowed.
4. Performance:
The collection of statistical data with finer granularity and the
storage of historical data will have a performance impact on the
database server. The amount of impact is server dependent.
This definition allows each implementation to define what can be
collected and gives DBAs control over the extent and frequency of data
collection.
5. Security:
At present some database servers allow any user access to gross figures
on server performance, i.e. total traffic, number of queries,
etc... This proposed change would allow data to be collected and
reported on a per user/host/database basis. This would definitely
call for better control and each implementation should support a
configuration element which specifies the users allowed to have
statistical access, e.g.
statistics_users="john,domenic,christina"
Part II: SYNTAX and SEMANTIC DETAILS
(Note: I apologize for not including BNF productions for the proposed
syntax. Time was limited as these review comments were prepared.)
1. Overview
In Extended Statistics the different types of items that can be
reported on are called Classes. In its present version these are:
db - a database.
user - a user.
host - the host machine of a connection.
server - a database server.
Within a Class, specific instantiations are called instances. An
instance of a user could be domenic or christina. Implementations
would be allowed to define additional Classes.
For each instance we are allowed to track individual items called
Vars. These could be items such as: number of select queries,
bytes sent, cache hits, aborted connections. Some Vars would be
shared among Classes, other could only have meaning when a specific
class is selected.
The implementation would define the total available set of Classes and
Vars supported. A configuration element would allow the DBA to
configure the extent of data collection, e.g.
statistics_class_list="
user,max-50,time-60,units-m,(queries,select,update,bytes_sent),
db,max-50,time-10,units-h,(bytes_sent,aborted_clients,qcache_hits),
host,max-5,time-3,units-d,(select,update,connects,bytes_sent)"
For each Class, the 'max-' precedes the number of Instances of that
Class to be supported. The 'time-' precedes the number of
historical time units to store and 'units-' designates the period of
interest:
'm' minutes, 'h' hours, and 'd' days.
2. Language syntax
A goal of the implementation was to keep the syntax used to query the
statistics as close to standard SQL as possible. The syntax
described below should look very familiar, especially if you replace
SHOW STATISTICS with SELECT.
SHOW STATISTICS ( * | Var list) FROM Class [WHERE var ( '<' | '>'
| '=' ) num]
[LIKE 'Instance pattern'] [ORDER BY Var]
[HISTORY] [LIMIT num]
The significant differences are the use of LIKE to match specific
Instances and the new keyword HISTORY to produce historical output.
The LIMIT option operates differently depending on whether HISTORY has
been specified. Without HISTORY it operates as in normal SQL,
limiting the number of result rows displayed. If HISTORY is
chosen it functions as a time limit, e.g. show me just the last 3 time
units for each user.
3. Usage examples
# Limiting output to just two rows.
mysql> SHOW STATISTICS * FROM user LIMIT 2;
+-------------+--------+--------+------------+
| user | select | update |
Bytes_sent |
+-------------+-----------------+------------+
| kelly |
8 | 0 |
27350 |
| brian |
1 | 0 |
28303 |
+-------------+--------+--------+------------+
2 rows in set (0.00 sec)
# Using HISTORY to see historical data. 0 represents the current hour,
# -1, -2, etc. are the prior hours. Use of WHERE clause to remove #
uninteresting data.
mysql> SHOW STATISTICS
mysql> select,insert,Queries,Connects,Qcache_hits,Bytes_sent
FROM user WHERE select > 5000
HISTORY limit 2;
+----------+-------+--------+--------+---------+----------+-------------+------------+
| user | hours | select | insert | Queries |
Connects | Qcache_hits | Bytes_sent |
+----------+-------+--------+--------+---------+----------+-------------+------------+
| soulmate | 0 | 5289 |
127 | 14623 |
548 |
7432 |
5775053 |
| soulmate | -1 | 22502 | 590
| 60238 | 2717 |
28737 | 25642797 |
| mandala | -1 | 39282 |
659 | 156091 | 1971 |
111875 | 117291646 |
| handymao | -3 | 5082 |
21 | 11549 |
991 |
2998 |
6681964 |
+----------+-------+--------+--------+---------+----------+-------------+------------+
4 rows in set (0.00 sec)
# Using LIKE to limit hosts displayed. The LIKE clause automatically
matches # against the instance of the class.
mysql> SHOW STATISTICS * FROM host LIKE '%kids%' ORDER BY Bytes_sent;
+--------------------+--------+--------+------------+
|
host
| select | update | Bytes_sent |
+--------------------+--------+--------+------------+
| www.akidsright.org | 8 |
0 | 53113 |
| s1.kids-right.org | 2 |
0 |
495 |
+--------------------+--------+--------+------------+
2 rows in set (0.00
sec)
Part III: FUTURE OPTIONS / EXPERIENCE / EVALUATION
A working implementation of Extended Statistics will soon be released
for the MySQL database server. Committee members may have access
for test purposes to a prerelease version of a MySQL server supporting
Extended Statistics.
In our experience as a hosting provider, we have many different users
accessing MySQL servers in support of their websites. We were
frustrated by an inability to easily track high server utilization to a
specific user/database. Extended Statistics has allowed our DBAs
to monitor usage and has become a very useful tool.
While the statistical data itself is read only, we can already see a
need to allow a DBA to define and redefine Classes and Vars to be
tracked while the server is running. It would seem a very small
reach to also define a 'trigger' mechanism that would alert a DBA when
usage patterns exceeded a preset threshold.
I welcome feedback from all on this proposal and the guidance from
committee members who have much more database and standards experience.
Best regards!
--
John ___________________________________________________________________
John
Murtari
Software Workshop Inc.
jmurtari@thebook.com 315.635.1968(x-211) "TheBook.Com" (TM)
http://www.thebook.com/