Notes for developers¶

`sqlarray` — Implementation of `SQLarray`¶

SQLarray is a thin wrapper around pysqlite SQL tables. The main features ares that SELECT queries can return numpy.recarrays and the SQLarray.selection() method returns a new SQLarray instance.

numpy arrays can be stored in sql fields which allows advanced table aggregate functions such as histogram.

A number of additional SQL functions are defined.

TODO:	Make object saveable (i.e. store the database on disk instead of memory or dump the memory db and provide a load() method Use hooks for the pickling protocol to make this transparent.

Module content¶

class recsql.sqlarray.KRingbuffer(capacity, *args, **kwargs)¶

Ring buffer with key lookup.

Basically a ringbuffer for the keys and a dict (k,v) that is cleaned up to reflect the keys in the Ringbuffer.

append(k, v)¶

clear()¶: Reinitialize the KRingbuffer to empty.

class recsql.sqlarray.Ringbuffer(capacity, iterable=None)¶: Ring buffer of size capacity; ‘pushes’ data from left and discards on the right.

class recsql.sqlarray.SQLarray(name=None, records=None, filename=None, columns=None, cachesize=5, connection=None, is_tmp=False, **kwargs)¶

A SQL table that returns (mostly) rec arrays.

The SQLarray can be initialized from

an iterable of records (tuples), given in the records keyword argument, and the column names (provided in columns);
a string that contains a simple reStructured Text table (see recsql.rest_table for details);
a numpy.recarray.

Note

SQLite only understands standard Python types and hence has problems with many of the NumPy data types such as numpy.int64. When loading a recarray fails we try to convert all data types automatically to Python types (using recsql.convert.irecarray_to_py()). This might loose precision and/or even fail. It is also slow for larger arrays.

The class takes the following arguments:

SQLarray([name[, records[, columns[, cachesize=5, connection=None, dbfile=":memory:"]]]])¶

Arguments:

name: table name (can be referred to as ‘__self__’ in SQL queries)
records: numpy record array that describes the layout and initializes the table OR any iterable (and then columns must be set, too) OR a string that contains a single, simple reStructured text table (and the table name is set from the table name in the reST table.) If None then simply associate with existing table name.
filename: Alternatively to records, read a reStructured table from filename.
columns: sequence of column names (only used if records does not have attribute dtype.names) [None]
cachesize: number of (query, result) pairs that are cached [5]
connection: If not None, reuse this connection; this adds a new table to the same database, which allows more complicated queries with cross-joins. The table’s connection is available as the attribute T.connection. [None]
dbfile: Normally the db is held in memory (”:memory:”) but if a filename is provided then the underlying SQLite db is held on disk and can be accessed and restored (see SQLarray.save()). Only works with connection = None [”:memory:”]
is_tmp: True: create a tmp table; False: regular table in db [False]

Bugs:

InterfaceError: Error binding parameter 0 - probably unsupported type

In this case the recarray contained types such as numpy.int64 that are not understood by sqlite and which we were not able to convert to a Python type (using recsql.convert.irecarray_to_py()). Either convert the data manually (by setting the numpy dtypes yourself on the recarray, or better: feed a simple list of tuples (“records”) to this class in records. Make sure that these tuples only contain standard Python types. Together with records you will also have to supply the names of the data columns in the keyword argument columns.

If you are reading from a file then it might be simpler to use recsql.sqlarray.SQLarray_fromfile().

SELECT(fields, *args, **kwargs)¶

Execute a simple SQL SELECT statement and returns values as new numpy rec array.

The arguments fields and the additional optional arguments are simply concatenated with additional SQL statements according to the template:

SELECT <fields> FROM __self__ [args]

The simplest fields argument is "*".

Example:

Create a recarray in which students with average grade less than 3 are listed:

result = T.SELECT("surname, subject, year, avg(grade) AS avg_grade",
               "WHERE avg_grade < 3", "GROUP BY surname,subject",
               "ORDER BY avg_grade,surname")

The resulting SQL would be:

SELECT surname, subject, year, avg(grade) AS avg_grade FROM __self__
     WHERE avg_grade < 3
     GROUP BY surname,subject
     ORDER BY avg_grade,surname

Note how one can use aggregate functions such avg().

The string ‘__self__’ is automatically replaced with the table name (T.name); this can be used for cartesian products such as

LEFT JOIN __self__ WHERE ...

Note

See the documentation for sql() for more details on the available keyword arguments and the use of ? parameter interpolation.

close()¶

Clean up (if no more connections to the db exist).

For in-memory: Delete the underlying SQL table from the in-memory database.
For on-disk: save and close connection

connection_count¶

Number of currently open connections to the database.

(Stored in table sqlarray_master.)

has_table(name)¶: Return True if the table name exists in the database.

limits(variable)¶: Return minimum and maximum of variable across all rows of data.

merge(recarray, columns=None)¶

Merge another recarray with the same columns into this table.

Arguments:	recarray numpy record array that describes the layout and initializes the table
Returns:	n number of inserted rows
Raises:	Raises an exception if duplicate and incompatible data exist in the main table and the new one.

merge_table(name)¶

Merge an existing table in the database with the __self__ table.

Executes as 'INSERT INTO __self__ SELECT * FROM <name>'. However, this method is probably used less often than the simpler merge().

Arguments:	name name of the table in the database (must be compatible with __self__)
Returns:	n number of inserted rows

recarray¶: Return underlying SQL table as a read-only record array.

save()¶

Commit changes to file.

Only works if the SQLarray was created with they dbfile = FILENAME keyword. There is currently no way to save a in-memory db.

`recsql.rest_table` — Parse a simple reST table¶

Turn a restructured text simple table into a numpy array. See the Example below for how the table must look like. The module allows inclusion of parameters and data in the documentation itself in a natural way. Thus the parameters are automatically documented and only exist in a single place. The idea is inspired by literate programming and is embodied by the DRY (“Do not repeat yourself”) principle.

Limitations¶

Note that not the full specifications of the original restructured text simple table are supported. In order to keep the parser simple, the following additional restriction apply:

All row data must be on a single line.
Column spans are not supported.
Headings must be single legal SQL and python words as they are used as column names.
Do not use TAB characters to format the table but use spaces. (TABs do not have a well-defined width in number of spaces and will thus lead to wrongly parsed tables.)
The delimiters are used to extract the fields. Only data within the range of the ‘=====’ markers is used. Thus, each column marker must span the whole range of input. Otherwise, data will be lost.
The keyword ‘Table’ must precede the first marker line and the table name must be provided in square brackets; the table name should be a valid SQL identifier.
Currently, only a single table can be present in the string.
Autoconversion of list fields might not always work...

Example¶

The following table is converted:

Table[laureates]: Physics Nobel prize statistics.
=============  ==========  =========
name           age         year
=============  ==========  =========
A. Einstein    42          1921
P. Dirac       31          1933
R. P. Feynman  47          1965
=============  ==========  =========

with

>>> import recsql.rest_table as T
>>> P = T.Table2array(T.__doc__)
>>> P.recarray()
rec.array([(u'A. Einstein', 42, 1921), (u'P. Dirac', 31, 1933),
     (u'R. P. Feynman', 47, 1965)],
    dtype=[('name', '<U52'), ('age', '<i4'), ('year', '<i4')])

Module content¶

The only class that the user really needs to know anything about is recsql.rest_table.Table2array.

class recsql.rest_table.Table2array(string=None, **kwargs)¶

Primitive parser that converts a simple reST table into numpy.recarray.

The table must be the only table in the text. It must look similar to the example below (variable parts in angle brackets, optional in double brackets, everything else must be there, matching is case sensitive, ‘....’ signifies repetition in kind):

Table[<NAME>]: <<CAPTION>>
============  ===========  ======================  ....
<COLNAME 1>   <COLNAME 2>  ....                    ....
============  ===========  ======================  ....
<VALUE>       <VALUE>      <VALUE> <VALUE> ....
....
....
============  ===========  ======================  ....

Rows may not span multiple lines. The column names must be single words and legal python names (no spaces, no dots, not starting with a number).

Field values are converted to one of the following python types: int, float, or str.

If a value is quote with single or double quotation marks then the outermost quotation marks are stripped and the enclosed value treated as a string.

Note

Values such as 001 must be quoted as ‘001’ or they will be interpreted as integers (1 in this case).

__init__(string=None, **kwargs)¶

Table2array(string) –> parser

Arguments:

string: string to be parsed
filename: read from filename instead of string
autoconvert: EXPERIMENTAL. True: replace certain values with special python values (see convert.Autoconverter) and possibly split values into lists (see sep). False: leave everything as it is (numbers as numbers and strings as strings).
mode: mode of the Autoconverter
sep: If set and autoconvert = True then split field values on the separator (using split()) before possible autoconversion. (NOT WORKING PROPERLY YET)

recarray()¶: Return a recarray from the (parsed) string.

exception recsql.rest_table.ParseError¶: Signifies a failure to parse.

`recsql.csv_table` — Parse a simple CSV table¶

Turn a CSV table into a numpy array.

Uses csv (requires python 2.6 or better).

class recsql.csv_table.Table2array(filename=None, tablename='CSV', encoding='utf-8', **kwargs)¶

Read a csv file and provide conversion to a numpy.recarray.

Depending on the arguments, autoconversion of values can take place. See recsql.convert.Autoconverter for details.
Table column headers are always read from the first row of the file.
Empty rows are discarded.

__init__(filename=None, tablename='CSV', encoding='utf-8', **kwargs)¶

Initialize the class.

Arguments:	filename CSV file (encoded with encoding) name name of the table autoconvert EXPERIMENTAL. `True`: replace certain values with special python values (see `convert.Autoconverter`) and possibly split values into lists (see sep). `False`: leave everything as it is (numbers as numbers and strings as strings). mode mode of the `Autoconverter`

recarray()¶: Returns data as numpy.recarray.

recsql.csv_table.make_python_name(s, default=None, number_prefix='N', encoding='utf-8')¶

Returns a unicode string that can be used as a legal python identifier.

Arguments:	s string default use default if s is `None` number_prefix string to prepend if s starts with a number

`recsql.convert` — converting entries of tables and arrays¶

class recsql.convert.Autoconverter(mode='fancy', mapping=None, active=True, sep=False, **kwargs)¶

Automatically convert an input value to a special python object.

The Autoconverter.convert() method turns the value into a special python value and casts strings to the “best” type (see besttype()).

The defaults for the conversion of a input field value to a special python value are:

value	python
‘—’	`None`
‘none’
‘None’
‘’
‘True’	`True`
‘x’
‘X’
‘yes’
‘False’	`False`
‘-‘
‘no’

If the sep keyword is set to a string instead of False then values are split into tuples. Probably the most convenient way to use this is to set sep = True (or None) because this splits on all white space whereas sep = ‘ ‘ would split multiple spaces.

Example

With sep = True: ‘foo bar 22 boing —’ –> (‘foo’, ‘boing’, 22, None)
With sep = ‘,’: 1,2,3,4 –> (1,2,3,4)

convert(x)¶: Convert x (if in the active state)

active¶: If set to True then conversion takes place; False just returns besttype() applid to the value.

__init__(mode='fancy', mapping=None, active=True, sep=False, **kwargs)¶

Initialize the converter.

Arguments:

mode

defines what the converter does

“simple”

convert entries with besttype()

“singlet”

convert entries with besttype() and apply mappings

“fancy”

first splits fields into lists, tries mappings, and does the stuff that “singlet” does

“unicode”

convert all entries with to_unicode()

mapping

any dict-like mapping that supports lookup. If None then the hard-coded defaults are used

active or autoconvert

initial state of the Autoconverter.active toggle. False deactivates any conversion. [True]

sep: character to split on (produces lists); use True or None (!) to split on all white space.
encoding: encoding of the input data [utf-8]
percentify: convert “34.4%” into 0.344 [True]

recsql.convert.besttype(x, encoding='utf-8', percentify=True)¶

Convert string x to the most useful type, i.e. int, float or unicode string.

If x is a quoted string (single or double quotes) then the quotes are stripped and the enclosed string returned. The string can contain any number of quotes, it is only important that it begins and ends with either single or double quotes.

percentify = True turns “34.4%” into the float 0.344.

Note

Strings will be returned as Unicode strings (using unicode()), based on the encoding argument, which is utf-8 by default.

recsql.convert.to_unicode(obj, encoding='utf-8')¶

Convert obj to unicode (if it can be be converted)

from http://farmdev.com/talks/unicode/

recsql.convert.irecarray_to_py(a)¶

Slow conversion of a recarray into a list of records with python types.

Get the field names from a.dtype.names.

Returns:	iterator so that one can handle big input arrays

recsql.convert.pyify(typestr)¶: Return a Python type that most closely represents the type encoded by typestr

recsql.convert.to_pytypes(a)¶

Return array a with all types set to basic Python types.

pyify() is applied to all dtypes of the numpy.ndarray and then recast with numpy.ndarray.astype(). This approach can loose precision.

SQL support¶

`recsql.sqlfunctions` — Functions that enhance a SQLite db¶

This module contains new SQL functions to be added to a SQLite database that can be used in the same way as the builtin functions.

Example:

Add the functions to an existing connection in the following way (assuming that the db connection is available in self.connection):

from sqlfunctions import *
self.connection.create_function("sqrt", 1, _sqrt)
self.connection.create_function("sqr", 1, _sqr)
self.connection.create_function("periodic", 1, _periodic)
self.connection.create_function("pow", 2, _pow)
self.connection.create_function("match", 2, _match)   # implements MATCH
self.connection.create_function("regexp", 2, _regexp) # implements REGEXP
self.connection.create_function("fformat",2,_fformat)
self.connection.create_aggregate("std",1,_Stdev)
self.connection.create_aggregate("stdN",1,_StdevN)
self.connection.create_aggregate("median",1,_Median)
self.connection.create_aggregate("array",1,_NumpyArray)
self.connection.create_aggregate("histogram",4,_NumpyHistogram)
self.connection.create_aggregate("distribution",4,_NormedNumpyHistogram)
self.connection.create_aggregate("meanhistogram",5,_MeanHistogram)
self.connection.create_aggregate("stdhistogram",5,_StdHistogram)
self.connection.create_aggregate("minhistogram",5,_MinHistogram)
self.connection.create_aggregate("maxhistogram",5,_MaxHistogram)
self.connection.create_aggregate("medianhistogram",5,_MedianHistogram)
self.connection.create_aggregate("zscorehistogram",5,_ZscoreHistogram)

Module content¶

recsql.sqlfunctions.regularized_function(x, y, func, bins=None, range=None)¶

Compute func() over data aggregated in bins.

(x,y) –> (x’, func(Y’)) with Y’ = {y: y(x) where x in x’ bin}

First the data is collected in bins x’ along x and then func is applied to all data points Y’ that have been collected in the bin.

Arguments:	x abscissa values (for binning) y ordinate values (func is applied) func a numpy ufunc that takes one argument, func(Y’) bins number or array range limits (used with number of bins)
Returns:	F,edges function and edges (midpoints = 0.5*(edges[:-1]+edges[1:]))

`sqlutils` – Helper functions¶

Helper functions that are used throughout the recsql package.

How to use the sql converters and adapters:

Declare types as ‘NumpyArray’:

cur.execute("CREATE TABLE test(a NumpyArray)")
cur.execute("INSERT INTO test(a) values (?)", (my_array,))

or as column types:

cur.execute('SELECT a as "a [NumpyArray]" from test')

Module content¶

class recsql.sqlutil.FakeRecArray(iterable, columns)¶

Pseudo recarray that is used to feed SQLarray:

Must only implement:

recarray.dtype.names sequence of column names iteration yield records

recsql.sqlutil.adapt_numpyarray(a)¶: adapter: store numpy arrays in the db as ascii pickles

recsql.sqlutil.adapt_object(a)¶: adapter: store python objects in the db as ascii pickles

recsql.sqlutil.convert_numpyarray(s)¶: converter: retrieve numpy arrays from the db as ascii pickles

recsql.sqlutil.convert_object(s)¶: convertor: retrieve python objects from the db as ascii pickles

Notes for developers¶

sqlarray — Implementation of SQLarray¶

Module content¶

recsql.rest_table — Parse a simple reST table¶

Limitations¶

Example¶

Module content¶

recsql.csv_table — Parse a simple CSV table¶

recsql.convert — converting entries of tables and arrays¶

SQL support¶

recsql.sqlfunctions — Functions that enhance a SQLite db¶

Module content¶

sqlutils – Helper functions¶

Module content¶

`sqlarray` — Implementation of `SQLarray`¶

`recsql.rest_table` — Parse a simple reST table¶

`recsql.csv_table` — Parse a simple CSV table¶

`recsql.convert` — converting entries of tables and arrays¶

`recsql.sqlfunctions` — Functions that enhance a SQLite db¶

`sqlutils` – Helper functions¶