# openquake.baselib package

## general

Utility functions of general interest.

class `openquake.baselib.general.AccumDict`(dic=None, accum=None, **kw)

Bases: `dict`

An accumulating dictionary, useful to accumulate variables:

```python
>>> acc = AccumDict()
>>> acc += {'a': 1}
>>> acc += {'a': 1, 'b': 1}
>>> acc
{'a': 2, 'b': 1}
>>> {'a': 1} + acc
{'a': 3, 'b': 1}
>>> acc + 1
{'a': 3, 'b': 2}
>>> 1 - acc
{'a': -1, 'b': 0}
>>> acc - 1
{'a': 1, 'b': 0}
```

Also the multiplication has been defined:

```python
>>> prob1 = AccumDict(a=0.4, b=0.5)
>>> prob2 = AccumDict(b=0.5)
>>> prob1 * prob2
{'a': 0.4, 'b': 0.25}
>>> prob1 * 1.2
{'a': 0.48, 'b': 0.6}
>>> 1.2 * prob1
{'a': 0.48, 'b': 0.6}
```

It is very common to use an AccumDict of accumulators; here is an example using the empty list as accumulator:

```python
>>> acc = AccumDict(accum=[])
>>> acc['a'] += [1]
>>> acc['b'] += [2]
>>> sorted(acc.items())
[('a', [1]), ('b', [2])]
```

The implementation is smart enough to make (deep) copies of the accumulator, therefore each key has a different accumulator, which initially is the empty list (in this case).
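The per-key deep copy described above can be sketched in plain Python. `MiniAccumDict` below is a hypothetical, simplified stand-in, not the actual AccumDict code: it assumes the copy happens in `__missing__` and supports only `+=`.

```python
import copy


class MiniAccumDict(dict):
    """Hypothetical, simplified stand-in for AccumDict (not the real class)."""

    def __init__(self, dic=None, accum=None, **kw):
        super().__init__()
        self.accum = accum  # prototype accumulator, e.g. [] or 0
        self.update(dic or {}, **kw)

    def __missing__(self, key):
        # each missing key gets its own deep copy of the prototype,
        # so two keys never share the same list object
        if self.accum is None:
            raise KeyError(key)
        val = self[key] = copy.deepcopy(self.accum)
        return val

    def __iadd__(self, other):
        # acc += {'a': 1} adds the values key by key
        for key, val in other.items():
            self[key] = self[key] + val if key in self else val
        return self
```

Without the `copy.deepcopy` call every key would share the same list object, so `acc['a'] += [1]` would also change `acc['b']`.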

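`apply`(func, *extras)

Apply a function to each value of the dictionary, passing the extra arguments after the value:

```python
>>> a = AccumDict({'a': 1, 'b': 2})
>>> a.apply(lambda x, y: 2 * x + y, 1)
{'a': 3, 'b': 5}
```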

class `openquake.baselib.general.CallableDict`(keyfunc=<function <lambda>>, keymissing=None)

Bases: `collections.OrderedDict`

A callable object built on top of a dictionary of functions, used as a smart registry or as a poor man's generic function dispatching on the first argument. It is typically used to implement converters. Here is an example:

```python
>>> import json
>>> format_attrs = CallableDict()  # dict of functions (fmt, obj) -> str
>>> @format_attrs.add('csv')  # implementation for csv
... def format_attrs_csv(fmt, obj):
...     items = sorted(vars(obj).items())
...     return '\n'.join('%s,%s' % item for item in items)
>>> @format_attrs.add('json')  # implementation for json
... def format_attrs_json(fmt, obj):
...     return json.dumps(vars(obj))
```

`format_attrs(fmt, obj)` calls the correct underlying function depending on the `fmt` key. If the format is unknown, a KeyError is raised. It is also possible to set a `keymissing` function to specify what to return if the key is missing.

For a more practical example, see the implementation of the exporters in `openquake.calculators.export`.
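A minimal sketch of this dispatching pattern on top of a plain `dict`. `MiniCallableDict` is hypothetical: it omits the ordering of the real CallableDict and assumes that `keymissing` receives the same arguments as the registered functions.

```python
class MiniCallableDict(dict):
    """Hypothetical, simplified dispatcher in the spirit of CallableDict."""

    def __init__(self, keyfunc=lambda key: key, keymissing=None):
        super().__init__()
        self.keyfunc = keyfunc        # maps the first argument to a registry key
        self.keymissing = keymissing  # optional fallback for unknown keys

    def add(self, *keys):
        # return a decorator registering the function under all given keys
        def decorator(func):
            for key in keys:
                self[key] = func
            return func
        return decorator

    def __call__(self, obj, *args, **kw):
        key = self.keyfunc(obj)
        if key not in self and self.keymissing is not None:
            return self.keymissing(obj, *args, **kw)
        return self[key](obj, *args, **kw)  # KeyError if the key is unknown
```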

`add`(*keys)

Return a decorator registering a new implementation for the CallableDict for the given keys.

exception `openquake.baselib.general.CodeDependencyError`

Bases: `exceptions.Exception`

exception `openquake.baselib.general.DeprecationWarning`

Bases: `exceptions.UserWarning`

Raised the first time a deprecated function is called.

class `openquake.baselib.general.DictArray`(imtls)

Bases: `_abcoll.Mapping`

A small wrapper over a dictionary of arrays serializable to HDF5:

```python
>>> d = DictArray({'PGA': [0.01, 0.02, 0.04], 'PGV': [0.1, 0.2]})
>>> from openquake.baselib import hdf5
>>> with hdf5.File('/tmp/x.h5', 'w') as f:
...      f['d'] = d
...      f['d']
<DictArray
PGA: [ 0.01  0.02  0.04]
PGV: [ 0.1  0.2]>
```

The DictArray maintains the lexicographic order of the keys.

`new`(array)

Convert an array of compatible length into a DictArray:

```python
>>> import numpy
>>> d = DictArray({'PGA': [0.01, 0.02, 0.04], 'PGV': [0.1, 0.2]})
>>> d.new(numpy.arange(0, 5, 1))  # array of length 5 = 3 + 2
<DictArray
PGA: [0 1 2]
PGV: [3 4]>
```
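The length bookkeeping behind `new` can be illustrated with plain lists. The helper `split_by_lengths` is hypothetical; it only assumes that the keys are taken in lexicographic order, as stated above.

```python
def split_by_lengths(template, flat):
    """Hypothetical helper: split `flat` into a dict shaped like `template`.

    Keys are processed in sorted (lexicographic) order, mirroring the
    ordering DictArray is documented to keep.
    """
    total = sum(len(v) for v in template.values())
    if len(flat) != total:
        raise ValueError('expected %d values, got %d' % (total, len(flat)))
    out, start = {}, 0
    for key in sorted(template):
        stop = start + len(template[key])
        out[key] = list(flat[start:stop])  # consecutive slice for this key
        start = stop
    return out
```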

class `openquake.baselib.general.WeightedSequence`(seq=())

Bases: `_abcoll.MutableSequence`

A wrapper over a sequence of weighted items with a total weight attribute. Adding items automatically increases the weight.

`insert`(i, item_weight)

Insert an item with the given weight in the sequence.

classmethod `merge`(ws_list)

Merge a set of WeightedSequence objects.

Parameters: ws_list – a sequence of `openquake.baselib.general.WeightedSequence` instances

Returns: a `openquake.baselib.general.WeightedSequence` instance

`openquake.baselib.general.assert_close`(a, b, rtol=1e-07, atol=0, context=None)

Compare two composite objects, which may contain floats, for equality up to a given precision. NB: if the objects are or contain generators, they are exhausted.

Parameters:

- a – an object
- b – another object
- rtol – relative tolerance
- atol – absolute tolerance

`openquake.baselib.general.assert_independent`(package, *packages)

Parameters:

- package – Python name of a module/package
- packages – Python names of modules/packages

Make sure the package does not depend on the packages.

`openquake.baselib.general.block_splitter`(items, max_weight, weight=<function <lambda>>, kind=nokey)

Parameters:

- items – an iterator over items
- max_weight – the max weight to split on
- weight – a function returning the weight of a given item
- kind – a function returning the kind of a given item

Group together items of the same kind until the total weight exceeds the max_weight and yield WeightedSequence instances. Items with weight zero are ignored.

For instance:

```python
>>> items = 'ABCDE'
>>> list(block_splitter(items, 3))
[<WeightedSequence ['A', 'B', 'C'], weight=3>, <WeightedSequence ['D', 'E'], weight=2>]
```

The default weight is 1 for all items.
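A simplified, hypothetical version of the splitting logic. It ignores the `kind` grouping, returns `(block, weight)` pairs instead of WeightedSequence instances, and assumes a block is closed when adding the next item would exceed `max_weight`, which reproduces the example above.

```python
def block_splitter_sketch(items, max_weight, weight=lambda item: 1):
    """Hypothetical, simplified block splitter (no `kind` grouping).

    Accumulate items until adding the next one would exceed max_weight,
    then yield the current block together with its total weight.
    Items with zero weight are skipped, as in the documented behaviour.
    """
    block, total = [], 0
    for item in items:
        w = weight(item)
        if w == 0:
            continue
        if block and total + w > max_weight:
            yield block, total
            block, total = [], 0
        block.append(item)
        total += w
    if block:
        yield block, total
```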

`openquake.baselib.general.ceil`(a, b)

Divide a / b and return the smallest integer greater than or equal to the quotient (i.e. the ceiling of the division).

Parameters:

- a – a number
- b – a positive number

Returns: the smallest integer greater than or equal to the quotient
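For integer arguments the same ceiling division can be written without going through floating point. `ceil_div` is a hypothetical helper, not the library function:

```python
def ceil_div(a, b):
    """Hypothetical helper: smallest integer >= a / b, for integers and b > 0."""
    if b <= 0:
        raise ValueError('b must be positive, got %r' % b)
    # floor division of the negated numerator, negated back, gives the ceiling
    return -(-a // b)
```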
`openquake.baselib.general.deprecated`(message)

Return a decorator to make deprecated functions.

Parameters: message – the message to print the first time the deprecated function is used.

Here is an example of usage:

```python
>>> @deprecated('Use new_function instead')
... def old_function():
...     'Do something'
```

Notice that if the function is called several times, the deprecation warning will be displayed only the first time.
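One way to obtain the warn-only-once behaviour is a flag captured in a closure, as in the hypothetical `deprecated_sketch` below. It uses the standard `warnings` module and the builtin DeprecationWarning, not the UserWarning subclass defined in this module.

```python
import functools
import warnings


def deprecated_sketch(message):
    """Hypothetical decorator factory: warn only on the first call."""
    def decorator(func):
        state = {'warned': False}  # shared by all calls of this function

        @functools.wraps(func)
        def wrapper(*args, **kw):
            if not state['warned']:
                warnings.warn('%s: %s' % (func.__name__, message),
                              DeprecationWarning, stacklevel=2)
                state['warned'] = True
            return func(*args, **kw)
        return wrapper
    return decorator
```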

`openquake.baselib.general.distinct`(keys)

Return the distinct keys in order.
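A minimal order-preserving implementation, given as a hypothetical sketch (`distinct_sketch`) rather than the library code:

```python
def distinct_sketch(keys):
    """Hypothetical helper: distinct keys, keeping first-seen order."""
    seen = set()
    out = []
    for key in keys:
        if key not in seen:  # first occurrence wins
            seen.add(key)
            out.append(key)
    return out
```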

`openquake.baselib.general.get_array`(array, **kw)

Extract a subarray by filtering on the given keyword arguments.

`openquake.baselib.general.git_suffix`(fname)
Returns: if Git repository found

`openquake.baselib.general.group_array`(array, *kfields)

Convert an array into an OrderedDict kfields -> array.

`openquake.baselib.general.groupby`(objects, key, reducegroup=list)

Parameters:

- objects – a sequence of objects with a key value
- key – the key function to extract the key value
- reducegroup – the function to apply to each group

Returns: an OrderedDict {key value: reducegroup(group)}

```python
>>> groupby(['A1', 'A2', 'B1', 'B2', 'B3'], lambda x: x[0],
...         lambda group: ''.join(x[1] for x in group))
OrderedDict([('A', '12'), ('B', '123')])
```
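The documented contract of `groupby` can be sketched with an OrderedDict. `groupby_sketch` is hypothetical and assumes that groups are kept in order of first appearance:

```python
from collections import OrderedDict


def groupby_sketch(objects, key, reducegroup=list):
    """Hypothetical helper mirroring the documented groupby contract."""
    groups = OrderedDict()
    for obj in objects:
        # setdefault keeps the groups in order of first appearance
        groups.setdefault(key(obj), []).append(obj)
    return OrderedDict((k, reducegroup(group)) for k, group in groups.items())
```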

`openquake.baselib.general.groupby2`(records, kfield, vfield)

Parameters:

- records – a sequence of records with positional or named fields
- kfield – the index/name/tuple specifying the field to use as a key
- vfield – the index/name/tuple specifying the field to use as a value

Returns: a list of pairs of the form (key, [value, ...]).

```python
>>> groupby2(['A1', 'A2', 'B1', 'B2', 'B3'], 0, 1)
[('A', ['1', '2']), ('B', ['1', '2', '3'])]
```

Here is an example where the keyfield is a tuple of integers:

```python
>>> groupby2(['A11', 'A12', 'B11', 'B21'], (0, 1), 2)
[(('A', '1'), ['1', '2']), (('B', '1'), ['1']), (('B', '2'), ['1'])]
```

`openquake.baselib.general.humansize`(nbytes, suffixes=('B', 'KB', 'MB', 'GB', 'TB', 'PB'))

Return file size in a human-friendly format.
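A possible implementation divides by 1024 until the value fits the next suffix. `humansize_sketch` is hypothetical; the exact rounding of the real function may differ.

```python
def humansize_sketch(nbytes, suffixes=('B', 'KB', 'MB', 'GB', 'TB', 'PB')):
    """Hypothetical helper: format a byte count with 1024-based suffixes."""
    size = float(nbytes)
    i = 0
    while size >= 1024 and i < len(suffixes) - 1:
        size /= 1024.
        i += 1
    # drop trailing zeros, e.g. '1.50' -> '1.5', '2.00' -> '2'
    f = ('%.2f' % size).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])
```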

`openquake.baselib.general.import_all`(module_or_package)

If module_or_package is a module, just import it; if it is a package, recursively import all the modules it contains. Return the names of the imported modules as a set. The set can be empty if the modules were already in sys.modules.

`openquake.baselib.general.nokey`(item)

Dummy function to apply to items without a key.

`openquake.baselib.general.run_in_process`(code, *args)

Run in an external process the given Python code and return the output as a Python object. If there are arguments, then code is taken as a template and traditional string interpolation is performed.

Parameters:

- code – string or template describing Python code
- args – arguments to be used for interpolation

Returns: the output of the process, as a Python object

`openquake.baselib.general.safeprint`(*args, **kwargs)

Convert and print characters using the proper encoding.

`openquake.baselib.general.search_module`(module, syspath=sys.path)

Given a module name (possibly with dots), return the corresponding file path, or None if the module cannot be found.

Parameters:

- module – (dotted) name of the Python module to look for
- syspath – a list of directories to search (default sys.path)

`openquake.baselib.general.split_in_blocks`(sequence, hint, weight=<function <lambda>>, key=nokey)

Split the sequence in a number of WeightedSequences close to hint.

Parameters:

- sequence – a finite sequence of items
- hint – an integer suggesting the number of subsequences to generate
- weight – a function returning the weight of a given item
- key – a function returning the key of a given item

The WeightedSequences are of homogeneous key and they try to be balanced in weight. For instance:

```python
>>> items = 'ABCDE'
>>> list(split_in_blocks(items, 3))
[<WeightedSequence ['A', 'B'], weight=2>, <WeightedSequence ['C', 'D'], weight=2>, <WeightedSequence ['E'], weight=1>]
```

`openquake.baselib.general.split_in_slices`(number, num_slices)

Parameters:

- number – a positive number to split in slices
- num_slices – the number of slices to return (at most)

Returns: a list of slices

```python
>>> split_in_slices(4, 2)
[slice(0, 2, None), slice(2, 4, None)]
>>> split_in_slices(5, 1)
[slice(0, 5, None)]
>>> split_in_slices(5, 2)
[slice(0, 3, None), slice(3, 5, None)]
>>> split_in_slices(2, 4)
[slice(0, 1, None), slice(1, 2, None)]
```
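The examples above are consistent with fixed-size blocks of `ceil(number / num_slices)` elements, which is what the hypothetical `split_in_slices_sketch` below assumes:

```python
import math


def split_in_slices_sketch(number, num_slices):
    """Hypothetical helper reproducing the documented examples."""
    assert number > 0 and num_slices > 0, (number, num_slices)
    blocksize = int(math.ceil(number / num_slices))  # size of each slice
    return [slice(start, min(start + blocksize, number))
            for start in range(0, number, blocksize)]
```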
`openquake.baselib.general.writetmp`(content=None, dir=None, prefix='tmp', suffix='tmp')

Create temporary file with the given content.

Please note: the temporary file must be deleted by the caller.

Parameters:

- content (string) – the content to write to the temporary file
- dir (string) – directory where the file should be created
- prefix (string) – file name prefix
- suffix (string) – file name suffix

Returns: a string with the path to the temporary file
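A sketch based on `tempfile.mkstemp`, with the hypothetical name `writetmp_sketch`; as stated above, the caller remains responsible for deleting the file.

```python
import os
import tempfile


def writetmp_sketch(content=None, dir=None, prefix='tmp', suffix='tmp'):
    """Hypothetical helper: write content to a new temporary file.

    Returns the path; the caller must delete the file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=dir)
    # wrap the low-level descriptor so that it is always closed
    with os.fdopen(fd, 'w') as f:
        if content is not None:
            f.write(content)
    return path
```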

## hdf5

class `openquake.baselib.hdf5.ByteCounter`(nbytes=0)

Bases: `object`

A visitor used to measure the dimensions of an HDF5 dataset or group. Use it as ByteCounter.get_nbytes(dset_or_group).

classmethod `get_nbytes`(dset)

class `openquake.baselib.hdf5.File`(name, mode=None, driver=None, libver=None, userblock_size=None, swmr=False, **kwds)

Bases: `h5py._hl.files.File`

Subclass of `h5py.File` able to store and retrieve objects conforming to the HDF5 protocol used by the OpenQuake software. It also works recursively for dictionaries of the form name->obj.

```python
>>> f = File('/tmp/x.h5', 'w')
>>> f['dic'] = dict(a=dict(x=1, y=2), b=3)
>>> dic = f['dic']
>>> dic['a']['x'].value
1
>>> dic['b'].value
3
>>> f.close()
```

`save`(nodedict, root='')

Save a node dictionary in the .hdf5 file, starting from the root dataset. A common application is to convert XML files into .hdf5 files, see the usage in `openquake.commands.to_hdf5`.

Parameters: nodedict – a dictionary with keys ‘tag’, ‘attrib’, ‘text’, ‘nodes’
`set_nbytes`(key, nbytes=None)

Set the nbytes attribute on the HDF5 object identified by key.

classmethod `temporary`()

Return a temporary HDF5 file, open for writing. The temporary name is stored in the .path attribute. It is the user's responsibility to remove the file when closed.

class `openquake.baselib.hdf5.LiteralAttrs`

Bases: `object`

A class to serialize a set of parameters in HDF5 format. The goal is to store simple parameters as an HDF5 table in a readable way. Each parameter can be retrieved as an attribute, given its name. The implementation treats dictionary attributes specially, by storing them as attrname.keyname strings, see the example below:

```python
>>> class Ser(LiteralAttrs):
...     def __init__(self, a, b):
...         self.a = a
...         self.b = b
>>> ser = Ser(1, dict(x='xxx', y='yyy'))
>>> arr, attrs = ser.__toh5__()
>>> for k, v in arr:
...     print('%s=%s' % (k, v))
a=1
b.x='xxx'
b.y='yyy'
>>> s = object.__new__(Ser)
>>> s.__fromh5__(arr, attrs)
>>> s.a
1
>>> s.b['x']
'xxx'
```

The implementation is not recursive, i.e. there will be at most one dot in the serialized names (in the example here a, b.x, b.y).
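The non-recursive `attrname.keyname` naming scheme can be sketched as follows. `flatten_attrs` is a hypothetical helper that only reproduces the names and the quoted values of the example above, not the HDF5 serialization.

```python
def flatten_attrs(obj):
    """Hypothetical helper: flatten instance attributes into name/value pairs.

    Dictionary attributes become 'attrname.keyname' entries; nested
    dictionaries are not expanded further (non-recursive, at most one dot).
    Values are rendered with repr(), so strings keep their quotes.
    """
    pairs = []
    for name, value in sorted(vars(obj).items()):
        if isinstance(value, dict):
            for key, val in sorted(value.items()):
                pairs.append(('%s.%s' % (name, key), repr(val)))
        else:
            pairs.append((name, repr(value)))
    return pairs
```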

class `openquake.baselib.hdf5.PickleableSequence`(objects)

Bases: `_abcoll.Sequence`

An immutable sequence of pickleable objects that can be serialized in HDF5 format. Here is an example, using the LiteralAttrs class defined in this module, but any pickleable class would do:

```python
>>> seq = PickleableSequence([LiteralAttrs(), LiteralAttrs()])
>>> with File('/tmp/x.h5', 'w') as f:
...     f['data'] = seq
>>> with File('/tmp/x.h5') as f:
...     f['data']
(<LiteralAttrs >, <LiteralAttrs >)
```

`openquake.baselib.hdf5.array_of_vstr`(lst)

Parameters: lst – a list of strings or bytes

Returns: an array of variable length ASCII strings

`openquake.baselib.hdf5.cls2dotname`(cls)

The full Python name (i.e. pkg.subpkg.mod.cls) of a class.

`openquake.baselib.hdf5.create`(hdf5, name, dtype, shape=(None, ), compression=None, fillvalue=0, attrs=None)

Parameters:

- hdf5 – an h5py.File object
- name – an HDF5 key string
- dtype – dtype of the dataset (usually composite)
- shape – shape of the dataset (can be extendable)
- compression – None or 'gzip' are recommended
- attrs – dictionary of attributes of the dataset

Returns: an HDF5 dataset

`openquake.baselib.hdf5.dotname2cls`(dotname)

The class associated to the given dotname (i.e. pkg.subpkg.mod.cls).

`openquake.baselib.hdf5.extend`(dset, array)

Extend an extensible dataset with an array of a compatible dtype.

Parameters:

- dset – an h5py dataset
- array – an array of length L

Returns: the total length of the dataset (i.e. initial length + L)

`openquake.baselib.hdf5.extend3`(hdf5path, key, array, **attrs)

Extend an HDF5 file dataset with the given array.

`openquake.baselib.hdf5.get_nbytes`(dset)

If the dataset has an attribute ‘nbytes’, return it. Otherwise get the size of the underlying array. Returns None if the dataset is actually a group.

## node

This module defines a Node class, together with a few conversion functions which are able to convert NRML files into hierarchical objects (DOM). That makes it easier to read and write XML from Python and vice versa. Such features are used in the command-line conversion tools. The Node class is kept intentionally similar to an Element class; however, it overcomes a limitation of ElementTree: in particular, a node can manage a lazy iterable of subnodes, whereas ElementTree wants to keep everything in memory. Moreover, the Node class provides a convenient dot notation to access subnodes.

The Node class is instantiated with four arguments:

1. the node tag (a mandatory string)
2. the node attributes (a dictionary)
3. the node value (a string or None)
4. the subnodes (an iterable over nodes)

If a node has subnodes, its value should be None.

For instance, here is an example of instantiating a root node with two subnodes a and b:

```python
>>> from openquake.baselib.node import Node
>>> a = Node('a', {}, 'A1')
>>> b = Node('b', {'attrb': 'B'}, 'B1')
>>> root = Node('root', nodes=[a, b])
>>> root
<root {} None ...>
```

Node objects can be converted into nicely indented strings:

```python
>>> print(root.to_str())
root
  a 'A1'
  b{attrb='B'} 'B1'
```

The subnodes can be retrieved with the dot notation:

```python
>>> root.a
<a {} A1 >
```

The value of a node can be extracted with the ~ operator:

```python
>>> ~root.a
'A1'
```

If there are multiple subnodes with the same name

```python
>>> root.append(Node('a', {}, 'A2'))  # add another 'a' node
```

the dot notation will retrieve the first node.

It is possible to retrieve the other nodes by their ordinal index:

```python
>>> root[0], root[1], root[2]
(<a {} A1 >, <b {'attrb': 'B'} B1 >, <a {} A2 >)
```

The list of all subnodes with a given name can be retrieved as follows:

```python
>>> list(root.getnodes('a'))
[<a {} A1 >, <a {} A2 >]
```

It is also possible to delete a node given its index:

```python
>>> del root[2]
```

A node is an iterable object yielding its subnodes:

```python
>>> list(root)
[<a {} A1 >, <b {'attrb': 'B'} B1 >]
```

The attributes of a node can be retrieved with the square bracket notation:

```python
>>> root.b['attrb']
'B'
```

It is possible to add and remove attributes freely:

```python
>>> root.b['attr'] = 'new attr'
>>> del root.b['attr']
```

Node objects can be easily converted into ElementTree objects:

```python
>>> from openquake.baselib.node import node_to_elem
>>> node_to_elem(root)
<Element 'root' at ...>
```

Then it is trivial to generate the XML representation of a node:

```python
>>> from xml.etree import ElementTree
>>> print(ElementTree.tostring(node_to_elem(root)).decode('utf-8'))
<root><a>A1</a><b attrb="B">B1</b></root>
```

Generating XML files larger than the available memory requires some care. The trick is to use a node generator, such that it is not necessary to keep the entire tree in memory. Here is an example:

```python
>>> def gen_many_nodes(N):
...     for i in range(N):
...         yield Node('a', {}, 'Text for node %d' % i)
>>> lazytree = Node('lazytree', {}, nodes=gen_many_nodes(10))
```

The lazytree object defined here consumes no memory for its subnodes, because the nodes are not created at instantiation time. They are created as soon as you start iterating on the lazytree; in particular, list(lazytree) will generate all of them. If your goal is to store the tree on the filesystem in XML format, you should use a writing routine that converts one subnode at a time, without requiring the full list of them. The routines provided by ElementTree are no good for that; however, this module provides a StreamingXMLWriter just for that purpose.

Lazy trees should not be used unless it is absolutely necessary in order to save memory. If you use a lazy tree, the slice notation will not work (the underlying generator does not support it); moreover, it will not be possible to iterate twice on the subnodes, since the generator will be exhausted. Notice that even accessing a subnode with the dot notation will advance the generator. Finally, nodes containing lazy nodes will not be pickleable.
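The limitations listed above are the ordinary behaviour of Python generators. The following standalone snippet demonstrates them with plain strings instead of Node objects; `gen_many_items` and `is_picklable` are illustrative names, not part of this module.

```python
import pickle


def gen_many_items(n):
    """Plain generator standing in for gen_many_nodes."""
    for i in range(n):
        yield 'Text for node %d' % i


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


lazy = gen_many_items(3)
first_pass = list(lazy)    # consumes the generator
second_pass = list(lazy)   # the generator is exhausted: empty list

try:
    gen_many_items(3)[0]   # generators support neither indexing nor slicing
    subscriptable = True
except TypeError:
    subscriptable = False
```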

class `openquake.baselib.node.Node`(fulltag, attrib=None, text=None, nodes=None, lineno=None)

Bases: `object`

A class to make it easy to edit hierarchical structures with attributes, such as XML files. Node objects must be pickleable and must consume as little memory as possible. Moreover they must be easily converted from and to ElementTree objects. The advantage over ElementTree objects is that subnodes can be lazily generated and that they can be accessed with the dot notation.

`append`(node)

Append a new subnode.

`attrib`

`getnodes`(name)

Return the direct subnodes with name 'name'.

`lineno`

`nodes`

`tag`

`text`

`to_str`(expandattrs=True, expandvals=True)

Convert the node into a string, intended for testing/debugging purposes.

Parameters:

- expandattrs – print the values of the attributes if True, else print only the names
- expandvals – print the values if True, else print only the tag names

class `openquake.baselib.node.SourceLineParser`(html=0, target=None, encoding=None)

Bases: `xml.etree.ElementTree.XMLParser`

A custom parser managing line numbers: works for Python <= 3.3

class `openquake.baselib.node.StreamingXMLWriter`(bytestream, indent=4, encoding='utf-8', nsmap=None)

Bases: `object`

A binary stream XML writer. The typical usage is something like this:

```python
with StreamingXMLWriter(output_file) as writer:
    writer.start_tag('root')
    for node in nodegenerator():
        writer.serialize(node)
    writer.end_tag('root')
```
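The streaming idea can be sketched with the standard library only. The hypothetical `stream_xml` writes the root tags by hand and serializes one ElementTree element at a time, so the full list of elements is never built; it does not reproduce the indentation or namespace handling of StreamingXMLWriter.

```python
import io
import xml.etree.ElementTree as ET


def stream_xml(out, root_tag, elements):
    """Hypothetical helper: write one element at a time under a root tag.

    `out` is a binary stream; only the current element is held in memory.
    """
    out.write(('<%s>\n' % root_tag).encode('utf-8'))
    for elem in elements:
        # serialize a single subtree, then let it be garbage collected
        out.write(ET.tostring(elem, encoding='unicode').encode('utf-8'))
        out.write(b'\n')
    out.write(('</%s>\n' % root_tag).encode('utf-8'))


def gen_elements(n):
    """Lazily generate <a> elements, in the spirit of gen_many_nodes."""
    for i in range(n):
        elem = ET.Element('a')
        elem.text = 'Text for node %d' % i
        yield elem
```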
`emptyElement`(name, attrs)

Add an empty element (may have attributes).

`end_tag`(name)

Close an XML tag.

`serialize`(node)

Serialize a node object (typically an ElementTree object).

`shorten`(tag)

Get the short representation of a fully qualified tag.

Parameters: tag (str) – a (fully qualified or not) XML tag

`start_tag`(name, attrs=None)

Open an XML tag.

class `openquake.baselib.node.ValidatingXmlParser`(validators, stop=None)

Bases: `object`

Validating XML Parser based on Expat. It has two methods .parse_file and .parse_bytes returning a validated `Node` object.

Parameters:

- validators – a dictionary of validation functions
- stop – the tag where to stop the parsing (if any)

exception `Exit`

Bases: `exceptions.Exception`

Raised when the parsing is stopped before the end on purpose.

`ValidatingXmlParser.parse_bytes`(bytestr, isfinal=True)

Parse a byte string. If the string is very large, split it in chunks and parse each chunk with isfinal=False, then parse an empty chunk with isfinal=True.
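The chunked protocol described above maps directly onto Expat's `Parse(data, isfinal)`. The hypothetical `count_tags` below feeds a document in pieces of arbitrary size and finishes with an empty final chunk:

```python
import xml.parsers.expat


def count_tags(chunks):
    """Hypothetical example: feed an XML document to Expat chunk by chunk."""
    counts = {}

    def start_element(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    for chunk in chunks:
        parser.Parse(chunk, False)  # more data will follow
    parser.Parse(b'', True)         # an empty final chunk ends the document
    return counts
```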

`ValidatingXmlParser.parse_file`(file_or_fname)

Parse a file or a filename.

`openquake.baselib.node.context`(*args, **kwds)

Context manager managing exceptions and adding line number of the current node and name of the current file to the error message.

Parameters:

- fname – the current file being processed
- node – the current node being processed

`openquake.baselib.node.floatformat`(*args, **kwds)

Context manager to change the default format string for the function `openquake.commonlib.writers.scientificformat()`.

Parameters: fmt_string – the format to use; for instance '%13.9E'

`openquake.baselib.node.fromstring`(text)

Parse an XML string and return a tree.

`openquake.baselib.node.iterparse`(source, events=('end', ), remove_comments=True, **kw)

Thin wrapper around ElementTree.iterparse.

`openquake.baselib.node.node_copy`(node, nodefactory=Node)

Make a deep copy of the node.

`openquake.baselib.node.node_display`(root, expandattrs=False, expandvals=False, output=sys.stdout)

Write an indented representation of the Node object on the output; this is intended for testing/debugging purposes.

Parameters:

- root – a Node object
- expandattrs (bool) – if True, the values of the attributes are also printed, not only the names
- expandvals (bool) – if True, the values of the tags are also printed, not only the names
- output – stream where to write the string representation of the node

`openquake.baselib.node.node_from_dict`(dic, nodefactory=Node)

Convert a (nested) dictionary with attributes tag, attrib, text, nodes into a Node object.

`openquake.baselib.node.node_from_elem`(elem, nodefactory=Node, lazy=())

Convert (recursively) an ElementTree object into a Node object.

`openquake.baselib.node.node_from_ini`(ini_file, nodefactory=Node, root_name='ini')

Convert a .ini file into a Node object.

Parameters: ini_file – a filename or a file-like object in read mode

`openquake.baselib.node.node_from_xml`(xmlfile, nodefactory=Node)

Convert a .xml file into a Node object.

Parameters: xmlfile – a file name or file object open for reading

`openquake.baselib.node.node_to_dict`(node)

Convert a Node object into a (nested) dictionary with attributes tag, attrib, text, nodes.

Parameters: node – a Node-compatible object
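The dictionary layout with the keys tag, attrib, text and nodes can be sketched for plain ElementTree elements. `elem_to_dict` and `dict_to_elem` are hypothetical helpers, not the library functions:

```python
import xml.etree.ElementTree as ET


def elem_to_dict(elem):
    """Hypothetical helper: ElementTree element -> nested dict.

    Uses the same four keys as node_to_dict: tag, attrib, text, nodes.
    """
    return {'tag': elem.tag,
            'attrib': dict(elem.attrib),
            'text': elem.text,
            'nodes': [elem_to_dict(child) for child in elem]}


def dict_to_elem(dic):
    """Hypothetical inverse of elem_to_dict."""
    elem = ET.Element(dic['tag'], dic.get('attrib', {}))
    elem.text = dic.get('text')
    for sub in dic.get('nodes', []):
        elem.append(dict_to_elem(sub))  # rebuild the children recursively
    return elem
```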
`openquake.baselib.node.node_to_elem`(root)

Convert (recursively) a Node object into an ElementTree object.

`openquake.baselib.node.node_to_ini`(node, output=sys.stdout)

Convert a Node object with the right structure into a .ini file.

Parameters:

- node – a Node object
- output – a file-like object opened in write mode

`openquake.baselib.node.node_to_xml`(node, output=sys.stdout, nsmap=None)

Convert a Node object into a pretty .xml file without keeping everything in memory. If you just want the string representation, use tostring(node).

Parameters:

- node – a Node-compatible object (ElementTree nodes are fine)
- nsmap – if given, shorten the tags with aliases

`openquake.baselib.node.parse`(source, remove_comments=True, **kw)

Thin wrapper around ElementTree.parse.

`openquake.baselib.node.pprint`(self, stream=None, indent=1, width=80, depth=None)

Pretty print the underlying literal Python object.

`openquake.baselib.node.read_nodes`(fname, filter_elem, nodefactory=Node, remove_comments=True)

Convert an XML file into a lazy iterator over Node objects satisfying the given specification, i.e. a function element -> boolean.

Parameters:

- fname – file name or file object
- filter_elem – element specification

In case of errors, the file name is added to the error message.

`openquake.baselib.node.scientificformat`(value, fmt='%13.9E', sep=' ', sep2=':')

Parameters:

- value – the value to convert into a string
- fmt – the formatting string to use for float values
- sep – separator to use for vector-like values
- sep2 – second separator to use for matrix-like values

Convert a float or an array into a string by using the scientific notation and a fixed precision (by default 10 significant digits). For instance:

```python
>>> scientificformat(-0E0)
'0.000000000E+00'
>>> scientificformat(-0.004)
'-4.000000000E-03'
>>> scientificformat([0.004])
'4.000000000E-03'
>>> scientificformat([0.01, 0.02], '%10.6E')
'1.000000E-02 2.000000E-02'
>>> scientificformat([[0.1, 0.2], [0.3, 0.4]], '%4.1E')
'1.0E-01:2.0E-01 3.0E-01:4.0E-01'
```

`openquake.baselib.node.striptag`(tag)

Get the short representation of a fully qualified tag.

Parameters: tag (str) – a (fully qualified or not) XML tag
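ElementTree stores namespaced tags as `{uri}localname`. A hypothetical `striptag_sketch` keeping only the local name could look like this (the namespace URI in the usage check is a placeholder):

```python
def striptag_sketch(tag):
    """Hypothetical helper: drop the '{namespace}' prefix of an ElementTree tag."""
    if tag.startswith('{'):
        return tag.rsplit('}', 1)[1]  # keep what follows the closing brace
    return tag
```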
`openquake.baselib.node.to_literal`(self)

Convert the node into a literal Python object.

`openquake.baselib.node.tostring`(node, indent=4, nsmap=None)

Convert a node into an XML string by using the StreamingXMLWriter. This is useful for testing purposes.

Parameters:

- node – a node object (typically an ElementTree object)
- indent – the indentation to use in the XML (default 4 spaces)

## parallel

### The Starmap API

There are several good libraries to manage parallel programming, both in the standard library and in third party packages. Since we are not interested in reinventing the wheel, OpenQuake does not offer any new parallel library; however, it does offer some glue code so that you can use your library of choice. Currently multiprocessing, concurrent.futures, celery and ipython-parallel are supported. Moreover, `openquake.baselib.parallel` offers some additional facilities that make it easier to parallelize scientific computations, i.e. embarrassingly parallel problems.

Typically one wants to apply a callable to a list of arguments in parallel rather than sequentially, and then combine together the results. This is known as a MapReduce problem. As a simple example, we will consider the problem of counting the letters in a text. Here is how you can solve the problem sequentially:

```python
>>> from itertools import starmap  # map a function with multiple arguments
>>> from functools import reduce  # reduce an iterable with a binary operator
>>> from operator import add  # the binary addition operator
>>> from collections import Counter  # callable doing the counting
>>> arglist = [('hello',), ('world',)]  # list of arguments
>>> results = starmap(Counter, arglist)  # iterator over the results
>>> res = reduce(add, results, Counter())  # aggregated counts
>>> sorted(res.items())  # counts per letter
[('d', 1), ('e', 1), ('h', 1), ('l', 3), ('o', 2), ('r', 1), ('w', 1)]
```

Here is how you can solve the problem in parallel by using `openquake.baselib.parallel.Starmap`:

```python
>>> from openquake.baselib.parallel import Starmap
>>> res2 = Starmap(Counter, arglist).reduce()
>>> assert res2 == res  # the same as before
```

As you can see, there are some notational advantages with respect to using itertools.starmap. First of all, Starmap has a reduce method, so there is no need to import functools.reduce; secondly, the reduce method has sensible defaults:

1. the default aggregation function is add, so there is no need to specify it
2. the default accumulator is an empty accumulating dictionary (see `openquake.baselib.general.AccumDict`) working as a Counter, so there is no need to specify it.

You can of course override the defaults, so if you really want to return a Counter you can do:

```python
>>> res3 = Starmap(Counter, arglist).reduce(acc=Counter())
```

In the engine we use nearly always callables that return dictionaries and we aggregate nearly always with the addition operator, so such defaults are very convenient. You are encouraged to do the same, since we found that approach to be very flexible. Typically in a scientific application you will return a dictionary of numpy arrays.

The parallelization algorithm used by Starmap depends on the environment variable OQ_DISTRIBUTE. Here are the possibilities available at the moment:

- OQ_DISTRIBUTE not set or set to "futures": use multiprocessing via the concurrent.futures interface
- OQ_DISTRIBUTE set to "no": disable the parallelization, useful for debugging
- OQ_DISTRIBUTE set to "celery": use celery, useful if you have multiple machines in a cluster
- OQ_DISTRIBUTE set to "ipython": use the ipyparallel concurrency mechanism (experimental)
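A hypothetical dispatcher reading OQ_DISTRIBUTE, covering only the "no" and "futures" modes. It illustrates the idea of switching on the environment variable and is not how Starmap is actually implemented; celery and ipython are deliberately omitted.

```python
import itertools
import os
from concurrent.futures import ProcessPoolExecutor


def starmap_sketch(func, arglist, environ=os.environ):
    """Hypothetical dispatcher keyed on OQ_DISTRIBUTE ('no' or 'futures')."""
    mode = environ.get('OQ_DISTRIBUTE', 'futures')
    arglist = list(arglist)
    if mode == 'no':
        # sequential execution: convenient for debugging
        return list(itertools.starmap(func, arglist))
    if mode == 'futures':
        # one task per argument tuple, executed in worker processes
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(func, *args) for args in arglist]
            return [fut.result() for fut in futures]
    raise ValueError('unsupported OQ_DISTRIBUTE=%r' % mode)
```

Under the "futures" mode, `func` and its arguments must be pickleable, since they are sent to the worker processes.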

There is no such thing as OQ_DISTRIBUTE="threading"; it would be trivial to implement, but the performance of threads instead of processes is terrible for the kind of applications we are interested in (CPU-dominated, with large tasks such that the time to spawn a new process is negligible with respect to the time to perform the task).

### The Starmap.apply API

The Starmap class has a very convenient classmethod Starmap.apply which is used in several places in the engine. Starmap.apply is useful when you have a sequence of objects that you want to split in homogeneous chunks and then apply a callable to each chunk (in parallel). For instance, in the letter counting example discussed before, Starmap.apply could be used as follows:

```python
>>> text = 'helloworld'  # sequence of characters
>>> res3 = Starmap.apply(Counter, (text,)).reduce()
>>> assert res3 == res
```

The API of Starmap.apply is designed to extend that of apply, a builtin of Python 2; the second argument is the tuple of arguments passed to the first argument. The difference with apply is that Starmap.apply returns a `Starmap` object, so that nothing is actually done until you iterate on it (reduce is doing that).

How many chunks will be produced? That depends on the parameter concurrent_tasks; if it is not passed, it defaults to 5 times the number of cores in your machine (as returned by os.cpu_count()), and Starmap.apply will try to produce a number of chunks close to that number. The nice thing is that it is also possible to pass a weight function. Suppose for instance that instead of a list of letters you have a list of seismic sources: some sources require a long computation time (such as ComplexFaultSources), some require a short computation time (such as PointSources). By giving a heuristic weight to the different sources it is possible to produce chunks with nearly homogeneous weight; in particular, PointSource tasks will contain many more sources than tasks with ComplexFaultSources.

It is essential in large computations to have a homogeneous task distribution, otherwise you will end up having a big task dominating the computation time (i.e. you may have 1000 cores of which 999 are free, having finished all the short tasks, but you have to wait for days for the single core processing the slow task). The OpenQuake engine does a great deal of work trying to split slow sources into more manageable fast sources.
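The chunk-then-reduce idea behind Starmap.apply can be sketched sequentially. `weighted_chunks` and `apply_sketch` are hypothetical, and the greedy rule (close a chunk once its weight reaches total weight / hint) is an assumption, not the engine's splitting algorithm.

```python
import operator
from collections import Counter
from functools import reduce


def weighted_chunks(seq, hint, weight=lambda item: 1):
    """Hypothetical helper: split seq into about `hint` chunks of similar weight."""
    items = list(seq)
    total = sum(weight(item) for item in items)
    target = total / hint if hint else total  # ideal weight per chunk
    chunk, acc = [], 0
    for item in items:
        chunk.append(item)
        acc += weight(item)
        if acc >= target:  # chunk is heavy enough: emit it
            yield chunk
            chunk, acc = [], 0
    if chunk:
        yield chunk


def apply_sketch(func, args, hint, weight=lambda item: 1):
    """Apply func to each chunk of args[0] (sequentially) and add the results.

    Assumes args[0] is non-empty, since reduce has no initial value here.
    """
    seq, rest = args[0], args[1:]
    results = (func(chunk, *rest) for chunk in weighted_chunks(seq, hint, weight))
    return reduce(operator.add, results)
```

With a heavy item and several light ones, the heavy item ends up alone while the light ones are grouped together, which is the behaviour described above for ComplexFaultSources and PointSources.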

class `openquake.baselib.parallel.``BaseStarmap`(func, iterargs, poolsize=None)[source]

Bases: `object`

classmethod `apply`(func, args, concurrent_tasks=80, weight=<function <lambda>>, key=<function <lambda>>)[source]
static `poolfactory`(size)
`reduce`(agg=<built-in function add>, acc=None, progress=<function info>)[source]
`submit_all`(progress=<function info>)[source]
Returns: an `IterResult` instance
class `openquake.baselib.parallel.``Computer`(hdf5path=None)[source]

Bases: `object`

Abstract Base Class. Subclasses must override the methods __call__ and gen_args, and may override aggregate. They may also override __init__: in that case they must set the hdf5path and __name__ attributes.

`aggregate`(acc, val)[source]

Aggregate values; the default operation is the sum

`gen_args`(*args)[source]

Yield tuples of arguments

`monitor`(operation=None, autoflush=False, measuremem=False)[source]

Return a `openquake.baselib.performance.Monitor` instance

`run`(*args, **kw)[source]

Run the computer with the given arguments; one can specify the extra arguments acc and Starmap.
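The contract of Computer (override `gen_args` and `__call__`, optionally `aggregate`) can be pictured with a small, purely sequential stand-in. `ComputerSketch` and `SquareSum` are hypothetical names, and the sketch ignores hdf5path, monitoring and parallelism entirely; it only shows how the three methods fit together in `run`.

```python
import abc
import functools


class ComputerSketch(abc.ABC):
    """Hypothetical, sequential stand-in for the Computer pattern."""

    @abc.abstractmethod
    def gen_args(self, *args):
        """Yield tuples of arguments, one tuple per task."""

    @abc.abstractmethod
    def __call__(self, *args):
        """Compute the result of a single task."""

    def aggregate(self, acc, val):
        # default aggregation: the sum
        return acc + val

    def run(self, *args, acc=0):
        # run every task in turn and fold the results with aggregate
        results = (self(*task_args) for task_args in self.gen_args(*args))
        return functools.reduce(self.aggregate, results, acc)


class SquareSum(ComputerSketch):
    def gen_args(self, numbers):
        for n in numbers:
            yield (n,)

    def __call__(self, n):
        return n * n


print(SquareSum().run([1, 2, 3]))  # 14
```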

class `openquake.baselib.parallel.``IterResult`(futures, taskname, num_tasks=None, progress=<function info>)[source]

Bases: `object`

Parameters: futures – an iterator over futures; taskname – the name of the task; num_tasks – the total number of expected futures (None if unknown); progress – a logging function for the progress report
`reduce`(agg=<built-in function add>, acc=None)[source]
`save_task_data`(mon)[source]
classmethod `sum`(iresults)[source]

Sum the data transfer information of a set of results

`task_data_dt` = dtype([('taskno', '<u4'), ('weight', '<f4'), ('duration', '<f4')])
class `openquake.baselib.parallel.``NoFlush`(monitor, taskname)[source]

Bases: `object`

class `openquake.baselib.parallel.``Pickled`(obj)[source]

Bases: `object`

A utility to manually pickle/unpickle objects. The reason is that celery does not use the HIGHEST_PROTOCOL, so relying on celery is slower. Moreover, Pickled instances have a nice string representation and a length giving the size of the pickled bytestring.

Parameters: obj – the object to pickle
`unpickle`()[source]

Unpickle the underlying object
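The idea behind Pickled can be sketched with the standard `pickle` module. `PickledSketch` and its attributes `clsname` and `pik` are hypothetical; the sketch only reproduces the behaviour described above: eager pickling with `pickle.HIGHEST_PROTOCOL`, a string representation, a length equal to the size of the bytestring, and an `unpickle` method.

```python
import pickle


class PickledSketch:
    """Hypothetical re-implementation of the idea behind Pickled."""

    def __init__(self, obj):
        self.clsname = type(obj).__name__
        # pickle eagerly, using the highest protocol available
        self.pik = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        # size of the pickled bytestring
        return len(self.pik)

    def __repr__(self):
        return '<Pickled %s %d bytes>' % (self.clsname, len(self))

    def unpickle(self):
        return pickle.loads(self.pik)


p = PickledSketch({'a': [1, 2, 3]})
print(repr(p))
```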

class `openquake.baselib.parallel.``Processmap`(func, iterargs, poolsize=None)[source]

MapReduce implementation based on processes. For instance:

```>>> from collections import Counter
>>> c = Processmap(Counter, [('hello',), ('world',)], poolsize=4).reduce()
>>> sorted(c.items())
[('d', 1), ('e', 1), ('h', 1), ('l', 3), ('o', 2), ('r', 1), ('w', 1)]
```
class `openquake.baselib.parallel.``Sequential`(func, iterargs, poolsize=None)[source]

A sequential Starmap, useful for debugging purposes.

class `openquake.baselib.parallel.``Starmap`(oqtask, task_args, name=None)[source]

Bases: `object`

A manager to submit several tasks of the same type. The usage is:

```tm = Starmap(do_something, [(arg1, arg2), (arg3, arg4)])
print(tm.reduce())
```

Progress report is built-in.

classmethod `apply`(task, task_args, concurrent_tasks=80, maxweight=None, weight=<function <lambda>>, key=<function <lambda>>, name=None)[source]

Apply a task to a tuple of the form (sequence, *other_args) by first splitting the sequence into chunks, according to the weight of the elements and possibly to a key (see `openquake.baselib.general.split_in_blocks`).

Parameters: task – a task to run in parallel; task_args – the arguments to be passed to the task function; agg – the aggregation function; acc – initial value of the accumulator (default empty AccumDict); concurrent_tasks – hint about how many tasks to generate; maxweight – if not None, used to split the tasks; weight – function to extract the weight of an item in arg0; key – function to extract the kind of an item in arg0
`executor` = <concurrent.futures.process.ProcessPoolExecutor object>
`progress`(*args)[source]

`reduce`(agg=<built-in function add>, acc=None)[source]

Loop on a set of results and update the accumulator by using the aggregation function.

Parameters: agg – the aggregation function, (acc, val) -> new acc; acc – the initial value of the accumulator
Returns: the final value of the accumulator
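The aggregation protocol `(acc, val) -> new acc` can be shown without any parallelism. `reduce_results` is a hypothetical helper that folds a list of per-task results, standing in for the futures Starmap collects; using an empty `Counter` as the default accumulator is an assumption made for this example only.

```python
import operator
from collections import Counter


def reduce_results(results, agg=operator.add, acc=None):
    """Fold results into an accumulator with agg(acc, val) -> new acc."""
    if acc is None:
        acc = Counter()  # assumption: an empty Counter as default accumulator
    for val in results:
        acc = agg(acc, val)
    return acc


# hypothetical per-task results: letter counts computed by two tasks
task_results = [Counter('hello'), Counter('world')]
total = reduce_results(task_results)
print(sorted(total.items()))
# [('d', 1), ('e', 1), ('h', 1), ('l', 3), ('o', 2), ('r', 1), ('w', 1)]
```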
classmethod `restart`()[source]
`submit`(*args)[source]

Submit a function with the given arguments to the process pool and add a Future to the list .results. If the attribute distribute is set, the function is run in process and the result is returned.

`submit_all`()[source]
Returns: an IterResult object
`task_ids` = []
`wait`()[source]

Returns: the total number of tasks that were spawned
class `openquake.baselib.parallel.``Threadmap`(func, iterargs, poolsize=None)[source]

MapReduce implementation based on threads. For instance:

```>>> from collections import Counter
>>> c = Threadmap(Counter, [('hello',), ('world',)], poolsize=4).reduce()
>>> sorted(c.items())
[('d', 1), ('e', 1), ('h', 1), ('l', 3), ('o', 2), ('r', 1), ('w', 1)]
```
static `poolfactory`(size)
`openquake.baselib.parallel.``check_mem_usage`(monitor=<Monitor dummy>, soft_percent=90, hard_percent=100)[source]

Display a warning if we are running out of memory

Parameters: mem_percent (int) – the memory limit as a percentage
`openquake.baselib.parallel.``do_not_aggregate`(acc, value)[source]

Do nothing aggregation function.

Parameters: acc – the accumulator; value – the value to accumulate
Returns: the accumulator unchanged
`openquake.baselib.parallel.``get_pickled_sizes`(obj)[source]

Return the pickled sizes of an object and its direct attributes, ordered by decreasing size. Here is an example:

```>> total_size, partial_sizes = get_pickled_sizes(Monitor(''))
>> total_size
345
>> partial_sizes
[('_procs', 214), ('exc', 4), ('mem', 4), ('start_time', 4), ('_start_time', 4), ('duration', 4)]
```

Notice that the sizes depend on the operating system and the machine.
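A rough equivalent of this measurement can be written with the standard `pickle` module. `pickled_sizes` and the `Sample` class are hypothetical, and the choice of `pickle.HIGHEST_PROTOCOL` is an assumption; the real function may use a different protocol and therefore report different numbers.

```python
import pickle


def pickled_sizes(obj):
    """Return (total_size, [(attr, size), ...]) sorted by decreasing size.

    Sketch only: pickles the object and each entry of vars(obj).
    """
    total = len(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    partial = [(name, len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL)))
               for name, value in vars(obj).items()]
    partial.sort(key=lambda pair: pair[1], reverse=True)
    return total, partial


class Sample:
    def __init__(self):
        self.small = 1
        self.big = list(range(1000))


total_size, partial_sizes = pickled_sizes(Sample())
print(total_size, partial_sizes)
```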

`openquake.baselib.parallel.``main`(hostport)[source]
`openquake.baselib.parallel.``mkfuture`(result)[source]
`openquake.baselib.parallel.``oq_distribute`(task=None)[source]

If the task has an attribute shared_dir_on which is false, return ‘futures’ even if OQ_DISTRIBUTE is celery, otherwise return the current value of the variable OQ_DISTRIBUTE; if undefined, return ‘futures’.
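The rule above can be restated as a few lines of Python. `oq_distribute_sketch` is a hypothetical name and the sketch only encodes the behaviour described in this paragraph; the real function may do additional validation.

```python
import os


def oq_distribute_sketch(task=None):
    """Hypothetical restatement of the rule described for oq_distribute."""
    # a task with shared_dir_on == False always runs with 'futures'
    if not getattr(task, 'shared_dir_on', True):
        return 'futures'
    # otherwise use OQ_DISTRIBUTE, falling back to 'futures' if undefined
    return os.environ.get('OQ_DISTRIBUTE', 'futures')
```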

`openquake.baselib.parallel.``pickle_sequence`(objects)[source]

Convert an iterable of objects into a list of pickled objects. If the iterable contains copies, the pickling will be done only once. If the iterable contains objects already pickled, they will not be pickled again.

Parameters: objects – a sequence of objects to pickle
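The deduplication described above can be sketched as follows, assuming that "copies" means repeated references to the same object (identity, not equality). `AlreadyPickled` is a hypothetical stand-in for Pickled, and `pickle_sequence_sketch` is not the library function. Each original object is kept in the cache so that its `id` cannot be reused while the loop runs.

```python
import pickle


class AlreadyPickled:
    """Hypothetical marker for a pickled payload (stand-in for Pickled)."""

    def __init__(self, obj):
        self.pik = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)


def pickle_sequence_sketch(objects):
    cache = {}  # id(obj) -> (obj, AlreadyPickled); obj kept alive on purpose
    out = []
    for obj in objects:
        if isinstance(obj, AlreadyPickled):
            out.append(obj)  # already pickled: do not pickle again
            continue
        key = id(obj)
        if key not in cache:
            cache[key] = (obj, AlreadyPickled(obj))
        out.append(cache[key][1])
    return out


data = [1, 2, 3]
res = pickle_sequence_sketch([data, data, AlreadyPickled('x')])
```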
`openquake.baselib.parallel.``qsub`(func, allargs, authkey=None)[source]

Map functions to arguments by means of the Grid Engine.

Parameters: func – a pickleable callable object; allargs – a list of tuples of arguments; authkey – authentication token used to send back the results
Returns: an iterable over results of the form (res, etype, mon)
`openquake.baselib.parallel.``rec_delattr`(mon, name)[source]

Delete attribute from a monitor recursively

`openquake.baselib.parallel.``safely_call`(func, args, pickle=False, conn=None)[source]

Call the given function with the given arguments safely, i.e. by trapping the exceptions. Return a pair (result, exc_type) where exc_type is None if no exceptions occur, otherwise it is the exception class and the result is a string containing error message and traceback.

Parameters: func – the function to call; args – the arguments; pickle – if set, the input arguments are unpickled and the return value is pickled, otherwise they are left unchanged
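The (result, exc_type) convention can be illustrated with a small sketch. `safely_call_sketch` is hypothetical and leaves out the `pickle` and `conn` arguments; it only shows the trapping of exceptions and the use of the formatted traceback as the "result" on failure.

```python
import traceback


def safely_call_sketch(func, args):
    """Return (result, None) on success or (error_text, exc_class) on failure."""
    try:
        return func(*args), None
    except Exception as exc:
        # on failure the "result" is the error message plus the traceback
        return traceback.format_exc(), type(exc)


print(safely_call_sketch(divmod, (7, 2)))  # ((3, 1), None)
```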
`openquake.baselib.parallel.``wakeup_pool`()[source]

This is used at startup, only when the ProcessPoolExecutor is used, to fork the processes before loading any big data structure. It is called only once, and adds the list of spawned PIDs to the executor.

## performance¶

class `openquake.baselib.performance.``Monitor`(operation='dummy', hdf5path=None, autoflush=False, measuremem=False)[source]

Bases: `object`

Measure the resident memory occupied by a list of processes during the execution of a block of code. Should be used as a context manager, as follows:

```from openquake.baselib.performance import Monitor

with Monitor('do_something') as mon:
    do_something()
print(mon.mem)
```

At the end of the block the Monitor object will have the following public attributes:

- .start_time: when the monitor started (a datetime object)
- .duration: time elapsed between start and stop (in seconds)
- .exc: usually None; otherwise the exception raised in the with block
- .mem: the memory delta in bytes

The behaviour of the Monitor can be customized by subclassing it and overriding the method on_exit(), called at the end and used to display or store the results of the analysis.

NB: if the .address attribute is set, it is possible for the monitor to send commands to that address, assuming there is a `multiprocessing.connection.Listener` listening.
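The context-manager protocol and the on_exit() hook can be sketched with the standard library alone. `MonitorSketch` and `PrintingMonitor` are hypothetical classes that only record start time, duration and exception; memory measurement, flushing and the listener address are deliberately left out.

```python
import time
from datetime import datetime


class MonitorSketch:
    """Hypothetical timing-only monitor; memory measurement is omitted."""

    def __init__(self, operation='dummy'):
        self.operation = operation
        self.start_time = None
        self.duration = 0.0
        self.exc = None

    def __enter__(self):
        self.start_time = datetime.now()
        self._t0 = time.perf_counter()
        return self

    def __exit__(self, etype, exc, tb):
        self.duration = time.perf_counter() - self._t0
        self.exc = exc
        self.on_exit()
        # returning None (falsy) lets any exception propagate

    def on_exit(self):
        """Hook for subclasses: display or store the measurements."""


class PrintingMonitor(MonitorSketch):
    def on_exit(self):
        print('%s took %.6f s' % (self.operation, self.duration))


with PrintingMonitor('sum') as mon:
    total = sum(range(10000))
```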

`address` = None
`authkey` = None
`calc_id` = None
`dt`

Last time interval measured

`flush`()[source]

Save the measurements on the performance file (or on stdout)

`get_data`()[source]
Returns: an array of dtype perf_dt, with the information of the monitor (operation, time_sec, memory_mb, counts); the length of the array can be 0 (for counts=0) or 1 (otherwise).
`measure_mem`()[source]

A memory measurement (in bytes)

`new`(operation='no operation', **kw)[source]

Return a copy of the monitor usable for a different operation.

`on_exit`()[source]

To be overridden in subclasses

`save_info`(dic)[source]

Save (name, value) information in the associated hdf5path

`send`(*args)[source]

Send a command to the listener. Add the .calc_id as last argument.

`start_time`

Datetime instance recording when the monitoring started

`openquake.baselib.performance.``memory_info`(proc)[source]
`openquake.baselib.performance.``virtual_memory`()[source]

## python3compat¶

Compatibility layer for Python 2 and 3. Mostly copied from six and future, but reduced to the subset of utilities needed by GEM. This is done to avoid an external dependency.

`openquake.baselib.python3compat.``check_syntax`(pkg)[source]

Recursively check all modules in the given package for compatibility with Python 3 syntax. No imports are performed.

Parameters: pkg – a Python package
`openquake.baselib.python3compat.``decode`(val)[source]

Decode an object assuming the encoding is UTF-8.

Parameters: val – a unicode or bytes object
Returns: a unicode object
`openquake.baselib.python3compat.``encode`(val)[source]

Encode a string assuming the encoding is UTF-8.

Parameters: val – a unicode or bytes object
Returns: bytes
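Under Python 3 the behaviour described for decode and encode boils down to a type check plus a UTF-8 conversion. `decode_sketch` and `encode_sketch` are hypothetical names, and the assumption that values already of the target type are returned unchanged is based on the "unicode or bytes" parameter description above.

```python
def decode_sketch(val):
    """bytes -> str (UTF-8); str is returned unchanged."""
    if isinstance(val, bytes):
        return val.decode('utf-8')
    return val


def encode_sketch(val):
    """str -> bytes (UTF-8); bytes are returned unchanged."""
    if isinstance(val, str):
        return val.encode('utf-8')
    return val


print(decode_sketch(b'caf\xc3\xa9'), encode_sketch('café'))
```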
`openquake.baselib.python3compat.``exec_`(_code_, _globs_=None, _locs_=None)[source]

Execute code in a namespace.

`openquake.baselib.python3compat.``raise_`(tp, value=None, tb=None)[source]
`openquake.baselib.python3compat.``with_metaclass`(meta, *bases)[source]

Returns an instance of meta inheriting from the given bases. To be used to replace the __metaclass__ syntax.
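A minimal sketch of the technique: instantiate the metaclass directly to build a temporary base class, then inherit from it, which works with the class syntax of both Python 2 and 3. `with_metaclass_sketch` and the `Registry` metaclass are hypothetical; unlike the more elaborate version in six, this simple form leaves the temporary base in the MRO.

```python
def with_metaclass_sketch(meta, *bases):
    """Build a temporary base class whose metaclass is meta."""
    return meta('TemporaryBase', bases or (object,), {})


class Registry(type):
    """Hypothetical metaclass recording the names of the classes it builds."""
    classes = []

    def __init__(cls, name, bases, dct):
        super(Registry, cls).__init__(name, bases, dct)
        Registry.classes.append(name)


class Plugin(with_metaclass_sketch(Registry, object)):
    pass
```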

`openquake.baselib.python3compat.``zip`(arg, *args)[source]

## runtests¶

class `openquake.baselib.runtests.``TestLoader`[source]

Bases: `object`

`loadTestsFromNames`(suitename, module=None)[source]
class `openquake.baselib.runtests.``TestResult`(stream, descriptions, verbosity)[source]

Bases: `unittest.runner.TextTestResult`

`save_times`(fname)[source]
`startTest`(test)[source]
`stopTest`(test)[source]
`timedict` = {}
`openquake.baselib.runtests.``addTest`(self, test)[source]

## sap¶

Here is a minimal example of usage:

```>>> from openquake.baselib import sap
>>> def fun(input, inplace, output=None, out='/tmp'):
...     'Example'
...     for item in sorted(locals().items()):
...         print('%s = %s' % item)

>>> p = sap.Script(fun)
>>> p.arg('input', 'input file or archive')
>>> p.flg('inplace', 'convert inplace')
>>> p.arg('output', 'output archive')
>>> p.opt('out', 'optional output file')

>>> p.callfunc(['a'])
inplace = False
input = a
out = /tmp
output = None

>>> p.callfunc(['a', 'b', '-i', '-o', 'OUT'])
inplace = True
input = a
out = OUT
output = b
```

Parsers can be composed too.

class `openquake.baselib.sap.``Script`(func, name=None, parentparser=None, help=True, registry=True)[source]

Bases: `object`

A simple way to define command processors based on argparse. Each parser is associated with a function and parsers can be composed together, by dispatching on a given name (if not given, the function name is used).

`arg`(name, help, type=None, choices=None, metavar=None, nargs=None)[source]

Describe a positional argument

`callfunc`(argv=None)[source]

Parse the argv list and extract a dictionary of arguments which is then passed to the function underlying the Script.

`check_arguments`()[source]

Make sure all arguments have a specification

`flg`(name, help, abbrev=None)[source]

Describe a flag

`group`(descr)[source]

Add a new group of arguments with the given description

`help`()[source]

Return the help message as a string

`opt`(name, help, abbrev=None, type=None, choices=None, metavar=None, nargs=None)[source]

Describe an option

`registry` = {}
`openquake.baselib.sap.``compose`(scripts, name='main', description=None, prog=None, version=None)[source]

Collects together different Scripts and builds a single Script dispatching to the subparsers depending on the first argument, i.e. the name of the subparser to invoke.

Parameters: scripts – a list of Script instances; name – the name of the composed parser; description – description of the composed parser; prog – name of the script printed in the usage message; version – version of the script printed with --version
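Dispatching on the first argument is, presumably, what argparse subparsers provide underneath. The sketch below uses plain argparse, not the sap API, to show that mechanism; the subcommands `run` and `show` and their handlers are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='main')
    subparsers = parser.add_subparsers(dest='command')
    subparsers.required = True  # the first argument must name a subcommand

    run = subparsers.add_parser('run', help='run a job')
    run.add_argument('job_ini')
    run.set_defaults(func=lambda ns: 'running %s' % ns.job_ini)

    show = subparsers.add_parser('show', help='show a result')
    show.add_argument('what')
    show.set_defaults(func=lambda ns: 'showing %s' % ns.what)
    return parser


def dispatch(argv):
    ns = build_parser().parse_args(argv)
    # each subparser stored its own handler in ns.func
    return ns.func(ns)


print(dispatch(['run', 'job.ini']))  # running job.ini
```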
`openquake.baselib.sap.``get_parentparser`(parser, description=None, help=True)[source]
Parameters: parser – `argparse.ArgumentParser` instance or None; description – string used to build a new parser if parser is None; help – flag used to build a new parser if parser is None
Returns: if parser is None, the new parser; otherwise the .parentparser attribute (if set) or the parser itself (if not set)
`openquake.baselib.sap.``str_choices`(choices)[source]

Returns {choice1, ..., choiceN} or the empty string
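A possible implementation of this formatting, assuming the choices are joined with a comma and a space as the docstring suggests; `str_choices_sketch` is a hypothetical name and the exact separator used by the library is not confirmed.

```python
def str_choices_sketch(choices):
    """Return '{c1, ..., cN}' for a non-empty sequence, else ''."""
    if not choices:
        return ''
    return '{%s}' % ', '.join(map(str, choices))


print(str_choices_sketch(['csv', 'xml']))  # {csv, xml}
```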

## slots¶

`openquake.baselib.slots.``with_slots`(cls)[source]

Decorator for a class with __slots__. It automatically defines the methods __eq__, __ne__ and assert_equal.
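A decorator of this kind can be sketched by comparing the attributes listed in `__slots__` one by one. `with_slots_sketch` and the `Point` class are hypothetical, the sketch assumes `__slots__` is a tuple of names, and `assert_equal` raises `AssertionError` explicitly so that it still works under `python -O`.

```python
def with_slots_sketch(cls):
    """Add __eq__, __ne__ and assert_equal based on cls.__slots__."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return all(getattr(self, name) == getattr(other, name)
                   for name in cls.__slots__)

    def __ne__(self, other):
        result = __eq__(self, other)
        return result if result is NotImplemented else not result

    def assert_equal(self, other):
        # report the first slot that differs
        for name in cls.__slots__:
            mine, theirs = getattr(self, name), getattr(other, name)
            if mine != theirs:
                raise AssertionError('%s: %r != %r' % (name, mine, theirs))

    cls.__eq__ = __eq__
    cls.__ne__ = __ne__
    cls.assert_equal = assert_equal
    return cls


@with_slots_sketch
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y
```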