Miscellaneous Packages

XML Parsers

The xml_prm.parser module contains the parsers that instantiate a ProbReM instance from the XML specifications. There is a parser for each of the following:

Data Interface specification, DataInterfaceParser
PRM specification, PRMparser
Loading model parameters stored on disk, LocalDistributionParser

class xml_prm.parser.DataInterfaceParser

The data interface is specified in XML and saved in, for example, ./DIexample.xml:

<?xml version="1.0" ?>
<DataInterface name="DIexample">
    <Crossvalidation folds='1'>
            <Dataset type='SQLite' path='./data/database.sqlite'/>
    </Crossvalidation>
</DataInterface>

A list of all important XML tags and XML attributes follows. Note the potentially confusing double use of the word attribute: XML attributes on the one hand, and the probabilistic PRM attributes (prm.attribute) on the other.

DataInterface

name : Freely chosen name for PRM model

Crossvalidation

folds : Number of folds used for cross validation. The data has to be split up at the database level; otherwise the different folds would have to be accessed by querying a single database, which decreases performance. This feature has not been tested. If no cross validation is desired, just use folds="1".

Dataset

type : Type of database the interface is connecting to. Currently only SQLite is supported, SQLiteDI.

path : The path to the database file
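As an illustration of how such a specification can be read, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not the DataInterfaceParser implementation; the helper name read_data_interface and the layout of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The example specification from above, inlined so the sketch is self-contained.
DI_XML = """<?xml version="1.0" ?>
<DataInterface name="DIexample">
    <Crossvalidation folds='1'>
        <Dataset type='SQLite' path='./data/database.sqlite'/>
    </Crossvalidation>
</DataInterface>
"""


def read_data_interface(xml_text):
    """Hypothetical helper: collect the XML attributes described above."""
    root = ET.fromstring(xml_text)
    crossvalidation = root.find("Crossvalidation")
    datasets = [(d.get("type"), d.get("path"))
                for d in crossvalidation.findall("Dataset")]
    return {
        "name": root.get("name"),
        "folds": int(crossvalidation.get("folds")),
        "datasets": datasets,
    }


spec = read_data_interface(DI_XML)
```

With folds="1" there is a single Dataset element, so spec["datasets"] contains exactly one (type, path) pair.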

class xml_prm.parser.PRMparser

The PRM model is also specified in XML and saved in, for example, ./PRMexample.xml:

<?xml version="1.0" ?>
<PRM name="PRMexample"  datainterface="./DIexample.xml" >
        <RelationalSchema>
                <Entities>
                        <Entity name="A">
                                <Attribute name="Aa" type="Binary"/>
                        </Entity>                       
                        <Entity name="B">
                                <Attribute name="Ba" type="Integer" description="1,20"/>
                        </Entity>               
            [......]
                </Entities>
                <Relationships>
                        <Relationship name="AB" foreign="A.pk,B.pk" type="1:n">
                                <Attribute name="ABa" type="Binary"/> 
                        </Relationship>                         
                        [......]
                </Relationships>
        </RelationalSchema>     
        <DependencyStructure>                   
                <Dependency name="Aa_Ba" child="A.Aa" parent="B.Ba" constraints="A.pk=B.pk"  aggregator='AVG'/>
                [......]
        </DependencyStructure>  
        <LocalDistributions>
                <LocalDistribution attribute='A.Aa' file='./localdistributions/Da_Aa.xml'/>
                <LocalDistribution attribute='B.Ba' file='./localdistributions/Ba_Aa.xml'/>
                <LocalDistribution attribute='AB.ABa' file='./localdistributions/Ca_Aa.xml'/>
        </LocalDistributions>   
</PRM>

A list of all important XML tags and XML attributes follows. Note the potentially confusing double use of the word attribute: XML attributes on the one hand, and the probabilistic PRM attributes (prm.attribute) on the other.

PRM

name : Freely chosen name for PRM model

datainterface : relative path to the data interface specification, optional

Entity

name : The name has to match the corresponding table name in the relational database

Relationship

name : The name has to match the corresponding table name in the relational database

foreign : A comma-separated list of the foreign keys of the relationship. The foreign attributes can be probabilistic, but usually the primary key of an entity serves as the foreign key. The primary key of an entity can be defined as a probabilistic attribute, but this entails that its domain consists of all the entries in one database table. This does not scale well, although it is sometimes the desired behavior. If the foreign key is not probabilistic, and thus most likely a primary key, the special keyword pk can be used to refer to the primary key of an entity. The pk keyword defaults to entityname_id as the name of the primary key. Thus, if there is a table Professor, Professor.pk refers to Professor.professor_id.

type : This is only used in the context of reference uncertainty. A relationship can be of type k:n or n:k.

k : A fixed parameter indicating an uncertain relationship (i.e. reference uncertainty) of type n:k.
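The pk convention described for the foreign attribute can be sketched as follows. This is an illustration only, not the PRMparser code: the function names resolve_pk and parse_foreign are hypothetical, and lowercasing the entity name is an assumption inferred from the Professor.professor_id example.

```python
def resolve_pk(reference):
    """Expand 'Entity.pk' to 'Entity.entity_id'; leave other references as-is."""
    entity, _, field = reference.partition(".")
    if field == "pk":
        # Assumption: the default key name is the lowercased entity name + '_id'.
        return "%s.%s_id" % (entity, entity.lower())
    return reference


def parse_foreign(foreign):
    """Split the comma-separated 'foreign' XML attribute and expand pk keywords."""
    return [resolve_pk(part.strip()) for part in foreign.split(",")]


keys = parse_foreign("A.pk,B.pk")
```

A reference that already names a concrete field, such as Advisor.student_id, passes through resolve_pk unchanged.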

Attribute

name : The name has to correspond to the data field in the relational database. It also has to be unique among the attributes of the entity/relationship it is part of.

type : The type must be either

Binary, instantiates an attribute of type BinaryAttribute

Integer, instantiates an attribute of type IntegerAttribute

Enumerated, instantiates an attribute of type EnumeratedAttribute

NotProbabilisticAttribute, instantiates an attribute of type NotProbabilisticAttribute

description : Specifies the domain of the attribute

For Binary not required

For Integer, e.g. “1,5” results in [1,2,3,4,5] (including 5)

For Enumerated, a comma-separated list of the domain values, e.g. “1,4,78” results in [1,4,78]

For NotProbabilisticAttribute not required

pk : Optional. If you choose to use the primary key, set pk=”1”. Note the remark in the Relationship tag for more information
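The way a description string maps to a domain for the Integer and Enumerated types can be sketched as below. The function domain_from_description is hypothetical and not part of ProbReM; returning None for types that need no description, and converting enumerated values to integers (as in the “1,4,78” example), are assumptions of this sketch.

```python
def domain_from_description(attr_type, description=None):
    """Hypothetical helper: build the attribute domain from the XML description."""
    if attr_type == "Integer":
        # "1,5" is an inclusive range: [1, 2, 3, 4, 5]
        low, high = (int(v) for v in description.split(","))
        return list(range(low, high + 1))
    if attr_type == "Enumerated":
        # "1,4,78" lists the domain values explicitly: [1, 4, 78]
        return [int(v) for v in description.split(",")]
    # Binary and NotProbabilisticAttribute do not require a description.
    return None


integer_domain = domain_from_description("Integer", "1,5")
enumerated_domain = domain_from_description("Enumerated", "1,4,78")
```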

Dependency

name : Freely chosen name for probabilistic dependency

parent : Parent attribute, referenced by its full name. e.g. Professor.fame

child : Child attribute, referenced by its full name. e.g. Student.success

constraint : Optional. A comma-separated list of constraints that can be applied to the data interface. Most commonly these are normal slot chains, e.g. “Professor.pk=Advisor.professor_id,Advisor.student_id=Student.pk”. If no constraint is given, ProbReM will apply a depth-first search to find a slot chain from child to parent using computeSlotChain()

aggregator : If the constraint on the dependency leads to one child attribute object having multiple parent attribute objects, the values of the parent attribute objects have to be aggregated as the CPD for the child allows only one value for that parent. The module aggregation implements different such methods, e.g. AVG, MAX, MIN, MODE

refun : Indicator if the dependency has an uncertain relationship in the slotchain (‘1’,’True’ or ‘T’)

LocalDistribution: See LocalDistributionParser
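To make the role of the aggregator concrete, the following sketch reduces several parent values to a single value for each of the aggregator names mentioned above. It is an illustration only and not the code of the aggregation module; the function name aggregate is hypothetical, and the actual module may behave differently (for example, in how AVG is mapped back onto a discrete domain or how MODE breaks ties).

```python
from collections import Counter


def aggregate(values, aggregator):
    """Hypothetical helper: collapse multiple parent values into one value."""
    if aggregator == "AVG":
        return sum(values) / float(len(values))
    if aggregator == "MAX":
        return max(values)
    if aggregator == "MIN":
        return min(values)
    if aggregator == "MODE":
        # Most frequent value among the parent attribute objects.
        return Counter(values).most_common(1)[0][0]
    raise ValueError("Unknown aggregator: %s" % aggregator)


parent_values = [1, 2, 2, 5]
results = {name: aggregate(parent_values, name)
           for name in ("AVG", "MAX", "MIN", "MODE")}
```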

class xml_prm.parser.LocalDistributionParser

The local distribution parser loads the model parameters that have been saved to disk. Naturally, this can only be done if neither the probabilistic structure nor the data itself has changed. To save the parameters, learners.cpdlearners.CPDTabularLearner.learnCPDsFull() can be called with saveDistributions=True. The required XML specification, along with the .nlp file (numpy.array format), is saved in ./localdistributions/xxx.xml, and the required XML is printed to the standard output, e.g.

<LocalDistribution attribute='A.Aa' file='./localdistributions/Da_Aa.xml'/>

After adding that output to the <LocalDistributions> tag in the PRM specification, the next time the model is loaded - granted that the structure is the same - the local distribution will be loaded from disk.

User Interaction

The user interface package ui contains modules to facilitate the interaction with a ProbReM model.

Config Module `ui.config`

ui.config is used to create instances of the different building blocks for a ProbReM project, e.g. the PRM model, the data interface, the learner and the inference methods. The methods are accessed by importing the config module

from ui import config
import probrem
prmSpec = './PRMexample.xml'  # file name of the PRM XML specification
config.loadPRM(prmSpec)

The code above, for example, would initialize the PRM module prm:

print probrem.PRM
<module 'prm.prm' from './../../src/prm/prm.pyc'>

The config module can’t be executed directly.

ui.config.fromFile(probremI, config): Using a config file to load PRM. Not implemented yet.

ui.config.loadDI(diSpec)

Loads a DataInterface instance

Parameters:	diSpec – File name of Data Interface XML specification
Returns:	`data.datainterface.DataInterface` instance, e.g. `SQLiteDI`

ui.config.loadInferenceAlgorithm(inferenceType)

Loads the specified inference algorithm for the engine module engine and configures it to use inferenceType (e.g. MCMC, LW).

Usually an inference algorithm implements a configure() method that can be used to precompute data structures needed for inference.

In the case of the Gibbs sampler, gibbs.configure will precompute all the conditional likelihood functions of the attributes with parents. Note that at the time an inference method is configured, the PRM should already be initialized with proper local distributions (either learned or loaded).

Parameters:	inferenceType – The name of the inference method (e.g. GIBBS or MH)

ui.config.loadLearner(learnerType)

Loads a learner instance, e.g. a CPDLearner instance for learning the conditional probability distributions (CPDs).

Parameters:	learnerType – Name of a learner class (e.g. CPDTabularLearner)
Returns:	A learner instance, e.g. `CPDTabularLearner`

ui.config.loadPRM(prmSpec)

Loads a PRM instance using the PRMparser

Parameters:	prmSpec – File name of PRM XML specification
Returns:	`prm.prm.PRM` instance

Command Line Module `ui.cmd`

ui.cmd contains useful methods to display information about the ProbReM model on the command line using ipython. The methods are accessed by importing the cmd module

from ui import cmd

Then one can, for example, display all CPD instances:

cmd.displayCPDs() 

The cmd module can’t be executed directly.

ui.cmd.displayCPDs(): Prints the conditional probability distributions (CPDs) for all probabilistic attributes

ui.cmd.ipythonRunning(): Returns True if the method is executed in an active IPython shell

ui.cmd.ipythonShell(): Starts an interactive IPython session if the session is not already started

Logging Module `ui.log`

The module ui.log allows the logging of warning/error/debugging messages. It is configured to display logging messages on the console as well as write them to a log file, and it configures the Python logging module accordingly. Instead of using print statements in the code, one can simply:

import logging

Then one can, for example, emit a debug message by invoking:

logging.debug('What is happening here?')
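For readers unfamiliar with the standard logging module, a console-plus-file setup of the kind described above can be sketched as follows. This is not the ui.log source; the function name configure_logging, the default file name probrem.log and the message format are assumptions chosen for the example.

```python
import logging


def configure_logging(logfile="probrem.log", level=logging.DEBUG):
    """Hypothetical helper: send log records to the console and to a log file."""
    root = logging.getLogger()
    root.setLevel(level)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # One handler for the console, one for the log file.
    for handler in (logging.StreamHandler(), logging.FileHandler(logfile)):
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root
```

After configure_logging() has been called once at start-up, any module that imports logging can call logging.debug(...) and the message reaches both destinations.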

Analytics

The analytics package analytics implements methods that can be used to analyse/configure/debug the code and the model.

Measure the running time of any method, e.g. how much time is spent executing individual methods
Display graphs, e.g. the Ground Bayesian Network, using either GraphViz or NetworkX, both of which have to be installed separately.

Performance Module `analytics.performance`

Main module for performance analysis of the ProbReM package

analytics.performance.displayTimeAnalysis(): Displays statistics about the running times of all methods that are decorated with @time_analysis

analytics.performance.measurments: A dictionary to keep track of execution times for methods {key=function.__name__ : value=execution time }

analytics.performance.time_analysis(caller)

A decorator function that measures and saves the execution time of the decorated method caller

The decorator function is used by adding

from analytics.performance import time_analysis

to the module of the caller method and by adding

@time_analysis
def methodtomeasure():
    ...

on the line before the caller function definition
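The general technique behind such a decorator can be sketched as below. This is a minimal illustration and not the analytics.performance source: the dictionary name measurements is a stand-in for the module's own dictionary, and accumulating the elapsed time over repeated calls is an assumption of this sketch.

```python
import functools
import time

# Stand-in for the module-level dictionary: function name -> total seconds.
measurements = {}


def time_analysis(caller):
    """Sketch of a decorator that records the execution time of `caller`."""
    @functools.wraps(caller)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return caller(*args, **kwargs)
        finally:
            # Recorded even if `caller` raises an exception.
            elapsed = time.time() - start
            name = caller.__name__
            measurements[name] = measurements.get(name, 0.0) + elapsed
    return wrapper


@time_analysis
def methodtomeasure():
    return sum(range(1000))
```

functools.wraps keeps the original function name on the wrapper, so the dictionary key and any later introspection still refer to methodtomeasure.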

Visualization Module `analytics.visualization`

The analytics.visualization module can be used to create description files for graph visualization software. Note that the networks tend to be very large, so creating meaningful graphs is usually neither trivial nor useful.

analytics.visualization.createGraphvizFile(GBNgraph)

Generates a description file reports/gbn.dot for the GraphViz software. It makes use of GvGen written by Sebastien Tricaud.

Parameters:	GBNgraph – `network.groundBN.GBNGraph`
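For orientation, a GraphViz description file is plain text in the DOT language. The sketch below builds such text from a list of (parent, child) edges without using GvGen; it is not what createGraphvizFile does internally, and the function name to_dot and the edge-list input are assumptions made for this example.

```python
def to_dot(edges, name="GBN"):
    """Hypothetical helper: render (parent, child) pairs as a DOT digraph."""
    lines = ["digraph %s {" % name]
    for parent, child in edges:
        # Node names such as A.Aa contain a dot, so they are quoted.
        lines.append('    "%s" -> "%s";' % (parent, child))
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("B.Ba", "A.Aa")])
```

The resulting string can be written to a .dot file and rendered with the GraphViz command-line tools.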

analytics.visualization.displayGraph(graph)

Plots the graph using NetworkX.

Parameters:	graph – `network.groundBN.GBNGraph`

analytics.visualization.plotAttrCPD(attr)

Displays the CPD of the Attribute attr using matplotlib. If attr has parents, and thus several possible parent assignments, the method attempts to display a separate line for each assignment. It is safe to say that this doesn’t scale well.

Parameters:	attr – `Attribute`
