Miscellaneous Packages

XML Parsers

The xml_prm.parser module contains the parsers that instantiate a ProbReM instance from the XML specifications. There is one parser for each type of specification: the data interface, the PRM model, and the local distributions.

class xml_prm.parser.DataInterfaceParser[source]

The data interface is specified in XML and saved in, for example, ./DIexample.xml:

<?xml version="1.0" ?>
<DataInterface name="DIexample">
    <Crossvalidation folds='1'>
            <Dataset type='SQLite' path='./data/database.sqlite'/>
    </Crossvalidation>
</DataInterface>

A list of all important XML tags and XML attributes follows. Note the somewhat confusing double use of the word "attribute": on the one hand XML attributes, on the other hand the probabilistic PRM attributes (prm.attribute).

DataInterface

  • name : Freely chosen name for PRM model

Crossvalidation

  • folds : Number of folds used for cross validation. The data has to be split up at the database level; otherwise the different folds would have to be accessed by querying one database, which decreases performance. This feature has not been tested. If no cross validation is desired, use folds="1".

Dataset

  • type : Type of database the interface connects to. Currently only SQLite is supported (SQLiteDI).
  • path : The path to the database file
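
The tags and attributes listed above can be read with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not the DataInterfaceParser implementation; the helper name read_data_interface and the layout of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Same content as the ./DIexample.xml listing (XML declaration omitted).
DI_SPEC = """
<DataInterface name="DIexample">
    <Crossvalidation folds='1'>
        <Dataset type='SQLite' path='./data/database.sqlite'/>
    </Crossvalidation>
</DataInterface>
"""


def read_data_interface(xml_text):
    """Hypothetical helper: collect the documented tags and attributes."""
    root = ET.fromstring(xml_text.strip())
    crossval = root.find("Crossvalidation")
    datasets = [
        {"type": ds.get("type"), "path": ds.get("path")}
        for ds in crossval.findall("Dataset")
    ]
    return {
        "name": root.get("name"),
        "folds": int(crossval.get("folds")),
        "datasets": datasets,
    }


spec = read_data_interface(DI_SPEC)
print(spec["name"], spec["folds"], spec["datasets"][0]["path"])
```
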
class xml_prm.parser.PRMparser[source]

The PRM model is also specified in XML and saved in, for example, ./PRMexample.xml:

<?xml version="1.0" ?>
<PRM name="PRMexample"  datainterface="./DIexample.xml" >
        <RelationalSchema>
                <Entities>
                        <Entity name="A">
                                <Attribute name="Aa" type="Binary"/>
                        </Entity>                       
                        <Entity name="B">
                                <Attribute name="Ba" type="Integer" description="1,20"/>
                        </Entity>               
            [......]
                </Entities>
                <Relationships>
                        <Relationship name="AB" foreign="A.pk,B.pk" type="1:n">
                                <Attribute name="ABa" type="Binary"/> 
                        </Relationship>                         
                        [......]
                </Relationships>
        </RelationalSchema>     
        <DependencyStructure>                   
                <Dependency name="Aa_Ba" child="A.Aa" parent="B.Ba" constraints="A.pk=B.pk"  aggregator='AVG'/>
                [......]
        </DependencyStructure>  
        <LocalDistributions>
                <LocalDistribution attribute='A.Aa' file='./localdistributions/Da_Aa.xml'/>
                <LocalDistribution attribute='B.Ba' file='./localdistributions/Ba_Aa.xml'/>
                <LocalDistribution attribute='AB.ABa' file='./localdistributions/Ca_Aa.xml'/>
        </LocalDistributions>   
</PRM>
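
To make the structure of the listing above concrete, the sketch below walks a trimmed copy of it with xml.etree.ElementTree and collects the full attribute names and the dependency edges. It is only an illustration of the document layout, not the PRMparser code; the elided "[......]" parts and the LocalDistributions section are left out, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the PRM listing; elided parts are omitted.
PRM_SPEC = """
<PRM name="PRMexample" datainterface="./DIexample.xml">
    <RelationalSchema>
        <Entities>
            <Entity name="A">
                <Attribute name="Aa" type="Binary"/>
            </Entity>
            <Entity name="B">
                <Attribute name="Ba" type="Integer" description="1,20"/>
            </Entity>
        </Entities>
        <Relationships>
            <Relationship name="AB" foreign="A.pk,B.pk" type="1:n">
                <Attribute name="ABa" type="Binary"/>
            </Relationship>
        </Relationships>
    </RelationalSchema>
    <DependencyStructure>
        <Dependency name="Aa_Ba" child="A.Aa" parent="B.Ba"
                    constraints="A.pk=B.pk" aggregator="AVG"/>
    </DependencyStructure>
</PRM>
"""

root = ET.fromstring(PRM_SPEC.strip())

# Full attribute names ("Table.attribute") for entities and relationships.
attributes = {}
for table in root.iter():
    if table.tag in ("Entity", "Relationship"):
        for attr in table.findall("Attribute"):
            full_name = table.get("name") + "." + attr.get("name")
            attributes[full_name] = attr.get("type")

# Each dependency is an edge from the parent attribute to the child attribute.
dependencies = [
    (dep.get("parent"), dep.get("child")) for dep in root.iter("Dependency")
]

print(attributes)
print(dependencies)
```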

A list of all important XML tags and XML attributes follows. Note the somewhat confusing double use of the word "attribute": on the one hand XML attributes, on the other hand the probabilistic PRM attributes (prm.attribute).

PRM

  • name : Freely chosen name for PRM model
  • datainterface : relative path to the data interface specification, optional

Entity

  • name : The name has to match the corresponding table name in the relational database

Relationship

  • name : The name has to match the corresponding table name in the relational database
  • foreign : A comma-separated list of the foreign keys of the relationship. The foreign attributes can be probabilistic, but usually the primary key of an entity serves as the foreign key. The primary key of an entity can be defined as a probabilistic attribute, but this entails that its domain consists of all the entries in one database table. This does not scale well, although it is sometimes the desired behavior. If the foreign key is not probabilistic, and thus most likely a primary key, the special keyword pk can be used to refer to the primary key of an entity. The pk keyword defaults to entityname_id as the name of the primary key. Thus, if there is a table Professor, Professor.pk would refer to Professor.professor_id.
  • type : This is only used in the context of reference uncertainty. A relationship can be of type k:n or n:k.
  • k : A fixed parameter indicating an uncertain relationship (i.e. reference uncertainty) of type n:k.
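
The pk shorthand described for the foreign attribute can be illustrated with a small sketch. The function names are hypothetical, and lowercasing the table name is an assumption inferred from the Professor.pk example; this is not the parser's own code.

```python
def resolve_foreign_key(reference):
    """Expand 'Entity.pk' to 'Entity.entity_id'; leave other references as-is."""
    table, _, field = reference.partition(".")
    if field == "pk":
        # Assumption: the default key name is the lowercased table name + "_id".
        return table + "." + table.lower() + "_id"
    return reference


def parse_foreign(foreign):
    """Split the comma-separated `foreign` XML attribute and resolve each key."""
    return [resolve_foreign_key(ref.strip()) for ref in foreign.split(",")]


print(parse_foreign("Professor.pk,Student.pk"))
```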

Attribute

  • name : The name has to correspond to the data field in the relational database. It also has to be unique among the attributes of the entity/relationship it is part of.

  • type : The type must be either

    • Binary, instantiates an attribute of type BinaryAttribute
    • Integer, instantiates an attribute of type IntegerAttribute
    • Enumerated, instantiates an attribute of type EnumeratedAttribute
    • NotProbabilisticAttribute, instantiates an attribute of type NotProbabilisticAttribute
  • description : Specifies the domain of the attribute

    • For Binary not required
    • For Integer, e.g. “1,5” results in [1,2,3,4,5] (including 5)
    • For Enumerated, a comma-separated list of domain values, e.g. “1,4,78” results in [1,4,78]
    • For NotProbabilisticAttribute not required
  • pk : Optional. If you choose to use the primary key, set pk="1". See the remark on the foreign attribute of the Relationship tag for more information
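
How the description string maps to a domain can be sketched as follows. The helper domain_from_description is hypothetical; treating Enumerated values as integers and encoding the Binary domain as [0, 1] are assumptions, since the text above does not specify them.

```python
def domain_from_description(attr_type, description=None):
    """Hypothetical helper mapping `type`/`description` to a list of values."""
    if attr_type == "Integer":
        low, high = (int(v) for v in description.split(","))
        return list(range(low, high + 1))  # the upper bound is included
    if attr_type == "Enumerated":
        # Assumption: domain values are integers, as in the "1,4,78" example.
        return [int(v) for v in description.split(",")]
    if attr_type == "Binary":
        return [0, 1]  # assumption: the encoding is not stated in the text
    if attr_type == "NotProbabilisticAttribute":
        return None  # no domain: the attribute is not probabilistic
    raise ValueError("unknown attribute type: " + attr_type)


print(domain_from_description("Integer", "1,5"))
print(domain_from_description("Enumerated", "1,4,78"))
```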

Dependency

  • name : Freely chosen name for probabilistic dependency
  • parent : Parent attribute, referenced by its full name. e.g. Professor.fame
  • child : Child attribute, referenced by its full name. e.g. Student.success
  • constraint : Optional. A comma-separated list of constraints that can be applied to the data interface. Most commonly these are ordinary slot chains, e.g. “Professor.pk=Advisor.professor_id,Advisor.student_id=Student.pk”. If no constraint is given, ProbReM will apply a depth-first search to find a slot chain from child to parent using computeSlotChain()
  • aggregator : If the constraint on the dependency leads to one child attribute object having multiple parent attribute objects, the values of the parent attribute objects have to be aggregated, because the CPD for the child allows only one value for that parent. The module aggregation implements several such methods, e.g. AVG, MAX, MIN, MODE
  • refun : Indicates whether the dependency has an uncertain relationship in the slot chain ('1', 'True' or 'T')
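
The role of the aggregator can be illustrated with a few lines of plain Python. This is a sketch of the idea only, not the aggregation module; the AGGREGATORS table, the aggregate function, and the sample values are made up for the example.

```python
from collections import Counter

# Hypothetical lookup table from aggregator name to a reducing function.
AGGREGATORS = {
    "AVG": lambda values: sum(values) / len(values),
    "MAX": max,
    "MIN": min,
    "MODE": lambda values: Counter(values).most_common(1)[0][0],
}


def aggregate(name, parent_values):
    """Reduce the values of several parent objects to a single value."""
    return AGGREGATORS[name](parent_values)


# One child object whose slot chain reaches four parent objects.
parent_values = [1, 3, 3, 5]
for name in ("AVG", "MAX", "MIN", "MODE"):
    print(name, aggregate(name, parent_values))
```
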
LocalDistribution
See LocalDistributionParser
class xml_prm.parser.LocalDistributionParser[source]

The local distribution parser loads the model parameters that have been saved to disk. Naturally, this can only be done if neither the probabilistic structure nor the data itself has changed. learners.cpdlearners.CPDTabularLearner.learnCPDsFull() can be called with saveDistributions=True. The required XML specification, along with the .nlp file (numpy.array format), is saved in ./localdistributions/xxx.xml, and the required XML is printed to standard output, e.g.

<LocalDistribution attribute='A.Aa' file='./localdistributions/Da_Aa.xml'/>

After adding that output to the <LocalDistributions> tag in the PRM specification, the local distribution will be loaded from disk the next time the model is loaded, provided that the structure is unchanged.
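
Adding the printed line to the specification is normally done by hand in a text editor. For completeness, the same step could be scripted roughly as below; this is a sketch under the assumption that the specification is available as a string, and the duplicate check is an addition of this example rather than documented ProbReM behavior.

```python
import xml.etree.ElementTree as ET

# A stripped-down PRM specification with an empty <LocalDistributions> tag.
PRM_SPEC = """
<PRM name="PRMexample">
    <LocalDistributions>
    </LocalDistributions>
</PRM>
"""

# The line printed to standard output after learning.
printed = "<LocalDistribution attribute='A.Aa' file='./localdistributions/Da_Aa.xml'/>"

root = ET.fromstring(PRM_SPEC.strip())
container = root.find("LocalDistributions")
new_entry = ET.fromstring(printed)

# Skip the entry if a distribution for this attribute is already registered.
known = {ld.get("attribute") for ld in container.findall("LocalDistribution")}
if new_entry.get("attribute") not in known:
    container.append(new_entry)

print(ET.tostring(root, encoding="unicode"))
```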

User Interaction

The user interface package ui contains modules to facilitate the interaction with a ProbReM model.

Config Module ui.config

ui.config is used to create instances of the different building blocks of a ProbReM project, e.g. the PRM model, the data interface, the learner and the inference methods. The methods are accessed by importing the config module:

from ui import config
import probrem
config.loadPRM(prmSpec)

The code above, for example, would initialize the PRM module prm:

print probrem.PRM
<module 'prm.prm' from './../../src/prm/prm.pyc'> 

The config module can’t be executed directly.

ui.config.fromFile(probremI, config)[source]

Using a config file to load PRM. Not implemented yet.

ui.config.loadDI(diSpec)[source]

Loads a DataInterface instance

Parameters:diSpec – File name of Data Interface XML specification
Returns:data.datainterface.DataInterface instance, e.g. SQLiteDI
ui.config.loadInferenceAlgorithm(inferenceType)[source]

Loads the specified inference algorithm for the inference engine (engine) and configures it to use inferenceType (e.g. MCMC, LW).

Usually an inference algorithm implements a configure() method that can be used to precompute data structures needed for inference.

In the case of the Gibbs sampler, gibbs.configure will precompute all the conditional likelihood functions of the attributes with parents. Note that at the time an inference method is configured, the PRM should be initialized with proper local distributions (either learned or loaded).

Parameters:inferenceType – The name of the inference method (e.g. GIBBS or MH)
ui.config.loadLearner(learnerType)[source]

Loads a learner instance, e.g. a CPDLearner instance for learning the conditional probability distributions (CPDs).

Parameters:learnerType – Name of a learner class (e.g. CPDTabularLearner)
Returns:A learner instance, e.g. CPDTabularLearner
ui.config.loadPRM(prmSpec)[source]

Loads a PRM instance using the PRMparser

Parameters:prmSpec – File name of PRM XML specification
Returns:prm.prm.PRM instance

Command Line Module ui.cmd

ui.cmd contains useful methods to display information about the ProbReM model on the command line using IPython. The methods are accessed by importing the cmd module:

from ui import cmd

Then one can, for example, display all CPD instances:

cmd.displayCPDs() 

The cmd module can’t be executed directly.

ui.cmd.displayCPDs()[source]

Prints the conditional probability distributions (CPDs) for all probabilistic attributes

ui.cmd.ipythonRunning()[source]

Returns True if the method is executed in an active IPython shell

ui.cmd.ipythonShell()[source]

Starts an interactive ipython session if the session is not already started
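
A common way to detect an active IPython shell is to check whether the name get_ipython is defined, since IPython makes it available in interactive sessions. The sketch below shows that general technique; it is an assumption that ui.cmd.ipythonRunning() works similarly, and the function name ipython_running is hypothetical.

```python
def ipython_running():
    """Return True when called from inside an IPython session."""
    try:
        get_ipython  # this name only exists when IPython has injected it
    except NameError:
        return False
    return True


print(ipython_running())
```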

Logging Module ui.log

The module ui.log allows the logging of warning, error and debugging messages. It is configured to display logging messages on the console as well as write them to a log file. This module configures the Python logging module accordingly. Instead of using print statements in the code, one can simply:

import logging

Then one can, for example, display a debug message by invoking:

logging.debug('What is happening here?')
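
A configuration that sends messages to both the console and a file can be set up with the standard logging module as sketched below. This is not the ui.log source; the log file location, the message format and the DEBUG level are assumptions chosen for the example.

```python
import logging
import os
import tempfile

# Hypothetical log file location; a temporary directory keeps the example tidy.
log_path = os.path.join(tempfile.mkdtemp(), "probrem.log")

formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

console_handler = logging.StreamHandler()   # writes to the console (stderr)
file_handler = logging.FileHandler(log_path)  # writes to the log file
for handler in (console_handler, file_handler):
    handler.setFormatter(formatter)

# Attach both handlers to the root logger so plain logging.debug() uses them.
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
root_logger.addHandler(console_handler)
root_logger.addHandler(file_handler)

logging.debug("What is happening here?")

# The same message is now also stored in the log file.
with open(log_path) as fh:
    logged = fh.read()
print(logged)
```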

Analytics

The analytics package analytics implements methods that can be used to analyse/configure/debug the code and the model.

  • Measure the running time of any method, e.g. how much time is spent executing individual methods
  • Display graphs, e.g. the Ground Bayesian Network, using either GraphViz or NetworkX, both of which have to be installed separately.

Performance Module analytics.performance

Main module for performance analysis of the ProbReM package

analytics.performance.displayTimeAnalysis()[source]

Displays statistics about the running times of all methods that are decorated with @time_analysis

analytics.performance.measurments

A dictionary to keep track of execution times for methods {key=function.__name__ : value=execution time }

analytics.performance.time_analysis(caller)[source]

A decorator function that measures and saves the time of the calling method caller

The decorator function is used by adding

from analytics.performance import time_analysis

to the module of the caller method and by adding

@time_analysis
def methodtomeasure():
    ...

on the line before the caller function definition
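
A timing decorator of this kind can be written in a few lines. The version below is a self-contained sketch and not the analytics.performance source: the dictionary is named measurements here, and accumulating the time over repeated calls is an assumption of this example.

```python
import functools
import time

# {function.__name__: accumulated execution time in seconds}
measurements = {}


def time_analysis(caller):
    """Decorator that records how long each call to `caller` takes."""
    @functools.wraps(caller)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return caller(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # Assumption: times of repeated calls are summed per function name.
            measurements[caller.__name__] = (
                measurements.get(caller.__name__, 0.0) + elapsed
            )
    return wrapper


@time_analysis
def methodtomeasure():
    return sum(range(100000))


methodtomeasure()
methodtomeasure()
for name, seconds in measurements.items():
    print("%s: %.6f s" % (name, seconds))
```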

Visualization Module analytics.visualization

The analytics.visualization module can be used to create description files for graph visualization software. Note that the networks tend to be very large, so creating meaningful graphs is usually neither trivial nor useful.

analytics.visualization.createGraphvizFile(GBNgraph)[source]

Generates a description file reports/gbn.dot for the GraphViz software. It makes use of GvGen written by Sebastien Tricaud.

Parameters:GBNgraph – network.groundBN.GBNGraph
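
A GraphViz description file is plain text in the DOT language. As a rough illustration of what such a file contains, the sketch below builds DOT text by hand from a list of edges. It does not use GvGen and is not the createGraphvizFile implementation; the to_dot helper and the node names are made up for the example.

```python
def to_dot(edges, name="GBN"):
    """Return DOT text for a directed graph given (parent, child) pairs."""
    lines = ["digraph %s {" % name]
    for parent, child in edges:
        # Quote node names so that characters such as "." are allowed.
        lines.append('    "%s" -> "%s";' % (parent, child))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical ground network: two B.Ba objects are parents of one A.Aa object.
edges = [("B.Ba.1", "A.Aa.1"), ("B.Ba.2", "A.Aa.1")]
dot_text = to_dot(edges)
print(dot_text)
```

The resulting text could be written to a .dot file and rendered with the GraphViz dot command.
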
analytics.visualization.displayGraph(graph)[source]

Plots the graph using NetworkX.

Parameters:graph – network.groundBN.GBNGraph
analytics.visualization.plotAttrCPD(attr)[source]

Displays the CPD of the Attribute attr using matplotlib. If attr has parents and thus different possible parent assignments, the method attempts to display a separate line for each assignment. It is safe to say that this doesn’t scale well.

Parameters:attr – Attribute