June 27th— July 3rd

July 4th— July 10th

July 11th— July 17th

Although the ICML paper is well written, it has some ambiguous parts whose exact meanings are crucial for understanding and applying the algorithm.

The main idea of the algorithm is to keep splitting the nodes until their transition sets are deterministic. A state x can loop back to one of its ancestors q if, for all possible prefixes p of different lengths, px and pq have exactly the same first-level transition set.
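To keep our reading straight, here is a minimal sketch of the loop-back test as we currently understand it; the trajectory representation, the helper names, and the choice of which prefixes to try are all our own assumptions, not anything taken from the paper's code.

```python
def first_level_transitions(trajectories, history):
    """Set of (action, observation) pairs seen immediately after `history`.

    `trajectories` is a list of (action, observation) sequences and `history`
    is a tuple of (action, observation) pairs; this representation is our own
    assumption, made only to illustrate the test.
    """
    transitions = set()
    h = len(history)
    for traj in trajectories:
        for i in range(len(traj) - h):
            if tuple(traj[i:i + h]) == tuple(history):
                transitions.add(traj[i + h])
    return transitions


def can_loop_back(trajectories, x, q, prefixes):
    """x may loop back to its ancestor q if, for every prefix p we try,
    p + x and p + q have exactly the same first-level transition set."""
    return all(
        first_level_transitions(trajectories, p + x)
        == first_level_transitions(trajectories, p + q)
        for p in prefixes
    )
```

In practice we would only try the prefixes that actually occur in the sampled trajectories.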

The part we are not yet sure about is what exactly they mean by a deterministic transition set. Does it mean that only one action is possible after that state in the history, or that the observation for each following action is deterministic, or something else entirely?

The task for next week is to work through different examples and find the most reasonable assumption for the definition of “deterministic”.

We are also interested in knowing whether the learned automaton is minimal.

OK, let's go for the exploration… :)

After working on various examples, long discussions, and a more careful reading of the paper, the best definition of “deterministic” that we agreed on is as follows:

We stop splitting a node when applying the history that the node represents leads us to states which have identical single-step action/observation transitions.

To check whether this condition holds, we check whether there exists a prefix that, when prepended to the node, changes the transition set. If there is no such prefix, then the node's transition set is deterministic. It should be noted that if we assume all actions are valid in every state, then simply checking whether the node's observation for each action is deterministic is sufficient to decide termination.
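Under the assumption above (every action is valid in every state), the termination test reduces to something like the following sketch; again, the trajectory representation is our own assumption, not the paper's.

```python
from collections import defaultdict


def is_deterministic(trajectories, history):
    """The node represented by `history` is deterministic if every action
    observed right after `history` always produces the same observation."""
    obs_by_action = defaultdict(set)
    h = len(history)
    for traj in trajectories:
        for i in range(len(traj) - h):
            if tuple(traj[i:i + h]) == tuple(history):
                action, observation = traj[i + h]
                obs_by_action[action].add(observation)
    return all(len(obs) == 1 for obs in obs_by_action.values())
```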

We also observed that the learned automaton is not minimal, because a state can only loop back to one of its ancestors: if two nodes are identical but appear in different subtrees, they cannot point back to each other, which results in several duplicate subtrees.

We have also developed a merge-split version of the PAC algorithm that works for DFAs, though it is still informal! We want to compare the learned PST and merge-split automata; more specifically, we are interested in which one gives the smaller version of the original automaton, and whether their predictions are equivalent.
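For the comparison of predictions, something like the following harness should be enough; the `predict(history, action)` interface on the two learned machines is hypothetical, made up here just to make the comparison concrete.

```python
def compare_predictors(pst, merge_split, trajectories):
    """Count how often each model's next-observation prediction matches the
    data. Both models are assumed to expose predict(history, action) -> obs;
    that interface is our own placeholder, not part of either algorithm."""
    scores = {"pst": 0, "merge_split": 0, "total": 0}
    for traj in trajectories:
        for i in range(1, len(traj)):
            history, (action, observation) = traj[:i], traj[i]
            scores["total"] += 1
            if pst.predict(history, action) == observation:
                scores["pst"] += 1
            if merge_split.predict(history, action) == observation:
                scores["merge_split"] += 1
    return scores
```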


We realized that the number of states in the merge-split learned automaton is, in the worst case, 2^n (where n is the number of states in the original automaton). The space complexity of the PST algorithm is more complicated and seems to depend more on the number of actions and observations than on the number of states.

 

So far, the learned merge-split representation has been smaller than the PST (Isbell's algorithm); we are still looking for a counterexample.

 

For next week we should:

1. Formalize our current merge-split algorithm.

2. Find an example where the PST-learned automaton is smaller than the merge-split one.

3. Try to modify the PST algorithm to also work for DASOs.

July 18th— July 24th


Finally, we found an example which shows that the PST-learned automaton can sometimes be smaller than the merge-split one. But we also observed that their predictions are not the same, and the merge-split automaton is more accurate in its predictions. This is problematic because we are now unsure whether we have understood their algorithm and their prediction process correctly. Therefore we have decided to email them our example and ask whether this is actually the correct learned automaton.

For now we are building the merge-split learned machines while ignoring the distinguishability parameter and the largeness condition. What we are interested in at this point is running the code we currently have for PAC learning of POMDPs on this example and seeing what the learned automaton looks like with these parameters.

To be able to run the code on our example we need to encode it. Encoding POMDPs has a special format, and I should read a paper on that during the next week.
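Until I have read about the actual format, here is a hypothetical in-memory encoding of a tiny two-state POMDP in Python; the dictionary layout and all the names in it are placeholders of mine, not the format the code expects.

```python
# A tiny hypothetical POMDP: 2 states, 2 actions, 2 observations.
# T[(s, a)] maps next states to probabilities and O[(a, s_next)] maps
# observations to probabilities. This layout is only a sketch until we
# know what the PAC-learning code actually reads.
pomdp = {
    "states": ["s0", "s1"],
    "actions": ["a0", "a1"],
    "observations": ["o0", "o1"],
    "start": {"s0": 1.0, "s1": 0.0},
    "T": {
        ("s0", "a0"): {"s0": 0.0, "s1": 1.0},
        ("s0", "a1"): {"s0": 1.0, "s1": 0.0},
        ("s1", "a0"): {"s0": 1.0, "s1": 0.0},
        ("s1", "a1"): {"s0": 0.0, "s1": 1.0},
    },
    "O": {
        ("a0", "s0"): {"o0": 1.0, "o1": 0.0},
        ("a0", "s1"): {"o0": 0.0, "o1": 1.0},
        ("a1", "s0"): {"o0": 1.0, "o1": 0.0},
        ("a1", "s1"): {"o0": 0.0, "o1": 1.0},
    },
}
```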


Dorna Kashef Haghighi
