Probabilistic reasoning support

(This code is preliminary and incomplete).

The probability package provides an embedded language for Bayesian reasoning based on Thrun's CES language.

In probability theory, the states of affairs over which one assigns probabilities are called events.  An assignment of probabilities to events is a probability distribution over those events.  Sets of events can be either discrete or continuous.  A probability distribution is always supposed to sum (or integrate, for a continuous set) to one, meaning that one of the events always happens.  Internally, this package represents probability distributions as vectors.  These vectors can either be normal signals, meaning that they are recomputed every cycle using a fixed rule, or shared registers, meaning that they are updated only when specific imperative actions are taken.

Constructors

This is woefully incomplete.  At the moment, we only support numbered events and only uniform distributions over those events.  I'll add more constructors as I get a sense of what we need.

(discrete-uniform-distribution number-of-events)
Returns a probability distribution over a set of numbered events.  The events have equal probability.
(shared-discrete-distribution number-of-events)
The same as discrete-uniform-distribution, but the return value is a shared register, meaning it can be updated imperatively by transducers.
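The semantics of these constructors can be sketched in plain Python (hypothetical names mirroring the GRL ones; a plain list stands in for both signals and shared registers):

```python
def discrete_uniform_distribution(number_of_events):
    """Uniform distribution over events numbered 0..number_of_events-1."""
    return [1.0 / number_of_events] * number_of_events

# Four events, each with probability 0.25; the entries sum to one.
dist = discrete_uniform_distribution(4)
```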

Normalization

True probability distributions are supposed to sum to one.  In real life, Bayesian calculations always start from noisy estimates of probabilities, which tends to lead to distributions that don't sum to unity.  Moreover, Bayesian estimation tends to assume statistical independence when it isn't really true, which also leads to distributions that don't sum to unity.  Hence, a number of kluges are necessary to make everything work.   One of those kluges is normalization - forcing the distribution to sum to one by dividing everything by the sum.

This package attempts to be smart about normalization by performing it on demand.  Operators like entropy that require a normalized distribution will automatically normalize their inputs.  However, if you pass an already-normalized distribution to entropy, it will not normalize it twice.

(normalize-distribution distribution)
Returns a normalized version of distribution.  If distribution is already normalized, it just returns distribution.
(normalize-distribution! distribution)
Imperative form used with shared registers.  Normalizes distribution in place and returns the normalized version.
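On-demand normalization amounts to the following Python sketch (hypothetical names; the GRL version additionally tracks at the signal level whether a distribution is already normalized, so the work is never repeated):

```python
def is_normalized(distribution, tolerance=1e-9):
    """True if the entries already sum to one (within floating-point slop)."""
    return abs(sum(distribution) - 1.0) <= tolerance

def normalize_distribution(distribution):
    """Return a normalized version; pass an already-normalized input through."""
    if is_normalized(distribution):
        return distribution
    total = sum(distribution)
    return [p / total for p in distribution]
```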

Bayesian evidence accumulation

(bayes-combination &rest distributions)
Returns the elementwise product of distributions.  This is the unnormalized version of Bayes' rule for the case of statistically independent distributions.  For this to be accurate, distributions should be conditional probabilities for the same set of events given distinct, statistically independent, measurements.  Given that true statistical independence usually isn't the case, the result will likely need to be normalized.
(bayes-update! distribution  evidence   trigger!)
Imperative form used for shared registers.  The distribution distribution is replaced with the elementwise product of itself and the distribution evidence whenever trigger! is true.
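The elementwise product at the heart of both operators is easy to state in Python (a sketch under the same independence assumption; the name is hypothetical):

```python
def bayes_combination(*distributions):
    """Unnormalized Bayes' rule for independent evidence:
    result[i] is the product of d[i] over all input distributions d."""
    result = list(distributions[0])
    for dist in distributions[1:]:
        result = [a * b for a, b in zip(result, dist)]
    return result
```

Note that the result generally sums to less than one, which is why it is normally followed by normalization.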

Derived quantities

(entropy distribution)
Returns the entropy of distribution.  The entropy is a measure of the "unpredictability" of a distribution: a uniform distribution has maximal entropy, while a distribution in which one event has probability one and all others have probability zero has minimal entropy.  Technically, the entropy is the number of bits per symbol that would be required by an optimal code when the statistical distribution of symbols in the text to be transmitted is distribution.
(maximum-probability-event distribution)
The event with maximal probability.
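Both quantities are short computations over the probability vector; a Python sketch (hypothetical names, entropy measured in bits):

```python
import math

def entropy(distribution):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in distribution if p > 0.0)

def maximum_probability_event(distribution):
    """Index of the most probable event."""
    return max(range(len(distribution)), key=lambda i: distribution[i])
```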

Markov chains

A Markov chain is a random process in which the probability distribution over the possible states of the process at one point in time depends only on the state at the previous point in time, not on the earlier history.

(markov-update! distribution  transition-matrix   trigger!)
Updates distribution, which should be a shared register, by multiplying it with the probabilities in transition-matrix, whenever trigger! is asserted.
(vector-element-difference-matrix vector)
Returns a matrix of values where the i,j'th element is the difference of the ith element and the jth element of vector.  Note: this is a normal Scheme procedure, not a signal-procedure or a transducer.  It is meant for off-line computation.
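The numerical content of these two operations can be sketched in Python (hypothetical names; the Markov update is an ordinary matrix-vector product):

```python
def markov_update(distribution, transition_matrix):
    """One Markov step: new[i] = sum over j of T[i][j] * old[j]."""
    n = len(distribution)
    return [sum(transition_matrix[i][j] * distribution[j] for j in range(n))
            for i in range(n)]

def vector_element_difference_matrix(vector):
    """Matrix whose i,j'th element is vector[i] - vector[j]."""
    return [[a - b for b in vector] for a in vector]
```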

Example

A simple Markov update procedure for navigation based on odometry and a landmark detector.

;; (x,y) coordinates of the landmarks
(define landmark-positions '((1 0) (4 5) (2 3) (7 6)))

;; Matrix of vectors between landmarks (technically, vector of matrices)
(define-signal landmark-position-difference-matrix
  ;; The , operator just tells GRL to evaluate the form as normal
  ;; Scheme code rather than as a signal expression.
  (xy-vector ,(vector-element-difference-matrix
               (apply vector (map car landmark-positions)))
             ,(vector-element-difference-matrix
               (apply vector (map cadr landmark-positions)))))

;; This is the shared register that will hold the probability
;; distribution over the landmarks
(define-signal landmark-probabilities
  (shared-discrete-distribution ,(length landmark-positions)))

;; The conditional probability of generating measured-motion given
;; the assumption that the true motion was reference-motion.
;;
;; We assume Gaussian error in odometry with standard deviation sigma.
;; We also don't bother including the magic normalizing constant of the
;; Gaussian, since the whole thing will get normalized at the end anyway.

(define-signal (position-probability measured-motion reference-motion sigma)
  (let ((distance (magnitude (- measured-motion reference-motion))))
    (exp (- (/ (* distance distance)
               (* 2 sigma sigma))))))

;; The measured xy motion since the last landmark.  Fill this in with
;; the appropriate measurements from your favorite base.
(define-signal motion
  (xy-vector (prompt "x:") (prompt "y:")))
;; Transition probability matrix.  Remember it is "unnormalized" since we
;; omitted the one-over-root-two-pi-sigma fudge factor in the Gaussian
(define-signal transition-matrix
  (position-probability motion
                        landmark-position-difference-matrix
                        1.0))

;; Update the distribution whenever we see a landmark.
;; This can be improved by measuring information about the landmark
;; and integrating it using bayes-update!.
;;
;; Note that updated-distribution is really just landmark-probabilities,
;; but it's been annotated at compile time to show that it's been
;; normalized.
(define-signal updated-distribution
  (normalize-distribution!
   (markov-update! landmark-probabilities
                   transition-matrix
                   landmark-detected?)))
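For readers without a GRL system handy, here is the same computation as a self-contained Python sketch (the names are translations, not part of the package; one function folds the transition step and the final normalization together):

```python
import math

landmark_positions = [(1, 0), (4, 5), (2, 3), (7, 6)]

def difference_matrix(values):
    """M[i][j] = values[i] - values[j]."""
    return [[a - b for b in values] for a in values]

dx = difference_matrix([p[0] for p in landmark_positions])
dy = difference_matrix([p[1] for p in landmark_positions])

def position_probability(measured, reference, sigma):
    """Unnormalized Gaussian likelihood of the odometry error."""
    err = math.hypot(measured[0] - reference[0], measured[1] - reference[1])
    return math.exp(-(err * err) / (2.0 * sigma * sigma))

def landmark_update(probabilities, measured_motion, sigma=1.0):
    """Markov update against the inter-landmark motion model, then normalize."""
    n = len(probabilities)
    updated = [sum(position_probability(measured_motion,
                                        (dx[i][j], dy[i][j]), sigma)
                   * probabilities[j]
                   for j in range(n))
               for i in range(n)]
    total = sum(updated)
    return [p / total for p in updated]

# Start ignorant, then observe a motion of (3, 5) since the last landmark;
# that matches the vector from landmark 0 at (1,0) to landmark 1 at (4,5).
probs = landmark_update([0.25] * 4, (3.0, 5.0))
```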