Vincec's Dimension

# AI Review Note - Probability

2018/11/21

# Quantifying Uncertainty

## Uncertainty

• Outside the scope of the model
• Too complex to model exactly
• Too expensive or risky to determine
• Inherent randomness or incomplete information

## Probability

### Notation

• P(X = xi): the probability that random variable X takes the value xi
• P(X): the distribution over all values of X, e.g. P(X, Y), P(X, y)
• P is the whole table (AKA the joint probability distribution); Y is the set of all values yi, while y is a single value

### Two axioms

• Sum axiom: P(A | B) + P(~A | B) = 1
• Product axiom: P(AB | C) = P(A | C) P(B | AC) = P(B | C) P(A | BC)

### Subjective Probability

This note mainly discusses subjective probability.

## Basic Concepts

• P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
• P(x ∧ y) = P(x) * P(y) if x and y are independent
• P(x ∧ y) = 0 if x and y are mutually exclusive
• P(x) = P(y) if x = y (equivalent)
• P(x1 ∨ … ∨ xn) = 1 if the xi's are exhaustive, e.g. P(x ∨ ~x) = 1
• P(~x) = 1 - P(x)
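A quick numeric check of these identities, using a hypothetical example of two independent fair coins (x = "first coin heads", y = "second coin heads"):

```python
# Numeric check of the basic identities with two independent fair coins
# (hypothetical example: x = first coin heads, y = second coin heads).
p_x, p_y = 0.5, 0.5
p_x_and_y = p_x * p_y             # independence: P(x ∧ y) = P(x) * P(y)
p_x_or_y = p_x + p_y - p_x_and_y  # inclusion-exclusion
p_not_x = 1 - p_x                 # negation

print(p_x_and_y)  # 0.25
print(p_x_or_y)   # 0.75
print(p_not_x)    # 0.5
```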

### Domains of Variables

The domain must be a partition that covers all cases: exhaustive and mutually exclusive, so the probabilities sum to 1.

• Boolean, <true, false>
• Discrete Random, countable, <A, B, C, D, F>
• Continuous Random, uncountable, e.g. [0, +∞)

## Atomic Events

A complete specification of the state of the world the agent is uncertain about, analogous to a model in logic.

The set of possible atomic events is a partition
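The partition of atomic events can be enumerated directly; a small sketch with three hypothetical Boolean variables:

```python
from itertools import product

# Enumerate the atomic events for three Boolean variables (hypothetical
# names). Each tuple is one complete specification of the world.
variables = ["Cavity", "Toothache", "Catch"]
worlds = list(product([True, False], repeat=len(variables)))

# The worlds are distinct (mutually exclusive) and cover every case
# (exhaustive): 2^3 = 8 of them.
print(len(worlds))  # 8
```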

# Probability Distribution Model

The model lists variables and value assignments in a table or graph.

## Joint Probability Distribution

Over a set of random variables, it gives the probability of every atomic event on those variables.

### Fully Joint Probability Distribution

"Fully" means every possible situation is listed.

### Full Joint (Discrete) Distributions

• A complete probability model, showing every entry for all variables.
• Possible worlds are mutually exclusive and exhaustive (the entries sum to 1).
• Number of possible worlds: the product of the sizes of the variable domains.
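The table-size rule in the last bullet, with hypothetical domain sizes:

```python
from math import prod

# Hypothetical domain sizes: a Boolean, a letter grade <A,B,C,D,F>,
# and a four-valued weather variable.
domains = {"Rain": 2, "Grade": 5, "Weather": 4}

# Number of possible worlds = product of the domain sizes.
n_worlds = prod(domains.values())
print(n_worlds)  # 2 * 5 * 4 = 40 entries in the full joint table
```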

# Inferences Rules

• Sum Rule
• Product Rule
• Conditional
• Marginalization
• Normalization

## Sum Rule

P(A | B) + P(~A | B) = 1

## Product Rule

• P(AB | C) = P(A | C) P(B | AC) = P(B | C) P(A | BC)
• P(A ∧ B) = P(AB) = P(B) P(A | B) = P(A) P(B | A)

## Conditional Probability

• P(A | B) = P(A ∧ B) / P(B) if P(B) != 0
• P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A) [Product Rule]
• P(A, B) = P(A | B) * P(B) [the comma is shorthand for ∧]
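A small numeric sketch of the definition, computing P(A | B) from a hypothetical full joint table over two Booleans (all numbers invented):

```python
# Conditional probability from a hypothetical full joint table over two
# Boolean variables A and B: P(A | B) = P(A ∧ B) / P(B).
joint = {
    (True, True): 0.12, (True, False): 0.18,
    (False, True): 0.28, (False, False): 0.42,
}
p_b = sum(p for (a, b), p in joint.items() if b)  # P(B) = 0.12 + 0.28 = 0.4
p_a_and_b = joint[(True, True)]                   # P(A ∧ B) = 0.12
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # ≈ 0.3
```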

### Bayes’ Rule

P(A | B) = P(B | A) * P(A) / P(B)
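A worked Bayes' Rule example with invented numbers, a diagnostic-test scenario where P(B) is expanded by summing over both cases of A:

```python
# Bayes' Rule with hypothetical numbers: a test with sensitivity
# P(pos | disease) = 0.9, false-positive rate P(pos | no disease) = 0.08,
# and prior P(disease) = 0.01.
p_d = 0.01
p_pos_given_d = 0.9
p_pos_given_not_d = 0.08

# P(pos) expanded over both cases of the disease variable.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' Rule: P(disease | pos) = P(pos | disease) P(disease) / P(pos).
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # ≈ 0.102, despite the 90% sensitivity
```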

## Normalization

Σai P(A = ai | B = b) = 1, so α = 1 / P(B = b), where α is a normalization factor

• P(A | B = b) = α P(B = b | A) P(A) = α P(B = b ∧ A)
• P(X | e) = P(X, e) / P(e) = α P(X, e) = α Σy P(X, e, y)

E.g.:

• P(A | B = b) = α P(B = b | A) P(A) = α <0.4, 0.2, 0.2> = <0.5, 0.25, 0.25>
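The example above as code, scaling an unnormalized vector so its entries sum to 1:

```python
# Normalization: scale the unnormalized vector <0.4, 0.2, 0.2>
# so its entries sum to 1.
unnormalized = [0.4, 0.2, 0.2]
alpha = 1 / sum(unnormalized)               # α = 1 / 0.8 = 1.25
normalized = [alpha * p for p in unnormalized]
print(normalized)  # ≈ [0.5, 0.25, 0.25]
```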

## Marginalization

Introducing a variable Z as an extra condition and summing it out:

P(X | Y) = Σz P(X | Y, Z = z) * P(Z = z|Y)
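A numeric check of this formula with a binary Z and invented numbers:

```python
# Check P(X | Y) = Σz P(X | Y, Z=z) * P(Z=z | Y) for a binary Z
# (all numbers hypothetical).
p_z_given_y = {True: 0.3, False: 0.7}     # P(Z = z | Y)
p_x_given_yz = {True: 0.9, False: 0.2}    # P(X | Y, Z = z)

p_x_given_y = sum(p_x_given_yz[z] * p_z_given_y[z] for z in (True, False))
print(p_x_given_y)  # ≈ 0.3*0.9 + 0.7*0.2 = 0.41
```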

## Probabilistic Inference by Enumeration(枚举)

Sum the entries of the full joint table that match the query; the resulting probability is called the marginal probability.
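Enumeration over a hypothetical two-variable full joint table (numbers invented):

```python
# Inference by enumeration over a hypothetical full joint table
# P(Cavity, Toothache): sum the entries that match the query.
joint = {
    (True, True): 0.08, (True, False): 0.02,
    (False, True): 0.06, (False, False): 0.84,
}

# Marginal P(Cavity = true): sum over all values of the other variable.
p_cavity = sum(p for (cavity, _), p in joint.items() if cavity)
print(p_cavity)  # ≈ 0.08 + 0.02 = 0.1
```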

# Multiple Sources

## Independent

• A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)*P(B)
• Independence can reduce the size of the whole table/graph

## Conditional Independence

• P(A, B) = P(A) * P(B) [original independence relationship]
• P(A, B | C) = P(A | C) * P(B | C) [conditional independence relationship]
• P(X | Y, Z) = P(X | Z)

## Combining Evidence

P(A | B, C) = α P(B, C | A) P(A) [Bayes' Rule] = α P(B | A) P(C | A) P(A) [Conditional Independence]
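This combination rule as code, with hypothetical likelihoods for two pieces of evidence that are conditionally independent given the hypothesis:

```python
# Combining two pieces of evidence B and C that are conditionally
# independent given A: P(A | B, C) = α P(B | A) P(C | A) P(A).
# All numbers are hypothetical.
priors = {"a": 0.3, "not_a": 0.7}          # P(A)
p_b_given = {"a": 0.8, "not_a": 0.2}       # P(B | A)
p_c_given = {"a": 0.5, "not_a": 0.4}       # P(C | A)

unnorm = {h: p_b_given[h] * p_c_given[h] * priors[h] for h in priors}
alpha = 1 / sum(unnorm.values())           # normalization factor
posterior = {h: alpha * p for h, p in unnorm.items()}
print(posterior)  # posterior over A given both B and C
```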

# Decision Making

Decision Theory = probability theory + utility theory; choose the action/option with the highest expected utility.

## Bayesian Network

### Syntax

• A conditional distribution for each node given its parents

### Advantage (Bayesian Network VS Fully Joint Distribution Table)

Size = O(n · d^k) vs. O(d^n), where k is the max number of parents and d is the size of each variable's domain
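A quick sketch of how dramatic this gap is, with hypothetical values n = 30, d = 2, k = 5:

```python
# Compare representation sizes (hypothetical numbers): n Boolean
# variables (d = 2), each node with at most k parents.
n, d, k = 30, 2, 5
bn_size = n * d ** k          # O(n * d^k) conditional-table entries
full_joint_size = d ** n      # O(d^n) entries in the full joint table
print(bn_size, full_joint_size)  # 960 vs 1073741824
```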

### Enumeration Algorithm

Brute-force computation of P(H | E):

1. Apply the conditional probability rule: P(H | E) = P(H ∧ E) / P(E)
2. Apply the marginalization rule to the unknown variables U: P(H ∧ E) = Σu P(H ∧ E ∧ U = u) [whole table]
3. Apply the joint distribution rule for Bayesian networks: P(X1, …, Xn) = Πi=1..n P(Xi | Parents(Xi)) [using the graph structure]
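The steps above can be sketched on a tiny hypothetical two-node network Rain → WetGrass (all numbers invented; with no hidden variables, step 2 is trivial):

```python
# Brute-force enumeration of P(Rain | WetGrass = true) on a hypothetical
# two-node network Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}              # P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}    # P(WetGrass | Rain)

def joint(rain, wet):
    # Step 3: chain-rule factorization P(Rain) * P(WetGrass | Rain).
    p_wet = p_wet_given_rain[rain]
    return p_rain[rain] * (p_wet if wet else 1 - p_wet)

# Steps 1-2: P(Rain = r | wet) = P(r ∧ wet) / P(wet); here there are
# no hidden variables to sum out, so the denominator is a sum over r.
p_wet_true = sum(joint(r, True) for r in (True, False))
posterior = {r: joint(r, True) / p_wet_true for r in (True, False)}
print(posterior)  # posterior over Rain given wet grass
```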

### Inference Algorithms

• Approximate Inference: Monte Carlo
• Direct Sampling: sample from the network with no evidence; reset and generate another sample; e.g. estimate P(sample with ALL False)
• Rejection Sampling: discard samples that contradict the evidence
• Likelihood Weighting: weight each sample by the likelihood of the observed evidence instead of weighting all samples equally
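A minimal rejection-sampling sketch on a tiny hypothetical Rain → WetGrass network (numbers invented), estimating P(Rain | WetGrass = true):

```python
import random

# Rejection sampling on a hypothetical two-node network Rain -> WetGrass:
# estimate P(Rain | WetGrass = true) by discarding samples that
# contradict the evidence.
random.seed(0)

def sample():
    rain = random.random() < 0.2           # prior P(Rain) = 0.2
    wet_p = 0.9 if rain else 0.1           # P(WetGrass | Rain)
    wet = random.random() < wet_p
    return rain, wet

# Keep only samples consistent with the evidence WetGrass = true.
kept = [rain for rain, wet in (sample() for _ in range(100_000)) if wet]
estimate = sum(kept) / len(kept)
print(round(estimate, 2))  # close to the exact answer 0.18 / 0.26 ≈ 0.69
```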


Author: VINCEC