
AI Review Note - Probability

2018/11/21

Quantifying Uncertainty

Uncertainty

  • Outside the scope of the agent's knowledge
  • Too complex to model exactly
  • Too expensive or risky to find out
  • Inherently random problem/information

Probability

Notation

  • P(X = xi): probability that variable X takes the value xi
  • P(X): distribution over all values of X, e.g. P(X, Y), P(X, y)
    • P is the whole table (AKA the joint probability distribution); Y is the set of all values yi, while y is a single value

Two axioms

  • Sum axiom: P(A | B) + P(~A | B) = 1
  • Product axiom: P(AB | C) = P(A | C) P(B | AC) = P(B | C) P(A | BC)

Objective Probability

For a fair coin, P(heads) = P(tails) = 1/2.

Subjective Probability

Probability as a degree of belief; this is the interpretation mainly discussed here.

Basic Concepts

  • P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
  • P(x ∧ y) = P(x) * P(y) if x and y are independent
  • P(x ∧ y) = 0 if x and y are mutually exclusive
  • P(x) = P(y) if x = y (equivalent)
  • P(x1 ∨ … ∨ xn) = 1 if the xi's are exhaustive, e.g. P(x ∨ ~x) = 1
  • P(~x) = 1 - P(x)
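
As a quick numeric check of the inclusion-exclusion identity above, a minimal sketch with one roll of a fair die (the events A = "even" and B = "at least 4" are invented for illustration):

```python
from fractions import Fraction

# One roll of a fair die; events are sets of outcomes.
p = Fraction(1, 6)        # probability of each single outcome
A = {2, 4, 6}             # A: the roll is even
B = {4, 5, 6}             # B: the roll is at least 4

P = lambda event: p * len(event)

# P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
assert P(A.union(B)) == P(A) + P(B) - P(A.intersection(B))
print(P(A.union(B)))      # 2/3
```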

Domains of Variables

The domain must be a partition: exhaustive (it covers all cases) and mutually exclusive, so the probabilities over it sum to 1.

  • Boolean, <true, false>
  • Discrete Random, countable, <A, B, C, D, F>
  • Continuous Random, uncountable, e.g. [0, +∞)

Atomic Events

A complete specification of the state of the world the agent is uncertain about, analogous to a model in logic.

The set of possible atomic events is a partition

Probability Distribution Model

Variables and their value assignments, represented as a table or a graph.

Joint Probability Distribution

Defined over a set of random variables; gives the probability of every atomic event on those variables.


Fully Joint Probability Distribution

"Fully" means that every situation is listed.

Full Joint (Discrete) Distributions

  • A complete probability model, showing an entry for every combination of variable values.
  • Possible worlds are mutually exclusive and exhaustive (together they cover everything, so the entries sum to 1).
  • Number of possible worlds: the product of the domain sizes of the variables.
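
A minimal sketch of such a table over two hypothetical Boolean variables, Toothache and Cavity (the numbers are invented for illustration); the 2 * 2 = 4 entries are the possible worlds and must sum to 1:

```python
# Full joint distribution over (toothache, cavity): 2 * 2 = 4 possible worlds.
joint = {
    (True,  True):  0.12,
    (True,  False): 0.08,
    (False, True):  0.08,
    (False, False): 0.72,
}

assert abs(sum(joint.values()) - 1.0) < 1e-9   # exhaustive + mutually exclusive

# Marginal P(cavity = true): add every world where cavity holds.
p_cavity = sum(p for (toothache, cavity), p in joint.items() if cavity)
print(p_cavity)   # 0.2
```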

Inference Rules

  • Sum Rule
  • Product Rule
  • Conditional Probability
  • Marginalization
  • Normalization

Sum Rule

P(A | B) + P(~A | B) = 1

Product Rule

  • P(AB | C) = P(A | C) P(B | AC) = P(B | C) P(A | BC)
  • P(A ∧ B) = P(AB) = P(B) P(A | B) = P(A) P(B | A)

Conditional Probability

  • P(A | B) = P(A ∧ B) / P(B) if P(B) != 0
  • P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A) [Product Rule]
  • P(A, B) = P(A | B) * P(B) (the comma is shorthand for ∧)
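
A minimal sketch of the definition in code, reusing the illustrative toothache/cavity full joint table from above:

```python
joint = {  # illustrative full joint over (toothache, cavity)
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

p_b = sum(p for (t, c), p in joint.items() if t)   # P(toothache)
p_ab = joint[(True, True)]                         # P(cavity ∧ toothache)
print(p_ab / p_b)         # P(cavity | toothache) = 0.12 / 0.2 = 0.6
```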

Bayes’ Rule

P(A | B) = P(B | A) * P(A) / P(B)
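
A worked instance with invented numbers: a diagnostic test with 90% sensitivity and a 5% false-positive rate, for a condition with a 1% prior:

```python
p_d = 0.01         # prior P(disease), assumed
p_pos_d = 0.90     # P(positive | disease), assumed sensitivity
p_pos_nd = 0.05    # P(positive | no disease), assumed false-positive rate

# Denominator by total probability: P(positive)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

print(p_pos_d * p_d / p_pos)   # P(disease | positive) ≈ 0.154
```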




Degrees of belief

Bayes' rule is how an agent updates its degree of belief in a hypothesis after observing evidence.

Normalization

Σi P(A = ai | B = b) = 1, so α = 1 / P(B = b) is a normalization factor.

  • P(A | B = b) = α P(B = b | A) P(A) = α P(B = b ∧ A)
  • P(X | e) = P(X, e) / P(e) = α P(X, e) = α Σy P(X, e, y)

E.g.:

  • P(B = b | A) P(A) = α<0.4, 0.2, 0.2> = <0.5, 0.25, 0.25>
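
The α step in code, normalizing the unnormalized vector from the example above:

```python
unnormalized = [0.4, 0.2, 0.2]            # P(B = b | A) * P(A) for each value of A
alpha = 1 / sum(unnormalized)             # α = 1 / P(B = b) = 1.25
print([alpha * p for p in unnormalized])  # [0.5, 0.25, 0.25]
```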

Marginalization

Introduce a hidden variable Z as an extra condition, then sum it out:

P(X | Y) = Σz P(X | Y, Z = z) * P(Z = z|Y)

Probabilistic Inference by Enumeration

Add up the entries of the full joint table where the query holds; the resulting probability is called a marginal probability.
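
A minimal sketch of a conditional query by enumeration, reusing the illustrative toothache/cavity table and the α Σ formula from the Normalization section:

```python
joint = {  # illustrative full joint over (toothache, cavity)
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

# P(Cavity | toothache = true) = α <P(cavity, toothache), P(~cavity, toothache)>
scores = [sum(p for (t, c), p in joint.items() if t and c == cavity)
          for cavity in (True, False)]
alpha = 1 / sum(scores)
print([alpha * s for s in scores])   # [0.6, 0.4]
```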

Multiple Sources

Independent

  • A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)*P(B)
  • Independence shrinks the set of entries needed in the whole table/graph

Conditional Independence

  • P(A, B) = P(A) * P(B) [original independence relationship]
  • P(A, B | C) = P(A | C) * P(B | C) [conditional independence relationship]
  • P(X | Y, Z) = P(X | Z) when X and Y are conditionally independent given Z

Combining Evidence

P(A | B, C) = α P(B, C | A) * P(A) [Bayes' Rule] = α P(B | A) * P(C | A) * P(A) [Conditional Independence]
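
A minimal sketch of this naive-Bayes-style combination, with invented numbers for a cause A and two symptoms B and C assumed conditionally independent given A:

```python
# Invented prior and per-symptom likelihoods.
p_a = {True: 0.2, False: 0.8}            # P(A)
p_b_given_a = {True: 0.7, False: 0.1}    # P(B = true | A)
p_c_given_a = {True: 0.6, False: 0.2}    # P(C = true | A)

# α P(B | A) P(C | A) P(A), then normalize.
scores = {a: p_b_given_a[a] * p_c_given_a[a] * p_a[a] for a in (True, False)}
alpha = 1 / sum(scores.values())
print({a: alpha * s for a, s in scores.items()})   # ≈ {True: 0.84, False: 0.16}
```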

Decision Making

Decision Theory = probability theory + utility theory; choose the action/option with the highest expected utility.
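
Expected utility in a minimal sketch, with invented outcome probabilities and utilities for two candidate actions:

```python
# EU(action) = Σ over outcomes of P(outcome | action) * U(outcome).
actions = {                                 # (P(outcome), utility), invented
    "take_umbrella": [(0.3, 70), (0.7, 80)],
    "leave_it":      [(0.3, 0),  (0.7, 100)],
}

eu = {a: sum(p * u for p, u in outs) for a, outs in actions.items()}
print(max(eu, key=eu.get), eu)   # highest expected utility: take_umbrella (77 vs 70)
```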

Bayesian Network

Syntax

  • Directed links between nodes; a missing link encodes an independence assumption
  • A conditional distribution for each node given its parents

Advantage (Bayesian Network VS Fully Joint Distribution Table)

Size = O(n · d^k) vs. O(d^n), where n = # of variables, k = max # of parents, d = domain size
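
For instance, with n = 30 Boolean variables (d = 2) and at most k = 5 parents per node, the network stores about 30 · 2^5 = 960 numbers, while the full joint table needs 2^30 ≈ 10^9.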

Independence relation

Each node is conditionally independent of its non-descendants given its parents.

Enumeration Algorithm

Brute-force computation of P(H | E):

  1. Apply the conditional probability rule: P(H | E) = P(H ∧ E) / P(E)
  2. Apply the marginalization rule to the unknown (hidden) variables U: P(H ∧ E) = Σu P(H ∧ E ∧ U = u) [whole table]
  3. Apply the joint distribution rule for Bayesian networks: P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi)) [using the graph structure]
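
A minimal sketch of steps 1-3 on a hypothetical two-node chain A → B with invented CPTs; the joint factors as P(A) P(B | A), and queries come from summing products:

```python
# Tiny Bayesian network A -> B with invented CPTs.
p_a = {True: 0.3, False: 0.7}                    # P(A)
p_b_given_a = {True: {True: 0.9, False: 0.1},    # P(B | A)
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    # Step 3: P(A, B) = P(A) * P(B | A), the product over nodes given parents.
    return p_a[a] * p_b_given_a[a][b]

# Steps 1-2: P(A = true | B = true) = P(A, B) / Σa P(A = a, B = true)
p_ab = joint(True, True)
p_b = sum(joint(a, True) for a in (True, False))
print(p_ab / p_b)   # 0.27 / 0.41 ≈ 0.66
```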

Inference Algorithms

  • Approximate Inference: Monte Carlo
  • Direct Sampling: sample each variable in topological order, with no evidence; reset and generate another sample; estimate a probability by its sample frequency (e.g. P(sample with ALL false))
  • Rejection Sampling: throw away (reject) samples that contradict the evidence
  • Likelihood Weighting: fix the evidence variables and weight each sample by the likelihood of the observed values, instead of weighting all samples equally
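
A minimal sketch of rejection sampling on the same hypothetical A → B network: draw samples in topological order, discard those whose B value contradicts the evidence, and estimate from the survivors:

```python
import random

p_a_true = 0.3
p_b_true_given_a = {True: 0.9, False: 0.2}   # invented CPTs, as above

def sample():
    a = random.random() < p_a_true
    b = random.random() < p_b_true_given_a[a]
    return a, b

# Estimate P(A = true | B = true): keep only samples consistent with the evidence.
kept = [a for a, b in (sample() for _ in range(100_000)) if b]
print(sum(kept) / len(kept))   # ≈ 0.66, matching exact enumeration above
```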

Markov Chain Monte Carlo

Fix the evidence variables, then repeatedly resample one non-evidence variable at a time conditioned on its Markov blanket; estimate probabilities from the states visited.

Dempster-Shafer Theory

Fuzzy Logic

Reference

  • slides