1 Introduction: Energy-Based Models

  • assigns a scalar energy to each configuration of the variables
  • inference: clamp the observed variables, find values of the remaining variables that minimize the energy
  • learning: find an energy function that assigns low energies to correct values of the remaining variables (and higher energies to incorrect ones)
  • loss functional: measures the quality of candidate energy functions
  • covers both probabilistic and non-probabilistic approaches
  • no normalization constant required, which gives extra flexibility in designing the model

1.1 Energy-Based Inference

  • example: \(X\) = image pixels -> \(Y\) = object label
  • the energy \(E(Y,X)\) also goes by the names contrast function, value function, or negative log-likelihood function
  • \(X\) and \(Y\) can be discrete or continuous, of any dimension
  • inference \(Y^* = \operatorname{argmin}_{Y\in\mathcal Y} E(Y,X)\) can use all kinds of optimization techniques, as in the sketch below
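
A minimal sketch of inference by exhaustive search over a small discrete answer set; the prototype-distance energy and the labels are made-up illustrations, not from the tutorial:

```python
import numpy as np

# Toy energy: squared distance between the input and a per-label prototype.
# Low energy means label y is compatible with input x.
def energy(y, x, prototypes):
    return float(np.sum((x - prototypes[y]) ** 2))

# Inference: clamp the observed X, search over Y for the energy minimum.
# With a small discrete answer set, exhaustive search suffices.
def infer(x, prototypes):
    return min(prototypes, key=lambda y: energy(y, x, prototypes))

prototypes = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
print(infer(np.array([0.9, 0.2]), prototypes))  # -> cat
```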

1.2 What Questions Can a Model Answer?

  • What is the \(Y\) most compatible with this \(X\)? (prediction, classification, decision-making)
  • ranking (is \(Y_1\) or \(Y_2\) more compatible with \(X\)?), detection (is this \(Y\) compatible at all? accept if the energy is below a threshold), conditional probability (useful when the output is given to a human or fed into another system); see the sketch below
  • \(X\) high-dimensional, \(Y\) low-dimensional is the common case; the converse covers image restoration, computer graphics, and generation; both high-dimensional is hard! e.g. super-resolution
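
One energy function can serve several of these questions; a toy sketch (the candidate energies and the detection threshold are arbitrary illustrations):

```python
def answer_questions(energies, threshold=1.0):
    # `energies` maps each candidate answer Y to E(Y, X) for a fixed X.
    prediction = min(energies, key=energies.get)           # most compatible Y
    ranking = sorted(energies, key=energies.get)           # most -> least compatible
    detections = [y for y, e in energies.items() if e < threshold]  # below threshold
    return prediction, ranking, detections

print(answer_questions({"cat": 0.3, "dog": 1.7, "car": 0.8}))
# -> ('cat', ['cat', 'car', 'dog'], ['cat', 'car'])
```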

1.3 Decision Making versus Probabilistic Modeling

  • energies are uncalibrated and not commensurate across models; to get a calibrated probability, use the Gibbs distribution from statistical physics: \(P(Y|X) = \frac{e^{-\beta E(Y,X)}}{\int_{y\in\mathcal Y} e^{-\beta E(y,X)}}\), where \(\beta\) is an inverse temperature and the denominator is the partition function (sketched in code below)
  • the integral must converge, which restricts how the energy function and/or \(\mathcal Y\) can be chosen; the partition function is often intractable to compute
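
When probabilities are needed, energies can be pushed through the Gibbs distribution; a sketch for a finite answer set, where the partition function is a tractable sum (the \(\beta\) value and the energies are arbitrary):

```python
import numpy as np

def gibbs(energies, beta=1.0):
    # P(Y|X) = exp(-beta * E(Y,X)) / Z, where the partition function Z
    # sums exp(-beta * E(y,X)) over all candidate answers y.
    # Tractable here only because the answer set is small and discrete.
    e = np.asarray(energies, dtype=float)
    unnormalized = np.exp(-beta * (e - e.min()))  # shift for numerical stability
    return unnormalized / unnormalized.sum()

print(gibbs([0.3, 1.7, 0.8], beta=2.0))  # low energy -> high probability
```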

2 Energy-Based Training: Architecture and Loss Function

  • a family of energy functions indexed by a parameter \(W\)
  • architecture: the internal structure of the parameterized function \(E(W, Y, X)\)
    • e.g. \(X\), \(Y\) real vectors and \(E\) a linear combination of basis functions (kernel methods)
    • e.g. neural networks
  • given training samples \(\mathcal S = \{(X^i, Y^i)\}\) and prior knowledge, choose a loss functional \(\mathcal L(E,\mathcal S)\) measuring the quality of an energy function; over a parameterized family it becomes the loss function \(\mathcal L(W,\mathcal S)\)
  • \(W^* = \operatorname{argmin}_{W\in \mathcal W}\mathcal L(W,\mathcal S)\)
    • \(\mathcal L(E,\mathcal S) = \frac 1P\sum_{i=1}^P L(Y^i, E(W,\mathcal Y, X^i)) + R(W)\)
    • \(Y^i\): the desired answer, held fixed; \(\mathcal Y\): the answer variable over which the energy surface is evaluated
    • \(R(W)\): regularizer, expressing prior knowledge about \(W\)
  • this formulation lets us bring in the theories of statistical learning (loss functional sketched in code below)
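
A sketch of the loss functional as code, using the energy loss of Section 2.2 as the per-sample loss and an assumed L2 regularizer for \(R(W)\); the linear energy and the two samples are illustrative only:

```python
import numpy as np

def loss_functional(W, samples, energy, per_sample_loss, lam=1e-3):
    # L(W, S) = (1/P) sum_i L(Y^i, E(W, ., X^i)) + R(W)
    # `per_sample_loss` sees the desired answer Y^i and the whole energy
    # surface over candidate answers (here, as a callable in y).
    data_term = np.mean([per_sample_loss(Y_i, lambda y: energy(W, y, X_i))
                         for X_i, Y_i in samples])
    return data_term + lam * np.sum(W ** 2)  # R(W): assumed L2 regularizer

energy = lambda W, y, x: (y - W @ x) ** 2            # E(W, Y, X) with linear G
energy_loss = lambda Y_i, E_surface: E_surface(Y_i)  # L = E(W, Y^i, X^i)
samples = [(np.array([1.0, 0.0]), 2.0), (np.array([0.0, 1.0]), -1.0)]
print(loss_functional(np.array([2.0, -1.0]), samples, energy, energy_loss))
```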

2.1 Designing a Loss Functional

  • shape the energy surface
  • push down the energy of correct answers, pull up the energy of incorrect ones (see the sketch after this list)
  • four design elements: the architecture (the model), the loss function, and the learning algorithm (three elements shared with conventional ML), plus the inference algorithm
  • prior knowledge enters through the architecture and through the loss function (e.g. via the regularizer)
  • the loss functional should be both effective (shapes the energy surface correctly) and efficient (tractable to minimize)
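
A schematic of the push/pull idea on a toy energy table standing in for a parameterized model; the update rule and the numbers are purely illustrative, not an algorithm from the tutorial:

```python
def push_pull_step(E, x, y_correct, y_wrong, step=0.1):
    # Shape the energy surface: push down the energy of the correct answer,
    # pull up the energy of an incorrect one.
    E[(x, y_correct)] -= step
    E[(x, y_wrong)] += step

E = {("img0", "cat"): 0.9, ("img0", "dog"): 0.4}  # wrong answer starts lower
for _ in range(5):
    push_pull_step(E, "img0", "cat", "dog")
print(E)  # the correct answer 'cat' now has the lower energy
```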

2.2 Examples of Loss Functions

  • concentrate on the data-dependent part of the loss, setting the regularizer aside
  • discuss which standard loss functions are 'good' and which are 'bad' at shaping the energy surface
  • energy loss: the per-sample loss is just the energy, \(L_{\mathrm{energy}}(Y^i, E(W,\mathcal Y, X^i)) = E(W, Y^i, X^i)\); it pushes down the energy of correct answers but never pulls up incorrect ones, so it can yield a collapsed solution with energy (near) zero everywhere
    • it works when the architecture pulls up other energies automatically, e.g. \(E(W, Y^i, X^i)=\|Y^i-G(W,X^i)\|^2\): inference is trivial (\(Y^* = G(W,X)\)) and the energy loss reduces to MSE; see the sketch below
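
A sketch of this benign case with a hypothetical linear \(G(W,X) = WX\): minimizing the energy loss is exactly MSE regression, and no collapse occurs because \(E\) is quadratic in \(Y\) around \(G(W,X)\); the data and step size are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # inputs X^i
W_true = np.array([[0.5, -1.0, 2.0]])
Y = X @ W_true.T                     # desired answers Y^i

W = np.zeros((1, 3))
for _ in range(300):
    G = X @ W.T                            # inference is trivial: Y* = G(W, X)
    W += 0.1 * 2 * (Y - G).T @ X / len(X)  # gradient descent on the MSE
print(np.round(W, 2))                # close to W_true after training
```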