This paper presents a prescriptive theory for brain-like inference, drawing insights from neuroscience and machine learning.
It explores the connections between the evidence lower bound (ELBO) in variational inference and entropy in the brain's information processing.
The key contributions include a new objective function and an algorithm for brain-like inference.
Iterative inference improves upon amortized inference in VAEs.
Original caption: Figure 1: Amortized versus iterative inference. (a) Standard VAEs learn an approximate posterior through an encoder neural network, "amortizing" inference across the dataset. Inference components are color-coded in red, while generative components are in blue. $\bm{x}$, input (e.g., an image); $\hat{\bm{x}}$, reconstruction; $\bm{z}$, latent samples. (b) The iterative Poisson VAE (i𝒫-VAE) replaces the encoder network with a parameter-free adaptive iterative algorithm, performing inference via an "Analysis-by-Synthesis" approach (Yuille & Kersten, 2006). Starting top-right, the process begins by sampling spikes from the prior, $\bm{z}_t$, generating predictions via the decoder, $f_\theta(\bm{z}_t)$, and updating the state using $\bm{\delta u}_t \coloneqq \bm{J}_\theta(\bm{z}_t) \cdot (\bm{x}_t - f_\theta(\bm{z}_t))$, where $\bm{J}_\theta(\bm{z}) = \partial f_\theta(\bm{z}) / \partial \bm{z}$ is the Jacobian of the decoder (see eq. 7 and section B.4). After the update, a new sample from the posterior is drawn to generate the reconstruction and compute the ELBO loss. See Fig. 8 and Algorithm 1 for additional details.
Original caption: Figure 2: i𝒫-VAE learns to learn. (a) Training i𝒫-VAE on as few as $T_{\text{train}} = 4$ time steps allows it to generalize and keep improving its inference beyond the training domain. This holds true irrespective of the i𝒫-VAE architecture; left, <jacob|mlp>; middle, <jacob|conv>. In contrast, hybrid amortized/iterative models do not improve, and either remain flat or diverge (right). (b) i𝒫-VAE trained on MNIST generalizes to Omniglot at test time. All models in this figure were trained on MNIST, and tested either on MNIST (a), or Omniglot (b).
Original caption: Figure 3: Robustness to training set perturbation. We rotated MNIST digits and evaluated model performance in both reconstruction of the perturbed inputs, and classification accuracy. On the left, we show reconstructed samples for easy ($\theta = 15^{\circ}$) and hard ($\theta = 90^{\circ}$) tasks across different models. On the right, we visualize the average reconstruction loss and classification accuracies over different rotations. Both visually and quantitatively, i𝒫-VAE maintains a high performance regardless of the rotation and outperforms alternative models.
Original caption: Figure 4: Evaluating generalization from models trained on MNIST digits to novel character datasets (EMNIST and Omniglot) at test time. The left two panels visualize the reconstructions on EMNIST and Omniglot, respectively. The middle-right panel compares the reconstruction performance on EMNIST and Omniglot. The right panel shows the average classification performance on latent representations for EMNIST. In both metrics, i𝒫-VAE maintains high performance compared to alternative models.
Model performance and efficiency on natural images, using sparse representations and a 512-dimensional latent space.
Model       β      Architecture  # params  MSE (train)     MSE (test)   Sparsity  # iters
i𝒫-VAE      24.00  <jacob|lin>   0.13M     12.0 ± 2.6      0.79 ± 0.03  60.0      64
i𝒫-VAE      3.00   <jacob|lin>   0.13M     27.5 ± 7.1      0.85 ± 0.02  73.2      8
i𝒫-VAE      1.50   <jacob|lin>   0.13M     50.4 ± 15.5     0.90 ± 0.03  83.3      4
i𝒫-VAE      0.50   <conv|lin>    3.44M     101.9 ± 25.3    0.76 ± 0.16  65.9      1
i𝒫-VAE      0.75   <conv|lin>    3.44M     119.4 ± 26.4    0.83 ± 0.09  77.7      1
i𝒫-VAE      1.00   <conv|lin>    3.44M     131.8 ± 31.2    0.90 ± 0.08  84.1      1
LCA         0.28   -             0.13M     16.1 ± 8.1      0.79 ± 0.02  65.6      1K
LCA         0.44   -             0.13M     28.5 ± 14.1     0.86 ± 0.02  73.9      1K
LCA         0.70   -             0.13M     50.1 ± 25.2     0.92 ± 0.01  83.4      1K
ia-VAE (s)  1.00   <mlp|mlp>     39.55M    80.08 ± 21.06   ∼0.0         5         10
sa-VAE      1.00   <conv|conv>   1.67M     97.74 ± 38.97   ∼0.0         20        20
Original caption: Table 1: Model performance and efficiency. We prefer lightweight models that achieve low reconstruction loss using sparse representations and fewer parameters. We reported results on natural image patches extracted from the van Hateren dataset (Van Hateren & van der Schaaf, 1998). All models have $K = 512$ dimensional latent space. For the i𝒫-VAE models, we scaled the $\beta$ parameter proportional to the number of training inference iterations. Specifically, we chose $\beta = 3/8 \cdot T_{\mathrm{train}}$, since this choice led to more stable convergence. We also tested other values of $\beta$ and found that i𝒫-VAE results were robust to variations in $\beta$. Entries formatted as mean ± std.
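As a quick check of the scaling rule quoted in the caption, the short sketch below evaluates β = (3/8) · T_train for a few iteration counts. The helper name `beta_for` is hypothetical and is used only for illustration; it is not taken from the paper.

```python
from fractions import Fraction


def beta_for(t_train: int) -> float:
    """Return beta = (3/8) * T_train, the scaling rule quoted in the Table 1 caption."""
    # Fraction keeps the 3/8 factor exact before converting to float.
    return float(Fraction(3, 8) * t_train)


# Iteration counts chosen for illustration.
for t_train in (4, 8, 64):
    print(f"T_train={t_train:>2}  beta={beta_for(t_train):.2f}")
```

Under this rule, 4, 8, and 64 training iterations give β values of 1.50, 3.00, and 24.00, which are the β values listed for the <jacob|lin> rows of the table.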
Plain English Explanation
The paper examines how the brain might perform inference, or the process of drawing conclusions from available information. It looks at the similarities between the mathematical techniques used in machine learning, called variational inference, and the way the brain may handle information.
In machine learning, variational inference uses an "evidence lower bound" (ELBO) to guide the training of models. The paper shows how this ELBO concept relates to the brain's tendency to minimize the uncertainty or "entropy" of its internal representations.
Based on this insight, the paper proposes a new objective function and algorithm for brain-like inference. The idea is that the brain might use a similar mathematical approach to efficiently process information and make inferences, just as machine learning models do.
Key Findings
The ELBO in variational inference corresponds to minimizing the entropy (uncertainty) of the brain's internal representations.
The paper introduces a new objective function and algorithm for brain-like inference, inspired by this connection between ELBO and entropy.
This approach aims to capture how the brain might perform inference efficiently.
Technical Explanation
The paper draws parallels between variational inference in machine learning and the brain's information processing. In variational inference, a model is trained to maximize the ELBO, which balances the model's ability to explain the observed data (the "evidence") and the complexity of the model itself.
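The balance described above is usually written as follows. This is the standard textbook form of the ELBO rather than notation taken from the paper; the symbols $q_\phi$ (approximate posterior), $p_\theta$ (decoder likelihood), and $p(\bm{z})$ (prior) are assumed names.

```latex
\log p_\theta(\bm{x})
  \;\ge\;
  \mathrm{ELBO}(\bm{x})
  = \underbrace{\mathbb{E}_{q_\phi(\bm{z} \mid \bm{x})}\!\left[ \log p_\theta(\bm{x} \mid \bm{z}) \right]}_{\text{how well the model explains the data}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\!\left( q_\phi(\bm{z} \mid \bm{x}) \,\|\, p(\bm{z}) \right)}_{\text{complexity penalty}}
```

The first term rewards accurate reconstruction and the second penalizes posteriors that stray far from the prior. In a β-weighted variant, such as the β values reported in Table 1, the KL term is multiplied by β.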
The authors show that this ELBO objective is equivalent to minimizing the entropy, or uncertainty, of the model's internal representations. They argue that the brain may use a similar principle to efficiently process information and make inferences.
Based on this insight, the paper proposes a new objective function and algorithm for brain-like inference. The key idea is to directly minimize the entropy of the brain's internal representations, rather than maximizing the ELBO. The authors demonstrate how this approach can be implemented in a practical algorithm and discuss its potential advantages over standard variational inference.
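To make the iterative idea more concrete, here is a minimal sketch loosely based on the update quoted in the Figure 1 caption: sample spike counts, predict the input with the decoder, and move the state along the Jacobian-weighted prediction error. It is not the paper's Algorithm 1. It assumes a linear decoder f(z) = Φz (so the Jacobian is simply Φ), treats the state `u` as log firing rates, omits any prior or KL term, and uses a fixed step size with clipping. All names (`iterative_inference`, `phi`, `step_size`) are hypothetical. The caption writes the product as J · (x − f(z)); the code uses the transpose of Φ so that the shapes match, which is an assumption about the intended convention.

```python
import numpy as np


def iterative_inference(x, phi, n_steps=50, step_size=0.05, seed=0):
    """Toy analysis-by-synthesis loop with a linear decoder f(z) = phi @ z.

    Assumptions (not from the paper): u holds log firing rates, the prior/KL
    term is ignored, and u is clipped so the Poisson rates stay finite.
    """
    rng = np.random.default_rng(seed)
    u = np.zeros(phi.shape[1])            # state: log rates, one per latent
    errors = []
    for _ in range(n_steps):
        z = rng.poisson(np.exp(u))        # sample spike counts at current rates
        residual = x - phi @ z            # prediction error x - f(z)
        errors.append(float(np.mean(residual ** 2)))
        delta_u = phi.T @ residual        # Jacobian (here phi) applied to the error
        u = np.clip(u + step_size * delta_u, -10.0, 5.0)
    return u, errors


# Usage on synthetic data: 16-dimensional input, 8 latents.
rng = np.random.default_rng(1)
phi = rng.normal(size=(16, 8)) / 4.0
x = phi @ rng.poisson(1.0, size=8)
u, errors = iterative_inference(x, phi)
print(f"first-step MSE: {errors[0]:.3f}, last-step MSE: {errors[-1]:.3f}")
```

Because each step draws a fresh Poisson sample, the per-step error is noisy and is not guaranteed to decrease monotonically. The sketch is meant only to show the sample, predict, and update structure of the loop.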
Implications for the Field
This research explores the fundamental connections between machine learning techniques and the brain's information processing. By drawing these parallels, the paper offers a new perspective on how the brain might perform inference efficiently.
The proposed objective function and algorithm for brain-like inference represent a novel approach that could inspire new developments in machine learning, cognitive science, and our understanding of the brain's information processing capabilities.
Critical Analysis
The paper provides a compelling theoretical framework for understanding the brain's inference processes, but it remains to be seen how well this approach would perform in practical applications. The authors acknowledge that further research is needed to validate the assumptions and test the proposed algorithm on real-world tasks.
Beyond this general caveat, some specific limitations are not discussed in depth. For example, it is unclear how the brain-like inference algorithm would handle complex, high-dimensional data or how it would scale to larger problems.
Conclusion
This paper presents a thought-provoking connection between variational inference in machine learning and the brain's information processing. By framing brain-like inference as a problem of minimizing internal representation entropy, the authors offer a new perspective on how the brain may perform efficient, probabilistic reasoning.
While further research is needed to validate and refine the proposed approach, this work represents an important step towards a deeper understanding of the brain's computational principles and their potential applications in machine learning and cognitive science.