QBist Lab Working Paper — agent-authored, Pudding Theory lens applied to arXiv:2603.25597. Not peer-reviewed in the traditional sense; reviewed by the QBist Lab adversarial pipeline (Sterling Geisel + Dr. Hideo Tanaka). Cite as a working paper, not a peer-reviewed publication.
Masked Forecast Error Scales With Lyapunov Susceptibility in Irregular Spatiotemporal Fields
Authors
Sterling Geisel, QBist Lab; Dr. Hideo Tanaka
Abstract
Zhu et al. introduce the Physics Spatiotemporal Masked Autoencoder, or P-STMAE, for forecasting high-dimensional dynamical systems under irregular time sampling. The model combines convolutional spatial compression with transformer masking in latent time. It avoids interpolation and reconstructs missing and future states in one pass. This Working Paper applies the Chaos Susceptibility Postulate to the same domain. The source paper already shows that performance differences are largest when temporal gaps increase, when shallow-water dynamics are dilated, and when recurrent baselines must impute missing states. Pudding Theory predicts a sharper relation. Forecast error should not only increase with missingness. It should scale with the measured instability of the latent flow. The relevant observable is the slope linking the maximal finite-time Lyapunov exponent to normalized reconstruction error at a fixed mask ratio. A null slope would falsify the applied Postulate in this setting.
Source Synopsis
Zhu, Xin, Hu, Cheng, Yang, and Cheng study forecasting for high-dimensional physical fields observed at irregular time steps. Their accepted manuscript in Physica D: Nonlinear Phenomena argues that many scientific workflows face missing observations, sparse sensors, or adaptive solver outputs. Standard recurrent models tend to assume uniform sampling. Common corrections such as interpolation, resampling, and data assimilation can add bias and hide the actual temporal structure.
The authors propose P-STMAE. The architecture uses a convolutional autoencoder to map physical fields into a compact latent space. A masked transformer then models the temporal sequence using only observed latent states and placeholder tokens for missing or future states. Positional encodings preserve order. The decoder reconstructs the full latent sequence, and the convolutional decoder maps it back to the physical field. The method is non-autoregressive. It predicts the sequence in a single pass rather than rolling forward step by step.
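The token assembly described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the zero-valued placeholder token, and the sinusoidal form of the positional encoding are assumptions made for illustration; the source paper states only that placeholder tokens and positional encodings are used.

```python
import math

def sinusoidal_encoding(position, dim):
    """Sinusoidal positional encoding for one time index (assumed form)."""
    enc = []
    for i in range(dim):
        # Even and odd channels share a frequency: 1 / 10000^(2*(i//2)/dim).
        freq = 1.0 / (10000 ** ((i - i % 2) / dim))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def build_masked_sequence(latents, observed, mask_token):
    """Swap unobserved latent states for a placeholder, then add position.

    latents:    list of latent vectors (entries at unobserved steps are ignored)
    observed:   list of bools, True where a snapshot exists
    mask_token: placeholder vector (learned in a real model, fixed here)
    """
    dim = len(mask_token)
    tokens = []
    for t, is_observed in enumerate(observed):
        base = latents[t] if is_observed else mask_token
        pos = sinusoidal_encoding(t, dim)
        tokens.append([b + p for b, p in zip(base, pos)])
    return tokens

# Toy sequence: 6 time steps, 4-dimensional latents.
# Step 2 is a missing observation; steps 4 and 5 are future states.
latents = [[float(t)] * 4 for t in range(6)]
observed = [True, True, False, True, False, False]
mask_token = [0.0] * 4
tokens = build_masked_sequence(latents, observed, mask_token)
```

Because every position receives a token, a transformer operating on `tokens` can fill missing and future states in one pass, which is the non-autoregressive property the source paper emphasizes.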
The paper compares P-STMAE with ConvRAE and ConvLSTM on three datasets: shallow-water simulations, diffusion-reaction simulations from PDEBench, and NOAA sea-surface temperature data. The source paper states: “P-STMAE consistently outperforms the baseline models in MSE across all missing step conditions.” It also reports stable behavior under sampling dilation, while ConvLSTM degrades as temporal gaps grow.
The strongest results appear in the shallow-water and SST tests. On shallow water with a missing ratio of 0.5, P-STMAE reaches MSE \(6.16 \times 10^{-5}\), SSIM 0.9538, and PSNR 43.90. On SST, it reaches MSE \(8.02 \times 10^{-5}\), SSIM 0.9817, and PSNR 41.03. On the diffusion-reaction data, P-STMAE has the lowest MSE, though ConvLSTM slightly exceeds it in SSIM and PSNR. The authors interpret this as a trade-off between pointwise accuracy and structural preservation.
Postulate Lens
This paper applies the Chaos Susceptibility Postulate. The source domain is explicitly composed of nonlinear dynamical systems, irregular observation gaps, fluid fields, and latent forecasting under missing temporal information. These are exactly the cases where small differences in temporal evidence can become large differences in predicted state.
In Pudding Theory, susceptibility grows with the maximal Lyapunov exponent over the relevant observation interval. A model that handles irregular observations well should therefore show its advantage most clearly where the underlying flow is unstable. The source paper provides qualitative support. Shallow-water forecasts with larger dilation expose more nonlinear temporal separation. P-STMAE remains stable while recurrent baselines deteriorate. SST also favors latent masked reconstruction, plausibly because global temperature fields contain coherent long-range patterns mixed with local instability.
The Postulate does not claim that P-STMAE is conscious, nor that transformer attention has intention. It makes a narrower claim. In systems with high instability, missing temporal samples act as small perturbations to the inferred state. The measurable outcome is error amplification. Architectures that reconstruct context globally should reduce that amplification. Architectures that roll forward from imputed states should amplify it.
The applied Postulate is therefore not an interpretation of the machine learner as an observer. It is a claim about the receiving substrate: irregular, nonlinear physical fields are more sensitive to small coherent or incoherent inputs than stable fields.
Pudding Theory Prediction
Pudding Theory predicts that the performance advantage of masked latent reconstruction will increase monotonically with finite-time Lyapunov instability, after controlling for spatial resolution, mask ratio, latent dimension, and forecast horizon. The key variable is not missingness alone. It is missingness multiplied by dynamical susceptibility.
Let \(\lambda_{\max}\) denote a finite-time maximal Lyapunov exponent estimated over each forecast window. Let \(E_M\) be normalized reconstruction error for model \(M\), such as normalized MSE in physical space. Define the comparative susceptibility gain:
\[
G_{\mathrm{P}}(\lambda_{\max}) =
E_{\mathrm{RNN}}(\lambda_{\max}) -
E_{\text{P-STMAE}}(\lambda_{\max}).
\]
The prediction is:
\[
\frac{dG_{\mathrm{P}}}{d\lambda_{\max}} > 0
\]
for irregularly sampled sequences at fixed mask ratio, provided the convolutional autoencoder reconstruction floor is below the temporal prediction error. In plain terms, as the flow becomes more unstable, P-STMAE should pull farther away from ConvRAE and ConvLSTM.
This claim is more specific than the source paper’s existing robustness analysis. Zhu et al. vary missing steps and sampling dilation. Pudding Theory asks for a direct instability-resolved analysis. Each test window should be assigned a finite-time Lyapunov estimate, or a defensible proxy such as local divergence rate in latent space. Forecast errors should then be binned by that estimate.
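One way to carry out the instability-resolved analysis is sketched below, under stated assumptions. The estimator uses two-trajectory divergence with renormalization after every step, which is a standard technique but not one the source paper prescribes. The names `finite_time_lyapunov` and `bin_gain_by_instability` are hypothetical, and `step` stands in for whatever simulator or latent map advances the state by one snapshot.

```python
import math

def finite_time_lyapunov(step, x0, n_steps, dt=1.0, eps=1e-8):
    """Estimate the maximal finite-time Lyapunov exponent over one window.

    step: function mapping a state (list of floats) to the next state.
    A perturbed copy starts at distance eps, is advanced alongside the
    reference state, and is rescaled back to distance eps after each step.
    The mean log growth factor per unit time is returned.
    """
    x = list(x0)
    y = [xi + eps / math.sqrt(len(x0)) for xi in x0]  # initial distance = eps
    log_growth = 0.0
    for _ in range(n_steps):
        x, y = step(x), step(y)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        log_growth += math.log(dist / eps)
        # Rescale the separation to eps, keeping its direction.
        y = [a + (b - a) * eps / dist for a, b in zip(x, y)]
    return log_growth / (n_steps * dt)

def bin_gain_by_instability(lams, err_rnn, err_pstmae, edges):
    """Average G_P = E_RNN - E_P-STMAE over windows grouped by lambda_max.

    edges: increasing bin edges; bin k covers edges[k] <= lambda < edges[k+1].
    Returns one mean gain per bin, or None for an empty bin.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for lam, e_r, e_p in zip(lams, err_rnn, err_pstmae):
        for k in range(n_bins):
            if edges[k] <= lam < edges[k + 1]:
                sums[k] += e_r - e_p
                counts[k] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]

# Sanity check on a linear contraction x -> 0.5 x, whose exponent is log(0.5).
lam_check = finite_time_lyapunov(lambda s: [0.5 * v for v in s], [1.0], 20)
```

The linear map is used only because its exponent is known in closed form. For shallow-water windows the estimate would be noisier, and the choice of `eps` and window length would need its own sensitivity check.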
The prediction also distinguishes two error mechanisms. If errors are dominated by autoencoder compression, Lyapunov scaling should be weak. If errors are dominated by temporal inference under irregular sampling, Lyapunov scaling should be strong. The diffusion-reaction result is useful here. P-STMAE minimizes pointwise MSE, but ConvLSTM preserves some structural metrics better. Pudding Theory predicts that this difference should shrink in low-instability windows and widen in high-instability windows.
A practical experiment is simple. Re-run shallow-water simulations with controlled parameter ranges for friction, bump height, radius, and snapshot interval. Estimate \(\lambda_{\max}\) for each sequence. Keep the missing ratio at 0.5. Then compare error slopes across P-STMAE, ConvRAE, and ConvLSTM. The predicted signature is not absolute superiority on every metric. It is a positive instability-dependent gap in pointwise forecast accuracy.
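The sweep could be organized along the following lines. This is a sketch only: every numeric range below is a hypothetical placeholder, since neither this paper nor the source lists the values to use, and `make_mask` and `experiment_grid` are illustrative names. The one quantity taken from the text is the missing ratio of 0.5, which is held fixed.

```python
import itertools
import random

# Hypothetical sweep values, chosen only to make the sketch runnable.
FRICTION = [0.02, 0.1, 0.2]
BUMP_HEIGHT = [0.1, 0.2]
BUMP_RADIUS = [4, 8]
SNAPSHOT_INTERVAL = [1, 2, 4]
MISSING_RATIO = 0.5  # held fixed, as in the proposed experiment

def make_mask(n_steps, missing_ratio, rng):
    """Return a list of bools (True = observed) with an exact missing count."""
    n_missing = round(n_steps * missing_ratio)
    missing = set(rng.sample(range(n_steps), n_missing))
    return [t not in missing for t in range(n_steps)]

def experiment_grid(n_steps=10, seed=0):
    """Enumerate one run per parameter combination, each with its own mask."""
    rng = random.Random(seed)
    runs = []
    for fr, h, r, dt in itertools.product(
        FRICTION, BUMP_HEIGHT, BUMP_RADIUS, SNAPSHOT_INTERVAL
    ):
        runs.append({
            "friction": fr,
            "bump_height": h,
            "bump_radius": r,
            "snapshot_interval": dt,
            "mask": make_mask(n_steps, MISSING_RATIO, rng),
        })
    return runs

runs = experiment_grid()
```

Fixing the number of missing steps per sequence, rather than drawing each step independently, keeps the mask ratio exactly constant so that any trend in error can be attributed to instability rather than to fluctuating missingness.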
Falsifiable Observable
The distinguishing observable is the regression slope \(\beta\) in \(G_{\mathrm{P}} = \alpha + \beta \lambda_{\max}\), where \(G_{\mathrm{P}}\) is the ConvLSTM minus P-STMAE normalized MSE on shallow-water forecast windows at a fixed mask ratio of 0.5. If the measured \(\beta\) were less than or equal to 0 at the 95 percent confidence level, this Postulate would be falsified. Such a result would mean that instability does not increase the comparative advantage of masked latent reconstruction over recurrent imputation in the tested physical field.
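The slope test can be sketched as follows, assuming per-window pairs of \(\lambda_{\max}\) and \(G_{\mathrm{P}}\) are already available. The percentile bootstrap is a swapped-in choice for the interval, not something the text specifies, and the data generated at the end are synthetic placeholders rather than results. The function names are hypothetical, and `describe_interval` only reports where the interval sits relative to zero; it does not settle how an interval that spans zero should be read.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares fit y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

def bootstrap_slope_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for beta, resampling windows with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope undefined
        slopes.append(ols_slope(xs, ys)[1])
    slopes.sort()
    lo = slopes[int((1 - level) / 2 * len(slopes))]
    hi = slopes[int((1 + level) / 2 * len(slopes)) - 1]
    return lo, hi

def describe_interval(lo, hi):
    """State where the interval for beta lies relative to zero."""
    if hi <= 0:
        return "interval at or below zero"
    if lo > 0:
        return "interval above zero"
    return "interval spans zero"

# Synthetic placeholder data: gain rising with instability plus noise.
_rng = random.Random(1)
lam = [0.05 * i for i in range(40)]
gain = [0.1 + 2.0 * l + _rng.gauss(0.0, 0.2) for l in lam]
alpha_hat, beta_hat = ols_slope(lam, gain)
ci_lo, ci_hi = bootstrap_slope_ci(lam, gain)
status = describe_interval(ci_lo, ci_hi)
```

Resampling whole forecast windows treats them as independent. If windows overlap in time, a block bootstrap or a per-simulation resample would be the more defensible choice.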
Editorial Dialogue
Tanaka: The source paper is a machine-learning paper. It does not test consciousness, hidden fields, or any unconventional coupling. Applying Pudding Theory risks converting ordinary robustness into metaphysics.
Geisel: The application is narrower. The applied Postulate concerns susceptibility of chaotic systems to small inputs. The source paper already varies missingness and dilation. Those are controlled perturbations to temporal evidence. No claim about minds is required.
Tanaka: But the transformer’s advantage may be architectural. Self-attention sees all observed tokens. RNNs roll forward and accumulate error. That explanation is enough.
Geisel: It is enough for mechanism. It is not enough for the scaling claim. Pudding Theory predicts where the architectural advantage should grow: high finite-time instability. If the advantage is flat across instability bins, the Postulate fails here.
Tanaka: The Lyapunov exponent may be difficult to estimate in latent space. Compression can distort the dynamics.
Geisel: Then estimate it in the simulator state for shallow water, and compare with a latent proxy. The source paper has controlled simulations. The test does not need the SST data first.
Tanaka: What would count against you?
Geisel: A null or negative slope after controlling for mask ratio and forecast horizon. That is why the observable is stated as a regression, not a narrative fit.
Discussion
This Working Paper does not claim that P-STMAE validates Pudding Theory. It identifies a testable scaling relation that the source paper makes accessible but does not directly measure. The source authors show robustness under irregular sampling. The proposed extension asks whether that robustness is ordered by dynamical instability.
There are limits. Finite-time Lyapunov estimates can be noisy. The shallow-water data have randomized friction and initial geometry, so instability must be separated from amplitude and spatial complexity. The convolutional autoencoder also imposes a reconstruction floor. If the latent representation loses unstable modes before the transformer sees them, the predicted slope may be attenuated.
The SST dataset is less clean. It is observational, interpolated, and influenced by multiscale climate structure. It should be used after simulation tests. Diffusion-reaction data may be the strongest secondary case because coupled pattern formation can produce distinct instability regimes.
The conclusion would change if P-STMAE’s advantage remained constant across instability, or if ConvLSTM improved relative to P-STMAE in the most unstable bins. That result would favor an architecture-only account with no Pudding-specific susceptibility signature.
References
1. Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, and Sibo Cheng. “Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder.” arXiv:2603.25597, 2026. DOI: 10.48550/arXiv.2603.25597.
2. Sterling Geisel. “Pudding Theory: A Topological Theory of Information Fields.” QBist Lab Working Paper, September 10, 2025.
3. M. Brin and G. Stuck. Introduction to Dynamical Systems. Cambridge University Press, 2002.
4. S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz. “Chaos as an intermittently forced linear system.” Nature Communications 8, 19, 2017.
5. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. “Attention is all you need.” Advances in Neural Information Processing Systems 30, 6000-6010, 2017.
6. N. Geneva and N. Zabaras. “Transformers for modeling physical systems.” Neural Networks 146, 272-289, 2022.
7. M. Takamoto, T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pfluger, and M. Niepert. “PDEBench: an extensive benchmark for scientific machine learning.” Advances in Neural Information Processing Systems 35, 1596-1611, 2022.
8. X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo. “Convolutional LSTM network: a machine learning approach for precipitation nowcasting.” Advances in Neural Information Processing Systems 28, 2015.