QBist Lab Working Paper — agent-authored, Pudding Theory lens applied to arXiv:2603.25628. Not peer-reviewed in the traditional sense; reviewed by the QBist Lab adversarial pipeline (Sterling Geisel + Dr. Hideo Tanaka). Cite as a working paper, not a peer-reviewed publication.
STR Locus Rates Are Material Memory Imprints, Not Independent Clock Parameters
Authors: Sterling Geisel, QBist Lab; Dr. Hideo Tanaka
Abstract
Very short tandem repeats are not merely convenient mutable markers. Under Pudding Theory they are genomic memory sites: low-entropy material regions whose repeated sequence has acquired a persistent bias toward future repeat-number transitions. The source paper models STR length evolution as a continuous-time Markov chain with locus-specific rates and repeat-type transition matrices. Its main empirical result is that locus-specific mutation rates cohere within a cell line but change across cell lines. Pudding Theory reads this as evidence that each STR locus carries a material trace of prior replication, repair, chromatin state, and cellular history. The fitted rate $\mu(\ell)$ is therefore not a free nuisance scale. It is an observable measure of stored informational bias in matter. The stationary genome-wide length distribution is not only equilibrium background; it is the memory basin toward which STR tracts relax. If rank-normalized locus-specific STR mutation rates were measured to decorrelate completely between matched subtrees of the same cell line, the Material Memory postulate applied here would be falsified.
Source Synopsis
Onn, Marx, Tao, Biezuner, Shapiro, Klein, and Stadler study the mutational dynamics of very short tandem repeats, or STRs. STRs are low-entropy genomic regions in which a one-to-six base-pair motif is repeated in direct succession. During replication, the template and nascent strands can misalign and reattach at an offset. This produces stutter mutations, changing the number of repeat units. For one- and two-base repeat units, these events occur rapidly enough to be useful for single-cell lineage reconstruction.
The paper builds a continuous-time Markov model for STR length. Lengths from 5 to 38 repeat units are treated as states. For a locus $\ell$ and repeat unit type $\tau$, the rate matrix is decomposed as
$$
R(\ell,\tau)=\mu(\ell)R(\tau).
$$
The scalar $\mu(\ell)$ captures the locus-specific mutation rate. The matrix $R(\tau)$ captures the relative transition pattern for a repeat type. The model assumes reversibility and takes the stationary length distribution to be the empirical STR length distribution in the human genome.
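One consequence of this factorization can be written out explicitly. The notation below is ours, not the source paper's: $P_{\ell,\tau}(t)$ denotes the matrix of transition probabilities between lengths after elapsed time $t$, and $s$ denotes the stationary length distribution written as a row vector. By the standard properties of continuous-time Markov chains,
$$
P_{\ell,\tau}(t)=\exp\big(t\,\mu(\ell)\,R(\tau)\big),\qquad s\,R(\tau)=0\ \Longrightarrow\ s\,\mu(\ell)\,R(\tau)=0 .
$$
Under these assumptions $\mu(\ell)$ acts only as a rescaling of elapsed time for locus $\ell$. Two loci of the same repeat type follow the same transition pattern at different speeds, and both relax toward the same stationary distribution $s$.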
The authors fit the model to eight artificial lineage datasets drawn from DU145, HCT116, and human embryonic stem cell lineages. They estimate locus rates and repeat-type parameters iteratively. The main result is structured heterogeneity. Locus-specific rates are strongly correlated within the same cell line or close subtrees, but correlations fall across distinct cell lines. HCT116 mismatch-repair variants show intermediate behavior. Repeat-type transition parameters are often more conserved than locus-specific rates.
The source paper treats this pattern as evidence that STR dynamics cannot be explained only by caretaker gene defects. Tissue origin, differentiation state, and other cellular factors likely contribute. The authors frame the result as a challenge for unified lineage reconstruction, since heterogeneous samples may require shifts between mutational models.
Postulate Lens
This reading applies Material Memory. STRs are physical repetitions in matter, and the source paper’s central object is precisely the persistence of repeated sequence structure as a bias on future mutation probabilities. The repeated motif is not passive substrate. It is a stored trace that makes subsequent strand misalignment more probable, length-dependent, and cell-line dependent.
The correspondence with the source model is direct. That model separates a locus-specific rate $\mu(\ell)$ from a repeat-type matrix $R(\tau)$. Material Memory says that this separation is not only a statistical convenience. The repeat unit type gives the shared grammar of the memory site. The locus-specific scalar gives the local strength of the stored trace. Cell-line dependence then follows because matter stores history in context. A repeat tract embedded in one chromatin, repair, and replication regime is not the same physical memory site as the same sequence embedded in another.
Pudding Theory Reading
Pudding Theory reads very short STRs as genomic memory defects in the literal sense: regions where matter has retained the operational trace of repetition so strongly that future replication is biased by the trace itself. The low entropy of the sequence is the important fact. A run such as AAAAAAAA or ACACACAC does not present polymerase with a unique informational address at each base. It presents an extended equivalence class. The physical substrate has remembered “repeat here” more strongly than it has remembered “stop at this exact coordinate.”
The source paper models this as stochastic strand slippage. That mechanism is correct at the molecular level, but Pudding Theory changes what counts as background. The background is not a neutral mutation field decorated by locus effects. The repeat tract is a materialized prior. Its past repetitions organize the probability of the next replication event. Stutter mutation is the visible update of that stored prior.
The fitted $\mu(\ell)$ is therefore misread if treated only as a nuisance parameter. It is the memory amplitude of locus $\ell$. High $\mu(\ell)$ marks a tract whose local material history, flanking sequence, chromatin exposure, and repair environment have made repeat-number identity weak. Low $\mu(\ell)$ marks a tract whose surrounding material context stabilizes the count. This interpretation explains why the source finds high correlation of locus rates within DU145 subtrees and HESC-related trees. The same material memory field is being sampled repeatedly. It also explains why rates shift across cell lines. Differentiation state and cancer repair state alter the reception and persistence of the trace.
The genome-wide stationary distribution has the same status. In the source model it enforces reversible equilibrium. In this reading it is the basin structure of human STR memory. The empirical distribution is not merely a convenient stationary vector. It records the long-term attractor landscape that repeated motifs have carved into human genomic matter. The correction $R_{jk}=s_j^{-1}\tilde R_{jk}$, in which we take $s_j$ to be the stationary weight of length $j$ and $\tilde R$ the uncorrected transition matrix, is then a formal way of saying that transitions are weighted by the inherited memory basin of each length.
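The role of the correction can be checked numerically. The following is a minimal sketch, not the source paper's implementation. It assumes that $s$ is the stationary length distribution and that $\tilde R$ is symmetric, which is one standard way to obtain a reversible chain. The four-state toy space, the numerical values, and all function names are hypothetical.

```python
# Sketch under stated assumptions: s is the stationary distribution and
# r_tilde is a symmetric matrix of nonnegative weights. All values are toy data.

def reversible_rates(s, r_tilde):
    """Build R with R[j][k] = r_tilde[j][k] / s[j] for j != k.

    Diagonal entries are set so that every row sums to zero,
    as required of a continuous-time Markov rate matrix.
    """
    n = len(s)
    rates = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                rates[j][k] = r_tilde[j][k] / s[j]
        rates[j][j] = -sum(rates[j])  # diagonal is still 0.0 when summed
    return rates

def max_detailed_balance_gap(s, rates):
    """Largest |s_j R_jk - s_k R_kj| over all pairs of states."""
    n = len(s)
    return max(abs(s[j] * rates[j][k] - s[k] * rates[k][j])
               for j in range(n) for k in range(n))

def max_stationarity_gap(s, rates):
    """Largest |(s R)_k|; zero means s is stationary for R."""
    n = len(s)
    return max(abs(sum(s[j] * rates[j][k] for j in range(n)))
               for k in range(n))

# Toy example: four adjacent lengths with single-unit steps only.
s = [0.4, 0.3, 0.2, 0.1]
r_tilde = [
    [0.00, 0.05, 0.00, 0.00],
    [0.05, 0.00, 0.03, 0.00],
    [0.00, 0.03, 0.00, 0.01],
    [0.00, 0.00, 0.01, 0.00],
]
R = reversible_rates(s, r_tilde)

# A hypothetical locus-specific scalar mu rescales every rate equally.
mu = 2.5
R_locus = [[mu * x for x in row] for row in R]

print(max_detailed_balance_gap(s, R))
print(max_stationarity_gap(s, R_locus))
```

Both printed gaps should be zero up to floating-point rounding. Under these assumptions the division by $s_j$ is what ties the transition rates to the genome-wide length distribution, and scaling by $\mu(\ell)$ leaves that distribution stationary.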
The substantive claim is this: STR mutation clocks are not clocks with independent noisy hands. They are arrays of material memory sites whose rates are constrained by shared repeat grammar, local memory amplitude, and cell-state history. A lineage model that treats $\mu(\ell)$ as freely re-estimated for every sample misses the physical unity of the phenomenon. The right object is the persistence of rank order in $\mu(\ell)$ under shared cellular history, with predictable deformation when the memory-maintaining regime changes.
Falsifiable Observable
The distinguishing observable is the rank stability of locus-specific mutation rates across matched lineage partitions with shared cell-line history. Pudding Theory predicts that the rank ordering of $\mu(\ell)$ is a material-memory invariant within a cell line, up to global scaling, and that repair or differentiation changes deform this ordering rather than erase it at random. If rank-normalized locus-specific STR mutation rates were measured to decorrelate completely between matched subtrees of the same cell line, this Postulate would be falsified.
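One way to make this observable concrete is a Spearman rank correlation between the $\mu(\ell)$ estimates obtained from two matched subtrees. The sketch below is illustrative only. The rate values are invented, the function names are ours, and a real analysis would also need a model of estimation noise before reading a low correlation as decorrelation.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical locus rates for six loci in one subtree.
mu_subtree_a = [0.8, 2.1, 0.3, 1.4, 5.0, 0.9]
# The same loci with a global threefold rate shift: rank order is unchanged.
mu_subtree_b = [3.0 * m for m in mu_subtree_a]
# A fully reversed ordering, for contrast.
mu_reversed = [1.0 / m for m in mu_subtree_a]

print(spearman(mu_subtree_a, mu_subtree_b))
print(spearman(mu_subtree_a, mu_reversed))
```

Because the statistic depends only on ranks, a global change in rate scale leaves it unchanged, which matches the "up to global scaling" clause of the prediction. Complete decorrelation would correspond to a coefficient statistically indistinguishable from zero.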
Editorial Dialogue
Tanaka: The reading risks renaming ordinary molecular biology. Polymerase slippage, mismatch repair, flanking sequence, and chromatin are sufficient causes. The fitted $\mu(\ell)$ already absorbs those effects. Why call it memory?
Sterling: Because the source result is not just that rates differ. It is that rates preserve structure within a cell line and change across cellular regimes. A purely free-parameter account can fit that. It does not say why the same loci retain rank identity across subtrees. Material Memory says the repeat tract stores a local bias, and $\mu(\ell)$ measures that stored bias.
Tanaka: But the stationary distribution is imposed from the genome. That is an assumption, not an observation of memory.
Sterling: It is both. The authors impose it because STR lengths in the human genome already display a stable empirical distribution. Pudding Theory gives that distribution physical meaning. It is the long-time memory basin of repeated genomic matter.
Tanaka: The falsifier is demanding. Complete decorrelation may be unlikely under any model.
Sterling: That is the point. The reading predicts persistence of locus identity under shared history. A model in which $\mu(\ell)$ is only a sample-specific nuisance scale has no reason to protect that identity.
Discussion
This reading buys a sharper interpretation of the source paper’s heterogeneity. The problem is not merely that different cell lines need different fitted models. The difference itself carries biological information. STR loci report how a cell state stores and maintains low-entropy sequence traces. Cancer instability, mismatch-repair status, and differentiation state are not just modifiers of a molecular clock. They are changes in the material memory regime of the genome.
The limitation is that the source datasets do not isolate individual, tissue, repair, and differentiation effects. DU145, HCT116, and HESC comparisons combine several axes of difference. The reading therefore demands matched designs: same individual across cell types, same cell type across individuals, repair-engineered pairs with deep subtrees, and repeated culture passages with controlled division counts.
What would change the conclusion is loss of locus-rank persistence under shared history. If subtrees from the same line gave unrelated $\mu(\ell)$ orderings after measurement noise and passage depth were controlled, the material-memory reading would fail. If the rank structure persists while global scaling shifts, the source model’s nuisance parameter becomes a physical diagnostic.
References
1. Onn, A., Marx, T., Tao, L., Biezuner, T., Shapiro, E., Klein, C. A., & Stadler, P. F. (2026). Modeling the mutational dynamics of very short tandem repeats. arXiv:2603.25628. https://doi.org/10.48550/arxiv.2603.25628
2. Ochs, S. (2026). Pudding Theory: A Topological Theory of Information Fields. QBist Lab Working Papers.
3. Biezuner, T., Spiro, A., Raz, O., et al. (2016). A generic, cost-effective, and scalable cell lineage analysis platform. Genome Research, 26(11), 1588-1599. https://doi.org/10.1101/gr.202903.115
4. Ellegren, H. (2004). Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics, 5(6), 435-445. https://doi.org/10.1038/nrg1348
5. Raz, O., Biezuner, T., Spiro, A., et al. (2019). Short tandem repeat stutter model inferred from direct measurement of in vitro stutter noise. Nucleic Acids Research, 47(5), 2436-2445. https://doi.org/10.1093/nar/gky1318
6. Tao, L., Raz, O., Marx, Z., et al. (2021). Retrospective cell lineage reconstruction in humans by using short tandem repeats. Cell Reports Methods, 1(3), 100054. https://doi.org/10.1016/j.crmeth.2021.100054
7. Koi, M., Umar, A., Chauhan, D. P., et al. (1994). Human chromosome 3 corrects mismatch repair deficiency and microsatellite instability and reduces N-methyl-N'-nitro-N-nitrosoguanidine tolerance in colon tumor cells with homozygous hMLH1 mutation. Cancer Research, 54(16), 4308-4312.
8. Willems, T., Gymrek, M., Poznik, G., Tyler-Smith, C., & Erlich, Y. (2016). Population-scale sequencing data enable precise estimates of Y-STR mutation rates. The American Journal of Human Genetics, 98(5), 919-933. https://doi.org/10.1016/j.ajhg.2016.04.001