Rosetta
Demonstrates hierarchical type discovery on opaque binary data. A table of harmonic-series values is encoded as a flat byte blob with no metadata. The analysis rediscovers the element type, record stride, and per-field structure -- recovering mathematical content from anonymous bytes.
What It Shows
The sample proceeds through the levels described in the algebra paper (Section "Hierarchical Type Discovery"):
| Step | Tool used | What it finds |
|---|---|---|
| Flat probe | libxs_fprint (UNKNOWN) | inconclusive (expected) |
| Stride sweep | libxs_fprint per-column | f64, stride=6 |
| Shuffle test | libxs_shuffle + fprint | order matters (3-4x) |
| Field analysis | libxs_fprint per-field | decay rates per column |
| Sort test | libxs_sort_smooth (GREEDY) | already optimally ordered |
| Verification | libxs_setdiff_min | 0 unmatched, tol=0 |
Key observations from the output:
- The flat 1-D probe fails because interleaved fields of different scales (1/n mixed with n mixed with ln(n)) look like noise when read sequentially. This motivates the stride sweep.
- The stride sweep requires ALL columns at a candidate width to have decay < 1 before accepting it, which eliminates false positives from partial correlations at wrong strides.
- Shuffle stability confirms that the record order carries genuine structure: after a coprime permutation of the records, the average decay rises from 0.288 to 1.104 in the sample output (about 3.8x).
- Per-field analysis reveals:
  - [0] decay=0 -- perfect ramp (sequential index)
  - [1] decay~0.63 -- 1/n (decaying but not ultra-smooth)
  - [2] decay~0.20 -- H_n (partial sums, very smooth)
  - [5] decay~0.29 -- converges to Euler-Mascheroni gamma
- GREEDY sort confirms the data is already optimally ordered (the natural 1..64 sequence is the smoothest permutation).
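The stride sweep and the shuffle test described above can be sketched in a few lines of Python. This is an illustration only and does not call the library: the `roughness` score (sum of absolute second differences divided by sum of absolute first differences) is a stand-in chosen for this sketch, not the decay that `libxs_fprint` reports, so its numbers will not match the sample output. The little-endian encoding, the helper names, the fixed f64 interpretation (no type sweep), and the multiplier 27 are likewise assumptions.

```python
import math
import struct

def make_blob(count=64):
    """Six-field harmonic table, packed as little-endian f64 (assumed layout)."""
    values, h = [], 0.0
    for n in range(1, count + 1):
        h += 1.0 / n  # running partial sum H_n
        ln = math.log(n)
        values += [float(n), 1.0 / n, h, ln, h - ln, h - ln - 0.5 / n]
    return struct.pack("<%dd" % len(values), *values)

def roughness(column):
    """Stand-in score (not libxs_fprint's decay): sum|d2| / sum|d1|."""
    d1 = [b - a for a, b in zip(column, column[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    total = sum(abs(d) for d in d1)
    return sum(abs(d) for d in d2) / total if total > 0.0 else 0.0

def column_scores(values, stride):
    """Score each column when the flat stream is read with the given stride."""
    return [roughness(values[field::stride]) for field in range(stride)]

def sweep(values, max_stride=8, threshold=1.0):
    """Return the smallest stride at which EVERY column scores below threshold."""
    for stride in range(1, max_stride + 1):
        if len(values) % stride != 0:
            continue  # candidate width does not tile the stream
        if all(score < threshold for score in column_scores(values, stride)):
            return stride
    return None

def coprime_shuffle(values, stride, multiplier=27):
    """Reorder whole records via i -> (i * multiplier) mod rows."""
    rows = len(values) // stride
    assert math.gcd(multiplier, rows) == 1  # coprime, so this is a permutation
    out = []
    for i in range(rows):
        j = (i * multiplier) % rows
        out += values[j * stride:(j + 1) * stride]
    return out

blob = make_blob()
values = list(struct.unpack("<%dd" % (len(blob) // 8), blob))
stride = sweep(values)
original = column_scores(values, stride)
shuffled = column_scores(coprime_shuffle(values, stride), stride)
print("stride:", stride)
print("avg score original: %.3f" % (sum(original) / stride))
print("avg score shuffled: %.3f" % (sum(shuffled) / stride))
```

Multiples of the true stride (12, 18, ...) also yield smooth columns because they merely skip records, which is why the sketch returns the smallest passing width rather than the best-scoring one.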
The Data
For n = 1, 2, ..., 64 the table stores six f64 fields per record:
| Field | Value | Description |
|---|---|---|
| [0] | n | integer index |
| [1] | 1/n | harmonic term |
| [2] | H_n | partial sum of the harmonic series |
| [3] | ln(n) | natural logarithm |
| [4] | H_n - ln(n) | difference (converges to gamma from above) |
| [5] | H_n - ln(n) - 1/(2n) | better gamma estimate |
Total: 64 records x 6 fields = 384 doubles = 3072 bytes.
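A minimal sketch of how such a blob could be produced, and of the final verification step, is shown below. It is not the sample's source: the function names are hypothetical, and little-endian byte order is an assumption (the sample may simply use the native representation).

```python
import math
import struct

def make_records(count=64):
    """Return [n, 1/n, H_n, ln(n), H_n - ln(n), H_n - ln(n) - 1/(2n)] for n = 1..count."""
    records, h = [], 0.0
    for n in range(1, count + 1):
        h += 1.0 / n  # running partial sum H_n
        ln = math.log(n)
        records.append([float(n), 1.0 / n, h, ln, h - ln, h - ln - 0.5 / n])
    return records

def encode(records):
    """Pack all records back to back as little-endian doubles, with no header."""
    flat = [value for record in records for value in record]
    return struct.pack("<%dd" % len(flat), *flat)

records = make_records()
blob = encode(records)
print(len(records), "records,", len(blob), "bytes")

# Decoding with the known layout restores every value exactly (tolerance 0).
decoded = list(struct.unpack("<%dd" % (len(blob) // 8), blob))
unmatched = sum(1 for a, b in zip(decoded, (v for r in records for v in r)) if a != b)
print("unmatched:", unmatched)
```

Because the blob carries no header, field names, or record count, everything beyond the raw 8-byte values has to be rediscovered by the analysis.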
Build
Usage
No arguments. The sample is self-contained: it generates the data, encodes it as a flat blob, runs the full analysis, and prints the results.
To collect machine-readable artifacts, set ROSETTA_OUTDIR to an
existing directory:
This writes summary.csv, rosetta_original.mhd,
rosetta_shuffled.mhd, rosetta_fields.mhd, and
rosetta_fields_shuffled.mhd. If LIBXS_MHD_PNG=1 is also set, the
MHD writer emits PNG files with the same basenames. The fields
images are normalized per field, making the ordered and shuffled
record traces easy to compare visually. The artifact mode is optional
and leaves the interactive sample output unchanged.
Example Output
ROSETTA: Recovering structure from opaque bytes
--------------------------------------------------
Ground truth (hidden from the analysis):
64 records x 6 fields = 384 doubles (3072 bytes)
Data: harmonic series H_n and derived quantities.
Field [5] converges to Euler-Mascheroni gamma.
--------------------------------------------------
Level 0: Flat 1-D probe (all bytes as one sequence)
Input: 3072 opaque bytes
No type has decay < 1 -- flat stream is not smooth.
This is expected: interleaved fields of different
scales look like noise when read sequentially.
--------------------------------------------------
Stride sweep: discovering record layout
Best type: f64
Best stride: 6 elements (48 bytes per record)
Records: 64
Avg decay: 0.288244
--------------------------------------------------
Shuffle stability (record-level)
Avg decay (original): 0.288244
Avg decay (shuffled): 1.103839
Ratio: 3.8x
-> Record order carries structure.
--------------------------------------------------
Per-field decay analysis
[0] n decay=0.000000
[1] 1/n decay=0.631512
[2] H_n decay=0.203226
[3] ln(n) decay=0.259621
[4] H_n-ln(n) decay=0.342853
[5] gamma_est decay=0.292251 (last=0.5771953203, gamma=0.5772156649)
Smoothest: [0] n (perfect ramp, decay=0)
-> This reveals a sequential index column.
Most interesting: [5] gamma_est -- converges to a constant.
--------------------------------------------------
GREEDY sort test (64 rows x 6 cols)
Data is already optimally ordered for row smoothness.
--------------------------------------------------
Verification: setdiff(original, blob)
Unmatched: 0, tolerance: 0.00e+00
-> Byte-perfect recovery.
--------------------------------------------------
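The gamma_est line in the output above can be checked independently of the library. The sketch below recomputes field [5] for n = 64 and compares the error against the leading term of the standard asymptotic expansion, H_n = ln(n) + gamma + 1/(2n) - 1/(12n^2) + ..., which leaves an error of about -1/(12n^2) once 1/(2n) is subtracted. The function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gamma_estimate(n):
    """Field [5] for record n: H_n - ln(n) - 1/(2n)."""
    h = sum(1.0 / k for k in range(1, n + 1))
    return h - math.log(n) - 0.5 / n

est = gamma_estimate(64)
predicted_error = -1.0 / (12 * 64 ** 2)  # leading error term at n = 64
print("last=%.10f gamma=%.10f" % (est, EULER_GAMMA))
print("error=%.3e predicted=%.3e" % (est - EULER_GAMMA, predicted_error))
```

The estimate approaches gamma from below, while field [4] approaches it from above, so the two columns bracket the constant.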
Why This Is Interesting
From 3072 anonymous bytes the framework discovers:
- The element type (f64) -- not i32, not f32, not i64.
- The record structure (6-field records of 48 bytes each).
- A sequential index column (field [0], decay = 0).
- A column converging to Euler-Mascheroni gamma (field [5]).
- That the natural ordering is already optimal (no resorting).
No metadata, no format knowledge, no human guidance. The hierarchical composition -- stride sweep with all-columns-must-pass filtering, shuffle stability, per-field decay ranking -- is what makes this possible. Used alone, each tool would either fail (the flat probe) or produce ambiguous results (a stride sweep without the all-columns requirement accepts wrong strides).