Permutation and Sorting¶
Header: libxs_perm.h
Permutation-based data reordering: deterministic co-prime shuffling (in-place and out-of-place with optional SIMD gather) and smoothness-optimized row permutations for matrices.
Shuffling¶
The shuffle functions use a co-prime stride C to produce a fixed, deterministic permutation of count elements. The mapping is affine:
dst[k] = src[(N-1) - ((C*k + offset) mod N)]
The stride must be co-prime to count; passing NULL selects libxs_coprime2(count). The offset parameter extends the basic co-prime permutation into an affine family, expanding the number of available permutations from ~phi(N)/2 to ~N*phi(N)/2. Pass offset=0 for the standard (non-affine) shuffle.
int libxs_shuffle(void* inout, size_t elemsize, size_t count,
const size_t* shuffle, size_t offset,
const size_t* nrepeat);
In-place shuffle of count elements of elemsize bytes each. Uses cycle following with a bit vector (N/8 bytes auxiliary). nrepeat controls the number of successive permutation applications (NULL or pointing to 1 means one pass). Returns EXIT_SUCCESS or EXIT_FAILURE.
int libxs_shuffle2(void* dst, const void* src, size_t elemsize,
size_t count, const size_t* shuffle, size_t offset,
const size_t* nrepeat);
Out-of-place shuffle from src to dst. dst and src must not overlap. If *nrepeat is zero an ordinary copy is performed. Uses AVX2/AVX-512 gather instructions when available for 4- and 8-byte elements (offset=0 path).
Return the number of libxs_shuffle2 applications needed to restore the original element order for the given count and co-prime stride. The cycle length depends only on C and N, not on the offset.
int libxs_unshuffle2(void* dst, const void* src, size_t elemsize,
size_t count, const size_t* shuffle, size_t offset,
const size_t* nrepeat);
Single-pass inverse of libxs_shuffle2. Computes the modular inverse C_inv = C^{-1} mod N via the extended Euclidean algorithm, then gathers elements using the inverse permutation:
dst[m] = src[C_inv * (N-1-offset-m) mod N]
This restores the original order in one O(N) pass rather than the R-1 repeated applications that libxs_unshuffle would require. The loop structure (sequential write, strided read) is identical to the forward shuffle and amenable to the same SIMD gather optimizations. Supports multi-pass inversion (nrepeat > 1) by iterating f^{-1}.
General-Purpose Sort¶
typedef int (*libxs_sort_cmp_t)(
const void* a, const void* b, void* ctx);
void libxs_sort(void* base, int n, size_t size,
libxs_sort_cmp_t cmp, void* ctx);
Sort n elements of the given byte size. The comparator returns negative, zero, or positive (tristate, like qsort_r). The ctx pointer is forwarded to the comparator unchanged.
Built-in comparators:
int libxs_cmp_f64(const void* a, const void* b, void* ctx);
int libxs_cmp_f32(const void* a, const void* b, void* ctx);
int libxs_cmp_i32(const void* a, const void* b, void* ctx);
int libxs_cmp_u32(const void* a, const void* b, void* ctx);
Built-in comparators enable an O(n) radix sort fast path. Custom comparators use O(n log n) heap sort.
With a built-in comparator, ctx controls the mode:
ctx = NULL Sort base in-place. ctx != NULL Read from ctx, write sorted result to base (out-of-place, source unchanged).
With a custom comparator, ctx is user state passed to cmp and base is sorted in-place.
Examples:
libxs_sort(data, n, sizeof(double), libxs_cmp_f64, NULL); libxs_sort(dst, n, sizeof(double), libxs_cmp_f64, src); libxs_sort(perm, n, sizeof(int), my_indirect_cmp, keys);
Space-Filling Curves¶
uint64_t libxs_hilbert(const unsigned int coords[], int ndims);
void libxs_hilbert_decode(uint64_t code, unsigned int coords[], int ndims);
N-dimensional Hilbert curve index. Maps ndims coordinates to a 64-bit key with strong locality guarantees. Each coordinate is quantized to floor(64/ndims) bits. coords[k] must be in [0, 2^bits_per_dim). The decode routine maps such a key back to ndims coordinates with the same bit budget.
uint64_t libxs_morton(const unsigned int coords[], int ndims);
void libxs_morton_decode(uint64_t code, unsigned int coords[], int ndims);
N-dimensional Morton code (Z-order curve). Bit-interleaves ndims coordinates into a 64-bit key. Each coordinate is quantized to floor(64/ndims) bits. The decode routine deinterleaves the key back to ndims coordinates.
int libxs_stratify_morton(
const unsigned int src_coords[], int src_ndims,
unsigned int dst_coords[], int dst_ndims);
int libxs_stratify_hilbert(
const unsigned int src_coords[], int src_ndims,
unsigned int dst_coords[], int dst_ndims);
Stratify higher-dimensional coordinates into a lower-dimensional layout without slicing. The source coordinates are encoded using the selected source-dimensional space-filling curve, and the resulting key is decoded as target-dimensional coordinates using the same curve family. This gives deterministic rank-preserving embeddings such as 3D to 2D:
code = curve(src_coords, 3)
dst_coords = inverse_curve(code, 2)
The target dimensionality must be smaller than the source dimensionality. The functions return EXIT_SUCCESS or EXIT_FAILURE.
k-d Tree¶
void libxs_kdtree_build(const double* pts, int* idx, int n,
int ndims, int stride);
int libxs_kdtree_nearest(const double* pts, const int* idx,
const unsigned char* used, int n, int ndims, int stride,
const double* query, double max_dist2);
Dense-array k-d tree for nearest-neighbor queries in arbitrary dimensions. Points are stored row-major: pts[i*stride + k] is coordinate k of point i. stride >= ndims allows padding or interleaved auxiliary data. The index array idx[0..n-1] is rearranged into implicit tree structure during build.
Query: finds the nearest point to query[0..ndims-1] within squared Euclidean distance max_dist2. Returns the point index or -1. The used array (may be NULL) marks consumed points (non-zero entries are skipped).
Convenience wrappers for 2D (interleaved x,y layout):
void libxs_kdtree2d_build(double* pts, int* idx, int n);
int libxs_kdtree2d_nearest(const double* pts, const int* idx,
const unsigned char* used, int n,
double x, double y, double max_dist2);
Smooth Sorting¶
typedef enum libxs_sort_t {
LIBXS_SORT_NONE = 0,
LIBXS_SORT_IDENTITY = 1,
LIBXS_SORT_NORM = 2,
LIBXS_SORT_MEAN = 3,
LIBXS_SORT_GREEDY = 4,
LIBXS_SORT_MORTON = 5,
LIBXS_SORT_HILBERT = 6
} libxs_sort_t;
int libxs_sort_smooth(libxs_sort_t method, int m, int n,
const void* mat, int ld, libxs_data_t datatype, int* perm);
Compute a row permutation that reorders the rows of an m-by-n column-major matrix for smoothness (decaying forward differences between consecutive rows).
Parameters:
method Sorting strategy (see constants above). m, n Matrix dimensions (m rows, n columns). mat Column-major matrix data (read-only). ld Leading dimension (ld >= m). datatype Element type (F64, F32, I32, U32, I16, U16, I8, U8). perm Output array of m ints; perm[i] = source row for position i after reordering.
Methods:
NONE No permutation (early return). IDENTITY Identity permutation (perm[i] = i). NORM Sort rows by ascending L1-norm. MEAN Sort rows by ascending column mean. GREEDY Nearest-neighbor chain: starting from row 0, each step picks the unvisited row with the smallest Euclidean distance to the current row. MORTON Sort rows by Morton (Z-order) key computed from quantized column values. HILBERT Sort rows by Hilbert curve key computed from quantized column values.
Returns EXIT_SUCCESS or EXIT_FAILURE.