alignment module
- alignment.munkres(x_arr, y_arr, x_linked, y_linked, intens_x, intens_y, skip_fraction=0.4, skip_level=0.8, segmentation_threshold=300)
Chunked alignment wrapper over munkres_align to handle large inputs.
This function splits both input sequences and their linked/intensity arrays into chunks to control memory usage and runtime, aligns each chunk independently via the Hungarian algorithm, and concatenates the results.
- Parameters:
x_arr (Sequence[array_like] or array_like) – Sequence of items for the X side. Each item is reduced to its mean value inside munkres_align for distance computation.
y_arr (Sequence[array_like] or array_like) – Sequence of items for the Y side, treated analogously to x_arr.
x_linked (array_like) – Auxiliary array associated with x_arr, returned in aligned order (e.g., original structures or metadata). Must be indexable by the same indices as x_arr.
y_linked (array_like) – Auxiliary array associated with y_arr, returned in aligned order. Must be indexable by the same indices as y_arr.
intens_x (array_like) – Intensity values linked to x_arr. Used to compute intensity-aware costs in munkres_align.
intens_y (array_like) – Intensity values linked to y_arr. Used to compute intensity-aware costs in munkres_align.
skip_fraction (float, optional) – Fraction of extra dummy rows/columns added to each chunk’s cost matrix to allow skipping matches. Example: 0.2 means add ~20% rows/cols.
skip_level (float, optional) – Penalty multiplier applied to the maximum base cost to build dummy rows/columns. Larger values discourage skipping.
segmentation_threshold (int, optional) – Target maximum chunk length. The input is split into roughly ceil(len/segmentation_threshold) segments to avoid huge cost matrices and memory errors.
- Returns:
(aln_x, aln_y, aln_x_linked, aln_y_linked), where:
aln_x, aln_y – lists of items from x_arr/y_arr in aligned order.
aln_x_linked, aln_y_linked – np.ndarray of linked items aligned with aln_x/aln_y respectively.
- Return type:
tuple
Notes
Chunking reduces peak memory: each cost matrix is built per-chunk.
Choose skip_fraction modestly (e.g., 0.1–0.5): a chunk of length n yields a padded cost matrix with about (n * (1 + skip_fraction))^2 entries, so large values can cause a MemoryError.
The function preserves per-chunk order; cross-chunk matches are not considered. If global cross-chunk matches matter, increase segmentation_threshold or consider overlapping chunks.
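The chunk arithmetic in these notes can be illustrated with a small standalone sketch. It is not the module's implementation: chunk_bounds and padded_cells are hypothetical helpers, and both the near-equal split and the use of the chunk length as the matrix size n are assumptions made for illustration.

```python
import math


def chunk_bounds(length, segmentation_threshold=300):
    """Return (start, stop) index pairs for ceil(length / threshold) segments.

    ASSUMPTION: segments are made as equal in length as possible; the real
    module may split differently.
    """
    if length == 0:
        return []
    n_chunks = math.ceil(length / segmentation_threshold)
    base, extra = divmod(length, n_chunks)
    bounds, start = [], 0
    for k in range(n_chunks):
        # The first `extra` segments take one additional item each.
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def padded_cells(n, skip_fraction=0.4):
    """Number of entries in a square cost matrix of size n after padding.

    ASSUMPTION: round(n * skip_fraction) dummy rows and columns are added.
    """
    side = n + round(n * skip_fraction)
    return side * side


bounds = chunk_bounds(1000, segmentation_threshold=300)
print(bounds)              # four segments of 250 items each
print(padded_cells(250))   # entries in one per-chunk matrix
print(padded_cells(1000))  # entries if the input were aligned in one piece
```

With these assumptions, 1000 items and the default threshold give four chunks whose padded matrices hold 122,500 entries each, compared with 1,960,000 entries for a single unchunked matrix.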
- alignment.munkres_align(x_arr, y_arr, x_linked, y_linked, intens_x, intens_y, skip_fraction=0.4, skip_level=0.8)
Align two sequences using the Hungarian (Munkres) algorithm with an intensity-aware distance cost and optional skipping.
- Parameters:
x_arr (Sequence[array_like] or array_like) – Sequence of items for the X side. Each item is reduced to its mean value for distance computation.
y_arr (Sequence[array_like] or array_like) – Sequence of items for the Y side, treated analogously to x_arr.
x_linked (array_like) – Array of auxiliary data associated with x_arr (returned aligned; e.g., original structures or metadata). Must be indexable by the same indices as x_arr.
y_linked (array_like) – Auxiliary data associated with y_arr (returned aligned).
intens_x (array_like) – Intensity values associated with x_arr, used as the linked_array to compute intensity differences in the cost.
intens_y (array_like) – Intensity values associated with y_arr.
skip_fraction (float, optional) – Fraction of additional dummy rows/cols to add to the cost matrix to enable skipping matches. round(n * skip_fraction) rows/cols are padded, where n is the current matrix size. Default is 0.4.
skip_level (float, optional) – Multiplier applied to the maximum cost to set the padding value for dummy rows/cols. Higher values discourage skipping. Default is 0.8.
- Returns:
aln_x (list) – Elements of x_arr selected by the optimal assignment, in match order.
aln_y (list) – Elements of y_arr selected by the optimal assignment, in match order.
aln_x_linked (ndarray) – Items from x_linked corresponding to aln_x.
aln_y_linked (ndarray) – Items from y_linked corresponding to aln_y.
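To make the padding and skipping behaviour concrete, here is a minimal pure-Python sketch. It is not the module's code: padded_cost_matrix and align_small are hypothetical names, the cost formula (absolute mean difference plus absolute intensity difference) and the choice of n as the larger input length are assumptions, and a brute-force search over permutations stands in for the Hungarian algorithm. Both find a minimum-cost assignment, but the brute-force search is only feasible for very small inputs.

```python
from itertools import permutations
from statistics import mean


def padded_cost_matrix(x_arr, y_arr, intens_x, intens_y,
                       skip_fraction=0.4, skip_level=0.8):
    """Build a square cost matrix with dummy rows/columns (illustrative)."""
    x_means = [mean(item) for item in x_arr]
    y_means = [mean(item) for item in y_arr]
    # ASSUMPTION: cost = |mean difference| + |intensity difference|.
    base = [[abs(xm - ym) + abs(ix - iy)
             for ym, iy in zip(y_means, intens_y)]
            for xm, ix in zip(x_means, intens_x)]
    # ASSUMPTION: n is the larger of the two input lengths.
    n = max(len(x_arr), len(y_arr))
    side = n + round(n * skip_fraction)
    # Dummy cells cost skip_level times the largest real cost.
    pad = skip_level * max(max(row) for row in base)
    matrix = [[pad] * side for _ in range(side)]
    for i, row in enumerate(base):
        matrix[i][:len(row)] = row
    return matrix


def align_small(x_arr, y_arr, intens_x, intens_y,
                skip_fraction=0.4, skip_level=0.8):
    """Return matched (x_index, y_index) pairs for tiny inputs."""
    matrix = padded_cost_matrix(x_arr, y_arr, intens_x, intens_y,
                                skip_fraction, skip_level)
    side = len(matrix)
    # Brute force over all column permutations (O(n!)), used here only
    # because it needs no third-party solver.
    best = min(permutations(range(side)),
               key=lambda cols: sum(matrix[r][c] for r, c in enumerate(cols)))
    # Keep only pairs where both sides are real items, not dummies.
    return [(r, c) for r, c in enumerate(best)
            if r < len(x_arr) and c < len(y_arr)]


x_arr = [[1.0], [5.0], [9.0]]
y_arr = [[1.0], [9.0]]
pairs = align_small(x_arr, y_arr, [10, 10, 10], [10, 10])
aln_x = [x_arr[r] for r, _ in pairs]
aln_y = [y_arr[c] for _, c in pairs]
print(pairs)  # [(0, 0), (2, 1)]: the x item with mean 5.0 is skipped
```

For inputs of realistic size, a polynomial-time solver such as scipy.optimize.linear_sum_assignment would take the place of the permutation search.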