# Tessellation¶

This step consists of defining time segments or space cells and assigning the individual molecule locations to one or more of these segments or cells/bins.

## Basic usage¶

The *tramway* command features the tessellate sub-command that handles this step.
Let us consider again some of the example calls to this sub-command:

```
> tramway tessellate kmeans -i example.trxyt -l kmeans
> tramway tessellate gwr -i example.rwa -w -n 50 -N 50 -l gwr*
> tramway tessellate gwr -i example.rwa -d 0.1 -ss 10 -n 30 -l gwr*
```

The equivalent Python code is:

```
from tramway.helper import *
tessellate('example.trxyt', 'kmeans', label='kmeans')
tessellate('example.rwa', 'gwr', scaling=True, knn=(50, 50), label='gwr*')
tessellate('example.rwa', 'gwr', ref_distance=0.1, strict_min_location_count=10, knn=30, label='gwr*')
```

The above calls to the `tessellate()`

helper read and write into the *example.rwa* file.

Alternatively, reads and writes to files can be limited by directly manipulating the analysis tree:

```
trajectories = load_xyt('example.trxyt')
analyses = tessellate(trajectories, 'kmeans', label='kmeans', return_analyses=True)
tessellate(analyses, 'gwr', ...
...
save_rwa('example.rwa', analyses)
```

The `load_xyt()`

and `save_rwa()`

functions respectivelly read the input file and write the output file.

The analysis tree is augmented by each further call to `tessellate()`

.

A spatial 2D partition can be displayed with the command-line:

```
> tramway draw cells -i example.rwa --label kmeans
```

or with the `cell_plot()`

helper function:

```
cell_plot('example.rwa', label='kmeans')
```

## Standard methods¶

The available methods are:

- random: randomly located cell centers.
- grid: regular grid with equally sized square or cubic areas.
- hexagon: regular grid with equally shaped hexagons (2D only).
- kdtree: kd-tree tessellation with midpoint splits.
- kmeans: tessellation based on the k-means clustering algorithm.
- gwr (or gas): tessellation based on the Growing-When-Required self-organizing gas.

All the above methods can handle time as just another space dimension if the time_scale argument to `tessellate()`

if defined.
This argument scales the time axis so that it can be quantitatively related to the space axes.
Note however that this feature has been very sparsely tested.

Instead, molecule (trans-)locations can be divided into time segments. See the window subsection for more information.

### The *random* method¶

*random* places cell centers at random within the bounding box of the (trans-)locations.
Empty cells are discarded (unless `allow_empty_cells=True`

is passed to `tessellate()`

) and cell boundaries are defined as the Voronoi graph.

The corresponding tessellation class is `RandomMesh`

.

From the command-line, a key argument is `--cell-count`

.
This is the desired number of cells.

The number of cells can alternatively be determined using the `--location-count`

argument (or avg_location_count argument to the `tessellate()`

Python helper).
If defined, the number of cells is defined so that the average number of locations per cell (before empty cells are discarded) equals the specified value.

### The *grid* method¶

*grid* is a regular grid. Every cells are equal-size hypercubes.

The corresponding tessellation class is `RegularMesh`

.

From the command-line, key arguments are `--location-count`

and `--distance`

.

Per default, `--distance`

is set to the average translocation distance.
If `--location-count`

is not defined, neighbour cells are spaced by twice the value of `--distance`

.

If `--location-count`

is defined, the cells are sized so that the average number of locations per cell equals the specified value.

The cell size (or inter-cell distance) is bounded by \(0.8\) times the value of `--distance`

.
This bound can be ignored with `--min-location-count=0`

.

With the `tessellate()`

Python helper only, one can adjust avg_distance (equivalent to commandline option `--distance`

) without specifying an absolute value, with rel_avg_distance instead.
The average translocation distance is multiplied by this value.

### The *hexagon* method¶

*hexagon* is a regular grid of hexagonal cells.

The corresponding tessellation class is `HexagonalMesh`

.

From the command-line, key arguments are `--location-count`

and `--distance`

.

If `--location-count`

is not defined, per default `--distance`

is set to the average translocation distance and neighbour cells are spaced by twice this value.

If `--location-count`

is defined, the cells are sized so that the average number of locations per cell equals the specified value.

Per default, the cell size (or inter-cell distance) is lower-bounded by \(0.8\) times the average translocation distance.
This bound can be ignored with `--min-location-count=0`

.

With the `tessellate()`

Python helper only, one can adjust avg_distance (equivalent to commandline option `--distance`

) without specifying an absolute value, with rel_avg_distance instead.
The average translocation distance is multiplied by this value.

### The *kdtree* method¶

*kdtree* is the *quad-tree* algorithm from **InferenceMAP** extended to any dimensionality greater than or equal to 2.

A main difference from the widely known *k* d-tree algorithm in that it recursively splits the cells in two equal parts along each dimension.

The corresponding tessellation class is `KDTreeMesh`

.

Cells scale with the ref_distance input argument to `tessellate()`

and equivalently the avg_distance attribute of the class.

The maximum cell size can be controlled with the max_level input argument that defines the maximum cell size as a multiple of the smallest cell size, in side length.

### The *kmeans* method¶

*kmeans* is a fast tessellation approach that usually displays different cell shapes and offers better resolutions along density borders.

The corresponding tessellation class is `KMeansMesh`

.

The algorithm is initialized with a grid tessellation.
As a consequence cells scale wrt the avg_probability argument (or command-line option `--location-count`

) or ref_distance argument (or command-line option `--distance`

).

### The *gwr* method¶

*gwr* stands for *Grow(ing) When Required* and is actually a largely modified version of the algorithm described by Marsland, Shapiro and Nehmzow in 2002.

The corresponding tessellation class is `GasMesh`

.

The main arguments are min_probability (or command-line option `-s`

/`--min-location-count`

) and avg_distance (or command-line option `-d`

/`--distance`

).

*gwr* exhibits many more arguments.
Some of them must be passed directly to the `tessellate()`

method.

This method may be useful to build high resolution maps with the desired minimum number of locations per cell reasonably well approached in the low-density areas. The knn argument to cell_index may be very useful in combination to such high resolution tessellations.

*gwr* is more computer-intensive than the other methods.
To prototype with this method, reasonnable under-trained solutions can be obtained passing a value less than `1`

for argument pass_count or command-line option `--pass-count`

.

For example, *gwr* is not suitable for controlling the average number of locations per cell.
Argument avg_probability or command-line option `-c`

/`--location-count`

are taken as constraints that may not be satisfied.
A trial-and-error approach may be necessary to generate a mesh with a suitable average location count per cell.
Under-trained solutions tend to exhibit the same average location count as in more refined solutions.

### The *window* method¶

The *window* plugin implements a sliding time window.

The corresponding tessellation class is `SlidingWindow`

.

The step (shift) and window width (duration) can be defined either as timestamps (default) or as
frames (with `frames=True`

or command-line option `--frames`

).

Note that all the above spatial-only tessellation methods admit extra arguments related to time windowing, including `time_window_duration`

and `time_window_shift`

.
This is equivalent to applying first the spatial method and then segmenting in time, all in one step.

An example is given in the Segmenting time subsection of the command-line primer section.

Arbitrary time segments can also be defined, although no helper is available for this. See also the Custom time segments subsection.

## Implementation details¶

The concept of tessellation or windowing in itself is independent of the proper partition of the molecule locations or translocations in subsets corresponding to cells or segments.

These data are combined together in a `Partition`

.

Its `tessellation`

attribute of type `Tessellation`

is the “tessellation” or binning/sampling strategy grown from (trans-)location data and in principle suitable for partitioning/sampling any other similar set of (trans-)locations.

The partition itself, i.e. the proper assignment of locations to cells/bins or segments - whether these locations are those involved in growing the tessellation or others, is referred to as *cell_index*.
This naming appears in the `Tessellation`

class as a method, in the `Partition`

class as an attribute (actually a property) and at other locations.

From a particular partition a series of derivative products are commonly extracted, such as the location count per cell, and some of these products are conveniently provided by the `Partition`

.

Note that, because tessellating and partitioning are considered two different procedures, some input arguments to the `tessellate()`

helper function may have multiple understandings.
Some constraints may be taken as directions by the tessellation algorithm while the same constraints would typically be enforced by the partitioning.

As a consequence, `tessellate()`

takes arguments with the *strict_* prefix in their name.
These arguments apply to the partition while arguments of similar names without this prefix apply to the tessellation.

## Advanced methods¶

### Tessellation nesting¶

Each cell of a tessellation can be tessellated again.

This is made possible by the `NestedTessellations`

class.

The command-line also supports this extension. This can be useful for example to independently tessellate the space in each time segment:

```
> tramway tessellate window -i example.trxyt --shift 1 --duration 2 --output-label 2s_win
> tramway tessellate gwr -i example.rwa --input-label 2s_win --output-label windowed_gwr
```

### Custom cell centers¶

To define specific centroids and partition using knn, no explicit cells are needed.

The `tessellation`

package exposes basic classes such as `Delaunay`

and `Voronoi`

.

Both can be used to implemented such a use case:

```
from tramway.core import *
from tramway.core.hdf5 import *
from tramway.tessellation import *
from numpy.random import rand
from pandas import DataFrame
n_centroids = 100
n_nearest_neighbours = 50
space_columns = ['x', 'y']
# load the trajectories
translocations = load_xyt('example.trxyt')
# find the bounding box
coordinates = translocations[space_columns]
xmin, xmax = coordinates.min(axis=0).values, coordinates.max(axis=0).values
# pick some centroids within the bounding box
centroids = rand(n_centroids, len(space_columns))
centroids *= xmax - xmin
centroids += xmin
centroids = DataFrame(data=centroids, columns=space_columns)
# grow the tessellation
tessellation = Delaunay()
tessellation.tessellate(centroids)
# find the nearest neighbours
cells = Partition(translocations, tessellation)
cells.cell_index = tessellation.cell_index(translocations,
knn=(n_nearest_neighbours, n_nearest_neighbours)) # knn is (min, max)
# assemble the analysis tree
analyses = Analyses(translocations)
analyses.add(cells, label='random centroids')
# save it to a file
save_rwa('example.rwa', analyses)
```

The above example illustrates how to explicitly define cell centers.
The specific case of random cell centers can be implemented using the ‘*random*’ plugin instead:

```
from tramway.helper import *
n_centroids = 100
n_nearest_neighbours = 50
tessellate('example. trxyt', 'random', cell_count=n_centroids,
knn=(n_nearest_neighbours, n_nearest_neighbours), label='random centroids')
```

### Custom time segments¶

The `TimeLattice`

class is more flexible than the `SlidingWindow`

class in that it admits arbitrary time segments.

The following example makes contiguous segments such that the total location count per segment is at least equal to a defined amount:

```
from tramway.helper import *
import numpy
min_location_count_per_segment = 10000
time_column = 't'
# load the trajectories
translocations = load_xyt('example.trxyt')
# count the rows (or locations) along time
timestamps = translocations[time_column].values
ts, counts = numpy.unique(timestamps, return_counts=True)
# pick the segment bounds
index = 0
bounds = [ts[index]]
while index < counts.size:
count = 0
while count < min_location_count_per_segment:
count += counts[index]
index += 1
if index == counts.size:
break
if index == counts.size: # or similarly if count < min_location_count_per_segment:
pass # let the loop end
else:
bounds.append(ts[index])
# grow a spatial tessellation
static_cells = tessellate(translocations, 'kmeans')
# associate the segments
segments = numpy.c_[bounds[:-1], bounds[1:]]
dynamic_cells = with_time_lattice(static_cells, segments)
```

The same example with different spatial tessellations for each segment can be implemented with the help of tessellation nesting.

If maps are inferred on such a tessellation, separate maps will be generated for each segment. These maps can be individualized as follows:

```
# infer the diffusivity
diffusivity_maps = infer(dynamic_cells, 'D')
# slice the maps (one map per segment) to plot each of them
for diffusivity_map in dynamic_cells.tessellation.split_frames(diffusivity_maps):
map_plot(diffusivity_map, cells=static_cells)
# assemble the analysis tree
analyses = Analyses(translocations)
analyses.add(dynamic_cells, label='time-windowed cells', \
comment='count-normalized kmeans tessellation')
analyses['time-windowed cells'].add(diffusivity_maps, label='D', \
comment='diffusivity (D mode)')
```