Vector Filtering

We’ve now seen how to generate rich histogram visualisations that allow us to observe patterns in a collection of vectors. But what if we want to take things a step further and study only the vectors with a magnitude in a specific range, or with a specific orientation?

In this section, we’ll see how to filter vector datasets based on magnitude and orientation.

Important

When we use the term filter, we are referring to selecting specific vectors on the basis of magnitude and/or orientation.

We’ll continue with our two_clusters.npy dataset, which can also be loaded directly in VectoRose without any separate download using SampleData.TWO_CLUSTERS in the data module.

As usual, let’s start by loading the data.

import numpy as np # We'll use NumPy a bit later

import vectorose as vr
import vectorose.data

my_vectors = vr.data.SampleData.TWO_CLUSTERS.load()
my_vectors = vr.util.remove_zero_vectors(my_vectors)

my_vectors
array([[ 0.01330902,  0.06486094, -0.08154041],
       [ 0.19911095,  0.06809676, -0.02230348],
       [ 0.14568445,  0.08995054,  0.06688315],
       ...,
       [-0.0377645 ,  0.35891606,  0.6731566 ],
       [-0.13033349,  0.56234415,  0.27104764],
       [-0.03287074,  0.60723468,  0.63249369]], shape=(200000, 3))

As usual, let’s also create a FineTregenzaSphere with 10 nested shells to use for bin assignment in terms of magnitude and orientation.

my_sphere = vr.tregenza_sphere.FineTregenzaSphere(number_of_shells=10)
labelled_vectors, magnitude_bins = my_sphere.assign_histogram_bins(my_vectors)

labelled_vectors
phi theta magnitude shell ring bin
0 140.922763 11.595749 0.105038 0 41 3
1 96.050088 71.119103 0.211612 1 28 34
2 68.662637 58.307407 0.183816 1 20 26
3 108.637973 58.402654 0.283404 1 32 26
4 101.600750 38.565368 0.301634 1 30 18
... ... ... ... ... ... ...
199995 70.754479 309.190779 0.922267 5 21 141
199996 35.055502 350.961121 0.900928 5 11 101
199997 28.196955 353.993542 0.763798 4 9 85
199998 64.847612 346.951053 0.637718 4 19 151
199999 43.874659 356.901498 0.877418 5 13 118

200000 rows × 6 columns
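
To build some intuition for the shell column, here is a minimal sketch of how magnitudes could be split into equal-width shells using plain NumPy. This is an assumption for illustration only: the helper assign_magnitude_shells is hypothetical, and the bin edges that VectoRose actually uses are the ones returned in magnitude_bins.

```python
import numpy as np


def assign_magnitude_shells(vectors, number_of_shells):
    """Assign each vector to an equal-width magnitude shell (illustrative only)."""
    magnitudes = np.linalg.norm(vectors, axis=1)

    # Assumption: equal-width edges running from 0 to the largest magnitude.
    edges = np.linspace(0.0, magnitudes.max(), number_of_shells + 1)

    # Using only the interior edges gives indices from 0 to number_of_shells - 1,
    # so the largest magnitude lands in the last shell.
    shells = np.digitize(magnitudes, edges[1:-1])
    return shells, edges


demo_vectors = np.array(
    [
        [0.05, 0.0, 0.0],
        [0.0, 0.55, 0.0],
        [0.0, 0.0, 1.0],
    ]
)

shells, edges = assign_magnitude_shells(demo_vectors, number_of_shells=10)
print(shells)
```

Note that the shell indices start at zero, just like the shell column in the table above.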

Our vectors have now been assigned to bins based on magnitude and orientation. With this in mind, we can already see a couple of possible ways that we may want to filter:

  • based on magnitude using the shell index, or

  • based on orientation using the angular bin index.

With these ideas in mind, let’s get started on filtering!

Tip

This page is not a comprehensive guide on filtering. Since vector datasets are represented as NumPy arrays and pandas DataFrames, the possibilities are nearly endless. This page seeks to show some simple workflows for filtering using the representations and functions included in VectoRose.

Now that we have the vectors assigned to histogram bins, we can use the built-in tools provided by pandas to perform filtering based on both magnitude and orientation. We’ll also see some novel tools provided by VectoRose to simplify this filtering process.

Before we get started, here’s what the nested histogram shells for this dataset look like:

We’ll keep this distribution in mind as we perform filtering.

Magnitude Filtering

The vector magnitude is represented by a scalar value. Filtering based on the scalar magnitude is quite trivial. Let’s immediately dive into an example to see this.

Looking at our histogram shells, it’s quite clear that the fifth shell contains an interesting pattern: a cluster with a high frequency. If we want to study only the vectors at this magnitude level, we can simply select all vectors with a shell index of 4.

Warning

Recall that indexing in Python starts at zero, so the fifth shell has index 4, not 5.

Let’s see how we can do that in code:

shell_4_vectors = labelled_vectors[labelled_vectors["shell"] == 4]

shell_4_vectors
phi theta magnitude shell ring bin
7313 94.603439 56.255863 0.682123 4 28 27
7377 132.959873 325.894683 0.634843 4 39 115
7480 97.308252 44.798522 0.638771 4 29 21
7694 89.834346 86.295302 0.653199 4 26 41
10789 147.005123 74.227381 0.668650 4 43 19
... ... ... ... ... ... ...
199989 7.202286 315.019661 0.725102 4 2 14
199993 28.011988 332.977878 0.737873 4 8 71
199994 66.015628 9.319906 0.784328 4 20 4
199997 28.196955 353.993542 0.763798 4 9 85
199998 64.847612 346.951053 0.637718 4 19 151

30856 rows × 6 columns

Now, notice that all of our vectors are in shell 4. We can then convert these vectors to a NumPy array of Cartesian components using the method SphereBase.convert_vectors_to_cartesian_array():

shell_4_vectors_array = my_sphere.convert_vectors_to_cartesian_array(
    shell_4_vectors,
)

shell_4_vectors_array
array([[ 0.56537347,  0.37768679, -0.05474634],
       [-0.26050741,  0.38469095, -0.43263665],
       [ 0.44643152,  0.44958233, -0.08125639],
       ...,
       [ 0.11605203,  0.70714692,  0.31881958],
       [-0.0377645 ,  0.35891606,  0.6731566 ],
       [-0.13033349,  0.56234415,  0.27104764]], shape=(30856, 3))

We can, of course, perform more complicated filtering tasks to extract the vectors from multiple shells. For example, to select all vectors in the fifth shell or higher, we can simply run:

labelled_vectors[labelled_vectors["shell"] >= 4]
phi theta magnitude shell ring bin
7313 94.603439 56.255863 0.682123 4 28 27
7377 132.959873 325.894683 0.634843 4 39 115
7480 97.308252 44.798522 0.638771 4 29 21
7694 89.834346 86.295302 0.653199 4 26 41
10789 147.005123 74.227381 0.668650 4 43 19
... ... ... ... ... ... ...
199995 70.754479 309.190779 0.922267 5 21 141
199996 35.055502 350.961121 0.900928 5 11 101
199997 28.196955 353.993542 0.763798 4 9 85
199998 64.847612 346.951053 0.637718 4 19 151
199999 43.874659 356.901498 0.877418 5 13 118

63243 rows × 6 columns

We can get the other vectors by running:

labelled_vectors[labelled_vectors["shell"] < 4]
phi theta magnitude shell ring bin
0 140.922763 11.595749 0.105038 0 41 3
1 96.050088 71.119103 0.211612 1 28 34
2 68.662637 58.307407 0.183816 1 20 26
3 108.637973 58.402654 0.283404 1 32 26
4 101.600750 38.565368 0.301634 1 30 18
... ... ... ... ... ... ...
199975 40.465908 11.782549 0.529120 3 12 3
199980 88.269805 10.719323 0.542940 3 26 5
199986 51.338357 8.719554 0.540043 3 15 3
199991 73.835240 337.467390 0.472457 2 22 157
199992 45.375038 354.394809 0.342514 2 14 126

136757 rows × 6 columns

We can also combine both of these to select from a range of shells:

labelled_vectors[
    (4 <= labelled_vectors["shell"]) & (labelled_vectors["shell"] <= 6)
]
phi theta magnitude shell ring bin
7313 94.603439 56.255863 0.682123 4 28 27
7377 132.959873 325.894683 0.634843 4 39 115
7480 97.308252 44.798522 0.638771 4 29 21
7694 89.834346 86.295302 0.653199 4 26 41
10789 147.005123 74.227381 0.668650 4 43 19
... ... ... ... ... ... ...
199995 70.754479 309.190779 0.922267 5 21 141
199996 35.055502 350.961121 0.900928 5 11 101
199997 28.196955 353.993542 0.763798 4 9 85
199998 64.847612 346.951053 0.637718 4 19 151
199999 43.874659 356.901498 0.877418 5 13 118

61166 rows × 6 columns
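
As a side note, pandas offers a shorter way to write this kind of range selection. The sketch below uses a small made-up DataFrame (a stand-in for labelled_vectors, not the real data) to show that Series.between(), which includes both end points by default, gives the same rows as the explicit mask.

```python
import pandas as pd

# Small stand-in for labelled_vectors; the values are made up for illustration.
demo = pd.DataFrame(
    {
        "magnitude": [0.10, 0.35, 0.66, 0.90, 1.05, 1.20],
        "shell": [0, 2, 4, 5, 6, 7],
    }
)

# Explicit mask combining two comparisons, as in the tutorial.
explicit = demo[(4 <= demo["shell"]) & (demo["shell"] <= 6)]

# The same selection using Series.between (inclusive on both ends by default).
with_between = demo[demo["shell"].between(4, 6)]

print(with_between)
```

Both versions keep the original row labels, so the results can still be matched back to the full table.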

We can also work directly based on the magnitude values, ignoring the shell indices altogether:

labelled_vectors[labelled_vectors["magnitude"] <= 0.25]
phi theta magnitude shell ring bin
0 140.922763 11.595749 0.105038 0 41 3
1 96.050088 71.119103 0.211612 1 28 34
2 68.662637 58.307407 0.183816 1 20 26
6 125.068594 92.790300 0.240778 1 37 36
9 127.274274 41.427476 0.188289 1 37 16
... ... ... ... ... ... ...
199362 71.228212 38.097520 0.244133 1 21 17
199432 45.116698 8.053522 0.191675 1 13 2
199602 45.084514 333.047845 0.232529 1 13 111
199965 23.988923 59.811282 0.212724 1 7 11
199971 47.203126 12.767886 0.238363 1 14 4

31926 rows × 6 columns

For more details on basic indexing using pandas, make sure to check out the indexing guide in the pandas documentation.
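
If the vectors are still in a plain NumPy array and have not been labelled yet, the same kind of magnitude filter can be sketched with a boolean mask. This example uses a tiny made-up array and only NumPy functions, so it does not rely on any VectoRose call.

```python
import numpy as np

# A tiny made-up array of Cartesian vectors, one vector per row.
vectors = np.array(
    [
        [0.3, 0.4, 0.0],
        [0.0, 0.0, 0.1],
        [0.6, 0.0, 0.8],
        [0.0, 0.2, 0.0],
    ]
)

# Magnitude of each row.
magnitudes = np.linalg.norm(vectors, axis=1)

# Boolean mask selecting the short vectors, then apply it to the rows.
keep = magnitudes <= 0.25
short_vectors = vectors[keep]

print(short_vectors)
```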

Orientation Filtering - Single Bin

Orientation presents a bit more of a challenge. On a basic level, we can do something similar for filtering by orientation. Let’s say we want all vectors contained in the ring with index 15 and the bin with index 10 in that ring. We can once again use pandas directly:

filtered_vectors = labelled_vectors[
    (labelled_vectors["ring"] == 15) & (labelled_vectors["bin"] == 10)
]

filtered_vectors
phi theta magnitude shell ring bin
2186 51.280935 28.434182 0.270519 1 15 10
4679 49.549383 27.674359 0.293125 1 15 10
8585 51.977364 28.753270 0.377855 2 15 10
32739 51.187734 28.800747 0.110106 0 15 10
66625 49.406903 28.639178 0.419409 2 15 10
... ... ... ... ... ... ...
195255 52.142043 28.854474 0.593782 3 15 10
195918 50.777303 27.513780 0.540157 3 15 10
197121 49.018275 28.045500 0.579771 3 15 10
198464 49.466779 27.042250 0.749263 4 15 10
199896 48.750686 28.114370 0.765183 4 15 10

182 rows × 6 columns

In practice, this isn’t very helpful. It’s hard to intuitively know what bin we want for a specific orientation.

The good news is that we can use other tools in VectoRose to convert between angles and face index information.

Let’s plot the orientation histogram for our dataset to find the orientations of interesting features. To help, we’ll add angular \(\phi\) and \(\theta\) axes using the method SpherePlotter.add_spherical_axes().

orientation_histogram = my_sphere.construct_marginal_orientation_histogram(
    labelled_vectors
)

orientation_histogram_mesh = my_sphere.create_shell_mesh(orientation_histogram)

sphere_plotter = vr.plotting.SpherePlotter(orientation_histogram_mesh)
sphere_plotter.produce_plot()
sphere_plotter.add_spherical_axes()
sphere_plotter.show()

Using these axes, we can see the orientations of our two clusters in the dataset. The upper cluster seems centred around \(\phi=55\) and \(\theta=0\) degrees.

Let’s extract all the vectors that fall into the bin containing this orientation. To do this, we just need to create a unit vector pointed in that direction in Cartesian coordinates and pass it to SphereBase.assign_histogram_bins() to get the closest bin.

my_spherical_coordinates = np.array([55, 0])
my_cartesian_coordinates = vr.util.convert_spherical_to_cartesian_coordinates(
    my_spherical_coordinates, radius=1, use_degrees=True
)

my_bin, _ = my_sphere.assign_histogram_bins(my_cartesian_coordinates)

my_bin
phi theta magnitude shell ring bin
0 55.0 0.0 1.0 4 16 0
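
For intuition about what the conversion is doing, here is a minimal sketch of the angle convention that the tables and arrays on this page appear to follow: \(\phi\) is measured from the positive z axis and \(\theta\) is measured from the positive y axis towards the positive x axis. This convention is inferred from the printed values above, and the two helper functions are hypothetical; in practice, the VectoRose utility function should be preferred.

```python
import numpy as np


def spherical_to_cartesian(phi_deg, theta_deg, radius=1.0):
    """Convert angles in degrees to x, y, z (convention inferred from this page)."""
    phi = np.radians(phi_deg)
    theta = np.radians(theta_deg)
    x = radius * np.sin(phi) * np.sin(theta)
    y = radius * np.sin(phi) * np.cos(theta)
    z = radius * np.cos(phi)
    return np.array([x, y, z])


def cartesian_to_spherical(vector):
    """Convert x, y, z back to (phi, theta, magnitude), with angles in degrees."""
    x, y, z = vector
    magnitude = np.sqrt(x**2 + y**2 + z**2)
    phi = np.degrees(np.arccos(z / magnitude))
    theta = np.degrees(np.arctan2(x, y)) % 360.0
    return phi, theta, magnitude


# First row of the labelled table: phi, theta and magnitude.
first = spherical_to_cartesian(140.922763, 11.595749, radius=0.105038)
print(first)

# Going back recovers the angles and the magnitude.
phi, theta, magnitude = cartesian_to_spherical(first)
print(phi, theta, magnitude)
```

The result for the first row agrees with the first vector printed when we loaded the dataset, which is a useful check that the convention has been read correctly.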

Now we see that the vectors we want to filter are found in ring 16, bin 0. To extract these vectors, we can once again use the indexing features from pandas:

vectors_in_cell = labelled_vectors.loc[
    (labelled_vectors["ring"] == my_bin.loc[0, "ring"])
    & (labelled_vectors["bin"] == my_bin.loc[0, "bin"])
]

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell
We have extracted 343 vectors!
phi theta magnitude shell ring bin
10069 55.106892 1.252508 0.319393 2 16 0
25419 53.058520 2.192422 0.343976 2 16 0
25851 53.435031 1.138519 0.333352 2 16 0
100094 52.227808 0.193758 0.530046 3 16 0
100245 54.934177 1.098404 1.140148 7 16 0
... ... ... ... ... ... ...
197204 55.365899 2.073308 0.875228 5 16 0
197213 52.904454 2.362285 0.480103 3 16 0
198577 53.546724 0.102937 0.582745 3 16 0
199833 55.281786 0.793643 0.542250 3 16 0
199854 53.245919 0.564946 0.499349 3 16 0

343 rows × 6 columns

Caution

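Don’t forget to include the initial 0 row label in my_bin.loc[0, "ring"]. Without it, my_bin["ring"] is a one-element Series rather than a single value, and comparing it against the full column fails. We need the ring and bin values for the specific face of interest as scalars.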
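
The difference between the two lookups can be seen in a small sketch with made-up tables (stand-ins for labelled_vectors and my_bin). With the row label we get a scalar that pandas compares against every row; without it we get a Series, and pandas raises a ValueError because the two Series do not share the same labels.

```python
import pandas as pd

# Made-up stand-ins for labelled_vectors and my_bin.
labelled = pd.DataFrame({"ring": [15, 16, 16], "bin": [10, 0, 3]})
cell = pd.DataFrame({"ring": [16], "bin": [0]})

# With the row label, we get a single value that is compared against every row.
ring_value = cell.loc[0, "ring"]
mask = labelled["ring"] == ring_value
print(mask.tolist())

# Without the row label, we get a one-element Series. pandas refuses to
# compare two Series whose labels differ, so this raises a ValueError.
try:
    labelled["ring"] == cell["ring"]
    comparison_failed = False
except ValueError as error:
    comparison_failed = True
    print(f"ValueError: {error}")
```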

Using this approach, we can extract vectors from within single cells. But, the syntax seems a bit long due to the explicit indexing for the ring and the bin indices. This syntax also only works for Tregenza spheres. We would need to figure something else out for triangulated spheres.

Thankfully, we don’t have to go to this bother! VectoRose includes the helpful method SphereBase.get_vectors_from_single_cell(). This method takes in the bin information for a single cell, regardless of the specific sphere implementation, and extracts all the vectors located in that one cell.

Attention

The method SphereBase.get_vectors_from_single_cell() takes in two arguments:

  1. The DataFrame containing the labelled vectors as returned by SphereBase.assign_histogram_bins().

  2. A Series containing the information for the single bin examined. If magnitude information is present, then filtering will automatically happen by orientation and magnitude. If you do not want to filter by magnitude, only pass in the orientation bin information.

Here’s the thing… We got a DataFrame from our call to SphereBase.assign_histogram_bins(). We need to now just extract our lone row. We can do that easily using the iloc attribute.

my_bin_series = my_bin.iloc[0]

my_bin_series
phi          55.0
theta         0.0
magnitude     1.0
shell         4.0
ring         16.0
bin           0.0
Name: 0, dtype: float64

And now we’re ready to perform the extraction using the call to SphereBase.get_vectors_from_single_cell().

vectors_in_cell = my_sphere.get_vectors_from_single_cell(
    labelled_vectors, my_bin_series
)

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell
We have extracted 110 vectors!
phi theta magnitude shell ring bin
100274 54.440479 1.693340 0.709073 4 16 0
100309 54.129339 1.514807 0.699285 4 16 0
102816 54.449704 1.003119 0.664982 4 16 0
103208 54.554752 0.260319 0.648086 4 16 0
105354 54.642801 1.452821 0.644222 4 16 0
... ... ... ... ... ... ...
188820 52.599966 1.321196 0.768638 4 16 0
190030 52.635190 1.606791 0.746692 4 16 0
190262 52.445586 0.027911 0.659340 4 16 0
192672 55.359747 1.179960 0.764052 4 16 0
195050 54.256079 1.078080 0.711523 4 16 0

110 rows × 6 columns

But, wait! We have fewer rows here! What’s going on?

The answer is that our my_bin_series contains the magnitude shell, so filtering is done automatically by both magnitude and orientation. If we want to just use the orientation, we can extract the bin and ring data:

vectors_in_cell = my_sphere.get_vectors_from_single_cell(
    labelled_vectors, my_bin_series[["ring", "bin"]]
)

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell
We have extracted 343 vectors!
phi theta magnitude shell ring bin
10069 55.106892 1.252508 0.319393 2 16 0
25419 53.058520 2.192422 0.343976 2 16 0
25851 53.435031 1.138519 0.333352 2 16 0
100094 52.227808 0.193758 0.530046 3 16 0
100245 54.934177 1.098404 1.140148 7 16 0
... ... ... ... ... ... ...
197204 55.365899 2.073308 0.875228 5 16 0
197213 52.904454 2.362285 0.480103 3 16 0
198577 53.546724 0.102937 0.582745 3 16 0
199833 55.281786 0.793643 0.542250 3 16 0
199854 53.245919 0.564946 0.499349 3 16 0

343 rows × 6 columns

We now have the exact same result. And, we didn’t need to figure out any quirks of pandas indexing!

This method offers a flexible approach that will work regardless of whether we are working with a TregenzaSphere or a TriangleSphere.
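
To make the behaviour we just saw more concrete, here is a rough sketch of single-cell filtering written with pandas only. The helper select_single_cell is hypothetical and is not VectoRose’s implementation (which also has to handle triangulated spheres); it simply filters on whichever of the shell, ring and bin entries are present in the Series, which is why dropping the shell entry returns more rows.

```python
import pandas as pd


def select_single_cell(labelled, cell):
    """Keep the rows matching every index entry present in ``cell``."""
    index_columns = [name for name in ("shell", "ring", "bin") if name in cell.index]

    # Start by keeping everything, then narrow down one column at a time.
    mask = pd.Series(True, index=labelled.index)
    for name in index_columns:
        mask &= labelled[name] == cell[name]

    return labelled[mask]


# Made-up stand-in for labelled_vectors.
labelled = pd.DataFrame(
    {
        "shell": [2, 4, 4, 3],
        "ring": [16, 16, 16, 15],
        "bin": [0, 0, 1, 0],
    }
)
full_cell = pd.Series({"shell": 4.0, "ring": 16.0, "bin": 0.0})

# Shell, ring and bin are all present, so all three are used.
by_shell_and_orientation = select_single_cell(labelled, full_cell)

# Only ring and bin are passed, so the shell is ignored.
by_orientation_only = select_single_cell(labelled, full_cell[["ring", "bin"]])

print(len(by_shell_and_orientation), len(by_orientation_only))
```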

But, what if we want to extract vectors from multiple cells? I’m glad you asked…

Orientation Filtering - Multiple Bins

Let’s say we don’t want to confine our filtering to a single bin. Well, the solution is actually quite easy! Just like we have the method SphereBase.get_vectors_from_single_cell() for getting vectors in a single cell, we have the similar SphereBase.get_vectors_from_selected_cells() to get the vectors contained in multiple cells. It’s actually even easier this time, because we don’t need to convert our cells into a Series. We can directly pass in the DataFrame containing the cell information.

Let’s say we want to get the vectors from a couple of other cells. We want to extract those centred around \((\phi, \theta) \in \{ (55, 0), (60, 5), (70, 10) \}\), with both angles in degrees.

Well, we can easily get the corresponding cell information, and then extract the vectors using a similar pipeline.

spherical_coordinates = np.array(
    [
        [55, 0],
        [60, 5],
        [70, 10],
    ]
)

cartesian_coordinates = vr.util.convert_spherical_to_cartesian_coordinates(
    spherical_coordinates, radius=1, use_degrees=True
)

bin_indices, _ = my_sphere.assign_histogram_bins(cartesian_coordinates)

# Let's not filter by magnitude
orientation_bins = bin_indices[["ring", "bin"]]

orientation_bins
ring bin
0 16 0
1 18 2
2 21 4

Now that we have our bins, let’s extract our vectors!

vectors_in_cells = my_sphere.get_vectors_from_selected_cells(
    labelled_vectors, orientation_bins
)

print(
    f"We have extracted {len(vectors_in_cells)} vectors from the "
    f"{len(orientation_bins)} cells."
)

vectors_in_cells
We have extracted 835 vectors from the 3 cells.
phi theta magnitude shell ring bin
10069 55.106892 1.252508 0.319393 2 16 0
25419 53.058520 2.192422 0.343976 2 16 0
25851 53.435031 1.138519 0.333352 2 16 0
100094 52.227808 0.193758 0.530046 3 16 0
100245 54.934177 1.098404 1.140148 7 16 0
... ... ... ... ... ... ...
195499 69.746920 10.137440 0.780610 4 21 4
195587 69.960752 8.986954 0.898679 5 21 4
196012 72.704219 9.780314 0.841229 5 21 4
196750 72.373390 10.206567 0.879696 5 21 4
198276 70.857858 10.463513 0.623735 3 21 4

835 rows × 6 columns
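
For comparison, a similar multi-cell selection can be sketched with pandas alone by matching (ring, bin) pairs. This is only an illustration on made-up tables, not a description of how VectoRose performs the selection, and unlike the VectoRose method it assumes the ring and bin columns of a Tregenza sphere.

```python
import pandas as pd

# Made-up stand-ins for labelled_vectors and orientation_bins.
labelled = pd.DataFrame(
    {
        "ring": [16, 18, 21, 16, 20],
        "bin": [0, 2, 4, 1, 4],
        "magnitude": [0.3, 0.5, 0.8, 0.4, 0.9],
    }
)
cells = pd.DataFrame({"ring": [16, 18, 21], "bin": [0, 2, 4]})

# Build a (ring, bin) pair for every vector and for every selected cell.
vector_keys = pd.MultiIndex.from_frame(labelled[["ring", "bin"]])
cell_keys = list(cells[["ring", "bin"]].itertuples(index=False, name=None))

# Keep the vectors whose pair appears among the selected cells.
selected = labelled[vector_keys.isin(cell_keys)]

print(selected)
```

Matching on the pair matters here: the vector in ring 16, bin 1 and the vector in ring 20, bin 4 each share one value with a selected cell, but neither is kept.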

Seems quite straightforward, eh?

But… what if you don’t know the angles?

Interactively Selecting Histogram Cells

If you don’t know the exact angles of your faces of interest, no problem! VectoRose contains a way to interactively select the histogram cells to use for this process. This interactivity is controlled by the SpherePlotter class.

Warning

The interactive face selector only works when running VectoRose in a local Python shell, or in a Jupyter notebook with the trame renderer. It will not appear in the rendered HTML documentation. We have embedded some videos to illustrate the process, but to get the full experience of this example, please make sure to run the code locally.

To be able to interactively select histogram cells, you must set the property SpherePlotter.cell_picking_active to be True. Then, in the interactive plotter, you can select cells by right-clicking them. To deselect a cell, you must right-click again.

Tip

Cell selection is done by right-clicking.

When a cell is selected, it will appear to have a thick magenta border.

To clear selected cells, simply call SpherePlotter.clear_picked_cells(). The picked cells are cleared automatically if cell picking is deactivated.

Warning

For reasons potentially beyond our control, it may be difficult to select certain cells (for example, the poles of a Tregenza sphere). We have provided programmatic ways to select cells, shown below. See SpherePlotter.pick_cells() and SphereBase.get_cell_indices() for the two key methods.

Once the cells are picked, you can access the bin information for the selected cells through the property SpherePlotter.picked_cells. The DataFrame provided by this property can then be passed directly to SphereBase.get_vectors_from_selected_cells() to extract the vectors.

Let’s do a demonstration with the three cells we considered earlier, which are still stored in the variable orientation_bins. Since this example is rendered automatically in HTML, we’ll programmatically select the cells.

orientation_bin_cell_indices = my_sphere.get_cell_indices(orientation_bins)

orientation_plotter = vr.plotting.SpherePlotter(
    orientation_histogram_mesh
)
orientation_plotter.produce_plot()
orientation_plotter.cell_picking_active = True
orientation_plotter.pick_cells(orientation_bin_cell_indices)
orientation_plotter.show()

Now that we have our cells picked, we can get the cell information and extract the vectors.

picked_cells = orientation_plotter.picked_cells
vectors_in_cells = my_sphere.get_vectors_from_selected_cells(
    labelled_vectors, picked_cells
)

print(f"Extracted {len(vectors_in_cells)} vectors from the {len(picked_cells)} picked cells")

vectors_in_cells
Extracted 835 vectors from the 3 picked cells
phi theta magnitude shell ring bin
10069 55.106892 1.252508 0.319393 2 16 0
25419 53.058520 2.192422 0.343976 2 16 0
25851 53.435031 1.138519 0.333352 2 16 0
100094 52.227808 0.193758 0.530046 3 16 0
100245 54.934177 1.098404 1.140148 7 16 0
... ... ... ... ... ... ...
195499 69.746920 10.137440 0.780610 4 21 4
195587 69.960752 8.986954 0.898679 5 21 4
196012 72.704219 9.780314 0.841229 5 21 4
196750 72.373390 10.206567 0.879696 5 21 4
198276 70.857858 10.463513 0.623735 3 21 4

835 rows × 6 columns

Using this approach, we can easily filter vectors based on user-defined cells of interest.

Other Filtering Approaches

As we mentioned above, this tutorial is not a comprehensive guide on vector filtering. There are many approaches that we haven’t gone into here. For example, by computing arc lengths using the function util.compute_arc_lengths(), it is possible to filter vectors based on angular distance from a reference orientation. This could be useful for separating the two clusters in our dataset. Many of these approaches rely more heavily on the capabilities of pandas, rather than new features introduced by VectoRose.
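
As a rough idea of what filtering by angular distance could look like, here is a sketch written with NumPy only. It does not call util.compute_arc_lengths(), since its signature is not shown on this page; instead, the hypothetical helper angular_distance_degrees computes the angle between each vector and a reference direction, which on the unit sphere is the arc length expressed in degrees. The vectors and the 50 degree threshold are made up, and the vectors must be non-zero.

```python
import numpy as np


def angular_distance_degrees(vectors, reference):
    """Angle in degrees between each non-zero row of ``vectors`` and ``reference``."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_reference = reference / np.linalg.norm(reference)

    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    cosines = np.clip(unit_vectors @ unit_reference, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))


# Made-up vectors and a reference direction along the positive z axis.
vectors = np.array(
    [
        [0.0, 0.0, 2.0],
        [0.0, 1.0, 1.0],
        [1.0, 0.0, 0.0],
        [0.0, 0.0, -0.5],
    ]
)
reference = np.array([0.0, 0.0, 1.0])

distances = angular_distance_degrees(vectors, reference)

# Keep only the vectors within 50 degrees of the reference direction.
close_vectors = vectors[distances <= 50.0]

print(distances)
```

Because the vectors are normalised first, this filter depends only on orientation and ignores magnitude.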

VectoRose also provides the ability to filter based on both magnitude and orientation. Combining these two variables provides the user with much greater control over which vectors are kept for analysis.

Conclusion

In this guide, we have seen the basics of performing vector filtering using VectoRose and pandas. You can now easily extract vectors from individual histogram cells, or from collections of cells. These operations enable rich analyses of directed data.

Before leaving, let’s close our SpherePlotter objects to release the resources back to the operating system.

sphere_plotter.close()
orientation_plotter.close()

Now, you are equipped to not only construct histograms using VectoRose, but also to select specific data points to analyse further.