Vector Filtering

At this point, we’ve seen how to generate rich histogram visualisations that reveal patterns in a collection of vectors. But what if we want to go a step further and study only the vectors with a magnitude in a specific range, or with a specific orientation?

In this section, we’ll see how to filter vector datasets based on magnitude and orientation.

Important

When we use the term filter, we are referring to selecting specific vectors on the basis of magnitude and/or orientation.

We’ll continue with our two_clusters.npy dataset, which can also be loaded directly in VectoRose, without a separate download, using SampleData.TWO_CLUSTERS from the data module.

As usual, let’s start by loading the data.

import numpy as np  # We'll use NumPy a bit later

import vectorose as vr
import vectorose.data

# Load the bundled two-cluster sample and drop any zero-length vectors.
my_vectors = vr.data.SampleData.TWO_CLUSTERS.load()
my_vectors = vr.util.remove_zero_vectors(my_vectors)
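Removing zero vectors is itself a simple filter. Here is a minimal NumPy sketch of that idea using a boolean mask. It assumes that vr.util.remove_zero_vectors() drops the rows whose components are all exactly zero. The library may behave differently, for example by using a tolerance. The three vectors are made up.

```python
import numpy as np

# Made-up stand-in for a loaded dataset: three 3D vectors, one of them zero.
vectors = np.array(
    [
        [0.1, 0.2, 0.3],
        [0.0, 0.0, 0.0],
        [-0.4, 0.0, 0.5],
    ]
)

# Assumption: a vector counts as "zero" only if every component is exactly 0.
non_zero_mask = np.any(vectors != 0, axis=1)

# Indexing with the boolean mask keeps only the rows marked True.
non_zero_vectors = vectors[non_zero_mask]

print(non_zero_vectors.shape)  # (2, 3)
```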

my_vectors

array([[ 0.01330902,  0.06486094, -0.08154041],
       [ 0.19911095,  0.06809676, -0.02230348],
       [ 0.14568445,  0.08995054,  0.06688315],
       ...,
       [-0.0377645 ,  0.35891606,  0.6731566 ],
       [-0.13033349,  0.56234415,  0.27104764],
       [-0.03287074,  0.60723468,  0.63249369]], shape=(200000, 3))

Next, let’s create a FineTregenzaSphere with 10 nested shells and use it to assign each vector to a magnitude shell and an orientation bin.

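# A fine Tregenza sphere with 10 nested magnitude shells.
my_sphere = vr.tregenza_sphere.FineTregenzaSphere(number_of_shells=10)

# Label each vector with its shell (magnitude) and its ring and bin (orientation).
labelled_vectors, magnitude_bins = my_sphere.assign_histogram_bins(my_vectors)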

labelled_vectors

	phi	theta	magnitude	shell	ring	bin
0	140.922763	11.595749	0.105038	0	41	3
1	96.050088	71.119103	0.211612	1	28	34
2	68.662637	58.307407	0.183816	1	20	26
3	108.637973	58.402654	0.283404	1	32	26
4	101.600750	38.565368	0.301634	1	30	18
...	...	...	...	...	...	...
199995	70.754479	309.190779	0.922267	5	21	141
199996	35.055502	350.961121	0.900928	5	11	101
199997	28.196955	353.993542	0.763798	4	9	85
199998	64.847612	346.951053	0.637718	4	19	151
199999	43.874659	356.901498	0.877418	5	13	118

200000 rows × 6 columns

Our vectors have now been assigned to bins based on magnitude and orientation. With this in mind, we can already see a couple of possible ways that we may want to filter:

based on magnitude using the shell index, or
based on orientation using the ring and bin indices.

With these ideas in mind, let’s get started on filtering!

Tip

This page is not a comprehensive guide on filtering. Since vector datasets are represented as NumPy arrays and pandas DataFrames, the possibilities are practically endless. This page presents a few simple workflows for filtering using the representations and functions included in VectoRose.

Now that the vectors are assigned to histogram bins, we can use the built-in tools provided by pandas to filter on both magnitude and orientation. We’ll also see some dedicated tools provided by VectoRose that simplify this process.

Before we get started, here’s what the nested histogram shells for this dataset look like:

We’ll keep this distribution in mind as we perform filtering.

Magnitude Filtering

The vector magnitude is a scalar, so filtering by magnitude is straightforward. Let’s dive straight into an example.

Looking at our histogram shells, it’s clear that the fifth shell contains an interesting pattern: a high-frequency cluster. To study only the vectors at this magnitude level, we can select all vectors with a shell index of 4.

Warning

Recall that indexing in Python starts at zero, so the fifth shell has index 4, not 5.

Let’s see how we can do that in code:

shell_4_vectors = labelled_vectors[labelled_vectors["shell"] == 4]

shell_4_vectors

	phi	theta	magnitude	shell	ring	bin
7313	94.603439	56.255863	0.682123	4	28	27
7377	132.959873	325.894683	0.634843	4	39	115
7480	97.308252	44.798522	0.638771	4	29	21
7694	89.834346	86.295302	0.653199	4	26	41
10789	147.005123	74.227381	0.668650	4	43	19
...	...	...	...	...	...	...
199989	7.202286	315.019661	0.725102	4	2	14
199993	28.011988	332.977878	0.737873	4	8	71
199994	66.015628	9.319906	0.784328	4	20	4
199997	28.196955	353.993542	0.763798	4	9	85
199998	64.847612	346.951053	0.637718	4	19	151

30856 rows × 6 columns
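The expression inside the square brackets is a boolean mask: a Series of True/False values with one entry per row. Indexing with it keeps only the rows marked True and preserves their original labels. The self-contained sketch below shows this on a small, made-up DataFrame that has the same column names as labelled_vectors. It also shows the equivalent DataFrame.query() spelling from pandas.

```python
import pandas as pd

# Made-up stand-in for labelled_vectors (values are illustrative only).
labelled = pd.DataFrame(
    {
        "phi": [140.9, 96.1, 94.6, 70.8, 28.2],
        "theta": [11.6, 71.1, 56.3, 309.2, 354.0],
        "magnitude": [0.105, 0.212, 0.682, 0.922, 0.764],
        "shell": [0, 1, 4, 5, 4],
        "ring": [41, 28, 28, 21, 9],
        "bin": [3, 34, 27, 141, 85],
    }
)

# Boolean mask: True for rows whose shell index is 4 (the fifth shell).
in_shell_4 = labelled["shell"] == 4
print(in_shell_4.tolist())  # [False, False, True, False, True]

# Indexing with the mask keeps only the True rows, with their original labels.
shell_4 = labelled[in_shell_4]

# DataFrame.query() expresses the same condition as a string.
shell_4_query = labelled.query("shell == 4")

print(shell_4.index.tolist())  # [2, 4]
```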

All of the selected vectors are now in shell 4. We can convert them to a NumPy array of Cartesian components using the method SphereBase.convert_vectors_to_cartesian_array():

shell_4_vectors_array = my_sphere.convert_vectors_to_cartesian_array(
    shell_4_vectors,
)

shell_4_vectors_array

array([[ 0.56537347,  0.37768679, -0.05474634],
       [-0.26050741,  0.38469095, -0.43263665],
       [ 0.44643152,  0.44958233, -0.08125639],
       ...,
       [ 0.11605203,  0.70714692,  0.31881958],
       [-0.0377645 ,  0.35891606,  0.6731566 ],
       [-0.13033349,  0.56234415,  0.27104764]], shape=(30856, 3))
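The NumPy sketch below rebuilds Cartesian components from the phi, theta and magnitude columns, to make the link between the two representations concrete. It uses an assumed convention that we inferred by checking the first two rows printed at the top of this page. Under that convention, phi is the angle from the +z axis and theta is measured from the +y axis towards the +x axis, both in degrees. This convention is not taken from the VectoRose source, so use convert_vectors_to_cartesian_array() in real code.

```python
import numpy as np
import pandas as pd

# The first two labelled rows printed earlier on this page.
rows = pd.DataFrame(
    {
        "phi": [140.922763, 96.050088],
        "theta": [11.595749, 71.119103],
        "magnitude": [0.105038, 0.211612],
    }
)


def spherical_to_cartesian(frame: pd.DataFrame) -> np.ndarray:
    """Convert phi/theta (degrees) and magnitude columns to x, y, z.

    Assumed convention (inferred from printed rows, not confirmed):
    phi is the polar angle from +z, theta is the azimuth from +y towards +x.
    """
    phi = np.radians(frame["phi"].to_numpy())
    theta = np.radians(frame["theta"].to_numpy())
    r = frame["magnitude"].to_numpy()
    x = r * np.sin(phi) * np.sin(theta)
    y = r * np.sin(phi) * np.cos(theta)
    z = r * np.cos(phi)
    return np.column_stack([x, y, z])


cartesian = spherical_to_cartesian(rows)
print(np.round(cartesian, 4))
```

The output should match the first two rows of my_vectors shown at the top of the page to about four decimal places. That agreement is how we checked the assumed convention.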

We can, of course, perform more complicated filtering tasks to extract the vectors from multiple shells. For example, to select all vectors in the fifth shell or higher, we can simply run:

labelled_vectors[labelled_vectors["shell"] >= 4]

	phi	theta	magnitude	shell	ring	bin
7313	94.603439	56.255863	0.682123	4	28	27
7377	132.959873	325.894683	0.634843	4	39	115
7480	97.308252	44.798522	0.638771	4	29	21
7694	89.834346	86.295302	0.653199	4	26	41
10789	147.005123	74.227381	0.668650	4	43	19
...	...	...	...	...	...	...
199995	70.754479	309.190779	0.922267	5	21	141
199996	35.055502	350.961121	0.900928	5	11	101
199997	28.196955	353.993542	0.763798	4	9	85
199998	64.847612	346.951053	0.637718	4	19	151
199999	43.874659	356.901498	0.877418	5	13	118

63243 rows × 6 columns

We can get the remaining vectors, those in the first four shells, by running:

labelled_vectors[labelled_vectors["shell"] < 4]

	phi	theta	magnitude	shell	ring	bin
0	140.922763	11.595749	0.105038	0	41	3
1	96.050088	71.119103	0.211612	1	28	34
2	68.662637	58.307407	0.183816	1	20	26
3	108.637973	58.402654	0.283404	1	32	26
4	101.600750	38.565368	0.301634	1	30	18
...	...	...	...	...	...	...
199975	40.465908	11.782549	0.529120	3	12	3
199980	88.269805	10.719323	0.542940	3	26	5
199986	51.338357	8.719554	0.540043	3	15	3
199991	73.835240	337.467390	0.472457	2	22	157
199992	45.375038	354.394809	0.342514	2	14	126

136757 rows × 6 columns
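The two selections above are complements of each other. Every vector is either in shell 4 or higher, or below it, so the row counts add up to the full dataset (63243 + 136757 = 200000). In pandas, the complement of a mask can also be taken with the ~ operator. This guarantees that the two selections never overlap and never miss a row, even for more complicated conditions. A small sketch on made-up shell indices:

```python
import pandas as pd

# Made-up shell indices standing in for labelled_vectors["shell"].
labelled = pd.DataFrame({"shell": [0, 1, 4, 5, 4, 3, 2, 7]})

high_mask = labelled["shell"] >= 4

high = labelled[high_mask]   # shells 4 and above
low = labelled[~high_mask]   # everything else, i.e. shells 0 to 3

print(len(high), len(low))  # 4 4
```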

We can also combine both conditions with the & operator to select a range of shells, here shells 4 to 6 inclusive:

labelled_vectors[
    # Parentheses are needed because & binds more tightly than <=.
    (4 <= labelled_vectors["shell"]) & (labelled_vectors["shell"] <= 6)
]

	phi	theta	magnitude	shell	ring	bin
7313	94.603439	56.255863	0.682123	4	28	27
7377	132.959873	325.894683	0.634843	4	39	115
7480	97.308252	44.798522	0.638771	4	29	21
7694	89.834346	86.295302	0.653199	4	26	41
10789	147.005123	74.227381	0.668650	4	43	19
...	...	...	...	...	...	...
199995	70.754479	309.190779	0.922267	5	21	141
199996	35.055502	350.961121	0.900928	5	11	101
199997	28.196955	353.993542	0.763798	4	9	85
199998	64.847612	346.951053	0.637718	4	19	151
199999	43.874659	356.901498	0.877418	5	13	118

61166 rows × 6 columns
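pandas also offers Series.between(), which includes both endpoints by default. It lets us write the range selection above more compactly. A minimal sketch with made-up shell indices:

```python
import pandas as pd

labelled = pd.DataFrame({"shell": [0, 3, 4, 5, 6, 7, 9]})

# Two explicit comparisons combined with & (each wrapped in parentheses).
explicit = labelled[(4 <= labelled["shell"]) & (labelled["shell"] <= 6)]

# Series.between() includes both endpoints by default.
compact = labelled[labelled["shell"].between(4, 6)]

print(compact["shell"].tolist())  # [4, 5, 6]
```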

We can also work directly based on the magnitude values, ignoring the shell indices altogether:

labelled_vectors[labelled_vectors["magnitude"] <= 0.25]

	phi	theta	magnitude	shell	ring	bin
0	140.922763	11.595749	0.105038	0	41	3
1	96.050088	71.119103	0.211612	1	28	34
2	68.662637	58.307407	0.183816	1	20	26
6	125.068594	92.790300	0.240778	1	37	36
9	127.274274	41.427476	0.188289	1	37	16
...	...	...	...	...	...	...
199362	71.228212	38.097520	0.244133	1	21	17
199432	45.116698	8.053522	0.191675	1	13	2
199602	45.084514	333.047845	0.232529	1	13	111
199965	23.988923	59.811282	0.212724	1	7	11
199971	47.203126	12.767886	0.238363	1	14	4

31926 rows × 6 columns
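A magnitude threshold can also be applied before any binning, directly on the NumPy array of Cartesian vectors. To do so, compute each row's Euclidean norm with np.linalg.norm(..., axis=1). The three vectors below are copied from outputs shown earlier on this page. With the real data, you would pass my_vectors instead. The resulting norms should agree with the magnitude column of labelled_vectors.

```python
import numpy as np

vectors = np.array(
    [
        [0.01330902, 0.06486094, -0.08154041],  # magnitude ~0.105
        [0.19911095, 0.06809676, -0.02230348],  # magnitude ~0.212
        [-0.13033349, 0.56234415, 0.27104764],  # magnitude ~0.638
    ]
)

# Euclidean length of each row (one value per vector).
magnitudes = np.linalg.norm(vectors, axis=1)

# Keep only the vectors whose length is at most 0.25.
short_vectors = vectors[magnitudes <= 0.25]
print(short_vectors.shape)  # (2, 3)
```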

For more details on boolean indexing, see the “Indexing and selecting data” section of the pandas user guide.

Orientation Filtering - Single Bin

Orientation presents a bit more of a challenge. At a basic level, we can filter by orientation in the same way. Say we want all vectors contained in the ring with index 15 and in the bin with index 10 within that ring. Once again, we can use pandas directly:

filtered_vectors = labelled_vectors[
    (labelled_vectors["ring"] == 15) & (labelled_vectors["bin"] == 10)
]

filtered_vectors

	phi	theta	magnitude	shell	ring	bin
2186	51.280935	28.434182	0.270519	1	15	10
4679	49.549383	27.674359	0.293125	1	15	10
8585	51.977364	28.753270	0.377855	2	15	10
32739	51.187734	28.800747	0.110106	0	15	10
66625	49.406903	28.639178	0.419409	2	15	10
...	...	...	...	...	...	...
195255	52.142043	28.854474	0.593782	3	15	10
195918	50.777303	27.513780	0.540157	3	15	10
197121	49.018275	28.045500	0.579771	3	15	10
198464	49.466779	27.042250	0.749263	4	15	10
199896	48.750686	28.114370	0.765183	4	15	10

182 rows × 6 columns
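When several index columns are involved, DataFrame.query() can make the condition easier to read. The query string uses and, whereas the bracket form needs & plus parentheses around each comparison. A sketch on made-up rows:

```python
import pandas as pd

labelled = pd.DataFrame(
    {
        "shell": [1, 1, 2, 0, 3],
        "ring": [15, 15, 15, 16, 15],
        "bin": [10, 9, 10, 10, 10],
    }
)

# Bracket form: each comparison in parentheses, combined with &.
with_brackets = labelled[(labelled["ring"] == 15) & (labelled["bin"] == 10)]

# Equivalent query() form.
with_query = labelled.query("ring == 15 and bin == 10")

print(with_query.index.tolist())  # [0, 2, 4]
```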

In practice, this isn’t very helpful: it’s hard to know intuitively which ring and bin correspond to a specific orientation.

The good news is that we can use other tools in VectoRose to convert between angles and face index information.

Let’s plot the orientation histogram for our dataset to find the orientations of interesting features. To help, we’ll add angular \(\phi\) and \(\theta\) axes using the method SpherePlotter.add_spherical_axes().

orientation_histogram = my_sphere.construct_marginal_orientation_histogram(
    labelled_vectors
)

orientation_histogram_mesh = my_sphere.create_shell_mesh(orientation_histogram)

sphere_plotter = vr.plotting.SpherePlotter(orientation_histogram_mesh)
sphere_plotter.produce_plot()
sphere_plotter.add_spherical_axes()
sphere_plotter.show()


Using these axes, we can locate the two clusters in the dataset. The upper cluster appears to be centred around \(\phi = 55^\circ\) and \(\theta = 0^\circ\).

Let’s extract all the vectors that fall into the bin containing this orientation. To do this, we create a unit vector pointing in that direction, convert it to Cartesian coordinates, and pass it to SphereBase.assign_histogram_bins() to find the bin that contains it.

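# Orientation of interest as (phi, theta), in degrees.
my_spherical_coordinates = np.array([55, 0])
my_cartesian_coordinates = vr.util.convert_spherical_to_cartesian_coordinates(
    my_spherical_coordinates, radius=1, use_degrees=True
)

# Label this single unit vector to find the bin that contains the orientation.
my_bin, _ = my_sphere.assign_histogram_bins(my_cartesian_coordinates)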

my_bin

	phi	theta	magnitude	shell	ring	bin
0	55.0	0.0	1.0	4	16	0

Now we see that the vectors we want to filter are found in ring 16, bin 0. To extract these vectors, we can once again use the indexing features from pandas:

vectors_in_cell = labelled_vectors.loc[
    (labelled_vectors["ring"] == my_bin.loc[0, "ring"])
    & (labelled_vectors["bin"] == my_bin.loc[0, "bin"])
]

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell

We have extracted 343 vectors!

	phi	theta	magnitude	shell	ring	bin
10069	55.106892	1.252508	0.319393	2	16	0
25419	53.058520	2.192422	0.343976	2	16	0
25851	53.435031	1.138519	0.333352	2	16	0
100094	52.227808	0.193758	0.530046	3	16	0
100245	54.934177	1.098404	1.140148	7	16	0
...	...	...	...	...	...	...
197204	55.365899	2.073308	0.875228	5	16	0
197213	52.904454	2.362285	0.480103	3	16	0
198577	53.546724	0.102937	0.582745	3	16	0
199833	55.281786	0.793643	0.542250	3	16	0
199854	53.245919	0.564946	0.499349	3	16	0

343 rows × 6 columns

Caution

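Don’t forget the row label 0 in my_bin.loc[0, "ring"]. Without it, my_bin["ring"] is a one-row Series rather than a scalar. pandas then raises a ValueError, because it cannot compare two Series with different indexes. We need the scalar ring and bin values for the specific face of interest.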
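To see why the scalar lookup matters, compare the two versions below on made-up data. Here my_bin is a one-row DataFrame shaped like the output of assign_histogram_bins(). Selecting a column without the row label gives a Series indexed only by 0. pandas refuses to compare it element-wise with a longer Series that has a different index.

```python
import pandas as pd

labelled = pd.DataFrame({"ring": [16, 15, 16], "bin": [0, 10, 1]})

# One-row DataFrame, shaped like the output of assign_histogram_bins().
my_bin = pd.DataFrame({"ring": [16], "bin": [0]})

# Works: .loc[0, "ring"] returns the scalar 16.
ring_mask = labelled["ring"] == my_bin.loc[0, "ring"]
print(ring_mask.tolist())  # [True, False, True]

# Fails: both operands are Series, and their indexes differ.
comparison_failed = False
try:
    _ = labelled["ring"] == my_bin["ring"]
except ValueError as error:
    comparison_failed = True
    print(f"ValueError: {error}")
```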

Using this approach, we can extract vectors from within single cells. However, the syntax is rather verbose because of the explicit indexing of the ring and the bin. It also only works for Tregenza spheres; triangulated spheres would need a different approach.

Thankfully, we don’t have to go to this trouble! VectoRose includes the helpful method SphereBase.get_vectors_from_single_cell(). This method takes the bin information for a single cell, regardless of the specific sphere implementation, and extracts all the vectors located in that one cell.

Attention

The method SphereBase.get_vectors_from_single_cell() takes in two arguments:

The DataFrame containing the labelled vectors as returned by SphereBase.assign_histogram_bins().
A Series containing the information for the single bin examined. If magnitude information is present, then filtering will automatically happen by orientation and magnitude. If you do not want to filter by magnitude, only pass in the orientation bin information.
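To make the behaviour described above concrete, here is a hypothetical pandas helper, select_single_cell, that follows the same rules. It is an illustration, not VectoRose’s implementation. Treating a "shell" entry as the “magnitude information” is our assumption. The column names shell, ring and bin come from the Tregenza output on this page; other sphere types may label their bins differently.

```python
import pandas as pd

# Bin-index columns this sketch knows about (assumption: Tregenza-style labels).
INDEX_COLUMNS = ("shell", "ring", "bin")


def select_single_cell(labelled: pd.DataFrame, cell: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: keep rows matching every index column in `cell`.

    If `cell` has a "shell" entry, magnitude is filtered as well; if it only
    has "ring" and "bin", only orientation is used. Angle and magnitude
    values in `cell` are ignored.
    """
    mask = pd.Series(True, index=labelled.index)
    for column in INDEX_COLUMNS:
        if column in cell.index:
            mask &= labelled[column] == cell[column]
    return labelled[mask]


labelled = pd.DataFrame(
    {
        "shell": [2, 4, 4, 3, 4],
        "ring": [16, 16, 16, 16, 15],
        "bin": [0, 0, 1, 0, 0],
    }
)

# Shaped like my_bin.iloc[0]: float values, including angles and magnitude.
cell = pd.Series(
    {"phi": 55.0, "theta": 0.0, "magnitude": 1.0,
     "shell": 4.0, "ring": 16.0, "bin": 0.0}
)

print(len(select_single_cell(labelled, cell)))                   # 1
print(len(select_single_cell(labelled, cell[["ring", "bin"]])))  # 3
```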

There’s one catch: SphereBase.assign_histogram_bins() returned a DataFrame, not a Series. We just need to extract its lone row, which we can do easily using the iloc indexer.

my_bin_series = my_bin.iloc[0]

my_bin_series

phi          55.0
theta         0.0
magnitude     1.0
shell         4.0
ring         16.0
bin           0.0
Name: 0, dtype: float64

And now we’re ready to perform the extraction using the call to SphereBase.get_vectors_from_single_cell().

vectors_in_cell = my_sphere.get_vectors_from_single_cell(
    labelled_vectors, my_bin_series
)

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell

We have extracted 110 vectors!

	phi	theta	magnitude	shell	ring	bin
100274	54.440479	1.693340	0.709073	4	16	0
100309	54.129339	1.514807	0.699285	4	16	0
102816	54.449704	1.003119	0.664982	4	16	0
103208	54.554752	0.260319	0.648086	4	16	0
105354	54.642801	1.452821	0.644222	4	16	0
...	...	...	...	...	...	...
188820	52.599966	1.321196	0.768638	4	16	0
190030	52.635190	1.606791	0.746692	4	16	0
190262	52.445586	0.027911	0.659340	4	16	0
192672	55.359747	1.179960	0.764052	4	16	0
195050	54.256079	1.078080	0.711523	4	16	0

110 rows × 6 columns

But, wait! We have fewer rows here! What’s going on?

The answer is that our my_bin_series contains the magnitude shell, so filtering is automatically done by both magnitude and orientation. If we only want to use the orientation, we can pass just the ring and bin entries:

vectors_in_cell = my_sphere.get_vectors_from_single_cell(
    labelled_vectors, my_bin_series[["ring", "bin"]]
)

print(f"We have extracted {len(vectors_in_cell)} vectors!")

vectors_in_cell

We have extracted 343 vectors!

	phi	theta	magnitude	shell	ring	bin
10069	55.106892	1.252508	0.319393	2	16	0
25419	53.058520	2.192422	0.343976	2	16	0
25851	53.435031	1.138519	0.333352	2	16	0
100094	52.227808	0.193758	0.530046	3	16	0
100245	54.934177	1.098404	1.140148	7	16	0
...	...	...	...	...	...	...
197204	55.365899	2.073308	0.875228	5	16	0
197213	52.904454	2.362285	0.480103	3	16	0
198577	53.546724	0.102937	0.582745	3	16	0
199833	55.281786	0.793643	0.542250	3	16	0
199854	53.245919	0.564946	0.499349	3	16	0

343 rows × 6 columns

We now get exactly the same 343 vectors as with the manual pandas indexing above, without having to deal with any quirks of pandas indexing!

This method offers a flexible approach that will work regardless of whether we are working with a TregenzaSphere or a TriangleSphere.

But, what if we want to extract vectors from multiple cells? I’m glad you asked…

Orientation Filtering - Multiple Bins

Let’s say we don’t want to confine our filtering to a single bin. Well, the solution is actually quite easy! Just like we have the method SphereBase.get_vectors_from_single_cell() for getting vectors in a single cell, we have the similar SphereBase.get_vectors_from_selected_cells() to get the vectors contained in multiple cells. It’s actually even easier this time, because we don’t need to convert our cells into a Series. We can directly pass in the DataFrame containing the cell information.
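As with the single-cell case, here is a hypothetical pandas helper, select_cells, that mimics the multi-cell behaviour. It is not VectoRose’s implementation. A row is kept if it matches any row of the cell table on every column of that table. The cell table should therefore contain only index columns, such as ring and bin; including phi or theta would demand exact angle matches.

```python
import numpy as np
import pandas as pd


def select_cells(labelled: pd.DataFrame, cells: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows that match ANY row of `cells`.

    Every column of `cells` must match exactly, so passing only "ring" and
    "bin" filters by orientation alone.
    """
    keep = np.zeros(len(labelled), dtype=bool)
    for _, cell in cells.iterrows():
        # Start from "all rows match" and narrow down column by column.
        match = np.ones(len(labelled), dtype=bool)
        for column in cells.columns:
            match &= (labelled[column] == cell[column]).to_numpy()
        keep |= match  # logical OR across the selected cells
    return labelled[keep]


labelled = pd.DataFrame(
    {
        "shell": [2, 4, 3, 5, 4],
        "ring": [16, 18, 16, 21, 20],
        "bin": [0, 2, 1, 4, 4],
    }
)

# Orientation-only cell table (ring/bin pairs), made up for this sketch.
cells = pd.DataFrame({"ring": [16, 18, 21], "bin": [0, 2, 4]})

selected = select_cells(labelled, cells)
print(selected.index.tolist())  # [0, 1, 3]
```

Looping over cells keeps the logic easy to follow; for very many selected cells, a join on the index columns would scale better.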

Say we want the vectors from a couple of other cells as well, namely the cells containing the orientations \((\phi, \theta) \in \{ (55^\circ, 0^\circ), (60^\circ, 5^\circ), (70^\circ, 10^\circ) \}\).

Well, we can easily get the corresponding cell information, and then extract the vectors using a similar pipeline.

spherical_coordinates = np.array(
    [
        [55, 0],
        [60, 5],
        [70, 10],
    ]
)

cartesian_coordinates = vr.util.convert_spherical_to_cartesian_coordinates(
    spherical_coordinates, radius=1, use_degrees=True
)

bin_indices, _ = my_sphere.assign_histogram_bins(cartesian_coordinates)

# Let's not filter by magnitude
orientation_bins = bin_indices[["ring", "bin"]]

orientation_bins

	ring	bin
0	16	0
1	18	2
2	21	4

Now that we have our bins, let’s extract our vectors!

vectors_in_cells = my_sphere.get_vectors_from_selected_cells(
    labelled_vectors, orientation_bins
)

print(
    f"We have extracted {len(vectors_in_cells)} vectors from the "
    f"{len(orientation_bins)} cells."
)

vectors_in_cells

We have extracted 835 vectors from the 3 cells.

	phi	theta	magnitude	shell	ring	bin
10069	55.106892	1.252508	0.319393	2	16	0
25419	53.058520	2.192422	0.343976	2	16	0
25851	53.435031	1.138519	0.333352	2	16	0
100094	52.227808	0.193758	0.530046	3	16	0
100245	54.934177	1.098404	1.140148	7	16	0
...	...	...	...	...	...	...
195499	69.746920	10.137440	0.780610	4	21	4
195587	69.960752	8.986954	0.898679	5	21	4
196012	72.704219	9.780314	0.841229	5	21	4
196750	72.373390	10.206567	0.879696	5	21	4
198276	70.857858	10.463513	0.623735	3	21	4

835 rows × 6 columns

Seems quite straightforward, eh?

But… what if you don’t know the angles?

Interactively Selecting Histogram Cells

If you don’t know the exact angles of your faces of interest, no problem! VectoRose contains a way to interactively select the histogram cells to use for this process. This interactivity is controlled by the SpherePlotter class.

Warning

The interactive face selector only works when running VectoRose in a local Python shell or using the trame renderer when using a Jupyter notebook. It will not appear in the rendered HTML documentation. We have embedded some videos to illustrate the process, but to get the full experience of this example, please make sure to run the code locally.

To select histogram cells interactively, you must set the property SpherePlotter.cell_picking_active to True. Then, in the interactive plotter, you can select cells by right-clicking them. To deselect a cell, right-click it again.

Tip

Cell selection is done by right-clicking.

When a cell is selected, it will appear to have a thick magenta border.

To clear selected cells, simply call SpherePlotter.clear_picked_cells(). The picked cells are cleared automatically if cell picking is deactivated.

Warning

For reasons potentially beyond our control, it may be difficult to select certain cells (for example, the poles of a Tregenza sphere). We have provided programmatic ways to select cells, shown below. See SpherePlotter.pick_cells() and SphereBase.get_cell_indices() for the two key methods.

Once the cells are picked, you can access the bin information for the selected cells through the property SpherePlotter.picked_cells. The DataFrame provided by this property can then be passed directly to SphereBase.get_vectors_from_selected_cells() to extract the vectors.

Let’s do a demonstration with the three cells we considered earlier, which are still stored in the variable orientation_bins. Since this example is rendered automatically in HTML, we’ll programmatically select the cells.

orientation_bin_cell_indices = my_sphere.get_cell_indices(orientation_bins)

orientation_plotter = vr.plotting.SpherePlotter(
    orientation_histogram_mesh
)
orientation_plotter.produce_plot()
orientation_plotter.cell_picking_active = True
orientation_plotter.pick_cells(orientation_bin_cell_indices)
orientation_plotter.show()

Now that we have our cells picked, we can get the cell information and extract the vectors.

picked_cells = orientation_plotter.picked_cells
vectors_in_cells = my_sphere.get_vectors_from_selected_cells(
    labelled_vectors, picked_cells
)

print(
    f"Extracted {len(vectors_in_cells)} vectors from the "
    f"{len(picked_cells)} picked cells"
)

vectors_in_cells

Extracted 835 vectors from the 3 picked cells

	phi	theta	magnitude	shell	ring	bin
10069	55.106892	1.252508	0.319393	2	16	0
25419	53.058520	2.192422	0.343976	2	16	0
25851	53.435031	1.138519	0.333352	2	16	0
100094	52.227808	0.193758	0.530046	3	16	0
100245	54.934177	1.098404	1.140148	7	16	0
...	...	...	...	...	...	...
195499	69.746920	10.137440	0.780610	4	21	4
195587	69.960752	8.986954	0.898679	5	21	4
196012	72.704219	9.780314	0.841229	5	21	4
196750	72.373390	10.206567	0.879696	5	21	4
198276	70.857858	10.463513	0.623735	3	21	4

835 rows × 6 columns

Using this approach, we can easily filter vectors based on user-defined cells of interest.

Other Filtering Approaches

As we mentioned above, this tutorial is not a comprehensive guide on vector filtering. There are many approaches that we haven’t gone into here. For example, by computing arc lengths using the function util.compute_arc_lengths(), it is possible to filter vectors based on angular distance from a reference orientation. This could be useful for separating the two clusters in our dataset. Many of these approaches rely more heavily on the capabilities of pandas, rather than new features introduced by VectoRose.
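Here is a rough illustration of angular-distance filtering in plain NumPy. It does not use util.compute_arc_lengths(), whose signature we don’t reproduce here. The angle between each vector and a reference direction comes from the normalised dot product, and we keep the vectors within a chosen angle. The reference is the unit vector for \(\phi = 55^\circ\), \(\theta = 0^\circ\). It is written in the angle convention we inferred from the rows printed at the top of this page, which is an assumption. The 15° threshold and the sample vectors are arbitrary.

```python
import numpy as np


def angular_distance_deg(vectors: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Angle in degrees between each row of `vectors` and `reference`."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_reference = reference / np.linalg.norm(reference)
    # Clip to guard against tiny round-off outside [-1, 1] before arccos.
    cosines = np.clip(unit_vectors @ unit_reference, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))


# phi = 55 deg from +z, theta = 0 deg (along +y in the assumed convention).
phi = np.radians(55.0)
reference = np.array([0.0, np.sin(phi), np.cos(phi)])

vectors = np.array(
    [
        [0.0, 0.8, 0.6],    # about 2 degrees from the reference
        [0.0, 0.0, -1.0],   # pointing away
        [0.1, 0.5, 0.4],    # about 10 degrees from the reference
    ]
)

distances = angular_distance_deg(vectors, reference)

# Keep the vectors within 15 degrees of the reference direction.
near_reference = vectors[distances <= 15.0]
print(near_reference.shape)  # (2, 3)
```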

Filters on magnitude and orientation can also be combined, as we saw when SphereBase.get_vectors_from_single_cell() used the shell index together with the ring and bin. Combining these two variables gives much greater control over which vectors are kept for analysis.
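Because magnitude and orientation conditions are both plain boolean masks, they can be combined with & in a single pandas selection. The sketch below uses made-up rows. It keeps the vectors that lie in two chosen orientation cells and in shells 4 to 5.

```python
import pandas as pd

labelled = pd.DataFrame(
    {
        "magnitude": [0.32, 0.71, 0.66, 1.14, 0.78],
        "shell": [2, 4, 4, 7, 4],
        "ring": [16, 16, 21, 16, 18],
        "bin": [0, 0, 4, 0, 2],
    }
)

# Orientation condition: rows in cell (16, 0) or cell (18, 2).
in_cells = (
    ((labelled["ring"] == 16) & (labelled["bin"] == 0))
    | ((labelled["ring"] == 18) & (labelled["bin"] == 2))
)

# Magnitude condition: shells 4 to 5, inclusive.
in_shells = labelled["shell"].between(4, 5)

combined = labelled[in_cells & in_shells]
print(combined.index.tolist())  # [1, 4]
```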

Conclusion

In this guide, we have seen the basics of performing vector filtering using VectoRose and pandas. You can now easily extract vectors from individual histogram cells, or from collections of cells. These operations enable rich analyses of directed data.

Before leaving, let’s close our SpherePlotter objects to release the resources back to the operating system.

sphere_plotter.close()
orientation_plotter.close()

Now, you are equipped to not only construct histograms using VectoRose, but also to select specific data points to analyse further.