---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Loading Vectors into VectoRose

The previous section introduced axial and vectorial data. VectoRose is a
Python package that can be used to visualise and analyse these data. But,
before the data can be examined, the vectors must be loaded into VectoRose.

This page describes how vectors must be formatted and how to import them into
VectoRose. Unlike images, which have very well-defined standards, vectorial
data have yet to be widely standardised. We have tried to define simple,
intuitive formats for representing vectorial data.

## Data Formats and Layout

VectoRose accepts axial and vectorial data in three formats:

1. Binary NumPy files (`*.npy`)
2. Comma-separated value files (`*.csv` or `*.txt`)
3. Excel spreadsheets (`*.xlsx`)

The data must be arranged so that each **row** represents a single vector, and
the columns represent the vector components. This diagram illustrates how the
file should be organised:

![Data table representation showing vectors as rows and components as columns](assets/data_format/VectorFormatting.png)

If the provided vectors contain spatial information, the first three columns
are assumed to represent the vector positions in space, while the last three
columns are assumed to represent the vector components. While these settings
are the default, they can be easily customised to accommodate files produced
by other software tools.

```{attention}
While it is possible to configure these options when loading a file, once a
collection of vectors is open in VectoRose, these are the conventions that are
followed.
```

### NumPy Arrays

NumPy binary array files (see
[here](https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html)
for more detail) are a flexible, efficient way of storing multidimensional arrays.
The important trade-off is that the file format is *binary*, and so you can't
open these files in a text editor.

#### Higher-dimensional Arrays

Unlike spreadsheets and text-based files, the information stored in NumPy
arrays is not restricted to two dimensions. NumPy files can easily store
arrays of any dimension.

An example of a higher-dimensional array is a *vector field*. Since a vector is
defined at each position in 3D space, it may be more intuitive to store these
data in a 4D array, where three of the dimensions represent the spatial
location and the fourth is used to distinguish the vector components.

```{warning}
Currently, only `*.npy` files can be imported. To import an array stored in a
compressed `*.npz` file, extract the constituent arrays and load the specific
`*.npy` file extracted.
```

### CSV Files

CSV files are a **plain-text** format that represents data as a 2D table. Each
line in the file represents a table row. Within a row, columns are separated
by a specific character, such as a comma (`,`), a tab (`\t`), a space (` `) or
a semicolon (`;`). As these files are text-based, they are relatively
lightweight and can easily be opened with a wide variety of editing
software[^text-editors].

### Excel Spreadsheets

Excel spreadsheets are a more sophisticated XML-based format for storing
multiple 2D tables as sheets containing rows and columns. Similar to the other
formats described, rows represent different vectors and columns represent
different vector components.

```{warning}
VectoRose can only open the newer ``*.xlsx`` files. The older ``*.xls``
spreadsheets may not be supported.
```

## Importing Vectors into VectoRose

Once we have vectors in one of the file formats described above, we can load
these vectors into Python using VectoRose.
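
If you do not yet have a file in one of these formats, the following minimal
sketch shows how one might be produced using NumPy alone. It is not part of
VectoRose: the grid size, the random values, the file names and the header
labels are all hypothetical and chosen only for illustration. The sketch
flattens a 4D vector field into the row-per-vector layout described above,
with the positions in the first three columns and the components in the last
three, and then writes the table as both a `*.npy` file and a CSV file.

```python
import os
import tempfile

import numpy as np

# Hypothetical 4D vector field: one 3-component vector at each voxel of a
# 2 x 3 x 4 grid. The values are random and purely illustrative.
rng = np.random.default_rng(0)
field = rng.normal(size=(2, 3, 4, 3))

# Integer (x, y, z) grid index of every voxel, one row per voxel.
positions = np.indices(field.shape[:3]).reshape(3, -1).T

# Flatten the field so that each row holds the components of one vector.
# Both arrays are flattened in the same (C) order, so the rows line up.
components = field.reshape(-1, 3)

# One row per vector: positions first, components last.
table = np.hstack([positions, components])

# Write the table to a temporary directory (hypothetical file names).
out_dir = tempfile.mkdtemp()
npy_path = os.path.join(out_dir, "example_field.npy")
csv_path = os.path.join(out_dir, "example_field.csv")

np.save(npy_path, table)
np.savetxt(
    csv_path, table, delimiter=",", header="x,y,z,vx,vy,vz", comments=""
)

print(table.shape)  # (24, 6)
```

The CSV file written here has a header row and uses commas as separators, so
the corresponding loading options would need to be set when importing it.
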
Before trying to load your vectors, you **must** import the `vectorose`
package into the Python interpreter:

```python
import vectorose as vr
```

```{tip}
To save some time writing code, we recommend using `vr` as a shorthand for
`vectorose`.
```

Vectors are imported using the function
{func}`vectorose.io.import_vector_field`. For example, if your vectors are in
a NumPy array file called {download}`two_clusters.npy <./two_clusters.npy>`,
we can load the vectors by writing:

```{code-cell} ipython3
import vectorose as vr

vectors = vr.io.import_vector_field("two_clusters.npy")

vectors
```

We can now see that we have an array of vectors available to process and
analyse.

There are a number of parameters that can control how the vectors are loaded.
The most important parameters are:

`component_columns`
: Indicate the columns containing the `x,y,z` vector components. By default, the last three columns are considered.

`location_columns`
: Indicate the columns containing the `x,y,z` positions of the vectors in space, in the case of a vector field. If this is set to `None`, then the location coordinates are ignored. By default, the first three columns are considered.

`separator`
: When reading vectors from a **CSV file**, indicate what character is used to separate the columns.

`contains_headers`
: Indicate whether the first row of the file contains column headers, which will be discarded.

`sheet`
: When reading vectors from an **Excel file**, indicate the name or position of the sheet to read.

Regardless of the file type, this function creates a 2D NumPy array with `n`
rows, corresponding to the number of vectors, and either 3 or 6 columns,
depending on whether the location coordinates are read.

## Pre-processing Vectors

Once the vectors are read, there are a number of important pre-processing
steps that can be performed:

{func}`vectorose.util.remove_zero_vectors`
: Remove all vectors with a magnitude of zero from the list.

{func}`vectorose.util.convert_vectors_to_axes`
: Flip all vectors having a negative `z`-component to ensure that all orientations are contained within the upper unit hemisphere.

{func}`vectorose.util.create_symmetric_vectors_from_axes`
: When analysing axial data, generate a pair of antiparallel vectors for each vector in the list.

{func}`vectorose.util.normalise_vectors`
: Return a set of unit vectors having the same orientations/directions as the loaded data.

## Example

We have a collection of vectors in
{download}`two_clusters.csv <./two_clusters.csv>`. Take a look at this file:
the columns are separated by commas, the first row is a header, and there are
no spatial coordinates present.

Let's load these vectors, remove any zero-vectors and convert these vectors
into an axial representation. Here's how we can perform this task:

```{code-cell} ipython3
import vectorose as vr

# Load the vectors from the CSV file
vectors = vr.io.import_vector_field(
    "two_clusters.csv",
    contains_headers=True,
    location_columns=None,
    separator=","
)

print(f"We have loaded {vectors.shape[0]} vectors from the file.")

# Remove zero-magnitude vectors
vectors = vr.util.remove_zero_vectors(vectors)

print(f"We have {vectors.shape[0]} non-zero vectors.")

# Convert to axial data
vectors = vr.util.convert_vectors_to_axes(vectors)

vectors
```

We can now see that we've loaded the vectors, and we've managed to prune quite
a few zero-vectors that we had in our dataset.

```{seealso}
For more details about importing vector fields, check out the documentation on
{mod}`vectorose.io` and for more on pre-processing, consult the page on
{mod}`vectorose.util`.
```

### Bundled Examples

To make the process of loading sample data easier when following along with
the documentation, we have bundled three sample datasets that can be loaded
directly in VectoRose without needing to download any additional files. These
datasets can be accessed via the class {class}`.data.SampleData`.

````{attention}
The sample data are found in the {mod}`vectorose.data` module, which is not
automatically imported. You **must** explicitly import the {mod}`.data`
submodule using:

```python
import vectorose.data
```

Even if the `vr` alias is used for `vectorose`, the full package name should
still be used for this import.
````

In the case of the `two_clusters` dataset, we can open the vectors easily
using the {meth}`.SampleData.load` method of the object
{attr}`.SampleData.TWO_CLUSTERS`:

```{code-cell} ipython3
import vectorose.data

vectors = vr.data.SampleData.TWO_CLUSTERS.load()

print(f"We have loaded {vectors.shape[0]} vectors from the file.")

# Remove zero-magnitude vectors
vectors = vr.util.remove_zero_vectors(vectors)

print(f"We have {vectors.shape[0]} non-zero vectors.")

# Convert to axial data
vectors = vr.util.convert_vectors_to_axes(vectors)

vectors
```

But, loading vectors is just the beginning! Now that we know how to load and
pre-process vectors, we can begin with data visualisation.

[^text-editors]: And we'll politely sit out the fight over which text editor that would be...