Tensors
Tensors are multidimensional arrays of numbers on which element-wise operations such as arithmetic or trigonometry can be performed. Two-dimensional tensors additionally support matrix operations such as matrix multiplication, computing the inverse or transpose, and solving a system of linear equations.
Furthermore, tensors can be created with elements drawn from a specific statistical distribution, or reduced to a single number with an arbitrary expression (for example, to compute the sum or product).
Tensors provide the following features:
- Constant indexing using a number, range, or another (boolean) tensor
- Matrix operations (multiplication, power and inverse, transpose, outer, solving A x = b, least-squares)
- Matrix decompositions like QR, Cholesky
- Constructors to create tensors from arrays, filled with zeros or ones
- Constructors to create a range of numbers, or linear/log/geom space
- Constructors to create an identity matrix or a matrix with a given diagonal
- Reshape, flatten, or concatenate tensors
- Element-wise operations (arithmetic, rounding, trigonometry, comparison)
- Logical operations on boolean tensors
- Reductions using arbitrary expressions
- Statistics (operations and generating statistical distributions)
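To make a few of these features concrete, here is a minimal sketch of element-wise operations, indexing with a boolean tensor, and reductions with an arbitrary expression. It is written in plain Python on flat lists purely for illustration; the helper names `elementwise`, `select`, and `reduce_with` are hypothetical and are not part of the Tensor interface.

```python
from functools import reduce


def elementwise(f, xs, ys):
    """Apply a binary function to matching elements of two equally sized tensors."""
    if len(xs) != len(ys):
        raise ValueError("tensors must have the same number of elements")
    return [f(x, y) for x, y in zip(xs, ys)]


def select(xs, mask):
    """Index a tensor with a boolean tensor: keep the elements whose mask is True."""
    if len(xs) != len(mask):
        raise ValueError("mask must have the same number of elements")
    return [x for x, keep in zip(xs, mask) if keep]


def reduce_with(f, xs, initial):
    """Reduce a tensor to a single number with an arbitrary binary expression."""
    return reduce(f, xs, initial)


values = [1.0, 2.0, 3.0, 4.0]
doubled = elementwise(lambda x, y: x * y, values, [2.0] * 4)

# Element-wise comparison yields a boolean tensor that can be used as an index.
mask = [v > 4.0 for v in doubled]
large = select(doubled, mask)

# The same reduction helper computes a sum or a product, depending on the expression.
total = reduce_with(lambda acc, x: acc + x, large, 0.0)
product = reduce_with(lambda acc, x: acc * x, large, 1.0)
```

Here `large` holds the elements of `doubled` that exceed 4.0, and `total` and `product` are obtained from the same reduction by swapping the expression and its initial value.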
Implementations
Two crates implement the Tensor interface: one uses SIMD instructions on the CPU and the other uses buffers and compute shaders on the GPU:
Crate | Implementation | Evaluation | Tensor size |
---|---|---|---|
orka_tensors_cpu | SIMD instructions on CPU | Eager | Small |
orka_tensors_gpu | Compute shaders on GPU | Lazy | Large |
The SIMD implementation uses x86 SIMD instructions and has the following characteristics:
- Numbers in tensors are always floating-point numbers
- No pointers are used; functions always return a new tensor and never modify their tensor parameters. Operations are therefore evaluated immediately, which leaves little room for optimizations beyond the use of SIMD instructions.
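The eager, value-returning style described above can be illustrated with a small sketch. It is written in Python rather than the library's own language, and the class `EagerTensor` and its methods are hypothetical names chosen for the example, not the actual API. The point is only that every operation immediately produces a new, fully computed tensor and leaves its inputs untouched.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EagerTensor:
    """Immutable tensor: every operation returns a new, fully evaluated tensor."""

    data: tuple

    def __add__(self, other):
        # The sum is computed right here; nothing is deferred.
        return EagerTensor(tuple(a + b for a, b in zip(self.data, other.data)))

    def sqrt(self):
        return EagerTensor(tuple(math.sqrt(a) for a in self.data))


a = EagerTensor((1.0, 4.0, 9.0))
b = EagerTensor((3.0, 5.0, 7.0))

# (a + b) is materialized as a temporary tensor before sqrt is applied to it.
c = (a + b).sqrt()
```

Because the intermediate result of `a + b` already exists before `sqrt` runs, the two steps cannot be fused into a single pass over the data, which is the trade-off the paragraph above refers to.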
The GPU implementation uses compute shaders and stores tensors in buffers on the GPU. On an integrated GPU these buffers may be as small as 128 MiB, but discrete GPUs may support larger buffers of up to 2 GiB.
Furthermore, the GPU implementation builds a directed acyclic graph of operations and materializes the data only at the last possible moment: for example, when one or more elements are retrieved from the tensor with a getter function, or when a sequence of element-wise operations is followed by a matrix operation.
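The idea of deferring work until a value is actually needed can be sketched as follows. This is a simplified CPU-only model in Python, not the GPU implementation: the class `LazyTensor`, its `get` method, and the `evaluations` counter are assumptions made for the example. Operators only record nodes in a graph, and the getter triggers evaluation.

```python
class LazyTensor:
    """Node in a directed acyclic graph of element-wise operations."""

    evaluations = 0  # number of operation nodes materialized so far

    def __init__(self, op, inputs=(), data=None):
        self.op = op
        self.inputs = inputs
        self._data = data  # None until the node has been materialized

    @classmethod
    def constant(cls, values):
        return cls("const", data=tuple(float(v) for v in values))

    def __add__(self, other):
        return LazyTensor("add", (self, other))  # record only, no computation

    def __mul__(self, other):
        return LazyTensor("mul", (self, other))  # record only, no computation

    def materialize(self):
        if self._data is None:
            left, right = (node.materialize() for node in self.inputs)
            if self.op == "add":
                self._data = tuple(a + b for a, b in zip(left, right))
            elif self.op == "mul":
                self._data = tuple(a * b for a, b in zip(left, right))
            else:
                raise ValueError(f"unknown operation: {self.op}")
            LazyTensor.evaluations += 1
        return self._data

    def get(self, index):
        # Retrieving an element forces the graph below this node to be evaluated.
        return self.materialize()[index]


x = LazyTensor.constant([1, 2, 3])
y = LazyTensor.constant([4, 5, 6])

z = (x + y) * x                      # builds the graph; x is shared by two nodes
pending = LazyTensor.evaluations     # nothing has been computed yet

first = z.get(0)                     # forces evaluation of the add and mul nodes
after_get = LazyTensor.evaluations
```

A real implementation could use the recorded graph to fuse the addition and multiplication into a single pass before running it; this sketch evaluates node by node and only demonstrates the deferral.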
Limitations of tensors
All tensors have the following limitations:
- Tensors of three or more dimensions are only partially supported. Element-wise operations are supported, but some matrix operations still need to be modified to handle tensors with 3 or 4 axes.
- Most functions operate on tensors containing floating-point numbers because of the generic parameter of the package Orka.Numerics.Tensors. Certain implementations may support tensors containing booleans or (unsigned) integers.
Dependencies
The SIMD implementation in the orka_tensors_cpu crate requires one of the following x86 extensions: SSE 4.1, AVX, or AVX2.
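One way to check which of these extensions a machine offers is to inspect the CPU flags that Linux reports in /proc/cpuinfo, where they appear as `sse4_1`, `avx`, and `avx2`. The following Python sketch parses such text; the function name `supported_extensions` is hypothetical, and the sample string is made up for the example rather than taken from a real machine.

```python
# Maps the Linux /proc/cpuinfo flag name to the extension name used above.
CPU_FLAGS = {"sse4_1": "SSE 4.1", "avx": "AVX", "avx2": "AVX2"}


def supported_extensions(cpuinfo_text):
    """Return the listed SIMD extensions found in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            return [name for flag, name in CPU_FLAGS.items() if flag in flags]
    return []


# Made-up sample in the /proc/cpuinfo layout.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 avx\n"
found = supported_extensions(sample)
```

On a Linux system the same function can be called with the contents of /proc/cpuinfo; a non-empty result means at least one of the required extensions is available.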
The GPU implementation in orka_tensors_gpu requires OpenGL extensions for SSBOs and compute shaders, plus a few others:
Required OpenGL extensions for the GPU implementation
Extension | Core since OpenGL |
---|---|
ARB_compute_shader | 4.3 |
ARB_compute_variable_group_size | |
ARB_shader_storage_buffer_object | 4.3 |
ARB_shader_clock | |
Most GPUs from 2012 or later should have these extensions if you use a video driver provided by your Linux distribution.
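To verify a particular driver, the list of extensions it reports can be compared against the table above. The sketch below assumes the extension names have been obtained as text, for example from a tool such as glxinfo, and that they carry the usual `GL_` prefix; the function name `missing_extensions` and the sample string are hypothetical.

```python
import re

REQUIRED = (
    "ARB_compute_shader",
    "ARB_compute_variable_group_size",
    "ARB_shader_storage_buffer_object",
    "ARB_shader_clock",
)


def missing_extensions(reported_text):
    """Return the required extensions that are absent from the reported list."""
    # Names may be separated by commas, spaces, or newlines.
    reported = set(re.split(r"[\s,]+", reported_text.strip()))
    return [name for name in REQUIRED if "GL_" + name not in reported]


# Made-up sample of a driver's extension list.
sample = "GL_ARB_compute_shader, GL_ARB_shader_storage_buffer_object, GL_ARB_shader_clock"
missing = missing_extensions(sample)
```

An empty result means all four extensions were found in the reported list.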