Tensors

Tensors are multidimensional arrays of numbers on which element-wise operations such as arithmetic or trigonometry can be performed. Tensors with two dimensions additionally support matrix operations like matrix multiplication, computing the inverse or transpose, or solving a system of linear equations.
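
For example, using a single-precision CPU tensor, element-wise arithmetic and a matrix multiplication might look as follows. This is only a sketch: the package names Orka.Numerics.Singles.Tensors and its CPU child, the CPU_Tensor type, and the constructor and function names are assumptions based on the features described on this page.

    with Ada.Text_IO;

    with Orka.Numerics.Singles.Tensors.CPU;

    procedure Tensors_Sketch is
       use Orka.Numerics.Singles.Tensors;
       use Orka.Numerics.Singles.Tensors.CPU;

       --  Two 1-D tensors with four elements each (To_Tensor and Shape are assumed names)
       A : constant CPU_Tensor := To_Tensor ((1.0, 2.0, 3.0, 4.0));
       B : constant CPU_Tensor := To_Tensor ((5.0, 6.0, 7.0, 8.0));

       --  Element-wise arithmetic returns a new tensor
       C : constant CPU_Tensor := A + 2.0 * B;

       --  A 2 x 2 matrix; on 2-D tensors "*" performs matrix multiplication
       M : constant CPU_Tensor := To_Tensor ((1.0, 2.0, 3.0, 4.0), Shape => (2, 2));
       P : constant CPU_Tensor := M * M.Transpose;
    begin
       Ada.Text_IO.Put_Line (C.Image);
       Ada.Text_IO.Put_Line (P.Image);
    end Tensors_Sketch;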

Furthermore, tensors can be created with a specific statistical distribution or reduced to a single number with an arbitrary expression (for example, to compute the sum or product of all elements).
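
A reduction with an arbitrary expression might be written as in the following sketch. The names CPU_Expression, X, Y, and Reduce are assumptions about how the expression-based reduction is spelled; only the general idea (folding all elements with a user-provided expression) is taken from this page.

    with Ada.Text_IO;

    with Orka.Numerics.Singles.Tensors.CPU;

    procedure Reduce_Sketch is
       use Orka.Numerics.Singles.Tensors;
       use Orka.Numerics.Singles.Tensors.CPU;

       T : constant CPU_Tensor := To_Tensor ((1.0, 2.0, 3.0, 4.0));

       --  Hypothetical expression type describing how two intermediate
       --  values (the accumulator X and the next element Y) are combined
       Sum : constant CPU_Expression := X + Y;
    begin
       --  Fold the whole tensor into a single number: 1 + 2 + 3 + 4 = 10
       Ada.Text_IO.Put_Line (Element'Image (T.Reduce (Sum, 0.0)));
    end Reduce_Sketch;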

Tensors provide the following features:

  • Constant indexing using a number, range, or another (boolean) tensor
  • Matrix operations (multiplication, power and inverse, transpose, outer, solving A x = b, least-squares)
  • Matrix decompositions like QR, Cholesky
  • Constructors to create tensors from arrays, filled with zeros or ones (see the sketch after this list)
  • Constructors to create a range of numbers, or linear/log/geom space
  • Constructors to create an identity matrix, or with a diagonal
  • Reshape, flatten, or concatenate tensors
  • Element-wise operations (arithmetic, rounding, trigonometry, compare)
  • Logical operations on boolean tensors
  • Reductions using arbitrary expressions
  • Statistics (operations and generating statistical distributions)
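
The following sketch shows a few of the constructors and the reshape operation from the list above; the names Zeros, Ones, Linear_Space, Identity, and Reshape and their parameter profiles are assumptions.

    with Ada.Text_IO;

    with Orka.Numerics.Singles.Tensors.CPU;

    procedure Constructors_Sketch is
       use Orka.Numerics.Singles.Tensors;
       use Orka.Numerics.Singles.Tensors.CPU;

       --  Assumed constructor names: Zeros, Ones, Linear_Space, Identity
       Zero_Matrix : constant CPU_Tensor := Zeros ((2, 3));  --  2 x 3 matrix of zeros
       Ones_Vector : constant CPU_Tensor := Ones (4);        --  1-D tensor of four ones
       Steps       : constant CPU_Tensor := Linear_Space (0.0, 1.0, Count => 5);
       Eye         : constant CPU_Tensor := Identity (3);    --  3 x 3 identity matrix

       --  Reshape a 1-D tensor of six numbers into a 2 x 3 matrix
       Reshaped : constant CPU_Tensor :=
         To_Tensor ((1.0, 2.0, 3.0, 4.0, 5.0, 6.0)).Reshape ((2, 3));
    begin
       Ada.Text_IO.Put_Line (Reshaped.Image);
    end Constructors_Sketch;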

Implementations

Two crates implement the Tensor interface: one uses SIMD instructions on the CPU and the other uses buffers and compute shaders on the GPU:

Crate             Implementation            Evaluation  Tensor size
orka_tensors_cpu  SIMD instructions on CPU  Eager       Small
orka_tensors_gpu  Compute shaders on GPU    Lazy        Large
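
Which implementation is used follows from the packages you with and the tensor type you declare. The package and type names in this sketch are assumptions about how the two crates are organized:

    with Orka.Numerics.Singles.Tensors.CPU;  --  assumed package provided by orka_tensors_cpu
    with Orka.Numerics.Singles.Tensors.GPU;  --  assumed package provided by orka_tensors_gpu

    procedure Implementations_Sketch is
       package CPU_Tensors renames Orka.Numerics.Singles.Tensors.CPU;
       package GPU_Tensors renames Orka.Numerics.Singles.Tensors.GPU;

       --  The same data as an eagerly evaluated CPU tensor ...
       On_CPU : constant CPU_Tensors.CPU_Tensor := CPU_Tensors.To_Tensor ((1.0, 2.0, 3.0));

       --  ... and as a lazily evaluated GPU tensor (requires an OpenGL context)
       On_GPU : constant GPU_Tensors.GPU_Tensor := GPU_Tensors.To_Tensor ((1.0, 2.0, 3.0));
    begin
       null;
    end Implementations_Sketch;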

The SIMD implementation uses x86 SIMD instructions and has the following characteristics:

  • Numbers in tensors are always floating-point numbers

  • No pointers are used; functions always return a new tensor and do not modify their tensor parameters. Operations are therefore evaluated immediately, and there is little room for additional optimizations besides the use of SIMD instructions.
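
A minimal sketch of the second point, again with assumed package and function names: operations return new tensors and leave their arguments untouched.

    with Ada.Text_IO;

    with Orka.Numerics.Singles.Tensors.CPU;

    procedure Eager_Sketch is
       use Orka.Numerics.Singles.Tensors;
       use Orka.Numerics.Singles.Tensors.CPU;

       A : constant CPU_Tensor := To_Tensor ((1.0, 2.0, 3.0));

       --  "+" is evaluated immediately using SIMD instructions and returns
       --  a new tensor; A itself is never modified
       B : constant CPU_Tensor := A + 1.0;
    begin
       Ada.Text_IO.Put_Line (A.Image);  --  still prints 1.0, 2.0, 3.0
       Ada.Text_IO.Put_Line (B.Image);  --  prints 2.0, 3.0, 4.0
    end Eager_Sketch;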

The GPU implementation uses compute shaders and stores tensors in buffers on the GPU. On an integrated GPU these buffers may be as small as 128 MiB, but discrete GPUs may support larger buffers of up to 2 GiB.

Furthermore, the GPU implementation builds a directed acyclic graph of operations and materializes the data only at the last possible moment, for example when one or more elements are retrieved from the tensor with a getter function or when a sequence of element-wise operations is followed by a matrix operation.
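
The sketch below illustrates this lazy behavior. It assumes a GPU_Tensor type with a Get function, and that an OpenGL context and the required compute shaders have already been set up (initialization omitted); all names are assumptions.

    with Ada.Text_IO;

    with Orka.Numerics.Singles.Tensors.GPU;

    procedure Lazy_Sketch is
       use Orka.Numerics.Singles.Tensors;
       use Orka.Numerics.Singles.Tensors.GPU;

       --  Assumes an OpenGL 4.3 context has already been created (omitted here)
       A : constant GPU_Tensor := To_Tensor ((1.0, 2.0, 3.0, 4.0));

       --  These element-wise operations only extend the graph of operations;
       --  no compute shader is dispatched yet
       B : constant GPU_Tensor := (A + 1.0) * 2.0;

       --  Retrieving an element forces the graph to be materialized on the GPU
       Value : constant Element := B.Get (1);
    begin
       Ada.Text_IO.Put_Line (Element'Image (Value));
    end Lazy_Sketch;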

Limitations of tensors

All tensors have the following limitations:

  • Tensors with three or more dimensions are only partially supported: element-wise operations work, but some matrix operations still need to be modified to handle tensors with 3 or 4 axes.

  • Most functions operate on tensors containing floating-point numbers because of the generic parameter of the package Orka.Numerics.Tensors. Certain implementations may support tensors containing booleans or (unsigned) integers.

Dependencies

The SIMD implementation in the orka_tensors_cpu crate requires one of the following x86 extensions: SSE 4.1, AVX, or AVX2.

The GPU implementation in orka_tensors_gpu requires OpenGL extensions for SSBOs and compute shaders, plus a few others:

Required OpenGL extensions for the GPU implementation

Extension                         OpenGL
ARB_compute_shader                4.3
ARB_compute_variable_group_size
ARB_shader_storage_buffer_object  4.3
ARB_shader_clock

Most GPUs from 2012 or later should have these extensions if you use a video driver provided by your Linux distribution.