Constructing Cartesian tensors

Background

In physics and engineering, Cartesian tensors play an important role, often in a “stimulus-response” way, when some quantities in a system of interest have a “directional dependence”. For example, the moment of inertia tensor \(\mathbf{I} \in \mathbb{R}^{3\times3}\) of a physical object relates its angular velocity \(\boldsymbol{\omega} \in \mathbb{R}^{3}\) to its angular momentum \(\mathbf{J} \in \mathbb{R}^{3}\) and rotational kinetic energy \(K \in \mathbb{R}\):

\[\begin{split}\mathbf{J} &= \mathbf{I} \cdot \boldsymbol{\omega}\\ K &= \frac{1}{2} \boldsymbol{\omega} \cdot \mathbf{I} \cdot \boldsymbol{\omega}\,.\end{split}\]

The moment of inertia tensor \(\mathbf{I}\) is a tensor of degree (often also called “rank” or “order”) \(2\). In general, a Cartesian tensor of degree \(\ell\) is a \(3\times3\times\dots\times3\) (\(\ell\) factors) array of numbers (a tensor of degree \(\ell=0\) is just a single number) and can take \(m \leq \ell\) vectors as input to return a new tensor of degree \(\ell - m\).
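As a concrete numerical illustration of this (a minimal sketch with made-up values for \(\mathbf{I}\) and \(\boldsymbol{\omega}\)):

import jax.numpy as jnp

# A made-up moment of inertia tensor (degree 2) and angular velocity (degree 1).
I = jnp.array([
    [2.0, 0.1, 0.0],
    [0.1, 3.0, 0.0],
    [0.0, 0.0, 1.5],
])
omega = jnp.array([0.5, -1.0, 2.0])

J = I @ omega                # feeding m=1 vector gives a tensor of degree 2-1=1
K = 0.5 * omega @ I @ omega  # feeding m=2 vectors gives a tensor of degree 2-2=0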

Formal definition of Cartesian tensors

Cartesian tensors are tensors on vector spaces \(V\) with a given scalar product – in our case, on \(V=\mathbb{R}^3\) with the usual scalar product. The tensors of degree \(\ell\) can be defined as the multilinear functions \(V\times V \times \dots \times V\rightarrow \mathbb{R}\) of \(\ell\) variables in \(V\). We also allow \(\ell=0\) variables; in that case, the function of no variables/tensor of degree \(0\) is just a single number. (Since we have a fixed scalar product, we do not have to distinguish between covariant and contravariant indices, so instead of the “\((p,q)\)-tensors” used in settings without a specified scalar product, Cartesian tensors only have a single degree \(\ell\), corresponding to \(p+q=\ell\) in those settings.) For \(V=\mathbb{R}^3\), a multilinear function of \(\ell\) variables can be given by an \(\ell\)-dimensional array of \(3^\ell\) numbers. There is a natural operation of rotations/reflections \(g\in \mathrm{O}(3)\) on these multilinear functions \(f\); its result \(g f\) is uniquely defined by the property \((g f)(g v_1, \dots, g v_\ell) := f(v_1,\dots,v_\ell)\). This also lets us say what a covariant map from, e.g., point configurations to tensors is, which gives us the notion of covariant tensor-valued features.
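For example, for \(\ell=2\) this action reads \((g T)_{ij} = R_{ia}R_{jb}T_{ab}\) in components, where \(R\) is the rotation matrix of \(g\). The following minimal sketch (with an arbitrary rotation and tensor) checks the defining property numerically:

import jax.numpy as jnp

# Rotation by 90 degrees about the z-axis.
R = jnp.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])
T = jnp.arange(9.0).reshape(3, 3)  # an arbitrary degree-2 tensor

# Apply one factor of R per tensor index: (gT)_{ij} = R_{ia} R_{jb} T_{ab}.
gT = jnp.einsum('ia,jb,ab->ij', R, R, T)

# Check the defining property (gT)(gv, gw) = T(v, w) for sample vectors.
v, w = jnp.array([1.0, 2.0, 3.0]), jnp.array([-1.0, 0.5, 2.0])
print(jnp.allclose((R @ v) @ gT @ (R @ w), v @ T @ w))  # True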

We already said that tensors of degree \(0\) are just numbers. Since we have a fixed scalar product, we can also identify vectors \(v \in V\) with their linear functions \(L(w) := \langle w, v \rangle\) and vice versa, so Cartesian tensors of degree \(1\) can be identified with vectors, and under this identification, also the operations of \(\mathrm{O}(3)\) match. Similarly, a matrix \(A\) or a linear map \(A:V\rightarrow V\) can be identified with the tensor of degree \(2\) given by \(T(v,w) := \langle v, A w \rangle\).

Under this identification, symmetric matrices correspond to tensors of degree \(2\) that satisfy \(T(v,w)=T(w,v)\). In general, a tensor is called symmetric if swapping any two variables does not affect the value. As is known from symmetric matrices and quadratic forms, a symmetric tensor \(T\) of degree \(2\) is already determined by its values \(T(v,v)\), via the polarization identity \(T(v,w) = \frac{1}{2}\left(T(v+w,v+w) - T(v,v) - T(w,w)\right)\).

Traceless symmetric tensors

The notion of the trace of a matrix, \(\mathrm{Tr}(a_{ij}) := \sum_i a_{ii}\), can also be generalized to tensors: For any tensor \(t_{i_1 i_2 \dots i_\ell}\) of degree \(\ell\) and any two index positions \(1\leq n < m \leq \ell\), we can set \(i_n = i_m\) and sum over this shared index; this gives a tensor of degree \(\ell-2\). A tensor is called traceless if all of these tensors of degree \(\ell-2\) are zero.
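For example (a minimal sketch), contracting the first and third index of an arbitrary degree-\(3\) tensor with jnp.einsum yields a degree-\(1\) tensor:

import jax.numpy as jnp

t = jnp.arange(27.0).reshape(3, 3, 3)  # an arbitrary degree-3 tensor

# Set i1 = i3 and sum over the shared index: t_{iji} -> tensor of degree 3-2=1.
contraction = jnp.einsum('iji->j', t)
print(contraction)  # [30. 39. 48.]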

It follows directly from the definitions that any \(g\in \mathrm{O}(3)\) maps symmetric tensors again to symmetric tensors, and one can show that it also maps traceless tensors to traceless tensors, so the subspace of symmetric traceless tensors is also a representation of \(\mathrm{O}(3)\). Its dimension is \(2\ell+1\): a symmetric tensor of degree \(\ell\) on \(\mathbb{R}^3\) has \(\binom{\ell+2}{2}\) independent components, and requiring all traces (which form a symmetric tensor of degree \(\ell-2\)) to vanish imposes \(\binom{\ell}{2}\) conditions, leaving \(\binom{\ell+2}{2} - \binom{\ell}{2} = 2\ell+1\). One can show that this is in fact the irreducible representation of degree \(\ell\) and parity \((-1)^\ell\).

Therefore, the irreps used as features in E3x are mathematically equivalent to symmetric traceless (pseudo)tensors of the corresponding degrees, meaning they transform in the same manner under rotations (and reflections), while being much more compact (requiring only \(2\ell+1\) instead of \(3^\ell\) numbers to store the same information). For this reason, all operations in E3x work with the \((2\ell+1)\)-sized representation of (pseudo)tensors. When predicting a tensorial quantity with a neural network, however, it is often convenient to convert the output features back to a \(3\times3\times\dots\times3\) array of numbers. E3x contains convenience functions that allow an easy back-and-forth transformation between the different representations. Let’s assume the output features predicted by our neural network are \(\mathbf{x} \in \mathbb{R}^{2\times (L+1)^2\times F}\) with \(L=3\) and \(F=4\), and we are interested in a specific tensor \(\mathbf{a} = \mathbf{x}_0^{(3_-)}\) (see here for a reminder on the notation and on feature slicing).

We can extract \(\mathbf{a}\) with the following code snippet:

import jax
import jax.numpy as jnp
import e3x

# Draw random features (in a real application,
# these would be the output of a neural network).
x = jax.random.normal(jax.random.PRNGKey(0), (2, 16, 4))

p = 1  # parity (0=even, 1=odd)
l = 3  # degree
f = 0  # feature index
irrep = x[p, l**2:(l+1)**2, f]  # a in irrep form
print(irrep)
[ 0.564  0.568 -1.476 -1.295 -1.506  0.429  0.768]

To convert \(\mathbf{a}\) to the corresponding traceless symmetric \(3 \times 3 \times 3\) tensor, we can use

tensor = e3x.so3.irreps_to_tensor(irrep, degree=l)  # a in tensor form
print(tensor)
[[[ 0.547  0.145 -0.535]
  [ 0.145 -0.055 -0.334]
  [-0.535 -0.334 -0.492]]

 [[ 0.145 -0.055 -0.334]
  [-0.055 -0.285  0.228]
  [-0.334  0.228  0.14 ]]

 [[-0.535 -0.334 -0.492]
  [-0.334  0.228  0.14 ]
  [-0.492  0.14   0.307]]]
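We can verify that the result is indeed symmetric and traceless (a quick check reusing the variables from above):

# Symmetric: the values are unchanged under any transposition of indices.
print(jnp.allclose(tensor, jnp.transpose(tensor, (1, 0, 2))))  # True
print(jnp.allclose(tensor, jnp.transpose(tensor, (0, 2, 1))))  # True
# Traceless: contracting any two indices gives (numerically) zero.
print(jnp.allclose(jnp.einsum('iij->j', tensor), 0.0, atol=1e-6))  # True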

To go back to the original irrep representation, we can use

irrep_from_tensor = e3x.so3.tensor_to_irreps(tensor, degree=l)
print(irrep_from_tensor)
[ 0.564  0.568 -1.476 -1.295 -1.506  0.429  0.768]

For some applications (e.g., predicting multipole moments), this simple conversion may already be sufficient.

General tensors of degree \(2\)

However, we may be interested in general tensors, not only traceless symmetric ones. For this, it is helpful to think of the desired tensor of degree \(\ell\) as a repeated tensor product of \(\ell\) (three-dimensional) irreps \(\mathbb{1}\). For example, for a general tensor of degree \(2\), we have (see also coupling of irreps)

\[\mathbb{1} \otimes \mathbb{1} = \mathbb{0} \oplus \mathbb{1} \oplus \mathbb{2}\,,\]

meaning that a general \(3 \times 3\) tensor can be constructed from the irreps \(\mathbb{0}\), \(\mathbb{1}\), and \(\mathbb{2}\). First, we collect these irreps from the features. We have to make sure that the parity of the irreps is even/odd when the degree of the (proper) tensor we want to predict is even/odd; otherwise, we would construct a pseudotensor instead of a tensor. For degree \(2\), this means we need irreps with even parity.

tensor_components = []
for l in (0, 1, 2):
  tensor_components.append(x[0, l**2:(l+1)**2, f])
  print(f'l={l}\n', tensor_components[-1], '\n')
l=0
 [-1.304] 

l=1
 [ 0.916 -0.36  -1.243] 

l=2
 [ 0.177  0.72   0.141  1.119 -1.189] 

So far so good, but we’d like to have a \(3 \times 3\) output. To convert the irreps to the correct shape, we can use Clebsch-Gordan coefficients.

cg = e3x.so3.clebsch_gordan(1, 1, 2)
for l in range(3):
  tensor_components[l] = jnp.einsum('...n,lmn->lm',
      tensor_components[l],
      cg[1:4, 1:4, l**2:(l+1)**2]
  )
  print(f'l={l}\n', tensor_components[l], '\n')
l=0
 [[-0.753 -0.    -0.   ]
 [-0.    -0.753 -0.   ]
 [-0.    -0.    -0.753]] 

l=1
 [[ 0.    -0.879  0.255]
 [ 0.879  0.     0.648]
 [-0.255 -0.648  0.   ]] 

l=2
 [[ 0.61   0.509  0.1  ]
 [ 0.509  0.36   0.791]
 [ 0.1    0.791 -0.971]] 

In this form, we can clearly see that the irreps \(\mathbb{0}\), \(\mathbb{1}\), and \(\mathbb{2}\) contribute the trace, the antisymmetric component, and the symmetric traceless component, respectively. The final \(3 \times 3\) tensor is simply the sum of all these components.

tensor = sum(tensor_components)
print(tensor)
[[-0.142 -0.37   0.354]
 [ 1.388 -0.393  1.439]
 [-0.155  0.143 -1.724]]
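As a sanity check (reusing the variables from above), the three contributions can be recovered directly from the summed tensor:

# Trace (isotropic) part, antisymmetric part, and symmetric traceless part.
trace_part = jnp.trace(tensor) / 3 * jnp.eye(3)
antisymmetric_part = 0.5 * (tensor - tensor.T)
symmetric_traceless_part = 0.5 * (tensor + tensor.T) - trace_part

print(jnp.allclose(trace_part, tensor_components[0], atol=1e-5))                # True
print(jnp.allclose(antisymmetric_part, tensor_components[1], atol=1e-5))        # True
print(jnp.allclose(symmetric_traceless_part, tensor_components[2], atol=1e-5))  # True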

Depending on what is known about the quantity of interest, it may be even better to remove certain irreps. For example, if we know that the \(3 \times 3\) tensor we want to predict is symmetric, we could simply omit the contribution from \(\mathbb{1}\) to always obtain symmetric tensors, as shown below. See also here for a fully worked-out example of predicting tensorial quantities using this recipe.
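For example, with the components computed above, summing only the \(\mathbb{0}\) and \(\mathbb{2}\) contributions always gives a symmetric tensor:

# Omit the antisymmetric (degree 1) contribution to obtain a symmetric tensor.
symmetric_tensor = tensor_components[0] + tensor_components[2]
print(jnp.allclose(symmetric_tensor, symmetric_tensor.T))  # True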

General tensors of higher degrees

The general method described above also works for tensors of higher degrees, but more irreps are necessary and the conversion with Clebsch-Gordan coefficients is slightly more complicated. For example, for a degree \(3\) tensor, we have:

\[\begin{split}\mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1} &= (\mathbb{1} \otimes \mathbb{1}) \otimes \mathbb{1} \\ &= (\mathbb{0} \oplus \mathbb{1} \oplus \mathbb{2}) \otimes \mathbb{1} \\ &= (\mathbb{0} \otimes \mathbb{1}) \oplus (\mathbb{1} \otimes \mathbb{1}) \oplus (\mathbb{2} \otimes \mathbb{1}) \\ &= (\mathbb{1}) \oplus (\mathbb{0} \oplus \mathbb{1} \oplus \mathbb{2}) \oplus (\mathbb{1} \oplus \mathbb{2} \oplus \mathbb{3}) \\ &= \mathbb{0} \oplus \mathbb{1} \oplus \mathbb{1} \oplus \mathbb{1} \oplus \mathbb{2} \oplus \mathbb{2} \oplus \mathbb{3}\end{split}\]

This means that in total, we need the following seven irreps to represent an arbitrary \(3\times3\times3\) tensor: \(1 \times \mathbb{0}\), \(3 \times \mathbb{1}\), \(2 \times \mathbb{2}\), and \(1 \times \mathbb{3}\) (as a sanity check, counting components gives \(1\cdot1 + 3\cdot3 + 2\cdot5 + 1\cdot7 = 27 = 3\cdot3\cdot3\) individual numbers). To convert the irreps to the desired \(3\times3\times3\) shape, it is necessary to remember which “coupling path” (see above) led to each irrep (differences between coupling paths are underlined in the table below). We have:

irreps of \(3\times3\times3\) tensor

# | irrep | coupling path
1 | \(\mathbb{0}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{1}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{0}}\)
2 | \(\mathbb{1}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{0}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{1}}\)
3 | \(\mathbb{1}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{1}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{1}}\)
4 | \(\mathbb{1}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{2}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{1}}\)
5 | \(\mathbb{2}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{1}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{2}}\)
6 | \(\mathbb{2}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{2}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{2}}\)
7 | \(\mathbb{3}\) | \(((\mathbb{1}\otimes\mathbb{1})\rightarrow\underline{\mathbb{2}})\otimes\mathbb{1}\rightarrow\underline{\mathbb{3}}\)

Let’s first collect the seven irreps we need (different feature indices are used for irreps with the same degree, because they should be independent). Since the tensor we want to predict has degree \(3\) (odd), we also want irreps with odd parity (irreps with even parity would give us a pseudotensor).

tensor_components = []
for l, f in [(0, 0), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (3, 0)]:
  tensor_components.append(x[1, l**2:(l+1)**2, f])
  print(f'l={l} (feature channel {f})\n', tensor_components[-1], '\n')
l=0 (feature channel 0)
 [0.153] 

l=1 (feature channel 0)
 [0.685 0.468 0.624] 

l=1 (feature channel 1)
 [0.006 0.774 0.732] 

l=1 (feature channel 2)
 [0.671 0.694 1.755] 

l=2 (feature channel 0)
 [ 0.552 -1.03  -0.933 -2.26   0.121] 

l=2 (feature channel 1)
 [-0.266 -1.287  0.564  0.347  1.38 ] 

l=3 (feature channel 0)
 [ 0.564  0.568 -1.476 -1.295 -1.506  0.429  0.768] 

Now, the irreps can be converted to shape \(3\times3\times3\) based on their coupling path (see table above):

coupling_paths = [
  (1, 0), #1
  (0, 1), #2
  (1, 1), #3
  (2, 1), #4
  (1, 2), #5
  (2, 2), #6
  (2, 3), #7
]

cg = e3x.so3.clebsch_gordan(2, 1, 3)
for i, (l1, l2) in enumerate(coupling_paths):
  tensor_components[i] = jnp.einsum('...p,lmn,nop->...lmo',
      tensor_components[i],
      cg[1:4, 1:4, l1**2:(l1+1)**2],
      cg[l1**2:(l1+1)**2, 1:4, l2**2:(l2+1)**2]
  )
  print(f'irrep #{i} (l={l2})\n', tensor_components[i], '\n')
irrep #0 (l=0)
 [[[ 0.     0.     0.   ]
  [ 0.     0.     0.063]
  [ 0.    -0.063  0.   ]]

 [[ 0.     0.    -0.063]
  [ 0.     0.     0.   ]
  [ 0.063  0.     0.   ]]

 [[ 0.     0.063  0.   ]
  [-0.063  0.     0.   ]
  [ 0.     0.     0.   ]]] 

irrep #1 (l=1)
 [[[0.395 0.27  0.36 ]
  [0.    0.    0.   ]
  [0.    0.    0.   ]]

 [[0.    0.    0.   ]
  [0.395 0.27  0.36 ]
  [0.    0.    0.   ]]

 [[0.    0.    0.   ]
  [0.    0.    0.   ]
  [0.395 0.27  0.36 ]]] 

irrep #2 (l=1)
 [[[ 0.     0.     0.   ]
  [ 0.387 -0.003  0.   ]
  [ 0.366  0.    -0.003]]

 [[-0.387  0.003  0.   ]
  [ 0.     0.     0.   ]
  [ 0.     0.366 -0.387]]

 [[-0.366  0.     0.003]
  [ 0.    -0.366  0.387]
  [ 0.     0.     0.   ]]] 

irrep #3 (l=1)
 [[[ 0.346 -0.179 -0.453]
  [ 0.269  0.26   0.   ]
  [ 0.68   0.     0.26 ]]

 [[ 0.269  0.26   0.   ]
  [-0.173  0.358 -0.453]
  [ 0.     0.68   0.269]]

 [[ 0.68   0.     0.26 ]
  [ 0.     0.68   0.269]
  [-0.173 -0.179  0.906]]] 

irrep #4 (l=2)
 [[[ 0.     0.     0.   ]
  [-0.467 -1.13   0.07 ]
  [ 0.515  0.311  1.13 ]]

 [[ 0.467  1.13  -0.07 ]
  [ 0.     0.     0.   ]
  [ 0.241 -0.515 -0.467]]

 [[-0.515 -0.311 -1.13 ]
  [-0.241  0.515  0.467]
  [ 0.     0.     0.   ]]] 

irrep #5 (l=2)
 [[[ 0.     0.326  0.743]
  [-0.163  0.1   -0.154]
  [-0.372  0.767 -0.1  ]]

 [[-0.163  0.1   -0.154]
  [-0.2    0.    -0.743]
  [-0.613  0.372  0.163]]

 [[-0.372  0.767 -0.1  ]
  [-0.613  0.372  0.163]
  [ 0.2   -0.326  0.   ]]] 

irrep #6 (l=3)
 [[[ 0.865  0.228 -0.846]
  [ 0.228 -0.087 -0.529]
  [-0.846 -0.529 -0.778]]

 [[ 0.228 -0.087 -0.529]
  [-0.087 -0.45   0.36 ]
  [-0.529  0.36   0.221]]

 [[-0.846 -0.529 -0.778]
  [-0.529  0.36   0.221]
  [-0.778  0.221  0.486]]] 

Again, the different irreps correspond to different contributions to the final \(3\times3\times3\) tensor, which is obtained by summing over all contributions.

tensor = sum(tensor_components)
print(tensor)
[[[ 1.607  0.645 -0.195]
  [ 0.254 -0.86  -0.55 ]
  [ 0.343  0.486  0.509]]

 [[ 0.414  1.406 -0.815]
  [-0.066  0.179 -0.476]
  [-0.838  1.262 -0.2  ]]

 [[-1.419 -0.01  -1.746]
  [-1.445  1.56   1.506]
  [-0.355 -0.013  1.752]]]

Tensors with even higher degrees can be constructed analogously.
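For reference, the necessary coupling paths for any degree can be enumerated programmatically from the selection rule: coupling degree \(l_1\) with degree \(1\) yields the degrees \(|l_1-1| \leq l \leq l_1+1\). The helper below is a minimal sketch of this bookkeeping (it is not part of E3x):

def coupling_paths_for_degree(degree):
  """Enumerates coupling paths for a product of `degree` copies of the irrep 1.

  Each path is a tuple of successively coupled degrees, e.g. for degree 3,
  (1, 0) means ((1 x 1) -> 1) x 1 -> 0.
  """
  paths = [(1,)]  # a single factor is just the irrep of degree 1
  for _ in range(degree - 1):
    paths = [
        path + (l,)
        for path in paths
        # Selection rule: coupling degree l1 with degree 1 yields |l1-1|..l1+1.
        for l in range(abs(path[-1] - 1), path[-1] + 2)
    ]
  return [path[1:] for path in paths]  # drop the initial degree-1 factor

# For degree 3, this reproduces the seven coupling paths from the table above
# (the last entry of each path is the degree of the resulting irrep).
print(coupling_paths_for_degree(3))
# [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2), (2, 3)]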