Linear Algebra

Scalars and Vectors

Scalar: A single number representing magnitude only.
Vector: An ordered array of scalars representing both magnitude and direction.
- Row Vector: $\mathbf{x} = [x_1, x_2, \dots, x_n]$
- Column Vector: $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$

Vector Operations

Transpose: Converts a column vector into a row vector, and a row vector into a column vector. For a column vector $\mathbf{x}$: $\mathbf{x}^T = [x_1, x_2, \dots, x_n]$
Addition: Element-wise addition of two vectors. $\mathbf{x} + \mathbf{y} = [x_1+y_1, x_2+y_2, \dots, x_n+y_n]^T$
Scalar Multiplication: Multiplying every element of the vector by a scalar. $k\mathbf{x} = [kx_1, kx_2, \dots, kx_n]^T$
Inner Product (Dot Product): The sum of the products of corresponding elements; results in a scalar. $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T \mathbf{y} = \sum_{i=1}^n x_i y_i$. The angle $\theta$ between two nonzero vectors satisfies $\cos \theta = \frac{\mathbf{x}^T \mathbf{y}}{\|\mathbf{x}\|_2 \|\mathbf{y}\|_2}$.
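The operations above can be sketched in a few lines of Python. This is a minimal illustration using plain lists and only the standard library; the helper names (`add`, `scale`, `dot`, `angle`) are chosen here for illustration and the vectors are assumed to have equal length.

```python
import math

def add(x, y):
    """Element-wise sum of two equal-length vectors."""
    return [xi + yi for xi, yi in zip(x, y)]

def scale(k, x):
    """Multiply every element of x by the scalar k."""
    return [k * xi for xi in x]

def dot(x, y):
    """Inner product: sum of products of corresponding elements."""
    return sum(xi * yi for xi, yi in zip(x, y))

def angle(x, y):
    """Angle in radians between nonzero x and y, from cos(theta) = x.y / (|x||y|)."""
    cos_theta = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

x = [1.0, 0.0]
y = [1.0, 1.0]
print(add(x, y))                  # [2.0, 1.0]
print(scale(3, x))                # [3.0, 0.0]
print(dot(x, y))                  # 1.0
print(math.degrees(angle(x, y)))  # approximately 45
```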

Vector Norms

A norm is a function that represents the “length” of a vector.

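$L_0$ Norm: The number of non-zero elements in a vector. It is called a norm by convention but is not a true norm, because scaling the vector does not scale its value. $\|\mathbf{x}\|_0 = \#\{i: x_i \neq 0\}$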
$L_1$ Norm (Manhattan Norm): Sum of the absolute values of the elements. $\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|$
$L_2$ Norm (Euclidean Norm): The square root of the sum of the squares of the elements. $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
$L_p$ Norm: The general form for $p \geq 1$, which includes $L_1$ and $L_2$ as special cases. $\|\mathbf{x}\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$
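A short sketch of these norms in plain Python follows. The function names `l0_norm` and `lp_norm` are illustrative choices, and `lp_norm` assumes $p \geq 1$.

```python
def l0_norm(x):
    """Count of non-zero elements."""
    return sum(1 for xi in x if xi != 0)

def lp_norm(x, p):
    """(sum |x_i|^p)^(1/p); p=1 gives the Manhattan norm, p=2 the Euclidean norm."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, 0.0, -4.0]
print(l0_norm(x))     # 2
print(lp_norm(x, 1))  # 7.0
print(lp_norm(x, 2))  # 5.0
```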

Matrices and Tensors

An $m \times n$ matrix is a rectangular array with $m$ rows and $n$ columns. $\mathbb{R}^{m \times n}$ denotes the space of all real-valued $m \times n$ matrices.

Square Matrix: A matrix where the number of rows equals the number of columns.
Diagonal Matrix: A square matrix where all elements outside the main diagonal are zero.
Identity Matrix ($I$): A diagonal matrix where all diagonal elements are 1.

Matrix Multiplication

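Multiplication of $A$ and $B$ is defined only if the number of columns in $A$ equals the number of rows in $B$. If $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, then $C = AB \in \mathbb{R}^{m \times p}$ with entries:

$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$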

Properties:
- Associative: $(AB)C = A(BC)$
- Left Distributive: $A(B+C) = AB + AC$
- Right Distributive: $(B+C)A = BA + CA$
- Non-commutative: $AB \neq BA$ (generally).
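The definition and two of the properties can be checked on small integer matrices. This is a minimal sketch with nested lists; `matmul` is a helper written here for illustration, not a library function.

```python
def matmul(A, B):
    """C[i][j] = sum_k A[i][k] * B[k][j]; requires cols(A) == rows(B)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]

print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]  (differs from AB: non-commutative)
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))  # True (associative)
```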

Matrix Transpose

The transpose $A^T$ of an $m \times n$ matrix is an $n \times m$ matrix where $(A^T)_{ij} = A_{ji}$.

Properties:
- $(A^T)^T = A$
- $(A+B)^T = A^T + B^T$
- $(AB)^T = B^T A^T$
- $(kA)^T = k A^T$
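The product rule $(AB)^T = B^T A^T$ is the least obvious of these, so a quick check on non-square matrices may help. The helpers `transpose` and `matmul` below are illustrative and defined locally.

```python
def transpose(A):
    """Swap rows and columns: result[j][i] = A[i][j]."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3
B = [[1, 0], [0, 1], [2, 2]]    # 3 x 2

print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
```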

Matrix Inverse

For a square matrix $A$, if there exists a matrix $B$ such that $AB = BA = I$, then $B$ is the inverse, denoted as $A^{-1}$.
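As a concrete case, the sketch below inverts a $2 \times 2$ matrix and checks that $AB = BA = I$. It relies on the standard closed-form $2 \times 2$ inverse (swap the diagonal, negate the off-diagonal, divide by the determinant), which is not stated in the text above, and it assumes integer entries so that `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] as exact fractions; assumes integer entries."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)
I = [[1, 0], [0, 1]]

print(matmul(A, A_inv) == I)  # True
print(matmul(A_inv, A) == I)  # True
```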

Other Matrix Operations

Vectorization ($\text{vec}$): Rearranging matrix elements column-wise into a single column vector.
Matrix Inner Product: $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij} = \text{tr}(A^T B)$.
Hadamard Product ($\odot$): Element-wise multiplication of two matrices of the same dimension.
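Kronecker Product ($\otimes$): Each element of $A$ is multiplied by the entire matrix $B$. For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, $A \otimes B$ is the $mp \times nq$ block matrix whose $(i,j)$ block is $A_{ij} B$.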
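The four operations can be sketched with nested lists as follows. The helper names are illustrative, and `hadamard` and `inner` assume both matrices have the same shape.

```python
def vec(A):
    """Stack the columns of A into one flat list (column-major order)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def hadamard(A, B):
    """Element-wise product of two same-shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inner(A, B):
    """Sum of A[i][j] * B[i][j] over all entries."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def kron(A, B):
    """Block matrix whose (i, j) block is A[i][j] * B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(vec(A))          # [1, 3, 2, 4]
print(hadamard(A, B))  # [[0, 2], [3, 0]]
print(inner(A, B))     # 5
print(kron(A, B))      # [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]
```

Note that `inner(A, B)` equals the ordinary dot product of `vec(A)` and `vec(B)`, which is one way to see why the matrix inner product behaves like a vector inner product.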

Tensors

A tensor is a multi-dimensional array, generalizing scalars (0D), vectors (1D), and matrices (2D) to $n$ dimensions.

Matrix Calculus

Common Derivatives

$\frac{\partial (\mathbf{a}^T \mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}$
$\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{x}} = (A + A^T)\mathbf{x}$
$\frac{\partial \text{tr}(AX)}{\partial X} = A^T$
$\frac{\partial \text{tr}(X^T A X)}{\partial X} = (A + A^T)X$
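One way to gain confidence in such identities is to compare them against a finite-difference approximation. The sketch below does this for $\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{x}} = (A + A^T)\mathbf{x}$; the central-difference scheme, the step size `h`, and the test values are assumptions made for illustration.

```python
def quad_form(A, x):
    """f(x) = x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_grad(A, x):
    """(A + A^T) x, the closed-form gradient of x^T A x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

A = [[1.0, 2.0], [0.0, 3.0]]   # deliberately non-symmetric
x = [1.0, -2.0]

print(analytic_grad(A, x))                       # [-2.0, -10.0]
print(numeric_grad(lambda v: quad_form(A, v), x))  # close to the line above
```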

Jacobian and Gradient Matrices

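Jacobian Matrix: For a function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian $J$ is an $m \times n$ matrix of first-order partial derivatives, with $J_{ij} = \frac{\partial f_i}{\partial x_j}$.
Gradient Matrix: For a scalar function $f(X)$, the gradient $\nabla_X f = \frac{\partial f}{\partial X}$ has the same shape as $X$ and is the transpose of the Jacobian matrix $\frac{\partial f}{\partial X^T}$: $\nabla_X f = \left( \frac{\partial f}{\partial X^T} \right)^T$.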
Hessian Matrix: A square matrix of second-order partial derivatives of a scalar-valued function.
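As a small worked illustration (the functions here are made-up examples, not taken from the text above), let $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3$ with $\mathbf{f}(\mathbf{x}) = [x_1 x_2, \; x_1 + x_2, \; x_2^2]^T$. Each row of the Jacobian holds the partial derivatives of one output component, giving a $3 \times 2$ matrix:

$J = \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \\ 0 & 2x_2 \end{bmatrix}$

For the scalar function $g(\mathbf{x}) = x_1^2 x_2$, the gradient and Hessian are:

$\nabla g = \begin{bmatrix} 2 x_1 x_2 \\ x_1^2 \end{bmatrix}, \quad H = \begin{bmatrix} 2 x_2 & 2 x_1 \\ 2 x_1 & 0 \end{bmatrix}$

The Hessian is symmetric here because the mixed partial derivatives of $g$ are continuous.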

Matrix Differentials and Trace

Trace Properties:
- $\text{tr}(A) = \sum_{i} A_{ii}$
- $\text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB)$ (Cyclic property)

Differential Rules:
- $d(A \pm B) = dA \pm dB$
- $d(AB) = (dA)B + A(dB)$
- $d(A^T) = (dA)^T$
- $d(\text{tr}(X)) = \text{tr}(dX)$
- $d(X^{-1}) = -X^{-1}(dX)X^{-1}$

Solving via Differentials

For a scalar function $f(X)$, the differential $df$ and the gradient $\frac{\partial f}{\partial X}$ are related by:

$df = \text{tr}\left( \left( \frac{\partial f}{\partial X} \right)^T dX \right)$

By expanding the differential $df$ and rearranging it into the trace form $\text{tr}(G^T dX)$, the matrix $G$ is identified as the gradient.
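As a worked sketch of this procedure, take $f(X) = \text{tr}(X^T A X)$. The derivation uses the differential rules listed earlier plus the standard fact $\text{tr}(M) = \text{tr}(M^T)$, which is not among the trace properties stated above.

Apply $d(\text{tr}(\cdot)) = \text{tr}(d(\cdot))$, the product rule, and $d(X^T) = (dX)^T$:

$df = \text{tr}\left( (dX)^T A X \right) + \text{tr}\left( X^T A \, dX \right)$

Transpose the argument of the first trace so that $dX$ sits on the right:

$\text{tr}\left( (dX)^T A X \right) = \text{tr}\left( X^T A^T dX \right)$

Combine the two terms:

$df = \text{tr}\left( X^T (A + A^T) \, dX \right) = \text{tr}\left( \left( (A + A^T) X \right)^T dX \right)$

Matching this against $\text{tr}(G^T dX)$ gives $G = (A + A^T)X$, which agrees with the entry for $\text{tr}(X^T A X)$ in the list of common derivatives.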