SMACOF: A Tutorial

1 Introduction

For parameter estimation, if the cost function is quadratic, it is easy to find the solution by setting its derivative to zero. For example, suppose we want to estimate \(y_i\) from input \(\boldsymbol{x}_i\), where \(\boldsymbol{x}_i\) is a row vector. Suppose that from our understanding to the problem, we believe linear least square is a good estimator. That is,

$$ \hat{y}_i = \boldsymbol{x}_i^T \boldsymbol{w}, \label{eq-ls} $$

where row vector \(\boldsymbol{w}\) is the coefficients we want to estimate, and the object function is to minimize the sum of squared residues.

$$ \begin{align} f(\boldsymbol{w}) &= \sum_i{(y_i-\hat{y_i})^2} \nonumber \\ &= \sum_i{(y_i - \boldsymbol{x}_i^T \boldsymbol{w})^2} \nonumber \\ &= (\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{w})^T(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{w}), \label{eq:ls} \end{align} $$

where \(\boldsymbol{Y}=[y_0, y_1, \cdots, y_{N-1}]^T\), \(\boldsymbol{X} = [\boldsymbol{x}_0; \boldsymbol{x}_1; \cdots; \boldsymbol{x}_{N-1}]^T\).

Eq. (\ref{eq:ls}) is quadratic. Its minimum can be found by setting its derivative to zero

$$ \frac{\partial f(\boldsymbol{w})}{\partial \boldsymbol{w}} = 0. $$

That is

$$ \boldsymbol{X}^T(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{w}) = 0. $$


$$ \begin{align} \hat{\boldsymbol{w}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{Y}. \label{eqn:ls} \end{align} $$

However, there are many problem whose cost function is non quadratic. Thus, the above linear least square method can not be applied directly. One workaround is to use Taylor expansion to linearize the object function. Imaging there are 4 sensors inside a room (\(m=[0, 1, 2, 3]\). The location of each sensor is known in advance (i.e., \(\boldsymbol{v}_m = [x_m, y_m]\)). The sensor can measure its distance (i.e., \(r_m\)) to an device, whose location is unknown and needs to be estimated (\(\boldsymbol{\theta} = [x, y]\)). So the distance can be written as

$$ \begin{align} r_m &= \sqrt{(x-x_m)^2 + (y-y_m)^2} + n_m\nonumber \\ &= \left\Vert\boldsymbol{\theta} - \boldsymbol{v}_m\right\Vert + n_m, \label{eq-distance} \end{align} $$

where \(n_m\) is noise or measurement error. Apparently, unlike Eq. (\ref{eq-ls}), it is not a linear function. Recall Taylor expansion

$$ \begin{align} f(x) \approx f(x_0) + (x-x_0)f^\prime(x_0). \end{align} $$

Similarly, Eq. (\ref{eq-distance}) can be written as

$$ \begin{align} r_m \approx \left\Vert\boldsymbol{\theta}_0 - \boldsymbol{v}_m\right\Vert + \frac{\left\langle \boldsymbol{\theta} - \boldsymbol{\theta}_0, \boldsymbol{\theta}_0 - \boldsymbol{v}_m\right\rangle}{\left\Vert\boldsymbol{\theta}_0 - \boldsymbol{v}_m\right\Vert}. \end{align} $$

Now we have an approximate linear equation, Eq. (\ref{eqn:ls}) can be used to estimate the target position, i.e., \(\boldsymbol{\hat{\theta}}\). Then we can use Taylor expansion to get the approximate linear equation around \(\boldsymbol{\hat{\theta}}\). We can keep the above procedure to approach the target position. However, the above procedure does not guarantee to converge. If the initial position is far away from the actual position, it may diverge.


SMACOF (Scaling by MAjorizing a COmplicated Function) uses a smart way to approach the (local) minimal of the non-quadratic cost function with a quadratic surrogate function.

Imaging we have a cost function \(f(\theta)\) to minimize (Fig. 1), which is not quadratic.

Fig.1. Cost function \(f(\theta)\)

As show in Fig. 2, the first step is to get the initial guess \(\theta_0\).

Fig.2. Initial guess \(\theta_0\)

From the initial guess, we construct the quadratic surrogate function (Fig. 3) with initial guess \(\theta_0\), such that

  • \(g(\theta, \theta_0) \geq f(\theta) \), \(\forall \theta\),
  • \(g(\theta_0, \theta_0) = f(\theta_0)\).
Fig.3. Quadratic surrogate function \(g(\theta, \theta_0)\)

Since \(g(\theta, \theta_0)\) is quadratic, its minimum can be calculated by setting its derivative to zero (i.e., \(\theta_1\)).

Fig.4. Calculate the minimum of quadratic surrogate function \(g(\theta, \theta_0)\)

It is easy to see that for the original cost function \(f(\theta)\), \(\theta_1\) is a better estimation than \(\theta_0\); that is

$$ \begin{align} f(\theta_1) &\leq g(\theta_1, \theta_0) \nonumber \\ &\leq g(\theta_0, \theta_0) \nonumber \\ &= f(\theta_0). \end{align} $$

Now, we are back to the similar situation as the initial condition shown in Fig. 2. And fortunately, we are close to the (local) optimal solution now.

Fig.5. Cost function \(f(\theta)\), and the new guess \(\theta_1\)

Same as Fig. 3, we construct a surrogate function with \(\theta_1\), that is \(g(\theta, \theta_1)\), such that

  • \(g(\theta, \theta_1) \geq f(\theta)\), \(\forall \theta\)
  • \(g(\theta_1, \theta_1) = f(\theta_1)\)
Fig.6. Quadratic surrogate function \(g(\theta, \theta_1)\)

Similarly, set the derivative of \(g(\theta, \theta_1)\) to zero to find its minimum \(\theta_2\). It is easy to see that \(f(\theta_2) \leq f(\theta_1)\).

Fig.7. Calculate the minimum of quadratic surrogate function \(g(\theta, \theta_1)\)

Thus an iterative algorithm can be applied to approach the (local) minimal of the cost function:

initial theta[0]
while not converged:
    theta[n] = argmin(g(theta, theta[n-1]))

And after each iteration, the cost function will guarantee to decrease (not increase).

The only problem is to find a quadratic surrogate function \(g(\theta, \theta_0)\) for arbitrary \(\theta_0\), such that

  1. \(g(\theta, \theta_0) \geq f(\theta)\), \(\forall \theta\),
  2. \(g(\theta_0, \theta_0) = f(\theta_0)\).

For example, for the above position estimation problem, the cost function is

$$ \begin{align} f(\boldsymbol{\theta})=\sum_{m=1}^M{\left(r_m-d_m(\boldsymbol{\theta})\right)^2}, \end{align} $$

where \(d_m(\boldsymbol{\theta})=\left\Vert\boldsymbol{\theta}-v_m\right\Vert\).

Let's look at one item in the cost function

$$ \begin{align} f_m(\boldsymbol{\theta}) &= \left(r_m-d_m(\boldsymbol{\theta})\right)^2 \nonumber \\ &= r_m^2 + d_m(\boldsymbol{\theta})^2 - 2r_m d_m(\boldsymbol{\theta}), \end{align} $$

\(r_m^2\) is a constant since it is independent of \(\boldsymbol{\theta{}}\), \(d_m(\boldsymbol{\theta})\) is a quadratic function of \(\boldsymbol{\theta{}}\), and \(2r_md_m(\boldsymbol{\theta})\) is a function of \(\boldsymbol{\theta{}}\), but not quadratic.

So if some proper quadratic surrogate functions can be found for \(2r_md_m(\boldsymbol{\theta})\), then the problem can be solved via iteration as shown above. Since \(r_m\) is the range measurement (observation) between the target and the \(m\)th anchor, \(r_m\geq0\). According to Cauchy-Schwarz inequality

$$ \begin{align} \left\Vert\boldsymbol{\theta}-v_m \right\Vert \geq \frac{\langle \boldsymbol{\theta}-v_m, \boldsymbol{\theta}_0-v_m \rangle}{\left\Vert\boldsymbol{\theta}_0-v_m\right\Vert}, \end{align} $$

where \(\langle a, b\rangle\) denotes the inner product of vectors \(a\) and \(b\) (assume \(\boldsymbol{\theta}\) will not overlap with \(v_m\)). Therefore,

$$ \begin{align} r_md_m(\boldsymbol{\theta})&\geq r_m\frac{\langle \boldsymbol{\theta}-v_m, \boldsymbol{\theta_0}-v_m \rangle}{\left\Vert\boldsymbol{\theta}_0-v_m\right\Vert}, \end{align} $$

Now, we have everything to define the surrogate function

$$ \begin{align} g_m(\boldsymbol{\theta}, \boldsymbol{\theta}_0) = r_m^2 + d_m(\boldsymbol{\theta})^2 - 2r_m\frac{\langle \boldsymbol{\theta}-v_m, \boldsymbol{\theta}_0-v_m \rangle}{\left\Vert\boldsymbol{\theta}_0-v_m\right\Vert}. \end{align} $$

It is easy to verify that

$$ \begin{align} f_m(\boldsymbol{\theta})&\leq g_m(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\\ f_m(\boldsymbol{\theta}_0) &= g_m(\boldsymbol{\theta}_0, \boldsymbol{\theta}_0). \end{align} $$

Similarly, we can construct the quadratic function for each anchor (\(m=1\dots M\)). Then the overall surrogate function is

$$ \begin{align} g(\boldsymbol{\theta}, \boldsymbol{\theta}_0) = \sum_{m=1}^M{g_m(\boldsymbol{\theta}, \boldsymbol{\theta}_0)}. \end{align} $$

Obviously, function \(g(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\) is a quadratic function. Its minimum can be reached by setting its first derivative to zero.