1 Introduction
For parameter estimation, if the cost function is quadratic, the solution is easily found by setting the derivative to zero. For example, suppose we want to estimate \(y_i\) from input \(\boldsymbol{x}_i\), where \(\boldsymbol{x}_i\) is a row vector. Suppose that from our understanding of the problem, we believe linear least squares is a good estimator. That is,
where the row vector \(\boldsymbol{w}\) holds the coefficients we want to estimate, and the objective function to minimize is the sum of squared residuals.
where \(\boldsymbol{Y}=[y_0, y_1, \cdots, y_{N-1}]^T\) and \(\boldsymbol{X} = [\boldsymbol{x}_0; \boldsymbol{x}_1; \cdots; \boldsymbol{x}_{N-1}]\) stacks the row vectors \(\boldsymbol{x}_i\).
Eq. (\ref{eq:ls}) is quadratic. Its minimum can be found by setting its derivative to zero
That is
Thus
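As a sanity check, the closed-form least-squares solution can be computed directly with NumPy. The data below are synthetic, made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N = 100 samples, 3 features, known true coefficients.
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(100, 3))
Y = X @ w_true + 0.01 * rng.normal(size=100)

# Closed-form least squares: solve (X^T X) w = X^T Y rather than
# forming the matrix inverse explicitly, which is numerically safer.
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

In practice `np.linalg.lstsq(X, Y)` is usually preferred, since it also handles rank-deficient \(\boldsymbol{X}\).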
However, many problems have non-quadratic cost functions, so the linear least squares method above cannot be applied directly. One workaround is to use a Taylor expansion to linearize the objective function. Imagine there are 4 sensors inside a room (\(m=0, 1, 2, 3\)). The location of each sensor is known in advance (i.e., \(\boldsymbol{v}_m = [x_m, y_m]\)). Each sensor can measure its distance (i.e., \(r_m\)) to a device whose location is unknown and needs to be estimated (\(\boldsymbol{\theta} = [x, y]\)). So the distance can be written as
where \(n_m\) is noise or measurement error. Clearly, unlike Eq. (\ref{eq-ls}), this is not a linear function. Recall the Taylor expansion
Similarly, Eq. (\ref{eq-distance}) can be written as
Now that we have an approximate linear equation, Eq. (\ref{eqn:ls}) can be used to estimate the target position, i.e., \(\boldsymbol{\hat{\theta}}\). Then we can apply the Taylor expansion again to obtain an approximate linear equation around \(\boldsymbol{\hat{\theta}}\), and repeat this procedure to approach the target position. However, this procedure is not guaranteed to converge: if the initial position is far from the actual position, it may diverge.
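This linearize-and-iterate procedure is essentially the Gauss–Newton method. A minimal sketch for the 4-sensor example, with made-up anchor positions and a noise-free target (assumptions for illustration, not values from the text):

```python
import numpy as np

# Illustrative anchor positions v_m and a true target position; the
# range measurements r_m are generated noise-free for the demo.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
target = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - target, axis=1)

theta = np.array([5.0, 5.0])  # initial guess theta_0
for _ in range(20):
    d = np.linalg.norm(theta - anchors, axis=1)   # d_m(theta)
    J = (theta - anchors) / d[:, None]            # gradient rows of d_m w.r.t. theta
    # Linearized model: r_m ~ d_m(theta) + J_m (theta' - theta);
    # solve the resulting linear least-squares problem for the step.
    delta, *_ = np.linalg.lstsq(J, ranges - d, rcond=None)
    theta = theta + delta
```

With a good initial guess this converges in a few iterations; as noted above, a poor initial guess can make it diverge.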
2 SMACOF
SMACOF (Scaling by MAjorizing a COmplicated Function) uses a clever trick to approach a (local) minimum of a non-quadratic cost function through quadratic surrogate functions.
Imagine we have a cost function \(f(\theta)\) to minimize (Fig. 1), which is not quadratic.
As shown in Fig. 2, the first step is to pick an initial guess \(\theta_0\).
From the initial guess \(\theta_0\), we construct a quadratic surrogate function \(g(\theta, \theta_0)\) (Fig. 3) such that
- \(g(\theta, \theta_0) \geq f(\theta) \), \(\forall \theta\),
- \(g(\theta_0, \theta_0) = f(\theta_0)\).
Since \(g(\theta, \theta_0)\) is quadratic, its minimizer \(\theta_1\) can be calculated by setting its derivative to zero.
It is easy to see that for the original cost function \(f(\theta)\), \(\theta_1\) is a better estimate than \(\theta_0\); that is
Now we are back to a situation similar to the initial one shown in Fig. 2, but fortunately we are now closer to the (local) optimal solution.
Same as Fig. 3, we construct a surrogate function with \(\theta_1\), that is \(g(\theta, \theta_1)\), such that
- \(g(\theta, \theta_1) \geq f(\theta)\), \(\forall \theta\)
- \(g(\theta_1, \theta_1) = f(\theta_1)\)
Similarly, setting the derivative of \(g(\theta, \theta_1)\) to zero yields its minimizer \(\theta_2\), and again \(f(\theta_2) \leq f(\theta_1)\).
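Spelling out why \(f(\theta_2) \leq f(\theta_1)\) holds, using only the two surrogate conditions:

```latex
f(\theta_2) \;\le\; g(\theta_2, \theta_1) \;\le\; g(\theta_1, \theta_1) \;=\; f(\theta_1)
```

The first inequality is the majorization condition \(g \geq f\); the second holds because \(\theta_2\) minimizes \(g(\cdot, \theta_1)\); the final equality is the touching condition.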
Thus an iterative algorithm can be applied to approach the (local) minimum of the cost function:
initial theta[0]
while not converged:
    theta[n] = argmin(g(theta, theta[n-1]))
After each iteration, the cost function is guaranteed not to increase.
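To make the loop concrete, here is a toy 1-D majorize–minimize run. The example, \(f(\theta)=\sum_i|\theta-c_i|\) with the standard quadratic majorizer \(\frac{(\theta-c_i)^2}{2|\theta_0-c_i|}+\frac{|\theta_0-c_i|}{2} \geq |\theta-c_i|\), is my own illustration, not from the text:

```python
import numpy as np

# Toy non-quadratic cost: f(theta) = sum_i |theta - c_i|, minimized on [1, 2].
c = np.array([0.0, 1.0, 2.0, 10.0])

def f(theta):
    return np.abs(theta - c).sum()

theta = 5.0  # initial guess theta_0
history = [f(theta)]
for _ in range(50):
    # Surrogate weights w_i = 1/|theta - c_i|; assumes theta never lands
    # exactly on a c_i (true for this starting point).
    w = 1.0 / np.abs(theta - c)
    # argmin over theta' of g(theta', theta) = sum_i w_i (theta' - c_i)^2 / 2 + const
    theta = (w * c).sum() / w.sum()
    history.append(f(theta))
```

Each iteration solves a simple weighted quadratic problem, and `history` never increases, exactly the descent guarantee described above.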
The only remaining problem is to find a quadratic surrogate function \(g(\theta, \theta_0)\) for arbitrary \(\theta_0\), such that
- \(g(\theta, \theta_0) \geq f(\theta)\), \(\forall \theta\),
- \(g(\theta_0, \theta_0) = f(\theta_0)\).
For example, for the above position estimation problem, the cost function is
where \(d_m(\boldsymbol{\theta})=\left\Vert\boldsymbol{\theta}-\boldsymbol{v}_m\right\Vert\).
Let's look at one term in the cost function
\(r_m^2\) is a constant since it is independent of \(\boldsymbol{\theta}\), \(d_m^2(\boldsymbol{\theta})\) is a quadratic function of \(\boldsymbol{\theta}\), and \(2r_md_m(\boldsymbol{\theta})\) is a function of \(\boldsymbol{\theta}\) but not a quadratic one.
So if a proper quadratic surrogate can be found for \(2r_md_m(\boldsymbol{\theta})\), the problem can be solved via the iteration shown above. Since \(r_m\) is the range measurement (observation) between the target and the \(m\)th anchor, \(r_m\geq0\). By the Cauchy–Schwarz inequality,
where \(\langle a, b\rangle\) denotes the inner product of vectors \(a\) and \(b\) (assuming \(\boldsymbol{\theta}\) does not coincide with \(\boldsymbol{v}_m\)). Therefore,
Now, we have everything to define the surrogate function
It is easy to verify that
Similarly, we can construct a quadratic surrogate for each anchor (\(m=0,\dots,M-1\)). The overall surrogate function is then
Obviously, \(g(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\) is a quadratic function, so its minimum can be found by setting its first derivative to zero.
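Carrying out that differentiation with the surrogate built from the Cauchy–Schwarz bound gives a simple fixed-point update, \(\boldsymbol{\theta}_{k+1} = \frac{1}{M}\sum_{m}\left(\boldsymbol{v}_m + r_m\,\frac{\boldsymbol{\theta}_k-\boldsymbol{v}_m}{d_m(\boldsymbol{\theta}_k)}\right)\). A minimal sketch, with anchor layout and measurements invented for illustration:

```python
import numpy as np

# Illustrative setup: 4 anchors v_m, a true target, and noise-free ranges r_m
# (values are assumptions for the demo, not from the text).
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
target = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - target, axis=1)

def cost(theta):
    return ((ranges - np.linalg.norm(theta - anchors, axis=1)) ** 2).sum()

theta = np.array([9.0, 9.0])  # initial guess theta_0
for _ in range(200):
    d = np.linalg.norm(theta - anchors, axis=1)  # d_m(theta_k)
    # Minimizer of the quadratic surrogate g(theta, theta_k): the average of
    # points placed at range r_m from each anchor, along the current directions.
    theta = (anchors + ranges[:, None] * (theta - anchors) / d[:, None]).mean(axis=0)
```

Each step is guaranteed not to increase the cost, so the iteration cannot diverge, though it may converge only to a local minimum.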