Jekyll2023-06-09T15:10:15+00:00http://gtsam.org/feed.xmlGTSAMGTSAM is a BSD-licensed C++ library that implements sensor fusion for robotics and computer vision using factor graphs.LOST in Triangulation2023-02-04T00:00:00+00:002023-02-04T00:00:00+00:00http://gtsam.org/2023/02/04/lost-triangulation<link rel="stylesheet" href="/assets/css/slideshow.css" />
<p>Authors: <a href="https://akshay-krishnan.github.io">Akshay Krishnan</a>, <a href="https://seal.ae.gatech.edu/node/14">Sebastien Henry</a>, <a href="https://dellaert.github.io/">Frank Dellaert</a>, <a href="https://seal.ae.gatech.edu/node/1">John Christian</a></p>
<div style="display:none"> <!-- custom latex commands here -->
$
\usepackage{amsmath}
\usepackage[makeroom]{cancel}
$
</div>
<style>
.MJXc-display {
overflow-x: auto;
overflow-y: hidden;
scrollbar-width: none; /* Firefox */
-ms-overflow-style: none; /* Internet Explorer 10+ */
}
.MJXc-display::-webkit-scrollbar {
width: 5px;
height: 2px;
}
.MJXc-display::-webkit-scrollbar-track {
background: transparent;
}
.MJXc-display::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.MJXc-display:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
</style>
<p><!-- horizontal scrolling --></p>
<!-- - TOC -->
<p><br /></p>
<p>This post introduces the triangulation problem and some commonly used solutions (both optimal and sub-optimal), along with GTSAM code examples. We also discuss a recently proposed linear optimal triangulation method (LOST) and compare its performance to other methods.</p>
<p><em>TL;DR: LOST can be easily enabled in GTSAM by setting <code class="language-plaintext highlighter-rouge">useLOST=True</code> in <code class="language-plaintext highlighter-rouge">triangulatePoint3</code> to get state-of-the-art linear triangulation!</em></p>
<h2 id="the-triangulation-problem">The triangulation problem</h2>
<p>Triangulation is the problem of estimating the location of a 3D point from multiple direction measurements. The problem is commonly encountered in the domains of 3D reconstruction, robot localization, and spacecraft navigation, amongst others. The “direction measurements” are usually obtained from 2D observations of the same 3D point in different cameras.</p>
<h2 id="a-simple-solution-the-direct-linear-transform">A simple solution: the Direct Linear Transform</h2>
<p>A commonly used method for triangulation is the <strong>Direct Linear Transform (DLT)</strong>, which poses the estimation of the 3D point as a least squares problem. Given the position of camera $i$ in the world frame as $\mathbf{^wp_i}$ and the 2D measurement (in homogeneous coordinates) of the point in this camera as $\mathbf{x_i}$, it uses the following constraint:</p>
\[\begin{equation}
\mathbf{x_i} \times \mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i}) = \mathbf{0}
\label{eq:dlt-constraint}
\end{equation}\]
\[\text{i.e } [\mathbf{x_i} \times] \mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i}) = \mathbf{0}\]
<p>Here:</p>
<ul>
<li>$\mathbf{^wr}$ is the position of the 3D point in the world frame, which is to be estimated</li>
<li>$[\mathbf{x_i} \times]$ is a skew-symmetric matrix with elements $\mathbf{x_i}$, such that $[\mathbf{x_i} \times] \ \mathbf{b} = \mathbf{x_i} \times \mathbf{b}$</li>
<li>$\mathbf{R^i_w} \in SO(3)$ is the rotation from world to camera frame $i$</li>
</ul>
<p>(1) can be expressed in terms of the pixel measurements $\mathbf{u_i} = \mathbf{K_i} \mathbf{x_i}$ using the intrinsic calibration matrix of the camera $\mathbf{K_i}$:</p>
\[\begin{equation}
[\mathbf{K_i^{-1}} \mathbf{u_i} \times] \mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i}) = \mathbf{0}
\end{equation}\]
<p>With at least 2 camera measurements, this can be expressed as a least-squares problem of the form $\mathbf{A x} = \mathbf{b}$ with a unique solution (except in some degenerate conditions). For $n$ measurements, this equation is:</p>
\[\begin{equation}
\left[\begin{array}{c}
\left[\mathbf{K_1^{-1}} \mathbf{u_1} \times\right] \mathbf{R^1_w} \cr \\
\left[\mathbf{K_2^{-1}} \mathbf{u_2} \times\right] \mathbf{R^2_w} \cr \\
... \cr \\
\left[\mathbf{K_n^{-1}} \mathbf{u_n} \times\right] \mathbf{R^n_w} \cr
\end{array}\right] \mathbf{^wr} =
\left[\begin{array}{c}
\left[\mathbf{K_1^{-1}} \mathbf{u_1} \times\right] \mathbf{R^1_w} \mathbf{^wp_1} \cr \\
\left[\mathbf{K_2^{-1}} \mathbf{u_2} \times\right] \mathbf{R^2_w} \mathbf{^wp_2} \cr \\
... \cr \\
\left[\mathbf{K_n^{-1}} \mathbf{u_n} \times\right] \mathbf{R^n_w} \mathbf{^wp_n} \cr
\end{array}\right]
\end{equation}\]
<p>DLT solves the above equation using standard least-squares techniques. It also has interesting connections to the trigonometric sine rule, as discussed in the <a href="https://doi.org/10.2514/1.G006989">LOST paper</a> (<a href="https://arxiv.org/pdf/2205.12197.pdf">Arxiv</a>).</p>
<p>The <code class="language-plaintext highlighter-rouge">triangulatePoint3</code> function in GTSAM provides a convenient way to triangulate points using DLT. Let’s look at an example of triangulating a point from noisy measurements.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">gtsam</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">gtsam</span> <span class="kn">import</span> <span class="n">Pose3</span><span class="p">,</span> <span class="n">Rot3</span><span class="p">,</span> <span class="n">Point3</span>
<span class="c1"># Define parameters for 2 cameras, with shared intrinsics.
</span><span class="n">pose1</span> <span class="o">=</span> <span class="n">Pose3</span><span class="p">()</span>
<span class="n">pose2</span> <span class="o">=</span> <span class="n">Pose3</span><span class="p">(</span><span class="n">Rot3</span><span class="p">(),</span> <span class="n">Point3</span><span class="p">(</span><span class="mf">5.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="o">-</span><span class="mf">5.</span><span class="p">))</span>
<span class="n">intrinsics</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">Cal3_S2</span><span class="p">()</span>
<span class="n">camera1</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">PinholeCameraCal3_S2</span><span class="p">(</span><span class="n">pose1</span><span class="p">,</span> <span class="n">intrinsics</span><span class="p">)</span>
<span class="n">camera2</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">PinholeCameraCal3_S2</span><span class="p">(</span><span class="n">pose2</span><span class="p">,</span> <span class="n">intrinsics</span><span class="p">)</span>
<span class="n">cameras</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">CameraSetCal3_S2</span><span class="p">([</span><span class="n">camera1</span><span class="p">,</span> <span class="n">camera2</span><span class="p">])</span>
<span class="c1"># Define a 3D point, generate measurements by projecting it to the
# cameras and adding some noise.
</span><span class="n">landmark</span> <span class="o">=</span> <span class="n">Point3</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="n">m1_noisy</span> <span class="o">=</span> <span class="n">cameras</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">project</span><span class="p">(</span><span class="n">landmark</span><span class="p">)</span> <span class="o">+</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">Point2</span><span class="p">(</span><span class="mf">0.00817</span><span class="p">,</span> <span class="mf">0.00977</span><span class="p">)</span>
<span class="n">m2_noisy</span> <span class="o">=</span> <span class="n">cameras</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">project</span><span class="p">(</span><span class="n">landmark</span><span class="p">)</span> <span class="o">+</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">Point2</span><span class="p">(</span><span class="o">-</span><span class="mf">0.00610</span><span class="p">,</span> <span class="mf">0.01969</span><span class="p">)</span>
<span class="n">measurements</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">Point2Vector</span><span class="p">([</span><span class="n">m1_noisy</span><span class="p">,</span> <span class="n">m2_noisy</span><span class="p">])</span>
<span class="c1"># Triangulate!
</span><span class="n">dlt_estimate</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">triangulatePoint3</span><span class="p">(</span><span class="n">cameras</span><span class="p">,</span> <span class="n">measurements</span><span class="p">,</span> <span class="n">rank_tol</span><span class="o">=</span><span class="mf">1e-9</span><span class="p">,</span> <span class="n">optimize</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"DLT estimation error: {:.04f}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">dlt_estimate</span> <span class="o">-</span> <span class="n">landmark</span><span class="p">)))</span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DLT estimation error: 0.0832
</code></pre></div></div>
<p>Here is an interactive visualization of the DLT estimate, the two camera frustums, and the back-projected rays. Note the difference between the ground truth and the triangulated point. We shall revisit this example below, and compare it against the result from other methods.</p>
<!-- ==================== Start of visualization HTML ========================== -->
<p><br /></p>
<div>
<script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};
</script>
<script src="https://cdn.plot.ly/plotly-2.8.3.min.js">
</script>
<div id="ff2ecfc1-2997-4c71-8bdd-a6d3dc27274d" class="plotly-graph-div" style="height:100%; width:100%;">
</div>
<script type="text/javascript">
window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("ff2ecfc1-2997-4c71-8bdd-a6d3dc27274d")) { Plotly.newPlot( "ff2ecfc1-2997-4c71-8bdd-a6d3dc27274d", [{"legendgroup":"camera 1","mode":"lines","name":"camera 1","x":[0.0,-0.8,null,0.0,-0.8,null,0.0,0.8,null,0.0,0.8,null,-0.8,-0.8,null,-0.8,0.8,null,0.8,-0.8,null,0.8,0.8,null],"y":[0.0,-0.6,null,0.0,0.6,null,0.0,-0.6,null,0.0,0.6,null,-0.6,0.6,null,0.6,0.6,null,-0.6,-0.6,null,0.6,-0.6,null],"z":[0.0,1.0,null,0.0,1.0,null,0.0,1.0,null,0.0,1.0,null,1.0,1.0,null,1.0,1.0,null,1.0,1.0,null,1.0,1.0,null],"type":"scatter3d"},{"legendgroup":"camera 2","mode":"lines","name":"camera 2","x":[5.0,4.2,null,5.0,4.2,null,5.0,5.8,null,5.0,5.8,null,4.2,4.2,null,4.2,5.8,null,5.8,4.2,null,5.8,5.8,null],"y":[0.0,-0.6,null,0.0,0.6,null,0.0,-0.6,null,0.0,0.6,null,-0.6,0.6,null,0.6,0.6,null,-0.6,-0.6,null,0.6,-0.6,null],"z":[-5.0,-4.0,null,-5.0,-4.0,null,-5.0,-4.0,null,-5.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null],"type":"scatter3d"},{"legendgroup":"GT point","marker":{"size":2.0},"mode":"markers","name":"GT point","x":[0.1],"y":[0.1],"z":[1.5],"type":"scatter3d"},{"legendgroup":"DLT estimate","marker":{"size":2.0},"mode":"markers","name":"DLT estimate","x":[0.1023714151218403],"y":[0.16890260533047632],"z":[1.453409909325872],"type":"scatter3d"},{"legendgroup":"ray 1","marker":{"size":[0,6,0],"symbol":"diamond"},"mode":"lines+markers","name":"ray 1","x":[0.0,0.14967333333333332,null],"y":[0.0,0.15287333333333333,null],"z":[0.0,2.0,null],"type":"scatter3d"},{"legendgroup":"ray 2","marker":{"size":[0,6,0],"symbol":"diamond"},"mode":"lines+markers","name":"ray 2","x":[5.0,-0.3196230769230777,null],"y":[0.0,0.2455223076923077,null],"z":[-5.0,2.0,null],"type":"scatter3d"}], {"template":{"data":{"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"heatmapgl":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmapgl"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"#E5ECF6","showlakes":true,"showland":true,"subunitcolor":"white"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2}}},"margin":{"l":0,"r":0,"t":0,"b":0},"legend":{"y":0.96},"scene":{"aspectmode":"data"}}, {"responsive": true} ) };
</script>
</div>
<p><br />
<!-- ==================== End of visualization HTML ========================== --></p>
<h2 id="the-problem-with-dlt">The problem with DLT</h2>
<p>DLT provides an exact solution to the triangulation problem in the absence of measurement noise. In practice however, the <strong>2D measurements we use are almost always noisy</strong>, so their back-projected rays may no longer intersect in 3D. The DLT, being an unweighted least squares solution, places equal weight on the residual from each measurement (each row of the linear system in (3)). The trouble is that this does not minimize the proper cost function. The covariance of the estimate with respect to a measurement depends on factors like the <em>range</em> of the estimate from the camera (i.e, measurements from closer cameras should be trusted more). Ideally, we should minimize the <em>covariance-weighted</em> reprojection errors on the image plane instead of the residuals of the arbitrarily scaled rows of (3).</p>
<h2 id="optimal-triangulation">Optimal triangulation</h2>
<p>Optimal triangulation computes the maximum likelihood estimate of the weighted residual norm. Consider the common case of noisy measurements \(\mathbf{\tilde{u}_i} = \mathbf{u_i} + \mathbf{n_i}\), where $\mathbf{n_i}$ is Gaussian noise in 2D pixel space with covariance $\mathbf{\Sigma_{u_i}}$. The noise-free measurement $\mathbf{u_i}$ is a function of the unknown point $\mathbf{^wr}$:</p>
\[\begin{equation}
\mathbf{u_i} = \mathbf{\mathit{h_i}(^wr)} = \mathbf{K_i} \mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i})
\end{equation}\]
<!-- where $\mathbf{S}$ is a matrix of the form $[\mathbf{I}_{2 \times 2} \ \mathbf{0}_{2 \times 1}]$. -->
<p>The optimal estimate in an MLE framework is the solution that minimizes the weighted residual norm:</p>
\[\begin{equation}
J(\mathbf{^wr}) = \sum_{i=1}^{n} \mathbf{n_i^T} \mathbf{\Sigma_{u_i}^{-1}} \mathbf{n_i} = \sum_{i=1}^{n} \mathbf{(\tilde{u}_i - \mathbf{\mathit{h_i}(^wr)})^T} \mathbf{\Sigma_{u_i}^{-1}} \mathbf{(\tilde{u}_i - \mathbf{\mathit{h_i}(^wr)})}
\end{equation}\]
<p>For the case of triangulating a point from 2 camera measurements, (5) can be solved analytically as shown by <a href="https://doi.org/10.1006/cviu.1997.0547">Hartley and Strum</a> and the <a href="https://doi.org/10.2514/1.G006989">LOST paper</a> (<a href="https://arxiv.org/pdf/2205.12197.pdf">Arxiv</a>). This usually involves solving a polynomial of degree 6.</p>
<p>We often encounter triangulation problems with more than two measurements where the degree 6 polynomial solution of Hartley & Sturm no longer applies. In such cases, it is common to minimize (5) using an iterative nonlinear solver starting with the DLT estimate as the initialization. In GTSAM, this can be achieved using a factor graph by setting <code class="language-plaintext highlighter-rouge">optimize=True</code> in the call to <code class="language-plaintext highlighter-rouge">triangulatePoint3</code>. This scales to the general case with more than 2 camera measurements, but like all iterative methods, significantly increases the estimation latency.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Optimization needs the measurement noise model.
</span><span class="n">noisemodel</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">noiseModel</span><span class="p">.</span><span class="n">Isotropic</span><span class="p">.</span><span class="n">Sigma</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">1e-3</span><span class="p">)</span>
<span class="n">optimal_estimate</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">triangulatePoint3</span><span class="p">(</span><span class="n">cameras</span><span class="p">,</span> <span class="n">measurements</span><span class="p">,</span> <span class="n">rank_tol</span><span class="o">=</span><span class="mf">1e-9</span><span class="p">,</span> <span class="n">optimize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="n">noisemodel</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Optimal estimation error: {:.04f}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">optimal_estimate</span> <span class="o">-</span> <span class="n">landmark</span><span class="p">)))</span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Optimal estimation error: 0.0549
</code></pre></div></div>
<h2 id="the-linear-optimal-sine-triangulation-lost-approach">The Linear Optimal Sine Triangulation (LOST) approach</h2>
<p>In their paper on <a href="https://doi.org/10.2514/1.G006989">“Absolute Triangulation Algorithms for Space Exploration”</a> (<a href="https://arxiv.org/pdf/2205.12197.pdf">Arxiv</a>), Henry and Christian propose the linear optimal sine triangulation (LOST) method that non-iteratively solves the statistically optimal triangulation problem as a linear system. When only two measurements are available, LOST provides the same answer as Hartley & Sturm’s polynomial solution. However, unlike the polynomial solution, LOST scales linearly to an arbitrary number of measurements and remains non-iterative. LOST is a weighted least squares approach that both (1) provides the same solution as and (2) is significantly faster than the iterative nonlinear optimization approach commonly used for optimal triangulation with many (more than two) cameras.</p>
<p>Their approach minimizes the weighted residual norm in the image-plane coordinates (it can easily be rewritten in terms of pixel coordinates). For a noisy measurement on the image plane $\mathbf{\tilde{x}_i} = \mathbf{x_i} + \mathbf{w_i}$, (1) will have a residual $\mathbf{\epsilon_i}$ given by:</p>
\[\begin{equation}
{\mathbf{\tilde{x}_i}} \times \mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i}) = \mathbf{\epsilon_i}
\end{equation}\]
<p>Since $[\mathbf{R^i_w} (\mathbf{^wr} - \mathbf{^wp_i}) \times]$ is a skew symmetric matrix, and $\mathbf{\tilde{x}_i} = \mathbf{K_i}^{-1} \mathbf{\tilde{u}_i}$</p>
\[\begin{equation}
\mathbf{\epsilon_i} = [\mathbf{R^i_w} (\mathbf{^wp_i} - \mathbf{^wr}) \times] \ \mathbf{K_i^{-1}} \mathbf{\tilde{u}_i}
\end{equation}\]
<p>The weighted residual norm to be minimized is:
\(\begin{equation}
J(\mathbf{^wr}) = \sum_{i=1}^{n} \mathbf{\epsilon_i^T} \mathbf{\Sigma_{\epsilon_i}^{-1}} \mathbf{\epsilon_i}
\end{equation}\)</p>
<p>We skip through the bulk of the math in what follows, so please refer to the <a href="https://doi.org/10.2514/1.G006989">LOST paper</a> (<a href="https://arxiv.org/pdf/2205.12197.pdf">Arxiv</a>) by Henry and Christian for a neat derivation. The error covariance $\mathbf{\Sigma_{\epsilon_i}}$ can be expressed in terms of the 2D measurement covariance $\mathbf{\Sigma_{x_i}}$ as:</p>
\[\begin{equation}
\mathbf{\Sigma_{\epsilon_i}} = -\frac{\rho_i^{2}}{||\mathbf{K_i^{-1}} \mathbf{\tilde{u}_i}||^2} [\mathbf{K_i^{-1}} \mathbf{\tilde{u}_i} \times ] \mathbf{\Sigma_{x_i}} [\mathbf{K_i^{-1}} \mathbf{\tilde{u}_i} \times ]
\end{equation}\]
<p>where $\rho_i$ is the range of the range of the 3D point from the center of camera $i$. The difficulty with this expression for $\mathbf{\Sigma_{\epsilon_i}}$ is that it is not full rank (not invertible) and the ranges $\rho_i$ are not known a priori (since the 3D point’s location is not yet known). The LOST paper shows how both of these difficulties can be avoided and provides a general non-iterative solution. An especially nice result occurs when the image plane measurement errors are isotropic. In this case, minimizing the cost function $J(\mathbf{^wr})$ leads to a simple least squares expression:</p>
\[\begin{equation}
\left[\begin{array}{c}
q_1 \left[\mathbf{K_1^{-1}} \mathbf{u_1} \times\right] \mathbf{R^1_w} \cr \\
q_2 \left[\mathbf{K_2^{-1}} \mathbf{u_2} \times\right] \mathbf{R^2_w} \cr \\
... \cr \\
q_n \left[\mathbf{K_n^{-1}} \mathbf{u_n} \times\right] \mathbf{R^n_w} \cr
\end{array}\right] \mathbf{^wr} =
\left[\begin{array}{c}
q_1 \left[\mathbf{K_1^{-1}} \mathbf{u_1} \times\right] \mathbf{R^1_w} \mathbf{^wp_1} \cr \\
q_2 \left[\mathbf{K_2^{-1}} \mathbf{u_2} \times\right] \mathbf{R^2_w} \mathbf{^wp_2} \cr \\
... \cr \\
q_n \left[\mathbf{K_n^{-1}} \mathbf{u_n} \times\right] \mathbf{R^n_w} \mathbf{^wp_n} \cr
\end{array}\right]
\end{equation}\]
<p>Note that this looks very much like (3), except for the inclusion of the coefficients $q_i$ (weighted least squares, of course!). The final expression for $q_i$ is:</p>
\[\begin{equation}
q_i = \frac{||\mathbf{K_i^{-1}} \mathbf{u_i}||}{\sigma_{x_i} \rho_i} = \frac{|| \mathbf{R^w_i} \mathbf{K_i^{-1}} \mathbf{\tilde{u}_i} \times \ \mathbf{R^w_j} \mathbf{K_j^{-1}} \mathbf{\tilde{u}_j} ||}{\sigma_{x_i} || \mathbf{d_{ij}} \times \ \mathbf{R^w_j} \mathbf{K_j^{-1}} \mathbf{\tilde{u}_j} ||}
\end{equation}\]
<p>where $\mathbf{d_{ij}} = (\mathbf{^wp_j} - \mathbf{^wp_i})$ is the known baseline between cameras $i$ and $j$. Note that everything on the right-hand side of this expression for $q_i$ is known a priori, and so the optimal LOST weights $q_i$ may be found directly and without any iteration.</p>
<p>Starting from release <a href="https://github.com/borglab/gtsam/releases">4.2a8</a>, gtsam includes an implementation of LOST which can be easily used as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">lost_estimate</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">triangulatePoint3</span><span class="p">(</span><span class="n">cameras</span><span class="p">,</span> <span class="n">measurements</span><span class="p">,</span> <span class="mf">1e-9</span><span class="p">,</span> <span class="n">optimize</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="n">noisemodel</span><span class="p">,</span> <span class="n">useLOST</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"LOST estimation error: {:.04f}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">lost_estimate</span> <span class="o">-</span> <span class="n">landmark</span><span class="p">)))</span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOST estimation error: 0.0581
</code></pre></div></div>
<p>The estimation error obtained above is comparable to that obtained by iterative optimization. Although the LOST error in this particular instance is slightly larger than the iterative solution, sometimes the reverse is true (i.e., sometimes LOST is slightly better). Indeed, as we will see in the next section, the statistics of LOST and the iterative LS solution are identical.</p>
<p>The visualization below adds the LOST estimate to the DLT estimate we obtained above. Note how the LOST estimate is much closer to the ground truth (especially along X and Y axes) compared to the DLT estimate.</p>
<!-- ==================== Start of visualization HTML ========================== -->
<p><br /></p>
<div>
<script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
<script src="https://cdn.plot.ly/plotly-2.8.3.min.js"></script>
<div id="b23b6f24-84b0-4d35-bf97-82930bca6c5d" class="plotly-graph-div" style="height:100%; width:100%;">
</div>
<script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("b23b6f24-84b0-4d35-bf97-82930bca6c5d")) { Plotly.newPlot( "b23b6f24-84b0-4d35-bf97-82930bca6c5d", [{"legendgroup":"camera 1","mode":"lines","name":"camera 1","x":[0.0,-0.8,null,0.0,-0.8,null,0.0,0.8,null,0.0,0.8,null,-0.8,-0.8,null,-0.8,0.8,null,0.8,-0.8,null,0.8,0.8,null],"y":[0.0,-0.6,null,0.0,0.6,null,0.0,-0.6,null,0.0,0.6,null,-0.6,0.6,null,0.6,0.6,null,-0.6,-0.6,null,0.6,-0.6,null],"z":[0.0,1.0,null,0.0,1.0,null,0.0,1.0,null,0.0,1.0,null,1.0,1.0,null,1.0,1.0,null,1.0,1.0,null,1.0,1.0,null],"type":"scatter3d"},{"legendgroup":"camera 2","mode":"lines","name":"camera 2","x":[5.0,4.2,null,5.0,4.2,null,5.0,5.8,null,5.0,5.8,null,4.2,4.2,null,4.2,5.8,null,5.8,4.2,null,5.8,5.8,null],"y":[0.0,-0.6,null,0.0,0.6,null,0.0,-0.6,null,0.0,0.6,null,-0.6,0.6,null,0.6,0.6,null,-0.6,-0.6,null,0.6,-0.6,null],"z":[-5.0,-4.0,null,-5.0,-4.0,null,-5.0,-4.0,null,-5.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null,-4.0,-4.0,null],"type":"scatter3d"},{"legendgroup":"GT point","marker":{"size":2.0},"mode":"markers","name":"GT point","x":[0.1],"y":[0.1],"z":[1.5],"type":"scatter3d"},{"legendgroup":"DLT estimate","marker":{"size":2.0},"mode":"markers","name":"DLT estimate","x":[0.1023714151218403],"y":[0.16890260533047632],"z":[1.453409909325872],"type":"scatter3d"},{"legendgroup":"LOST estimate","marker":{"size":2.0},"mode":"markers","name":"LOST estimate","x":[0.10783485812100543],"y":[0.11608849005416458],"z":[1.444684619571376],"type":"scatter3d"},{"legendgroup":"ray 1","marker":{"size":[0,6,0],"symbol":"diamond"},"mode":"lines+markers","name":"ray 1","x":[0.0,0.14967333333333332,null],"y":[0.0,0.15287333333333333,null],"z":[0.0,2.0,null],"type":"scatter3d"},{"legendgroup":"ray 2","marker":{"size":[0,6,0],"symbol":"diamond"},"mode":"lines+markers","name":"ray 2","x":[5.0,-0.3196230769230777,null],"y":[0.0,0.2455223076923077,null],"z":[-5.0,2.0,null],"type":"scatter3d"}], {"template":{"data":{"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"heatmapgl":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmapgl"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"#E5ECF6","showlakes":true,"showland":true,"subunitcolor":"white"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2}}},"margin":{"l":0,"r":0,"t":0,"b":0},"legend":{"y":0.96},"scene":{"aspectmode":"data"}}, {"responsive": true} ) };
</script>
</div>
<p><br />
<!-- ==================== End of visualization HTML ========================== --></p>
<h2 id="how-does-lost-compare-to-dlt-and-iterative-optimization">How does LOST compare to DLT and iterative optimization?</h2>
<p>When triangulating a point several times starting from different noisy measurements, it was found that the standard deviation of the LOST error is comparable to that of iterative optimization, and much lesser than DLT. This improvement reduces with increasing number of camera measurements. The results from 100 trials for each camera configuration are shown in Fig 1.</p>
<center>
<p>
<img src="/assets/images/lost_triangulation/triang_std_dev.png" alt="Fig 1: Error standard deviation for different triangulation methods as a function of number of cameras, with 100 trials for each camera configuration." />
<br />
<em>Fig 1: Error standard deviation for different triangulation methods as a function of number of cameras, with 100 trials for each camera configuration.</em>
</p>
</center>
<p>The latency of LOST is also comparable to DLT and is much lesser than that of iterative optimization. The mean latencies for these methods as a function of the number of cameras are plotted in Fig 2. The latency of iterative optimization increases more rapidly with an increase in the number of camera measurements.</p>
<center>
<p>
<img src="/assets/images/lost_triangulation/triang_time_linear.png" alt="Fig 2: Mean runtime of different triangulation methods as a function of number of cameras." />
<br />
<em>Fig 2: Mean runtime of different triangulation methods as a function of number of cameras.</em>
</p>
</center>
<p>There are two key conclusions from these numerical experiments. First, LOST provides identical triangulation performance (i.e., identical errors) as the iterative nonlinear least squares (DLT+OPT) but at a fraction of the computational cost. There seems to be a strong case for using LOST instead of the conventional iterative methods in most situations - especially when runtime is important. Second, there are certainly some cases (such as this one) where using optimal triangulation provides substantial performance benefits when compared to the DLT.</p>
<h2 id="when-does-using-optimal-triangulation-make-the-most-difference">When does using optimal triangulation make the most difference?</h2>
<p>Is optimal triangulation always better than DLT? There are two cases when it can provide much better results than DLT:</p>
<ul>
<li>The 3D point is at significantly different ranges from each camera that observes it. Since the covariance of the estimate increases with its range from the camera, optimal triangulation weighs measurements from closer cameras more than those from farther cameras. This can also be seen in (12): the weights of each measurement are inversely proportional to the $\rho_i$. The optimal estimates are therefore closer to the measurements from closer cameras.</li>
<li>When different measurements have different 2D noise models. This can happen if the cameras observing these measurements are of different quality, or the measurement noise is non-uniformly distributed across the image (more on the edges and lesser at the center, for instance).</li>
</ul>
<p>In other cases, the results from DLT and optimal triangulation are unlikely to be very different. For highly runtime-constrained applications that encounter the above scenarios, using LOST instead of results from DLT can reduce estimation errors.</p>
<h2 id="references">References</h2>
<ol>
<li><a href="https://doi.org/10.2514/1.G006989">“Absolute Triangulation Algorithms for Space Exploration”</a>, Sébastien Henry, John Christian, Journal of Guidance, Control, and Dynamics 2023 46:1, Pages 21-46. <a href="https://arxiv.org/pdf/2205.12197.pdf">Arxiv version Aug ‘22</a></li>
<li><a href="https://doi.org/10.1006/cviu.1997.0547">“Triangulation”</a>, Richard Hartley, Peter Sturm, Computer Vision and Image Understanding, November 1997, Volume 68, Issue 2, Pages 146-157</li>
</ol>Reducing the uncertainty about the uncertainties, part 1: Linear and nonlinear2021-02-23T00:00:00+00:002021-02-23T00:00:00+00:00http://gtsam.org/2021/02/23/uncertainties-part1<link rel="stylesheet" href="/assets/css/slideshow.css" />
<p>Author: <a href="https://mmattamala.github.io">Matias Mattamala</a></p>
<div style="display:none"> <!-- custom latex commands here -->
$
\usepackage{amsmath}
\usepackage[makeroom]{cancel}
\DeclareMathOperator*{\argmin}{argmin}
\newcommand{\coloneqq}{\mathrel{:=}}
$
</div>
<style>
.MJXc-display {
overflow-x: auto;
overflow-y: hidden;
scrollbar-width: none; /* Firefox */
-ms-overflow-style: none; /* Internet Explorer 10+ */
}
.MJXc-display::-webkit-scrollbar {
width: 5px;
height: 2px;
}
.MJXc-display::-webkit-scrollbar-track {
background: transparent;
}
.MJXc-display::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.MJXc-display:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
</style>
<p><!-- horizontal scrolling --></p>
<!-- - TOC -->
<h2 id="introduction">Introduction</h2>
<p>In these posts I will review some general aspects of optimization-based state estimation methods, and how to input and output consistent quantities and uncertainties, i.e, covariance matrices, from them. We will take a (hopefully) comprehensive tour that will cover <em>why we do the things the way we do</em>, aiming to clarify some <em>uncertain</em> things about working with covariances. We will see how most of the explanations naturally arise by making explicit the definitions and conventions that sometimes we implicitly assume when using these tools.</p>
<p>They summarize and extend some of the interesting discussions we had in the <a href="https://groups.google.com/g/gtsam-users/c/c-BhH8mfqbo/m/7Wsj_nogBAAJ">gtsam-users</a> group. We hope that such space will continue to bring to the table relevant questions shared by all of us.</p>
<h2 id="a-simple-example-a-pose-graph">A simple example: a pose graph</h2>
<p>As a motivation, we will use a similar pose graph to those used in other GTSAM examples:</p>
<p><a name="fg_example"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/motivation.png" alt="Simple pose graph example" />
<figcaption> We consider a robot moving on the 2D plane (top), which has an odometer that provides relative measurements of the robot's displacement. The graph at the bottom represent the factor graph model. Modeling the odometry factor in a consistent way is the main topic we will cover in these posts.</figcaption>
</figure>
<p><br /></p>
<p>To start, we will consider that the variables in the graph \(\mathbf{x}_i\)
correspond to positions \({\mathbf{x}_{i}} = (x,y) \in \mathbb{R}^2\). The variables are related by a linear model given by a transition matrix \(\mathbf{A}_{i}\) and a constant term \(\mathbf{b}_{i} = (b_x, b_y)\), which can be obtained from some sensor such as a wheel odometer. We can then establish the following relationship between variables \(i\) and \(i+1\):</p>
\[\begin{equation}
\mathbf{x}_{i+1} = \mathbf{A}_{i} \mathbf{x}_i + \mathbf{b}_i
\end{equation}\]
<p>However, we know that in reality things do not work that way, and we will usually have errors produced by noise in our sensors or actuators. The most common way to address this problem is adding some <em>zero-mean Gaussian noise</em> \(\eta_i\sim Gaussian(\mathbf{0}_{2\times1}, \Sigma_i)\) to our model to represent this uncertainty:</p>
\[\begin{equation}
\mathbf{x}_{i+1} = \mathbf{A}_{i}\mathbf{x}_i + \mathbf{b}_i + \eta_i
\end{equation}\]
<p>We can recognize here the typical <em>motion</em> or <em>process model</em> we use in Kalman filter for instance, that describes how our state evolves. We say that the noise we introduced on the right side states that our next state \(\mathbf{x}_{i+1}\) will be <em>around</em> \(\mathbf{A}_{i}\mathbf{x}_i + \mathbf{b}_i\), and the covariance matrix \(\Sigma_i\) describes the region where we expect \(\mathbf{x}_{i+1}\) to lie.</p>
<p>We can also notice that with a bit of manipulation, it is possible to establish the following relationship:</p>
\[\begin{equation}
\eta_i = \mathbf{x}_{i+1} - \mathbf{A}_{i}\mathbf{x}_i - \mathbf{b}_i
\end{equation}\]
<p>This is an important expression because we know that the left-hand expression follows a Gaussian distribution. But since we have an equivalence, the right-hand term must do as well. We must note here that what distributes as a Gaussian is neither \(\mathbf{x}_{i}\) nor \(\mathbf{x}_{i+1}\), but the <em>difference</em> \((\mathbf{x}_{i+1} - \mathbf{A}_{i}\mathbf{x}_i - \mathbf{b}_i)\). This allows us to use the difference as an <strong>odometry factor</strong> that relates \(\mathbf{x}_i\) and \(\mathbf{x}_{i+1}\) probabillistically in our factor graph:</p>
\[\begin{equation}
(\mathbf{x}_{i+1} - \mathbf{A}_{i}\mathbf{x}_i - \mathbf{b}_i) \sim Gaussian(\mathbf{0}_{2\times1}, \Sigma_i)
\end{equation}\]
<h3 id="analyzing-the-solution">Analyzing the solution</h3>
<p>Solving the factor graph using the previous expression for the odometry factors is equivalent to solve the following least squares problem under the assumption that all our factors are Gaussian and we use <em>maximum-a-posteriori</em> (MAP) estimation, which is fortunately our case:</p>
\[\begin{equation}
\mathcal{X}^{*} = \argmin\displaystyle\sum_{i} || \mathbf{A}_i \mathbf{x}_{i} + \mathbf{b}_{i} - {\mathbf{x}_{i+1}} ||^{2}_{\Sigma_i}
\end{equation}\]
<p>Please note here that while we swapped the terms in the factor to be consistent with GTSAM’s documentation, it does not affect the formulation since the term is squared.</p>
<p>The previous optimization problem is linear with respect to the variables, hence solvable in closed form. By differentiating the squared cost, setting it to zero and doing some manipulation, we end up with the so-called <em>normal equations</em>, which are particularly relevant for our posterior analysis:</p>
\[\begin{equation}
\mathbf{A}^{T} \Sigma^{-1} \mathbf{A}\ \mathcal{X}^{*} = \mathbf{A}^{T} \Sigma^{-1} \mathbf{b}
\end{equation}\]
<p>where the vector $\mathcal{X}^{*}$ stacks all the variables in the problem, $\mathbf{A}$ is a matrix that stacks the linear terms in the factors, and $\mathbf{b}$ does the same for the constant vector terms.</p>
<p>First point to note here is that finding the solution $\mathcal{X}^{*}$ requires to invert the matrix $\mathbf{A}^{T} \Sigma^{-1} \mathbf{A}$, which in general is hard since it can be huge and dense in some parts. However we know that there are clever ways to solve it, such as iSAM and iSAM2 that GTSAM already implements, which are covered in <a href="https://www.cc.gatech.edu/~dellaert/pubs/Dellaert17fnt.pdf">this comprehensive article</a> by Frank Dellaert and Michael Kaess.</p>
<p><a name="hessian"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/hessian.png" alt="Hessian matrix" />
<figcaption>When solving the normal equations, the left-hand side is known as the Fisher information matrix or Hessian. Its inverse approximates the covariance of the least squares solution.</figcaption>
</figure>
<p><br /></p>
<p>Our interest in this matrix, though, known as <strong>Fisher information matrix</strong> or <strong>Hessian</strong> (since it approximates the Hessian of the original quadratic cost), is that its inverse \(\Sigma^{*} = (\mathbf{A}^{T} \Sigma^{-1} \mathbf{A})^{-1}\) also approximates the covariance of our solution - known as <em>Laplace approximation</em> in machine learning (<a href="https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf">Bishop (2006), Chap. 4.4</a>). This is a quite important result because by solving the factor graph we are not only recovering an estimate of the mean of the solution but also a measure of its uncertainty. In GTSAM, the <code class="language-plaintext highlighter-rouge">Marginals</code> class implements this procedure and a example can be further explored in the <a href="https://gtsam.org/tutorials/intro.html#subsec_Full_Posterior_Inference">tutorial</a>.</p>
<p>As a result, we can say that after solving the factor graph the <em>probability distribution of the solution</em> is given by \(Gaussian(\mathcal{X}^{*}, \Sigma^{*})\).</p>
<h2 id="getting-nonlinear">Getting nonlinear</h2>
<p>The previous pose graph was quite simple and probably not applicable for most of our problems. Having linear factors as the previous one is an <em>impossible dream</em> for most of the applications. It is more realistic to think that our state \(i+1\) will evolve as a nonlinear function of the state $i$ and the measurements, which is more general than the formerly assummed linear model \(\mathbf{A}_{i} \mathbf{x}_i + \mathbf{b}_i\):</p>
\[\begin{equation}
\mathbf{x}_{i+1} = f(\mathbf{x}_i)
\end{equation}\]
<p>Despite nonlinear, we can still say that our mapping has some errors involved, which are embeded into a zero-mean Gaussian noise that we can simply add as before:</p>
\[\begin{equation}
\mathbf{x}_{i+1} = f(\mathbf{x}_i) + \eta_i
\end{equation}\]
<p>Following a similar procedure as before isolating the noise, we can reformulate the problem as the following <em>nonlinear least squares</em> (NLS) problem:</p>
\[\begin{equation}
\mathcal{X} = \argmin \displaystyle\sum_i || f(\mathbf{x}_i) - \mathbf{x}_{i+1} ||^2_{\Sigma_i}
\end{equation}\]
<p>Since the system is not linear with respect to the variables anymore we cannot solve it in close form. We need to use nonlinear optimization algorithms such as <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm">Gauss-Newton</a>, <a href="https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm">Levenberg-Marquardt</a> or <a href="https://en.wikipedia.org/wiki/Powell%27s_dog_leg_method">Dogleg</a>. They will try to approximate our <em>linear dream</em> by linearizing the residual with respect to some <strong>linearization or operation point</strong> given by a guess \(\mathcal{\bar{X}}^{k}\) valid at iteration $k$:</p>
\[\begin{equation}
f(\mathbf{x}_i) \approx f(\mathbf{\bar{x}}^{k}) + \mathbf{H}(\mathbf{\bar{x}}^{k})\mathbf{x}_i = \mathbf{b}_k + \mathbf{H}^k \mathbf{x}_i
\end{equation}\]
<p>where \(\mathbf{H}(\mathbf{\bar{x}}^{k})\) denotes the <em>Jacobian</em> of \(f(\cdot)\) evaluated at the linearization point \(\mathbf{\bar{x}}^{k}\). Hence the problem becomes:</p>
\[\begin{equation}
\delta\mathcal{X}^{k}= \argmin \displaystyle\sum_i || \mathbf{H}^k\mathbf{x}_i + \mathbf{b}_i - \mathbf{x}_{i+1} ||^2_{\Sigma_i^k}
\end{equation}\]
<p>It is important to observe here that we are not obtaining the global solution $\mathcal{X}$, but just a small increment $\delta\mathcal{X}^{k}$ that will allow us to move closer to some minima that depends on the initial values. This linear problem <em>can</em> be solved in closed form as we did before. In fact, the normal equations now become something similar to the linear case:</p>
\[\begin{equation}
\label{nonlinear-normal-equations}
(\mathbf{H}^k)^{T} (\Sigma^{k})^{-1}\ \mathbf{H}^k\ \delta\mathcal{X} = (\mathbf{H}^k)^{T} (\Sigma^{k})^{-1} \mathbf{b}
\end{equation}\]
<p>where \(\mathbf{H}^k\) is built by stacking the Jacobians at iteration $k$. The solution $\delta\mathcal{X}^{k}$ will be used to update our current solution at iteration $k$ using the update rule:</p>
\[\begin{equation}
\mathcal{X}^{k+1} = \mathcal{X}^k + \delta\mathcal{X}^{k}
\end{equation}\]
<p>where \(\mathcal{X}^{k+1}\) corresponds to the best estimate so far, and can be used as a new linearization point for a next iteration.</p>
<p>Similarly, the expression on the left-hand side \(\Sigma^{k+1} = ((\mathbf{H}^k)^{T} (\Sigma^{k})^{-1} \mathbf{H}^k)^{-1}\) also corresponds to the Fisher information or Hessian as before, <strong>but with respect to the linearization point</strong>. This is quite important because both the best solution so far $\mathcal{X}^{k+1}$ and its associated covariance $\Sigma^{k+1}$ will be valid only at this linearization point and will change with every iteration of the nonlinear optimization. But understanding this, we still can say that <strong>at iteration $k+1$ our best solution will follow a distribution</strong> $Gaussian(\mathcal{X}^{k+1}, \Sigma^{k+1})$.</p>
<h2 id="conclusions">Conclusions</h2>
<p>In this first post we reviewed the basics ideas to solve factor graphs and extract estimates from them. We showed how the linear problem described by the <em>normal equations</em> not only allows us to find the solution, but also encodes information about its covariance - given by the inverse of the Fisher information matrix. Additionally, in the context of nonlinear problems, we showed how <strong>such covariance is only valid around the linearization point</strong>.</p>
<p>In our <a href="https://gtsam.org/2021/02/23/uncertainties-part2.html">next post</a> we will review how these ideas generalize when we are not estimating <em>vector</em> variables anymore but other objects such as rotation matrices and rigid-body transformations - generally known as <strong>manifolds</strong>. While this will introduce a bit of complexity not only in the maths but also the notation, by being explicit about our conventions we will be able to consistently solve a wider set of problems usually found in robotics and computer vision.</p>Reducing the uncertainty about the uncertainties, part 2: Frames and manifolds2021-02-23T00:00:00+00:002021-02-23T00:00:00+00:00http://gtsam.org/2021/02/23/uncertainties-part2<link rel="stylesheet" href="/assets/css/slideshow.css" />
<p>Author: <a href="https://mmattamala.github.io">Matias Mattamala</a></p>
<div style="display:none"> <!-- custom latex commands here -->
$
\DeclareMathOperator*{\argmin}{argmin}
\newcommand{\coloneqq}{\mathrel{:=}}
$
</div>
<style>
.MJXc-display {
overflow-x: auto;
overflow-y: hidden;
scrollbar-width: none; /* Firefox */
-ms-overflow-style: none; /* Internet Explorer 10+ */
}
.MJXc-display::-webkit-scrollbar {
width: 5px;
height: 2px;
}
.MJXc-display::-webkit-scrollbar-track {
background: transparent;
}
.MJXc-display::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.MJXc-display:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
</style>
<p><!-- horizontal scrolling --></p>
<!-- - TOC -->
<h2 id="introduction">Introduction</h2>
<p>In our <a href="https://gtsam.org/2021/02/23/uncertainties-part1.html">previous post</a> we discussed some basic concepts to solve linear and nonlinear factor graphs and to extract uncertainty estimates from them. We reviewed that the Fisher information matrix corresponds to an approximation of the inverse covariance of the solution, and that in the nonlinear case, both the solution and its covariance estimate are <strong>valid for the current linearization point only</strong>.</p>
<p>While the nonlinear case effectively allows us to model a bunch of problems, we need to admit we were not too honest when we said that it was <em>everything</em> we needed to model real problems. Our formulation so far assumed that the variables in our factor graph are <strong>vectors</strong>, which is not the case for robotics and computer vision at least.</p>
<p>This second part will review the concept of <em>manifold</em> and how the reference frames affect the formulation of our factor graph. The tools we cover here will be also useful to define probability distributions in a more general way, as well as to extend our estimation framework to more general problems.</p>
<h2 id="getting-non-euclidean">Getting non-Euclidean</h2>
<p>In robotics we do not estimate the state of a robot only by their position <em>but also its orientation</em>. Then we say that is its <strong>pose</strong>, i.e position and orientation together, what matters to define its state.</p>
<p>Representing pose is a tricky thing. We could say <em>“ok, but let’s just append the orientation to the position vector and do everything as before”</em> but that does not work in practice. Problem arises when we want to compose two poses $\mathbf{T}_1 = (x_1, y_1, \theta_1)$ and $\mathbf{T}_2 = (x_2, y_2, \theta_2)$. Under the vector assumption, we can write the following expression as the composition of two poses:</p>
\[\begin{equation}
\mathbf{T}_1 + \mathbf{T}_2 = \begin{bmatrix} x_1\\ y_1 \\ \theta_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \\ \theta_2\end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ y_1+y_2 \\ \theta_1 + \theta_2\end{bmatrix}
\end{equation}\]
<p>This is basically saying that <em>it does not matter if we start in pose $\mathbf{T}_1$ or $\mathbf{T}_2$, we will end up at the same final pose</em> by composing both, because in vector spaces we can commute the elements. <strong>But this does not work in reality, because rotations and translations do not commute</strong>. A simple example is that if you are currently sitting at your desk, and you <em>stand up, rotate 180 degrees and walk a step forward</em>, is completely different to <em>stand up, walk a step forward and then rotate 180 degrees</em> (apart from the fact that you will hit your desk if you do the latter).</p>
<p>So we need a different representation for poses that allow us to describe accurately what we observe in reality. Long story short, we rather have to represent planar poses as $3\times3$ matrices known as <em>rigid-body transformations</em>:</p>
\[\begin{equation}
\mathbf{T}_1 = \left[\begin{matrix} \cos{\theta_1} && -\sin{\theta_1} && x_1 \\ \sin{\theta_1} && \cos{\theta_1} && y_1 \\ 0 && 0 && 1\end{matrix} \right] = \left[\begin{matrix} \mathbf{R}_1 && \mathbf{t}_1 \\ 0 && 1\end{matrix}\right]
\end{equation}\]
<p>Here \(\mathbf{R}_1 = \left[ \begin{matrix} \cos{\theta_1} && -\sin{\theta_1}\\ \sin{\theta_1} && \cos{\theta_1}\end{matrix} \right]\) is a 2D rotation matrix, while $\mathbf{t}_1 = \left[ \begin{matrix}x_1 \ y_1 \end{matrix}\right]$ is a translation vector.
While we are using a $3\times3$ matrix now to represent the pose, <em>its degrees of freedom</em> are still $3$, since it is a function of $(x_1, y_1, \theta_1)$.</p>
<p>Working with transformation matrices is great, because we can now describe the behavior we previously explained with words using matrix operations. If we start in pose $\mathbf{T}_1$ and we apply the transformation $\mathbf{T}_2$:</p>
\[\begin{equation}
\mathbf{T}_1 \mathbf{T}_2 = \left[\begin{matrix} \mathbf{R}_1 && \mathbf{t}_1 \\ 0 && 1\end{matrix}\right] \left[\begin{matrix} \mathbf{R}_2 && \mathbf{t}_2 \\ 0 && 1\end{matrix}\right] = \left[\begin{matrix} \mathbf{R}_1 \mathbf{R}_2 && \mathbf{R}_1 \mathbf{t}_2 + \mathbf{t}_1 \\ 0 && 1\end{matrix}\right]
\end{equation}\]
<p>We can now rewrite the same problem as before but now using our transformation matrices:</p>
<p>\begin{equation}
\mathbf{T}_{i+1} = \mathbf{T}_i \ \Delta\mathbf{T}_i
\end{equation}</p>
<p>Here we are saying that the next state \(\mathbf{T}_{i+1}\) is gonna be the previous state \(\mathbf{T}_{i}\) <em>plus</em> some increment \(\Delta\mathbf{T}_i\) given by a sensor -odometry in this case. Please note that since we are now working with transformation matrices, \(\Delta\mathbf{T}_i\) is almost everything we need to model the problem more accurately, since it will represent a change in both position and orientation.</p>
<p>Now, <strong>there are two things here that we must discuss</strong> before moving on, which are fundamental to make everything consistent. We will review them carefully now.</p>
<h2 id="the-importance-of-reference-frames">The importance of reference frames</h2>
<p>While some people (myself included) prefer to write the process as we did before, applying the increment on the <strong>right-hand</strong> side:</p>
\[\begin{equation}
\mathbf{T}_{i+1} = \mathbf{T}_i \ \Delta\mathbf{T}_i
\end{equation}\]
<p>Others do it in a different way, on the <strong>left-hand</strong> side:</p>
\[\begin{equation}
\mathbf{T}_{i+1} = \Delta\mathbf{T}_i \ \mathbf{T}_i
\end{equation}\]
<p>These expressions are not equivalent as we already discussed because rigid-body transformations do not commute. However, <strong>both make sense under specific conventions</strong>. We will be more explicit now by introducing the concept of <strong>reference frames</strong> in our notation. We will use the notation presented by Paul Furgale in his blog post <a href="http://paulfurgale.info/news/2014/6/9/representing-robot-pose-the-good-the-bad-and-the-ugly">“Representing Robot Pose: The good, the bad, and the ugly”</a>.</p>
<p>Let us say that the robot trajectory in space is expressed in a fixed frame; we will called it the <em>world frame</em> $W$. The robot itself defines a coordinate frame in its body, the <em>body frame</em> $B$.</p>
<p>Then we can define the <strong>pose of the robot body $B$ expressed in the world frame $W$</strong> by using the following notation:</p>
\[\begin{equation}
\mathbf{T}_{WB}
\end{equation}\]
<p>Analogously, using we can express <strong>the pose of the origin/world $W$ expressed in the robot frame $B$</strong>:</p>
\[\begin{equation}
\mathbf{T}_{BW}
\end{equation}\]
<p>According to this formulation, the first is a description <em>from a fixed, world frame</em> while the left-hand one does it <em>from the robot’s perspective</em>. The interesting thing is that we can always go from one to the other by inverting the matrices:</p>
\[\begin{equation}
\mathbf{T}_{WB}^{-1} = \mathbf{T}_{BW}
\end{equation}\]
<p>So the inverse <em>swaps</em> the subindices, effectively changing our <em>point-of-view</em> to describe the world.</p>
<p><strong>Using one or the other does not matter. The important thing is to stay consistent</strong>.
In fact, by making the frames explicit in our convention we can always check if we are making mistakes. Let us say we are using the first definition, and our odometer is providing increments <em>relative to the previous body state</em>. Our increments \(\Delta\mathbf{T}_i\) will be \(\Delta\mathbf{T}_{B_{i} B_{i+1} }\), i.e, it is a transformation that describes the pose of the body at time $i+1$ expressed in the previous one $i$.</p>
<p>Hence, the actual pose of the body at time $i+1$ expressed in the world frame $W$ is given by <strong>matrix multiplication of the increment on the right side</strong>:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1} }
\end{equation}\]
<p>We know this is correct because the inner indices cancel each other, effectively representing the pose at $i+1$ in the frame we want:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{W {\bcancel{B_i}} } \ \Delta\mathbf{T}_{ {\bcancel{B_{i}}} B_{i+1} }
\end{equation}\]
<p>Since we <strong>applied the increment from the right-hand side</strong>, we will refer to this as the <strong>right-hand convention</strong>.</p>
<p><a name="right-convention"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/right-convention.png" alt="Right-hand convention" />
<figcaption>When using right-hand convention, the increments to describe the pose of the robot at instant $i$ are applied on the right-hand side. All our estimates are expressed with respect to a fixed world frame $W$ (in black).</figcaption>
</figure>
<p><br /></p>
<p>If we want to apply the increment to $\mathbf{T}_{BW}$, we would need to invert the measurement:</p>
\[\begin{equation}
\Delta\mathbf{T}_{B_{i+1} B_{i}} = \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}
\end{equation}\]
<p>so we can apply the transformation <strong>on the left-hand side</strong>:</p>
\[\begin{equation}
\mathbf{T}_{B_{i+1}W} = \Delta\mathbf{T}_{B_{i+1} B_{i}} \mathbf{T}_{B_i W}
\end{equation}\]
<p>which we will refer onward as <strong>left-hand convention</strong>.</p>
<p><a name="left-convention"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/left-convention.png" alt="Left-hand convention" />
<figcaption>With left-hand convention the increments are applied from the left, changing our reference frame to the new robot pose $B_{i+1}$ (in black) .</figcaption>
</figure>
<p><br /></p>
<p><strong>In GTSAM we use the right-hand convention: we assume that the variables are expressed with respect to a fixed frame $W$, hence the increments are applied on the right-hand side</strong>.</p>
<p>It is important to have this in mind because all the operations that are implemented follow this. As a matter of fact, 3D computer vision has generally used left-hand convention because it is straightforward to apply the projection models from the left:</p>
\[\begin{equation}
{_I}\mathbf{p} = \mathbf{K}_{IC}\ \mathbf{T}_{CW}\ {_{W}}\mathbf{P}
\end{equation}\]
<p>Here we made explicit all the frames involved in the transformations we usually can find in textbooks: An homogeneous point \({_{W}}\mathbf{P}\) expressed in the world frame $W$, is transformed by the <em>extrinsic calibration matrix</em> \(\mathbf{T}_{CW}\) (a rigid body transformation ) which represents the world $W$ in the camera frame $C$, producing a vector \({_{C}}\mathbf{P}\) in the camera frame (not shown). This is projected onto the image by means of the intrinsic calibration matrix \(\mathbf{K}_{IC}\), producing the vector \(_I\mathbf{p}\), expressed in the image frame $I$.</p>
<p>Since in GTSAM the extrinsic calibration is defined the other way to be consistent with the right-hand convention, i.e $\mathbf{T}_{WC}$, the implementation of the <code class="language-plaintext highlighter-rouge">CalibratedCamera</code> handles it by <a href="https://github.com/borglab/gtsam/blob/develop/gtsam/geometry/CalibratedCamera.cpp#L120">properly inverting the matrix before doing the projection</a> as we would expect:</p>
\[\begin{equation}
{_I}\mathbf{p} = \mathbf{K}_{IC}\ (\mathbf{T}_{WC})^{-1}\ {_{W}}\mathbf{P}
\end{equation}\]
<h2 id="they-are-not-simply-matrices">They are not <em>simply matrices</em></h2>
<p>The second important point we need to discuss, is that while rigid-body transformations are nice, the new expression we have to represent the process presents some challenges when trying to use it in our factor graph using our previous method:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1} }
\end{equation}\]
<p>In first place, we need to figure out a way to include the <em>noise</em> term \(\eta_i\) we used before to handle the uncertainties about our measurement \(\Delta\mathbf{T}_{B_{i} B_{i+1} }\), which was also the <em>trick</em> we used to generate the Gaussian factors we use in our factor graph. For now, we will say that the noise will be given by a matrix \(\mathbf{N}_i \in \mathbb{R}^{4\times4}\), which holds the following:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1}} \mathbf{N}_i
\end{equation}\]
<p>Secondly, assuming the noise \(\mathbf{N}_i\) was defined somehow, we can isolate it as we did before. However, we need to use matrix operations now:</p>
\[\begin{equation}
\mathbf{N}_i = (\mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1}})^{-1} \mathbf{T}_{WB_{i+1}}
\end{equation}\]
<p>and we can manipulate it a bit for clarity:</p>
\[\begin{equation}
\mathbf{N}_i = (\Delta\mathbf{T}_{B_{i} B_{i+1}})^{-1} (\mathbf{T}_{WB_i})^{-1} \mathbf{T}_{WB_{i+1}}
\end{equation}\]
<p>If we do the subindex cancelation trick we did before, properly swapping the indices due to the inverses, we can confirm that the error is defined <em>in frame</em> \(B_{i+1}\) <em>and expressed in frame</em> $B_{i+1}$. This may seen counterintuitive, but in fact describes what we want: the degree of mismatch between our models and measurements to express the pose at frame \(B_{i+1}\), which ideally should be zero.</p>
<p>However, the error is still a matrix, which is impossible to include as a factor in the framework we have built so far. We cannot compute a <em>vector residual</em> as we did before, nor we are completely sure that the matrix \(\mathbf{N}_i\) follows some sort of Gaussian distribution.</p>
<p>We need to define some important concepts used in GTSAM to get a better understanding on how the objects are manipulated.</p>
<h3 id="some-objects-are-groups">Some objects are <em>groups</em></h3>
<p>First of all, some objects used in GTSAM are <a href="https://en.wikipedia.org/wiki/Group_(mathematics)"><strong>groups</strong></a>. Rigid-body transformations <code class="language-plaintext highlighter-rouge">Pose3</code> and <code class="language-plaintext highlighter-rouge">Pose2</code> in GTSAM), rotation matrices (<code class="language-plaintext highlighter-rouge">Rot2</code> and <code class="language-plaintext highlighter-rouge">Rot3</code>) and quaternions, and of course vector spaces (<code class="language-plaintext highlighter-rouge">Point2</code> and <code class="language-plaintext highlighter-rouge">Point3</code>) have some basic operations that allow us to manipulate them. These concepts <a href="https://gtsam.org/notes/GTSAM-Concepts.html">have been discussed in previous posts</a> at the implementation level:</p>
<ol>
<li><strong>Composition</strong>: How to compose 2 objects from the same manifold, with associativity properties. It is similar to the <em>addition</em> operation for vectors.</li>
<li><strong>Identity</strong>: An identity operator under the composition/between.</li>
<li><strong>Inverse</strong>: An element that when composed with its inverse becomes the identity.</li>
</ol>
<p>The previous 3 operations allows us to define the operation <strong>between</strong>, which computes the difference between 2 objects from the same group, similarly to a <em>subtraction</em> operation.</p>
<p>Since the operations can be defined in different ways for each kind of object (it is different to compose rotation matrices than quaternions or vectors for instance), some authors define special operators to refer to composition/between operations. Some examples are the <em>box-plus</em> $\boxplus$ for <em>composition</em> and <em>box-minus</em> $\boxminus$ for <em>between</em> as done by <a href="https://arxiv.org/abs/1107.1119">Hertzberg et al.</a>, <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.468.5407&rep=rep1&type=pdf">Blanco-Claraco</a>, <a href="https://arxiv.org/abs/1606.05285">Bloesch et al.</a>, and the <a href="https://github.com/ANYbotics/kindr">Kindr library</a>.</p>
<p>However, composition can be defined from the <em>left</em> or the <em>right</em> side because <strong>composition is associative but is not distributive</strong>. This is the same problem we described before when talking about reference frames and how to add small increments with the <em>left</em> and <em>right</em> hand conventions, since both are valid depending on our decisions or other authors’. While we can be clear about our own decisions, it is important to be aware of the definitions of each author because sometimes are not clearly stated. <a href="https://arxiv.org/abs/1812.01537">Solà et al</a> for example make the difference explicit by defining <em>left-</em>$\oplus$ and <em>right-</em>$\oplus$ for composition using <em>left-hand convention</em> or <em>right-hand convention</em> respectively, but we need to be careful to recognize each other’s choices.</p>
<p><strong>We recall again that in GTSAM and the rest of this post we use the <em>right</em> convention</strong> (<em>pun intended</em>), because we represent our poses with respect to a fixed world frame $W$ and the increments are defined with respect to the body frame $B$.</p>
<h3 id="some-are-manifolds">Some are <em>manifolds</em></h3>
<p>Additionally, rigid-body transformations, rotation matrices, quaternions and even vectors are <strong>differentiable manifolds</strong>. This means that even though they do not behave as Euclidean spaces at a global scale, they can be <em>locally approximated</em> as such by using local vector spaces called <strong>tangent spaces</strong>. The main advantage of analyzing all these objects from the manifold perspective is that we can build general algorithms based on common principles that apply to all of them.</p>
<p><a name="manifold"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/manifold.png" alt="Manifold and tangents spaces" />
<figcaption> While a manifold have a non-Euclidean structure, it can be locally approximated by tangent spaces. In the figure, we illustrate different tangent spaces defined at the objects $\mathbf{p}_1$ and $\mathbf{p}_2$ on the manifold.</figcaption>
</figure>
<p><br /></p>
<p>As we briefly mentioned before, objects such as rotation matrices are difficult to manipulate in the estimation framework because they are matrices. A 3D rotation matrix $\mathbf{R}$ represents 3 orientations with respect to a reference frame but, in raw terms, they are using 9 values to do so, which seems to <em>overparametrize</em> the object. However, the constraints that define a rotation matrix -and consequently the manifold- such as orthonormality \(\mathbf{R}^{T}\mathbf{R} = \mathbf{I}\) and \(\text{det}(\mathbf{R}) = 1\) make the inherent dimensionality of the rotation still 3. Interestingly, this is <em>exactly the dimensionality of the tangent spaces that can be defined over the manifold</em>.</p>
<p><strong>That is what makes working with manifolds so convenient</strong>: All the constraints that are part of the definition of the object are naturally handled, and we can work in tangent vector spaces using their <em>inherent</em> dimension. The same happens for rigid-body transformations (6 dimensions represented by a 16 elements matrix), quaternions (3 orientations represented by a 4D vector), <strong>and even objects that are not groups, such as unit vectors</strong>.</p>
<p>In order to work with manifolds, we need to define 2 operations, which are the key to transform objects between them and the tangent spaces:</p>
<ol>
<li><strong>Retract or retraction</strong>: An operation that maps elements \(\mathbf{\xi}\) from the tangent space at \(\mathbf{p}_1\) to the manifold: \(\mathbf{p} = \text{retract}_{\mathbf{p}_1}(\mathbf{\xi})\).</li>
<li><strong>Local</strong>: the opposite operation: mapping elements \(\mathbf{p}\) from the manifold to the tangent space $\mathbf{\xi} = \text{local}_{\mathbf{p}_1}(\mathbf{p})$</li>
</ol>
<p><a name="manifold_retract"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/manifold-retract-zoom.png" alt="Retract operation on a manifold" />
<figcaption>The <i>retract</i> operation maps an element $\mathbf{\xi}$ defined in the tangent space at $\mathbf{p}_1$ back to the manifold.</figcaption>
</figure>
<p><br /></p>
<p><a name="manifold_retract"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/manifold-retract.png" alt="Retract operation on a manifold simplified" />
<figcaption>An alternative figure to represent the retraction that will be used in the rest of the post, since it shows the interactions between the manifold and the tangent space more explicitly. We can clearly observe that the element $\mathbf{p}$ generated by the retraction is also defined with respect to $\mathbf{p}_1$ on the manifold.</figcaption>
</figure>
<p><br /></p>
<p><a name="manifold_local"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/manifold-local.png" alt="Local operation on a manifold" />
<figcaption>The <i>local</i> operation does the opposite, and maps an element $\mathbf{p}$ defined with respect to $\mathbf{p}_1$ as an vector $\mathbf{\xi}$ in the tangent space.</figcaption>
</figure>
<p><br /></p>
<p>Retractions are the fundamental concept to <a href="https://press.princeton.edu/absil">solve optimization problems</a> and to define uncertainties on manifolds. While the former will be covered later in this post with an example, it is useful to explain the latter now. The general idea is that we can define distributions on the tangent space, and map them back on the manifold using the retraction. For instance, we can define a zero-mean Gaussian variable \(\eta \sim Gaussian(\mathbf{0}_{n\times1}, \Sigma)\) in the tangent space centered at $\mathbf{p}$ and use the retraction:</p>
\[\begin{equation}
Gaussian(\mathbf{p}_1, \mathbf{\eta}) := \text{retract}_{\mathbf{p}_1}(\mathbf{\eta})
\end{equation}\]
<p>which graphically corresponds to:</p>
<p><a name="manifold_gaussian"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/manifold-gaussian.png" alt="Gaussian on a manifold" />
<figcaption>Using the retraction, we can define Gaussians on the tangent space centered at some element $\mathbf{p}_1$ and map them back on the manifold to construct Gaussians on the manifold with mean $\mathbf{p}_1$ and covariance $\text{Cov}(\eta) = \Sigma$.</figcaption>
</figure>
<p><br /></p>
<p>Please note that we have defined the retraction <strong>from the right</strong>, since this matches the convention used by GTSAM, which coincidently also matches the composition operator for groups. However, in the literature we can find different definitions: <a href="http://ncfrn.cim.mcgill.ca/members/pubs/barfoot_tro14.pdf">Barfoot and Furgale (2014, left-hand convention)</a>, <a href="https://arxiv.org/abs/1512.02363">Forster et al (2017, right-hand convention)</a>, and <a href="https://arxiv.org/abs/1906.07795">Mangelson et al. (2020, left-hand convention)</a> are some examples. Other definitions to define probability distributions on manifolds include <a href="https://arxiv.org/abs/1909.05946">Calinon (2020)</a> and <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.175.5054&rep=rep1&type=pdf">Lee et al. (2008)</a>, please refer to their work for further details.</p>
<h3 id="and-others-are-both-lie-groups">And others are both: Lie groups</h3>
<p>In GTSAM <strong>we only need the objects to be differentiable manifolds in order to optimize them</strong> in the estimation framework, <strong>being groups is not a requirement at all</strong>. However, it is important to be aware that <em>some</em> of the objects we deal with are <strong>both groups and differentiable manifolds</strong>, known as <a href="https://en.wikipedia.org/wiki/Lie_group"><strong>Lie groups</strong></a>, which is the formulation generally found in state estimation literature.</p>
<p><a name="lie_group_tangent"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group.png" alt="" />
<figcaption>Lie groups are both groups and differentiable manifolds. They define a tangent space at the identity that allows to represent any object on the manifold through a retraction defined at the identity, known as exponential map, as well as a local operation also at the identity - the logarithm map.</figcaption>
</figure>
<p><br /></p>
<p>Objects such as rigid-body matrices and quaternions are Lie groups. 2D rigid-body transformations are elements of the <em>Special Euclidean group</em> $\text{SE(2)}$, while 2D and 3D rotations are from the <em>Special Orthogonal groups</em> $\text{SO(2)}$ and $\text{SO(3)}$ respectively. 3D rigid-body transformations are objects of $\text{SE(3)}$ and we can use those definitions to define the operations we described before for groups and manifolds:</p>
<ol>
<li><strong>Composition</strong>: Matrix multiplication \(\mathbf{T}_{1} \ \mathbf{T}_{2}\).</li>
<li><strong>Identity</strong>: Identity matrix \(\mathbf{I}\).</li>
<li><strong>Inverse</strong>: Matrix inverse \((\mathbf{T}_{1})^{-1}\)</li>
<li><strong>Retract</strong>: The retraction is defined at the tangent space at the identity, which is known as the <em>exponential map</em> of \(\text{SE(3)}\): \(\text{retract}_{\mathbf{I}}(\mathbf{\xi}) := \text{Exp}(\mathbf{\xi})\).</li>
<li><strong>Local</strong>: It is also defined at the identity and known as the <em>logarithm map</em> of \(\text{SE(3)}\): \(\text{local}_{\mathbf{I}} := \text{Log}(\mathbf{T}_{1} )\).</li>
</ol>
<p>Please note here that we used <em>capitalized</em> \(\text{Log}(\cdot) := \text{log}( \cdot)^{\vee}\) and \(\text{Exp}(\cdot):=\text{exp}( (\cdot)^{\wedge})\) operators for simplicity as used by <a href="https://arxiv.org/abs/1512.02363">Forster et al (2017),</a> and <a href="https://arxiv.org/abs/1812.01537">Solà et al. (2020)</a>, since they are easy to understand under the retractions perspective. Refer to Solà et al. for a more detailed description, including the relationship between tangents spaces and the <em>Lie algebra</em>.</p>
<p>In GTSAM, 3D poses are implemented as <code class="language-plaintext highlighter-rouge">Pose3</code> objects and we can think of them as $\text{SE(3)}$ elements. Therefore, to keep things simple, we will stay using the logarithm map and exponential map to talk about their retraction and local operators. However, GTSAM also allows us to use alternative retractions for some cases, which would be like <em>using other Lie groups</em> to represent poses, such as the $\mathbb{R}^{3}\times\text{SO}(3)$ group, which is explained <a href="https://gtsam.org/notes/GTSAM-Concepts.html">here</a>.</p>
<h3 id="reference-frames-on-manifolds">Reference frames on manifolds</h3>
<p>Reference frames <strong>are preserved when applying the local and retract operations</strong>. We will cover a few important ideas using \(\text{SE(3)}\), since it is related to our original problem of pose estimation.</p>
<p>First of all, when we talked about <em>the tangent space defined at the identity</em>, in physical terms it refers to having a fixed, global frame, which we use to express our poses. Then, when using the <em>local</em> operation defined as the logarithm map of \(\text{SE(3)}\), we obtain a vector \(_W\mathbf{\xi}_{W} \in \mathbb{R}^{6}\) (sometimes also called <em>tangent vector</em> or <em>twist</em>), which is defined in the tangent space <em>at the world frame</em>:</p>
\[\begin{equation}
{_W}\mathbf{\xi} = \text{Log}(\mathbf{T}_{WB_i} )
\end{equation}\]
<p><a name="lie_group_frames"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames.png" alt="" />
<figcaption>In this example, the tangent space at the identity represents the world frame, on which any transformation $\mathbf{T}_{WB_i}$ expressed in the world frame can be defined.</figcaption>
</figure>
<p><br /></p>
<p>Additionally, <em>we can add incremental changes to a transformation using the retraction</em>, which for \(\text{SE(3)}\) is done with the exponential map as follows:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \text{Exp}( {_{B_i}}\mathbf{\xi})
\end{equation}\]
<p>In this case we added an increment from the body frame at time $i$, that represents the new pose at time $i+1$. Please note that <strong>the increments are defined with respect to a reference frame, but they do not require to specify the resulting frame</strong>. Their meaning (representing a new pose at time $i+1$) is something that we -as users- define but is not explicit in the formulation. (<em>While we could do it, it can lead to confusions because in this specific case we are representing the pose at the next instant but we can also use retractions to describe corrections to the world frame as we will see in the final post.</em>)</p>
<p>The graphical interpretation with the manifold is consistent with our general definition of retractions and frames. Using the exponential map on the right is defining a tangent space at \(\mathbf{T}_{WB_i}\), which can be interpreted as a new <em>reference frame</em> at \(B_i\), which we used to define the increment:</p>
<p><a name="lie_group_frames_increment"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames-increment.png" alt="Retraction with frames" />
<figcaption>When we add a small increment ${_{B_i}}\mathbf{\xi}$ to the pose $\mathbf{T}_{WB_i}$, the expression $\mathbf{T}_{WB_i} \text{Exp}( {_{B_i}}\mathbf{\xi})$ corresponds to the retraction defined for the tangent space at $\mathbf{T}_{WB_i}$.</figcaption>
</figure>
<p><br /></p>
<p>The incremental formulation via retractions is also convenient when we have local (body frame) velocity measurements -also known as <em>twists</em>- \(({_{B_i}}\omega, {_{B_i}}{v})\), with \({_{B_i}}\omega \in \mathbb{R}^{3}, {_{B_i}}v \in \mathbb{R}^{3}\) and we want to do <a href="https://en.wikipedia.org/wiki/Dead_reckoning"><em>dead reckoning</em></a>:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = {\mathbf{T}_{WB_i}} \text{Exp}\left(
\begin{bmatrix} _{B_i}\omega \\ _{B_i} v \end{bmatrix}\
\delta t \right)
\end{equation}\]
<p>The product
\(\begin{bmatrix} _{B_i}\omega \ \delta t \\ _{B_i} v \ \delta t\end{bmatrix}\)
represents the tangent vector resulting from time-integrating the velocity, which is map onto the manifold by means of the $\text{SE(3)}$ retraction:</p>
<p><a name="lie_group_frames_velocities"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames-velocities.png" alt="Retraction with velocities" />
<figcaption>The same formulation can be used to map velocities into increments via retractions/exponential map. In this case, the time increment $\delta t$ controls the length of the curve that is projected on the manifold when we do dead-reckoning.</figcaption>
</figure>
<p><br /></p>
<p>This last example is also useful to understand the physical meaning of the retraction/local operations in the case of 3D poses, which is not always clear. An interesting aspect of illustrating this with $\text{SE}(3)$ is that the meaning also applies if we use rotations only (\(\text{SO}(3)\), $\text{SO}(2)$) or 2D poses ($\text{SE}(2)$):</p>
<p><a name="lie_group_frames_velocities"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames-velocities-physical.png" alt="Retraction with velocities, physical interpretation" />
<figcaption>This figure illustrates the physical meaning of the $\text{SE}(3)$ retraction of the dead-reckoning example. In light blue we show the increment applied to pose ${\mathbf{T}_{WB_i}}$, which generates a <i>new</i> frame. At the right hand side we illustrate the decomposed increment: orientation at the top, and position at the bottom.
Each component of the <i>orientation increment</i> represents a change with respect to each rotation axis defined by the body pose at ${\mathbf{T}_{WB_i}}$ (rotations with respect to $X$, $Y$ and $Z$ axes, respectively, are shown). When combined, the vector represents an <i>axis-angle</i> representation of the rotation increment. <i>Position increments</i> (right-side, bottom) are easier to interpret because they only represent a shift in position with respect to ${\mathbf{T}_{WB_i}}$'s origin.
Since the retraction actually <i>combines</i> both orientation and position increments on the manifold, it generates a smooth trajectory as shown on the left-hand side (light blue).</figcaption>
</figure>
<p><br /></p>
<h3 id="some-remarks-on-the-retract-and-local-operations">Some remarks on the retract and local operations</h3>
<p>Lastly, we would like to share some important remarks that are not always stated in the math but <strong>can become real issues during the implementation</strong>. First, we need to be careful about <strong>the convention of the order of the components in the retraction/local operation</strong> (yes, more conventions again). Having clarity about the definition that every software or paper uses for these operations, even implicitly, is fundamental to make sense of the quantities we put into our estimation problems and the estimates we extract.</p>
<p>For instance, the definition of the $\text{SE(3)}$ retraction we presented, which matches <code class="language-plaintext highlighter-rouge">Pose3</code> in GTSAM, uses an <em>orientation-then-translation</em> convention, i.e, the 6D tangent vector has orientation in the first three coordinates, and translation in the last three $\mathbf{\xi} = (\phi, \rho)$, where $\phi \in \mathbb{R}^3$ denotes the orientation components while $\rho \in \mathbb{R}^3$ the translational ones. On the other hand, <code class="language-plaintext highlighter-rouge">Pose2</code> uses <em>translation-then-orientation</em> $(x, y, \theta)$ for <a href="https://github.com/borglab/gtsam/issues/160#issuecomment-562161665">historical reasons</a>.</p>
<p><strong>This is also ties the definition of the covariances</strong>. For <code class="language-plaintext highlighter-rouge">Pose3</code> objects, for instance, we must define covariance matrices that match the definition and meaning of the tangent vectors. In fact, the upper-left block must encode orientation covariances, i.e, our degree of uncertainty for each rotation axis, while the bottom-right encodes position covariances:</p>
\[\begin{equation}
\left[\begin{matrix} \Sigma_{\phi\phi} & \Sigma_{\phi\rho} \\ \Sigma_{\phi\rho}^{T} & \Sigma_{\rho\rho} \end{matrix}\right]
\end{equation}\]
<p>This is of paramount importance because, for example, the previous covariance matrix does not match the convention used in ROS to store the covariances in <a href="http://docs.ros.org/en/melodic/api/geometry_msgs/html/msg/PoseWithCovariance.html"><code class="language-plaintext highlighter-rouge">PoseWithCovariance</code></a>. Similar issues can arise when using different libraries or software, so we must be aware of their conventions. Our advice is <strong>to always check the retraction/exponential map definition</strong> if not stated in the documentation.</p>
<h2 id="bringing-everything-together">Bringing everything together</h2>
<p>Now we can return to our original estimation problem using rigid-body transformations. Let us recall that we defined the following process model given by odometry measurements:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1}} \mathbf{N}_i
\end{equation}\]
<p>We reported two problems before:</p>
<ol>
<li>We needed to define the noise $\mathbf{N}_i$ as a Gaussian noise, but it was a $4\times4$ matrix.</li>
<li>When isolating the noise to create the residual, we ended up with a matrix expression, not vector one as we needed in our estimation framework.</li>
</ol>
<p>We can identify now that our poses are objects of $\text{SE(3)}$, hence we can use the tools we just defined.</p>
<h3 id="defining-the-noise">Defining the noise</h3>
<p>The first problem of defining the noise appropriately is solved by using probability distributions on manifolds as we described before. We define <em>a zero-mean Gaussian in the tangent space at \(\Delta\mathbf{T}_{B_{i} B_{i+1}}\) and retract it onto the manifold</em>:</p>
\[\begin{equation}
\mathbf{T}_{WB_{i+1}} = \mathbf{T}_{WB_i} \ \Delta\mathbf{T}_{B_{i} B_{i+1}} \text{Exp}(_{B_{i+1}}\mathbf{\eta})
\end{equation}\]
<p><a name="manifold_gaussian_frame"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames-gaussian.png" alt="Defining noise for the relative increment." />
<figcaption>Graphical interpretation of the definition of the noise, which is defined in frame $B_{i+1}$ and mapped using the retraction.</figcaption>
</figure>
<p><br /></p>
<p>where we have defined \({_{B_{i+1}}}\eta \sim Gaussian(\mathbf{0}_{6\times1},\ _{B_{i+1}}\Sigma)\). Please note that in order to match our right-hand convention, <strong>the covariance we use must be defined in the body frame at time $i+1$</strong>, i.e \(B_{i+1}\). Additionally, <strong>the covariance matrix must follow the same ordering defined by the retraction as we mentioned before</strong>.</p>
<h3 id="defining-the-residual">Defining the residual</h3>
<p>Having solved the first problem, we can now focus on the residual definition. We can isolate the noise as we did before, which holds:</p>
\[\begin{equation}
\text{Exp}( {_{B_{i+1}}}\mathbf{\eta}) = \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}\ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}}
\end{equation}\]
<p>We can now apply the <strong>local</strong> operator of the manifold on both sides to <strong>map the residual to the tangent space</strong> (recall we are using the <em>logarithm map</em> for $\text{SE(3)}$ elements for simplicity):</p>
\[\begin{equation}
_{B_{i+1}}\mathbf{\eta} = \text{Log}\left( \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}\ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \right)
\end{equation}\]
<p>Since the noise is defined in the tangent space, both sides denote vector expressions in $\mathbb{R}^{6}$. Both also correspond to zero-mean Gaussians, hence the right-hand side can be used as a proper factor in our estimation framework. In fact, the expression on the right side is <em>exactly</em> the same used in GTSAM to define the <a href="https://github.com/borglab/gtsam/blob/master/gtsam/slam/BetweenFactor.h#L90"><code class="language-plaintext highlighter-rouge">BetweenFactor</code></a>.</p>
<p><a name="lie_groups_frames_residual"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/lie-group-frames-residual.png" alt="Defining noise for the relative increment." />
<figcaption>The Between residual, illustrated. On the left hand side, we have the corresponding transformations involved in the computation expressed on the manifold. Since the error (in red) is also defined on the manifold, it is not described by a straight line. However, by applying the local operation, given by the logarithm map for $\text{SE(3)}$, we can map it to to the tangent space at $\Delta\mathbf{T}_{B_{i} B_{i+1}}$, which is a vector space (right). The error itself should lie within the Gaussian we created using the noise $_{B_{i+1}}\mathbf{\eta}$, which denotes the noise of the odometry model.</figcaption>
</figure>
<p><br /></p>
<p>We must also keep in mind here that by using the <strong>local</strong> operation, <strong>the residual vector will follow the same ordering</strong>. As we mentioned before for <code class="language-plaintext highlighter-rouge">Pose3</code> objects, it will encode orientation error in the first 3 components, while translation error in the last ones. In this way, if we write the expanded expression for the Gaussian factor, we can notice that all the components are weighted accordingly by the inverse covariance as we reviewed before (orientation first, and then translation):</p>
\[\begin{equation}
\begin{split}
\mathbf{r}_{\text{between}}(\mathbf{T}_{WB_i}, \mathbf{T}_{WB_{i+1}}) &= \left|\left| \text{Log}\left( \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}\ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \right)\right|\right|^{2}_{\Sigma_i} \\
& =
\text{Log}\left( \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}\ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \right)^{T} \ \Sigma_i^{-1} \\
& \qquad \qquad \qquad \text{Log}\left( \Delta\mathbf{T}_{B_{i} B_{i+1}}^{-1}\ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \right)
\end{split}
\end{equation}\]
<p>The factor is now a nonlinear vector expression that can be solved using the nonlinear optimization techniques we presented before. In fact, GTSAM already implements Jacobians for the <strong>local</strong> operator of all the objects implemented, which simplifies the process. However, there are subtle differences that we must clarify.</p>
<p>First, since the factor defines a residual in the tangent space at the current linearization point, the optimization itself is executed <strong>in the tangent space defined in the current linearization point</strong>. This means that when we linearize the factors and build the normal equations, the increment ${_{B_i}}\delta\mathbf{T}^{k}$ we compute lies in the tangent space.</p>
<p>For this reason, we need to update the variables <em>on the manifold</em> using the retraction. For example, for $\text{SE(3)}$:</p>
\[\begin{equation}
\mathbf{\tilde{T}}_{WB_i}^{k+1} = \mathbf{T}_{WB_i}^k \text{Exp}( {_{B_i}}\delta\mathbf{T}^{k} )
\end{equation}\]
<p>The second important point, is that the covariance that we can recover from the information matrix, <strong>will be also defined in the tangent space around the linearization point, following the convention of the retraction</strong>. It means that if our information matrix at the current linearization point is given by:</p>
\[\begin{equation}
\Sigma^{k+1} = ((\mathbf{H}^k)^{T} (\Sigma^{k})^{-1} \mathbf{H}^k)^{-1}
\end{equation}\]
<p>Then, the corresponding distribution of the solution around the current linearization point $\mathbf{T}_{WB_i}^{k+1}$ will be given by:</p>
\[\begin{equation}
Gaussian(\mathbf{T}_{WB_i}^{k+1}, \Sigma^{k+1}) := \mathbf{T}_{WB_i}^{k+1} \text{Exp}( {_{B_i}}\eta^{k+1} )
\end{equation}\]
<p>where \({_{B_i}}\eta^{k+1} \sim Gaussian(\mathbf{0}_{6\times1}, {_{B_i}}\Sigma^{k+1})\). As a consequence of the convention on the retraction, <strong>the resulting covariance is expressed in the body frame as well, and uses orientation-then-translation for the ordering of the covariance matrix</strong>.</p>
<h2 id="conclusions">Conclusions</h2>
<p>In this second part we extended the estimation framework presented previously by introducing the reference frames in an explicit manner into our notation, which helped to understand the meaning of the quantities.</p>
<p>We also reviewed the concept of groups of manifolds. We discussed that manifolds are the essential concept to generalize the optimization framework and probability distributions as defined in GTSAM.</p>
<p>We presented the idea of <em>right-hand</em> and <em>left-hand</em> conventions which, while not standard, allowed us to identify different formulations that can be found in the literature to operate groups and manifolds. By explicitly stating that GTSAM uses a right-hand convention for the composition of groups as well as retractions on manifolds we could identify the frames used to define the variables, as well as the covariance we obtain from the solution via <code class="language-plaintext highlighter-rouge">Marginals</code>.</p>
<p>Additionally, the definition of the local and retract operations also have direct impact on the ordering of the covariance matrices, which also varies depending on the object. For instance, we discussed that <code class="language-plaintext highlighter-rouge">Pose2</code> use a <em>translation-then-orientation</em> convention, while <code class="language-plaintext highlighter-rouge">Pose3</code> does <em>orientation-then-translation</em>. This is particularly important when we want to use GTSAM quantities with different software, such as ROS, which use a <em>translation-then-orientation</em> convention for their 3D pose structures for instance (<a href="http://docs.ros.org/en/melodic/api/geometry_msgs/html/msg/PoseWithCovariance.html"><code class="language-plaintext highlighter-rouge">PoseWithCovariance</code></a>).</p>
<p>The <a href="https://gtsam.org/2021/02/23/uncertainties-part3.html">final post</a> will cover some final remarks regarding the conventions we defined and applications of the <em>adjoint</em> of a Lie group, which is quite useful to properly transform covariance matrices defined for poses.</p>Reducing the uncertainty about the uncertainties, part 3: Adjoints and covariances2021-02-23T00:00:00+00:002021-02-23T00:00:00+00:00http://gtsam.org/2021/02/23/uncertainties-part3<link rel="stylesheet" href="/assets/css/slideshow.css" />
<p>Author: <a href="https://mmattamala.github.io">Matias Mattamala</a></p>
<div style="display:none"> <!-- custom latex commands here -->
$
\usepackage{amsmath}
\usepackage[makeroom]{cancel}
\DeclareMathOperator*{\argmin}{argmin}
\newcommand{\coloneqq}{\mathrel{:=}}
$
</div>
<style>
.MJXc-display {
overflow-x: auto;
overflow-y: hidden;
scrollbar-width: none; /* Firefox */
-ms-overflow-style: none; /* Internet Explorer 10+ */
}
.MJXc-display::-webkit-scrollbar {
width: 5px;
height: 2px;
}
.MJXc-display::-webkit-scrollbar-track {
background: transparent;
}
.MJXc-display::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.MJXc-display:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
</style>
<p><!-- horizontal scrolling --></p>
<!-- - TOC -->
<h2 id="introduction">Introduction</h2>
<p>In our <a href="https://gtsam.org/2021/02/23/uncertainties-part2.html">previous post</a> we presented the concepts of reference frames, and how they are related to groups and manifolds to give physical meaning to the variables of our estimation framework. We remarked that the only property required for our GTSAM objects to be optimized and define probability distributions was to be <em>manifolds</em>. However, groups were useful to understand the composition of some objects, and to present the idea of <em>Lie groups</em>, which were both groups and differentiable manifolds.</p>
<p>This last section is mainly concerned about Lie groups, which is the case for most of the objects we use to represent robot pose. In particular, we will review the concept of <strong>adjoint</strong> of a Lie group, which will helps us to relate increments or correction applied on the right-hand side, with those applied on the left-side. Such property will allow us to manipulate uncertainties defined for Lie groups algebraically, and to obtain expressions for different covariance transformations. We will focus on 3D poses, i.e $\text{SE(3)}$, because of its wide applicability but similar definitions should apply for other Lie groups since they mainly rely on a definition of the adjoint.</p>
<p>Most of this expressions have been already shown in the literature by <a href="http://ncfrn.cim.mcgill.ca/members/pubs/barfoot_tro14.pdf">Barfoot and Furgale (2014)</a> and <a href="https://arxiv.org/abs/1906.07795">Mangelson et al. (2020)</a> but since they follow a left-hand convention they are not straightforward to use with GTSAM. We provide the resulting expressions for the covariance transformations following Mangelson et al. but we recommend to refer to their work to understand the details of the process.</p>
<h2 id="adjoints">Adjoints</h2>
<p>Let us consider a case similar to previous examples in which we were adding a small increment \(_{B_i}\mathbf{\xi}\) to a pose \(\mathbf{T}_{WB_i}\).</p>
\[\begin{equation}
\mathbf{T}_{W_i B_{i}} \text{Exp}( _{B_i}\mathbf{\xi})
\end{equation}\]
<p>where we have used the retraction of $\text{SE(3)}$ to add a small increment using the <em>right-hand convention</em> as GTSAM does.</p>
<p><a name="right-correction"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/adjoint-right.png" alt="Correction from the right" />
<figcaption>Similarly to the cases discussed previously, applying a correction from the right can be interpreted as creating a new body frame at time $i+1$ from a previous frame $B_i$.</figcaption>
</figure>
<p><br /></p>
<p>However, for some applications we can be interested in applying a correction <em>from the left</em>:</p>
\[\begin{equation}
\mathbf{T}_{W_{i+1} B} = \text{Exp}( _{W_i}\mathbf{\xi}) \mathbf{T}_{W_i B_i}
\end{equation}\]
<p>Please note that this case is different to the example in the previous post in which we had the frames inverted. <strong>Here we are effectively applying a correction on the world frame</strong>.</p>
<p><a name="left-correction"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/adjoint-left.png" alt="Correction from the left" />
<figcaption>If we define a correction from the left, it is like we are creating a new reference frame at time $i+1$ from a frame $W_i$.</figcaption>
</figure>
<p><br /></p>
<p>We are interested in finding out a correction on the body that can lead to the same result that a correction applied on the right. For that specific correction both formulations would be effectively representing the same pose but using <strong>different reference frames</strong>. This is shown in the next figure:</p>
<p><a name="adjoint"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/adjoint-all-transformations.png" alt="All transformations together" />
<figcaption>There exists a correction applied to the body frame (right convention) that can lead to equivalent transformations than corrections applied on the world frame (left convention)</figcaption>
</figure>
<p><br /></p>
<p>In order to find it, we can write an equivalence between corrections applied on the body frame and the world frame as follows:</p>
\[\begin{equation}
\text{Exp}( _{W}\mathbf{\xi}) \mathbf{T}_{WB} = \mathbf{T}_{WB} \text{Exp}( _{B}\mathbf{\xi})
\end{equation}\]
<p>We dropped the time-related indices for simplicity, since this is a geometric relationship. In order to satisfy this condition, the incremental change we applied on the left-hand side, i.e in the world frame, \(_{W}\mathbf{\xi}\) must be given by:</p>
\[\begin{equation}
\text{Exp}( _{W}\mathbf{\xi}) = \mathbf{T}_{WB} \text{Exp}( _{B}\mathbf{\xi}) \mathbf{T}_{WB}^{-1}
\end{equation}\]
<p>which we obtained by isolating the term. The expression on the right-hand side is known as the <em>adjoint action</em> of $\text{SE(3)}$. This relates increments applied on the left to ones applied on the right.</p>
<p><a name="adjoint"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/adjoint-equal.png" alt="Correction from the left" />
<figcaption>Our previous procedure allowed us to identify that such relationship is described by the Adjoint, shown at the right-hand side of the equation.</figcaption>
</figure>
<p><br /></p>
<p>For our purposes, it is useful to use an equivalent alternative expression that applies directly to elements from the tangent space (<a href="https://arxiv.org/abs/1812.01537">Solà et al.</a> gives a more complete derivation as a result of some properties we omitted here):</p>
\[\begin{equation}
\text{Exp}( _{W}\mathbf{\xi}) \mathbf{T}_{WB_i} = \mathbf{T}_{WB_i} \text{Exp}( \text{Ad}_{T_{WB_i}^{-1}} {_{W}}\mathbf{\xi})
\end{equation}\]
<p>where \(\text{Ad}_{T_{WB_i}^{-1}}\) is known as the <em>adjoint matrix</em> or <strong><em>adjoint</em> of</strong> \(T_{WB_i}^{-1}\). The adjoint acts over elements of the tangent space directly, changing their reference frame. Please note that the same subindex cancelation applies here, so we can confirm that the transformations are correctly defined.</p>
<p>We can also interpret this as a way to consistently <em>move</em> increments applied on the left-hand side (in world frame) to the right-hand side (body frame), which is particularly useful to keep the right-hand convention for the retractions and probability distributions consistent. This is the main property we will use in the next sections to define some covariance transformations, and it is already implemented in <code class="language-plaintext highlighter-rouge">Pose3</code> as <a href="https://github.com/borglab/gtsam/blob/develop/gtsam/geometry/Pose3.cpp#L58"><code class="language-plaintext highlighter-rouge">AdjointMap</code></a>.</p>
<h2 id="distribution-of-the-inverse">Distribution of the inverse</h2>
<p>As a first example on how the adjoint helps to manipulate covariances, let us consider the case in which we have the solution of a factor graph, with covariances defined in the body frame as we have discussed previously. We are interested in obtaining an expression to express the covariance in the world frame:</p>
<p><a name="inverse-pose"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/covariance-inverse.png" alt="Distribution of the inverse" />
<figcaption>Given the covariance of the robot in the body frame $B_i$ (left), and we are interested in obtaining a formula to express the covariance in the world frame $W$ following GTSAM's right-hand convention (right).</figcaption>
</figure>
<p><br /></p>
<p>The distribution of the pose assuming Gaussian distribution will be expressed by:</p>
\[\begin{equation}
\mathbf{\tilde{T}}_{WB} = \mathbf{T}_{WB} \text{Exp}( _{B}\mathbf{\eta})
\end{equation}\]
<p>with \(_{B}\mathbf{\eta}\) zero-mean Gaussian noise with covariance $\Sigma_{B}$ as before. The distribution of the inverse can be computed by inverting the expression:</p>
\[\begin{align}
(\mathbf{\tilde{T}}_{WB})^{-1} & = (\mathbf{T}_{WB} \text{Exp}( _{B}\mathbf{\eta}) )^{-1}\\
& = (\text{Exp}( _{B}\mathbf{\eta}) )^{-1}\ \mathbf{T}_{WB}^{-1}\\
& = \text{Exp}(- _{B}\mathbf{\eta}) \ \mathbf{T}_{WB}^{-1}
\end{align}\]
<p>However, the <em>noise</em> is defined on the left, which is inconvenient because is still a covariance in the body frame, but it is also inconsistent with right-hand convention. We can move it to the right using the adjoint:</p>
\[\begin{equation}
(\mathbf{\tilde{T}}_{WB})^{-1} = \ \mathbf{T}_{WB}^{-1}\ \text{Exp}(- \text{Ad}_{\mathbf{T}_{WB}} {_{B}}\mathbf{\eta})
\end{equation}\]
<p>This is a proper distribution following the right-hand convention, which defines the covariance in the world frame. The covariance of the inverse is then given by:</p>
\[\begin{equation}
\Sigma_{W} = \text{Ad}_{\mathbf{T}_{WB}} \Sigma_B \text{Ad}_{\mathbf{T}_{WB}}^{T}
\end{equation}\]
<h2 id="distribution-of-the-composition">Distribution of the composition</h2>
<p>A different case is if we have the distribution of a pose \(\mathbf{\tilde{T}}_{WB_i}\) and we also have an increment given by another distribution \(\mathbf{\tilde{T}}_{B_i B_{i+1}}\). By doing some algebraic manipulation, we can determine the distribution of the composition \(\mathbf{\tilde{T}}_{WB_{i+1}}\). This is useful when doing dead reckoning for instance:</p>
<p><a name="composition-pose"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/covariance-composition.png" alt="Distribution of the composition" />
<figcaption>In our second example, we want to algebraically determine the covariance of $\mathbf{\tilde{T}}_{WB_{i+1}}$ (right), given the distributions of the initial pose $\mathbf{\tilde{T}}_{WB_i}$ and the relative increment $\mathbf{\tilde{T}}_{B_i B_{i+1}}$ (left).</figcaption>
</figure>
<p><br /></p>
<p>We can determine the distribution of the composition (and its covariance) if we follow a similar formulation as before:</p>
\[\begin{align}
\mathbf{\tilde{T}}_{WB_{i+1}} &= \mathbf{\tilde{T}}_{WB_i} \mathbf{\tilde{T}}_{B_i B_{i+1}}\\
&= \mathbf{T}_{WB_i} \text{Exp}( _{B_i}\mathbf{\eta})\ \mathbf{T}_{B_i B_{i+1}} \text{Exp}( _{B_{i+1}}\mathbf{\eta})
\end{align}\]
<p>Analogously, we need to <em>move</em> the noise \(_{B_i}\mathbf{\eta}\) to the right, so as to have <em>the transformations to the left</em> (which represent the mean of the distribution), <em>and the noises to the right</em> (that encode the covariance). We can use the adjoint again:</p>
\[\begin{equation}
\mathbf{\tilde{T}}_{WB_i} = \mathbf{T}_{WB_i} \ \mathbf{T}_{B_i B_{i+1}} \text{Exp}(\text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}} {_{B_i}}\mathbf{\eta})\ \text{Exp}( _{B_{i+1}}\mathbf{\eta})
\end{equation}\]
<p>However, we cannot combine the exponentials because that would assume commutativity in the group that does not hold for $\text{SE(3)}$ as we discussed previously. Still, it is possible to use some approximations (also discussed in Mangelson’s) to end up with the following expressions for the covariance of the composition:</p>
\[\begin{equation}
\Sigma_{B_{i+1}} = \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}} \Sigma_{B_{i}} \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}}^{T} + \Sigma_{B_{i+1}}
\end{equation}\]
<p>Additionally, if we consider that the poses are correlated and then the covariance of the joint distribution is given by:</p>
\[\begin{equation}
\begin{bmatrix} \Sigma_{B_{i}} & \Sigma_{B_{i}, B_{i+1}} \\ \Sigma_{B_{i}, B_{i+1}}^{T} & \Sigma_{B_{i+1}} \end{bmatrix}
\end{equation}\]
<p>We have that the covariance of the composition is:</p>
\[\begin{equation}
\begin{split}
\Sigma_{B_{i+1}} &= \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}} \Sigma_{B_{i}} \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}}^{T} + \Sigma_{B_{i+1}} \\
&\qquad + \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}} \Sigma_{B_{i}, B_{i+1}} + \Sigma_{B_{i}, B_{i+1}}^{T} \text{Ad}_{\mathbf{T}_{B_i B_{i+1}}^{-1}}^{T}
\end{split}
\end{equation}\]
<h2 id="distribution-of-the-relative-transformation">Distribution of the relative transformation</h2>
<p>Lastly, it may be the case that we have the distributions of two poses expressed in the same reference frame, and we want to compute the distribution of the relative transformation between them. For example, if we have an odometry system that provides estimates with covariance and we want to use relative measurements as factors for a pose graph SLAM system, we will need the mean and covariance of the relative transformation.</p>
<p><a name="relative-pose"></a></p>
<figure class="center">
<img src="/assets/images/uncertainties/covariance-relative.png" alt="Distribution of the relative transformation" />
<figcaption>In this last example we want to determine the covariance of the relative transformation $\mathbf{\tilde{T}}_{B_{i} B_{i+1}}$ (right) given the distributions of the poses $\mathbf{\tilde{T}}_{W B_{i}}$ and $\mathbf{\tilde{T}}_{B_i B_{i+1}}$ (left).</figcaption>
</figure>
<p><br /></p>
<p>To determine it we follow the same algebraic procedure:</p>
\[\begin{align}
\mathbf{\tilde{T}}_{B_i B_{i+1}} &= \mathbf{\tilde{T}}_{W B_{i}}^{-1} \mathbf{\tilde{T}}_{WB_{i+1}}\\
&= \left( \mathbf{T}_{WB_i} \text{Exp}( _{B_i}\mathbf{\eta}) \right)^{-1} \ \mathbf{T}_{WB_{i+1}} \text{Exp}( _{B_{i+1}}\mathbf{\eta})\\
&= \text{Exp}(- _{B_i}\mathbf{\eta}) \ \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \text{Exp}( _{B_{i+1}}\mathbf{\eta})
\end{align}\]
<p>Similarly, we use the adjoint to move the exponential twice:</p>
\[\begin{align}
\mathbf{\tilde{T}}_{B_i B_{i+1}} &= \mathbf{T}_{WB_i}^{-1} \text{Exp}(- \text{Ad}_{\mathbf{T}_{WB_i}} \ {_{B_i}}\mathbf{\eta}) \ \mathbf{T}_{WB_{i+1}} \text{Exp}( _{B_{i+1}}\mathbf{\eta})\\
&= \mathbf{T}_{WB_i}^{-1} \ \mathbf{T}_{WB_{i+1}} \ \text{Exp}(- \text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \ \text{Ad}_{\mathbf{T}_{WB_i}}\ {_{B_i}}\mathbf{\eta}) \ \text{Exp}( _{B_{i+1}}\mathbf{\eta})
\end{align}\]
<p>Hence, the following covariance holds for the relative pose assuming independent poses:</p>
\[\begin{equation}
\Sigma_{B_{i+1}} = \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}} \right) \Sigma_{B_{i}} \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}}\right)^{T} + \Sigma_{B_{i+1}}
\end{equation}\]
<p>A problem that exist with this expression, however, is that by assuming independence the covariance of the relative poses <em>will be larger than the covariance of each pose separately</em>. This is consistent with the <a href="https://mathworld.wolfram.com/NormalDifferenceDistribution.html">1-dimensional case in which we compute the distribution of the difference of <em>independent Gaussians</em></a>, in which the mean is the difference while the covariance gets increased. However, this is not the result that we would want, since our odometry factors will degrade over time.</p>
<p>Mangelson et al. showed that if some correlations exists (as we showed for the composition example) and it is explicitly considered in the computation, the estimates get more accurate and the covariance is not over or underestimated. The corresponding expression that complies with GTSAM is then:</p>
\[\begin{equation}
\begin{split}
\Sigma_{B_{i+1}} & = \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}} \right) \Sigma_{B_{i}} \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}}\right)^{T} + \Sigma_{B_{i+1}} \\
&\qquad - \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}} \right) \Sigma_{B_{i} B_{i+1}} - \Sigma_{B_{i} B_{i+1}}^{T} \left(\text{Ad}_{\mathbf{T}_{WB_{i+1}}^{-1}} \text{Ad}_{\mathbf{T}_{WB_i}} \right)^{T}
\end{split}
\end{equation}\]
<h2 id="conclusions">Conclusions</h2>
<p>In this final post we reviewed the concept of the adjoint of a Lie group to explain the relationship between increments applied on the left with increments applied on the right. This was the final piece we needed to ensure that our estimates are consistent with the conventions used in GTSAM.</p>
<p>We also presented a few expressions to operate distributions of poses. While they were presented previously in the literature, we showed general guidelines on how to manipulate the uncertainties to be consistent with the frames and convention we use.</p>
<h2 id="acknowledgments">Acknowledgments</h2>
<p>I would like to thank again the interesting discussions originated in the <a href="https://groups.google.com/g/gtsam-users/c/c-BhH8mfqbo/m/7Wsj_nogBAAJ">gtsam-users</a> group. Stefan Gächter guided a rich conversation doing some important questions that motivated the idea of writing this post.</p>
<p>Coincidently, similar discussions we had at the <a href="https://ori.ox.ac.uk/labs/drs/">Dynamic Robot Systems</a> group at the University of Oxford were aligned with the topics discussed here and facilitated the writing process. Special thanks to Yiduo Wang and Milad Ramezani for our conversations, derivation and testing of the formulas for the covariance of relative poses presented here, Marco Camurri for feedback on the notation and proofreading, and Maurice Fallon for encouraging to write this post.</p>
<p>Finally, thanks to Frank Dellaert, José Luis Blanco-Claraco, and John Lambert for proofreading the posts and their feedback, as well as Michael Bosse for providing insightful comments to ensure the consistency of the topics presented here.</p>Mount Rainier’s Eigenvectors2020-08-30T00:00:00+00:002020-08-30T00:00:00+00:00http://gtsam.org/2020/08/30/Laplacian<h3 id="frank-dellaert-august-30-2020"><a href="https://dellaert.github.io">Frank Dellaert</a>, August 30, 2020</h3>
<p>In this post I’ll talk a bit about estimating absolute quantities from relative measurements, using the reconstruction of <a href="https://en.wikipedia.org/wiki/Mount_Rainier">Mount Rainier</a> as a motivating example. I’ll show how the Hessian of that problem is exactly the “Graph Laplacian” from graph theory, and relate the eigen-decomposition of that graph with the properties of the reconstruction.</p>
<p>There is an accompaning <a href="https://www.youtube.com/watch?v=-U00XjFLQag">Youtube video</a>.</p>
<div>
Also, this post was created as a Colab notebook, and you can run it yourself here: <a href="https://colab.research.google.com/github/borglab/gtsam.org/blob/master/notebooks/Laplacian.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a>
<p />
</div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#@Libraries we need
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib import image
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
import numpy as np
import scipy
from sklearn.neighbors import NearestNeighbors
</code></pre></div></div>
<h2 id="reconstructing-mount-rainier">Reconstructing Mount Rainier</h2>
<p>Imagine you and your friends have climbed Mount Rainier, and you were able to obtain a large number of <em>relative</em> height measurements: can we reconstruct the shape of the mountain from those? It turns out the answer is yes, modulo an unknown height offset.</p>
<p>To simulate this scenario, I will use a <a href="https://en.wikipedia.org/wiki/Digital_elevation_model">Digital Elevation Model (DEM)</a>. It’s not super-easy to get one’s hands on digital elevation data, but luckily <a href="https://mattbooty.wordpress.com/2019/01/20/3d-printing-a-model-of-mount-rainier-using-digital-elevation-model-dem-files/">Matt Booty has done it for us</a> for Mount Rainier. Below we download the image and load it into this notebook.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>!wget -q https://mattbooty.files.wordpress.com/2019/01/mr-dem-height-map-output-03.jpg
dem = image.imread('mr-dem-height-map-output-03.jpg')
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(dem);
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_3_0.png" alt="png" /></p>
<p>First, we’ll get some sample points, corresponding to where you and your friends would have taken measurements. In this example we assume we know the 2D locations of the points, although it is not hard to extend the example beyond that assumption.</p>
<p>The code below samples 500 different locations and shows them on the DEM:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>coordinates = np.random.multivariate_normal(mean=[420,800], cov=100*100*np.eye(2),size=500)
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(dem);
ax.scatter(coordinates[:,0], coordinates[:,1], marker='.')
plt.show()
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_5_0.png" alt="png" /></p>
<p>Let’s collect all 2D and their associated ground-truth heights into a numpy array, filtering out samples from the black border.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>truth = np.array([(x,y,dem[y,x,0]) for x,y in np.round(coordinates).astype(np.int) if dem[y,x,0]>0])
N = truth.shape[0]
z = truth[:,2]
</code></pre></div></div>
<p>To create relative measurements, we’ll use a graph derived from the K-nearest neighbors.</p>
<p>The associated estimation problem is quite simple, and we can see it as a factor graph with just pairwise linear factors $|z_i -z_j - \delta_{ij}|^2$. In this linear case, the factor graph is completely equivalent to a sparse rectangular matrix $H$, with entries 1 and -1. For every edge $(i,j)$ in the nearest neighbor graph, there is one row/factor.</p>
<p>From the graph $G$ we can easily generate this sparse measurement matrix $H$ using the (oriented) <code class="language-plaintext highlighter-rouge">incidence_matrix</code> function in <code class="language-plaintext highlighter-rouge">networkx</code>. The relative measurements $d$ are then obtained as $d=H z$ where $z$ are the ground truth height values.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Find nearest Neighbors
neigh = NearestNeighbors(n_neighbors=6, radius=100)
neigh.fit(truth)
A = neigh.kneighbors_graph()
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create graph
G0 = nx.from_scipy_sparse_matrix(A)
# Create measurement matrix H
H = nx.incidence_matrix(G0, oriented=True).transpose()
M, N = H.shape
# Simulate measurements d
d = H @ z + np.random.normal(size=(M,),scale=1)
plt.hist(d);
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_10_0.png" alt="png" /></p>
<p>Unfortunately relative measurements alone can never tell us about absolute height. To remedy this we add one row to $H$ to clamp $\hat{z}[0]$ to $100$, and we then use the <code class="language-plaintext highlighter-rouge">lsqr</code> method in scipy to obtain estimated heights $\hat{z}$:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H1 = scipy.sparse.bmat([[H], [scipy.sparse.eye(1,N)]])
d1 = np.concatenate((d,[truth[0,2]]))
z, istop, itn, r1norm = scipy.sparse.linalg.lsqr(H1, d1)[:4]
</code></pre></div></div>
<p>Finally, let’s render the result in 3D:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(truth[:,0], truth[:,1], z, marker='.');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_14_0.png" alt="png" /></p>
<p>That does not look half-bad ;-)</p>
<h2 id="of-jacobians-hessians-and-graph-laplacians">Of Jacobians, Hessians, and Graph Laplacians</h2>
<p>The large (but sparse) measurement matrix $H$ we used above is also called the <strong>Jacobian</strong>, and the <code class="language-plaintext highlighter-rouge">lsqr</code> method solved the linear least-squares problem $\hat{z} = \arg\min_z |H z - d |^2$. This is really just a multivariate quadratic with its curvature given by the <strong>Hessian</strong> $Q\doteq H^TH$. The Hessian is also known as the <strong>information matrix</strong>, and is equal to the inverse of the <strong>covariance matrix</strong> $P\doteq Q^{-1}$.</p>
<p>The significance of all this that the Hessian $Q$ tells us exactly <em>how much we know</em> about the estimated heights given the measurements and the uncertainty associated with the measurements (which here we silently assumed to be zero-mean Gaussian additive noise with unit standard deviation $\sigma=1$).</p>
<p>To explore this in a bit more depth, let’s simplify the problem even more and only consider 1D mountains. Below we walk up a mountain and record relative height measurements between 6 subsequent points:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>z = np.array([100,120,130,120,110,110])
plt.plot(z,'-*');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_18_0.png" alt="png" /></p>
<p>Let us take measurements both between neighboring points as well as points one hop away, which should intuitively improve our estimates:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>G = nx.Graph()
G.add_edges_from([(0,1),(0,2),(1,2),(1,3),(2,3),(2,4),(3,4),(3,5),(4,5)])
H = nx.incidence_matrix(G, oriented=True).transpose()
M, N = H.shape
d = H @ z + np.random.normal(size=(M,),scale=1)
</code></pre></div></div>
<p>If we calculate at the Hessian $Q=H^T H$ for this simpler problem, we see that it is a $6\times 6$ matrix:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Q = H.T@ H
print(Q.toarray().astype(np.int))
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[ 2 -1 -1 0 0 0]
[-1 3 -1 -1 0 0]
[-1 -1 4 -1 -1 0]
[ 0 -1 -1 4 -1 -1]
[ 0 0 -1 -1 3 -1]
[ 0 0 0 -1 -1 2]]
</code></pre></div></div>
<p>Graph theory also defines the <a href="https://en.wikipedia.org/wiki/Laplacian_matrix">Laplacian Matrix</a> $L=D-A$ of an undirected graph $G$, where $D$ is a diagonal degree matrix and $A$ is the adjacency matrix of the graph.</p>
<p>The Laplacian $L$ associated with this matrix is easy to calculate, but we’ll use <code class="language-plaintext highlighter-rouge">networkx</code> to do it for us:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>L = nx.linalg.laplacian_matrix(G).toarray()
print(L)
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[ 2 -1 -1 0 0 0]
[-1 3 -1 -1 0 0]
[-1 -1 4 -1 -1 0]
[ 0 -1 -1 4 -1 -1]
[ 0 0 -1 -1 3 -1]
[ 0 0 0 -1 -1 2]]
</code></pre></div></div>
<p>As we can see, <strong>for this problem the Hessian and the Graph Laplacian are identical</strong>.</p>
<p>This nicely connects estimation and graph theory. In particular, in graph theory the eigenvalues and eigenvectors of the graph Laplacian $L$ are of particular interest, and we will be able to connect them to the SVD of the measurement matrix $H$ and by extension $Q$.</p>
<p>First, let us calculate the eigen-decomposition of $L$. We will use the <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition or SVD</a> to do this as that nicely sorts the eigenvalues for us in decreasing magnitude order:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eigenvectors, eigenvalues, _ = np.linalg.svd(L)
print(np.round(eigenvalues,2))
print(np.round(eigenvectors,2))
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[5.34 5. 3.47 3. 1.19 0. ]
[[ 0.23 0. -0.29 0.58 0.6 0.41]
[-0.1 -0.5 0.62 -0.29 0.33 0.41]
[-0.66 0.5 -0.19 -0.29 0.16 0.41]
[ 0.66 0.5 0.19 -0.29 -0.16 0.41]
[ 0.1 -0.5 -0.62 -0.29 -0.33 0.41]
[-0.23 0. 0.29 0.58 -0.6 0.41]]
</code></pre></div></div>
<p>However, because we have $L=Q=H^TH$, we can <em>also</em> do obtain this by taking the SVD of $H$:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>_, S, V = np.linalg.svd(H.toarray())
print(np.round(S**2,2)) # square to get eigenvalues of Q!
print(np.round(V.transpose(),2))
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[5.34 5. 3.47 3. 1.19 0. ]
[[ 0.23 -0. -0.29 0.58 0.6 -0.41]
[-0.1 -0.5 0.62 -0.29 0.33 -0.41]
[-0.66 0.5 -0.19 -0.29 0.16 -0.41]
[ 0.66 0.5 0.19 -0.29 -0.16 -0.41]
[ 0.1 -0.5 -0.62 -0.29 -0.33 -0.41]
[-0.23 -0. 0.29 0.58 -0.6 -0.41]]
</code></pre></div></div>
<p>Based on this, we can start to understand some properties that are often discussed in the context of the graph Laplacian:</p>
<p><strong>1. The smallest eigenvalue of $L$ is zero, and the associated eigenvector is all ones, up to a scale.</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>print("Eigenvalues=\n {}".format(np.round(eigenvalues,2)),
"\nEigenvector associated with first zero=\n {}".format(np.round(eigenvectors[:,-1],2))
)
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Eigenvalues=
[5.34 5. 3.47 3. 1.19 0. ]
Eigenvector associated with first zero=
[0.41 0.41 0.41 0.41 0.41 0.41]
</code></pre></div></div>
<p>Let’s unpack this: if we interpret the graph Laplacian $L$ as an the information matrix $Q$ the result above says that there is a direction in the space of unknowns in which we have <em>zero information</em>. This makes total sense: since we only have <em>relative</em> measurements, we can add a constant without changing the objective function: that corresponds to the first eigenvector.</p>
<p><strong>2. If the graph $G$ is disconnected, the second eigenvalue is also zero.</strong></p>
<p>We can examine this by looking at two mountaintops, by just replicating H:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H2 = np.zeros((2*M,2*N))
H2[:M,:N]=H.toarray()
H2[M:,N:]=H.toarray()
Q2 = H2.T @ H2
eigenvectors2, eigenvalues2, _ = np.linalg.svd(Q2)
print("Eigenvalues=\n {}".format(np.round(eigenvalues2)),
"\nEigenvector associated with first zero=\n {}".format(np.round(eigenvectors2[:,-1],2)),
"\nEigenvector associated with second zero=\n {}".format(np.round(eigenvectors2[:,-2],2))
)
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Eigenvalues=
[5. 5. 5. 5. 3. 3. 3. 3. 1. 1. 0. 0.]
Eigenvector associated with first zero=
[-0. -0. -0. -0. -0. -0. 0.41 0.41 0.41 0.41 0.41 0.41]
Eigenvector associated with second zero=
[ 0.41 0.41 0.41 0.41 0.41 0.41 -0. 0. 0. 0. 0. 0. ]
</code></pre></div></div>
<p>This is also not surprising: even though this simulates scaling the two mountain tops and recording our elevation changes, we have no measurements connecting the two mountains, and hence we can <em>independently</em> vary the height of either mountain, without violating any measurement constraint. That corresponds to the two eigenvectors associated with the two zero eigenvalues.</p>
<p>This is also the basis for <a href="https://en.wikipedia.org/wiki/Spectral_clustering">spectral partitioning/clustering mtehods</a>: the eigenvectors neatly separate the graph into two components, and this can be generalized to more sophisticated clustering methods.</p>
<p>This is even better appreciated by plotting both eigenvectors:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig=plt.plot(eigenvectors2[:,-2:],'--*')
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_34_0.png" alt="png" /></p>
<p><strong>3. The second-smallest eigenvalue, called the <a href="https://en.wikipedia.org/wiki/Algebraic_connectivity">Algebraic Connectivity or Fiedler value</a> is greater than zero if the graph is connected, and measures the global connectivity in the graph.</strong></p>
<p>In our original 6-node mountain top problem, the Fiedler value is approximatey 1.19:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>print("lambda = {}".format(round(eigenvalues[-2],2)))
print("with networkx: {}".format(round(nx.algebraic_connectivity(G),2)))
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lambda = 1.19
with networkx: 1.19
</code></pre></div></div>
<p><strong>4. The eigenvector associated with the Fiedler value, also called the “Fiedler vector”, is the direction in space that incurs the least error in the scalar synchronization problem.</strong></p>
<p>Indeed: the direction that incurs the least error is also the one we have the least <em>information</em> about: highly informative measurements will incur a high error if they are violated. Note that by “error” we mean the quadratic objective function, not the difference between our solution and the ground truth, which we in general do not have access to.</p>
<p>Per the spectral clustering idea, it can also be used to partition the graph into two parts, even if the graph is not disconnected. This can be appreciated by plotting the Fiedler vector for the original mountain top:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig=plt.plot(eigenvectors[:,-2], '-*')
plt.legend(["$\lambda = {}$".format(round(eigenvalues[-2],2))],loc='upper right');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_38_0.png" alt="png" /></p>
<p>Just by thresholding on zero we can neatly partition the mountain in the left half, and the right half!</p>
<p><strong>5. The remaining eigenvectors of $L$ (or $Q$) are increasingly “high frequency” modes that identify directions in space that incur high costs.</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(eigenvectors[:,:-2])
plt.legend(["$\lambda_{} = {}$".format(5-i, round(e,2)) for i,e in enumerate(eigenvalues[:-2])],loc='lower left');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_41_0.png" alt="png" /></p>
<p>For example, the blue “mode” associated with $\lambda_5$ shows that modifying the values of neighboring nodes 2 and 3 in opposite ways will be quite expensive.</p>
<p>The SVD is a generalization of the Fourier transform, in that the SVD of the Laplacian of a ring <em>will</em> in fact yield the DFT:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>G3 = nx.watts_strogatz_graph(64, 2, 0) # ring is simply special case of WS graph
fig, ax = plt.subplots(ncols=3,figsize=(13, 4))
nx.draw_circular(G3, node_size=25, ax=ax[0])
L3 = nx.linalg.laplacian_matrix(G3).toarray()
eigenvectors3, eigenvalues3, _ = np.linalg.svd(L3)
ax[1].plot(eigenvectors3[:,-3:]);
ax[2].plot(eigenvectors3[:,-5:-3]);
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_43_0.png" alt="png" /></p>
<p>Just to close the thread, as we saw before in the Mount Rainier example, if we have an absolute measurement then we can simply add a “prior” to our factor graph $H$ and RHS $d$, and we will get a definite answer:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H1 = np.concatenate((H.toarray(), [[1,0,0,0,0,0]]), axis=0)
d1 = np.concatenate((d,[100]))
Q1 = H1.T @ H1
z = np.linalg.inv(Q1) @ H1.T @ d1
print(z)
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[100. 121.27268328 130.98429722 120.60191527 111.20576812
111.48345044]
</code></pre></div></div>
<p>Because there is no contradictory information regarding the value of $z_0$, the solution adopts that prior exactly.</p>
<p>It is instructive to abandon the “pure Laplacian” view here and look at the eigenvalues and eigenvectors of the Hessian of this new problem, which is no longer indeterminate:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eigenvectors, eigenvalues, _ = np.linalg.svd(Q1)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(eigenvectors)
ax.legend(["$\lambda_{} = {}$".format(5-i, round(e,2)) for i,e in enumerate(eigenvalues)],loc='lower right');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_47_0.png" alt="png" /></p>
<p>As you can see:</p>
<ol>
<li>the smallest eigenvalue is no longer zero: we now incur a definite cost of moving up and down, but the eigenvector associated with $\lambda_1$ suggests that it this is less so for nodes far away from the “anchor node”.</li>
<li>The eigenvectors associated with $\lambda_5$ has a zero in the front and its eigenvalue is the same as before.</li>
<li>The remaining eigenvectors have larger eigenvalues as they also change the anchor node, which now has higher cost.</li>
</ol>
<h2 id="mount-rainiers-eigenvectors">Mount Rainier’s Eigenvectors</h2>
<p>We will use the sparse/fast methods from networkx to calculate the Fiedler value and Fiedler vector:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fiedler_value = nx.algebraic_connectivity(G0)
print(fiedler_value)
fiedler_vector = nx.fiedler_vector(G0)
plt.plot(fiedler_vector);
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.053047495666829085
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_51_1.png" alt="png" /></p>
<p>This does not look very good in 1D, so let’s plot it in 3D, using a color map to encode the Fiedler vector’s values:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111, projection='3d')
z = truth[:,2]
ax.scatter(truth[:,0], truth[:,1], z, marker='.', c=fiedler_vector,cmap='RdYlGn');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_53_0.png" alt="png" /></p>
<p>It is even clearer from the top: below we plot an overhead view, in which we can clearly see that the two “halves” of Mount Rainier can move more or less independently without incurring much error. Hence, this is the lowest frequency mode over which we do have some information (but not much).</p>
<p>If you are running the notebook, you could try playing with the number of neighbors in the KNN calculation, and see how the Fiedler vector changes as a result.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
z = truth[:,2]
ax.scatter(truth[:,0], truth[:,1], marker='.', c=fiedler_vector,cmap='RdYlGn');
</code></pre></div></div>
<p><img src="/assets/images/Laplacian//Laplacian_55_0.png" alt="png" /></p>
<h2 id="other-synchronization-problems">Other Synchronization Problems</h2>
<p>The “mountain shape” problems we discussed above are instances of a class of estimation problems called <strong>synchronization problems</strong>, in which we are given a set of <em>pairwise relative measurements</em>, and our goal is to estimate a set of absolute quantities.</p>
<p>Arguably the most studied synchronization problem in computer vision is <em>rotation averaging</em>, where we are given a set of relative rotations $R_{ij}$ between cameras and the goal is to estimate the absolute orientation $R_i$ for every camera. This is a crucial step in 3D reconstruction algorithms.</p>
<p>We have recently published a paper on <a href="https://dellaert.github.io/ShonanAveraging/">Shonan Rotation Averaging</a>, where the eigenvalue and eigenvectors of the graph Laplacian are used in both the analysis and convergence certificates.</p>Frank Dellaert, August 30, 2020Releasing GTSAM 4.0.32020-07-16T17:32:00+00:002020-07-16T17:32:00+00:00http://gtsam.org/gtsam.org/2020/07/16/new-release-gtsam<p>Author: <a href="https://blog.amayume.net/">Fan Jiang</a></p>
<h2 id="introduction">Introduction</h2>
<p>You are probably here because you do optimization, like everyone here on the GTSAM team. As fellow roboticists, we know how frustrating it be when your problem does not converge.</p>
<p>To further optimize your optimization experience, we are excited to announce this new release of GTSAM, GTSAM 4.0.3, where we incorporated a lot of new features and bugfixes, as well as substantial improvements in convergence for Pose3-related problems.</p>
<p>Note that GTSAM <em>already</em> provides the excellent Pose3 initialization module by Luca Carlone, in <code class="language-plaintext highlighter-rouge">InitializePose3</code>, which we always recommend if your pipeline does not provide a good initial estimate out of the box.</p>
<h2 id="major-changes">Major Changes</h2>
<h3 id="switching-away-from-cayley">Switching Away from Cayley</h3>
<blockquote>
<p>TL;DR: GTSAM will now default to using the full $\mathrm{SE(3)}$ exponential map, instead of using the Cayley map, which should give better convergence for most problems without performance impact.</p>
</blockquote>
<p>In nonlinear optimization, one important factor affecting the convergence is the mathematical structure of the object we are optimizing on. In many practical 3D robotics problems this is the $\mathrm{SE(3)}$ manifold describing the structure of 3D Poses.</p>
<p>It is not easy to directly operate on nonlinear manifolds like $\mathrm{SE(3)}$, so libraries like GTSAM uses the following strategy:</p>
<ul>
<li>Linearize the <em>error</em> manifold at the current estimate</li>
<li>Calculate the next update in the associated tangent space</li>
<li>Map the update back to the manifold with a <em>retract</em> map</li>
</ul>
<p>We used two distinct but equally important concepts above: 1) the error metric, which is in a PoseSLAM problems is the measure of error between two poses; and 2) the <em>retract</em> operation, which is how we apply a computed linear update back to the nonlinear error manifold.</p>
<p>In GTSAM, you can choose, at compile time, between four different choices for the retract map on the $\mathrm{SE(3)}$ manifold:</p>
<ul>
<li>Full: Exponential map on $\mathrm{SE(3)}$</li>
<li>Decomposed retract, which uses addition for translation and:
<ul>
<li>Exponential map $\mathrm{SO(3)}$ with Rotation Matrix</li>
<li>Exponential map $\mathrm{SO(3)}$ with Quaternions</li>
<li>Cayley map on $\mathrm{SO(3)}$ with Rotation Matrix</li>
</ul>
</li>
</ul>
<p>Previously in GTSAM, we used the Cayley map by default, which is an approximation of the $\mathrm{SO(3)}$ exponential map when the tangent vector (rotation error) is small. This is perfectly fine locally, if we have a relatively good initial estimate.</p>
<p>However, since we are also using the inverse of the retract as the error metric, a different choice for the retract map could give better convergence.
As you can see in the following figure, the Cayley <em>local</em> map is unbounded when $\theta$ is large, and thus negatively impacts convergence when the initialization is not good enough.</p>
<p><img src="/assets/images/release_403/cayley_distortion.png" alt="Cayley" /></p>
<p>Based on careful benchmarking, in the new release, we will not use the Cayley approximation by default, which should give you a better convergence for most applications. This is especially true if your initial estimate can be far away from the global optimum: the impact on well-initialized problems is minimal. You can look at the benchmark yourself in the next section, if you are interested.</p>
<h3 id="can-we-still-use-cayley-and-friends">Can we still use Cayley and Friends?</h3>
<p>Yes, just not by default. Historically, the Cayley approximation was chosen as a cheaper alternative to the full exponential map, and our intention is not to tell you that you should not use it, rather to inform you that without properly initializing your initial estimate, the result of Cayley could be inferior to those obtained with the full $\mathrm{SE(3)}$ retract.</p>
<p>In order to give you an intuitive understanding of the situation we made a benchmark where the four configurations by:</p>
<ul>
<li>asking GTSAM to solve 6 benchmark optimization datasets, with the Chordal initialization as initial estimate (from <code class="language-plaintext highlighter-rouge">InitializePose3</code>);</li>
<li>asking GTSAM to solve 6 benchmark optimization datasets, this time with 100 <em>random</em> initial estimates, sampled around the ground truth by a Gaussian distribution of 1 sigma, and observe the convergence metrics.</li>
</ul>
<style>
@media screen and (min-width: 640px){
.fig-2x1-container {
display: grid;
grid-template-columns: 1fr 1fr;
grid-template-rows: 1fr auto auto;
gap: 1px 1px;
grid-template-areas: "im1 im2" "name1 name2" "figname figname";
}
}
.im1 { grid-area: im1; }
.im2 { grid-area: im2; }
.name1 { grid-area: name1; text-align: center; }
.name2 { grid-area: name2; text-align: center; }
.figname { grid-area: figname; text-align: center; }
</style>
<div class="fig-2x1-container">
<div class="im1"><img src="/assets/images/release_403/chordal_convergence.png" /></div>
<div class="name1"><p>a) With Chordal Initialization</p></div>
<div class="im2"><img src="/assets/images/release_403/success_rates.png" /></div>
<div class="name2"><p>b) Without Chordal Initialization</p></div>
<div class="figname">Performance for different retract variants.</div>
</div>
<p>Note that with proper initialization, all 4 configurations achieved convergence without issue. However, the full $\mathrm{SE(3)}$ retract exhibited much better convergence with randomly initialized estimates.</p>
<p>For a visual reference, here are 3D scatter plots of samples from the random benchmark results that you can zoom in and see the difference:</p>
<div>
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<div id="2e54447b-538c-4cac-abd4-e2865417fa63" class="plotly-graph-div" style="height:60em; width:100%;"></div>
<script type="text/javascript">
window.PLOTLYENV = window.PLOTLYENV || {};
(function () {
var data = {}
fetch('/assets/images/release_403/cayley-div.json')
.then(function (response) {
return response.json();
})
.then(function (points_data) {
data.points = points_data;
return fetch('/assets/images/release_403/cayley-conf.json')
})
.then(function (response) {
return response.json();
})
.then(function (conf_data) {
if (document.getElementById("2e54447b-538c-4cac-abd4-e2865417fa63")) {
Plotly.newPlot(
'2e54447b-538c-4cac-abd4-e2865417fa63', data.points, conf_data, {
"responsive": true
}
)
}
});
})();
</script>
</div>
<div>
<div id="7034b6ec-dfbc-4220-888b-df106a160cd1" class="plotly-graph-div" style="height:60em; width:100%;"></div>
<script type="text/javascript">
window.PLOTLYENV=window.PLOTLYENV || {};
(function () {
var data = {}
fetch('/assets/images/release_403/full-div.json')
.then(function (response) {
return response.json();
})
.then(function (points_data) {
data.points = points_data;
return fetch('/assets/images/release_403/full-conf.json')
})
.then(function (response) {
return response.json();
})
.then(function (conf_data) {
if (document.getElementById("7034b6ec-dfbc-4220-888b-df106a160cd1")) {
Plotly.newPlot(
'7034b6ec-dfbc-4220-888b-df106a160cd1', data.points, conf_data, {
"responsive": true
}
)
}
});
})();
</script>
</div>
<p>The results can be reproduced with this repo: <code class="language-plaintext highlighter-rouge">https://github.com/ProfFan/expmap-benchmark</code></p>
<h2 id="important-new-features--bugfixes">Important New Features & Bugfixes</h2>
<p>In addition to the change in default Pose3 retract, which will now be the full exponential map, GTSAM has seen a steady stream of commits since the last release, 4.0.2, which has been there for more than 6 months. A summary of the most important issues and features is below:</p>
<ul>
<li>Robust noise model is ready for general usage
<ul>
<li>It can be used to replace RANSAC for some applications</li>
<li>For a gentle introduction, see <a href="/2019/09/20/robust-noise-model.html">this</a> awesome tutorial by Varun Agrawal</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">CombinedImuFactor</code> serialization is now fixed</li>
<li>The ISAM2 KITTI example has a C++ port, thanks Thomas Jespersen for the help!</li>
<li>Now you can choose arbitrary MATLAB install prefix for the toolbox build</li>
<li>Now you can <code class="language-plaintext highlighter-rouge">make python-install</code> to install the Python toolbox</li>
<li>Now you can use the Conjugate Gradient solver in Python</li>
<li>Now you can install GTSAM with <code class="language-plaintext highlighter-rouge">pip</code> if you only use the Python interface</li>
<li>Added <code class="language-plaintext highlighter-rouge">FrobeniusFactor</code> and <code class="language-plaintext highlighter-rouge">FrobeniusWormholeFactor</code> for robust SFM applications</li>
<li>Switched to in-place update of the diagonal Hessian in LM
<ul>
<li>expect a 3%-5% speedup, YMMV</li>
</ul>
</li>
<li>The Cython wrapper now can be built on Windows :)
<ul>
<li>Kudos <code class="language-plaintext highlighter-rouge">@tuwuhs</code> for the help!</li>
</ul>
</li>
<li>Fixed a few memory-related bugs detected by the LLVM sanitizer
<ul>
<li>Greatly improved stability</li>
</ul>
</li>
</ul>
<h2 id="finale">Finale</h2>
<p>With over a hundred merged pull requests, we welcome you again on board the new release of GTSAM, GTSAM <code class="language-plaintext highlighter-rouge">4.0.3</code>. We would like to thank all our contributors for their precious commits and bug reports. Finally, thank you for using GTSAM and please don’t hesitate to open an issue on GitHub if you found a bug!</p>Fan JiangAuthor: Fan JiangGeometry and Variable Naming Conventions2020-06-28T00:00:00+00:002020-06-28T00:00:00+00:00http://gtsam.org/gtsam.org/2020/06/28/gtsam-conventions<p>Author: <a href="https://samarth-robo.github.io">Samarth Brahmbhatt</a></p>
<p>I am a post-doc at <a href="https://vladlen.info">Intel ISL</a>. In my
<a href="https://contactpose.cc.gatech.edu">PhD work</a>, I used GTSAM for object pose
estimation and 3D reconstruction of hand joints to study hand-object interaction.
I love GTSAM for its ability to optimize geometric quantities and deal with
uncertainty in observations. However, it is easy to get lost in the web of
inverse transformations while optimizing complex geometric systems. I’ve found
that naming code variables consistently with textbook notation conventions can
help mitigate this. This post describes some suggestions for doing that.</p>
<h1 id="points">Points</h1>
<ul>
<li>Always name your 3D points like how you would on paper. A point $^cX$ in the
camera coordinate system $c$ is named <code class="language-plaintext highlighter-rouge">cX</code>.</li>
<li>3D points use uppercase letters, 2D points use lowercase letters.</li>
</ul>
<h1 id="pose">Pose</h1>
<p>Name your pose variables like how you would write them on paper.
The pose $^wT_c$ of camera coordinate frame $c$ in the world coordinate frame
$w$ is named <code class="language-plaintext highlighter-rouge">wTc</code>. In GTSAM jargon, <code class="language-plaintext highlighter-rouge">c</code> is the <strong>pose coordinate</strong> frame,
and <code class="language-plaintext highlighter-rouge">w</code> is the <strong>world coordinate</strong> frame.</p>
<h1 id="composing-poses">Composing Poses</h1>
<p>Math: $^oT_c =~^oT_w~\cdot~^wT_c$.</p>
<p>GTSAM code:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Pose3</span> <span class="n">oTw</span> <span class="o">=</span> <span class="n">Pose3</span><span class="p">(...);</span>
<span class="n">Pose3</span> <span class="n">wTc</span> <span class="o">=</span> <span class="n">Pose3</span><span class="p">(...);</span>
<span class="n">Pose3</span> <span class="n">oTc</span> <span class="o">=</span> <span class="n">oTw</span> <span class="o">*</span> <span class="n">wTc</span><span class="p">;</span>
<span class="c1">// same as Pose3 oTc = oTw.compose(wTc);</span>
</code></pre></div></div>
<h1 id="transforming-points-from-pose-coordinates">Transforming Points <em>From</em> Pose Coordinates</h1>
<p>This operation uses the camera pose in the world coordinate frame
($^wT_c$) to bring points from the camera coordinate frame ($^c\homo{X}$) to
the world coordinate frame ($^w\homo{X}$).</p>
<p>Math: $^w\homo{X} =~^wT_c~\cdot~^c\homo{X}$</p>
<p>GTSAM Code:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Point3</span> <span class="n">wX</span> <span class="o">=</span> <span class="n">wTc</span><span class="p">.</span><span class="n">transformFrom</span><span class="p">(</span><span class="n">cX</span><span class="p">);</span>
<span class="c1">// same as Point3 wX = wTc * cX;</span>
</code></pre></div></div>
<h1 id="transforming-points-to-pose-coordinates">Transforming Points <em>To</em> Pose Coordinates</h1>
<p>This operation uses the inverse of the camera pose $\left(^wT_c\right)^{-1}$
to bring points from the world coordinate frame ($^w\homo{X}$) to
the camera coordinate frame ($^c\homo{X}$).</p>
<p>Math: $^c\homo{X} =~\left(^wT_c\right)^{-1}~\cdot~^w\homo{X}$</p>
<p>GTSAM Code:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Point3</span> <span class="n">cX</span> <span class="o">=</span> <span class="n">wTc</span><span class="p">.</span><span class="n">transformTo</span><span class="p">(</span><span class="n">wX</span><span class="p">);</span>
</code></pre></div></div>
<h1 id="cameras">Cameras</h1>
<p>The GTSAM pinhole camera classes
(e.g. <a href="https://github.com/borglab/gtsam/blob/develop/gtsam/geometry/CalibratedCamera.cpp"><code class="language-plaintext highlighter-rouge">PinholeBase</code></a>)
internally use <code class="language-plaintext highlighter-rouge">transformTo()</code> to transform 3D points into the camera
coordinates, so you should use the pose of the camera w.r.t. world while
constructing the object:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Pose3</span> <span class="n">wTc</span> <span class="o">=</span> <span class="n">Pose3</span><span class="p">(...);</span>
<span class="n">SimpleCamera</span> <span class="nf">cam</span><span class="p">(</span><span class="n">wTc</span><span class="p">,</span> <span class="n">K</span><span class="p">);</span>
</code></pre></div></div>
<p>now you can use <code class="language-plaintext highlighter-rouge">cam</code> in the <code class="language-plaintext highlighter-rouge">TriangulationFactor</code> for example. Other factors
like the
<a href="https://github.com/borglab/gtsam/blob/develop/gtsam/slam/ProjectionFactor.h"><code class="language-plaintext highlighter-rouge">GenericProjectionFactor</code></a>
also use the same convention:
\(\begin{align*}
^{sensor}\homo{X}
&=~^{sensor}T_{world}~\cdot~^{world}\homo{X}\\
&=~^{sensor}T_{body}~\cdot~^{body}T_{world}~\cdot~^{world}\homo{X}\\
&= \left(^{world}T_{body}~\cdot~^{body}T_{sensor}\right)^{-1}~\cdot~^{world}\homo{X}
\end{align*}\)</p>
<p><strong>Example</strong>: In a mobile robot SLAM scenario, wheel odometry might tell you where
the robot body is in the world ($^{world}T_{body}$), and the robot URDF spec
might tell you where the camera is on the robot body ($^{body}T_{sensor}$).
Together, these allow you to situate camera observations in the world coordinate
frame.</p>
<p>GTSAM Code:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Pose3</span> <span class="n">body_T_sensor</span> <span class="o">=</span> <span class="p">...</span>
<span class="n">Point2</span> <span class="n">sensor_p</span> <span class="o">=</span> <span class="p">...</span> <span class="c1">// 2D point in the image</span>
<span class="c1">// in the following factor,</span>
<span class="c1">// Symbol('T', i) is world_T_body for the i'th frame</span>
<span class="c1">// Symbol('X', j) is the j'th 3D point in world coordinates i.e. world_Xj</span>
<span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="n">GenericProjectionFactor</span><span class="o"><</span><span class="n">Pose3</span><span class="p">,</span> <span class="n">Point3</span><span class="p">,</span> <span class="n">Cal3_S2</span><span class="o">></span><span class="p">(</span><span class="n">sensor_p</span><span class="p">,</span> <span class="n">noise</span><span class="p">,</span> <span class="n">Symbol</span><span class="p">(</span><span class="sc">'T'</span><span class="p">,</span> <span class="n">i</span><span class="p">),</span> <span class="n">Symbol</span><span class="p">(</span><span class="sc">'X'</span><span class="p">,</span> <span class="n">j</span><span class="p">),</span> <span class="n">K</span><span class="p">,</span> <span class="n">body_T_sensor</span><span class="p">);</span>
</code></pre></div></div>
<p>It will project the world 3D point $^{world}\homo{X}$ into the sensor coordinates like so:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Pose3</span> <span class="n">world_T_sensor</span> <span class="o">=</span> <span class="n">world_T_body</span> <span class="o">*</span> <span class="n">body_T_sensor</span><span class="p">;</span>
<span class="n">Point3</span> <span class="n">sensor_X</span> <span class="o">=</span> <span class="n">world_T_sensor</span><span class="p">.</span><span class="n">transformTo</span><span class="p">(</span><span class="n">world_X</span><span class="p">);</span>
</code></pre></div></div>
<p>and then project it to the image using instrinsics and then compare it to the detection <code class="language-plaintext highlighter-rouge">sensor_p</code>.</p>Author: Samarth BrahmbhattWhat are Factor Graphs?2020-06-01T00:00:00+00:002020-06-01T00:00:00+00:00http://gtsam.org/2020/06/01/factor-graphs<h3 id="by-frank-dellaert-fdellaert-on-twitter">By <a href="https://dellaert.github.io">Frank Dellaert</a>, <a href="http://twitter.com/fdellaert">@fdellaert on Twitter</a></h3>
<p>[Cross-posting from our new sister-site, <a href="https://opensam.org/opensam.org/2020/05/26/factor-graphs.html">OpenSAM.org</a>]</p>
<p>Many computational problems in robotics have an optimization problem at their core. For example, in simultaneous localization and mapping (SLAM) and many other <strong>estimation problems</strong> we are after a maximum a posteriori estimate, i.e., we try to maximize posterior probability of the variables given a set of measurements. When attempting to <strong>act optimally</strong>, we try to maximize a performance index, or conversely minimize a penalty function. And even in classical planning, we are trying to find an assignment to a set of discrete variables that minimizes the plan length or optimizes for some other desirable property of the plan.</p>
<p>In most of these optimization problems, the objective to be maximized or minimized is composed of many different <strong>factors or terms that typically are local in nature</strong>, i.e., they only depend on a small subset of the entire set of variables. For example, in a tracking application, a particular video frame only provides information about the position of a target at a particular time. At the next time step, a different variable is associated with the target. Of course, this depends on the parametrization chosen: if a track is globally parametrized, for example as a polynomial, this locality is destroyed.</p>
<p>A particularly insightful way of modeling this locality structure is using the concept of <strong>factor graphs</strong>. Factor graphs are a class of graphical models in which there are variables and factors. The variables represent unknown quantities in the problem, and the factors represent functions on subsets of the variables. Edges in the factor graph are always between factors and variables, and indicate that a particular factor depends on a particular variable.</p>
<p>There are three main <strong>advantages</strong> to using factor graphs when designing algorithms for robotics applications:</p>
<ul>
<li>They can represent a wide variety of problems across robotics.</li>
<li>By laying bare the compositional structure of the problem, they expose opportunities to improve computational performance.</li>
<li>They are beneficial in designing and thinking about modelling your problem, even aside from performance considerations.</li>
</ul>
<p>Because many optimization problems in robotics have the locality property, factor graphs can model a wide variety of problems across AI and robotics. Some of these are illustrated below:</p>
<h2 id="boolean-satisfiability">Boolean Satisfiability:</h2>
<p>In <em>Boolean Satisfiability</em>, we are looking for an assignment to Boolean variables that make a set of Boolean formulas true. In the example below, the rather boring looking Boolean equations is represented by the Boolean factor graph below:</p>
<p><img src="/assets/fg-images/image6.png" alt="Factor graph with Boolean variables and hard constraints" title="Factor graph with Boolean variables and hard constraints" /></p>
<h2 id="constraint-satisfaction">Constraint Satisfaction:</h2>
<p>When we generalize from Boolean to <em>discrete</em> variables, we obtain the class of <em>Constraint Satisfaction Problems</em> or CSPs. For example, the image below shows a graph coloring problem where the goal is to assign a color (chosen from a finite set of colors) to the Swiss cantons such that no neighboring canton has the same color. In the graph below, these pairiwse <em>constraints</em> are indicated by the square factors.</p>
<p><img src="/assets/fg-images/image3.png" alt="Factor graph with Discrete variables and hard constraints" title="Factor graph with Discrete variables and hard constraints" /></p>
<h2 id="bayes-networks">Bayes Networks:</h2>
<p>If we relax the hard constraints to real-valued <em>preferences</em>, we switch to a <em>Constraint Optimization Problem</em>. This can be used to express that a particular color is preferred over another one. It can be shown that <em>COPs</em> are also the main optimization problem to solve in the venerable AI technique of <em>Bayes networks</em>. Below an example where a Bayes network with given evidence (the square nodes) is converted to a COP factor graph, which can then give the maximum probable explanation for the remaining (non-evidence) variables.</p>
<p><img src="/assets/fg-images/image5.png" alt="Bayes net with evidence and corresponding COP factor graph" title="Bayes net with evidence and corresponding COP factor graph" /></p>
<h2 id="polynomial-equations">Polynomial Equations:</h2>
<p>The constraints can also be other functions of the variables. For example, we can represent a system of polynomial equations by a factor graph. The example below, adapted from Gim Hee Lee’s Ph.D. thesis, encodes a minimal geometry problem from computer vision, useful for autonomous driving:</p>
<p><img src="/assets/fg-images/image2.png" alt="Factor graph with continuous variables and polynomial equality constraints" title="Factor graph with continuous variables and polynomial equality constraints" /></p>
<h2 id="simultaneous-localization-and-mapping">Simultaneous Localization and Mapping:</h2>
<p>Speaking of autonomous driving, perhaps the most famous application of factor graphs is in SLAM, or <em>Simultaneous Localization and Mapping</em>. This is illustrated in the example below, where the location of a vehicle over time (the cyan variables) as well as the location of a set of landmarks (the dark blue nodes) are solved for. In this case we have a plethora of factors associated with vehicle odometry, landmark sightings, and GPS priors. The example in question is a small excerpt from a real robot experiment in Sydney’s Victoria Park.</p>
<p><img src="/assets/fg-images/image1.png" alt="Factor graph with pose and landmark variables, and measurement-derived factors" title="Factor graph with pose and landmark variables, and measurement-derived factors" /></p>
<h2 id="structure-from-motion">Structure from Motion:</h2>
<p>Finally, we can also represent <em>3D mapping</em> or <em>3D reconstruction</em> problems with factor graphs. Below we show just the edges in the factor graph connecting one camera (yellow) with all points visible from that camera (black) in a large-scale 3D reconstruction of downtown Chicago.</p>
<p><img src="/assets/fg-images/image4.png" alt="Factor graph with cameras and 3D point variables, and re-projection factors for one camera" title="Factor graph with cameras and 3D point variables, and re-projection factors for one camera" /></p>
<h2 id="conclusion">Conclusion</h2>
<p>For more information about how factor graphs are typically used to solve perception problems in robotics, see the following booklet: <a href="http://www.cc.gatech.edu/~dellaert/pubs/Dellaert17fnt.pdf">Factor graphs for robot perception</a>, by <a href="https://dellaert.github.io">Frank Dellaert</a> and <a href="https://frc.ri.cmu.edu/~kaess/">Michael Kaess</a>, which appeared in 2017 in Foundations and Trends in Robotics.</p>
<p>However, this is just the tip of the iceberg. Factor graphs can be used to model a much wider variety of problems across robotics domains, such as Tracking, Inertial Navigation, Mapping with LIDARs, Classical Planning, Reinforcement Learning and Optimal Control, Motion Planning etc…</p>By Frank Dellaert, @fdellaert on TwitterLQR Control Using Factor Graphs2019-11-07T00:00:00+00:002019-11-07T00:00:00+00:00http://gtsam.org/2019/11/07/lqr-control<link rel="stylesheet" href="/assets/css/slideshow.css" />
<p>Authors: <a href="https://gerry-chen.com">Gerry Chen</a>, <a href="https://www.linkedin.com/in/yetong-zhang-9b810a105/">Yetong
Zhang</a>, and <a href="https://www.cc.gatech.edu/~dellaert/FrankDellaert/Frank_Dellaert/Frank_Dellaert.html">Frank Dellaert</a></p>
<div style="display:none"> <!-- custom latex commands here -->
$
\DeclareMathOperator*{\argmin}{argmin}
\newcommand{\coloneqq}{\mathrel{:=}}
$
</div>
<style>
.MJXc-display {
overflow-x: auto;
overflow-y: hidden;
scrollbar-width: none; /* Firefox */
-ms-overflow-style: none; /* Internet Explorer 10+ */
}
.MJXc-display::-webkit-scrollbar {
width: 5px;
height: 2px;
}
.MJXc-display::-webkit-scrollbar-track {
background: transparent;
}
.MJXc-display::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.MJXc-display:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
</style>
<p><!-- horizontal scrolling --></p>
<!-- - TOC
{:toc} -->
<h2 id="introduction">Introduction</h2>
<p><a name="fg_scratch"></a></p>
<figure class="center">
<img src="/assets/images/lqr_control/VE/fg_lqr.png" alt="Factor graph structure. The objective factors are marked with dashed lines, and the constrain factors are marked with solid lines." />
<figcaption><b>Figure 1</b> Factor graph structure for an LQR problem with 3 time steps. The cost factors are marked with dashed lines and the dynamics constraint factors are marked with solid lines.</figcaption>
</figure>
<p><br /></p>
<p>In this post we explain how optimal control problems can be formulated as factor graphs and solved
by performing variable elimination on the factor graph.</p>
<p>Specifically, we will show the factor graph formulation and solution for the
<strong>Linear Quadratic Regulator (LQR)</strong>. LQR is <em>a state feedback controller which derives the optimal gains
for a linear system with quadratic costs on control effort and state error</em>.</p>
<p>We consider the <a href="https://stanford.edu/class/ee363/lectures/dlqr.pdf"><strong>finite-horizon, discrete LQR
problem</strong></a>.
The task is to <em>find the optimal controls $u_k$ at time instances $t_k$
so that a total cost is minimized</em>. Note that we will later see the optimal controls can be represented in the form $u^*_k = K_kx_k$ for some optimal gain matrices $K_k$. The LQR problem can be represented as a constrained optimization
problem where the costs of control and state error are represented by the
minimization objective \eqref{eq:cost}, and the system dynamics are represented by the
constraints \eqref{eq:dyn_model}.</p>
<p>\[ \begin{equation} \argmin\limits_{u_{1\sim N}} ~~x_N^T Q x_N + \sum\limits_{k=1}^{N-1} x_k^T Q x_k + u_k^T R u_k \label{eq:cost} \end{equation}\]
\[ \begin{equation} s.t. ~~ x_{k+1}=Ax_k+Bu_k ~~\text{for } k=1 \text{ to } N-1 \label{eq:dyn_model} \end{equation} \]</p>
<p>We can visualize the objective function and constraints in the form of a factor
graph as shown in <a href="#fg_scratch">Figure 1</a>. This is a simple Markov chain, with the oldest
states and controls on the left, and the newest states and controls on the right. <strong>The
ternary factors represent the dynamics model constraints and the unary
factors represent the state and control costs we seek to minimize via least-squares.</strong></p>
<h2 id="variable-elimination">Variable Elimination</h2>
<p>To optimize the factor graph, which represents minimizing the least squares objectives above, we can simply eliminate the factors from right
to left. In this section we demonstrate the variable elimination graphically and algebraically, but the matrix elimination is also
provided in the <a href="#least-squares-implementation-in-gtsam">Appendix</a>.</p>
<!-- ********************** BEGIN VARIABLE ELIMINATION SLIDESHOW ********************** -->
<!-- Slideshow container, based on https://www.w3schools.com/howto/howto_js_slideshow.asp -->
<div class="slideshow-container">
<div class="mySlides 0" style="text-align: center;">
<!-- <div class="numbertext">2 / 3</div> -->
<a name="fig_eliminate_x_a"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide1.png" alt="Elimination of state $x_2$" />
<figcaption><b>Figure 2a</b> Elimination of state $x_2$</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<!-- <div class="numbertext">2 / 3</div> -->
<a name="fig_eliminate_x_b"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide2.png" alt="Elimination of state $x_2$" />
<figcaption><b>Figure 2b</b> Elimination of state $x_2$</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_eliminate_u_a"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide3.png" alt="Elimination of state $u_1$" />
<figcaption><b>Figure 3a</b> Elimination of state $u_1$</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_eliminate_u_b"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide4.png" alt="Elimination of state $u_1$" />
<figcaption><b>Figure 3b</b> Elimination of state $u_1$</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_merge_factor"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide5.png" alt="Elimination of state $u_1$" />
<figcaption><b>Figure 3c</b> Cost-to-go at $x_1$ is the sum of the two unary factors on $x_1$ (green)</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_bayes_net"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide6.png" alt="Bayes net" />
<figcaption><b>Figure 4a</b> Repeat elimination until the graph is reduced to a Bayes net</figcaption>
</figure>
</div>
<!-- <div class="mySlides 0" style="text-align: center;">
<a name="fig_bayes_net"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide7.png"
alt="Bayes net" />
<figcaption><b>Figure 4b</b> Repeat elimination until the graph is reduced to a Bayes net</figcaption>
</figure>
</div> -->
<div class="mySlides 0" style="text-align: center;">
<a name="fig_bayes_net"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide8.png" alt="Bayes net" />
<figcaption><b>Figure 4b</b> Repeat elimination until the graph is reduced to a Bayes net</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_bayes_net"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide10.png" alt="Bayes net" />
<figcaption><b>Figure 4c</b> Repeat elimination until the graph is reduced to a Bayes net</figcaption>
</figure>
</div>
<div class="mySlides 0" style="text-align: center;">
<a name="fig_bayes_net"></a>
<figure class="center">
<img src="/assets/images/lqr_control/Elimination/cropped_Slide11.png" alt="Bayes net" />
<figcaption><b>Figure 4d</b> Completed Bayes net</figcaption>
</figure>
</div>
<!-- Next and previous buttons -->
<a class="prev" onclick="plusSlides(-1,0)">❮</a>
<a class="next" onclick="plusSlides(1,0)">❯</a>
</div>
<p><!-- slideshow-container --></p>
<!-- The dots/circles -->
<div style="text-align:center">
<span class="dot 0" onclick="currentSlide(1,0)"></span>
<span class="dot 0" onclick="currentSlide(2,0)"></span>
<span class="dot 0" onclick="currentSlide(3,0)"></span>
<span class="dot 0" onclick="currentSlide(4,0)"></span>
<span class="dot 0" onclick="currentSlide(5,0)"></span>
<span class="dot 0" onclick="currentSlide(6,0)"></span>
<span class="dot 0" onclick="currentSlide(7,0)"></span>
<span class="dot 0" onclick="currentSlide(8,0)"></span>
<span class="dot 0" onclick="currentSlide(9,0)"></span>
<!-- <span class="dot 0" onclick="currentSlide(10,0)"></span> -->
</div>
<!-- this css is to make the scroll bar disappear when the mouse isn't over the scrollable div...
Taken from my website: https://github.com/gchenfc/gerrysworld2/blob/master/css/activity.css -->
<style>
.scrollablecontent::-webkit-scrollbar {
width: 5px;
height: 12px;
}
.scrollablecontent::-webkit-scrollbar-track {
background: transparent;
}
.scrollablecontent::-webkit-scrollbar-thumb {
background: #ddd;
visibility:hidden;
}
.scrollablecontent:hover::-webkit-scrollbar-thumb {
visibility:visible;
}
/* .slideout {
width: 400px;
height: auto;
overflow: hidden;
background: orange;
margin: 0 auto;
transition: height 0.4s linear;
}
.hide {
height: 0;
} */
</style>
<!-- ********************** END VARIABLE ELIMINATION SLIDESHOW ********************** -->
<!-- ************************ BEGIN SCROLLABLE ELIMINATION DESCRIPTION ************************ -->
<div class="scrollablecontent" id="sec:elim_scrollable" style="overflow-x: hidden; background-color:rgba(0,0,0,0.05); padding:0 8px; margin-bottom: 10px;">
<!-- ************** STATE ************** -->
<div id="sec:elim_state_div" class="slideout">
<p><a id="sec:elim_state"></a></p>
<h3 id="eliminate-a-state">Eliminate a State</h3>
<p>Let us start at the last state, $x_2$. Gathering the two factors (marked in
red <a href="#fig_eliminate_x_a" onclick="currentSlide(1,0)">Figure 2a</a>), we have \eqref{eq:potential} the objective function $\phi_1$, and \eqref{eq:constrain} the constraint equation on $x_2$, $u_1$ and $x_1$:</p>
<p>\begin{equation} \phi_1(x_2) = x_2^T Q x_2 \label{eq:potential} \end{equation}</p>
<p>\begin{equation} x_2 = Ax_1 + Bu_1 \label{eq:constrain} \end{equation}</p>
<p>By substituting $x_2$ from the dynamics constraint \eqref{eq:constrain} into the objective function
\eqref{eq:potential}, we create a new factor representing
the cost of state $x_2$ as a function of $x_1$ and $u_1$:</p>
<p>\begin{equation} \phi_2(x_1, u_1) = (Ax_1 + Bu_1)^T Q (Ax_1 + Bu_1)
\label{eq:potential_simplified} \end{equation}</p>
<p>The resulting factor graph is illustrated in <a href="#fig_eliminate_x_b" onclick="currentSlide(2,0)">Figure 2b</a>. Note that the
dynamics constraint is now represented by the bayes net factors shown as gray arrows.</p>
<p>To summarize, we used the dynamics constraint to eliminate variable
$x_2$ and the two factors marked in red, and we replaced them with a new binary cost factor on $x_1$
and $u_1$, marked in blue.</p>
<p align="right" style="background-color: rgba(0,0,0,0.1);
margin: 0px -8px 0 -8px;
padding: 0 8px;"><a onclick="currentSlide(3,0)">next >></a>
</p>
</div>
<!-- ************** CONTROL ************** -->
<div id="sec:elim_ctrl_div" class="slideout">
<p><a id="sec:elim_ctrl"></a></p>
<h3 id="eliminate-a-control">Eliminate a Control</h3>
<!-- Now \eqref{eq:potential_simplified} defines an (unnormalized) joint
Gaussian density on variables $x_1$ and $u_1$. -->
<p>To eliminate $u_1$, we seek to replace the two factors marked red in <a href="#fig_eliminate_u_a" onclick="currentSlide(3,0)">Figure 3a</a>
with a new cost factor on $x_1$ and an equation for the optimal control \(u_1^*(x_1)\).</p>
<p>Adding the control cost to \eqref{eq:potential_simplified}, the combined cost of the
two red factors in <a href="#fig_eliminate_u_a" onclick="currentSlide(3,0)">Figure 3a</a> is given by:</p>
<p>\begin{equation} \phi_3(x_1, u_1) = u_1^TRu_1 + (Ax_1 + Bu_1)^T Q (Ax_1 + Bu_1)
\label{eq:potential_u1} \end{equation}</p>
<p>$\phi_3$ is sometimes referred to as the <em>optimal action value function</em> and we seek to minimize it over $u_1$.
We do so by
setting the derivative of \eqref{eq:potential_u1} wrt $u_1$ to zero
<!-- (detailed calculation in the [Appendix](#eliminate-u_1)), -->
yielding the expression for the optimal control input $u_1^*$ as</p>
<p>\[ \begin{align}
u_1^*(x_1) &= \argmin\limits_{u_1}\phi_3(x_1, u_1) \nonumber \\
&= -(R+B^TQB)^{-1}B^TQAx_1 \label{eq:control_law} \\
&= K_1x_1 \nonumber
\end{align} \]</p>
<p>where $K_1\coloneqq -(R+B^TQB)^{-1}B^TQA$.</p>
<p>Finally, we substitute the expression of our optimal control, \(u_1^* = K_1x_1\),
into our potential \eqref{eq:potential_u1}
<!-- (detailed calculation in the [Appendix](#marginalization-cost-on-x_1)) -->
to obtain a new unary cost factor on $x_1$:</p>
<p>\begin{align}
\phi_4(x_1) &= \phi_3(x_1, u_1^*(x_1)) \nonumber \\
&= (K_1x_1)^T RK_1x_1 + (Ax_1 + BKx_1)^T Q (Ax_1 + BKx_1) \nonumber \\
&= x_1^T(A^TQA-K_1^TB^TQA)x_1 \label{eq:potential_x1}
\end{align}
Note that we simplified $K_1^TRK_1 + K_1^TB^TQBK_1 = -K_1^TB^TQA$ by substituting in for $K_1$ using
\eqref{eq:control_law}.</p>
<p>The resulting factor graph is illustrated in <a href="#fig_eliminate_u_b" onclick="currentSlide(4,0)">Figure 3b</a>.</p>
<p>For convenience, we will also define $P_k$ where $x_k^TP_kx_k$ represents the aggregate of the two unary costs on $x_k$. In the case of $P_1$,
\begin{align}
x_1^TP_1x_1 &= x_1^TQx_1 + \phi_4(x_1) \nonumber
\end{align}
is the aggregation of the two unary factors labeled in green in <a href="#fig_merge_factor" onclick="currentSlide(5,0)">Figure 3c</a>.</p>
<p style="background-color: rgba(0,0,0,0.1);
margin: 0px -8px 0 -8px;
padding: 0 8px;">
<span align="left"><a onclick="currentSlide(2,0)"><< prev</a><span style="float:right"><a onclick="currentSlide(6,0)">next >></a></span></span>
</p>
</div>
<!-- ************** BAYES NET ************** -->
<div id="sec:elim_bayes_div" class="slideout">
<p><a id="sec:elim_bayes"></a></p>
<h3 id="turning-into-a-bayes-network">Turning into a Bayes Network</h3>
<p>By eliminating all the variables from right to left, we can get a Bayes network
as shown in <a href="#fig_bayes_net" onclick="currentSlide(9,0)">Figure 4d</a>. Each time we eliminate a state
and control, we simply repeat the steps in <a href="#eliminate-a-state">Eliminate a state</a> and <a href="#eliminate-a-control">Eliminate a control</a>: we express the state $x_{k+1}$ with the dynamics model, then find the optimal control $u_k$ as
a function of state $x_k$.</p>
<p>Eliminating a general state, $x_{k+1}$, and control $u_k$, we obtain the recurrence relations:</p>
<p>\begin{equation} \boxed{K_k = -(R+B^TP_{k+1}B)^{-1}B^TP_{k+1}A} \label{eq:control_update_k} \end{equation}</p>
<p>\begin{equation} \boxed{P_k = Q+A^TP_{k+1}A - K_k^TB^TP_{k+1}A} \label{eq:cost_update_k} \end{equation}</p>
<p>with $P_{N}=Q$ is the cost at the last time step.</p>
<p>The final Bayes net in <a href="#fig_bayes_net" onclick="currentSlide(9,0)">Figure 4d</a> shows graphically the optimal control law:
\begin{equation} \boxed{u^*_k = K_k x_k} \end{equation}</p>
<p style="background-color: rgba(0,0,0,0.1);
margin: 0px -8px 0 -8px;
padding: 0 8px;">
<span align="left"><a onclick="currentSlide(5,0)"><< prev</a></span>
</p>
</div>
</div>
<p><!-- scrollablecontent -->
<!-- ************************ END SCROLLABLE ELIMINATION DESCRIPTION ************************ --></p>
<h2 id="intuition">Intuition</h2>
<!-- ************** Value Function ************** -->
<p><a name="LQR_example"></a></p>
<figure class="center" style="width:90%;padding:10px">
<a href="/assets/images/lqr_control/LQR_FGvsRicatti.png"><img src="/assets/images/lqr_control/LQR_FGvsRicatti.png" alt="Comparison between LQR control as solved by factor graphs and by the Ricatti Equation. (they are the same)" style="margin-bottom:10px; max-width: 110%; margin-left: -5%;" /></a>
<figcaption><b>Figure 5</b> Example LQR control solutions as solved by factor graphs (middle) and the traditional Discrete Algebraic Ricatti Equations (right). The optimal control gains and cost-to-go factors are compared (left). All plots show exact agreement between factor graph and Ricatti equation solutions.</figcaption>
</figure>
<p>We introduce the <strong>cost-to-go</strong> (also known as <em>return cost</em>, <em>optimal state value function</em>, or simply <em>value function</em>) as $V_k(x) \coloneqq x^TP_kx$ which intuitively represents <em>the total cost that will be accrued from here on out, assuming optimal control</em>.</p>
<p>In our factor graph representation, it is becomes obvious that $V_k(x)$ corresponds to the total cost at and after the state $x_k$ assuming optimal control because we eliminate variables backwards in time with the objective of minimizing cost.
Eliminating a state just re-expresses the future cost in terms of prior states/controls. Each time we eliminate a control, $u$, the future cost is recalculated assuming optimal control (i.e. $\phi_4(x) = \phi_3(x, u^*)$).</p>
<p>This “cost-to-go” is depicted as a heatmap in <a href="#LQR_example">Figure 5</a>.
The heat maps depict the $V_k$ showing that the cost is high when $x$ is far from 0, but also showing that after iterating sufficient far backwards in time, $V_k(x)$ begins to converge. That is to say, the $V_0(x)$ is very similar for $N=30$ and $N=100$.
Similarly, the leftmost plot of <a href="#LQR_example">Figure 5</a> depicts $K_k$ and $P_k$ and shows that they (predictably) converge as well.</p>
<p>This convergence allows us to see that we can extend to the <a href="https://en.wikipedia.org/wiki/Linear%E2%80%93quadratic_regulator#Infinite-horizon,_discrete-time_LQR">infinite horizon LQR problem</a> (continued in the next section).</p>
<!-- The factor graph representation also gives us insight to the equation for the optimal gain matrix $K_k$ from
\eqref{eq:control_update_k}.
The optimal control, $K_k$, should attempt to balance (a) the unary factor $u_k^TRu_k$ representing the cost of executing a control action and (b) the binary factor $(Ax_k+Bu_k)^TP_{k+1}(Ax_k+Bu_k)$ representing the future cost of the control action.
The binary factor consists of two terms
represents a balance between achieving a small "cost-to-go" next time step ($B^TP_{k+1}B$) and exerting a small
amount of control this time step ($R$). -->
<h2 id="equivalence-to-the-ricatti-equation">Equivalence to the Ricatti Equation</h2>
<p>In traditional descriptions of discrete, finite-horizon LQR (i.e. <a href="https://www.amazon.com/Analysis-Control-Dynamic-Economic-Systems/dp/0898749697">Chow</a>, <a href="https://pdfs.semanticscholar.org/9777/06d1dc022280f47a2c67c646e85f38d88fe2.pdf#page=86">Kirk</a>, <a href="https://stanford.edu/class/ee363/lectures/dlqr.pdf">Stanford</a>), the control law and cost function are given by</p>
<p>\[ u_k = K_kx_k \]</p>
<p>\begin{equation} K_k = -(R+B^TP_{k+1}B)^{-1}B^TP_{k+1}A \label{eq:control_update_k_ricatti} \end{equation}</p>
<p>\begin{equation} P_k = Q+A^TP_{k+1}A - K_k^TB^TP_{k+1}A \label{eq:cost_update_k_ricatti} \end{equation}</p>
<p>with $P_k$ commonly referred to as the solution to the <strong>dynamic Ricatti equation</strong> and $P_N=Q$ is the
value of the Ricatti function at the final time step.
\eqref{eq:control_update_k_ricatti} and \eqref{eq:cost_update_k_ricatti} correspond to
the same results as we derived in \eqref{eq:control_update_k} and \eqref{eq:cost_update_k}
respectively.</p>
<p>Recall that $P_0$ and $K_0$ appear to converge as the number of time steps grows. They will approach a stationary solution to the equations</p>
<p>\begin{align}
K &= -(R+B^TPB)^{-1}B^TPA \nonumber \\
P &= Q+A^TPA - K^TB^TPA \nonumber
\end{align}</p>
<p>as $N\to\infty$. This is the <a href="https://en.wikipedia.org/wiki/Algebraic_Riccati_equation">Discrete Algebraic Ricatti Equations (DARE)</a> and $\lim_{N\to\infty}V_0(x)$ and $\lim_{N\to\infty}K_0$ are the cost-to-go and optimal control gain respectively for the <a href="https://en.wikipedia.org/wiki/Linear%E2%80%93quadratic_regulator#Infinite-horizon,_discrete-time_LQR">infinite horizon LQR problem</a>. Indeed, one way to calculate the solution to the DARE is to iterate on the dynamic Ricatti equation.</p>
<h2 id="implementation-using-gtsam">Implementation using GTSAM</h2>
<p>**Edit (Apr 17, 2021): Code updated to new Python wrapper as of GTSAM 4.1.0.</p>
<p>You can view an example Jupyter notebook on <a href="https://colab.research.google.com/drive/1pIUC6fQVMEaQ7QfJk8BvD0F60gShj3F4#sandboxMode=true" target="_blank">google colab</a> or
<a href="/assets/code_samples/lqr_control.zip" download="">download</a> the modules/examples
that you can use in your
projects to:</p>
<ul>
<li>Calculate the closed loop gain matrix, K, using GTSAM</li>
<li>Calculate the “cost-to-go” matrix, P (which is equivalent to the solutions to
the dynamic Ricatti equation), using GTSAM</li>
<li>Calculate the LQR solution for a non-zero, non-constant goal position, using GTSAM</li>
<li>Visualize the cost-to-go and how it relates to factor graphs and the Ricatti
equation</li>
<li>and more!</li>
</ul>
<p>A brief example of the open-loop finite horizon LQR problem using
factor graphs is shown below:</p>
<div class="scrollablecontent" style="overflow: auto; height:600px;">
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">solve_lqr</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">Q</span><span class="p">,</span> <span class="n">R</span><span class="p">,</span> <span class="n">X0</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">]),</span> <span class="n">num_time_steps</span><span class="o">=</span><span class="mi">500</span><span class="p">):</span>
<span class="s">'''Solves a discrete, finite horizon LQR problem given system dynamics in
state space representation.
Arguments:
A, B: nxn state transition matrix and nxp control input matrix
Q, R: nxn state cost matrix and pxp control cost matrix
X0: initial state (n-vector)
num_time_steps: number of time steps, T
Returns:
x_sol, u_sol: Txn array of states and Txp array of controls
'''</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1"># noise models
</span> <span class="n">prior_noise</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">noiseModel</span><span class="p">.</span><span class="n">Constrained</span><span class="p">.</span><span class="n">All</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="n">dynamics_noise</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">noiseModel</span><span class="p">.</span><span class="n">Constrained</span><span class="p">.</span><span class="n">All</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="n">q_noise</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">noiseModel</span><span class="p">.</span><span class="n">Gaussian</span><span class="p">.</span><span class="n">Information</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
<span class="n">r_noise</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">noiseModel</span><span class="p">.</span><span class="n">Gaussian</span><span class="p">.</span><span class="n">Information</span><span class="p">(</span><span class="n">R</span><span class="p">)</span>
<span class="c1"># Create an empty Gaussian factor graph
</span> <span class="n">graph</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">GaussianFactorGraph</span><span class="p">()</span>
<span class="c1"># Create the keys corresponding to unknown variables in the factor graph
</span> <span class="n">X</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">U</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_time_steps</span><span class="p">):</span>
<span class="n">X</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">gtsam</span><span class="p">.</span><span class="n">symbol</span><span class="p">(</span><span class="s">'x'</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span>
<span class="n">U</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">gtsam</span><span class="p">.</span><span class="n">symbol</span><span class="p">(</span><span class="s">'u'</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span>
<span class="c1"># set initial state as prior
</span> <span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">X0</span><span class="p">,</span> <span class="n">prior_noise</span><span class="p">)</span>
<span class="c1"># Add dynamics constraint as ternary factor
</span> <span class="c1"># A.x1 + B.u1 - I.x2 = 0
</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_time_steps</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">k</span><span class="p">],</span> <span class="n">A</span><span class="p">,</span> <span class="n">U</span><span class="p">[</span><span class="n">k</span><span class="p">],</span> <span class="n">B</span><span class="p">,</span> <span class="n">X</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="n">n</span><span class="p">),</span>
<span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="p">)),</span> <span class="n">dynamics_noise</span><span class="p">)</span>
<span class="c1"># Add cost functions as unary factors
</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">X</span><span class="p">:</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">q_noise</span><span class="p">)</span>
<span class="k">for</span> <span class="n">u</span> <span class="ow">in</span> <span class="n">U</span><span class="p">:</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="n">p</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">p</span><span class="p">),</span> <span class="n">r_noise</span><span class="p">)</span>
<span class="c1"># Solve
</span> <span class="n">result</span> <span class="o">=</span> <span class="n">graph</span><span class="p">.</span><span class="n">optimize</span><span class="p">()</span>
<span class="n">x_sol</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">num_time_steps</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="n">u_sol</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">num_time_steps</span><span class="p">,</span> <span class="n">p</span><span class="p">))</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_time_steps</span><span class="p">):</span>
<span class="n">x_sol</span><span class="p">[</span><span class="n">k</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">at</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">k</span><span class="p">])</span>
<span class="n">u_sol</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">at</span><span class="p">(</span><span class="n">U</span><span class="p">[</span><span class="n">k</span><span class="p">])</span>
<span class="k">return</span> <span class="n">x_sol</span><span class="p">,</span> <span class="n">u_sol</span>
</code></pre></div> </div>
</div>
<h2 id="future-work">Future Work</h2>
<p>The factor graph <a href="#fg_scratch">(Figure 1)</a> for our finite horizon discrete LQR problem can be readily extended to LQG, iLQR, DDP, and reinforcement
learning using non-deterministic dynamics factors, nonlinear factors, discrete factor graphs, and other features of GTSAM (stay tuned for future posts).</p>
<p><br /></p>
<hr />
<p><br /></p>
<!-- ********************************** APPENDIX ********************************** -->
<h2 id="appendix">Appendix</h2>
<!-- ### Marginalization Cost on $x_1$
By substituting \eqref{eq:control_law} into \eqref{eq:potential_simplified}, we have the updated
potential function as a function of only $x_1$:
\\[ \begin{aligned}
\phi_1(x_1) &= x_1^T Q x_1 + (K_1x_1)^T RK_1x_1 + (Ax_1 + BKx_1)^T Q (Ax_1 + BKx_1) \\\\
&= x_1^T(Q+ K_1^TRK_1 + A^TQA + K_1^TB^TQB - K_1^TB^TQA - A^TQBK_1)x_1 \\\\
&= x_1^T[Q + A^TQA + K_1^T(R+B^TQB)K_1 - K_1^TB^TQA - A^TQBK_1]x_1 \\\\
&= x_1^T(Q + A^TQA + A^TQBK_1 - K_1^TB^TQA - A^TQBK_1)x_1 \\\\
&= x_1^T(Q + A^TQA - K_1^TB^TQA)x_1
\end{aligned} \\] -->
<h3 id="least-squares-implementation-in-gtsam">Least Squares Implementation in GTSAM</h3>
<p>GTSAM can be specified to use either of two methods for solving the least squares problems that
appear in eliminating factor graphs: Cholesky Factorization or QR Factorization. Both arrive at the same result, but we will take a look at QR since it more immediately illustrates the elimination algorithm at work.</p>
<!-- plain table for formatting purposes -->
<style>
table, caption, tbody, tfoot, thead, table tr, table th, table tr:nth-child(even), td {
margin: 0;
padding: 0;
border: 0;
outline: 0;
font-size: 100%;
font-weight: normal;
vertical-align: baseline;
background: transparent;
background-color: transparent;
}
table th {
padding: 3px;
}
</style>
<h4 id="qr-factorization">QR Factorization</h4>
<!-- ************************ QR block elimination ************************ -->
<!-- Slideshow container, based on https://www.w3schools.com/howto/howto_js_slideshow.asp -->
<p><a name="fig:qr_elim"></a></p>
<div class="slideshow-container" style="min-height:3in;">
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\left[ \begin{array}{c}
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{-A | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I
\end{array} \right]
&
\left[ \begin{array}{ccccc|c}
Q^{1/2} & & & & & 0\\
I & -B & -A & & & 0\\
& R^{1/2} & & & & 0\\
& & Q^{1/2}& & & 0\\
& & I & -B & -A & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide0.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6a</b> Initial factor graph and elimination matrix</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I
\end{bmatrix} &
\left[ \begin{array}{ccccc|c}
\color{red} {Q^{1/2}} & & & & & 0\\
\color{red} I & \color{red} {-B} & \color{red} {-A} & & & 0\\
& R^{1/2} & & & & 0\\
& & Q^{1/2}& & & 0\\
& & I & -B & -A & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide1.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6b</b> Eliminate $x_2$: the two factors to replace are highlighted in red</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{Q^{1/2} | } I\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{c:cccc|c}
I & -B & -A & & & 0\\
\hdashline
& \color{blue} {Q^{1/2}B} & \color{blue} {Q^{1/2}A} & & & 0\\
& R^{1/2} & & & & 0\\
& & Q^{1/2}& & & 0\\
& & I & -B & -A & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide2.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6c</b> Eliminated $x_2$: the resulting binary cost factor is highlighted in blue</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{Q^{1/2} | } I\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{c:cccc|c}
I & -B & -A & & & 0\\
\hdashline
& \color{red} {Q^{1/2}B} & \color{red} {Q^{1/2}A} & & & 0\\
& \color{red} {R^{1/2}} & & & & 0\\
& & {Q^{1/2}} & & & 0\\
& & I & -B & -A & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide3.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6d</b> Eliminate $u_1$: the two factors to replace are highlighted in red</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2} | } I\\
\vphantom{(P_1-Q)^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{cc:ccc|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1 & & & 0\\
\hdashline
& & \color{blue} {(P_1-Q)^{1/2}} & & & 0\\
& & Q^{1/2} & & & 0\\
& & I & -B & -A & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide4.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6e</b> Eliminated $u_1$: the resulting unary cost factor on $x_1$ is shown in blue</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2} | } I\\
\vphantom{(P_1-Q)^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\vphantom{I | } 0\\
\vphantom{R^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{cc:ccc|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1 & & & 0\\
\hdashline
& & \color{red} {(P_1-Q)^{1/2}} & & & 0\\
& & \color{red} {Q^{1/2}} & & & 0\\
& & \color{red} I & \color{red} {-B} & \color{red} {-A} & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide6.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6f</b> Eliminate $x_1$: the three factors to replace are highlighted in red</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2}K_1 | } I\\
\vphantom{I | } 0\\
\vphantom{D_0^{1/2} | } I\\
\vphantom{P^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{ccc:cc|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1& & & 0\\
& & I & -B & -A & 0\\
\hdashline
& & &\color{blue} {P_1^{1/2}B} & \color{blue} {P_1^{1/2}A} & 0\\
& & & R^{1/2}& & 0\\
& & & & Q^{1/2}& 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide7.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6g</b> Eliminated $x_1$: the resulting binary cost factor is shown in blue</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2}K_1 | } I\\
\vphantom{I | } 0\\
\vphantom{D_0^{1/2} | } I\\
\vphantom{P^{1/2} | } I\\
\vphantom{Q^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{ccc:cc|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1& & & 0\\
& & I & -B & -A & 0\\
\hdashline
& & &\color{red} {P_1^{1/2}B} &\color{red} {P_1^{1/2}A} & 0\\
& & &\color{red} {R^{1/2}} & & 0\\
& & & & Q^{1/2} & 0
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide8.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6h</b> Eliminate $u_0$: the two cost factors to replace are shown in red</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2}K_1 | } I\\
\vphantom{I | } 0\\
\vphantom{D_0^{1/2} | } I\\
\vphantom{(P_0-Q)^{1/2} | } I\\
\vphantom{P^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{cccc:c|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1 & & & 0\\
& & I & -B & -A & 0\\
& & & D_0^{1/2} & -D_0^{1/2}K_0 & 0\\
\hdashline
& & & & \color{blue} {(P_0-Q)^{1/2}} & 0\\
& & & & Q^{1/2} & 0\\
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide9.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6i</b> Eliminated $u_0$: the resulting unary cost factor on $x_0$ is shown in blue.</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2}K_1 | } I\\
\vphantom{I | } 0\\
\vphantom{D_0^{1/2} | } I\\
\vphantom{(P_0-Q)^{1/2} | } I\\
\vphantom{P^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{cccc:c|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1 & & & 0\\
& & I & -B & -A & 0\\
& & & D_0^{1/2} & -D_0^{1/2}K_0 & 0\\
\hdashline
& & & & \color{red} {(P_0-Q)^{1/2}} & 0\\
& & & & \color{red} {Q^{1/2}} & 0\\
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide10.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6j</b> Eliminate $x_0$: the final two factors to eliminate are shown in red.</figcaption>
</figure>
</div>
<div class="mySlides 1" style="text-align: center;">
<figure class="center" style="width:75%">
<div align="left" style="width:100%; height:2.6in; overflow:auto">
<p>\( \begin{array}{cc}
\text{NM} & \text{Elimination Matrix} \\
\begin{bmatrix}
\vphantom{I | } 0\\
\vphantom{D_1^{1/2}K_1 | } I\\
\vphantom{I | } 0\\
\vphantom{D_0^{1/2} | } I\\
\vphantom{P_0^{1/2} | } I\\
\end{bmatrix} &
\left[ \begin{array}{cccc:c|c}
I & -B & -A & & & 0\\
& D_1^{1/2} & -D_1^{1/2}K_1 & & & 0\\
& & I & -B & -A & 0\\
& & & D_0^{1/2} & -D_0^{1/2}K_0 & 0\\
\hdashline
& & & & \color{blue} {P_0^{1/2}} & 0\\
\end{array} \right]
\end{array} \)</p>
</div>
<img src="/assets/images/lqr_control/Elimination/cropped_Slide11.png" alt="factor graph partially eliminated" />
<figcaption><b>Figure 6k</b> Final result: after eliminating $x_0$, the elimination matrix is upper-triangular and we can read off the control laws.</figcaption>
</figure>
</div>
<!-- Next and previous buttons -->
<a class="prev" onclick="plusSlides(-1,1)">❮</a>
<a class="next" onclick="plusSlides(1,1)">❯</a>
</div>
<p><!-- slideshow-container --></p>
<!-- The dots/circles -->
<div style="text-align:center">
<span class="dot 1" onclick="currentSlide(1,1)"></span>
<span class="dot 1" onclick="currentSlide(2,1)"></span>
<span class="dot 1" onclick="currentSlide(3,1)"></span>
<span class="dot 1" onclick="currentSlide(4,1)"></span>
<span class="dot 1" onclick="currentSlide(5,1)"></span>
<span class="dot 1" onclick="currentSlide(6,1)"></span>
<span class="dot 1" onclick="currentSlide(7,1)"></span>
<span class="dot 1" onclick="currentSlide(8,1)"></span>
<span class="dot 1" onclick="currentSlide(9,1)"></span>
<span class="dot 1" onclick="currentSlide(10,1)"></span>
<span class="dot 1" onclick="currentSlide(11,1)"></span>
</div>
<p><br /></p>
<div style="overflow: auto">
<table style="width:6.1in; margin: 0 auto;">
<tr>
<th>where</th><th>$P_{k}$</th><th>$=$</th><th>$Q + A^TP_{k+1}A - K_k^TB^TP_{k+1}A$</th><th>($P_2=Q$)</th>
</tr><tr>
<th></th><th>$D_{k}$</th><th>$=$</th><th>$R + B^TP_{k+1}B$</th>
</tr><tr>
<th></th><th>$K_k$</th><th>$=$</th><th>$-D_{k}^{-1/2}(R + B^TP_{k+1}B)^{-T/2}B^TP_{k+1}A$</th>
</tr><tr>
<th></th><th></th><th>$=$</th><th>$-(R + B^TP_{k+1}B)^{-1}B^TP_{k+1}A$</th>
</tr>
</table>
</div>
<!-- ************************ end QR block elimination ************************ -->
<p><br /></p>
<p>The factorization process is illustrated in <a href="#fig:qr_elim">Figure 6</a> for a 3-time step factor graph, where the noise matrices and elimination
matrices are shown with the corresponding states of the graph. The noise matrix (NM) is $0$ for a
hard constraint and $I$ for a minimization objective. The elimination matrix is formatted as an
augmented matrix $[A|b]$ for the linear least squares problem $\argmin\limits_x||Ax-b||_2^2$
with ${x=[x_2;u_1;x_1;u_0;x_0]}$ is the vertical concatenation of all state and control vectors.
The recursive expressions for $P$, $D$, and $K$ when eliminating control variables (i.e. $u_1$ in <a href="#fig:qr_elim" onclick="currentSlide(5,1)">Figure 6e</a>) are derived from block QR Factorization.</p>
<p>Note that all $b_i=0$ in the augmented matrix for the LQR problem of finding minimal control to
reach state $0$, but simply changing values of $b_i$ intuitively extends GTSAM to solve
LQR problems whose objectives are to reach different states or even follow trajectories.</p>
<!-- ### Final Symbolic Expressions of Factor Graph Evaluation
In the above solution, we have
\\[ \begin{aligned}
K_1 &= -(R+B^TQB)^{-1}B^TQA\\\\
P_1 &= Q+A^TQA + A^TQBK_1\\\\
K_0 &= -(R+B^TV_1B)^{-1}B^TV_1A\\\\
P_0 &= Q + A^T V_1 A + A^T V_1 B K_0
\end{aligned} \\]
In general, the above factor graph and solution method can be expanded for an arbitrary number of time steps, $T$, arising in the iterative equations
\\[ \begin{aligned}
V_T &= Q \\\\
K_t &= -( R + B^T V_{t+1} B )^{-1} B^T V_{t+1} A \\\\
P_t &= Q + A^T V_{t+1} A + A^T V_{t+1} B K_t
\end{aligned} \\]
and
\\[ \begin{aligned}
u_t &= K_t x_t
\end{aligned} \\]
which match the traditional algorithm using the Ricatti Equation for solving the finite-horizon discrete-time LQR problem. As the number
of time steps grows, the solution for $V_0$ approaches the stationary solution to the algebraic
Ricatti equation and the solution for $K_0$ approaches the solution to the infinite-horizon
discrete-time LQR problem. -->
<!-- **************** JAVASCRIPT FOR SLIDESHOWS **************** -->
<script>
var slideIndex = [1,1];
showSlides(slideIndex[0], 0);
showSlides(slideIndex[1], 1);
// Next/previous controls
function plusSlides(n, which) {
showSlides(slideIndex[which] += n, which);
}
// Thumbnail image controls
function currentSlide(n, which) {
showSlides(slideIndex[which] = n, which);
}
// change image/slide
function showSlides(n, which, triggeredByScroll) {
var i;
var slides = document.getElementsByClassName("mySlides "+which);
var dots = document.getElementsByClassName("dot "+which);
if (n > slides.length) {slideIndex[which] = 1}
if (n < 1) {slideIndex[which] = slides.length}
for (i = 0; i < slides.length; i++) {
slides[i].style.display = "none";
}
for (i = 0; i < dots.length; i++) {
dots[i].className = dots[i].className.replace(" active", "");
}
slides[slideIndex[which]-1].style.display = "block";
dots[slideIndex[which]-1].className += " active";
if (which==1){
return
}
// when image changes, also scroll to the correct subsection in "Variable Elimination"
var scrollable = document.getElementById("sec:elim_scrollable");
var scrollLoc_state = document.getElementById("sec:elim_state").offsetTop - scrollable.offsetTop;
var scrollLoc_ctrl = document.getElementById("sec:elim_ctrl" ).offsetTop - scrollable.offsetTop;
var scrollLoc_bayes = document.getElementById("sec:elim_bayes").offsetTop - scrollable.offsetTop;
var scroll_cur = scrollable.scrollTop;
var scrollLoc;
var div_state = document.getElementById("sec:elim_state_div");
var div_ctrl = document.getElementById("sec:elim_ctrl_div");
var div_bayes = document.getElementById("sec:elim_bayes_div");
switch(slideIndex[which]) {
case 1:
case 2:
div_state.style.display = "block";
div_ctrl.style.display = "none";
div_bayes.style.display = "none";
// fadeIn(div_state);
// fadeOut(div_ctrl, div_state);
// fadeOut(div_bayes, div_state);
return;
case 3:
case 4:
case 5:
div_state.style.display = "none";
div_ctrl.style.display = "block";
div_bayes.style.display = "none";
// fadeOut(div_state, div_ctrl);
// div_state.classList.toggle('hide');
// fadeIn(div_ctrl);
// fadeOut(div_bayes, div_ctrl);
return;
case 6:
case 7:
case 8:
case 9:
div_state.style.display = "none";
div_ctrl.style.display = "none";
div_bayes.style.display = "block";
// fadeOut(div_state, div_bayes);
// fadeOut(div_ctrl, div_bayes);
// fadeIn(div_bayes);
return;
}
}
// // when scrolling through subsections in "Variable Elimination", also change the image to correspond
// document.getElementById("sec:elim_scrollable").addEventListener("scroll", function (event) {
// var scrollable = document.getElementById("sec:elim_scrollable");
// var scrollLoc_state = document.getElementById("sec:elim_state").offsetTop - scrollable.offsetTop;
// var scrollLoc_ctrl = document.getElementById("sec:elim_ctrl" ).offsetTop - scrollable.offsetTop;
// // var scrollLoc_value = document.getElementById("sec:elim_value").offsetTop - scrollable.offsetTop;
// var scrollLoc_bayes = document.getElementById("sec:elim_bayes").offsetTop - scrollable.offsetTop;
// var scroll = this.scrollTop;
// if (scroll < scrollLoc_ctrl) {
// if (slideIndex[0] > 2) {showSlides(slideIndex[0]=1, 0, true)}
// }
// // else if (scroll < scrollLoc_value) {
// // if ((slideIndex[0] < 3) || (slideIndex[0] > 4)) {showSlides(slideIndex[0]=3, 0, true)}
// // }
// else if ((scroll < scrollLoc_bayes) && (scroll < (scrollable.scrollHeight - scrollable.offsetHeight))) {
// if ((slideIndex[0] < 3) || (slideIndex[0] > 3)) {showSlides(slideIndex[0]=3, 0, true)}
// }
// else {
// if ((slideIndex[0] < 6)) {showSlides(slideIndex[0]=6, 0, true)}
// }
// });
function fadeOut(element, nextElement) {
if (element.style.display == "none") {
return;
}
element.addEventListener('webkitTransitionEnd', function () {
element.style.display = "none";
nextElement.style.display = "block";
}, {once: true});
element.addEventListener('mozTransitionEnd', function () {
element.style.display = "none";
nextElement.style.display = "block";
}, {once: true});
element.addEventListener('oTransitionEnd', function () {
element.style.display = "none";
nextElement.style.display = "block";
}, {once: true});
element.addEventListener('transitionend', function () {
element.style.display = "none";
nextElement.style.display = "block";
}, {once: true});
element.style.webkitTransitionDuration = "0.1s";
element.style.mozTransitionDuration = "0.1s";
element.style.oTransitionDuration = "0.1s";
element.style.transitionDuration = "0.1s";
element.style.opacity = "0";
}
function fadeIn(element) {
if (element.style.display == "block") {
return;
}
element.addEventListener('webkitTransitionEnd', function () {
}, {once: true});
element.addEventListener('mozTransitionEnd', function () {
}, {once: true});
element.addEventListener('oTransitionEnd', function () {
}, {once: true});
element.addEventListener('transitionend', function () {
}, {once: true});
element.style.webkitTransitionDuration = "0.1s";
element.style.mozTransitionDuration = "0.1s";
element.style.oTransitionDuration = "0.1s";
element.style.transitionDuration = "0.1s";
element.style.webkitTransitionDelay = "0.1s";
element.style.mozTransitionDelay = "0.1s";
element.style.oTransitionDelay = "0.1s";
element.style.transitionDelay = "0.1s";
element.style.opacity = "1";
}
</script>Look Ma, No RANSAC2019-09-20T00:00:00+00:002019-09-20T00:00:00+00:00http://gtsam.org/2019/09/20/robust-noise-model<p>Author: Varun Agrawal<br />
Website: <a href="https://varunagrawal.github.io">varunagrawal.github.io</a></p>
<p>(<em><strong>Psst</strong>, be sure to clone/download GTSAM 4.0.2 which resolves a bug in the Huber model discussed below, for correct weight behavior</em>)</p>
<h2 id="introduction">Introduction</h2>
<figure class="center">
<img src="/assets/images/robust_estimators/se2_matches.png" alt="Matches between books" />
<figcaption><b>Figure 1</b> Two books on a plane separated by an SE(2) transform, and some manually selected feature matches between them. There are some clearly incorrect matches, which are <b>outliers</b>.</figcaption>
</figure>
<p>Robust error models are powerful tools for supplementing parameter estimation algorithms with the added capabilities of modeling outliers. Parameter estimation is a fundamental tool used in many fields, especially in perception and robotics, and thus performing robust parameter estimation across a wide range of applications and scenarios is crucial to strong performance for many applications. This necessitates the need to manage outliers in our measurements, and robust error models provide us the means to do so. Robust error models are amenable to easy plug-and-play use in pre-existing optimization frameworks, requiring minimal changes to existing pipelines.</p>
<p>Using robust error models can even obviate the need for more complex, non-deterministic algorithms such as Random Sample Consensus (a.k.a. RANSAC), a fundamental tool for parameter estimation in many a roboticist’s toolbox for years. While RANSAC has proven to work well in practice, it might need a high number of runs to converge to a good solution. Robust error models are conceptually easy and intuitive and can be used by themselves or in conjunction with RANSAC.</p>
<p>In this blog post, we demonstrate the capabilities of robust error models which downweigh outliers to provide better estimates of the parameters of interest. To illustrate the benefits of robust error models, take a look at <strong>Figure 1</strong>. We show two books next to each other, transformed by an $SE(2)$ transform, with some manually labeled point matches between the two books. We have added some outliers to more realistically model the problem.
As we will see later in the post, the estimates from RANSAC and a robust estimation procedure are quite similar, even though the ratio of inliers to outliers is $2:1$.</p>
<h2 id="parameter-estimation---a-short-tutorial">Parameter Estimation - A Short Tutorial</h2>
<p>We begin by reviewing techniques for parameter estimation as outlined by the <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/RR-2676.pdf">tutorial by Zhengyou Zhang</a>, as applied to our $SE(2)$. Given some matches ${(s,s’)}$ between the two images, we want to estimate the the $SE(2)$ parameters $\theta$ that transform a feature $s$ in the first image to a feature $s’$ in the second image:</p>
<p>\[ s’ = f(\theta ; s) \]</p>
<p>Of course, this generalizes to other transformations, including 3D-2D problems. This is a ubiquitous problem seen in multiple domains of perception and robotics and is referred to as <strong>parameter estimation</strong>.</p>
<p>A standard framework to estimate the parameters is via the Least-Squares formulation. We usually have many more observations than we have parameters, i.e., the problem is now overdetermined. To handle this, we minimize the sum of square <strong>residuals</strong> $f(\theta ; s_i) - s’_i$, for $i\in1\ldots N$:</p>
<p>\[ E_{LS}(\theta) = \sum_i \vert\vert f(\theta ; s_i) - s’_i \vert\vert^2 \]</p>
<p>which we refer to as <strong>the cost function</strong> or <strong>the objective function</strong>.
In the case of $SE(2)$ the parameters $\theta$ should be some parameterization of a transformation matrix, having three degrees of freedom (DOF). A simple way to accomplish this is to have $\theta=(x,y,\alpha)$.</p>
<p>Our measurement functions are generally non-linear, and hence we need to linearize the measurement function around an estimate of $\theta$. GTSAM will iteratively do so via optimization procedures such as Gauss-Newton, Levenberg-Marquardt, or Dogleg. Linearization is done via the <strong>Taylor expansion</strong> around a linearization point $\theta_0$:</p>
<p>\[ f(\theta + \Delta\theta; s) = f(\theta; s) + J(\theta; s)\Delta\theta \]</p>
<p>This gives us the following linearized least squares objective function:</p>
<p>\[ E_{LS, \theta_0} = \sum_i \vert\vert f(\theta; s_i) + J(\theta; s_i)\Delta\theta - s_i’ \vert\vert^2 \]</p>
<p>Since the above is now linear in $\Delta\theta$, GTSAM can solve it using either sparse Cholesky or QR factorization.</p>
<h2 id="robust-error-models">Robust Error Models</h2>
<p>We have derived the basic parameter optimization approach in the previous section and seen how the choice of the optimization function affects the optimality of our solution. However, another aspect we need to take into account is the effect of outliers on our optimization and final parameter values.</p>
<p>By default, the optimization objectives outlined above try to model all measurements equally. This means that in the presence of outliers, the optimization process might give us parameter estimates that try to fit these outliers, sacrificing accuracy on the inliers. More formally, given the <em>residual</em> $r_i$ of the $i^{th}$ match, i.e. the difference between the $i^{th}$ observation and the fitted value, the standard least squares approach attempts to optimize the sum of all the squared residuals. This can lead to the estimated parameters being distorted due to the equal weighting of all data points. Surely, there must be a way for the objective to model inliers and outliers in a clever way based on the residual errors?</p>
<p>One way to tackle the presence of outliers is a family of models known as <strong>Robust Error Models</strong> or <strong>M-estimators</strong>. The M-estimators try to reduce the effect of outliers by replacing the squared residuals with a function of the residuals $\rho$ that weighs each residual term by some value:</p>
<p>\[ p = min \sum_i^n \rho(r_i) \]</p>
<p>To allow for optimization, we define $\rho$ to be a symmetric, positive-definite function, thus it has a unique minimum at zero, and is less increasing than square.</p>
<p>The benefit of this formulation is that we can now solve the above minimization objective as an <strong>Iteratively Reweighted Least Squares</strong> problem. The M-estimator of the parameter vector $p$ based on $\rho$ is the value of the parameters which solves</p>
<p>\[ \sum_i \psi(r_i)\frac{\delta r_i}{\delta p_j} = 0 \]</p>
<p>for $j = 1, …, m$ (recall that the maximum/minimum of a function is at the point its derivative is equal to zero). Above, $\psi(x) = \frac{\delta \rho(x)}{\delta x}$ is called the <strong>influence function</strong>, which we can use to define a <strong>weight function</strong> $w(x) = \frac{\psi{x}}{x}$ giving the original derivative as</p>
<p>\[ \sum_i w(r_i) r_i \frac{\delta r_i}{\delta p_j} = 0 \]</p>
<p>which is exactly the system of equations we obtain from iterated reweighted least squares.</p>
<p>In layman’s terms, the influence function $\psi(x)$ measures the influence of a data point on the value of the parameter estimate. This way, the estimated parameters are intelligent to outliers and only sensitive to inliers, since the are no longer susceptible to being significantly modified by a single match, thus making them <strong>robust</strong>.</p>
<h3 id="m-estimator-constraints">M-estimator Constraints</h3>
<p>While M-estimators provide us with significant benefits with respect to outlier modeling, they do come with some constraints which are required to enable their use in a wide variety of optimization problems.</p>
<ul>
<li>The influence function should be bounded.</li>
<li>The robust estimator should be unique, i.e. it should have a unique minimum. This implies that <em>the individual $\rho$-function is convex in variable <strong>p</strong></em>.</li>
<li>The objective should have a gradient, even when the 2nd derivative is singular.</li>
</ul>
<h3 id="common-m-estimators">Common M-estimators</h3>
<p>Below we list some of the common estimators from the literature and which are available out of the box in GTSAM. We also provide accompanying graphs of the corresponding <strong>$\rho$ function</strong>, the <strong>influence function</strong>, and the <strong>weight function</strong> in order, allowing one to visualize the differences and effects of each estimators influence function.</p>
<ol>
<li>
<p>Fair
<img src="/assets/images/robust_estimators/fair.png" alt="fair m-estimator" /></p>
</li>
<li>
<p>Huber
<img src="/assets/images/robust_estimators/huber.png" alt="huber m-estimator" /></p>
</li>
<li>
<p>Cauchy
<img src="/assets/images/robust_estimators/cauchy.png" alt="cauchy m-estimator" /></p>
</li>
<li>
<p>Geman-McClure
<img src="/assets/images/robust_estimators/gemanmcclure.png" alt="geman-mcclure m-estimator" /></p>
</li>
<li>
<p>Welsch
<img src="/assets/images/robust_estimators/welsch.png" alt="welsch m-estimator" /></p>
</li>
<li>
<p>Tukey
<img src="/assets/images/robust_estimators/tukey.png" alt="tukey m-estimator" /></p>
</li>
</ol>
<h2 id="example-with-huber-noise-model">Example with Huber Noise Model</h2>
<p>Now it’s time for the real deal. So far we’ve spoken about how great robust estimators are, and how they can be easily modeled in a least squares objective, but having a concrete example and application can really help illuminate these concepts and demonstrate the power of a robust error model. In this specific case we use the <strong>Huber M-estimator</strong>, though any other provided M-estimator can be used depending on the application or preference.</p>
<p>For our example application, the estimation of an $SE(2)$ transformation between two objects (a scenario commonly seen in PoseSLAM applications), we go back to our image of the two books from the introduction, which we have manually labeled with matches and outliers. A RANSAC estimate using the matches gives us the $SE(2)$ paramters <code class="language-plaintext highlighter-rouge">(347.15593, 420.31040, 0.39645)</code>.</p>
<figure class="center">
<img src="/assets/images/robust_estimators/se2_matches.png" alt="Matches between books" />
<figcaption><b>Figure 2</b>: Matches between the 2 books.</figcaption>
</figure>
<p>To begin, we apply a straightforward optimization process based on Factor Graphs. Using GTSAM, this can be achieved in a few lines of code. We show the core part of the example below, omitting the housekeeping and data loading for brevity.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This is the value we wish to estimate</span>
<span class="n">Pose2_</span> <span class="nf">pose_expr</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="c1">// Set up initial values, and Factor Graph</span>
<span class="n">Values</span> <span class="n">initial</span><span class="p">;</span>
<span class="n">ExpressionFactorGraph</span> <span class="n">graph</span><span class="p">;</span>
<span class="c1">// provide an initial estimate which is pretty much random</span>
<span class="n">initial</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">Pose2</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">));</span>
<span class="c1">// We assume the same noise model for all points (since it is the same camera)</span>
<span class="k">auto</span> <span class="n">measurementNoise</span> <span class="o">=</span> <span class="n">noiseModel</span><span class="o">::</span><span class="n">Isotropic</span><span class="o">::</span><span class="n">Sigma</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="c1">// Now we add in the factors for the measurement matches.</span>
<span class="c1">// Matches is a vector of 4 tuples (index1, keypoint1, index2, keypoint2)</span>
<span class="kt">int</span> <span class="n">index_i</span><span class="p">,</span> <span class="n">index_j</span><span class="p">;</span>
<span class="n">Point2</span> <span class="n">p</span><span class="p">,</span> <span class="n">measurement</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">vector</span><span class="o"><</span><span class="n">tuple</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">Point2</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="n">Point2</span><span class="o">>>::</span><span class="n">iterator</span> <span class="n">it</span> <span class="o">=</span> <span class="n">matches</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span>
<span class="n">it</span> <span class="o">!=</span> <span class="n">matches</span><span class="p">.</span><span class="n">end</span><span class="p">();</span> <span class="o">++</span><span class="n">it</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">tie</span><span class="p">(</span><span class="n">index_i</span><span class="p">,</span> <span class="n">measurement</span><span class="p">,</span> <span class="n">index_j</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="o">=</span> <span class="o">*</span><span class="n">it</span><span class="p">;</span>
<span class="n">Point2_</span> <span class="n">predicted</span> <span class="o">=</span> <span class="n">transformTo</span><span class="p">(</span><span class="n">pose_expr</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="c1">// Add the Point2 expression variable, an initial estimate, and the measurement noise.</span>
<span class="n">graph</span><span class="p">.</span><span class="n">addExpressionFactor</span><span class="p">(</span><span class="n">predicted</span><span class="p">,</span> <span class="n">measurement</span><span class="p">,</span> <span class="n">measurementNoise</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Optimize and print basic result</span>
<span class="n">Values</span> <span class="n">result</span> <span class="o">=</span> <span class="n">LevenbergMarquardtOptimizer</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">initial</span><span class="p">).</span><span class="n">optimize</span><span class="p">();</span>
<span class="n">result</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Final Result:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
</code></pre></div></div>
<p>It is important to note that our initial estimate for the transform values is pretty arbitrary.
Running the above code give us the transform values <code class="language-plaintext highlighter-rouge">(305.751, 520.127, 0.284743)</code>, which when compared to the RANSAC estimate doesn’t look so good.</p>
<p>Now how about we try using M-estimators via the built-in robust error models? This is a two line change as illustrated below:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This is the value we wish to estimate</span>
<span class="n">Pose2_</span> <span class="nf">pose_expr</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="c1">// Set up initial values, and Factor Graph</span>
<span class="n">Values</span> <span class="n">initial</span><span class="p">;</span>
<span class="n">ExpressionFactorGraph</span> <span class="n">graph</span><span class="p">;</span>
<span class="c1">// provide an initial estimate which is pretty much random</span>
<span class="n">initial</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">Pose2</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">));</span>
<span class="c1">// We assume the same noise model for all points (since it is the same camera)</span>
<span class="k">auto</span> <span class="n">measurementNoise</span> <span class="o">=</span> <span class="n">noiseModel</span><span class="o">::</span><span class="n">Isotropic</span><span class="o">::</span><span class="n">Sigma</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="cm">/********* First change *********/</span>
<span class="c1">// We define our robust error model here, providing the default parameter value for the estimator.</span>
<span class="k">auto</span> <span class="n">huber</span> <span class="o">=</span> <span class="n">noiseModel</span><span class="o">::</span><span class="n">Robust</span><span class="o">::</span><span class="n">Create</span><span class="p">(</span><span class="n">noiseModel</span><span class="o">::</span><span class="n">mEstimator</span><span class="o">::</span><span class="n">Huber</span><span class="o">::</span><span class="n">Create</span><span class="p">(</span><span class="mf">1.345</span><span class="p">),</span> <span class="n">measurementNoise</span><span class="p">);</span>
<span class="c1">// Now we add in the factors for the measurement matches.</span>
<span class="c1">// Matches is a vector of 4 tuples (index1, keypoint1, index2, keypoint2)</span>
<span class="kt">int</span> <span class="n">index_i</span><span class="p">,</span> <span class="n">index_j</span><span class="p">;</span>
<span class="n">Point2</span> <span class="n">p</span><span class="p">,</span> <span class="n">measurement</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">vector</span><span class="o"><</span><span class="n">tuple</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">Point2</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="n">Point2</span><span class="o">>>::</span><span class="n">iterator</span> <span class="n">it</span> <span class="o">=</span> <span class="n">matches</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span>
<span class="n">it</span> <span class="o">!=</span> <span class="n">matches</span><span class="p">.</span><span class="n">end</span><span class="p">();</span> <span class="o">++</span><span class="n">it</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">tie</span><span class="p">(</span><span class="n">index_i</span><span class="p">,</span> <span class="n">measurement</span><span class="p">,</span> <span class="n">index_j</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="o">=</span> <span class="o">*</span><span class="n">it</span><span class="p">;</span>
<span class="n">Point2_</span> <span class="n">predicted</span> <span class="o">=</span> <span class="n">transformTo</span><span class="p">(</span><span class="n">pose_expr</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="c1">// Add the Point2 expression variable, an initial estimate, and the measurement noise.</span>
<span class="c1">// The graph takes in factors with the robust error model.</span>
<span class="cm">/********* Second change *********/</span>
<span class="n">graph</span><span class="p">.</span><span class="n">addExpressionFactor</span><span class="p">(</span><span class="n">predicted</span><span class="p">,</span> <span class="n">measurement</span><span class="p">,</span> <span class="n">huber</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Optimize and print basic result</span>
<span class="n">Values</span> <span class="n">result</span> <span class="o">=</span> <span class="n">LevenbergMarquardtOptimizer</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">initial</span><span class="p">).</span><span class="n">optimize</span><span class="p">();</span>
<span class="n">result</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Final Result:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
</code></pre></div></div>
<p>This version of the parameter estimation gives us the $SE(2)$ transform <code class="language-plaintext highlighter-rouge">(363.76, 458.966, 0.392419)</code>. Quite close for only 15 total matches, especially considering 5 of the 10 were outliers! The estimates are expected to improve with more matches to further constrain the problem.</p>
<p>As an alternate example, we look at an induced $SE(2)$ transform and try to estimate it from scratch (no manual match labeling. Our source image is one of crayon boxes retrieved from the internet, since this image lends itself to good feature detection.</p>
<p><img src="/assets/images/robust_estimators/original_image.png" alt="original image" /></p>
<p>To model an $SE(2)$ transformation, we apply a perspective transform to the image. The transformation applied is <code class="language-plaintext highlighter-rouge">(x, y, theta) = (4.711, 3.702, 0.13963)</code>. This gives us a ground truth value to compare our methods against. The transformed image can be seen below.</p>
<p><img src="/assets/images/robust_estimators/transformed_image.png" alt="Warped image" /></p>
<p>We run the standard pipeline of SIFT feature extraction and FLANN+KDTree based matching to obtain a set of matches. At this point we are ready to evaluate the different methods of estimating the transformation.</p>
<p>The vanilla version of the parameter estimation gives us <code class="language-plaintext highlighter-rouge">(6.26294, -14.6573, 0.153888)</code> which is pretty bad. However, the version with robust error models gives us <code class="language-plaintext highlighter-rouge">(4.75419, 3.60199, 0.139674)</code>, which is a far better estimate when compared to the ground truth, despite the initial estimate for the transform being arbitrary. This makes it apparent that the use of robust estimators and subsequently robust error models is the way to go for use cases where outliers are a concern. Of course, providing better initial estimates will only improve the final estimate.</p>
<p>You may ask how does this compare to our dear old friend RANSAC? Using the OpenCV implementation of RANSAC, a similar pipeline gives use the following $SE(2)$ values: <code class="language-plaintext highlighter-rouge">(4.77360, 3.69461, 0.13960)</code>.</p>
<p>We show, in order, the original image, its warped form, and the recovered image from the $SE(2)$ transformation estimation. The first image is from the ground truth transform, the second one is using RANSAC, the next one is using a vanilla parameter estimation approach, and the last one uses robust error models. As you can see, while the vanilla least-squares optimization result is poor compared to the ground truth, the RANSAC and the robust error model recover the transformation correctly, with the robust error model’s result being comparable to the RANSAC one.</p>
<figure class="center">
<img src="/assets/images/robust_estimators/ground_truth_images.png" alt="ground truth" />
<figcaption>Image warping and recovery using ground truth SE(2) transform. The first image is the original image, the 2nd image is the transformed image, and the last one is the image we get on applying the reverse $SE(2)$ transform.</figcaption>
</figure>
<figure class="center">
<img src="/assets/images/robust_estimators/ransac_images.png" alt="ransac" />
<figcaption>Image warping and recovery using RANSAC.</figcaption>
</figure>
<figure class="center">
<img src="/assets/images/robust_estimators/vanilla_model_images.png" alt="vanilla" />
<figcaption>Image warping and recovery using plain old parameter estimation. You can see that the 3rd (recovered) image does not line up correctly.</figcaption>
</figure>
<figure class="center">
<img src="/assets/images/robust_estimators/robust_model_images.png" alt="robust error model" />
<figcaption>Image warping and recovery using robust error models with parameter estimation. These results are comparable to the ones from RANSAC, demonstrating the promise of robust error models.</figcaption>
</figure>
<!-- That's pretty close too, so why go through the headache of using robust error models? For one, unlike RANSAC, robust error models are deterministic and possess defined behavior. Moreover, one does not need to run multiple runs of optimization to obtain a consistent result, compared to RANSAC which may require hundreds of runs to converge to a good result. -->
<!-- ## Robust Error Models + RANSAC -->
<h2 id="conclusion">Conclusion</h2>
<p>In this post, we have seen the basics of parameter estimation, a ubiquitous mathematical framework for many perception and robotics problems, and we have seen how this framework is susceptible to perturbations from outliers which can throw off the final estimate. More importantly, we have seen how a simple tool called the Robust M-estimator can easily help us deal with these outliers and their effects. An example case for $SE(2)$ transform estimation demonstrates not only their ease of use with GTSAM, but also the efficacy of the solution generated, especially when compared to widely used alternatives such as RANSAC.</p>
<p>Furthermore, robust estimators are deterministic, ameliorating the need for the complexity that comes inherent with RANSAC. While RANSAC is a great tool, robust error models provide us with a solid alternative to be considered. With the benefits of speed, accuracy, and ease of use, robust error models make a strong case for their adoption for many related problems and we hope you will give them a shot the next time you use GTSAM.</p>Author: Varun Agrawal Website: varunagrawal.github.io