Geometry and Variable Naming Conventions
GTSAM Posts
Author: Samarth Brahmbhatt
I am a post-doc at Intel ISL. In my PhD work, I used GTSAM for object pose estimation and 3D reconstruction of hand joints to study hand-object interaction. I love GTSAM for its ability to optimize geometric quantities and deal with uncertainty in observations. However, it is easy to get lost in the web of inverse transformations while optimizing complex geometric systems. I’ve found that naming code variables consistently with textbook notation conventions can help mitigate this. This post describes some suggestions for doing that.
Points
- Always name your 3D points like how you would on paper. A point $^cX$ in the
camera coordinate system $c$ is named
cX
. - 3D points use uppercase letters, 2D points use lowercase letters.
Pose
Name your pose variables like how you would write them on paper.
The pose $^wT_c$ of camera coordinate frame $c$ in the world coordinate frame
$w$ is named wTc
. In GTSAM jargon, c
is the pose coordinate frame,
and w
is the world coordinate frame.
Composing Poses
Math: $^oT_c =~^oT_w~\cdot~^wT_c$.
GTSAM code:
Pose3 oTw = Pose3(...);
Pose3 wTc = Pose3(...);
Pose3 oTc = oTw * wTc;
// same as Pose3 oTc = oTw.compose(wTc);
Transforming Points From Pose Coordinates
This operation uses the camera pose in the world coordinate frame ($^wT_c$) to bring points from the camera coordinate frame ($^c\homo{X}$) to the world coordinate frame ($^w\homo{X}$).
Math: $^w\homo{X} =~^wT_c~\cdot~^c\homo{X}$
GTSAM Code:
Point3 wX = wTc.transformFrom(cX);
// same as Point3 wX = wTc * cX;
Transforming Points To Pose Coordinates
This operation uses the inverse of the camera pose $\left(^wT_c\right)^{-1}$ to bring points from the world coordinate frame ($^w\homo{X}$) to the camera coordinate frame ($^c\homo{X}$).
Math: $^c\homo{X} =~\left(^wT_c\right)^{-1}~\cdot~^w\homo{X}$
GTSAM Code:
Point3 cX = wTc.transformTo(wX);
Cameras
The GTSAM pinhole camera classes
(e.g. PinholeBase
)
internally use transformTo()
to transform 3D points into the camera
coordinates, so you should use the pose of the camera w.r.t. world while
constructing the object:
Pose3 wTc = Pose3(...);
SimpleCamera cam(wTc, K);
now you can use cam
in the TriangulationFactor
for example. Other factors
like the
GenericProjectionFactor
also use the same convention:
\(\begin{align*}
^{sensor}\homo{X}
&=~^{sensor}T_{world}~\cdot~^{world}\homo{X}\\
&=~^{sensor}T_{body}~\cdot~^{body}T_{world}~\cdot~^{world}\homo{X}\\
&= \left(^{world}T_{body}~\cdot~^{body}T_{sensor}\right)^{-1}~\cdot~^{world}\homo{X}
\end{align*}\)
Example: In a mobile robot SLAM scenario, wheel odometry might tell you where the robot body is in the world ($^{world}T_{body}$), and the robot URDF spec might tell you where the camera is on the robot body ($^{body}T_{sensor}$). Together, these allow you to situate camera observations in the world coordinate frame.
GTSAM Code:
Pose3 body_T_sensor = ...
Point2 sensor_p = ... // 2D point in the image
// in the following factor,
// Symbol('T', i) is world_T_body for the i'th frame
// Symbol('X', j) is the j'th 3D point in world coordinates i.e. world_Xj
auto f = GenericProjectionFactor<Pose3, Point3, Cal3_S2>(sensor_p, noise, Symbol('T', i), Symbol('X', j), K, body_T_sensor);
It will project the world 3D point $^{world}\homo{X}$ into the sensor coordinates like so:
Pose3 world_T_sensor = world_T_body * body_T_sensor;
Point3 sensor_X = world_T_sensor.transformTo(world_X);
and then project it to the image using instrinsics and then compare it to the detection sensor_p
.