GTSAM Posts

Author: Samarth Brahmbhatt

I am a post-doc at Intel ISL. In my PhD work, I used GTSAM for object pose estimation and 3D reconstruction of hand joints to study hand-object interaction. I love GTSAM for its ability to optimize geometric quantities and deal with uncertainty in observations. However, it is easy to get lost in the web of inverse transformations while optimizing complex geometric systems. I’ve found that naming code variables consistently with textbook notation conventions can help mitigate this. This post describes some suggestions for doing that.

Points

Always name your 3D points like how you would on paper. A point $^cX$ in the camera coordinate system $c$ is named cX.
3D points use uppercase letters, 2D points use lowercase letters.

Pose

Name your pose variables like how you would write them on paper. The pose $^wT_c$ of camera coordinate frame $c$ in the world coordinate frame $w$ is named wTc. In GTSAM jargon, c is the pose coordinate frame, and w is the world coordinate frame.

Composing Poses

Math: $^oT_c =~^oT_w~\cdot~^wT_c$ .

GTSAM code:

Pose3 oTw = Pose3(...);
Pose3 wTc = Pose3(...);
Pose3 oTc = oTw * wTc;
// same as Pose3 oTc = oTw.compose(wTc);

Transforming Points From Pose Coordinates

This operation uses the camera pose in the world coordinate frame ( $^wT_c$ ) to bring points from the camera coordinate frame ( $^c\homo{X}$ ) to the world coordinate frame ( $^w\homo{X}$ ).

Math: $^w\homo{X} =~^wT_c~\cdot~^c\homo{X}$

GTSAM Code:

Point3 wX = wTc.transformFrom(cX);
// same as Point3 wX = wTc * cX;

Transforming Points To Pose Coordinates

This operation uses the inverse of the camera pose $\left(^wT_c\right)^{-1}$ to bring points from the world coordinate frame ( $^w\homo{X}$ ) to the camera coordinate frame ( $^c\homo{X}$ ).

Math: $^c\homo{X} =~\left(^wT_c\right)^{-1}~\cdot~^w\homo{X}$

GTSAM Code:

Point3 cX = wTc.transformTo(wX);

Cameras

The GTSAM pinhole camera classes (e.g. PinholeBase) internally use transformTo() to transform 3D points into the camera coordinates, so you should use the pose of the camera w.r.t. world while constructing the object:

Pose3 wTc = Pose3(...);
SimpleCamera cam(wTc, K);

now you can use cam in the TriangulationFactor for example. Other factors like the GenericProjectionFactor also use the same convention: $\begin{align*} ^{sensor}\homo{X} &=~^{sensor}T_{world}~\cdot~^{world}\homo{X}\\ &=~^{sensor}T_{body}~\cdot~^{body}T_{world}~\cdot~^{world}\homo{X}\\ &= \left(^{world}T_{body}~\cdot~^{body}T_{sensor}\right)^{-1}~\cdot~^{world}\homo{X} \end{align*}$

Example: In a mobile robot SLAM scenario, wheel odometry might tell you where the robot body is in the world ( $^{world}T_{body}$ ), and the robot URDF spec might tell you where the camera is on the robot body ( $^{body}T_{sensor}$ ). Together, these allow you to situate camera observations in the world coordinate frame.

GTSAM Code:

Pose3 body_T_sensor = ...
Point2 sensor_p = ...  // 2D point in the image
// in the following factor,
// Symbol('T', i) is world_T_body for the i'th frame
// Symbol('X', j) is the j'th 3D point in world coordinates i.e. world_Xj
auto f = GenericProjectionFactor<Pose3, Point3, Cal3_S2>(sensor_p, noise, Symbol('T', i), Symbol('X', j), K, body_T_sensor);

It will project the world 3D point $^{world}\homo{X}$ into the sensor coordinates like so:

Pose3 world_T_sensor = world_T_body * body_T_sensor;
Point3 sensor_X = world_T_sensor.transformTo(world_X);

and then project it to the image using instrinsics and then compare it to the detection sensor_p.