Introduction
This project builds a cognitive map from first-person videos that can be used to understand novel but similar environments, with the goal of benefiting visually impaired people. Advised by Prof. Hyun Soo Park and his group at UMN.
Demo: grocery store data annotation with ECO
Code
Local Egocentric Maps
- Frontalization via Homography
- Rescaling for canonical depth viewpoint
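The frontalization step can be sketched as a pure-rotation homography, assuming the rotation $R$ aligning the camera with the section's axes has been estimated (e.g., from the vanishing points described below). This is a minimal sketch, not the project's exact implementation; the warp itself would typically be applied with `cv2.warpPerspective`:

```python
import numpy as np

def frontalize_homography(K, R):
    """Homography that synthetically rotates the camera so the scene axes
    align with the image axes, making the section face fronto-parallel.

    K: 3x3 intrinsic matrix; R: 3x3 rotation from world (section-aligned)
    coordinates to camera coordinates. A pixel p maps to K R^T K^{-1} p.
    """
    return K @ R.T @ np.linalg.inv(K)
```

A sanity check: after warping, the vanishing point of the scene X axis moves to the point at infinity along the image x direction, so the section's horizontal edges become parallel in the frontalized image. Rescaling for a canonical depth viewpoint then amounts to an additional uniform scaling of the warped image.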
My work focuses mainly on egocentric recognition of sections in a supermarket, using a novel interface that leverages scene geometry and reconstructed camera motion.
Undistort the image using the camera intrinsic parameters.
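In practice this step would use a calibrated routine such as `cv2.undistort`. As a minimal sketch of what it does underneath, here is a NumPy-only inversion of a two-coefficient radial distortion model by fixed-point iteration (the coefficients `k1`, `k2` and intrinsics are illustrative assumptions, not the project's calibration):

```python
import numpy as np

def undistort_points(pts, K, dist, iters=10):
    """Invert the radial distortion model x_d = x * (1 + k1*r^2 + k2*r^4).

    pts: (N, 2) distorted pixel coordinates; K: 3x3 intrinsics;
    dist: (k1, k2). Returns (N, 2) undistorted pixel coordinates.
    """
    k1, k2 = dist
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # normalized (distorted) coordinates
    xd = (pts[:, 0] - cx) / fx
    yd = (pts[:, 1] - cy) / fy
    x, y = xd.copy(), yd.copy()
    # fixed-point iteration: converges quickly for mild distortion
    for _ in range(iters):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return np.stack([fx * x + cx, fy * y + cy], axis=1)
```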
Assume that sections are aligned with the three principal orthogonal directions of the scene.
Calculate three mutually orthogonal vanishing points. We manually select the points $vp_x$ and $vp_y$ in the X and Y directions of the camera coordinate system.
Consider the 3D point where the X axis (in camera coordinates) meets the projective line corresponding to $vp_x$; call this point $X_p$. With $K$ the camera intrinsic matrix and $R$, $C$ the camera pose, we must have
$$\lambda \, vp_x = KR(X_p - C) \Rightarrow X_p - C = \lambda R^T K^{-1} vp_x$$
The X-axis direction in 3D is thus the unit vector along $X_p - C$. The Y direction is obtained similarly, and the gravity vector is taken as the cross product of the two. We observe that the cross product gives a more stable gravity vector than simply using the output of the vanishing-point algorithm.
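The derivation above translates directly into code. This is a small sketch with assumed variable names, following $X_p - C = \lambda R^T K^{-1} vp_x$ for the axis directions and the cross product for gravity:

```python
import numpy as np

def axis_direction(vp, K, R):
    """World direction corresponding to a vanishing point vp (homogeneous
    pixel coordinates). From lambda*vp = K R (X_p - C), the direction of
    X_p - C is R^T K^{-1} vp, normalized to unit length."""
    d = R.T @ np.linalg.inv(K) @ vp
    return d / np.linalg.norm(d)

def gravity_from_vps(vp_x, vp_y, K, R):
    """Gravity as the cross product of the X and Y directions, which is
    more stable than the vanishing-point algorithm's third output."""
    g = np.cross(axis_direction(vp_x, K, R), axis_direction(vp_y, K, R))
    return g / np.linalg.norm(g)
```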
Triangulate an origin point in 3D using a pixel correspondence between two images.
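One standard way to triangulate the origin from a two-view correspondence is linear (DLT) triangulation; a minimal sketch, assuming the two camera projection matrices $P = K[R \mid -RC]$ are available from the reconstructed camera motion:

```python
import numpy as np

def triangulate(p1, p2, P1, P2):
    """Linear (DLT) triangulation of one correspondence from two views.

    p1, p2: (2,) pixel coordinates in each image; P1, P2: 3x4 projection
    matrices. Each pixel contributes two linear constraints on the
    homogeneous 3D point X; the solution is the null vector of A.
    """
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```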
Using the origin and the three axes, construct a 3D bounding box and project it onto the image.
Keyboard input to the interface moves each face of the box along its normal direction.
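The box construction, projection, and face adjustment can be sketched as follows; the function names and the mapping of key presses to extent changes are illustrative assumptions, not the interface's actual code:

```python
import numpy as np

def box_corners(origin, axes, extents):
    """8 corners of a box spanned by three unit axes scaled by extents."""
    corners = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                corners.append(origin + i * extents[0] * axes[0]
                                      + j * extents[1] * axes[1]
                                      + k * extents[2] * axes[2])
    return np.array(corners)

def project(points, K, R, C):
    """Project 3D points into the image: lambda * p = K R (X - C)."""
    x = (K @ (R @ (points - C).T)).T
    return x[:, :2] / x[:, 2:3]

# Moving a face along its normal direction amounts to adjusting one extent
# (or shifting the origin along that axis), e.g. extents[0] += step on a
# key press, then re-projecting the corners.
```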
- Propagate section labels to the box.