Introduction
This project builds a cognitive map from first-person videos that can be used to understand novel but similar environments, with the goal of benefiting visually impaired people. Advised by Prof. Hyun Soo Park and his group at UMN.
Demo: grocery store data annotation with ECO
Code
Local Egocentric Maps
- Frontalization via Homography
- Rescaling for canonical depth viewpoint
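The frontalization step can be sketched as a pure-rotation homography, assuming the rotation $R$ aligning the camera with the section's axes has been estimated (e.g., from the vanishing points described below). This is a minimal sketch, not the project's exact implementation; the warp itself would typically be applied with `cv2.warpPerspective`:

```python
import numpy as np

def frontalize_homography(K, R):
    """Homography that synthetically rotates the camera so the scene axes
    align with the image axes, making the section face fronto-parallel.

    K: 3x3 intrinsic matrix; R: 3x3 rotation from world (section-aligned)
    coordinates to camera coordinates. A pixel p maps to K R^T K^{-1} p.
    """
    return K @ R.T @ np.linalg.inv(K)
```

A sanity check: after warping, the vanishing point of the scene X axis moves to the point at infinity along the image x direction, so the section's horizontal edges become parallel in the frontalized image. Rescaling for a canonical depth viewpoint then amounts to an additional uniform scaling of the warped image.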
My work focuses mainly on egocentric recognition of sections in a supermarket, using a novel interface that leverages scene geometry and reconstructed camera motion.
Undistort the image using the camera intrinsic parameters.
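In practice this step would use a calibrated routine such as `cv2.undistort`. As a minimal sketch of what it does underneath, here is a NumPy-only inversion of a two-coefficient radial distortion model by fixed-point iteration (the coefficients `k1`, `k2` and intrinsics are illustrative assumptions, not the project's calibration):

```python
import numpy as np

def undistort_points(pts, K, dist, iters=10):
    """Invert the radial distortion model x_d = x * (1 + k1*r^2 + k2*r^4).

    pts: (N, 2) distorted pixel coordinates; K: 3x3 intrinsics;
    dist: (k1, k2). Returns (N, 2) undistorted pixel coordinates.
    """
    k1, k2 = dist
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # normalized (distorted) coordinates
    xd = (pts[:, 0] - cx) / fx
    yd = (pts[:, 1] - cy) / fy
    x, y = xd.copy(), yd.copy()
    # fixed-point iteration: converges quickly for mild distortion
    for _ in range(iters):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return np.stack([fx * x + cx, fy * y + cy], axis=1)
```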
Assume that sections are aligned with the three principal orthogonal directions of the scene.
Calculate three mutually orthogonal vanishing points. We manually select the points $vp_x$ and $vp_y$ in the X and Y directions of the camera coordinate system.
Consider the 3D point where the X axis (in camera coordinates) meets the projective line corresponding to $vp_x$; call this point $X_p$. With $K$ the camera intrinsic matrix and $R$, $C$ the camera pose, we must have
$$\lambda \, vp_x = KR(X_p - C) \Rightarrow X_p - C = \lambda R^T K^{-1} vp_x$$
The X-axis direction in 3D is thus the unit vector along $X_p - C$. The Y direction is obtained similarly, and the gravity vector is taken as the cross product of the two. We observe that the cross product gives a more stable gravity vector than simply using the output of the vanishing-point algorithm.
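The derivation above translates directly into code. This is a small sketch with assumed variable names, following $X_p - C = \lambda R^T K^{-1} vp_x$ for the axis directions and the cross product for gravity:

```python
import numpy as np

def axis_direction(vp, K, R):
    """World direction corresponding to a vanishing point vp (homogeneous
    pixel coordinates). From lambda*vp = K R (X_p - C), the direction of
    X_p - C is R^T K^{-1} vp, normalized to unit length."""
    d = R.T @ np.linalg.inv(K) @ vp
    return d / np.linalg.norm(d)

def gravity_from_vps(vp_x, vp_y, K, R):
    """Gravity as the cross product of the X and Y directions, which is
    more stable than the vanishing-point algorithm's third output."""
    g = np.cross(axis_direction(vp_x, K, R), axis_direction(vp_y, K, R))
    return g / np.linalg.norm(g)
```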
Triangulate an origin point in 3D using a pixel correspondence between two images.
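One standard way to triangulate the origin from a two-view correspondence is linear (DLT) triangulation; a minimal sketch, assuming the two camera projection matrices $P = K[R \mid -RC]$ are available from the reconstructed camera motion:

```python
import numpy as np

def triangulate(p1, p2, P1, P2):
    """Linear (DLT) triangulation of one correspondence from two views.

    p1, p2: (2,) pixel coordinates in each image; P1, P2: 3x4 projection
    matrices. Each pixel contributes two linear constraints on the
    homogeneous 3D point X; the solution is the null vector of A.
    """
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```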
Using the origin and the three axes, construct a 3D bounding box and project it onto the image.
Keyboard input to the interface moves each face of the box along its normal direction.
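The box construction, projection, and face adjustment can be sketched as follows; the function names and the mapping of key presses to extent changes are illustrative assumptions, not the interface's actual code:

```python
import numpy as np

def box_corners(origin, axes, extents):
    """8 corners of a box spanned by three unit axes scaled by extents."""
    corners = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                corners.append(origin + i * extents[0] * axes[0]
                                      + j * extents[1] * axes[1]
                                      + k * extents[2] * axes[2])
    return np.array(corners)

def project(points, K, R, C):
    """Project 3D points into the image: lambda * p = K R (X - C)."""
    x = (K @ (R @ (points - C).T)).T
    return x[:, :2] / x[:, 2:3]

# Moving a face along its normal direction amounts to adjusting one extent
# (or shifting the origin along that axis), e.g. extents[0] += step on a
# key press, then re-projecting the corners.
```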
- Propagate section labels to the box.