HoviTron targets two aspects: “Hovi”, referring to Holographic Vision, and “Tron”, corresponding to Tele-Robotic Operation. The “Tron” target requires that a tele-operator can navigate freely in the scene, while the “Hovi” target adds eye accommodation to stereoscopic viewing.
In the final Proof-of-Concept (PoC-3), the HoviTron tele-operator remotely guides (1) the robot arm (2) in the scene (3) that is captured by fixed RGBD devices (4). The head pose (5) determines the viewpoints (6) to synthesize for the head-mounted display, enabling free navigation. The stereoscopic view synthesis is augmented with micro-parallax foveated images – a light field – that provide immersive holographic vision, so the tele-operator experiences less workload (cf. deliverables D5.2 and D5.4 on the public resources page) than with a conventional stereoscopic head-mounted display.
The HoviTron technology is complementary to the MPEG-I immersive video standardization: the former focuses on the capturing and rendering aspects, while the latter adds compression for volumetric video content streaming.
Figure 1: Guiding (1) the robot arm (2) in the scene (3), captured with fixed RGBD cameras (4), and providing for each head pose (5) the corresponding stereoscopic and holographic viewpoints (6) through view synthesis technology.
Figure 2: Virtual view synthesis (e2) from a pair of input images (Left–Right, cf. bottom) through back-and-forth 2D–3D projection (a1-a2) and implicit meshing (b1-b2-c1-c2) in its 3D representation (d). Micro-parallax virtual views (f1..32) projected through a smart optical system (g) into the eye (h) create a light field, enabling Holographic Vision with back- and foreground eye accommodation (i1-i2).
Figure 2 explains how our RVS (Reference View Synthesizer) OpenGL tool, donated to MPEG and further developed within HoviTron (mainly as a Vulkan equivalent with extensions), synthesizes virtual views. In a nutshell, the RGBD data captured from a fixed set of cameras (here, the Left and Right inputs) is pushed along arrows (a1) and (a2) into space to create a 3D point cloud, whose points are interconnected through the implicit triangles between adjacent pixels of the input images. Region (b1), with zoomed-in views in (b2) and (c1)-(c2), shows how these triangles are gradually constructed, resulting in the implicit 3D representation of (d). From there, a 2D projection towards any virtual viewpoint creates the RGB view (e2) and its corresponding depth map (e1).
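The three steps above (back-projection into a point cloud, implicit triangulation between adjacent pixels, and reprojection towards a virtual viewpoint) can be sketched in a few lines of NumPy. This is a minimal illustration assuming a simple pinhole camera model; the matrices and function names are ours, not RVS internals, and the real tool performs this on the GPU with occlusion handling and blending of multiple inputs.

```python
import numpy as np

def unproject(depth, K):
    """Back-project a depth map into a 3D point cloud (arrows a1/a2 in Figure 2).

    depth: (H, W) array of metric depths; K: 3x3 pinhole intrinsic matrix.
    Returns an (H*W, 3) array of points in camera coordinates."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # per-pixel viewing rays
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def implicit_triangles(H, W):
    """Connect adjacent pixels into two triangles per pixel quad
    (the gradual mesh construction of b1-b2-c1-c2 in Figure 2)."""
    idx = np.arange(H * W).reshape(H, W)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    return np.concatenate([np.stack([tl, tr, bl], axis=1),
                           np.stack([tr, br, bl], axis=1)])

def reproject(points, K_virt, R, t):
    """Project 3D points into a virtual camera (pose R, t), yielding pixel
    coordinates for the RGB view (e2) and depths for its depth map (e1)."""
    cam = points @ R.T + t
    z = cam[:, 2:3]
    uv = (cam @ K_virt.T)[:, :2] / z
    return uv, z.ravel()
```

With an identity pose, reprojecting the unprojected cloud recovers the original pixel grid, which is a convenient sanity check; moving `R` and `t` instead yields the virtual viewpoint of (e2).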
Applying this principle twice – once per eye – provides stereoscopic virtual views, as in (6) of Figure 1. Moreover, synthesizing many micro-parallax images (f1..32) per eye (h) through a smart optical system (g) creates a light field providing Holographic Vision with back- and foreground eye accommodation (i1-i2). By exploiting the strong temporal redundancies between micro-parallax views, the RVS process can be simplified into so-called Spatio-Temporally Amortized Light Fields (STALF).
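The micro-parallax views (f1..32) differ only by tiny shifts of the virtual camera centre within the eye's aperture. As a rough sketch, one could generate such viewpoint offsets on a small grid; the 4×8 grid shape and the 4 mm aperture below are illustrative assumptions, not HoviTron's actual optical parameters.

```python
import numpy as np

def micro_parallax_offsets(n_views=32, aperture=0.004):
    """Camera-centre offsets (in metres) for micro-parallax views f1..f32:
    a small grid of viewpoints spread over an assumed pupil aperture.
    Each offset would translate the virtual camera used for view synthesis."""
    rows = 4                       # illustrative 4 x (n_views/4) grid layout
    cols = n_views // rows
    xs = np.linspace(-aperture / 2, aperture / 2, cols)
    ys = np.linspace(-aperture / 2, aperture / 2, rows)
    gx, gy = np.meshgrid(xs, ys)
    # No offset along the viewing axis: micro-parallax is purely lateral here.
    return np.stack([gx.ravel(), gy.ravel(), np.zeros(n_views)], axis=1)
```

Because consecutive frames and neighbouring micro-parallax views are highly similar, STALF amortizes the synthesis work across them instead of rendering all 32 views from scratch each frame.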