Abstract:
Physics plays an important role in several computer vision applications. In this work, we teach a machine to detect the mechanical laws of motion of physical objects from video, and show how the results can be useful for computer vision tasks. We assume no prior knowledge of physics; the only input is a temporal stream of bounding boxes. The problem is difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). We evaluate our ability to represent physical laws of motion, such as projectile motion or circular motion, in both real and constructed videos. These elementary tasks have textbook governing equations, which enables ground-truth verification of our approach. To establish the importance of the proposed method, we show a real-world use case in object tracking in confounding scenes, where existing state-of-the-art algorithms fail. Incorporating physics into computer vision not only serves curiosity-driven research, but also provides an inductive bias for computer vision applications such as object tracking.