Combining Stochastic and Deterministic Modeling of IPFIX Records to Infer Connected IoT Devices in R

Abstract:

Residential Internet service providers (ISPs) today have limited device-level visibility into subscriber homes, primarily because of network address translation (NAT). The continuous growth of “unmanaged” consumer Internet of Things (IoT) devices, combined with the rise of working from home, makes home networks attractive targets for sophisticated cyber attackers. Volumetric attacks sourced from a distributed set of vulnerable IoT devices can impact ISPs by degrading the performance of their networks, or can even make them liable as carriers of malicious traffic. This article explains how ISPs can use IP Flow Information eXport (IPFIX), a flow-level telemetry protocol available on their networks, to infer connected IoT devices and ensure their cyber health without making changes to home networks. Our contributions are threefold: 1) we analyze more than nine million IPFIX records of 26 IoT devices collected from a residential testbed over three months and identify 28 flow features that characterize the network behavior of IoT devices; we release our IPFIX records to the public as open data; 2) we train a multiclass classifier on stochastic attributes of IPFIX flows to infer the presence of certain IoT device types in a home network with an average accuracy of 96%, and, on top of this machine learning (ML) model, we develop a trust metric to track the network activity of detected devices over time; and 3) we develop deterministic models of the specific and shared cloud services consumed by IoT devices, yielding an average accuracy of 92%. We show that combining the stochastic and deterministic models mitigates false positives in 75% of incidents at the expense of an average 7% reduction in true positives.
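The abstract does not say how the stochastic and deterministic models are combined. One simple scheme is to report a device type only when both models agree. That scheme fits the reported trade-off: a mandatory second check can only remove detections, so it cuts false positives and may also drop some true positives. The Python sketch below shows such an AND-style fusion and is not the authors' method. The class `DeterministicModel`, the function `combined_inference`, the 0.8 confidence threshold, the rule that one type-specific endpoint is enough, and the example endpoint names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeterministicModel:
    """Hypothetical deterministic model of one IoT device type: the set of
    cloud endpoints assumed to be contacted only by that device type."""
    specific: frozenset[str]

    def matches(self, observed: set[str]) -> bool:
        # Assumption: seeing at least one type-specific endpoint in the
        # household's flow records counts as deterministic evidence.
        return bool(self.specific & observed)


def combined_inference(
    ml_scores: dict[str, float],
    observed_endpoints: set[str],
    models: dict[str, DeterministicModel],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> set[str]:
    """Report a device type only if the stochastic classifier is confident
    AND the deterministic model of that type agrees (logical AND fusion)."""
    detected = set()
    for device_type, score in ml_scores.items():
        model = models.get(device_type)
        if score >= threshold and model is not None and model.matches(observed_endpoints):
            detected.add(device_type)
    return detected


# Illustrative inputs with made-up device types and endpoint names.
models = {
    "camera_x": DeterministicModel(frozenset({"api.camera-x.example"})),
    "plug_y": DeterministicModel(frozenset({"cloud.plug-y.example"})),
}
ml_scores = {"camera_x": 0.93, "plug_y": 0.85}
observed = {"api.camera-x.example", "ntp.example"}

# plug_y passes the ML threshold but has no matching endpoint, so it is suppressed.
print(combined_inference(ml_scores, observed, models))  # {'camera_x'}
```

In this toy run, `plug_y` stands for a classifier detection that the deterministic check rejects. A real deployment would still need a policy for device types that have no deterministic model or that use only shared cloud services.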