QMSampler Joint Sampling of Multiple Networks with Quality Guarantee Hadoop Bigdata

QMSampler Joint Sampling of Multiple Networks with Quality Guarantee Hadoop Bigdata

Abstract:

Because Online Social Networks (OSNs) have become increasingly important in the last decade, they have motivated a great deal of research on Social Network Analysis (SNA). Currently, SNA algorithms are evaluated on real datasets obtained from large-scale OSNs, which are usually sampled by Breadth-First-Search (BFS), Random Walk (RW), or some variations of the latter. However, none of the released datasets provides any statistical guarantees on the difference between the sampled datasets and the ground truth. Moreover, all existing sampling algorithms only focus on sampling a single OSN, but each OSN is actually a sampling of a complete social network. Hence, even if the whole dataset from a single OSN is sampled, the results may still be skewed and may not fully reflect the properties of the complete social network. To address the above issues, we have made the first attempt to explore the joint sampling of multiple OSNs and propose an approach called Quality-guaranteed Multi-network Sampler (QMSampler) that can jointly sample multiple OSNs.