An Improved Technique for Increasing Availability in Big Data Replication in Hadoop

Abstract

Big Data poses a major challenge to the performance of cloud computing storage systems. Distributed file systems (DFS) such as the Hadoop Distributed File System (HDFS) and the Google File System (GFS) are widely used to store big data. These DFS replicate data and store it as multiple copies to provide availability and reliability, but at the cost of increased storage and resource consumption. In previous work (Kaseb, Khafagy, Ali, & Saad, 2018), we built a Redundant Independent Files (RIF) system over a cloud provider (CP), called CPRIF, which provides HDFS without replication to improve overall performance by reducing storage space, resource consumption, and operational costs, and by improving read and write performance. However, RIF suffers from limited availability, limited reliability, and increased data recovery time. In this paper, we overcome these limitations of the RIF system by giving the system more chances to recover a lost block (availability) and the ability to keep working in the presence of a lost block (reliability) with less computation (time overhead).