Using Intermediate Data of Map Reduce for Faster Execution

Shah Pratik Prakash; Pattabiraman V.

doi:10.46300/91013.2022.16.4


Abstract: Data of every kind, whether structured, unstructured or semi-structured, is generated in large quantities around the globe across many domains. These datasets are stored on multiple nodes in a cluster. The MapReduce framework has emerged as one of the most efficient and easiest-to-use techniques for parallel processing of distributed data. This paper proposes a new methodology for the MapReduce workflow. The proposed methodology processes raw data so that less processing time is needed to generate the required result. It stores the intermediate data generated between the map and reduce phases and reuses that data as input to MapReduce. The methodology aims to improve the data reusability, scalability and efficiency of the MapReduce framework for large-scale data analysis. The experimental work uses MongoDB 2.4.2 to demonstrate how intermediate data can be stored and reused as part of MapReduce to improve the processing of large datasets.

Pages: 20-26


International Journal of Computers and Communications, E-ISSN: 2074-1294, Volume 16, 2022, Art. #4
