BDEv: Big Data Evaluator


BDEv

[NEW] 07/09/2020: BDEv 3.5 released! Check out the News section

BDEv [1] is a benchmarking tool for evaluating Big Data processing frameworks in terms of performance, resource utilization, energy efficiency and microarchitecture-level events. It supports multiple frameworks (e.g. Hadoop, Spark, Flink) and manages the configuration they need to leverage the available computational resources, such as CPU, memory and network interfaces. Frameworks are evaluated by running the benchmarks included in the BDEv distribution (e.g. TeraSort, WordCount), and custom commands can also be executed. Moreover, BDEv eases the deployment of the frameworks over a cluster of nodes, the execution of the experiments and the gathering of results, for which it provides automatically generated graphs.
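As a rough illustration of the evaluation described above, the following Python sketch enumerates one experiment per framework and benchmark pair. It is only a conceptual model: the names `Experiment`, `build_plan`, `FRAMEWORKS` and `BENCHMARKS` are hypothetical and do not correspond to BDEv's configuration files, scripts or API.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical names for illustration only; this is NOT BDEv's
# configuration format or API. The values come from the examples
# mentioned in the description above.
FRAMEWORKS = ["Hadoop", "Spark", "Flink"]
BENCHMARKS = ["TeraSort", "WordCount"]


@dataclass(frozen=True)
class Experiment:
    """One benchmark run on one framework."""
    framework: str
    benchmark: str


def build_plan(frameworks, benchmarks):
    """Return one Experiment per (framework, benchmark) pair, in input order."""
    return [Experiment(f, b) for f, b in product(frameworks, benchmarks)]


plan = build_plan(FRAMEWORKS, BENCHMARKS)
for exp in plan:
    # Each entry would be deployed, executed and measured separately.
    print(f"{exp.framework:<7} {exp.benchmark}")
```

With three frameworks and two benchmarks the plan contains six experiments, which gives a sense of why automating deployment, execution and result gathering is useful.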

BDEv has evolved from MREv [2], which was originally aimed at evaluating HPC-oriented MapReduce frameworks. MREv has been used for research purposes in [3], which analyses the behaviour of such frameworks on an HPC cluster. It has also been used extensively during the development and evaluation of Flame-MR [4, 5, 6], an efficient MapReduce framework that improves the performance of Hadoop, and to compare the performance of Hadoop, Spark and Flink in [7].

This tool is distributed as free software under the MIT license and is publicly available in the Downloads section.

Citation

If you have used BDEv in your research, please cite our work using the following reference:

References
  • [2] Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño. MREv: an Automatic MapReduce Evaluation Tool for Big Data Workloads. In Proceedings of the International Conference on Computational Science (ICCS'15), vol. 51, pages 80–89. Reykjavík, Iceland, 2015.
  • [3] Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño. Analysis and evaluation of MapReduce solutions on an HPC cluster. Computers & Electrical Engineering, vol. 50, pages 200–216. February 2016.
  • [4] Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño. Flame-MR: An event-driven architecture for MapReduce applications. Future Generation Computer Systems, vol. 65, pages 46–56. December 2016.
  • [5] Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño. Enhancing in-memory efficiency for MapReduce-based data processing. Journal of Parallel and Distributed Computing, vol. 120, pages 323–338. October 2018.
  • [6] Jorge Veiga, Roberto R. Expósito, Bruno Raffin and Juan Touriño. Optimization of real-world MapReduce applications with Flame-MR: practical use cases. IEEE Access, vol. 6, pages 69750–69762. November 2018.
  • [7] Jorge Veiga, Roberto R. Expósito, Xoán C. Pardo, Guillermo L. Taboada and Juan Touriño. Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics. In Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), pages 424–431. December 2016.