BDEv: Big Data Evaluator


News

11/04/2024 BDEv 3.9 is released!

BDEv 3.9 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Hadoop 3.3.6
    • Spark 3.2.4
    • Spark 3.3.4
    • Flink 1.14.6
    • Flink 1.15.4
  • Support for the Hadoop 3.4.x line
  • Support for the Spark 3.4.x and 3.5.x lines
  • Support for the Flink 1.16.x, 1.17.x, 1.18.x and 1.19.x lines
  • Add HDFS parameter to configure DataNode heartbeat (hdfs-default.sh)
  • Add YARN parameter to configure the scheduler class (yarn-default.sh)
  • Add several options to customize the YARN fair scheduler (yarn-default.sh)
  • Add YARN parameter to configure NodeManager heartbeat (yarn-default.sh)
  • Configuration of each experiment is now copied into the output report directory
  • HDFS formatting and data deletion can be configured (hdfs-default.sh)
  • HDFS replication factor is automatically set when there are not enough DataNodes
  • HDFS input data is reused if it already exists
  • Update RAPL-based power monitoring tool to PAPI v7.1.0
  • Update RGen data generator to include minor improvements and bug fixes
  • Update resource monitoring tool to dool v1.3.1
  • SSH optional parameters can now be set (bdev-default.sh)
  • Fix bug when running Spark and Flink on YARN
  • Remove support for Flink versions 1.12.x and 1.13.x
  • Minor changes and bug fixes
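
The automatic replication factor adjustment listed above can be pictured with a minimal sketch. It assumes the rule is simply "never replicate more times than there are DataNodes"; the function name and the exact rule are illustrative assumptions, not taken from the BDEv sources.

```python
def effective_replication(requested: int, datanodes: int) -> int:
    """Clamp a requested HDFS replication factor to the DataNode count.

    Hypothetical helper: BDEv's actual logic may differ. The assumption is
    that a block cannot have more replicas than there are DataNodes.
    """
    if requested < 1 or datanodes < 1:
        raise ValueError("requested and datanodes must both be >= 1")
    return min(requested, datanodes)


if __name__ == "__main__":
    # A 2-node cluster cannot hold 3 replicas, so the factor drops to 2.
    print(effective_replication(3, 2))
    # With enough DataNodes the requested factor is kept unchanged.
    print(effective_replication(3, 5))
```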

29/07/2022 BDEv 3.8 is released!

BDEv 3.8 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Hadoop 2.10.2
    • Hadoop 3.2.4
    • Hadoop 3.3.3
    • Spark 3.2.2
    • Flink 1.14.5
  • Support for the Spark 3.3.x line
  • Support for the Flink 1.15.x line
  • Add new sorting benchmark for Hadoop, Spark and Flink: TPCx-HS
  • Update RGen data generator to support TPCx-HS (HSGen) and fix TeraGen bug with Hadoop 3.x
  • Hadoop classpath is now automatically configured for Spark and Flink
  • Add option to use hostnames or IPs for cluster nodes (bdev-default.sh)
  • Add Hadoop parameter to set the timeout when waiting for a server response (core-default.sh)
  • Add HDFS parameter to configure write packet size (hdfs-default.sh)
  • Add HDFS parameter to configure retries when writing blocks to DataNodes (hdfs-default.sh)
  • Add HDFS parameter to configure handler count for DataNodes (hdfs-default.sh)
  • Add option to set an IP for loopback interface (system-default.sh)
  • Improve frameworks cleanup
  • Add sanity checks for networking setup
  • Remove support for Flink versions 1.11.x
  • Minor changes and bug fixes

18/03/2022 BDEv 3.7 is released!

BDEv 3.7 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Hadoop 3.3.2
    • Spark 3.1.3
    • Flink 1.11.6
    • Flink 1.12.7
    • Flink 1.13.6
  • Support for the Spark 3.2.x line
  • Support for the Flink 1.14.x line
  • Add Flink configuration parameters for several shuffle settings
  • Remove support for Flink versions 1.10.x
  • Add sanity check for Flink configuration file
  • Update resource monitoring tool to dool v1.0.0
  • Minor changes and bug fixes

16/07/2021 BDEv 3.6 is released!

BDEv 3.6 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Hadoop 2.10.1
    • Hadoop 3.1.4
    • Hadoop 3.2.2
    • Hadoop 3.3.1
    • Spark 2.4.8
    • Spark 3.0.3
    • Flink 1.10.3
    • Flink 1.11.3
  • Support for the Spark 3.1.x line
  • Support for the Flink 1.12.x and 1.13.x lines
  • Add Spark configuration parameters for
    • spark.memory.fraction
    • spark.memory.storageFraction
    • spark.kryoserializer.buffer.max
    • spark.executor.heartbeatInterval
  • Adaptive Query Execution (AQE) can be enabled and configured for Spark 3.x
  • Add Flink configuration parameters for several network memory settings and timeouts
  • Add HDFS configuration parameters for setting socket-related timeouts
  • Command batch mode now supports running multiple scripts stored in a directory
  • Remove support for Flink versions <= 1.9
  • All input datasets are now created using an internal data generator tool (RGen)
  • Set Hadoop classpath properly
  • Add sanity checks for some dependencies
  • Fix missing dool files
  • Minor changes and bug fixes
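
For reference, the Spark properties named in this release are standard Spark settings. The sketch below renders them in the whitespace-separated `spark-defaults.conf` format. The values shown are Spark's documented defaults, with AQE switched on through `spark.sql.adaptive.enabled`. How BDEv itself maps its configuration variables onto these properties is not shown here, and the helper name is hypothetical.

```python
from typing import Mapping

# Spark's documented default values for the properties listed above;
# AQE is enabled explicitly for illustration.
SPARK_PROPS = {
    "spark.memory.fraction": "0.6",
    "spark.memory.storageFraction": "0.5",
    "spark.kryoserializer.buffer.max": "64m",
    "spark.executor.heartbeatInterval": "10s",
    "spark.sql.adaptive.enabled": "true",
}


def render_spark_defaults(props: Mapping[str, str]) -> str:
    """Render properties as 'key value' lines, one per property.

    Hypothetical helper for illustration; not part of BDEv or Spark.
    """
    return "".join(f"{key} {value}\n" for key, value in props.items())


if __name__ == "__main__":
    print(render_spark_defaults(SPARK_PROPS), end="")
```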

07/09/2020 BDEv 3.5 is released!

BDEv 3.5 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Spark 2.4.6
    • Flink 1.9.3
    • Flink 1.10.2
  • Support for the Hadoop 3.3.x line
  • Support for the Spark 3.0.x line
  • Support for the Flink 1.11.x line
  • Update resource monitoring tool to dool v0.9.9, a Python 3 compatible clone of dstat
  • Update RAPL-based power monitoring tool to use PAPI v6.0.0
  • Update iLO scripts to version 5.30
  • RAPL-based monitoring tool can now report readings for uncore devices if PP0 is available
  • Fix bug when setting iLO credentials
  • Minor bug fixes

25/03/2020 BDEv 3.4 is released!

BDEv 3.4 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Spark 2.4.5
    • Flink 1.8.3
    • Flink 1.9.2
  • Support for the Flink 1.10.x line
  • Allow starting the job history server for Flink
  • Simplified and improved cluster deployments for Spark & Flink
  • Add support to configure the YARN Timeline server (yarn-default.sh)
  • Add support to configure the disk health checker service for YARN (yarn-default.sh)
  • Add support to configure last access time precision for HDFS (hdfs-default.sh)
  • Add support to enable short-circuit local reads for HDFS (hdfs-default.sh)
  • Memory settings for some daemons (e.g., DataNode) can now be set individually
  • Fix SQL and ConnComp workloads for Spark
  • Spark & Flink benchmarking suites are now downloaded on demand (removed from git repo)
  • Scala version for Spark & Flink can be set in solutions-default.sh (2.11 by default)
  • Set LD_LIBRARY_PATH in Spark & Flink to include the path to Hadoop native libraries
  • Hadoop existence is checked when deployed together with other frameworks
  • Minor bug fixes

17/12/2019 BDEv 3.3 is released!

BDEv 3.3 is a point release for the BDEv 3.x line. From this version onwards, BDEv is hosted on GitHub

  • Updated versions of the following frameworks
    • Hadoop 2.10.0
    • Hadoop 3.1.3
    • Hadoop 3.2.1
    • Spark 2.3.4
    • Spark 2.4.4
    • Flink 1.8.2
  • Support for the Flink 1.9.x line (tested with Flink 1.9.1)
  • Add support for BDWatchdog tool, which allows per-process resource monitoring and JVM profiling
  • BDEv now runs by default in command mode when no framework is selected in solutions.lst
  • Update PAPI library to version 5.7.0
  • Update RAPL power monitoring tool to use PAPI 5.7.0
  • Add support to configure the binary names for Python2 and Python3 (system-default.sh)
  • Fix bug when running Flink in standalone mode
  • Other minor bug fixes

06/06/2019 BDEv 3.2 is released!

BDEv 3.2 is a point release for the BDEv 3.x line

  • Updated versions of the following frameworks
    • Hadoop 2.7.7
    • Hadoop 2.8.5
    • Hadoop 2.9.2
    • Hadoop 3.1.2
    • Hadoop 3.2.0
    • RDMA-Hadoop-3 0.9.1
    • Flame-MR 1.2
    • Spark 2.3.3
    • Spark 2.4.3
    • Flink 1.7.2
    • Flink 1.8.0
  • Add support to configure handler count for Hadoop NameNode (hdfs-default.sh)
  • Add support to use unsafe based Kryo serializer for Spark (solutions-default.sh)
  • Fix bug when running multiple Executors per worker/node in Spark configuration
  • Minor bug fixes

16/05/2018 BDEv 3.1 is released!

BDEv 3.1 is a point release for the BDEv 3.x line

  • Updated versions of some frameworks
    • Hadoop 2.7.6
    • Hadoop 2.8.3
    • Hadoop 2.9.0
    • Flame-MR 1.1
    • RDMA-Hadoop-2 1.3.5
    • Spark 2.3.0
    • RDMA-Spark 0.9.5
    • Flink 1.4.2
  • Support for the Hadoop 3.1.x line (tested with Hadoop 3.1.0)
  • Improved customization of experiments with setup and clean up commands
  • Extended configuration parameters

07/11/2017 BDEv 3.0 is released!

This is a major release which contains a number of significant enhancements, along with previously unreleased features.

  • New monitoring tools
    • Power monitoring via RAPL
    • Microarchitecture-level event counting via Oprofile
  • Enhanced framework configuration parameters
  • Support for the Flink 1.3.x line (tested with Flink 1.3.2)
  • Support for the Spark 2.2.x line (tested with Spark 2.2.0)
  • Support for the Flame-MR 1.x line (tested with Flame-MR 1.0)
  • Updated versions of some frameworks
    • Hadoop 2.7.4
    • Hadoop 2.8.1
    • RDMA-Hadoop 1.2.0
    • RDMA-Spark 0.9.4
  • Updated version of resource monitoring tool to dstat 0.7.3
  • Improved graph generation portability

25/11/2016 BDEv 2.3 is released!

BDEv 2.3 is a point release for the BDEv 2.x line

  • Added support for Flame-MR version 0.10.0
  • Added support for the Flink 1.1.x line (tested with version 1.1.3)
  • Added support for the SLURM job scheduler
  • Updated versions of some frameworks
    • Hadoop 2.6.5
    • Hadoop 2.7.3
    • RDMA-Hadoop-2 1.1.0
    • Spark 1.6.3
    • Flink 1.0.3
  • Memory settings for mappers and reducers can now be set separately
  • Added configuration variable in etc/core-default.sh to control the Hadoop parameter "io.file.buffer.size"
  • Virtual and physical memory limits in YARN containers, as well as the ratio between them, can now be configured in etc/yarn-default.sh
  • Fix critical bug related to JobManager memory when running Flink on YARN
  • Minor bug fixes

04/07/2016 BDEv 2.2 is released!

BDEv 2.2 is a point release for the BDEv 2.x line

  • Add support for multiple Workers/TaskManagers per node in Spark and Flink
  • Enhancements in several Flink workloads
    • Grep
    • PageRank (with delta iterations)
    • Connected Components (with Gelly)
    • K-Means
  • Enhancements in several Spark workloads
    • TeraSort
    • K-Means (with MLlib)
  • Bug fixes

05/05/2016 BDEv 2.1 is released!

BDEv 2.1 is a point release for the BDEv 2.x line

  • Allow starting the job history server for Spark-based frameworks
  • Added support for some new frameworks
    • Hadoop 2.6.4
    • RDMA-Hadoop-2 0.9.9
    • Spark 1.6.1
    • Flink 1.0.2
  • Added support for Mahout 0.11.2 and 0.12.0
  • Added new benchmarks for Spark
    • PageRank
    • Connected Components
    • K-Means
    • Bayes
    • Hive SQL queries:
      • Aggregation
      • Join
      • Scan
  • Added new benchmarks for Flink
    • PageRank
    • Connected Components
    • K-Means
  • Multiple bug fixes

02/03/2016 BDEv 2.0 is released!

BDEv 2.0 is a major release which contains a number of significant enhancements. Starting with this release, the MapReduce Evaluator (MREv) tool has been renamed to Big Data Evaluator (BDEv).

  • Added support for Flink version 0.10.2 in YARN and standalone modes
  • Added support for RDMA-Spark version 0.9.1 in YARN and standalone modes
  • Added support for Spark in standalone mode
  • Updated versions of some frameworks
    • Hadoop-2.7.2
    • RDMA-Hadoop-2 0.9.8
    • Spark 1.5.2
    • Spark 1.6.0
  • New data set generation using the HiBench DataGen tool
  • Added new benchmarks for Hadoop-based frameworks
    • Grep
    • K-Means
    • Connected Components
    • Hive SQL queries:
      • Aggregation
      • Join
      • Scan
  • Added new benchmarks for Spark and Flink
    • WordCount
    • Sort
    • Grep
    • TeraSort
  • Added new benchmark for DataMPI
    • Grep
  • Allow starting the job history server for Hadoop-based frameworks
  • Input datasize for WordCount, Sort and TeraSort can now be different
  • Enhanced configuration of parameters
    • Separated configuration files for HDFS, MapReduce and YARN
    • Solution-specific parameters in a separated file
    • Many new configuration parameters
  • Support for multiple disks
  • Network interfaces are now optional parameters. If not specified, IPs are determined using the hostfile
  • Updated iLO scripts to version 4.70
  • Allow configuring the IP, user name and password for iLO interfaces
  • Automatically download Apache Mahout and Apache Hive on demand
  • Refactored and optimized code, and simplified internal configuration
  • Multiple bug fixes

10/08/2015 MREv 1.1 is released!

MREv 1.1 has been released and can be obtained from the Downloads section. The main new features of MREv 1.1 are:

  • Updated versions of the frameworks
    • Hadoop-2.7.1-GbE
    • Hadoop-2.7.1-IPoIB
    • Hadoop-2.7.1-UDA
    • RDMA-Hadoop-2-0.9.7-GbE
    • RDMA-Hadoop-2-0.9.7-IPoIB
    • Spark-1.4.1-YARN-GbE
    • Spark-1.4.1-YARN-IPoIB
  • Configurable timeout for the workloads
  • User-defined batch command
  • Separate configuration and log directories for each evaluation
  • Enhanced resource configuration

09/12/2014 MREv 1.0 is released!

Version 1.0 of MREv has been released and can be obtained from the Downloads section.