Raghupathi and Raghupathi 1 defined Big Data as
“large volumes of high velocity, complex, and variable data that require
advanced techniques and technologies to enable the capture, storage,
distribution, management and analysis of the information”.
The characteristics of Big Data are commonly defined in terms of V's.

(Figure: images from medical centers → MapReduce → medical images and metadata)

The three most widely known Vs are Volume, Variety, and Velocity 2,3. Two further Vs
have recently been added: Veracity 4,5 and Value 6.

Big data is generated in many industries, such as healthcare,
environmental monitoring, and various services. Healthcare (medical) big data consists of
electronic health records (EHRs), including patients' signals (ECG, EEG, EMG,
etc.), test results, and medical images 7. Medical imaging has rapidly become
the leading non-invasive method for evaluating a patient and determining whether a
medical condition exists 8. Imaging is used to assist in the diagnosis of a
condition and, in most cases, is the first step of the journey through the modern
medical system. Advancements in imaging technology have enabled us to gather
more detailed, higher-resolution 2D, 3D, 4D, and microscopic images, which are
enabling faster diagnosis and treatment of certain complex conditions.

Data collected from different sources need to be analyzed; otherwise, the
data are of little value. Big Data analysis plays a role in extracting insights
from massive data sets 10. It increases the availability of data and analytic
capabilities and reduces wasted resources. As a result, healthcare costs can be
reduced while quality and outcomes are improved 9. Applying Big Data analysis
(based on pattern recognition) creates value by producing actionable
information and supporting decision making. Many frameworks are available
for Big Data analysis. This paper compares two of them,
Apache Hadoop MapReduce 11 and Apache Spark 20, for indexing and retrieving
medical images. The comparison is based on performance: the time needed to
index the images and the speed of image retrieval. We performed a case study
using DICOM (Digital Imaging and Communications in Medicine) CT images. The case
study applies a simple and essential use case: indexing and storing the medical
images (CT images) separately in each framework, then retrieving them, as shown
in Figure 1. Our data set consists of 350 DICOM files.
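The indexing-and-retrieval use case above can be sketched in plain Python. This is only an illustration of the idea, not our actual pipeline: in practice the metadata would be extracted from DICOM headers (e.g. with a DICOM reader library), whereas here plain dictionaries stand in for them, and all function and field names below are illustrative.

```python
# Sketch of the indexing use case: build an inverted index from image
# metadata to file paths, then retrieve files by attribute value.
# The dicts below stand in for metadata read from DICOM headers.
from collections import defaultdict

def build_index(images):
    """Map each (attribute, value) pair to the list of matching file paths."""
    index = defaultdict(list)
    for meta in images:
        path = meta["path"]
        for key, value in meta.items():
            if key != "path":
                index[(key, value)].append(path)
    return index

def retrieve(index, key, value):
    """Return all file paths whose metadata matches key == value."""
    return index.get((key, value), [])

# Illustrative stand-in metadata for three images
images = [
    {"path": "ct_001.dcm", "Modality": "CT", "PatientID": "P1"},
    {"path": "ct_002.dcm", "Modality": "CT", "PatientID": "P2"},
    {"path": "mr_001.dcm", "Modality": "MR", "PatientID": "P1"},
]
idx = build_index(images)
print(retrieve(idx, "Modality", "CT"))   # ['ct_001.dcm', 'ct_002.dcm']
print(retrieve(idx, "PatientID", "P1"))  # ['ct_001.dcm', 'mr_001.dcm']
```

In the frameworks compared here, the same index construction would be expressed as a MapReduce job or a set of Spark transformations over the distributed file set.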

In 2005, Doug Cutting created an open-source framework, called Apache
Hadoop 11, to handle Big Data processing and analysis. It consists of two major
components: an open-source data storage layer, HDFS
(Hadoop Distributed File System), and a processing API, the MapReduce
framework, along with other project libraries (about 25) 12. HDFS, inspired by the
Google File System (GFS) 13, provides scalable, efficient, replica-based
storage of data across the nodes that form a cluster. Hadoop can use either
MapReduce or Spark to index and process the data.
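The MapReduce model described above can be sketched in plain Python, without Hadoop itself: a map phase emits (key, value) pairs, a shuffle phase groups values by key, and a reduce phase aggregates each group. The classic word-count example serves as a minimal illustration; all function names here are ours, not Hadoop's API.

```python
# Minimal sketch of the MapReduce paradigm (no Hadoop involved):
# map emits (key, value) pairs, shuffle groups values by key,
# and reduce aggregates each group.
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record; collect all emitted pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group all values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Aggregate each key's group of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Word-count mapper and reducer
mapper = lambda line: [(word, 1) for word in line.split()]
reducer = lambda word, counts: sum(counts)

lines = ["big data big value", "data value"]
counts = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
print(counts)  # {'big': 2, 'data': 2, 'value': 2}
```

In real Hadoop the same three phases run distributed across the cluster, with the shuffle moving data between nodes over the network.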

Apache Spark 14 is an open-source processing engine; instead of just “map” and
“reduce” as in Hadoop, it defines a large set of operations (transformations
and actions) that can be combined arbitrarily in any order. Spark supports
Java, Scala, and Python. The key construct in Spark is the Resilient
Distributed Dataset (RDD), which represents data or transformations on data.
RDDs can be created from Hadoop InputFormats (such as HDFS files), by
parallelizing an existing dataset with parallelize(), or by transforming other
RDDs (RDDs can be chained).
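The distinction between transformations and actions can be illustrated without Spark itself: transformations merely describe a computation, and nothing executes until an action is invoked. The toy class below is a plain-Python imitation of this lazy model; its names mirror but are not Spark's API (in PySpark the equivalents would be sc.parallelize, rdd.map, rdd.filter, and rdd.collect).

```python
# Toy imitation of Spark's lazy RDD model (illustrative, not Spark's API):
# transformations (map, filter) only record a pipeline of deferred
# operations; the action (collect) triggers the actual computation.
class ToyRDD:
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []          # deferred transformations

    def map(self, fn):                 # transformation: lazy, returns a new RDD
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):            # transformation: lazy, returns a new RDD
        return ToyRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):                 # action: runs the whole pipeline
        out = list(self._data)
        for kind, fn in self._ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

rdd = ToyRDD(range(1, 6))
result = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 1).collect()
print(result)  # [1, 9, 25]
```

Because each transformation returns a new object describing the extended pipeline, RDDs can be chained freely, which is exactly the "stacking" behavior noted above; Spark additionally distributes the execution across the cluster and can recompute lost partitions from this lineage.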