Abstract
© 2017 Nova Science Publishers, Inc. All rights reserved. Organizations are flooded with data. Not only that, but in an era of incredibly cheap storage where everyone and everything are interconnected, the nature of the data we are collecting is also changing. For many businesses, critical data used to be limited to transactional databases and data warehouses. Nowadays, the variety of data available to organizations is tremendous. The challenge is that traditional tools are poorly equipped to deal with the scale and complexity of much of this data. This is where Hadoop comes in. Hadoop is built to deal with all sorts of messiness. Apache Hadoop is an open-source software framework for the distributed storage and processing of big data datasets using the MapReduce programming model. It runs on computer clusters built from commodity hardware. Above all, the modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework. This chapter is a welcome to the world of Hadoop.
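To make the MapReduce programming model mentioned above concrete, here is a minimal word-count sketch in plain Python. It mimics the map, shuffle, and reduce phases in a single process; it is an illustrative toy, not Hadoop's actual Java API, and all function names are hypothetical:

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for every word in the input document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle step: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce step: aggregate the grouped values; here, sum the counts.
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["big data needs big tools", "hadoop handles big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

In real Hadoop, the map and reduce tasks run in parallel on many cluster nodes and the shuffle moves data across the network, but the logical flow is the same.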
| Original language | English |
| --- | --- |
| Title of host publication | Machine Learning: Advances in Research and Applications |
| Publisher | Nova Science Publishers, Inc. |
| Publication status | Published - 1 Oct 2017 |
| Externally published | Yes |
Keywords
- Big Data
- Hadoop
- HDFS
- Statistical Analysis
- Machine Learning