Posts

Showing posts from May, 2017


RDBMS vs MAPREDUCE

REF: Hadoop: The Definitive Guide, by Tom White (the "Bible of Hadoop")

Relational Database Management Systems. Why can't we use databases with lots of disks to do large-scale analysis? Why is Hadoop needed? The answer to these questions comes from another trend in disk drives: seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk's head to a particular place on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk's bandwidth.

If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than streaming through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate at which it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses Sort/Merge to rebuild the database.
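The seek-versus-stream argument above can be made concrete with a back-of-the-envelope calculation. The figures below (1 TB dataset, 100 MB/s transfer rate, 10 ms average seek, 100-byte records) are illustrative assumptions, not numbers from the book:

```python
# Rough comparison of streaming a dataset vs. seek-dominated access.
# All figures are assumed, order-of-magnitude values.

DATASET_BYTES = 10**12          # 1 TB dataset
TRANSFER_RATE = 100 * 10**6     # 100 MB/s sequential transfer rate
SEEK_TIME = 0.010               # 10 ms average seek time
RECORD_BYTES = 100              # size of one record

# Streaming the whole dataset sequentially at the transfer rate:
stream_seconds = DATASET_BYTES / TRANSFER_RATE

# Seeking to even 1% of the individual records, one seek per record:
records = DATASET_BYTES // RECORD_BYTES
seek_seconds = 0.01 * records * SEEK_TIME

print(f"streaming full 1 TB : {stream_seconds / 3600:.1f} hours")
print(f"seeking to 1% of records: {seek_seconds / 86400:.1f} days")
```

With these assumptions, streaming the entire terabyte takes a few hours, while touching just 1% of the records via seeks takes on the order of days, which is exactly why a seek-limited B-Tree loses to streaming-oriented MapReduce for bulk workloads.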

HADOOP DISTRIBUTED FILE SYSTEM ( HDFS )

FIRST THING IN OUR MIND: WHERE TO USE HADOOP? Let us consider one use case. Suppose you are the manager of an organisation, and you want to know how many times words like POWER FAILURE, CSAT NO, and REFUND were used by your customers in their emails. If the data were structured, it could easily be managed without Hadoop using a conventional data warehouse; but email data is unstructured, so we will solve this problem using Hadoop. As we know, Hadoop has two core components. 1) HDFS (Hadoop Distributed File System) for distributed storage. 2) MapReduce for parallel data processing. So the execution steps are: Load the data into HDFS. Analyse the data in parallel using MapReduce. Store the results back into HDFS. Retrieve the results and visualize them. So, we need to understand deeply what HDFS exactly is, and how data is stored and processed by its architecture.
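The keyword-counting step of the use case above can be sketched as a pair of map and reduce functions. This is a minimal local simulation of the MapReduce idea, not a real Hadoop job; the keyword list and sample emails are illustrative assumptions:

```python
# Minimal sketch of the map and reduce phases for the email keyword count.
# On a real cluster this logic would run distributed (e.g. via Hadoop
# Streaming); here it runs locally to show the data flow.
from collections import Counter

KEYWORDS = ["POWER FAILURE", "CSAT NO", "REFUND"]

def map_line(line):
    """Map step: emit one (keyword, 1) pair per occurrence in a line."""
    upper = line.upper()
    for kw in KEYWORDS:
        for _ in range(upper.count(kw)):
            yield (kw, 1)

def reduce_pairs(pairs):
    """Reduce step: sum the counts for each keyword."""
    totals = Counter()
    for kw, n in pairs:
        totals[kw] += n
    return dict(totals)

if __name__ == "__main__":
    emails = [
        "Please process my refund, this is the second power failure.",
        "Refund still pending; CSAT no 1234 attached.",
    ]
    pairs = [p for line in emails for p in map_line(line)]
    print(reduce_pairs(pairs))
```

In a real deployment the data would first be loaded with `hadoop fs -put`, the job run across the cluster, and the results read back with `hadoop fs -get`, matching the four execution steps listed above.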

About Hadoop

The seeds of Hadoop were planted in 2002, out of a drive to better understand and explore open-source technology in the Big Data world. Hadoop was started by Doug Cutting and Mike Cafarella (the latter then a graduate student at the University of Washington) through the project Nutch, which was a subproject of Apache Lucene.