HADOOP DISTRIBUTED FILE SYSTEM ( HDFS )

FIRST QUESTION: WHERE SHOULD WE USE HADOOP?

Let us consider one use case.

Suppose you are the manager of an organisation, and you want to know how many times words such as POWER FAILURE, CSAT NO and REFUND are used by your customers in their emails.

If the data were structured, it could easily be managed in a traditional data warehouse without Hadoop, provided its volume stays within what the warehouse can handle. Emails, however, are unstructured data, so we will solve this problem using Hadoop.

The solution is as follows.
As we know, Hadoop has two core components:

1) HDFS (Hadoop Distributed File System) for distributed storage.
2) MapReduce for parallel data processing.
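To make "distributed storage" a little more concrete, here is a minimal Python sketch of the idea behind HDFS: a file is cut into fixed-size blocks, and each block is copied to several machines (DataNodes). Current Hadoop releases default to 128 MB blocks and a replication factor of 3. The tiny 128-byte block size, the node names dn1 to dn4, the round-robin placement (real HDFS uses a rack-aware placement policy) and the function names split_into_blocks and place_replicas are all illustrative assumptions, not part of any Hadoop API.

```python
# Illustrative sketch of HDFS-style block storage (NOT the Hadoop API).

def split_into_blocks(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes using simple round-robin.
    Assumption: real HDFS uses a rack-aware placement policy instead."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# 128 bytes stands in for the 128 MB default block size.
data = b"x" * 300
blocks = split_into_blocks(data, block_size=128)
print([len(b) for b in blocks])
# [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on three different nodes, losing any single DataNode does not lose any block, and MapReduce can later process different blocks on different machines at the same time.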

So the execution steps are:

  1. Load the data into HDFS.
  2. Analyse the data in parallel using MapReduce.
  3. Store the results back into HDFS.
  4. Retrieve the results and visualize them.
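The MapReduce part of these steps can be simulated on a single machine to show what the job actually computes for the keyword use case. This is a pure-Python sketch, not the Hadoop Java API: map_phase, shuffle, reduce_phase and keyword_count are hypothetical names, the sample emails are made up, and the mappers run one after another here, whereas Hadoop would run them in parallel on the HDFS blocks holding the emails. Loading into HDFS and writing results back (steps 1 and 3) are left out.

```python
import re
from collections import defaultdict

# The keywords the manager wants counted (from the use case above).
KEYWORDS = ("POWER FAILURE", "CSAT NO", "REFUND")

# One case-insensitive, whole-word pattern per keyword.
PATTERNS = {kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE) for kw in KEYWORDS}

def map_phase(email):
    """Mapper: emit a (keyword, 1) pair for every keyword occurrence in one email."""
    for kw, pattern in PATTERNS.items():
        for _ in pattern.finditer(email):
            yield kw, 1

def shuffle(pairs):
    """Shuffle: group the emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: add up all the 1s emitted for one keyword."""
    return key, sum(values)

def keyword_count(emails):
    """Run map -> shuffle -> reduce over a list of email texts."""
    mapped = (pair for email in emails for pair in map_phase(email))
    grouped = shuffle(mapped)
    # Each Hadoop reducer receives its keys in sorted order; sorted() mimics that here.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

emails = [
    "We had a power failure again, I want a refund.",
    "Refund please. CSAT no: 4. Another POWER FAILURE yesterday.",
    "Thanks for the quick fix.",
]
print(keyword_count(emails))
# {'CSAT NO': 1, 'POWER FAILURE': 2, 'REFUND': 2}
```

Matching here is case-insensitive and uses word boundaries, so "REFUNDED" is not counted as "REFUND"; a real job would have to decide for itself how to treat such variants.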


So we need to understand in depth what HDFS is exactly, and how data is stored and processed in its architecture.
