How to store huge data that is beyond our capacity?

INTRODUCTION

WHAT ARE THE ISSUES FACED?

Where to store data?
Once stored, how to process the data?
How to retrieve data faster?
How to store and retrieve data in real time?
How to find raw data for the industry?
How to manage that untapped data?

HOW IS DATA INCREASING?

  1. SOCIAL MEDIA - Social media is a place where people connect with each other online and share their emotions and journeys through images, audio, videos, etc. Social media is one of the major sources of Big Data. Instagram, Facebook, and WhatsApp collect a lot of data, such as personal details, pictures, likes or reactions, etc.
     • FACEBOOK - Facebook is a social media platform that had almost 2.7 billion active users as of the second quarter of 2020. Facebook generates 4 petabytes of data per day. People can chat and upload images, videos, etc. on Facebook.
  2. GOOGLE - Google is a search engine that has 4 billion users. It processes 3.5 billion searches per day, which breaks down to about 40,000 searches per second on average. Google processes approximately 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
  3. INTERNET OF THINGS (IoT) - IoT connects everyday devices to the internet and makes them smarter. Nowadays we have smart ACs, smart rooms, etc. Because of IoT, a humongous amount of data is generated. It is estimated that by 2025 there will be 41.6 billion IoT devices generating data.
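The per-second rates quoted above follow from simple unit conversion (a day has 86,400 seconds). A minimal Python sketch using only the figures from this article, and assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(amount_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


# Figures quoted in the article
google_searches_per_day = 3.5e9
facebook_bytes_per_day = 4e15  # 4 PB, assuming decimal units (1 PB = 10**15 bytes)

searches_per_second = per_second(google_searches_per_day)
facebook_gb_per_second = per_second(facebook_bytes_per_day) / 1e9  # bytes -> GB

print(f"Google: about {searches_per_second:,.0f} searches per second")
print(f"Facebook: about {facebook_gb_per_second:.1f} GB of new data per second")
```

Running it prints about 40,509 searches per second, consistent with the "40,000 per second on average" figure, and about 46.3 GB per second for Facebook's 4 PB per day.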

WHAT IS BIG DATA?

  1. Huge Volumes
  2. Data in different types and formats
  3. Impacting the Business

CHALLENGES

  1. STORING THE DATA - Data is arriving in huge volumes, and where to store it is a big issue. Storing such a huge amount of data in a traditional system is not possible. Buying one expensive machine with a huge storage capacity is not a good idea either, because one machine's capacity is still limited. For example, suppose we have one 500 MB file but only 200 MB of storage left: what do we do?
  2. VARIOUS FORMATS OF DATA - Earlier, we used to store data in relational databases, but today 80% of data is unstructured. There are now three types of data:
     A. STRUCTURED DATA
     B. UNSTRUCTURED DATA
     C. SEMI-STRUCTURED DATA

TYPES OF BIG DATA

A. STRUCTURED DATA

B. UNSTRUCTURED DATA

C. SEMI-STRUCTURED DATA
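To make the three types concrete, here is a small illustrative Python sketch. The records (patients, users, a social-media post) are made-up examples rather than real data, and JSON is assumed as one common semi-structured format (XML would be another):

```python
import json

# Structured data: a fixed schema, every record has the same columns
# (this is what fits naturally into a relational table).
schema = ("patient_id", "name", "age")
structured_rows = [
    (1, "Asha", 34),
    (2, "Ravi", 58),
]

# Semi-structured data: self-describing keys (here JSON), but records
# are not forced to share the same fields or nesting.
semi_structured = json.loads(
    '[{"user": "asha", "likes": 12},'
    ' {"user": "ravi", "likes": 3, "location": {"city": "Pune"}}]'
)

# Unstructured data: free text (or images, audio, video) with no schema at all.
unstructured = "Had a great trip to the hills today! Photos coming soon."

# Every structured row fits the schema exactly.
fits_schema = all(len(row) == len(schema) for row in structured_rows)

# Semi-structured records expose their own field names, which can differ per record.
fields_per_record = [sorted(record) for record in semi_structured]

# Unstructured text has no fields; we can only derive things from it, e.g. word tokens.
tokens = unstructured.split()

print("structured rows fit schema:", fits_schema)
print("semi-structured fields:", fields_per_record)
print("unstructured token count:", len(tokens))
```

The point of the contrast: the structured rows can be checked against one schema, the semi-structured records carry their own (differing) field names, and the unstructured text needs further processing before any field can be extracted.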

Characteristics of Big Data:

  1. Volume
  2. Velocity
  3. Variety

Volume

  • Generated by hospitals keeping records of all patients, doctors, nurses, medical staff, etc.
  • By social media.
  • By Google Drive and Dropbox.
  • By organizations, etc.

Velocity

Variety

DISTRIBUTED STORAGE

VERTICAL SCALING (SCALE-UP) - We add more storage to the same machine. It can store the data, but retrieving or processing the data takes longer, because all reads and writes (input/output) go through that one machine.

HORIZONTAL SCALING (SCALE-OUT) - We add more PCs instead of adding storage to one PC. The advantage of horizontal scaling is that it not only stores the data but also retrieves and processes it at a faster rate, because the machines work in parallel, which is good for industries.
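The core idea behind distributed storage can be sketched in a few lines: cut a file into fixed-size blocks and spread the blocks across several machines, so that no single machine needs room for the whole file. This Python sketch assumes a 128 MB block size (the HDFS default since Hadoop 2.x) and uses a hypothetical greedy "most free space first" placement rule; the function and machine names are illustrative only, and real HDFS placement also handles replication and rack awareness.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: HDFS default block size in Hadoop 2.x and later


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the sizes (in MB) of the fixed-size blocks a file is cut into.
    The last block holds the remainder and may be smaller."""
    count = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * count
    if file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes


def place_blocks(blocks: list[int], free_mb: dict[str, int]) -> dict[str, list[int]]:
    """Hypothetical scale-out placement: put each block on the machine with the
    most free space. Returns machine name -> list of block indexes stored there."""
    placement = {node: [] for node in free_mb}
    remaining = dict(free_mb)
    for index, size in enumerate(blocks):
        node = max(remaining, key=remaining.get)
        if remaining[node] < size:
            raise RuntimeError("cluster is full: add another machine")
        remaining[node] -= size
        placement[node].append(index)
    return placement


# The article's example: a 500 MB file, but each machine has only 200 MB free.
blocks = split_into_blocks(500)
placement = place_blocks(blocks, {"pc1": 200, "pc2": 200, "pc3": 200, "pc4": 200})
print("block sizes (MB):", blocks)
print("placement:", placement)
```

With the article's numbers the file becomes blocks of 128, 128, 128 and 116 MB. Because each 200 MB machine has room for only one full 128 MB block, the sketch needs four machines; with three it raises the "add another machine" error, which is exactly the scale-out step.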

SOLUTION TO BIG DATA

HADOOP IS A FRAMEWORK WRITTEN IN JAVA.

1. HDFS (Hadoop Distributed File System) -> for distributed storage
2. MapReduce -> for parallel processing
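The MapReduce model can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation of the map, shuffle, and reduce steps on a tiny in-memory input; it is not Hadoop's Java API, and the function names are hypothetical. In a real cluster, Hadoop would run many mappers and reducers in parallel on the machines holding the HDFS blocks.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (the framework does this between map and reduce)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each line stands in for an input split that a separate mapper would process.
lines = [
    "big data needs big storage",
    "hadoop stores big data",
]

mapped = (pair for line in lines for pair in map_phase(line))
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(word, values) for word, values in grouped.items())
print(counts)
```

This prints {'big': 3, 'data': 2, 'needs': 1, 'storage': 1, 'hadoop': 1, 'stores': 1}. The map and reduce functions only ever see one line or one key at a time, which is what lets a framework spread them across many machines.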

HADOOP ARCHITECTURE
