Uploaded on Oct 11, 2018
Big Data covers a wide range of data analytics and data gathering strategies. It is the capability to capture, store and analyze data on a massive scale to support business decisions.
An Introduction to Big Data
BIG DATA
AN INTRODUCTION BY CREDIBLL
WWW.CREDIBLL.COM
What is Big Data?
Big Data covers a wide range of data analytics and data gathering
strategies. It is the capability to capture, store and analyze data
on a massive scale to support business decisions.
Data is itself a resource: it gives companies vital information from
which they can draw deep insights into human behaviour. Big Data also
provides a new view of traditional metrics such as sales and
marketing information.
Characteristics of Big Data
Volume: The quantity of generated and stored data.
Variety: The type and nature of data.
Velocity: The speed at which data is generated and processed.
Variability: How consistent (or inconsistent) the data sets are.
Veracity: The quality of the captured data.
Some key Facts about Big Data
Data is growing at lightning speed. Studies show that by the year 2020
around 2 MB of data will be created for every user every second.
Google receives around 40,000 queries per second, which is around 1.2
trillion searches per year.
Facebook receives around 35,000 likes per minute.
YouTube receives around 300 hours of video content every minute.
Google processes 20,000 TB of data every day.
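As a rough sanity check on the Google figures above, the quoted rates can be converted between per-second, per-day and per-year totals. This is only a back-of-the-envelope sketch: the helper names are illustrative, and it assumes a 365-day year and 1 TB = 1,000 GB.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap years are ignored


def per_year(rate_per_second):
    """Convert a per-second rate into a yearly total."""
    return rate_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR


def per_second(amount_per_day):
    """Convert a daily total into a per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


# 40,000 queries per second, expressed per year
google_queries_per_year = per_year(40_000)

# 20,000 TB per day, expressed in GB per second (assuming 1 TB = 1,000 GB)
google_gb_per_second = per_second(20_000 * 1_000)

print(f"{google_queries_per_year / 1e12:.2f} trillion queries per year")
print(f"{google_gb_per_second:.0f} GB processed per second")
```

Under these assumptions, 40,000 queries per second works out to roughly 1.26 trillion per year, which is consistent with the rounded figure of 1.2 trillion quoted above.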
Reasons Organizations Need to Move to Big Data
As technologies shift from analogue to digital, the need for
data storage has multiplied manifold.
With Big Data, data is consolidated in a single warehouse at a single
location, which minimizes risk and supports well-calculated decisions
at the right time.
Big Data technologies like NoSQL and MapReduce provide the
ability to retrieve information without changing the structure of
the database.
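To make the MapReduce idea mentioned above more concrete, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. It is an illustration only and is not how Hadoop is implemented: real MapReduce distributes these steps across many machines, and the function names and sample records below are made up for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over all records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: each string stands for one record of raw data
records = ["big data is big", "data is a resource"]
counts = map_reduce(records)
print(counts)
```

Note that the input records are plain text with no predefined schema; the structure (word and count) is imposed only when the data is read.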
Frameworks that Support Big Data
APACHE MAHOUT
It is a library that uses the MapReduce paradigm on top of
Hadoop.
It provides Java libraries for statistical and algebraic operations.
It helps in creating scalable, performance-oriented machine learning
applications on Hadoop.
It enables better user targeting based on predictions of audience
interests.
APACHE PIG
It provides a high-level language named Pig Latin, which resembles SQL
but has some differences.
Pig processes large data sets by executing MapReduce jobs.
It helps detect fraud through detailed transaction analysis.
It is used to analyze user engagement on the web.
APACHE SPARK
It is a general-purpose engine for large-scale data
processing.
It is quite fast, easy to use and has advanced options for
development.
Applications built on it typically run faster than conventional
MapReduce jobs, largely because it processes data in memory.
It is well suited to performing advanced
analytics in large-scale data processing.
APACHE HIVE
Hive provides a mechanism to structure organizational data and query
it through HiveQL.
It provides better management and querying for large data sets.
It reduces the time spent on semantic checks.
It supports ad-hoc style querying.
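The kind of ad-hoc, SQL-style aggregation that HiveQL is used for can be sketched as follows. Note the substitution: this example runs on Python's built-in SQLite as a stand-in, not on Hive, and the `sales` table and its rows are hypothetical. On Hive a similar query would run over data stored in Hadoop.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Hive warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0)],  # made-up rows
)

# Ad-hoc question: what are the total sales per region?
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
totals = conn.execute(query).fetchall()
print(totals)

conn.close()
```

The point of the sketch is the style of work: a question is written as a short declarative query, without writing a dedicated processing program for it.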
APACHE SOLR
Solr provides full-text search, real-time indexing and faceted search.
It offers dynamic clustering and rich document handling.
It is designed for scalability and fault tolerance.
It supports indexing and searching across multiple sites.
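The ideas of text search and faceted search listed above can be illustrated with a toy inverted index. This is a conceptual sketch in plain Python, not Solr's API or implementation; the documents, field names and helper functions are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical documents, each with a text field and a category field
docs = {
    1: {"text": "hadoop batch processing", "category": "framework"},
    2: {"text": "spark fast processing engine", "category": "framework"},
    3: {"text": "processing sales data", "category": "use-case"},
}

# Indexing: map every term to the set of documents that contain it
index = defaultdict(set)
for doc_id, doc in docs.items():
    for term in doc["text"].split():
        index[term].add(doc_id)


def search(term):
    """Text search: return the sorted ids of documents containing the term."""
    return sorted(index.get(term, set()))


def facet_counts(doc_ids, field):
    """Faceted search: count the matching documents per value of a field."""
    return Counter(docs[doc_id][field] for doc_id in doc_ids)


hits = search("processing")
facets = facet_counts(hits, "category")
print(hits, dict(facets))
```

Here the facet counts show how the hits for one search term break down by category, which is what lets a search interface offer filters alongside results.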
Thank You
Visit: www.credibll.com