
Maximize Big Data Benefits

The key is a comprehensive, systematic approach to storing big data effectively and efficiently and mining it for real-time and long-term insight.

VoicenData Bureau

By Virendra Gupta


“Big Data” is arguably the most discussed buzzword in technology today. To understand why it has become the focal point of almost every discussion about an organization’s IT framework, it helps to break the concept down into its main aspects and look at the role each plays.

First, it has become viable for enterprises to store very large volumes of data economically, thanks to a drastic reduction in storage costs. This is the result of IT vendors innovating, customizing and delivering products and solutions that cater to the evolving technology needs and investment capabilities of organizations of varying scale. In turn, large-scale use of technology in enterprises is creating more opportunities for the generation of big data.

The generation and storage of big data is, in turn, creating opportunities to discover new insights that can be of great value to enterprises. A comprehensive and systematic approach to storing big data effectively and efficiently, and to mining it for real-time and long-term insight, is becoming the focus of technology solution efforts.

Technology solutions in the big data space can broadly be divided into four areas: big data storage, big data processing, big data analytics/mining and big data visualization.


Big Data Storage

In big data storage, vendors are striving to design and deliver solutions that take the nature of big data into account and find better ways to store and retrieve it. Big data need not be in relational format. It can take the form of documents, tweets and short messages, JSON-style data, log files, call records and so on.

This is where solutions such as MongoDB, Cassandra, HBase and Bigtable come in. Both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) databases need to be redesigned to handle big data effectively and efficiently, and new solutions are emerging in each category. OLTP databases, which are optimized for individual transactions, become slow under large data volumes unless they are designed for them.

Here, emerging designs based on a single-threaded architecture and on hybrid (in-memory plus disk-based) storage may help to speed things up. For OLAP databases, where analytical queries dominate, databases that support unstructured and semi-structured data and designs built on columnar storage may be more suitable.
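The appeal of a columnar layout for analytical queries can be illustrated with a small Python sketch. It is only a toy model, not how any particular database is implemented, and the call-record fields (caller, duration_s, cell) are hypothetical. The point is that an aggregate over one field only needs to read that field's column, whereas a row layout walks through every full record.

```python
# Toy illustration of row-wise versus column-wise layout.
# The records and field names below are made up for the example.
rows = [
    {"caller": "A", "duration_s": 120, "cell": "c1"},
    {"caller": "B", "duration_s": 45, "cell": "c2"},
    {"caller": "A", "duration_s": 300, "cell": "c1"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_duration_row_store(records):
    """Row layout: every record is visited in full to pick out one field."""
    return sum(record["duration_s"] for record in records)


def total_duration_column_store(cols):
    """Column layout: only the single column being aggregated is read."""
    return sum(cols["duration_s"])


columns = to_columns(rows)
print(total_duration_row_store(rows))
print(total_duration_column_store(columns))
```

Both functions return the same total. The difference that matters at scale is how much data has to be read to produce it.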


Big Data Processing

In the big data processing space, Hadoop currently seems to be the most popular platform. Many new solutions are emerging, of which Apache Spark and Storm are rapidly gaining momentum; a look at Google Trends shows a clear surge of interest in both. Hadoop's MapReduce programming model is restrictive for applications that need to process the same data multiple times. Spark overcomes this limitation and makes this kind of processing much faster.
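Why reuse of data matters for iterative workloads can be shown with a plain Python sketch. It does not use the Spark or Hadoop APIs; the Dataset class, its loads counter and the iterative_mean function are hypothetical stand-ins chosen for illustration. An iterative computation that reloads its input on every pass pays the loading cost each time, while one that keeps the data in memory after the first pass pays it once.

```python
class Dataset:
    """Hypothetical stand-in for data that is expensive to load (for example, from disk)."""

    def __init__(self, values):
        self._values = list(values)
        self._cache = None
        self.loads = 0  # counts how many times the expensive load happened

    def read(self):
        """Simulate an expensive load on every call."""
        self.loads += 1
        return list(self._values)

    def read_cached(self):
        """Load once, then serve later passes from memory."""
        if self._cache is None:
            self._cache = self.read()
        return self._cache


def iterative_mean(read, iterations, step=0.5):
    """Each iteration makes a full pass over the data and moves the estimate toward the mean."""
    estimate = 0.0
    for _ in range(iterations):
        data = read()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate


uncached = Dataset([2.0, 4.0, 6.0])
cached = Dataset([2.0, 4.0, 6.0])

result_uncached = iterative_mean(uncached.read, iterations=10)
result_cached = iterative_mean(cached.read_cached, iterations=10)

print(result_uncached, uncached.loads)
print(result_cached, cached.loads)
```

The two runs produce the same estimate, but the uncached run performs one load per iteration while the cached run performs a single load.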

Spark modifies the MapReduce model to support reuse of data across processing steps, which in turn increases efficiency for data mining applications. Storm is well suited to real-time stream processing. It can be used to build applications that require a single pass over data arriving in real time, a style of processing its design supports well.
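The single-pass style of processing described above can be sketched in a few lines of Python. This is not Storm code and does not use Storm's API; event_stream and running_counts are hypothetical names, and a generator stands in for a live feed. Each event is seen exactly once, and the result is updated and emitted immediately rather than after the whole data set has been collected.

```python
from collections import Counter


def event_stream():
    """Hypothetical source of events arriving one at a time (stand-in for a live feed)."""
    for event in ["error", "ok", "ok", "error", "ok"]:
        yield event


def running_counts(stream):
    """Consume each event exactly once and emit its updated count straight away."""
    counts = Counter()
    for event in stream:
        counts[event] += 1
        yield event, counts[event]


updates = list(running_counts(event_stream()))
for event, count in updates:
    print(event, count)
```

Only the running counts are kept in memory, never the full history of events, which is what makes this approach workable for unbounded streams.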

Big Data Analytics/Mining

Big data analytics and mining is one area where few open source platforms provide higher-level functionality, with the possible exceptions of WEKA and MOA from the University of Waikato. WEKA is a data mining framework, whereas MOA (Massive Online Analysis) provides real-time mining of big data streams.


Both are written in Java. WEKA and MOA (source: http://moa.cs.waikato.ac.nz/) provide Java APIs and can be integrated and used with Hadoop or Storm. Analytics and mining is the layer where big data is analyzed to draw meaningful insights that help organizations make business decisions and improve profitability. Investment in big data storage and maintenance pays off only when this layer gives decision makers genuinely useful capabilities. Without appropriate intelligence, the financial and human resources invested in big data solutions will not bear fruit.

Big Data Visualization

Big data visualization addresses how the results of big data analysis, and the insights and patterns discovered, are presented to the decision maker. It is the exercise of presenting data that has been stored, processed and analyzed in a way that lets the customer access it, view it and actively incorporate it into their business function. A truly comprehensive visualization platform may be difficult to find here, although several visualization tools, such as Dygraphs, are available.

A visualization platform should give the decision maker a multi-layered, multi-level view, allow the variables under analysis to be changed dynamically, and render results quickly. Organizations can best leverage big data solutions if the visualization interface is intuitive, simple yet comprehensive, allowing users to interact with, alter and harness the data to maximize profitability.

Taken in sequence, big data storage, processing, analysis and visualization solutions should be designed and deployed so that they deliver business value to the enterprise, such as cost savings, better service, and optimization of infrastructure and network, and so justify the overall investment in these technologies and solutions.

(The author, Virendra Gupta, is Senior Vice President at the R&D Center of Huawei Technologies, India)
