- Big Data: A small introduction
I have been collecting information about big data for some time and
introducing the subject in some of my courses, but today, while
preparing a conference talk, I realized that it was a topic we had not
yet mentioned on this page, despite being one of the most current
trends in the industry.
By big data we mean exactly what the name indicates: the treatment and
analysis of huge data repositories, so disproportionately large that it
is impossible to handle them with conventional database and analytics
tools. The trend arises in an environment that will sound familiar: the
proliferation of web pages, image and video applications, social
networks, mobile devices, apps, sensors, the Internet of Things, etc.,
capable of generating, according to IBM, more than 2.5 quintillion
bytes a day, to the point that 90% of the world's data has been created
during the last two years. We are talking about an environment that is
relevant to many fields, from the analysis of natural phenomena such as
climate or seismographic data, to areas such as health, security or, of
course, business. And it is precisely in business, where companies
carry out their activity, that an interest is emerging that turns big
data into something like "the next buzzword", the word we will
certainly hear coming from everywhere: technology vendors, tool
vendors, consultants, etc. At a time when most managers have never sat
in front of a simple Google Analytics page and are greatly surprised
when they see what it is capable of doing, along comes a panorama of
tools designed to make sense of things that are immensely larger and
more complex. Be afraid, be very afraid.
What exactly is behind the buzzword? Basically, the evidence that
analytics tools fall short when it comes to turning the data generated
into information that is useful for business management. If your
company doesn't have a problem with data analytics, it is simply
because it is not where it has to be, or does not know how to obtain
information from its environment: as we add to the traditional issues
of operations and transactions others such as an increasingly intense
two-way interaction with customers and the web analytics activity
generated by social networks of all kinds, we find a scenario in which
not being present is a major disadvantage with respect to those who
are. Operating in the environment with the greatest capacity for data
generation in history simply entails adapting tools and processes:
unstructured, unconventional databases that can reach petabytes,
exabytes or zettabytes, and that require specific treatment because of
their storage, processing or visualization needs.
Big data was, for example, the star of the last Oracle OpenWorld: the
position the company adopted is to offer huge machines with massive
capacity, parallel multiprocessing, unlimited visual analysis,
heterogeneous data processing, etc. Developments such as Exadata and
acquisitions like Endeca support an offering based on thinking big,
which some have not hesitated to dispute: in contrast to this approach,
the reality is that some of the companies most focused on the subject,
such as Google, Yahoo! or Facebook, along with practically all
startups, do not use Oracle tools and opt instead for an approach based
on distributed systems, the cloud and open source. Open source projects
include Hadoop, an extremely popular framework in this field that
allows applications to work with huge data repositories and thousands
of nodes, originally created by Doug Cutting (who named it after his
son's toy elephant) and inspired by Google tools such as MapReduce and
the Google File System, and the NoSQL databases, non-relational systems
needed to host and process the enormous complexity of data of all kinds
being generated, and which in many cases do not follow the logic of the
ACID guarantees (atomicity, consistency, isolation, durability)
characteristic of conventional databases.
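
To give a rough idea of the MapReduce model that inspired Hadoop, here
is a minimal, single-machine sketch in plain Python that counts words.
It is not Hadoop's API, and the function names (map_phase, shuffle,
reduce_phase, word_count) are hypothetical, chosen only for
illustration. In a real cluster, the map and reduce steps would run in
parallel across many nodes and the framework would handle the grouping
step between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all the values emitted for a single key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big tools", "data everywhere"])
print(counts)
```

The point of the design is that each map call looks at only one
document and each reduce call at only one key, which is what makes it
possible to spread the work over many machines.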
In the future: a panorama of ever-increasing adoption, and many, many
questions. Implications for users and their privacy, or for companies
and the reliability or real potential of the results obtained: as MIT
Technology Review says, great responsibilities. For the moment, one
thing is certain about big data: get your ears ready to hear the term.