Thursday, April 6, 2017

Big Data: A small introduction

I have been collecting information about big data for some time and introducing some notions on the subject in a few of my courses. Today, however, while giving a talk, I realized that it was a topic we had not yet covered on this page, despite its being one of the most current trends in the industry.

By big data we mean exactly what the name suggests: the processing and analysis of huge data repositories, so disproportionately large that conventional database and analytics tools cannot handle them. The trend arises in a context that is familiar to all of us: the proliferation of web pages, image and video applications, social networks, mobile devices, apps, sensors, the Internet of Things and so on, which, according to IBM, generate more than 2.5 quintillion bytes a day, to the point that 90% of the world's data has been created in the last two years. This is relevant to a wide range of fields, from the analysis of natural phenomena such as climate or seismographic data to areas such as health, security and, of course, business. And it is precisely in the business world that interest is emerging that is turning big data into "the next buzzword", the term we will certainly hear from every direction: technology vendors, tool makers, consultants, and so on. At a time when most managers have never sat in front of a simple Google Analytics page, and are genuinely astonished when they see what it can do, a landscape of tools designed to handle things that are immensely larger and more complex can be intimidating. Very intimidating.

What exactly is behind the buzzword? Basically, the evidence that conventional analytical tools cannot keep up with the volume of data being generated and turn it into information useful for managing a business. If your company does not have a data analytics problem, it is simply because it is not where it should be, or because it does not know how to obtain information from its environment. As we add to traditional operations and transactions an increasingly intense two-way interaction with customers, along with the web analytics generated by social networks of all kinds, we face a scenario in which not being there is a major disadvantage compared with those who are. Operating in the environment with the greatest capacity for data generation in history simply requires adapting tools and processes. We are talking about unstructured, unconventional databases that can reach petabytes, exabytes or zettabytes, and that require specific treatment for their storage, processing and visualization needs.

Big data was, for example, the star of the latest Oracle OpenWorld, where the company's position was to offer huge machines with massive capacity, massively parallel processing, unlimited visual analysis, heterogeneous data processing, and so on. Developments such as Exadata and acquisitions such as Endeca support an offer based on thinking big, one that some have not hesitated to question. In contrast to this approach, some of the companies most focused on the subject, such as Google, Yahoo! and Facebook, as well as practically all startups, do not use Oracle tools and instead opt for an approach based on distributed computing, the cloud and open source. Open source projects include Hadoop, an extremely popular framework in this field that allows applications to work with huge data repositories across thousands of nodes. It was originally created by Doug Cutting (who named it after his son's toy elephant) and was inspired by Google tools such as MapReduce and the Google File System. Also open source are many NoSQL systems: non-relational databases needed to store and process the enormous variety of data being generated, which in many cases do not follow the ACID guarantees (atomicity, consistency, isolation, durability) characteristic of conventional databases.
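
To make the MapReduce idea a little more concrete, here is a minimal, single-process sketch in Python of the classic word-count example. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and each input document simply stands in for a data split that a real cluster would process on a separate node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Assumption: in a real cluster each document would be mapped on a different
    # node in parallel; here the three phases just run in sequence in one process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

The point of the split is that `map_phase` and `reduce_phase` only ever see one split or one key at a time, which is what lets a framework spread the work across many machines; the sketch keeps only that structure and leaves out distribution, fault tolerance and storage.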

Looking ahead: ever-growing adoption, and many, many questions. There are implications for users and their privacy, and for companies regarding the reliability and real potential of the results obtained; as MIT Technology Review puts it, big data brings great responsibilities. For the moment, one thing about big data is certain: get your ears ready to hear the term.
