Friday, April 14, 2017

Twitter and its innovation practices

Twitter is preparing the development of a new line of business based on exploiting its users' data, and I cannot help seeing in it a pattern of innovation that has become repetitive for the company: allow an ecosystem of companies to grow up around some feature of its operation, see which competitor stands out, acquire it, and expel the others.

It is a pattern that has repeated itself with practically every function Twitter has incorporated from the ideas of the community of users and companies it has been able to generate at each moment. For the development of the app on each platform, for example, Twitter allowed several to be developed in competition, finally acquiring one of them and turning it into the official client. With geolocation, with the inclusion of photographs, with link shortening, with the insertion of video... with many of the functions we see on Twitter today, the company has followed the same strategy.

In the present case, the pattern repeats itself: for years, the company allowed multiple companies dedicated to the analytics of its data to emerge. Then, in August of last year, it acquired Gnip, one of the competitors in that environment, for $134 million. After the operation, the company has simply been letting the agreements with its previous partners expire without renewing them: the last of these partners, DataSift, will see its access to Twitter data disappear on August 13. And it has finally announced that Gnip will be exclusively in charge of the exploitation of this data, in what constitutes proof of the reality of big data as a source of business.

So far, all normal. Or in the case of Twitter, business as usual. What this episode raises for me, however, is the sustainability of these practices in the medium and long term: innovation on Twitter depends largely on the progressive demands of its users and on the way the business community looks for ways to satisfy them through products built around the ecosystem the company generates. On that platform, the company performs a work of cherry-picking: it chooses the best option and acquires it. Some of these acquisitions have been absolutely crucial to the company's future. From the point of view of innovation, an impeccable practice that exploits its capacity to generate a platform, an ecosystem that draws the attention of third parties. For those who decide to develop activities on that platform, the obvious conclusion is what a risky sport it is: after creating and consolidating your activity, either you are the chosen one and become part of Twitter, which seems to be especially good at making acquisitions without decapitalizing the company and at retaining most of its workers, or you only have a limited time until Twitter decides to throw you out and exploit that business itself.

For entrepreneurs like Loïc Le Meur, who spent several years trying to develop products around Twitter with Seesmic, pivoting non-stop to adapt to that changing platform, there is nothing new here. But I doubt he is eager to try again in the same way. As a formula for innovation, the strategy has definitely paid off. But could Twitter, in the future, find that it has alienated its base of developers and entrepreneurs to the point that, faced with these prospects, companies willing to gamble on working within its ecosystem simply stop emerging?



This article is also available in English on my Medium page, "Twitter and its approach to innovation".

Thursday, April 13, 2017

From the data to the artificial intelligence

An article about Facebook's advances in image recognition, which allow it to build search systems based on the content that appears in images, leads me to reflect on the importance of data availability for the development of machine learning and artificial intelligence algorithms: it escapes no one that Facebook's ability to develop these systems for processing and recognizing patterns in images has to do with nothing more and nothing less than its access to tens of millions of images tagged and commented on by its users on the network itself and on Instagram.

When we think about the possibilities of artificial intelligence for our business, we have to start with the possibilities we have of obtaining data to analyze. Data which, moreover, are not all created equal: it is not just that the paper archive is not going to be of any use to us, but also that we need formats and tools open enough to allow processing, something that is not always easy when we talk about companies that, for a long time, processed their data in legacy systems that are difficult to integrate.

Coming from a stage in which many industries have been busy catching up on issues related to so-called big data makes that task somewhat easier: when you already have data scientists in place, the least you can expect is that they have carried out the cleaning and cataloguing of the data sources they intend to count on in their analytics and visualizations. But after big data comes the next step: artificial intelligence. In fact, progress in artificial intelligence is leading data scientists to realize that they need to evolve into that discipline or risk being considered obsolete professionals.

Data is the real gasoline that moves artificial intelligence. The availability of data is what allows us to develop the best algorithms and, above all, to improve them over time so they produce better results and adapt to changing conditions. The availability of more and more autonomous driving data as its fleet covers more and more kilometers is what allows Tesla to reduce the number of disengagements, episodes in which the driver is forced to take control, to current levels: between October and November of 2016 alone, four autonomous vehicles of the company travelled 885 km on California highways and experienced 182 of those moments, a starting point from which to keep improving with accumulated experience. Waymo, which has accumulated the data from all of Google's experiments in autonomous driving, managed over the course of 2016 to bring the number of these disengagements down from 0.8 per thousand miles to 0.2, an impressive progression fed, again, by the availability of data to process.
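
The underlying idea – more data tends to mean a better model – can be illustrated with a minimal sketch. The dataset below is synthetic and the model choice arbitrary; this is not any company's actual pipeline, just a way of seeing how held-out error typically falls as the training set grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for accumulated "driving data"
X, y = make_classification(n_samples=50_000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# train the same model on progressively larger slices of the data
for n in (500, 5_000, 40_000):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train[:n], y_train[:n])
    error = 1 - accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n:>6} examples -> held-out error rate {error:.3f}")
```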

The real mistake with artificial intelligence is to try to judge an algorithm by its results at the moment we obtain it, without taking into account the progress it can achieve as it gets more and better data. Writing a review of Amazon's Echo saying that it is little more than a slightly enlightened clock radio is an attitude that forgets the fundamental point: with eight million devices on the market, Amazon's possibilities for improving Echo's intelligence are virtually unlimited, which means it will understand us better and better, gradually reduce its errors, and become, without a doubt, a device we end up wondering how we ever lived without.

In which sport could referees based on artificial intelligence arrive first? In American football, of course, the classic example of a sport in which everything is quantified, analyzed and processed to the limit. Which insurance companies will be the first to access the savings and improvements of appraisals based on artificial intelligence? Those that have large amounts of data correctly stored and structured, ready to be processed and used to train the machine. Which academic institutions will be the first to take advantage of artificial intelligence in the educational process? Those that have complete records, properly structured and prepared for treatment. And I can assure you that something that seems so basic is not something all the institutions I know have.

Understanding the evolution from data to machine learning and on to artificial intelligence is increasingly important for any manager, and increasingly strategic for any company. It is what will decide which side of the new digital divide each company ends up on.

The Data revolution

It draws my attention that the unofficial but omnipresent theme of this Mobile World Congress 2017 is, almost anywhere you care to look, the data revolution. In a very short time, we have gone from developing business activities with a focus on making their approach as competitive as possible, to pursuing a goal that, although obviously closely related, is posed in a completely different way: to focus those activities on generating as much data as possible.

Trying to interpret the historical series of events like the MWC requires, on the one hand, having a few gray hairs to comb and, on the other, trying not to see everything through a single-color prism. For someone who works in infrastructure, it is possible that everything they see from entering the MWC through Pavilion 1 until leaving through Pavilion 8 has to do with cloud computing, datacenter integration or 5G. For those who work in security, everything they see will surely be related to that aspect. Capturing the common element, beyond "The Next Element" being the official theme of the event, requires a whole view, an abstraction taken a few steps back. And my persistent impression is that this omnipresent theme is the reinterpretation of all business activities through the prism of data: data converted into the real gasoline that moves the business.

The first important announcement of the MWC was, without a doubt, Telefónica's fourth platform, which reorients the whole company precisely toward that, the management of user data (very relevant in this sense is Chema Alonso's own post describing and trying to clarify the approach, setting conspiracy visions aside): the digital transformation of an operator is absolutely necessary to avoid its total commoditization, and that transformation requires exquisite attention to data, so "we will APIfy all our activity and build everything around it". Yes, the ultimate goal is for the services to be better and for you not to want to leave... but all of it thanks to the generation and exploitation of data. The whole business, built around the data you generate as a user, and presented in a way that, with the right rules and guarantees, leads you to understand it as something positive rather than sinister.

But in fact, it doesn't matter who you are: if you are Telefónica, perfect, the case seems clear. But if you are a car brand – four years ago there was only one, Ford, and this year there is already a crowd – the approach will be to reorient the entire user experience toward, again, the generation of data. What is the connected vehicle, as I mentioned yesterday in my talk at the SEAT stand? Simply a way to try to improve an automotive company's set of products and services thanks to the data generated by a vehicle that stores and transmits everything we do with it. From that pioneering Tesla that in 2013 decided to include total connectivity for its vehicles in their price through an agreement with AT&T (four years of practically unlimited connectivity with each vehicle sold) to the first big brand, Chevrolet, announcing the same yesterday for $20/month, everything fits perfectly: a car is no longer a vehicle to move from one point to another, but a huge computer on wheels turned into the maximum expression of mobile technology, and it therefore makes all the sense in the world to show it at an event like the MWC. Everything in the idea of the connected car points in the same direction: constant generation of data in order to turn the user experience into something infinitely more versatile, to go from selling a product to selling a complete solution that includes everything, based on the exploitation of the data the user generates with the vehicle. Eventually, that user will stop taking an active part in the driving, or the vehicle will stop being theirs and become a usage model with a login similar to a Chromebook (any car becomes "your car", with your radio presets, your seat position, your driving parameters or your usual destinations in the GPS as soon as you identify yourself when you get in), or we will see issues such as maintenance or insurance integrated into the price and experience of the vehicle, but all those possibilities will be fed, and will make sense, thanks to the constant generation of data.

The data revolution and digital transformation express themselves with absolute clarity at that moment when, after walking through the vastness of an MWC and returning to your hotel exhausted, you realize that everything, practically everything you have seen, had that common thread. If something is going to change in the next few years it will be that: the orientation of all business activity toward the generation and exploitation of data, toward its constant analysis with all kinds of techniques.

Wednesday, April 12, 2017

ICTs turn to the explosion of data and the post-PC era, in Cinco Días

Marimar Jiménez, of Cinco Días, called me to ask for some impressions about the subjects that, in my opinion, would occupy the technological agenda of this 2012 that is beginning, and published the piece yesterday, Friday, under the title "ICTs turn to the explosion of data and the post-PC era" (see in PDF). We talked about some of the topics I have recently discussed on the blog: big data and analytics, BYOD, socially oriented corporate websites and retail innovations.

Below is the relevant part of the reply I sent her:

Analytics: after a beginning marked by participation for many, we will enter the phase of popularization of analysis. For some, this will be called big data: investment in massive systems, distributed or not, for the complex analysis of data of all kinds in order to detect trends, plan actions or incorporate information into companies' CRMs. In the systems departments of the most advanced companies in this respect, Hadoop will become a common topic of conversation. For others, it will simply mean the incorporation of more or less simple tools for the analysis of web activity. But it will certainly mean an increase in activity in this area.
BYOD: companies will continue to consolidate the tendency to accept that employees choose their own devices, incorporating them into the company's information infrastructure whenever possible. The trend marks a whole new attitude toward corporate information architectures, and poses important challenges in terms of management, control, costs and security.
Socially oriented corporate websites: 2012 will begin to mark the obsolescence of the old static corporate web approach, and we will see a significant increase in company pages that seek interaction, constant communication and engagement. What until now was simply a trend among media companies or those with a strong technological orientation will begin to consolidate in many other industries.
Innovations in retail: the massive popularization of the smartphone will very possibly be accompanied by a strong increase in its value proposition for retail, through experiences with topics like NFC and related systems.

Sensorization and machine learning

The news of the day leaves little doubt: we are heading toward a future in which we will live completely surrounded by sensors of all kinds. The earphones in the photo are the latest development from SMS Audio, the company created by the rapper 50 Cent, based on Intel technology and designed to monitor physiological variables associated with physical exercise, a placement that might seem rather more natural for the practice of sport than wearing a bracelet, a chest band or a wristwatch.

But the earphones are only a tiny piece in the huge puzzle that lies behind many of the recent developments and movements in the technology sector: yesterday also brought the announcement of Samsung's acquisition of SmartThings, two hundred million dollars that position the Korean giant in the world of home automation (lighting, humidity, locks... everything) and make millionaires of the founders of a company that started on Kickstarter. Clearly, the tendency is for us to sensorize our bodies, our environment, our homes and our cars, even if it leads us to have no clear idea of who will be responsible when the information collected by these sensors triggers a bad decision.

Smart watches, bracelets for monitoring older people, new developments in batteries designed specifically for such devices... and a real flood of data produced every time we move, exercise or simply breathe. Data of all kinds, with possible uses both very imaginative and very dangerous, that will determine new business rules that are calling into question even international agreements.

What do we do with so much data generated by so many sensors? We are already saturated, and we are only analyzing around 1% of the data generated. The logical thing – almost the only thing – we can do is... put other machines to work analyzing them. Machine learning is emerging as the great frontier, as the only way to make a minimum of sense of such a constant collection of data. Training an algorithm with data from 133,000 patients from four Chicago hospitals between 2006 and 2011 produced diagnoses of emergency situations such as cardiovascular or respiratory problems four hours earlier than those made by physicians. A compilation of parameters from the patient's clinical history, combined with information about their age, family history and certain analytical results, once analyzed by an algorithm, is likely to lead to a drastic reduction in deaths related to this type of situation, in which providing medical assistance a few hours earlier can prove vital.
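
The general shape of that kind of system can be sketched in a few lines. Everything below is invented for illustration – the features, thresholds and synthetic data are not the variables of the Chicago study, just a toy version of training a classifier on vital signs and history to flag risk early.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000
# hypothetical columns: heart_rate, resp_rate, systolic_bp, age, prior_admissions
X = np.column_stack([
    rng.normal(80, 15, n),
    rng.normal(16, 4, n),
    rng.normal(120, 20, n),
    rng.uniform(20, 90, n),
    rng.poisson(1.0, n),
])
# synthetic label: elevated heart rate, respiratory rate and age raise the risk
risk = 0.03 * (X[:, 0] - 80) + 0.1 * (X[:, 1] - 16) + 0.02 * (X[:, 3] - 50)
y = (risk + rng.normal(0, 1, n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
# In practice the model would score incoming readings continuously and raise
# an alert when the predicted risk crosses a clinically chosen threshold.
```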

We are definitely experiencing a sensorization boom. But the next step, logical and even essential, is going to be the development of tools so that the immense amount of data generated by these sensors can be analyzed with a minimum of criterion. A very interesting scenario, with brutal potential, and one in which we will certainly see some important movements soon...



(This article is also available in English on my Medium page, "Sensorization and machine learning")

Tuesday, April 11, 2017

Do you have an information management strategy?

The progressive digitization of our environment has led to the generation of a huge amount of data about our habits, uses, customs and actions of all kinds. On the net, it is clear that everything we do – the pages we visit, the clicks that direct our browsing, our purchases, etc. – is collected in a log file and associated either with our identity, if we have carried out a login process, or with a system that preserves the session between different actions, such as cookies or digital fingerprinting.

But the constant generation of data is beginning to encompass much more than the time spent in front of the screen. More and more people are beginning to use regularly – or even constantly – devices that quantify variables ranging from location to multiple parameters usually associated with physical activity. The simple use of a mobile phone, together with the "most common lie on the net" implied by the simple click with which we claim to have read an app's terms of service (something we usually do because they tend to be written not in plain language but in a legalese that few can fluently parse), can allow the app's developer to monitor sensors that evaluate everything from our location to the ambient noise level, temperature, movement along different axes (three-dimensional accelerometers and gyroscope), humidity, light or proximity to the body.

Devices such as the Fitbit, Jawbone Up, Misfit Shine and similar make it possible to measure parameters such as the steps we take, the floors we climb, the activity we develop or even, connected with other accessories such as a scale, our weight and fat percentage. A small device such as the Scanadu Scout, held against our temples for ten seconds, can evaluate a variety of parameters such as body temperature, blood pressure, respiratory rate, blood oxygen level, pulse and stress level, and store all the readings in the corresponding application. Smartwatches, more and more common, can evaluate constants like body temperature, pulse, etc. At its last developer conference, Apple, rumored to be on the verge of putting on the market an iWatch with a special relationship with health, presented a platform that allows all the information generated by our devices and wearables of all kinds to be integrated, so that it can be managed by physicians and other providers of health and wellness-related services.
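
The integration idea is simple at its core: readings from heterogeneous devices flow into one per-user record that other services can consume. The sketch below is a toy illustration with invented device names and fields; it is not Apple's platform or any vendor's API.

```python
from collections import defaultdict
from statistics import mean

# hypothetical readings arriving from different devices
readings = [
    {"source": "wristband", "metric": "steps", "value": 9200},
    {"source": "scale", "metric": "weight_kg", "value": 71.4},
    {"source": "scale", "metric": "body_fat_pct", "value": 18.2},
    {"source": "smartwatch", "metric": "resting_heart_rate", "value": 58},
    {"source": "smartwatch", "metric": "resting_heart_rate", "value": 61},
]

# group all readings by metric, regardless of which device produced them
record = defaultdict(list)
for r in readings:
    record[r["metric"]].append(r["value"])

# one consolidated view a physician or wellness service could consume
summary = {metric: mean(values) for metric, values in record.items()}
print(summary)
```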

The smart home is another huge field of data generation: being able to control parameters like temperature, security, lighting or the contents of our pantry using devices such as Nest, Canary, Philips Hue, Amazon Dash and many others has a clear counterpart: allowing all this data to be managed by service providers in ways that, on many occasions, we never even imagined.

To develop their value proposition, many companies are beginning to consider exploiting the data their users generate. The idea may seem interesting and tempting: getting to know your customer can generate a sustainable competitive advantage, since it allows you to offer your product or service with a degree of adaptation that the customer values, that comes to generate a positive bias in their choice of product or service, and that is difficult to match for a competitor who knows your customer less well. And new tools that dramatically reduce the entry barriers to sophisticated analytics and machine learning techniques are fueling the trend.

But the difference between the companies that carry out this kind of exploitation well and those that do it badly can become very noticeable. Hence, the development of a data management strategy is fundamental: it is not a matter of accumulating useless data, let alone of alienating customers by making them think that we are the private equivalent – or even the foolish cousin – of an NSA that watches their every movement.

What data do we really need? What is the minimum set of data we must generate, what must we obtain explicitly – by asking the customer – and what implicitly, deriving it from the use the customer makes of our products or services? What do we want this data for? Do we really intend to exploit it in order to offer the customer a better value proposition, or rather to harass and pursue them more efficiently, or to sell access to the data to third parties whose intentions we are not clear about? What treatment do we intend to give this data? Are we going to be obscurantist and hide from customers what we know about them, how we use it or who we share it with?

Monday, April 10, 2017

Business, data and transparency

My Expansión column this week is titled "Business, data and transparency" (pdf), and it aims to convey an idea that for me is fundamental: it is not about how much data a company collects about us, but about variables such as how it does so, the level of control it offers the user over that process, the clarity of the reasons for collecting the data, the transparency of the analyses carried out, and the final result the user or client perceives after the process. It is not so much about collecting data as about doing it well and being respectful.

The paradoxes are clear: I can think of companies that, although they know much more about me than I could ever know about myself, only generate as a side effect that the advertising I receive is better adapted to my interests, something I perceive in principle as positive. They also let me decide at every moment what data I want to keep and what I want to remove, and offer me tools to do it myself in three mouse clicks. And there are other companies to which I once gave some data, and which, from then on and for having done so, have five hundred other companies calling me at dinnertime to annoy me with products and services that do not interest me. Managing customer data and information goes far beyond ARCO rights and legal norms, and it increasingly differentiates the companies of the last century from those of this one.

Below, the full text of the column:
 
Business, data and transparency

What do companies know about us? Every day we produce more information, and companies try to capture and analyze it. Tastes, feelings, tendencies, obtained through the information we publish on social networks. We are "signed up" on so many sites that knowing the details of everything big data is capable of analyzing about us at any given moment is becoming more and more complex.

The answer is not to stop using tools that offer very important value propositions for our contact with people or our access to information. On the contrary: what we as users must demand is clarity and transparency.

That a company collects data about us can be reasonable, if it is done right. And what does doing it right mean in this context? Simply that, as a user, I can know at every moment what data the company holds about me, what it is doing with it, and what results it intends to obtain.

When we think about it, the results are surprising: it turns out that the amount of data is not what worries us most, but the use that is made of it. A company can get to know us better than we know ourselves, but what we really need to worry about is what consequences that knowledge has. If it is going to be used to pursue us more, to overwhelm us, or to sell data to third parties while losing control over its use, we will — reasonably — avoid it. If, on the contrary, the result of knowing us better is that it offers better products, in better conditions, or better adapted to our tastes, it is more likely that we will agree.

It is not the data: it is the clear and unmistakable will to let us understand what happens to it and what it is used for. The keyword? Transparency.
 

Sunday, April 9, 2017

In Documentos TV, on RTVE: "Eye with your data"

Yesterday, La 2 of RTVE aired a program titled "Eye with your data", for whose recording I was in contact with Marisol Soto last September. After a long telephone conversation of more than an hour on September 27th to focus the issue, we recorded in my office on October 17th. The complete program is available on the corresponding RTVE page.

Also taking part were Samuel Parra, Jorge Campanillas, Javier Sempere, Chema Alonso, Ofelia Tejerina, Rafael García Gozalo, Mario Costeja, Joaquín Muñoz, Marta Bobo and Jorge Flores.

We spoke about the nature of personal data, its use and the conditions of that use, about the proactive management of one's own image and also, abundantly, about a controversial issue that I did not see in the final cut: that supposed "right to be forgotten", for me completely nonexistent and tautologically absurd. Oblivion is not, has never been and should never be a right, because it is a physiological process that takes place in people's brains. Nothing and no one can force another person to forget something, and if anyone has a problem with some published information, they should go to the source that published it, not to the search engine that indexes it and whose job is precisely that: to index. I have spoken about this topic on other occasions (May 2011, February 2012 and June 2013) in a clear and consistent way, and I still think exactly the same. If Mario Costeja – or anyone in the cases mentioned in the program – has a problem with a news item published in La Vanguardia or in whatever medium, they should go to La Vanguardia or to the corresponding medium, which in my opinion will be able to answer that the news actually happened and that it can therefore report on it if it considers it relevant. In the same way that it would never have occurred to anyone in their right mind to ask all the newspapers in the world to tear out pages whenever someone requested it because a news item had turned out to be erroneous or inaccurate, you cannot now pretend to delete something that is on the net; you can only request, if appropriate, the publication of a rectification. And in any case, go to the source, not to a search engine whose mission should simply be to search within the whole set of pages it is allowed to access.

From my point of view, this is another of those cases in which everything the law said on the matter was clear before the popularization of the net and of search engines, remains valid after them, and the only thing to do is to keep applying it as it was applied before, responding to exactly the same logic.

With regard to issues such as privacy or children's use of the net, my opinion is equally clear: I think it is important to promote education about the terms and conditions of use of the applications and tools we use, but I always tend to defend – especially in the case of children, even in schools and with parents' associations – that the main danger is to stay off the net. The alarmist vision of the dangers of the net seems to me wrong and dangerous, and that is, if anything, my objection to the program: a tone that at many moments becomes disturbing and tends to highlight the negative and the dangerous, almost discouraging use. Everything has its dangers, including the street, and we do not lock ourselves up at home because of it. Information can be used badly, yes. The net can be used to spy on us and, in the middle of the post-Snowden era, we also know that it has been. But this should lead us to defend our rights and demand that governments stop, not encourage us to stay off the net (all the more so knowing that much of that espionage has also taken place outside the net, in media as old as the telephone). Education, the more the better. Complete and efficient communication, to understand the possible problems and be able to react if they happen, too. Defence of our rights, all of it. Fear, as little as possible.

Saturday, April 8, 2017

BigML: Discoveries, reactions and communication

BigML is without a doubt one of the most promising startups among those in which I have some level of involvement (I am on its strategic advisory board). Ideas like "machine learning for everyone", modeling and big data are undoubtedly powerful in an era in which data of all kinds proliferate exponentially (I wrote about BigML earlier in this other post), and every day the impression is that of developing a very good tool and simply waiting – proactively waiting, of course – for the rest of the world to discover it.

A model built from statistics published by Kickstarter about the projects presented on the platform makes it possible to create a success/failure model for crowdfunding, which spontaneously catches the attention of Gigaom in "How to succeed on Kickstarter: find 35 people and ask for less than $9,000?" and gives rise to a wave of visits and interest. The same outlet and the same person had, in a previous article, echoed the existence of the company and mentioned it in a more generic piece about machine learning titled "Your data has a secret, but you — yes, you — can make it talk". Meanwhile, other completely different but equally data-intensive fields, such as web analytics or finance and markets, are beginning to discover the potential of this type of tool, as in this SeekingAlpha article entitled "Dividends: still the best all-season investment strategy".
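
The kind of model described above – a tree that splits projects into likely successes and failures based on a handful of variables – can be sketched locally in a few lines. The columns, rows and thresholds below are invented stand-ins for Kickstarter statistics, and this is plain scikit-learn, not BigML's actual API or dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 2_000
goal_usd = rng.lognormal(mean=9, sigma=1, size=n)         # funding goal
backers_needed = goal_usd / rng.uniform(20, 120, size=n)  # goal / average pledge
duration_days = rng.integers(15, 60, size=n)

# synthetic rule of thumb echoing the headline: modest goals needing
# relatively few backers succeed far more often
success = ((goal_usd < 9_000) & (backers_needed < 150)).astype(int)

X = np.column_stack([goal_usd, backers_needed, duration_days])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, success)
print(export_text(tree, feature_names=["goal_usd", "backers_needed",
                                       "duration_days"]))
```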

These moments in a startup are delicious. Total concentration on constantly improving the product, and sowing, constantly sowing, with tools such as the blog, the newsletter or social networks. Anything goes when it comes to putting the information in motion, placing it in the path of potential interested parties. You never know where an interesting impact is going to come from, but you know that of all those who receive a message, a certain percentage, however small, will see applications to their field and will try the product, contact you, write about it or, at least, develop some curiosity about the subject. It is about creating content on diverse topics in order to reach a broader potential demand: stock market indices, music, flight delays, sports... and for each entry, its corresponding diffusion and potential multiplier effect through social networks like Twitter or Facebook. A textbook communication strategy, absolutely necessary for a complex product that is not sold the way a consumer product or one of simple understanding would be sold, but which has already passed two thousand registered users. No, a product like this does not sell itself. That is what research has going for it: with each model and each data set analyzed, you have the potential to contribute knowledge, but also to go viral: probability of injury in a car accident, incidents with firearms in schools, prediction of the number of extramarital affairs, the amount of tips... do you have data? Here are answers and solutions.

For an academic like me, following BigML's journey is a rigorously real case study in the communication of a complex product, one of those from which you really learn. We will continue to report.

Friday, April 7, 2017

Facebook and Datalogix: Connecting offline and online data

Facebook announces an agreement with Datalogix, a leader in the integration of marketing databases with digital media, with the stated aim of measuring the much-discussed effectiveness of its advertising, and with it sets off all the alarms about its users' privacy. While some large retail chains such as CVS categorically state that they do not share personally identifiable information with Facebook or with Datalogix, the Electronic Frontier Foundation has published a detailed article describing the procedures Facebook intends to carry out with the information, and the process for opting out of the system.

Basically, what Facebook wants to do is cross-reference Datalogix's gigantic database, which contains data from hundreds of loyalty programs used by North American consumers, to obtain samples of customers that it can split according to whether or not they have been exposed to advertising campaigns, and then measure the number of consumers in each group who bought a particular product. The data encryption structure, the fact that the data are handled in aggregate by groups, and the agreement Facebook has signed with Datalogix to protect users' privacy prevent a purchase behavior from being assigned to a specific user of the social network.
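
The general technique behind this kind of matching can be sketched simply: both parties hash a shared identifier, match only on the hashes, and report results exclusively at the group level. The sketch below illustrates that idea with made-up identifiers; it is not the actual protocol used by Facebook and Datalogix.

```python
import hashlib

def hashed(email: str) -> str:
    # both sides apply the same one-way hash to a shared identifier
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# advertiser side: which hashed users saw the campaign
exposed = {hashed("ana@example.com"), hashed("bo@example.com")}

# loyalty-program side: hashed identifier -> bought the product or not
purchases = {hashed("ana@example.com"): True,
             hashed("bo@example.com"): False,
             hashed("cy@example.com"): True,
             hashed("di@example.com"): False}

def purchase_rate(group):
    return sum(purchases[u] for u in group) / len(group)

exposed_group = [u for u in purchases if u in exposed]
control_group = [u for u in purchases if u not in exposed]
print("exposed purchase rate:", purchase_rate(exposed_group))
print("control purchase rate:", purchase_rate(control_group))
# Only these aggregate rates leave the matching environment; no individual
# purchase is tied back to a specific social-network account.
```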

The guarantees, however contractual or technical, do not serve to reassure many users who see in this connection between their behavior on and off the net an invasion of their privacy. Articles like the one The New York Times published last February, "How Companies Learn Your Secrets", nine pages detailing the procedures companies use and how they have served, in many cases, to give them more information about their customers than the customers have about themselves, do not exactly contribute to creating a climate of confidence in this respect. In fact, it has been well known for many years that companies able to gather information about purchasing behavior across broad categories of products, like Tesco in the UK, have more information than anyone about their consumers thanks to their loyalty cards, and use it with relative freedom when it comes to exchanging access to it with brands and manufacturers.

In fact, it is well known that companies that use this information well are able to return value to all sides of the equation: manufacturers get a better understanding of their customers without access to their individual data; customers receive offers and coupons better adapted to their consumption that allow them to pay less for their purchases; and the company captures a portion of the value generated by charging it to manufacturers who would otherwise be forced to distribute their coupons at random. The loyalty card system has been working well for a long time, and most customers do not seem particularly concerned that the establishment where they shop can build a very detailed picture of their personal and family consumption patterns. Either they do not worry, or the price paid in terms of loss of privacy seems appropriate to them. But seeing Facebook, with the vast repository of information it treasures about us (because we have been handing it over over time), come into contact with that system seems a more delicate step.

In particular, I do not think we will witness any kind of avalanche of users opting out on the corresponding page. Nor do I think that the agreement, or the procedure Facebook has put in place, amounts to a kind of carte blanche for the use and eventual abuse of any data. In fact, my opinion is that the complete connection of offline and online data generation systems is simply a matter of time. For better or for worse, and without wanting to go into the thorny discussion of whether that is good or bad, I am convinced that we are heading toward a world in which this connection will be complete and in real time. A world that will greatly characterize the business environment in which we are going to live and which, in my capacity as a business school professor, I will have to know exhaustively. We'll have to get ready.

Thursday, April 6, 2017

Big Data: A small introduction

I have been collecting information about big data for some time and introducing notions on the subject in some of my courses, but today, while giving a talk, I realized that it was a topic we had not yet mentioned on this page, despite being one of the most current trends in the industry.

By big data we mean exactly what the name indicates: the treatment and analysis of huge data repositories, so disproportionately large that it is impossible to treat them with conventional database and analytics tools. The trend arises in an environment that should not sound strange to us: the proliferation of web pages, image and video applications, social networks, mobile devices, apps, sensors, the Internet of things, etc., capable of generating, according to IBM, more than 2.5 quintillion bytes a day, to the point that 90% of the world's data has been created during the last two years. We are talking about an environment that is relevant to many areas, from the analysis of natural phenomena such as climate or seismographic data, to fields such as health, security or, of course, business. And it is precisely in this last area, where companies develop their activity, that an interest is emerging that turns big data into something like "the next buzzword", the term we will certainly hear coming from everywhere: technology vendors, tool makers, consultants, etc. At a time when most managers have never sat down in front of a simple Google Analytics page and are mightily surprised when they see what it is capable of doing, a panorama of tools designed to do things immensely larger and more complex can make sense. Be afraid, be very afraid.

What exactly is behind the buzzword? Basically, the evidence that analytical tools can no longer keep up with turning the data being generated into information useful for business management. If your company does not have a problem with data analytics, it is simply because it is not where it should be or does not know how to obtain information about its environment: as the traditional world of operations and transactions is joined by increasingly intense two-way interaction with customers and by the activity and web analytics generated by social networks of all kinds, we find a scenario in which not being there is a major disadvantage with respect to those who are. Operating in the environment with the greatest data-generating capacity in history simply requires adapting tools and processes: unstructured, unconventional databases that can reach petabytes, exabytes or zettabytes, and that require specific treatment for their storage, processing or visualization needs.

Big data was, for example, the star of the last Oracle OpenWorld: the position adopted there is to offer huge machines with massive capacities, massively parallel processing, unlimited visual analysis, heterogeneous data processing, etc. Developments such as Exadata and acquisitions like Endeca support an offer based on thinking big, which some have not hesitated to question: against this approach stands the reality that some of the companies most focused on the subject, like Google, Yahoo! or Facebook, and practically all startups, do not use Oracle tools and opt instead for an approach based on distribution, the cloud and open source. Open source projects include Hadoop, an extremely popular framework in this field that allows applications to work with huge data repositories and thousands of nodes, originally created by Doug Cutting (who gave it the name of his son's toy elephant) and inspired by Google tools like MapReduce and the Google File System, and NoSQL, non-relational database systems needed to host and process the enormous complexity of data of all kinds being generated, which in many cases do not follow the logic of the ACID guarantees (atomicity, consistency, isolation, durability) characteristic of conventional databases.

In the future: an ever-broader panorama of adoption, and many, many questions. Implications for users and their privacy, and for companies and the reliability or real potential of the results obtained: as the MIT Technology Review says, great responsibilities. For the moment, one thing about big data is certain: prepare your ears to hear the term.

Big data and the Future of Medicine (2)

Following up on an entry written last month, here is an example of one of the many companies working in the area of big data applied to medicine, which, within the strong trend toward digital health, offers an interesting video explaining the possibilities of treating patient data in a world in which the profusion of sensors multiplies the information that can be fed into the system, while enabling an infinitely more sophisticated statistical treatment.

 
GNS Healthcare works with the mobile applications many of us use to keep track of our health habits – exercise, weight, body fat percentage, food, water, activity, etc. – and with digitized clinical records (electronic medical records or electronic clinical histories, more and more common in certain countries) to generate predictive models by means of reverse engineering and simulation techniques, in order to anticipate which of the possible treatments for a given symptomatology will have the most appropriate effect on a patient about whom we have a lot of information – while managing the information of many patients helps us build valid models applicable to those patients about whom we do not have such information.

 
The idea is to move toward personalized medicine through models into which we feed all the available data about ourselves, from our personal genomics to everything our ecosystem of sensors continuously stores about us. Instead of being short-sighted and simply watching who is wearing a device to monitor their health, or imagining sinister uses of something like this to raise the price of our health insurance policy, we have to look further and try to design a future in which millions of people can voluntarily contribute their data to such systems, and which makes possible advances in the field of medicine that today we cannot even imagine.

Wednesday, April 5, 2017

"Sponge companies", my expanding column

"Sponge companies", my expanding column
  • "Sponge companies", my expanding column

My Expansión column this week is titled "Sponge companies" (see in PDF), and it tries to create a certain awareness among companies of the possibilities of an environment of information hyperabundance. Big data, the social web, smartphones and mobile devices of all kinds, conversations as generators of a constant flow of information that the company has to learn to distill and analyze in order to adapt better and better to customers who, in addition, tend to raise their level of demand when they feel listened to. The company of the future is the one able to soak up data and turn it into useful information.

Analytics, mobiles, identities ... and precrime

Put a series of recent readings into a shaker and mix, and you can get the most curious results. Or at least intriguing ones, of the kind that give you something to think about for a good while.

Let's join the pieces: on the one hand, the mobile phone has become a fundamental piece without which we do not leave home, loaded with sensors capable of transmitting our position at all times, and soon to be the complete manager of our identity. A fundamental device that already has its own associated crime, about which specific strategies are beginning to be discussed. Soon, your terminal will be the only thing you need as a means of identification, as a means of payment, or to get into and start your car, which will automatically take you wherever you want to go. Hundreds of thousands of features and applications to manage everything from your calendar, your mail or your reminders, to the evolution of your menstrual period.

To this scenario, certainly futuristic but as we have seen ma non troppo, add the Minority Report component: an article by France Presse in The Raw Story states that police forces in the United States and some other countries are already adopting software tools based on predictive analysis of behavior patterns, with the ultimate goal of preventing crimes before they take place. No, it is not science fiction: there are programs like CRUSH (Criminal Reduction Utilizing Statistical History) that are already in use and are considered responsible for strong reductions in crime rates in cities like Memphis, and private companies like PredPol that collaborate with the LAPD. Against the intuition and sixth sense of human police officers, machines capable of analyzing more than 200 million pages of structured and unstructured content, or of calculating 200 million chess positions per second. A good time to watch the 2002 movie again.

On the other hand, another article, this one in the German press and also quoted in Slashdot and in ActivePolitic, claims that after tragic episodes such as those in Norway or Aurora (CO), not having a Facebook account, or showing a lack of activity on it, can become an element of behavior that reaches the point of being considered suspicious. No, not being on Facebook does not make you a murderer. But we're getting close.

The rest of the story, if you like, you can put together yourself. But don't say you thought it was science fiction. Or that you weren't warned.

Tuesday, April 4, 2017

Hadoop: The omnipresent Elephant

Hadoop is a name you will see in many places in the coming years, on the back of the big data phenomenon. Its logo is that yellow elephant, the favorite toy of the son of its original creator, Doug Cutting, when he started its development.

Hadoop is a development infrastructure created as open source under the Apache license, a project built and used by a wide variety of programmers using Java. Doug Cutting started its development while he was at Yahoo!, drawing on technologies published by Google, specifically MapReduce and the Google File System (GFS), in order to use it as the basis for a distributed search engine. After dedicating himself full-time to its development and turning Yahoo! into the project's main contributor, Cutting left Yahoo! to join Cloudera, a company whose product offering revolves around Hadoop.

What is the importance of Hadoop? Basically, it makes it possible to run very intensive, massive computation tasks by dividing them into small pieces and distributing them across as large a set of machines as you want: the analysis of petabytes of data, in distributed environments made up of many simple machines. A very reasonable value proposition for the hyperconnected times we live in, used extensively by companies like Google, Yahoo!, Tuenti, Twitter, eBay or Facebook. But they are not the only ones: the use of Hadoop is becoming popular at high speed in all kinds of companies.
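
The "divide into small pieces" idea is the MapReduce model that Hadoop implements. Below is the canonical word-count example as a minimal sketch: the map, shuffle and reduce phases run locally in one script for clarity, whereas on a Hadoop cluster (for instance via Hadoop Streaming) the map and reduce steps would be separate programs executed in parallel across many nodes.

```python
from collections import defaultdict

# toy input; on a cluster this would be terabytes split across many machines
documents = [
    "big data needs big clusters",
    "hadoop splits big jobs into small pieces",
]

# map: each document independently emits (word, 1) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# shuffle: group the pairs by key, as the framework does between phases
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# reduce: sum the counts for each word
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)
```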

It is also an interesting case because its free license is getting it adopted by a large number of competitors, including the "usual suspects" of large systems (Oracle, Dell, NetApp, EMC, etc.), which is accelerating both its dissemination and its capabilities. If you are in the world of corporate technology, or preparing your professional development within it, Hadoop is one of the areas that, depending on your potential, you should definitely consider: sooner or later you will meet the elephant.

Update: as we were just saying... according to Slashdot, Hadoop is shaping up as a very promising job market.

Big data and the future of medicine

A recommendable article in Gigaom, "Better medicine, brought to you by big data", touches on a topic we have talked about before: the, for me, fascinating intersection between the analytical possibilities that have emerged from the massive proliferation of data and the health sciences.

The article cites, without going into depth, eight areas where the adoption of technologies related more or less directly to the idea of big data could affect medical practice: genomics, the possibilities of business intelligence in the hands of doctors, semantic search across enormous and distributed case files, the use of Hadoop in the analysis of biological data, the use of supercomputers and artificial intelligence software such as Watson to access information and answer questions in natural language, the use of predictive diagnostic models built through data mining, the idea of creating the professional profile of the data scientist resident in hospitals, and the application of crowdsourcing to scientific research through social networks and the voluntary sharing of data.

We are at a moment of huge data profusion. The rate at which these data are generated is staggering, and many of them have potential uses and consequences for medical research. We all carry a mobile phone that is increasingly a genuine set of sensors capable of providing all kinds of data about us, our lifestyle, our movements, the noise level around us, the ambient temperature, our sleep rhythms, etc. No, that's not your phone, it's your tracker, and the choice between obsessing over its potential in a world dominated by would-be Big Brothers and using that data to improve health can be an interesting one. After all, we already see episodes in which people decide, I don't know how "freely", to share information about their credit card purchases or their car's black box with their health or car insurance companies in order to save part of the cost of the policy (and, as a result, pay in cash if they go out for a burger with extra cholesterol :-) The possibilities seem much nobler if one decides to donate that data, suitably anonymized, to a biomedical research team.

But it is not just a question of motive. A progressively larger number of people are opting for some variety of the "quantified self", which automatically turns us into generators of information about our activity levels, distance traveled, floors climbed, approximate calories burned and ingested, water consumption, weight, fat percentage, body mass index, or even variables such as heart rate, blood pressure, blood glucose and others if we have the appropriate devices. For a few weeks now, every morning I have stepped onto a scale that transmits my weight and fat percentage via WiFi to an application that processes and stores them. Would it be a problem to share that information for research purposes if confidentiality were ensured? In my case, the answer is clearly no. I would do it immediately if it could contribute to the progress of science.

And what about the permanent advances in the field of personal genomics? The first time we mentioned 23andMe on this page, in March 2008, its sequencing test from a saliva sample, able to determine the geographical origin of your genetic markers and your propensity to genetically predetermined diseases, cost a thousand dollars. Now it costs three hundred. Thousands of people around the world are opting for such tests through companies like Appistry, Bina Technologies, DNAnexus or NextBio, giving rise to a future in which the availability of genetic information will undoubtedly affect our development as individuals and as communities.

Decidedly, a different world. And a signal for those who want to look for opportunities at the interface between medicine, data analytics and product/service design in this area. Much of what many people today consider science fiction stopped being so a long time ago. Its application to an area considered part of the common good is only a matter of time and of providing appropriate guarantees. And it is definitely not going to take long.

Monday, April 3, 2017

Big Data's ethical implications

The more you get into reading about the development of the great trend that big data represents, the more force the considerations about the ethical implications of an absolutely unstoppable trend acquire.
At this point, it is clear that big data is already a clearly consolidating trend that will lead the technological landscape of the coming years. We are talking about something that will clearly differentiate companies capable of extracting information from the ecosystem from those unable to do so, about a set of high-powered analytical applications acting on information that, increasingly, users deposit in the public space or hand over directly to applications of all kinds.

Setting aside paranoid visions, but trying to anticipate the use that the different actors involved can make of big data as a technology, we should start by considering that a detailed study of the data a user generates in all possible formats, structured or not, together with a similar or greater amount of so-called shadow data (accesses, searches and non-explicit data of all types that are also stored), can offer a vision of the person that far exceeds the knowledge that person has of themselves. You think you know yourself? It has nothing to do with the conclusions that could be drawn from an analysis of all your activity, conscious and unconscious, on a network that retains everything. And even less with those that can be derived from a comparative analysis with thousands of other people. Tendencies, behaviors, influences, sequences... originating in our own behavior and in the growing flow of information we decide to share. Remember: we are talking about everything from the places you frequent (geolocation) to the analysis of your comments, passing through any data you generate in a specific application that a company can access: a level of analytics hard to imagine.

What information do we provide completely voluntarily and without any pressure? Millions of people voluntarily make public their geolocation at certain times (Foursquare, Twitter, Instagram, etc.), their mood, their interests, their opinions, what they are watching on television (Miso, GetGlue...), what catches their attention, the people with whom they converse or exchange opinions, their map of influences and relationships... a whole world of information with diverse structures or no structure at all, in multiple formats, interlaced by relationships of every kind, and conditioned solely by a "who has access to what". A public datum is a public fact: provided voluntarily in exchange for a particular value proposition (communicating, sharing, learning, socializing in some way) and linked to a profile. Semantizable, often with the capacity to associate a positive, negative or neutral sentiment to a word, a brand, an idea. We are only scratching the surface of all this.
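
To make the "semantizable" idea concrete, here is a minimal sketch of lexicon-based sentiment tagging over a handful of mentions. The word lists, mentions and scoring rule are illustrative assumptions for the example, not any particular vendor's method:

    # Minimal lexicon-based sentiment tagging (illustrative only).
    POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
    NEGATIVE = {"hate", "terrible", "awful", "broken", "disappointed"}

    def sentiment(text: str) -> str:
        """Label a mention as positive, negative or neutral by counting lexicon hits."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    mentions = [
        "I love the new phone, great battery",
        "Terrible support, I am disappointed",
        "Just bought the standard model",
    ]
    for m in mentions:
        print(sentiment(m), "->", m)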

Is it possible to limit the use that companies make of these data? My impression is that it is not. That we are slowly evolving toward a world in which, inevitably, companies come to know what you buy, when you buy it, how often, when you consume, with what attitude, with whom... and not because they are "spying" on you, but because you tell them yourself. We are already experiencing two-way communication processes that, because of their novelty, we still find surprising: companies that constantly monitor the conversation and react to mentions of products, brands and opinions. Where does a much more exhaustive use of the data, analyzed as a whole, lead? To our information, segmented by the possibilities each company has to access it: if in the CRM era we learned to measure businesses in terms of "information intensity" and "level of permission" to use that information, now these kinds of variables take on their full meaning. The best competitor in a given industry will be the one able to extract a greater intensity of information and treat it respectfully, so that it does not translate into a sense of "invasion" or loss of privacy for its users. All this, moreover, in a context in which the very idea of privacy is undergoing deep revision and generational evolution.

BigML, modeling and artificial intelligence

I have been following BigML, the latest creation of Francisco Martín, with whom I had quite a bit of contact during his time at Strands, for some time. In addition to Francisco, the company's team includes a few legendary names from the hacking and machine learning worlds. Last Tuesday we talked during a small presentation of the company, and I have now been invited to join its Strategic Advisory Committee. The company is based in Corvallis, Oregon, so most meetings are held through Google+ Hangouts.

BigML is a cloud tool, still in closed beta by invitation, for a topic I love: data modeling and the development of machine learning models from that data. Very much in the vein of the tools being proposed by companies like Google, but with a completely transparent and simple data policy: your data is yours, only yours, and nobody else's. The idea is that users can upload data series to a secure environment and work on their analysis to develop predictive models on top of them. A topic that connects with my interest in the trend that has come to be called big data, about which I have already written on several occasions, and with many of the tools I have used regularly in my research, particularly the structural equation models I had the luxury of studying at UCLA with Peter Bentler, the father of EQS, whom I ended up asking to join my doctoral thesis committee.

On July 17, at the IIIA-CSIC in Barcelona, there will be a workshop on BigML for anyone interested in big data, modeling and machine learning. It catches me on the other side of the world at a conference in Peru, but from what I have been seeing of BigML's possibilities, it can be really good.

There is a "homemade" video that illustrates the idea of the product in a very simple way:

You can also see some predictive models built on free data files commonly used in academia, such as Titanic survival, credit risk estimation, diabetes prevention, telecommunications churn, etc. The possibilities, starting from a data set of reasonable quality like those generated every day by ordinary business operations, are practically unlimited.
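
As an illustration of the workflow described above (upload a data source, turn it into a dataset, train a model, ask for predictions), here is a minimal sketch using BigML's Python bindings. The CSV file, the field names and the credentials setup are assumptions made for the example, not details taken from the product:

    # Minimal sketch of the source -> dataset -> model -> prediction workflow
    # using the bigml Python bindings (pip install bigml). Assumes the
    # BIGML_USERNAME and BIGML_API_KEY environment variables are set.
    from bigml.api import BigML

    api = BigML()

    # Upload a CSV (e.g. the classic Titanic survival file) as a data source.
    source = api.create_source("titanic.csv")
    api.ok(source)  # wait until the resource is ready

    # Turn the source into a dataset and train a model on it.
    dataset = api.create_dataset(source)
    api.ok(dataset)
    model = api.create_model(dataset)
    api.ok(model)

    # Ask for a prediction for a hypothetical passenger (field names assumed).
    prediction = api.create_prediction(model, {"age": 28, "sex": "female", "class": "2nd"})
    api.ok(prediction)
    print(prediction["object"]["output"])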

Sunday, April 2, 2017

Big Data and health information

The origin usually cited for the rise of big data as a trend is the intersection between CRM technologies, which allow all the operational information about a customer to be stored (marketing, transactional, administrative, after-sales, etc.), and the world of the social web, which gives rise to a much richer information environment.

This usually gives big data projects a "Big Brother" undertone, in which companies "stalk" social networks to capture trends, opinions, etc. and feed them into their marketing. However, big data is much more than that: much of the data processed in this type of project is not even personal and has nothing to do with social networks, but comes from another of the great trends of our time: the development of sensors that capture information of all kinds, from environmental conditions to traffic, through continuous measurements of every type of parameter.

One of the trends that has caught my attention while reviewing projects is the application of big data to the world of health: hospitals, despite the increasing sophistication of their systems, often live in what Seth Godin calls "the pre-digital phase", even though the incorporation of analytical intelligence here can be of critical importance. The medical-hospital environment is increasingly full of machines of all kinds that generate torrents of data about the patients connected to them. Data which, however, are usually simply not stored (they are used for short-term analysis tied to a specific moment) or are printed and collected in rudimentary fashion in a folder. On a personal level, Google Health, one of the projects Google recently closed, tried to provide a repository for health information and make it easier to share with third parties: enter the results of your analyses, your prescriptions, your medications, etc. in a file and share it with your doctor or hospital, making it easier for them to access your record and add more information. An idea with possibilities, but whose low level of adoption did not allow its survival.

It is estimated that an average patient generates about two gigabytes of information, a figure that grows rapidly in the case of certain treatments. What kind of information are we talking about? A bit of everything: from perfectly tabulated information, as in the case of lab results, to unstructured data such as images of all types or readings of varied parameters. All of it can be digitized, but in very few cases is it digitized and stored properly. Without a doubt, a perfect field for the application of big data techniques, not only at the level of the individual patient, but also, and with great possibilities, for the treatment of the aggregated information.
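
To illustrate how heterogeneous that mix of tabulated results and unstructured readings can be, here is a minimal sketch of patient records kept as flexible documents and then aggregated. The field names and values are invented for the example:

    # Illustrative patient records as flexible documents: each record can carry
    # different fields (lab results, image references, sensor readings).
    patients = [
        {"id": "p-001", "labs": {"glucose": 92, "hdl": 61}, "images": ["mri_0412.dcm"]},
        {"id": "p-002", "labs": {"glucose": 148}, "heart_rate_series": [72, 75, 71, 88]},
        {"id": "p-003", "labs": {"glucose": 110, "hdl": 44}},
    ]

    # Aggregated view: average glucose across every record that reports it.
    values = [p["labs"]["glucose"] for p in patients if "glucose" in p.get("labs", {})]
    print("patients with glucose readings:", len(values))
    print("average glucose:", sum(values) / len(values))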

The first projects are focusing on issues related to hospital savings and management, where it is possible to estimate the economic impact objectively or to allocate resources better. But there is no doubt that there is enormous potential in something it is starting to seem increasingly paradoxical not to have in an environment like the one we live in: the storage of a person's data in a way that allows centralized treatment and analysis at the moments when it is really needed.

Where are we going? I have already heard people talk about voluntary health monitoring services based on non-intrusive sensors that send data in real time; surely a topic that is still a few years away, not so much because of the lack of maturity of the technology as because of the difficulty of building a development that makes economic sense. But for the moment, I am sure that thinking about applying technology to this kind of problem lets us look at the big data theme through somewhat different lenses.

Big Data: A historical perspective

According to IBM's calculations, humanity generated, from the beginning of its history until the year 2003, about five exabytes of information: five billion gigabytes. Last year, we generated roughly that same volume of information every two days. Next year, we will generate it approximately every ten minutes.

GPS locations from mobile phones, Facebook likes, e-commerce transactions, surveillance camera images, instant messaging... a clear example of the extent to which technology can outpace our ability to use it.

(Adapted from "Big data or too much information", a highly recommended article from Smithsonian.com)
 
We are still learning how to capture these data: where they are generated, what shape they take and what possibilities they offer. We have a lot to learn about how to analyze them, a discipline in which we will see a great deal of innovation at every level. Companies that are able to interpret all that data in a way that makes sense and, above all, with the right attitude, will be able to generate a great competitive advantage. Those that do not, or worse, those that devote themselves to persecuting and harassing their customers through misuse of these data, will disappear.

Saturday, April 1, 2017

Understanding the Future: the Evolution of databases

It is undoubtedly one of the most provocative things, and the one that most catches my attention, in the analysis of the trend that the big data phenomenon represents for business: the enormous difficulty of understanding it without going down to the systems that sustain it. An undoubtedly relevant issue: as long as we try to explain big data by "prescribing", as if they were magic formulas, the reports of analysts like Forrester, McKinsey or Gartner, or by resorting to application cases, the average manager will not be able to understand what really lies behind this world, let alone its possibilities.

What are we really talking about? For me, the greatest difficulty in understanding the difference that big data poses lies in grasping what it means to move from the database schema we all know, at different levels of depth, to the idea of non-relational or NoSQL databases. A world that is often defined in the negative, by "what it is not", which adds even more conceptual difficulty.

It sounds intimidating, but wait, don't tune out yet :-) Let's try to approach the concept: SQL-based databases (Structured Query Language) are what the vast majority of users know. You can know them at very different levels: from the person who operates with them, handles the language itself, understands the normalization rules of a conventional database or is able to analyze its limitations, to those who simply imagine them as a large electronic filing system, like the drawers and folders of a cabinet. A relational database based on SQL, typically managed with systems such as Oracle, MySQL, DB2, Informix, Microsoft SQL Server, Sybase or PostgreSQL, works in a way that feels, so to speak, "natural" to us: it follows the ACID rules (atomicity, consistency, isolation and durability), which allow a set of instructions to be treated as a transaction, and it responds to a simple vision in which each datum is stored unequivocally and with defined relationships. The view of tables with rows and columns in which a query always returns the same fields.
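
As a minimal illustration of that "rows and columns plus transactions" world, here is a sketch using Python's built-in sqlite3 module; the table and its fields are invented for the example:

    # A rigid, relational view of the world: fixed schema plus ACID transactions.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

    # Both inserts commit together or not at all (atomicity).
    with conn:
        conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ana", "Madrid"))
        conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Luis", "Bilbao"))

    # A query always returns the same, predefined columns.
    for row in conn.execute("SELECT id, name, city FROM customers"):
        print(row)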

What happens if we extend the concept to accommodate other types of realities that are becoming more and more frequent in our everyday operations? Does all data fit neatly into these structures? Or are we simply leaving out of our analysis everything that our database operation is not able to capture? NoSQL databases ("not only SQL", which does not imply that SQL is dead or should not be used, just that for some problems there are better solutions) relax many of the limitations inherent in conventional databases and in how we work with them. Collections of documents with loosely defined fields, rather than tables with rows and columns, which allow much faster and more efficient analyses and, above all, analyses not limited to the conventional structure. The idea is to store data massively, which responds very well to the enormous wealth of data the world generates today, and to analyze it without forcing it into standards that do not necessarily fit it. Where relational databases are costly and time-consuming, the NoSQL alternative is much more efficient and inexpensive for manipulating data without necessarily having to adapt it to a rigid structure. Strictly speaking, a system of this type is not even a database as such, but a distributed storage system for managing data endowed with a certain structure, a structure that can also be enormously flexible.
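
To contrast with the relational sketch above, here is a minimal document-oriented view of the same idea in plain Python: heterogeneous records with loosely defined fields, queried without a fixed schema. The records are invented for the example:

    # Documents with loosely defined fields: each record carries whatever it has.
    documents = [
        {"type": "checkin", "user": "ana", "place": "Madrid", "lat": 40.4, "lon": -3.7},
        {"type": "tweet", "user": "luis", "text": "Trying the new app", "mentions": ["@ana"]},
        {"type": "photo", "user": "ana", "tags": ["food", "madrid"], "likes": 12},
    ]

    # A "query" is just a filter over whatever fields happen to exist.
    def find(records, **criteria):
        return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

    print(find(documents, user="ana"))          # everything Ana generated
    print(find(documents, type="checkin"))      # only geolocated check-ins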

The problem? For most people, the difficulty of "thinking" in such a system. Our mental schemes are adapted to a rigid system, with clear standards and marked structures. Parallels with stores divided into shelves, cabinets and folders are something that works for us mentally. But how do you manage with such a mental model, for example, searches over huge databases containing references that are completely heterogeneous among themselves and linked by relationships of all kinds, not necessarily unique? In many cases, we are talking about systems developed precisely by companies like Google, Yahoo! or Facebook to manage their own operations, almost always released as open source, in order to obtain a structure that, at a reasonable cost and performance, allows them to handle enormous amounts of data with many very complex relationships between them.
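
A flavor of how those companies process such heterogeneous collections is the map/reduce pattern popularized by Google and its open source descendants such as Hadoop. Here is a toy, single-machine sketch of the idea, with invented records:

    # Toy map/reduce over heterogeneous records: count activity per user.
    from collections import defaultdict

    records = [
        {"type": "checkin", "user": "ana"},
        {"type": "tweet", "user": "luis"},
        {"type": "photo", "user": "ana"},
        {"type": "tweet", "user": "ana"},
    ]

    # Map phase: emit (key, 1) pairs; on a real cluster this runs on many machines.
    pairs = [(r["user"], 1) for r in records]

    # Shuffle + reduce phase: group by key and sum the values.
    counts = defaultdict(int)
    for user, value in pairs:
        counts[user] += value

    print(dict(counts))  # {'ana': 3, 'luis': 1}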

In a sense, understanding the subject requires "unlearning". But the need to do so is evident, given how well these kinds of structures fit the problems of operating in the world we live in today. It is not easy, though: for some time, many companies will continue to torture their database systems re