Tech companies are competing with one another to attract top talent in artificial intelligence, and Yahoo! Inc. has made a dramatic move in this area: it is giving away a massive amount of data about how users interact with its services. The embattled internet firm announced on Thursday that it would release the largest cache of data on internet behavior, covering the hovers, clicks and scrolls of about 20 million anonymous users who visited Yahoo’s finance, sports, real estate and other pages. Only universities will have access to the trove, which is expected to give researchers a rare, real-world look at the online behavior of a large number of people.
After years of stagnant growth, Yahoo is facing a brain drain and is trying to lure academic researchers in the highly competitive, fast-growing field of artificial intelligence. The data release comes as technology companies race to strengthen their ties with academia, especially in artificial intelligence and its subfields of machine learning and deep learning. These techniques involve training machines to mine large quantities of data and use them to make predictions and answer complex queries.
Google Inc. and Facebook Inc. have already recruited top researchers. Nonetheless, experts say that regardless of the talent these companies have, they will keep seeking more, because the big tech firms don’t believe they have enough people to do what they want to do. Machine learning requires large quantities of data, which computers use to spot complex patterns. In Yahoo’s case, for instance, such analysis could reveal which design features or headlines attract teenage girls.
Only major internet companies have this kind of data, and they keep it close because they don’t want to disclose too much about their business. Yahoo’s data set is around 13.4 terabytes, which is roughly two-thirds the size of the Library of Congress. That is larger than anything currently at the disposal of academic computer scientists, and it is so massive that a typical university computer system will not be able to store it. A cloud computing service, such as one from Alphabet Inc.’s Google or Amazon.com Inc., would probably be needed. Last year, Carnegie Mellon University signed a five-year, $10 million partnership with Yahoo to develop personalized apps based on user data.
The sheer size of the Yahoo cache makes it extremely valuable, according to experts. Algorithms designed to analyze large quantities of data differ vastly from those built for smaller amounts. Yahoo’s release can help researchers learn how to develop large-scale algorithms, which can be immensely beneficial for corporations. Since 2006, Yahoo has released about 50 data sets, including a cache of 100 million Flickr photos in 2014. The largest release before this one was 413 gigabytes.