neural.networks.in.translation.systems.pt11

The previous post in this series is here. The first post in this series is here.

Welcome back to the series in which we explain the foundations of neural network technologies and dive into their use in natural language processing and neural machine translation.

Global Vectors and Latent Semantic Analysis

You may remember that in part seven we saw a method for constructing word vectors, developed by Mikolov et al., that goes by the name word2vec. It dates back to 2013, and people have been trying to improve on it ever since. Already in 2014, researchers at Stanford, namely Jeffrey Pennington, Richard Socher, and Christopher D. Manning, published a model called ‘GloVe’, which stands for ‘global vectors’. In spirit it is quite similar to word2vec, and the name already hints at the main difference. Recall that word2vec was trained on two dummy tasks, ‘continuous bag-of-words’ and ‘skip-gram’. Both are local in nature, i.e. only the neighboring words within a small window matter. GloVe takes a different, global approach, similar to what is called ‘latent semantic analysis’ or ‘LSA’.
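To make the ‘local’ aspect concrete, here is a minimal sketch in Python of how skip-gram draws its training pairs only from a small window around each word. The toy corpus and the window size of two are made up purely for illustration.

corpus = "the cat sat on the mat".split()  # toy corpus, purely illustrative
window = 2  # context window size

pairs = []  # (center word, context word) training pairs for skip-gram
for i, center in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))

print(pairs[:6])
# Only words inside the window ever pair up with the center word;
# the rest of the corpus plays no role for that particular position.

GloVe, by contrast, first aggregates such window counts over the whole corpus into one big co-occurrence matrix and then works with that global object.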

The main idea of LSA is to look at co-occurrences of words in a large set of documents. These co-occurrence counts are collected in a matrix, and a so-called ‘singular value decomposition’ is performed on it to extract word vectors. This is an impressive method that works without any machine learning at all, though the linear algebra involved is a bit more sophisticated (though not really difficult). However, as it turns out, it is still better to learn the word vectors from the co-occurrence statistics with a learning algorithm, and that is exactly what GloVe does.
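Here is a minimal sketch of the LSA idea, assuming a tiny hand-made co-occurrence matrix and using NumPy’s singular value decomposition; a real system would build the matrix from millions of documents and usually reweight the raw counts first.

import numpy as np

# Tiny hand-made co-occurrence matrix; rows are words, columns are contexts.
vocab = ["cat", "dog", "mat", "bone"]
X = np.array([
    [2.0, 0.0, 1.0],   # cat
    [0.0, 2.0, 0.0],   # dog
    [1.0, 0.0, 2.0],   # mat
    [0.0, 1.0, 0.0],   # bone
])

# Singular value decomposition: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values -> k-dimensional word vectors.
k = 2
word_vectors = U[:, :k] * S[:k]

for word, vec in zip(vocab, word_vectors):
    print(word, vec)

Words that occur in similar contexts end up with similar vectors, which is exactly the property we want from word embeddings.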

Furthermore, the authors should also be credited with writing down a clear derivation of the underlying cost function and some motivation for why the model works so well. With some basic math knowledge you can read up on it yourself in their paper ‘GloVe: Global Vectors for Word Representation’, or even better, watch the lecture series ‘Natural Language Processing with Deep Learning’ from the winter 2017 semester, available for free on Stanford University’s YouTube channel.
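For reference, the cost function derived in the paper is a weighted least-squares objective over the entries X_{ij} of the co-occurrence matrix:

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

Here w_i and \tilde{w}_j are the word and context vectors, b_i and \tilde{b}_j are bias terms, V is the vocabulary size, and f is a weighting function that damps the influence of very rare and very frequent co-occurrences.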

Google Translate and Neural Machine Translation

Until now, this blog has concentrated solely on the academic side of neural networks and machine translation. Let me also mention the main interface between science and business, namely Google Translate. We’ve written a piece about it recently, and you might want to have a look. It is interesting to note that while Google Translate has been around since 2006, it switched to the neural network approach only recently, at the end of 2016. That shows how much technology and effort had to go into neural machine translation before it could outdo statistical machine translation. Of course, many of the innovations came from Google itself, such as the word2vec model mentioned earlier. This is all part of a larger effort by Google to develop artificial intelligence, through a research team called ‘Google Brain’, the open-source software TensorFlow, and even the TPU (tensor processing unit), a chip designed specifically for machine learning.

So what about Google’s algorithm for neural machine translation? It is all explained in the paper ‘Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation’. Have a look if you want, but I have to warn you: it is not an easy read, even with everything I have explained so far. Still, the takeaway message is that Google is not shy about sharing its insights, something common in academia (but maybe not so much in business).

One last remark about Google. Remember the matrix M from the last post? It turns out that the success story of the tech giant also started with a matrix. Back in 1996, Larry Page and Sergey Brin, then working at Stanford University (yup, it’s Stanford again), devised the algorithm now called ‘PageRank’ (after Larry Page). As the double meaning of the name suggests, its purpose is to rank web pages by their importance. The heart of the algorithm is the so-called ‘Google matrix’: finding its eigenvector with the largest eigenvalue (don’t worry if you don’t know what this means) gives you a weighting of web pages by their importance. Quite useful if you want to build a search engine! Two years later, in 1998, Page and Brin founded Google, and the rest is history.
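To give a flavor of how that works, here is a minimal sketch in Python with a made-up four-page web graph; the link structure is purely illustrative, and the damping factor of 0.85 is the value commonly quoted for PageRank.

import numpy as np

# Made-up link structure: entry L[i, j] = 1 if page j links to page i.
L = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Column-normalize so each page spreads its "vote" evenly over its out-links.
M = L / L.sum(axis=0)

# Google matrix: follow a link with probability d, jump to a random page otherwise.
d = 0.85
n = M.shape[0]
G = d * M + (1 - d) / n * np.ones((n, n))

# Power iteration: repeated multiplication converges to the leading eigenvector.
rank = np.ones(n) / n
for _ in range(100):
    rank = G @ rank

print(rank)  # the PageRank scores; higher means more "important"

Repeatedly multiplying by the Google matrix (so-called power iteration) drives the score vector toward the eigenvector with the largest eigenvalue, which is exactly the ranking described above.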

For the moment, that is it. I hope you have enjoyed reading this blog as much as I have enjoyed writing it. I also hope that I could give you a taste of what all the fuss about neural networks is really about, without getting you lost in the details. It should be clear that this is not the end but rather the beginning of an exciting journey, both for entrepreneurs and researchers.