The bidding war among the GAFAM continues. The objective: build the machine learning model with the largest number of parameters. Google now claims 540 billion.
In 2020, the GPT-3 model proved once again the nearly limitless capabilities of giant neural networks. With its 175 billion parameters, this Transformer-type deep learning model from OpenAI was trained on hundreds of billions of words. Its use cases are numerous, from automatic text generation to translation and the generation of computer code. The challenge? To produce impressive results, in terms of both volume and precision. In response to GPT-3, Google unveiled the Pathways Language Model (PaLM) a few weeks ago: a Transformer that scales up to 540 billion parameters and whose performance surpasses that of its latest-generation equivalents.
Like GPT-3, PaLM uses the few-shot learning technique. How does it work? In the case of image recognition, this type of learning needs only a few photos of the subject to be identified (a face, for example) in order to recognize it again later. Instead of being trained to classify from large series of examples, a few-shot learner relies on a few reference patterns against which it calculates a similarity score. Several giant neural networks have since drawn inspiration from GPT-3 and this method, with a view to further improving the results obtained. This is the case of GLaM and LaMDA, both created by Google, of Gopher, developed by DeepMind (a subsidiary of Alphabet, Google's parent company), and of Megatron-Turing NLG, developed by Microsoft and Nvidia.
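The similarity-score idea described above can be illustrated with a minimal sketch. It assumes each photo has already been turned into a numeric vector (an "embedding"); the three-dimensional vectors, the names "alice" and "bob", and the helper functions are all hypothetical and chosen for illustration only. This is not how GPT-3 or PaLM are implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (close to 1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Average the few reference vectors of a class into a single prototype."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def few_shot_classify(query, support_set):
    """Return the label whose prototype is most similar to the query, plus all scores."""
    scores = {
        label: cosine_similarity(query, mean_vector(examples))
        for label, examples in support_set.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical embeddings: only two reference "photos" per person.
support_set = {
    "alice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bob": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
label, scores = few_shot_classify([0.85, 0.15, 0.05], support_set)
print(label, scores)
```

With only two references per person, the query is assigned to whichever prototype it resembles most; no retraining on a large labeled series is involved.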
150 language modeling tasks
Google ran PaLM through the BIG-bench benchmark (for Beyond the Imitation Game Benchmark), an open source framework that covers 150 language modeling tasks. “Result: PaLM mostly outperforms Gopher and Chinchilla on a set of 58 common tasks,” note Sharan Narang and Aakanksha Chowdhery of Google Research (see graphs below).
Among the capabilities highlighted by Google, PaLM performs particularly well on tasks involving application code, in particular generating code from requests formulated in natural language. “In this area, its performance in few-shot learning is comparable to that of Codex (a variant of GPT-3 focused on the same types of tasks, editor’s note), while its training dataset contains 50 times less Python code,” point out Sharan Narang and Aakanksha Chowdhery. “[…] learning from other languages […] is better.”
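For a language model, few-shot learning typically means placing a handful of solved examples directly in the prompt, followed by the new request. The sketch below shows one way such a prompt could be assembled for code generation. The "# Task:" layout and the function name build_few_shot_prompt are assumptions made for illustration; they are not taken from Google's or OpenAI's documentation, and no model is actually called.

```python
def build_few_shot_prompt(examples, request):
    """Concatenate a few solved examples, then the new request left open."""
    parts = []
    for description, code in examples:
        parts.append(f"# Task: {description}\n{code}\n")
    # The final task has no solution: the model is expected to continue from here.
    parts.append(f"# Task: {request}\n")
    return "\n".join(parts)

# Hypothetical natural-language requests paired with their Python solutions.
examples = [
    ("return the square of x", "def square(x):\n    return x * x"),
    ("return True if n is even", "def is_even(n):\n    return n % 2 == 0"),
]
prompt = build_few_shot_prompt(examples, "return the largest item in a list")
print(prompt)
```

The resulting text would be sent to the model as is; the two worked examples are the only "training" the model receives for this particular format.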
“PaLM demonstrates impressive capabilities in natural language understanding and generation”
The two Google software engineers add: “PaLM also demonstrates impressive natural language understanding and generation capabilities. In particular, it can provide explanations for scenarios that require a complex combination of multi-step logical inference, world knowledge and language understanding. For example, it is able to explain novel jokes” (see gif below).
PaLM was trained on the largest Tensor Processing Unit (TPU) infrastructure Google has ever used for machine learning. Made up of 6,144 chips, it relies on the latest generation of the American company's Cloud TPU pods, with key data parallelism mechanisms implemented within each pod. Training was carried out on a series of multilingual datasets combining documents and books available online, conversations, Wikipedia content and source code available on GitHub. The race continues to see who will build the largest NLP model, with ever wider and more precise task coverage.
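The principle of data parallelism mentioned above can be sketched on a toy problem: the dataset is split into shards, each worker computes a gradient on its own shard, and the averaged gradient updates a shared copy of the weights. Everything below is an illustrative assumption (a one-parameter model y = w * x, two simulated workers run sequentially, hypothetical function names); it bears no relation to Google's actual training code or to TPU hardware.

```python
def gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """One training step: local gradients per worker, then a shared averaged update."""
    local_grads = [gradient(w, shard) for shard in shards]  # would run in parallel
    avg_grad = sum(local_grads) / len(local_grads)          # stand-in for an all-reduce
    return w - lr * avg_grad

# Toy dataset generated from the true weight 3.0, split into two equal shards.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))
```

Because the shards are the same size, averaging the local gradients gives the same update as computing the gradient on the full dataset, which is why the work can be spread across many chips without changing the result.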