New AI research proposes Pythia: a suite of decoder-only autoregressive language models ranging from 70M to 12B parameters


Transformer-based models are among the most advanced categories of models that exist today. Given their wide range of use cases, such as generation tasks in natural language processing (NLP), text-to-image tasks, and 3D protein structure prediction, it is reasonable to conclude that these models are driving a paradigm shift in the rapidly developing field of AI. Moreover, Large Language Models (LLMs) have proven to be the most successful and effective application of transformer-based architectures, and their use has increased dramatically over the past few years as researchers dive deeper into larger and more complex designs. However, although these models are widely adopted, little is known about how and why they work so well. This is where understanding how an LLM evolves over the course of training comes into play. Previous research has shown that approximately regular patterns emerge when a language model is scaled up, but connecting those patterns to how the model changes as training progresses remains largely unexplored territory. One of the main reasons for this is the lack of publicly available LLMs that meet all of researchers' requirements.

To address this problem, the non-profit artificial intelligence research group EleutherAI recently unveiled Pythia, a suite of 16 LLMs trained on public data in the exact same order, specifically designed to facilitate scientific research. Currently, Pythia is the only publicly available model suite whose models were trained on the same data in the same order while spanning several orders of magnitude in scale. The team released 154 checkpoints for each of the 16 models, which range in size from 70M to 12B parameters. Furthermore, all data and the corresponding tools to download and replicate the exact training process have been released to the public to facilitate further research. These key characteristics helped the researchers behind Pythia conduct various experiments to understand how gender bias, memorization, and few-shot learning are affected by the training data and model scale.
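For readers who want to try the suite themselves, the models and their training checkpoints are hosted on the Hugging Face hub. Below is a minimal sketch of loading one model at an intermediate training checkpoint; it assumes the `EleutherAI/pythia-70m` repository and the `step3000` revision name used on the Pythia model cards, so check the model card for the exact checkpoint names available.

```python
# Minimal sketch: load a Pythia model at a specific training checkpoint.
# Checkpoints are exposed as git revisions on the Hugging Face hub
# (e.g. "step3000"); the revision name here is an illustrative choice.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",  # one of the 154 released checkpoints
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)

# Quick sanity check: generate a short continuation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```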

Until now, no suite of models has been publicly accessible, followed a well-documented training procedure, and remained consistent across model scales. This is where the Pythia researchers did groundbreaking work. As described above, all of the models are publicly accessible and were trained on the Pile, a collection of English-language data commonly used to develop LLMs (especially large autoregressive transformers). The researchers designed Pythia so that all intermediate checkpoints are available for analysis, which makes it possible to link developments during training to specific checkpoints and to the exact data seen up to that point. In addition, the training procedure and hyperparameters are thoroughly documented to support future research.
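Because every intermediate checkpoint is published, a natural experiment is to track how some quantity evolves over training. The sketch below assumes the `step...` revision naming from the model cards and measures the language-modeling loss of a fixed sentence at a handful of checkpoints; it illustrates the kind of analysis the checkpoints enable, not a method taken from the paper.

```python
# Sketch: track how the loss on a fixed sentence evolves over training
# by loading several intermediate checkpoints of one Pythia model.
# The revision names below follow the scheme described on the model
# cards; adjust them to the checkpoints you actually want to compare.
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

MODEL = "EleutherAI/pythia-160m"
CHECKPOINTS = ["step1000", "step10000", "step100000", "step143000"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
inputs = tokenizer(
    "The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)

for step in CHECKPOINTS:
    model = GPTNeoXForCausalLM.from_pretrained(MODEL, revision=step)
    model.eval()
    with torch.no_grad():
        # Language-modeling loss when the input is also the target.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{step}: loss = {loss.item():.3f}")
```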


EleutherAI's primary goal in developing Pythia is to enable future scientific research into the capabilities and limitations of large language models. To that end, the researchers focused on three case studies, mitigating gender bias, memorization in large language models, and the effect of term frequency on few-shot performance, to demonstrate Pythia's empirical value. Through their experiments, the researchers concluded that this highly controlled setup can be used to gain new insights into LLMs and their training dynamics. They went on to say that these language-modeling case studies could not have been conducted using any pre-existing model suite.
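To give a flavor of what such a case study might look like in code, here is a hypothetical memorization probe: prompt the model with the first k tokens of a passage assumed to appear in the training data, decode greedily, and check whether the continuation reproduces the source verbatim. The helper `is_memorized`, the model choice, and the sample text are all illustrative stand-ins, not the paper's exact protocol.

```python
# Hypothetical memorization probe in the spirit of the paper's case
# study: does greedy decoding reproduce a training passage verbatim?
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model_name = "EleutherAI/pythia-410m"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoXForCausalLM.from_pretrained(model_name).eval()

def is_memorized(text: str, k: int = 32, n: int = 32) -> bool:
    """True if greedy decoding of tokens k..k+n matches the source."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if len(ids) < k + n:
        return False
    prompt, target = ids[:k], ids[k:k + n]
    with torch.no_grad():
        out = model.generate(
            prompt.unsqueeze(0), max_new_tokens=n, do_sample=False
        )
    # out[0] contains the prompt followed by the generated tokens.
    return torch.equal(out[0, k:k + n], target)

# The passage below is a placeholder; a real experiment would draw
# sequences from the Pile itself.
print(is_memorized("some long passage suspected to appear in the Pile ..."))
```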

In conclusion, EleutherAI's Pythia is a suite of trained LLMs with a consistent data ordering and model architecture, spanning multiple orders of magnitude in scale. The team's research focuses on three case studies that show how Pythia can be used to run experiments at a level of detail previously unavailable for a public model suite. These case studies examine gender bias, memorization, and the effects of term frequency. The researchers hope that their findings and analysis will spur further investigation into how language models change during training and how models of different sizes relate to the patterns observed as training progresses.


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project. Also, don't forget to join our 18k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.


Khushboo Gupta is a Consultant Trainee at MarktechPost. She is currently pursuing her Bachelor of Technology degree from the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by participating in various challenges.




