The Stack includes 3TB of code across 30 programming languages and is 3x bigger than the next-largest public code dataset. The BigCode project only includes code with permissive software licenses (MIT, Apache 2.0, etc.) and provides an opt-out process for developers to remove their code from the dataset.
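Since the dataset is distributed through the Hugging Face Hub, it can be sampled in a few lines of Python. A minimal sketch, assuming the `bigcode/the-stack` repo id, a per-language `data_dir` layout, and a `content` field for file text:

```python
# A minimal sketch of streaming one language slice of The Stack.
# The repo id, data_dir layout, and field name are assumptions here.
from datasets import load_dataset

ds = load_dataset(
    "bigcode/the-stack",
    data_dir="data/python",  # assumed layout: one subdirectory per language
    split="train",
    streaming=True,          # avoid downloading all 3TB up front
)

for example in ds.take(3):
    print(example["content"][:200])  # assumed field name for file contents
```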
The paper shows that transformers can improve themselves autonomously through trial and error without ever updating their weights. No prompting, no finetuning. A single transformer simply collects its own data and maximizes rewards on new tasks.
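A toy sketch of the idea: the policy's weights are frozen, and all improvement comes from the growing context of (observation, action, reward) tuples it collects. Everything below is a hypothetical stand-in for the paper's pretrained transformer and tasks:

```python
# Illustrative sketch of in-context reinforcement learning: no gradient
# updates ever happen; "learning" comes only from the growing history.
import random

def frozen_policy(history, observation):
    """Stand-in for a pretrained transformer that maps the interaction
    history plus the current observation to an action. A real model
    would attend over `history`; we fake improvement by replaying the
    best action seen so far."""
    if history:
        best = max(history, key=lambda step: step[2])  # step = (obs, action, reward)
        if best[2] > 0:
            return best[1]
    return random.choice([0, 1, 2])

def reward_fn(action):
    return 1.0 if action == 2 else 0.0  # toy task: action 2 is correct

history = []                      # the "dataset" the model collects itself
for t in range(20):
    obs = t                       # trivial observation
    action = frozen_policy(history, obs)
    reward = reward_fn(action)
    history.append((obs, action, reward))

print(sum(r for _, _, r in history[-5:]))  # later steps earn more reward
```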
CarperAI, a new research lab within the EleutherAI research collective, is releasing an "instruction-tuned" large language model trained using Reinforcement Learning from Human Feedback (RLHF), in effect an open-source equivalent of an instruction-tuned GPT-3.
Meta's Universal Speech Translator project makes it possible to train AI models on languages that are primarily oral and do not have a standard or widely used writing system. Meta built and shared an AI translation system for a primarily oral language, Hokkien.
Google releases a model for generating videos from text, with prompts that can change over time and videos that can be multiple minutes long.
The blueprint is intended to "help guide the design, use, and deployment of automated systems to protect the American public." It is currently non-regulatory, non-binding, and not yet enforceable.
Meta releases a paper on text-to-video generation with an improved model design that 1) accelerates training, 2) does not require paired text-video data, and 3) generates more varied and diverse videos than prior work.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It is effective with accented speech, background noise, and technical language. It works in multiple languages and can translate those languages into English.
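Whisper is open source, so transcription and translation take only a few lines of Python. A minimal sketch using OpenAI's `whisper` package (the audio filename is a placeholder):

```python
# A minimal sketch of transcribing and translating audio with Whisper
# (pip install openai-whisper).
import whisper

model = whisper.load_model("base")  # sizes range from tiny to large

# Transcribe in the source language; Whisper auto-detects the language.
result = model.transcribe("speech.mp3")
print(result["text"])

# Translate non-English speech directly into English text.
result = model.transcribe("speech.mp3", task="translate")
print(result["text"])
```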
Adept announces Action Transformer 1 (ACT-1), a model that can control software from natural-language requests (for example, searching Zillow or adding new records to Salesforce).
GitHub releases Copilot: "an AI pair programmer" to suggest and generate code.
Google releases one of the largest LLMs to date, achieving breakthrough capabilities on a wide range of tasks such as reasoning, multilingual understanding, and code generation.
DeepMind finds that current large language models are significantly undertrained for their parameter counts, implying that data, not model size, is the limiting factor for performance and that smaller models trained on more data are a promising way to cut finetuning and inference costs. Chinchilla is a 70B-parameter model trained on 4x more data than the 280B-parameter Gopher under the same training compute budget. Chinchilla significantly outperforms Gopher and other models such as GPT-3 (175B) and Megatron-Turing NLG (530B).
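One way to see the "same compute budget" claim is through the standard approximation that training compute is roughly 6 x parameters x tokens. Plugging in the paper's reported token counts (300B for Gopher, 1.4T for Chinchilla) puts both runs in the same ballpark:

```python
# Sanity-check the "same compute budget" claim with the common
# approximation: training FLOPs ~= 6 * parameters * tokens.
def train_flops(params, tokens):
    return 6 * params * tokens

gopher     = train_flops(280e9, 300e9)   # 280B params, 300B tokens
chinchilla = train_flops(70e9, 1.4e12)   # 70B params, 1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23, same ballpark
```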
DeepMind entered a neural-network-based model, AlphaFold 2, in the 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating high accuracy (over 80%) and greatly outperforming previous methods.
Anthropic was started by OpenAI alums and has raised over $700M from the likes of Sam Bankman-Fried (founder of FTX) and Jaan Tallinn (co-founder of Skype).
GPT-Neo is a pair of open-source GPT-2/3-style models with 1.3B and 2.7B parameters, released by EleutherAI.
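Both checkpoints are on the Hugging Face Hub; a minimal generation sketch with the published `EleutherAI/gpt-neo-1.3B` checkpoint:

```python
# A minimal sketch of generating text with GPT-Neo via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

inputs = tokenizer("EleutherAI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```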
DALL·E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions using a dataset of text–image pairs.
A minimal and simple PyTorch re-implementation of the OpenAI GPT model. This was one of the first open-source implementations of GPT.
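The core of any such re-implementation is a causal self-attention block of a few dozen lines. A condensed PyTorch sketch in the same spirit (illustrative, not minGPT's exact code):

```python
# Condensed sketch of the causal self-attention block at the core of a
# GPT re-implementation (illustrative, not minGPT's exact code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=64, n_head=4, block_size=128):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)
        # lower-triangular mask so each token attends only to the past
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape into (batch, heads, time, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

x = torch.randn(2, 16, 64)            # (batch, tokens, embedding)
print(CausalSelfAttention()(x).shape) # torch.Size([2, 16, 64])
```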
GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
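That objective is just cross-entropy on shifted tokens. A minimal PyTorch sketch, with toy tensors standing in for a real model's output:

```python
# The GPT-2 training objective in miniature: predict token t+1 from
# tokens 1..t, scored with cross-entropy. Toy tensors, illustrative only.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 8, 2
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Stand-in for model output: one row of logits per input position.
logits = torch.randn(batch, seq_len, vocab_size)

# Shift: position t's logits are scored against the token at t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for steps 1..T-1
    tokens[:, 1:].reshape(-1),               # targets are the next tokens
)
print(loss.item())
```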
BERT revolutionized NLP and paved the way for many later LLM developments. It popularized pre-training on large text corpora to create a single general-purpose NLP model that can be adapted to many tasks.
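BERT's masked-word pre-training can be poked at directly. A minimal sketch using the Hugging Face `transformers` fill-mask pipeline with the published `bert-base-uncased` checkpoint:

```python
# A minimal sketch of BERT's masked-word objective in action.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```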
OpenAI has obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system. Their approach is a combination of two existing ideas: transformers and unsupervised pre-training.
Google released a demo that can carry out natural phone conversations to accomplish "real world" tasks like booking haircuts or making restaurant reservations.
ELMo presents one of the first widely adopted approaches to representing words as numbers based on the context they appear in. This was a watershed idea in NLP and led to significant advancements in model architectures.
Google Brain researchers introduce the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions and making training far more parallelizable. This monumental paper enabled many of the latest advances in AI/ML.
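The mechanism at the paper's core is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A few lines of NumPy capture it:

```python
# Scaled dot-product attention from "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # weighted mix of values

Q = np.random.randn(4, 8)   # 4 query positions, dimension 8
K = np.random.randn(6, 8)   # 6 key/value positions
V = np.random.randn(6, 8)
print(attention(Q, K, V).shape)  # (4, 8)
```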
AlphaGo is a model trained with supervised learning on human expert Go games and reinforcement learning from self-play. AlphaGo achieved a 99.8% win rate against other Go programs and defeated the human European Go champion 5 games to 0.
GANs are generative models trained by pitting two networks against each other: a generative model G that produces new examples and a discriminative model D that tries to classify examples as real or fake. This novel design led to breakthrough capabilities in image, video, and voice generation.
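A toy PyTorch training loop makes the two-player setup concrete: D learns to separate real samples from G's fakes, while G learns to fool D. One-dimensional "data" keeps the sketch minimal:

```python
# A toy GAN on 1-D data: real samples come from N(2, 0.5); G should
# learn to map noise to samples near 2.0.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 2.0   # "real" data: N(2, 0.5)
    fake = G(torch.randn(32, 8))            # G maps noise to samples

    # Train D: label real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + \
             bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train G: try to make D label fakes as real.
    g_loss = bce(D(G(torch.randn(32, 8))), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Mean of G's samples should drift toward 2.0 as training progresses.
print(f"fake mean after training: {G(torch.randn(256, 8)).mean().item():.2f}")
```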
Now considered one of the most influential papers in computer vision, AlexNet is a convolutional neural network that significantly outperformed prior state-of-the-art methods on object recognition benchmarks.
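The architecture ships with torchvision, so it is easy to inspect. A minimal sketch running torchvision's AlexNet re-implementation on a random input:

```python
# A minimal sketch of torchvision's AlexNet on a random image-sized tensor.
import torch
from torchvision.models import alexnet

model = alexnet(weights=None)    # architecture only, no pretrained weights
x = torch.randn(1, 3, 224, 224)  # one RGB image at 224x224
logits = model(x)
print(logits.shape)              # torch.Size([1, 1000]): ImageNet classes
```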