bleeding edge

The Stack includes 3TB of code across 30 programming languages and is 3x larger than the next-largest public code dataset. BigCode only includes code under permissive software licenses (MIT, Apache 2.0, etc.) and provides an opt-out process for developers to remove their code from the dataset.

huggingface.co

Oct 27

huggingface.co

Shutterstock announces DALL·E integration and fund to compensate contributors

theverge.com

Oct 25

theverge.com

DeepMind proposes a method for in-context RL with transformers.

The paper shows that transformers can improve themselves autonomously through trial and error without ever updating their weights. No prompting, no finetuning. A single transformer simply collects its own data and maximizes rewards on new tasks.

arxiv.org

Oct 25

arxiv.org

Stable Diffusion v1.5 unofficially released by Runway

huggingface.co

Oct 20

huggingface.co

Google announces U-PaLM 540B

arxiv.org

Oct 20

arxiv.org

Google releases open-source LLM Flan-T5

arxiv.org

Oct 20

arxiv.org

CarperAI announces plans for the first open-source GPT·3-like model

CarperAI, a new research lab within the EleutherAI research collective, is releasing an "instruction-tuned" large language model trained using Reinforcement Learning from Human Feedback (RLHF). In effect, it is releasing an open-source equivalent of GPT·3.

carper.ai

Oct 20

carper.ai

Replit releases mobile app with codegen built-in

blog.replit.com

Oct 19

blog.replit.com

Meta releases first speech-to-speech translation system for Hokkien

Meta's Universal Speech Translator project makes it possible to train AI models on languages that are primarily oral and do not have a standard or widely used writing system. Meta built and shared an AI translation system for a primarily oral language, Hokkien.

ai.facebook.com

Oct 19

ai.facebook.com

Potential GitHub Copilot lawsuit

githubcopilotinvestigation.com

Oct 17

githubcopilotinvestigation.com

Microsoft announces Designer: A DALL·E-powered design tool

designer.microsoft.com

Oct 12

designer.microsoft.com

Joe Rogan interviews Steve Jobs generated by AI

podcast.ai

Oct 11

podcast.ai

State of AI publishes their 2022 report

stateof.ai

Oct 11

stateof.ai

Stable Diffusion meets virtual reality

twitter.com

Oct 11

twitter.com

Google releases AudioLM: an LLM for audio generation

ai.googleblog.com

Oct 06

ai.googleblog.com

A working implementation of text-to-3D DreamFusion, powered by Stable Diffusion.

github.com

Oct 06

github.com

Google announces Phenaki: a model to generate videos from text

Google releases a model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes.

phenaki.github.io

Oct 05

phenaki.github.io

US White House releases a "Blueprint for an AI Bill of Rights"

The blueprint is intended to "help guide the design, use, and deployment of automated systems to protect the American Public." It is currently non-regulatory, non-binding, and not yet enforceable.

whitehouse.gov

Oct 04

whitehouse.gov

September 2022

Google introduces DreamFusion: Text-to-3D using 2D Diffusion

dreamfusion3d.github.io

Sep 29

dreamfusion3d.github.io

A human motion diffusion model

guytevet.github.io

Sep 29

guytevet.github.io

Meta introduces Make-A-Video to generate videos from text

Meta releases a paper on text-to-video generation with an improved model design that 1) accelerates training, 2) does not require paired text-video data, and 3) generates a wider variety of videos than before.

makeavideo.studio

Sep 29

makeavideo.studio

DALL·E now available without waitlist

openai.com

Sep 28

openai.com

Stable Diffusion Photoshop plugin

twitter.com

Sep 26

twitter.com

Stable Diffusion running on iPhone

twitter.com

Sep 24

twitter.com

Reddit user claims GPT3 got them straight A's in school

reddit.com

Sep 23

reddit.com

NVIDIA releases GET3D, a generative 3D object model

nv-tlabs.github.io

Sep 23

nv-tlabs.github.io

DeepMind announces Sparrow: a safe dialogue agent

deepmind.com

Sep 22

deepmind.com

Getty Images bans AI-generated content

theverge.com

Sep 21

theverge.com

Whisper, OpenAI's near-human-level English speech recognition model

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It is effective with accented speech, background noise, and technical language. It works in multiple languages and can translate those languages into English.

openai.com

Sep 21

openai.com

NVIDIA announces BioNeMo: a service to train LLMs for bio

blogs.nvidia.com

Sep 20

blogs.nvidia.com

Character.AI opens public beta for their advanced chatbots

twitter.com

Sep 16

twitter.com

Google announces PaLI: A Jointly-Scaled Multilingual Language-Image Model

arxiv.org

Sep 16

arxiv.org

ACT-1: An LLM to automate manual tasks on software tools

Adept Labs announces Action Transformer 1 (ACT-1), a model that can control software from human requests. (For example, search Zillow or add new records to Salesforce.)

twitter.com

Sep 14

twitter.com

Action Transformer (ACT-1) announcement, natural language to computer action

adept.ai

Sep 14

adept.ai

A demo of GPT·3 armed with a Python interpreter

twitter.com

Sep 12

twitter.com

August 2022

DALL·E adds support for extending images (outpainting)

openai.com

Aug 31

openai.com

An AI-generated artwork wins first place in state art competition

vice.com

Aug 30

vice.com

A Stable Diffusion search engine

twitter.com

Aug 24

twitter.com

Microsoft introduces BEiT-3: a general-purpose multimodal foundation model

arxiv.org

Aug 22

arxiv.org

Stable Diffusion public release

stability.ai

Aug 22

stability.ai

Stable Diffusion announcement

stability.ai

Aug 10

stability.ai

Meta announces BlenderBot 3: a self-improving chatbot that can search the web

ai.facebook.com

Aug 09

ai.facebook.com

Researchers release GLM-130B: A bilingual model for English & Chinese

keg.cs.tsinghua.edu.cn

Aug 04

keg.cs.tsinghua.edu.cn

July 2022

DALL·E available in beta

openai.com

Jul 20

openai.com

Midjourney: a multiplayer generative art model moves to open-beta

twitter.com

Jul 13

twitter.com

BigScience releases BLOOM: the largest open multilingual language model

huggingface.co

Jul 12

huggingface.co

Google announces Minerva: a LLM for technical tasks

arxiv.org

Jul 01

arxiv.org

June 2022

Yandex releases YaLM 100B: a bilingual LLM for English & Russian

github.com

Jun 27

github.com

GitHub Copilot becomes generally available to all developers

GitHub releases Copilot: "an AI pair programmer" to suggest and generate code.

github.com

Jun 21

github.com

A marketplace to buy and sell prompts for DALL·E, GPT·3, Stable Diffusion, and Midjourney

promptbase.com

Jun 02

promptbase.com

May 2022

DeepMind announces Gato: A generalist agent

arxiv.org

May 19

arxiv.org

A demo of Google's translation AR glasses

twitter.com

May 11

twitter.com

Facebook releases OPT-175B

ai.facebook.com

May 03

ai.facebook.com

Imagen, a text-to-image diffusion model

imagen.research.google

May 02

imagen.research.google

April 2022

DeepMind announces Flamingo: a visual language model (VLM)

deepmind.com

Apr 28

deepmind.com

GPT·3 now supports editing text

openai.com

Apr 15

openai.com

OpenAI announces DALL·E 2

openai.com

Apr 13

openai.com

Google releases MaxViT: Multi-Axis Vision Transformer

github.com

Apr 04

github.com

Google releases PaLM: A LLM with 540B Parameters

Google releases one of the largest LLMs, with breakthrough capabilities on a wide range of tasks such as reasoning, multilingual tasks, and code generation.

ai.googleblog.com

Apr 04

ai.googleblog.com

Google releases SayCan: a LLM for interacting with robots

say-can.github.io

Apr 04

say-can.github.io

March 2022

DeepMind introduces Chinchilla LLM: showing that data and model size should be scaled equally

This paper implies that data is the limiting factor for model performance, and that smaller models are a promising way to reduce finetuning and inference costs. DeepMind finds that current language models are undertrained for the number of parameters they have. Chinchilla is a 70B-parameter model trained on 4x more data than Gopher, a 280B-parameter model trained with the same compute budget. Chinchilla significantly outperforms Gopher and other models such as GPT·3 (175B) and Megatron-Turing NLG (530B).

arxiv.org

Mar 29

arxiv.org
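The Chinchilla trade-off above can be checked with a little arithmetic. The sketch below assumes the widely used rule of thumb that training compute is roughly 6 × parameters × tokens; that approximation, the function name, and the 300B-token baseline are illustrative assumptions rather than figures taken from this timeline. It only shows that shrinking the model 4x while growing the data 4x leaves the estimated compute budget unchanged.

```python
def training_flops(params: float, tokens: float) -> float:
    """Estimate training compute with the rule of thumb C ~ 6 * N * D (an assumption)."""
    return 6.0 * params * tokens


# Parameter counts as stated in the entry above.
gopher_params = 280e9
chinchilla_params = 70e9

# Illustrative baseline token count (assumption); only the 4x ratio matters here.
gopher_tokens = 300e9
chinchilla_tokens = 4 * gopher_tokens

gopher_flops = training_flops(gopher_params, gopher_tokens)
chinchilla_flops = training_flops(chinchilla_params, chinchilla_tokens)

# Ratio of the two estimated compute budgets.
ratio = chinchilla_flops / gopher_flops
print(f"Gopher:     {gopher_flops:.3e} FLOPs")
print(f"Chinchilla: {chinchilla_flops:.3e} FLOPs")
print(f"Ratio:      {ratio:.2f}")
```

Under this approximation the two budgets match, which is why the comparison isolates the effect of spending compute on data rather than parameters.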

February 2022

GPT-NeoX

blog.eleuther.ai

Feb 02

blog.eleuther.ai

January 2022

Google introduces LaMDA: a LLM for Dialog

arxiv.org

Jan 20

arxiv.org

December 2021

DeepMind introduces paper about insights from "Scaling Language Models"

arxiv.org

Dec 08

arxiv.org

Researchers announce VATT: Transformers for multimodal learning

arxiv.org

Dec 07

arxiv.org

November 2021

Google introduces paper "Show your work" to help LLMs think "step by step"

arxiv.org

Nov 30

arxiv.org

September 2021

Meta releases GSLMs: Generative Spoken Language Models

ai.facebook.com

Sep 09

ai.facebook.com

August 2021

University researchers announce C5T5: Generate Organic Molecules with Transformers

arxiv.org

Aug 23

arxiv.org

Google introduces paper about limits of code generation models

arxiv.org

Aug 17

arxiv.org

OpenAI Codex, a natural language to code model

openai.com

Aug 10

openai.com

July 2021

Inceptive launches to design RNA molecules using deep learning

inceptive.com

Jul 28

inceptive.com

DeepMind announces AlphaFold: accurate protein structure prediction

DeepMind entered a neural network-based model in the 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy of over 80% and greatly outperforming previous methods.

nature.com

Jul 15

nature.com

June 2021

GPT-J

github.com

Jun 08

github.com

Researchers announce MERLOT: Multimodal Neural Script Knowledge Models

arxiv.org

Jun 04

arxiv.org

Researchers announce PIGLeT: Language grounded in 3D world dynamics

arxiv.org

Jun 02

arxiv.org

May 2021

AnthropicAI launches to research reliable, interpretable, and steerable AI systems.

Anthropic was started by OpenAI alums and has raised over $700M from the likes of Sam Bankman-Fried (founder of FTX) and Jaan Tallinn (founder of Skype).

twitter.com

May 28

twitter.com

March 2021

GPT-Neo

GPT-Neo is an open-source 1.3B and 2.7B parameter GPT2/3-like model released by EleutherAI.

github.com

Mar 21

github.com

January 2021

DALL·E: Creating Images from Text

DALL·E is a 12-billion parameter version of GPT·3 trained to generate images from text descriptions using a dataset of text–image pairs.

openai.com

Jan 05

openai.com

OpenAI introduces CLIP: Connecting Text and Images

openai.com

Jan 05

openai.com

September 2020

Meta introduces wav2vec 2.0: a framework for self-supervised speech representations

ai.facebook.com

Sep 24

ai.facebook.com

August 2020

Andrej Karpathy releases minGPT: an implementation of GPT using PyTorch

A minimal and simple PyTorch re-implementation of the OpenAI GPT model. This was one of the first open-source implementations of GPT.

github.com

Aug 17

github.com

June 2020

Google releases SimCLR: a framework for contrastive visual representations

github.com

Jun 17

github.com

OpenAI releases a transformer model for image completion and sampling

openai.com

Jun 17

openai.com

OpenAI now accessible through API

openai.com

Jun 11

openai.com

April 2020

OpenAI announces Jukebox: A generative model for music

jukebox.openai.com

Apr 30

jukebox.openai.com

December 2019

An LLM-powered text adventure dungeon game

play.aidungeon.io

Dec 05

play.aidungeon.io

November 2019

OpenAI releases the final GPT·2 model (1.5B parameters)

openai.com

Nov 05

openai.com

October 2019

Google releases T5 models for transfer learning

github.com

Oct 23

github.com

Cohere launches startup to provide NLP models as a service

txt.cohere.ai

Oct 05

txt.cohere.ai

HuggingFace announces DistilBERT

arxiv.org

Oct 02

arxiv.org

September 2019

NVIDIA releases Megatron-LM: an efficient way to train LLMs

github.com

Sep 17

github.com

August 2019

OpenAI releases their 774 million parameter GPT·2 language model

openai.com

Aug 20

openai.com

July 2019

Meta announces RoBERTa: An optimized BERT pretraining approach

arxiv.org

Jul 26

arxiv.org

June 2019

Researchers announce XLNet: an improvement to BERT

arxiv.org

Jun 19

arxiv.org

February 2019

OpenAI announces GPT·2

GPT·2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT·2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT·2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.

openai.com

Feb 14

openai.com

October 2018

HuggingFace launches Transformers library

github.com

Oct 29

github.com

Google releases BERT: Bidirectional Encoder Representations from Transformers

BERT revolutionized NLP and paved the way for many LLM developments. It popularized the idea of pre-training on large texts and creating a general NLP model for many tasks.

arxiv.org

Oct 12

arxiv.org

June 2018

State of the art language model from OpenAI (GPT·1)

OpenAI has obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system. Their approach is a combination of two existing ideas: transformers and unsupervised pre-training.

openai.com

Jun 11

openai.com

May 2018

Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone

Google released a demo that can carry out natural phone conversations to accomplish "real world" tasks like booking haircuts or making restaurant reservations.

ai.googleblog.com

May 08

ai.googleblog.com

February 2018

Researchers introduce ELMo (Embeddings from Language Models)

ELMo presents one of the first widely adopted approaches to representing a word as numbers based on the word's context. This was a watershed idea in NLP and led to significant advancements in model architectures.

arxiv.org

Feb 16

arxiv.org

June 2017

Google introduces Transformer models in the “Attention Is All You Need” paper

Google Brain researchers introduce a new, simple network architecture based solely on attention mechanisms, which helped make large-scale self-supervised learning practical. This monumental paper enabled many of the latest advances in AI/ML.

arxiv.org

Jun 12

arxiv.org

September 2016

DeepMind announces WaveNet: A generative model for audio

deepmind.com

Sep 08

deepmind.com

January 2016

DeepMind announces AlphaGo: first program to defeat a pro Go player

AlphaGo is a model trained through reinforcement learning and supervised learning from human expert Go games. AlphaGo achieved a 99.8% win rate against other Go programs and defeated the human European Go champion by 5 games to 0.

nature.com

Jan 27

nature.com

June 2014

Researchers propose GANs: generative adversarial networks

GANs are generative models built from two models trained against each other: a generative model G that generates new examples and a discriminative model D that tries to classify examples as real or fake. This novel design led to breakthrough ML capabilities in image, video, and voice generation.

arxiv.org

Jun 10

arxiv.org

September 2012

Groundbreaking object classification results using a convolutional neural network

Now considered one of the most influential papers published in computer vision, AlexNet is a convolutional neural network that significantly outperformed the previous state of the art on object recognition benchmarks.

proceedings.neurips.cc

Sep 30

proceedings.neurips.cc

August 2009

Researchers announce ImageNet: an open and diverse dataset

ImageNet was one of the largest and most diverse annotated image datasets of its time. It inspired and kicked off a wave of computer vision advances.

ieeexplore.ieee.org

Aug 18

ieeexplore.ieee.org

Introducing Meta Llama 3: The most capable openly available LLM to date

The Claude 3 Model Family: Opus, Sonnet, Haiku

Gemma: Open Models Based on Gemini Research and Technology

Sora, a text-to-video model from OpenAI

Gemini 1.5: long-context up to 10 million tokens

Revisiting Feature Prediction for Learning Visual Representations from Video

OLMo: Accelerating the Science of Language Models

Nomic Embed: Training a Reproducible Long Context Text Embedder

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Fuyu Heavy, a new multimodal model from Adept for UI understanding “behind only GPT-4 and Gemini”

Lumiere: A Space-Time Diffusion Model for Video Generation

mixtral-8x7b-32kseqlen

ReconFusion: 3D Reconstruction with Diffusion Priors

NVIDIA announces model for high-resolution text to video generation

StabilityAI released StableLM: an open source LLM with 3B and 7B parameters

Meta releases DINOv2, the first method for training computer vision models that uses self-supervised learning

OpenAI released Consistency Models: generative models that support fast one-step generation of images

Meta releases SAM: Segment Anything Model to segment any object from any photo or video

ChatGPT adds plugin capabilities (including a web browser and Python interpreter)

GitHub announces Copilot X: chat, voice, PRs, QA for coding

Microsoft introduces 365 Copilot - bringing AI assistance into the Office suite

OpenAI releases GPT-4: a multimodal LLM with longer context windows

Anthropic releases Claude and Claude Instant: AI assistant APIs

Google announces PaLM API and new AI features in workspace

Stanford researchers introduce Alpaca 7B: an instruct tuned LLaMA model comparable to GPT3

Google Brain releases a new optimization algorithm called: Lion (EvoLved Sign Momentum)

Researchers release ControlNet: control large diffusion models using inputs beyond just text prompts.

StabilityAI launched MedARC: a research community building medical foundation models

Microsoft previews AI-powered browser in Edge

Microsoft announces The New Bing featuring an OpenAI integration

Google announces Bard, a search integrated chat bot

Runway announces Gen-1: a video-to-video generation model

Google announces Dreamix: Video Diffusion Models are General Video Editors

Microsoft integrates LLMs into Teams

OpenAI announces a new AI classifier for indicating AI-written text

Google announces MusicLM: a model to generate music from text

Microsoft announces Azure OpenAI Service

OpenAI announces API for ChatGPT

Sourcegraph announces an in-editor coding assistant called Cody built on Anthropic's Claude LLM

Google announces DreamerV3: the first general and scalable RL algorithm for playing games

Microsoft announces VALL-E: a LLM to generate speech from text

Apple announces new feature to generate audiobooks using AI

Researchers release a new benchmark: Holistic Evaluation of Language Models (HELM)

Google announces Muse: a text to image transformer model

Researchers train language models using one GPU for one day

Google introduces Med-PaLM: a LLM built on Flan-PaLM to answer medical questions

Meta announces plans to open-source OPT-IML: a LLM fine tuned on 2000+ tasks, to researchers

OpenAI open-sources Point-E: a diffusion model that generates 3D models

Text to audio generation (using Stable Diffusion)

New and Improved Embedding Model: text-embedding-ada-002

Meta introduces RA-CM3, the first multimodal model that can generate mixtures of images and text

OpenAI demos a model ('ChatGPT') for conversational interactions

We’re the research team working on Meta AI’s Universal Speech Translator project — Ask us anything!

OpenAI releases new GPT-3 model (text-davinci-003)

Meta presents CICERO: an AI that can play Diplomacy

Magic3D: Nvidia's high-resolution text-to-3D content creation tool

Notion introduces AI-assisted writing features

Galactica, a large language model for science

Metaphor releases search engine powered by LLMs

Google releases InfiniteNature: a model to generate high-resolution 3D flythroughs from an image

Midjourney starts alpha testing for new v4 model

DALL·E API now available in public beta

OpenAI announces startup program offering early access to models and $1m investment

Google releases AI-generated video from a long text prompt demo

Meta released ESMFold: A LLM to predict protein structures

Lex Fridman interviews Andrej Karpathy (formerly AI Lead at Tesla)

Interactive storybook demo powered by Stable Diffusion

BigCode releases The Stack: the largest code dataset (notably, all permissively licensed)
