A memorial to the benchmarks that defined—and were defeated by—AI progress.
This website is meant to be a bit of fun, and to help us take a look back and remember the massive amount of progress that has been made — much of which I didn't believe I'd see within my lifetime.
It has also been heavily inspired by Cody Ogden's Killed by Google.
For KilledByLLM, "Saturation" means a benchmark can no longer measure the frontier. These benchmarks are still incredibly useful, valuable tools — they just can no longer meaningfully answer the question "Can AI do X?"
This project is a best-effort attempt to document notable benchmarks that have been saturated by LLMs. Attribution, timing, and scores have been difficult to pin down definitively, so there may be some errors.
To illustrate this, consider the reported MATH scores for Qwen2.5-72B-Instruct:
- Qwen's technical report: 83.1
- Stanford's HELM: 79.0
- Hugging Face's Open LLM Leaderboard: 38.7
These scores deviate significantly from one another!
We take scores in the following order of priority:
- From the author's paper/technical report/model card
- From successor benchmark papers (e.g. the SQuAD 2.0 paper discusses SQuAD 1.1 performance)
- From third party sources (e.g. Stanford's HELM)
Please raise an issue or open a PR if you spot any discrepancies!
```bash
# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build
```
Benchmarks are defined in `src/data.ts`; a rough sketch of an entry is shown below. To add a new benchmark or update existing data:
- Fork the repository
- Add/modify the benchmark data
- Submit a PR with your changes
Please include:
- Links to papers/evidence supporting the saturation claim
- Original and final scores
- The model that first achieved saturation
- Dates of creation and saturation
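For reference, here is a minimal sketch of the kind of fields an entry needs to cover. The field names and shape below are assumptions for illustration, not the actual interface; check the existing entries in `src/data.ts` for the real schema before submitting.

```ts
// Hypothetical sketch of a benchmark entry. The real interface in src/data.ts
// may use different field names; mirror the existing entries when contributing.
interface Benchmark {
  name: string;           // benchmark name, e.g. "MATH"
  paperUrl: string;       // link to the original benchmark paper
  evidenceUrl: string;    // link supporting the saturation claim
  originalScore: number;  // score reported when the benchmark was introduced
  finalScore: number;     // score at which it stopped measuring the frontier
  saturatedBy: string;    // model that first achieved saturation
  created: string;        // date the benchmark was published (YYYY-MM-DD)
  saturated: string;      // date saturation was reached (YYYY-MM-DD)
}

// Illustrative entry with placeholder values only:
const exampleBenchmark: Benchmark = {
  name: "ExampleBench",
  paperUrl: "https://example.com/benchmark-paper",
  evidenceUrl: "https://example.com/saturation-evidence",
  originalScore: 45.0,
  finalScore: 95.0,
  saturatedBy: "Some-LLM-72B",
  created: "2019-01-01",
  saturated: "2024-01-01",
};
```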
MIT
P.S. The em dashes on this page were lovingly handwritten by humans.