- Blind mode tutorial
lichess.org
Donate

Aix: Efficiently storing and querying chess game collections

ChessSoftware DevelopmentAnalysis
The Lichess database contains over 7 billion chess games played on Lichess. To make it easier to query these games, I have released the open-source Aix extension for DuckDB and the accompanying Aix-compatible Lichess database.

This blog post is a shortened version of the more in-depth post on my website.

Introduction

Every month, the Lichess database grows about 100 million games in size. All together, the size of the compressed PGN files adds up to over 2 TB. Uncompressed, they would be over 15 TB. This is not trivial to query.

Several tools exist that can handle queries on large chess databases, such as Scoutfish, pgn-extract or CQL. These tools are very useful, but also have their limitations:

  • They require uncompressed PGN files, which can take up a lot of space.
  • They only implement filtering, other query operations (e.g., aggregations) are not supported.
  • They only execute queries on the game’s moves (and perhaps PGN tags), while clock times and engine evaluations are out of scope.

Any processing beyond the above items would need custom code with the help of libraries such as python-chess or shakmaty.

To make it easier to query such massive databases, I developed Aix. Now, with the Aix extension for DuckDB, and the Aix-compatible Lichess database (or pgn-to-aix for your own PGNs), you can execute SQL queries over (Lichess) games. For example, to generate a heatmap of king move destinations, this query can do the job:

with king_destinations as (
    select
        move_details(movedata)
            .list_filter(lambda m: m.role = 'k')
            .apply(lambda m: m.to)
        as destinations
    from 'aix_lichess_2025-12_low.parquet'
),
unnested as (
    select unnest(destinations) as destination from king_destinations
),
aggregated as (
    select destination, count() from unnested group by 1 order by 2 desc
)

from aggregated;

Which results in:


 destination  count_star() 
   varchar       int64     

 g1               74020594 
 g8               71579360 
 g7               23388424 
...

The move_details function from the Aix extension makes this possible – the other functions (such as list_filter) are core functions from DuckDB and are very powerful in combination with the Aix functions.

The Aix-compatible Lichess database offers three compression levels for the encoded chess games: Low, Medium, and High. A lower compression level provides faster decoding, so the choice between these levels enables a trade-off between speed and space usage. On the December 2025 file with Low compression (15.5 GB), the heatmap query takes about 92 seconds (AMD Ryzen Threadripper 3960X, with DuckDB thread count limited to 24). On the Medium (13.5 GB) and High (11.8 GB) compression files, this takes 104 and 241 seconds respectively.

Getting started with Aix

Start by installing DuckDB 1.4.4, then install the Aix extension. The extension is now available in the Community Extensions as aixchess, so you can install it with the following DuckDB command:

INSTALL aixchess FROM community;

(If you installed the extension earlier from the files in the GitHub release, do FORCE INSTALL aixchess FROM community;.)

Then load the extension (also do this for every new DuckDB session):

LOAD aixchess;

Now all of the extension functions are available. Download one of the Aix-compatible Lichess database files on Hugging Face and try out the above heatmap query! To experiment on a smaller file, download one of the older months or derive a smaller file from a large one using SQL’s LIMIT clause.

(At the time of writing, the dataset on Hugging Face is still very incomplete, as I’m still generating and uploading Aix files. If you want to try Aix on a month that’s not available yet, you could use pgn-to-aix to generate it already.)

If you have your own PGN file that you want to use Aix on, you first need to convert it to an Aix-compatible Parquet file using pgn-to-aix.

The functions documentation provides further guidance, and the unit tests also provide some simple usage examples.

If you want to process the Aix database files in a manner too complex for an SQL query, you can decode Aix-encoded games using the Rust crate aix-chess-compression. Read the rows from a database using a crate such as duckdb or parquet, then use aix-chess-compression to decode the moves/positions.

Further reading

If you'd like to learn more, refer to the more in-depth blog post on my website. Don't forget to try out Aix yourself and let me know your thoughts!