Chessort - Data Generation

Overview

This is a helper script that pre-generates Chessort games from FEN strings using the Stockfish engine. It retrieves the top N moves for each position, filters out positions that yield fewer than a minimum number of moves, and saves the results to CSV files along with metadata.

Folder Structure

  • out/: This folder contains the output files generated by the script. Each output file represents a chunk of processed data along with its corresponding metadata file. Move files from this folder into chunks when they are ready to be used for the game.
  • chunks/: This folder is for pre-processed chunks of data that are ready for further use or integration into a database. Each chunk is capped at a limit of 1000.
  • lichess-data/: This folder holds the CSV files from the Lichess Open Database project (Lichess Puzzles). The actual raw files are too large and are not included directly here. The input file must be named lichess_db_puzzle.csv. Refer to lichess_db_puzzle.csv.sha256 to identify the version we're using.

Scripts

generate.py

The generate.py script processes chess positions from the Lichess puzzle CSV file using the Stockfish engine. See the header inside the file for more information.

criteria_filter.py

The criteria_filter.py script helps determine how many results match specific filter criteria. Use it to help configure generate.py.
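As a rough illustration of the kind of count this involves, the sketch below tallies how many puzzle rows would pass a set of thresholds. It is not the actual criteria_filter.py logic. The column names (Popularity, NbPlays, RatingDeviation) are assumed from the published Lichess puzzle CSV format, the function names are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
import csv


def count_matching(rows, min_popularity, min_plays, max_rating_deviation):
    """Count rows that satisfy all three thresholds (inclusive; an assumption)."""
    return sum(
        1
        for row in rows
        if int(row["Popularity"]) >= min_popularity
        and int(row["NbPlays"]) >= min_plays
        and int(row["RatingDeviation"]) <= max_rating_deviation
    )


def count_in_file(path, min_popularity, min_plays, max_rating_deviation):
    """Stream the CSV so the large input file is never loaded fully into memory."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_matching(
            csv.DictReader(handle), min_popularity, min_plays, max_rating_deviation
        )
```

For example, count_in_file("lichess-data/lichess_db_puzzle.csv", 90, 100, 100) would use the same threshold values that appear in the example metadata file.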

Functionality

  • Analyzes chess positions from FEN strings using Stockfish.
  • Retrieves the top N moves for each position.
  • Filters out positions that yield fewer than the minimum required number of moves.
  • Saves the results to CSV files.
  • Generates metadata for each chunk of processed data.
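The retrieve-and-filter steps above can be sketched as follows. This is a minimal illustration, assuming the engine is driven over the plain UCI protocol with MultiPV enabled and that its "info" lines have already been captured as text. It is not how generate.py is necessarily written, and top_moves, format_score and has_enough_moves are hypothetical names.

```python
def format_score(kind, value):
    """Render a score in the style of the example CSV: "+370" or "#-2"."""
    return f"#{value:+d}" if kind == "mate" else f"{value:+d}"


def top_moves(info_lines, depth):
    """Return [(move, score)] ordered by multipv rank, using only lines at `depth`."""
    required = {"depth", "multipv", "score", "pv"}
    ranked = {}
    for line in info_lines:
        tokens = line.split()
        # Skip anything that is not an "info" line carrying all required fields.
        if tokens[:1] != ["info"] or not required <= set(tokens):
            continue
        if int(tokens[tokens.index("depth") + 1]) != depth:
            continue
        rank = int(tokens[tokens.index("multipv") + 1])
        score_at = tokens.index("score")
        kind, value = tokens[score_at + 1], int(tokens[score_at + 2])
        move = tokens[tokens.index("pv") + 1]  # first move of the principal variation
        # A later line for the same rank replaces the earlier one.
        ranked[rank] = (move, format_score(kind, value))
    return [ranked[rank] for rank in sorted(ranked)]


def has_enough_moves(moves, minimum_moves_required):
    """Keep a position only if the engine returned enough candidate moves."""
    return len(moves) >= minimum_moves_required
```

A position whose move list fails has_enough_moves would simply be left out of the output CSV in this sketch.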

How to Run

  1. Set up Stockfish: Download and install Stockfish from the Stockfish website. Set the STOCKFISH_PATH environment variable to the path of the Stockfish engine.
  2. Prepare Input Data: Place the Lichess puzzle CSV file in the lichess-data/ folder. The input file must be named lichess_db_puzzle.csv.
  3. Run the Script:
    python generate.py
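Before starting a long run, it may be worth confirming that both prerequisites from the steps above are in place. The helper below is a small sketch of such a check. The function name missing_prerequisites is hypothetical, and the check assumes STOCKFISH_PATH points directly at the engine file.

```python
import os
from pathlib import Path

INPUT_NAME = "lichess_db_puzzle.csv"  # required input file name from the steps above


def missing_prerequisites(env, base_dir):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    engine = env.get("STOCKFISH_PATH")
    if not engine:
        problems.append("STOCKFISH_PATH is not set")
    elif not Path(engine).is_file():
        problems.append(f"STOCKFISH_PATH does not point to a file: {engine}")
    input_file = Path(base_dir) / "lichess-data" / INPUT_NAME
    if not input_file.is_file():
        problems.append(f"missing input file: {input_file}")
    return problems
```

Calling missing_prerequisites(os.environ, ".") from the script's folder returns an empty list when everything is in place.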

Output

  • CSV Files: The processed data is saved in the out/ folder as CSV files with the format chessort-{offset}-{limit}.csv.
  • Metadata Files: Each CSV file is accompanied by a metadata JSON file with the same prefix, containing details about the processing parameters and file hashes.
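The naming scheme makes it possible to recover the offset and limit from a file name and to locate the matching metadata file. The following sketch shows one way to do that. The describe_chunk helper is hypothetical and relies only on the chessort-{offset}-{limit}.csv pattern and the .metadata.json suffix shown in this README.

```python
import re
from pathlib import Path

CHUNK_NAME = re.compile(r"^chessort-(?P<offset>\d+)-(?P<limit>\d+)\.csv$")


def describe_chunk(csv_path):
    """Return (offset, limit, metadata_path) for a chunk CSV path."""
    path = Path(csv_path)
    match = CHUNK_NAME.match(path.name)
    if match is None:
        raise ValueError(f"not a chunk file name: {path.name}")
    # The metadata file shares the CSV's prefix and sits next to it.
    metadata_path = path.with_name(path.stem + ".metadata.json")
    return int(match["offset"]), int(match["limit"]), metadata_path
```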

Example

After running the script, you might see the following files in the out/ folder:

  • chessort-10000-10.csv
  • chessort-10000-10.metadata.json

These files represent a chunk of 10 processed lines starting from the offset of 10,000 in the Lichess puzzle CSV file, along with their processing metadata.

Example Files

chessort-10000-10.csv

LichessPuzzleId,FEN,Rating,PreLastMovePositionEvaluation,LastMove,CurrentPositionEvaluation,EvaluatedMoves
09XNg,3rr1k1/ppp1qppp/5n2/8/1PPn4/P6P/1B1PBP2/RN1QR1K1 b - - 0 14,1425,-48,g2h3 -370,+370,"d4e2 +370,e7e6 +187,f6e4 +17,e7e5 -2,f6h5 -22,f6d5 -25,d4f3 -112,d4f5 -183,f6d7 -201,e7e4 -214"
09Xh7,B4rk1/p1p5/3bp3/5pq1/3P2n1/2P5/PPQ1Pp2/R1B2K1R b - - 1 20,1871,+107,e1f1 #-2,#+2,"g4h2 #+2,d6f4 -65,g5g7 -65,g5g6 -152,g5f6 -239,g5e7 -254,g5d8 -318,g5e3 -383,g4e3 -385,g5f4 -387"
09Xpo,2b1r1k1/4pp1p/3p1npB/5qP1/2Q5/5Pr1/1P2B2K/1R3R2 w - - 3 30,1887,+360,d5f5 -62,+62,"h2g3 +62,c4h4 -509,c4f7 -513,c4c8 -558,c4g4 -590,c4e6 -626,g5f6 #-1,b1c1 #-1,e2d3 #-1,h6f8 #-1"
09Xtv,8/8/5k2/8/ppp3PP/2P2P2/PP3K2/8 b - - 0 36,2045,+530,h2h4 +37,-37,"a4a3 -37,f6g6 -543,b4b3 -543,b4c3 -555,f6e5 -557,f6f7 -559,f6e6 -570,f6e7 -574,f6g7 -576"
09XVr,r5k1/4Rp2/1qP3pp/1p1pQN2/1P6/6P1/r4PP1/4R1K1 b - - 0 39,973,+429,d4f5 #-2,#+2,"b6f2 #+2,g6f5 -210,b6d4 -770,f7f6 #-2,h6h5 #-1,a2f2 #-1,a8e8 #-1,a2c2 #-1,a8f8 #-1,a2a1 #-1"
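The EvaluatedMoves column packs several "move score" pairs into one quoted field, so a consumer needs a second parsing step after reading the CSV. The sketch below shows one possible way to unpack it, using a row copied from the example above. The reading of "#" as a mate score and of plain signed numbers as centipawns is inferred from the sample data, and parse_score and parse_evaluated_moves are hypothetical names.

```python
import csv
import io

SAMPLE = '''LichessPuzzleId,FEN,Rating,PreLastMovePositionEvaluation,LastMove,CurrentPositionEvaluation,EvaluatedMoves
09Xh7,B4rk1/p1p5/3bp3/5pq1/3P2n1/2P5/PPQ1Pp2/R1B2K1R b - - 1 20,1871,+107,e1f1 #-2,#+2,"g4h2 #+2,d6f4 -65,g5g7 -65,g5g6 -152,g5f6 -239,g5e7 -254,g5d8 -318,g5e3 -383,g4e3 -385,g5f4 -387"
'''


def parse_score(text):
    """Split a score into (kind, value): "#+2" -> ("mate", 2), "-65" -> ("cp", -65)."""
    if text.startswith("#"):
        return ("mate", int(text[1:]))
    return ("cp", int(text))


def parse_evaluated_moves(field):
    """Turn "g4h2 #+2,d6f4 -65" into [("g4h2", ("mate", 2)), ("d6f4", ("cp", -65))]."""
    moves = []
    for item in field.split(","):
        move, score = item.split(" ")
        moves.append((move, parse_score(score)))
    return moves


def read_chunk(text):
    """Parse chunk CSV text into dicts with EvaluatedMoves already unpacked."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["EvaluatedMoves"] = parse_evaluated_moves(row["EvaluatedMoves"])
    return rows
```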

chessort-10000-10.metadata.json

{
    "stockfishVersion": "16.1",
    "offset": 10000,
    "limit": 10,
    "evaluationDepth": 10,
    "multipv": 10,
    "minimumMovesRequired": 4,
    "minPopularityRequired": 90,
    "minNumberPlaysRequired": 100,
    "maxRatingDeviation": 100,
    "inputLichessFileSha256": "a480b5c25389d653800889bcf223d32a622249bd3d6ba3e210b8c75bc8092300",
    "outputFileSha256": "3c3fcc7e1f077d5299c903da2495ee170b196f34aa147d2d816dcba813f7362f"
}
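Because the metadata records a hash of the output file, a chunk can be checked for accidental modification before it is moved into chunks/. The sketch below assumes that outputFileSha256 is the SHA-256 digest of the raw bytes of the CSV file, which this README does not state explicitly. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Hash a file in blocks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def chunk_matches_metadata(csv_path, metadata_path):
    """Compare the CSV's digest with the outputFileSha256 value in its metadata."""
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    return sha256_of(csv_path) == metadata["outputFileSha256"]
```

The same sha256_of helper could be applied to the input file and compared with inputLichessFileSha256, under the same assumption.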