This is a helper script designed to pre-generate Chessort games from FEN strings using the Stockfish engine. It retrieves the top N moves, filters out positions that do not meet a minimum number of moves, and saves the results to CSV files along with metadata.
- `out/`: This folder contains the output files generated by the script. Each CSV output file holds one chunk of processed data and is paired with a metadata file. Move files from this folder into `chunks/` when they are ready to be used for the game.
- `chunks/`: This folder is for pre-processed chunks of data that are ready for further use or integration into a database. We enforce a limit of 1000 for each chunk.
- `lichess-data/`: This folder holds the CSV files from the Lichess Open Database project (Lichess Puzzles). The raw files are too large to be included here. The input file must be named `lichess_db_puzzle.csv`. Refer to `lichess_db_puzzle.csv.sha256` to identify the version we're using.
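To confirm that your local `lichess_db_puzzle.csv` is the pinned version, you can hash it and compare the result with the checksum file. This is a minimal sketch, not part of the project's scripts; it assumes the `.sha256` file starts with the hex digest, as written by `sha256sum` (`<digest>  <filename>`). The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in blocks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_pinned_version(csv_path: Path, sha_path: Path) -> bool:
    """Compare the CSV digest with the first token of the .sha256 file.

    Assumption: the checksum file uses the `sha256sum` layout "<digest>  <filename>".
    """
    expected = sha_path.read_text().split()[0].lower()
    return sha256_of_file(csv_path) == expected
```

For example, `matches_pinned_version(Path("lichess-data/lichess_db_puzzle.csv"), Path("lichess-data/lichess_db_puzzle.csv.sha256"))` returns `True` only when the file matches. Reading in blocks matters here because the raw puzzle file is large.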
The `generate.py` script processes chess positions from the Lichess puzzle CSV file using the Stockfish engine. See the header at the top of the file for more information.

The `criteria_filter.py` script reports how many puzzles match specific filter criteria. Use it to help configure `generate.py`.
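The kind of count `criteria_filter.py` produces can be sketched as follows. This is an illustrative sketch, not the script's actual code: it assumes the Lichess puzzle CSV has a header row with the `Popularity`, `NbPlays` and `RatingDeviation` columns, and that the thresholds (matching `minPopularityRequired`, `minNumberPlaysRequired` and `maxRatingDeviation` in the metadata example) are inclusive. The function name is hypothetical.

```python
import csv
from typing import TextIO


def count_matching(
    handle: TextIO,
    min_popularity: int = 90,
    min_plays: int = 100,
    max_rating_deviation: int = 100,
) -> int:
    """Count puzzles that pass popularity, play-count and rating-deviation thresholds.

    Assumption: thresholds are inclusive (>= for minimums, <= for the maximum).
    """
    count = 0
    for row in csv.DictReader(handle):
        if (
            int(row["Popularity"]) >= min_popularity
            and int(row["NbPlays"]) >= min_plays
            and int(row["RatingDeviation"]) <= max_rating_deviation
        ):
            count += 1
    return count
```

Running this with a few candidate threshold values gives a quick sense of how many puzzles each configuration would keep before committing to a long engine run.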
- Analyzes chess positions from FEN strings using Stockfish.
- Retrieves the top N moves for each position.
- Filters out positions that do not meet a minimum number of moves.
- Saves the results to CSV files.
- Generates metadata for each chunk of processed data.
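Retrieving the top N moves relies on Stockfish's `MultiPV` option: the engine prints one UCI `info` line per candidate move, tagged with `multipv <k>`, a `score cp <n>` or `score mate <n>`, and a `pv` whose first move is the candidate. The sketch below parses such lines into the `move score` strings seen in the `EvaluatedMoves` column and applies a minimum-move filter. It is an assumption-based illustration, not `generate.py`'s implementation; how the real script talks to the engine and formats a score of zero (this sketch prints `+0`) may differ. `parse_multipv` and `meets_minimum` are hypothetical names.

```python
from typing import Iterable, List


def parse_multipv(lines: Iterable[str]) -> List[str]:
    """Collect one 'move score' string per MultiPV slot from Stockfish 'info' lines."""
    best = {}
    for line in lines:
        tokens = line.split()
        # Skip non-info output and info lines without a score and principal variation.
        if not tokens or tokens[0] != "info" or "score" not in tokens or "pv" not in tokens:
            continue
        # Bound scores are provisional; wait for the exact score at that depth.
        if "lowerbound" in tokens or "upperbound" in tokens:
            continue
        slot = int(tokens[tokens.index("multipv") + 1]) if "multipv" in tokens else 1
        i = tokens.index("score")
        kind, value = tokens[i + 1], int(tokens[i + 2])
        move = tokens[tokens.index("pv") + 1]
        # Format as '+370' / '-22' for centipawns and '#+2' / '#-1' for mates.
        score = f"#{value:+d}" if kind == "mate" else f"{value:+d}"
        # Later lines come from deeper iterations, so they overwrite earlier ones.
        best[slot] = f"{move} {score}"
    return [best[slot] for slot in sorted(best)]


def meets_minimum(moves: List[str], minimum_moves: int = 4) -> bool:
    """Keep a position only if the engine returned at least `minimum_moves` moves."""
    return len(moves) >= minimum_moves
```

Joining the result with `","` yields a string in the same shape as the `EvaluatedMoves` field.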
- Set Up Stockfish: Download and install Stockfish from the Stockfish website. Make sure the `STOCKFISH_PATH` environment variable is set to the path of the Stockfish executable.
- Prepare Input Data: Place the Lichess puzzle CSV file in the `lichess-data/` folder. The input file must be named `lichess_db_puzzle.csv`.
- Run the Script: `python generate.py`
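A misconfigured `STOCKFISH_PATH` is easiest to catch before any processing starts. The check below is a hypothetical helper, not code from `generate.py`; it only confirms that the variable is set and names an executable file.

```python
import os
from pathlib import Path


def stockfish_path_from_env(var: str = "STOCKFISH_PATH") -> Path:
    """Return the Stockfish binary named by the environment, or raise a clear error."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; point it at the Stockfish executable")
    path = Path(value).expanduser()
    # The path must exist as a regular file and be executable by the current user.
    if not path.is_file() or not os.access(path, os.X_OK):
        raise RuntimeError(f"{var}={value!r} is not an executable file")
    return path
```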
- CSV Files: The processed data is saved in the `out/` folder as CSV files named `chessort-{offset}-{limit}.csv`.
- Metadata Files: Each CSV file is accompanied by a metadata JSON file with the same prefix (`chessort-{offset}-{limit}.metadata.json`), containing details about the processing parameters and file hashes.
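The naming scheme and the CSV/metadata pairing can be illustrated with the sketch below. It is not the script's writer: the column list is copied from the example output, while the `\n` line terminator and the way extra parameters are merged into the metadata are assumptions, and `write_chunk` is a hypothetical name. The `outputFileSha256` value is the SHA-256 of the exact CSV bytes, so any change in quoting or line endings changes the hash.

```python
import csv
import hashlib
import json
from pathlib import Path
from typing import Dict, List

# Column order as shown in the example CSV output.
FIELDS = ["LichessPuzzleId", "FEN", "Rating", "PreLastMovePositionEvaluation",
          "LastMove", "CurrentPositionEvaluation", "EvaluatedMoves"]


def write_chunk(out_dir: Path, offset: int, limit: int,
                rows: List[Dict], params: Dict) -> Path:
    """Write chessort-{offset}-{limit}.csv and its .metadata.json; return the CSV path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = f"chessort-{offset}-{limit}"
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="") as handle:
        # Assumption: Unix line endings; csv's default would be "\r\n".
        writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    metadata = {**params, "offset": offset, "limit": limit,
                "outputFileSha256": hashlib.sha256(csv_path.read_bytes()).hexdigest()}
    (out_dir / f"{stem}.metadata.json").write_text(json.dumps(metadata, indent=2))
    return csv_path
```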
After running the script, you might see the following files in the `out/` folder:

- `chessort-10000-10.csv`
- `chessort-10000-10.metadata.json`

These files represent a chunk of 10 processed lines starting at offset 10,000 in the Lichess puzzle CSV file, along with their processing metadata.
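Selecting the window of input rows described by `offset` and `limit` can be done lazily, without loading the whole puzzle file. This sketch assumes the offset is a 0-based index into the data rows with the header excluded; the real script may count differently. `read_window` is a hypothetical name.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, TextIO


def read_window(handle: TextIO, offset: int, limit: int) -> Iterator[Dict[str, str]]:
    """Yield `limit` puzzle rows starting at data row `offset` (0-based, header excluded)."""
    # islice skips the first `offset` rows without materializing them.
    return islice(csv.DictReader(handle), offset, offset + limit)
```

With `offset=10000` and `limit=10`, this yields the ten input rows that a file named `chessort-10000-10.csv` would be built from, before any filtering removes positions.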
Example CSV output:

```csv
LichessPuzzleId,FEN,Rating,PreLastMovePositionEvaluation,LastMove,CurrentPositionEvaluation,EvaluatedMoves
09XNg,3rr1k1/ppp1qppp/5n2/8/1PPn4/P6P/1B1PBP2/RN1QR1K1 b - - 0 14,1425,-48,g2h3 -370,+370,"d4e2 +370,e7e6 +187,f6e4 +17,e7e5 -2,f6h5 -22,f6d5 -25,d4f3 -112,d4f5 -183,f6d7 -201,e7e4 -214"
09Xh7,B4rk1/p1p5/3bp3/5pq1/3P2n1/2P5/PPQ1Pp2/R1B2K1R b - - 1 20,1871,+107,e1f1 #-2,#+2,"g4h2 #+2,d6f4 -65,g5g7 -65,g5g6 -152,g5f6 -239,g5e7 -254,g5d8 -318,g5e3 -383,g4e3 -385,g5f4 -387"
09Xpo,2b1r1k1/4pp1p/3p1npB/5qP1/2Q5/5Pr1/1P2B2K/1R3R2 w - - 3 30,1887,+360,d5f5 -62,+62,"h2g3 +62,c4h4 -509,c4f7 -513,c4c8 -558,c4g4 -590,c4e6 -626,g5f6 #-1,b1c1 #-1,e2d3 #-1,h6f8 #-1"
09Xtv,8/8/5k2/8/ppp3PP/2P2P2/PP3K2/8 b - - 0 36,2045,+530,h2h4 +37,-37,"a4a3 -37,f6g6 -543,b4b3 -543,b4c3 -555,f6e5 -557,f6f7 -559,f6e6 -570,f6e7 -574,f6g7 -576"
09XVr,r5k1/4Rp2/1qP3pp/1p1pQN2/1P6/6P1/r4PP1/4R1K1 b - - 0 39,973,+429,d4f5 #-2,#+2,"b6f2 #+2,g6f5 -210,b6d4 -770,f7f6 #-2,h6h5 #-1,a2f2 #-1,a8e8 #-1,a2c2 #-1,a8f8 #-1,a2a1 #-1"
```
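In the sample rows, `EvaluatedMoves` lists moves from best to worst, with centipawn scores such as `+370` and mate scores such as `#+2` (delivering mate) or `#-1` (being mated). The sketch below parses that field and checks the ordering using a sort key that ranks mates-for above any centipawn score, and mates-against below it. The key is an assumption for illustration, not the game's own comparison logic, and the function names are hypothetical.

```python
from typing import List, Tuple


def score_key(score: str) -> Tuple[int, int]:
    """Map '+370', '-22', '#+2' or '#-1' to a key where larger means better for the mover."""
    if score.startswith("#"):
        mate_in = int(score[1:])
        # Delivering mate sooner is better; being mated later is better.
        return (2, -mate_in) if mate_in > 0 else (0, -mate_in)
    return (1, int(score))


def parse_evaluated_moves(field: str) -> List[Tuple[str, str]]:
    """Split 'd4e2 +370,e7e6 +187' into [('d4e2', '+370'), ('e7e6', '+187')]."""
    return [(move, score) for move, score in (item.split() for item in field.split(","))]


def is_sorted_best_first(field: str) -> bool:
    """Check that scores never improve further down the list."""
    keys = [score_key(score) for _, score in parse_evaluated_moves(field)]
    return all(a >= b for a, b in zip(keys, keys[1:]))
```

Applied to the `09XVr` row, the keys run from `#+2` through `-210` and `-770` down to `#-2` and then `#-1`, so the check passes.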
Example metadata file:

```json
{
  "stockfishVersion": "16.1",
  "offset": 10000,
  "limit": 10,
  "evaluationDepth": 10,
  "multipv": 10,
  "minimumMovesRequired": 4,
  "minPopularityRequired": 90,
  "minNumberPlaysRequired": 100,
  "maxRatingDeviation": 100,
  "inputLichessFileSha256": "a480b5c25389d653800889bcf223d32a622249bd3d6ba3e210b8c75bc8092300",
  "outputFileSha256": "3c3fcc7e1f077d5299c903da2495ee170b196f34aa147d2d816dcba813f7362f"
}
```
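Before moving a file from `out/` into `chunks/`, its metadata can be used as a quick integrity check. This hypothetical helper recomputes the CSV's SHA-256 and compares it with `outputFileSha256`; it also assumes that the 1000 limit for chunks refers to the `limit` value recorded in the metadata, which may not match how the project enforces it.

```python
import hashlib
import json
from pathlib import Path

# Assumption: the chunk limit applies to the "limit" field in the metadata.
MAX_CHUNK_LIMIT = 1000


def ready_for_chunks(csv_path: Path) -> bool:
    """Return True if the CSV matches its metadata hash and respects the chunk limit."""
    meta_path = csv_path.with_name(csv_path.stem + ".metadata.json")
    metadata = json.loads(meta_path.read_text())
    digest = hashlib.sha256(csv_path.read_bytes()).hexdigest()
    return digest == metadata["outputFileSha256"] and metadata["limit"] <= MAX_CHUNK_LIMIT
```

For `out/chessort-10000-10.csv`, this reads `out/chessort-10000-10.metadata.json` next to it.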