A powerful and versatile yet easy-to-use game scraper written in C++ for use with multiple frontends running on a RetroPie system. It scrapes game resources from various web sources, including media such as screenshots, covers and videos.
Currently supports the following frontends (set with '-f'):
- EmulationStation
- AttractMode
Currently supports the following platforms (set with '-p'):
- Amiga (OCS, ECS, AGA, CD32, CDTV)
- Apple 2
- Arcade
- Atari2600
- Atari5200
- Atari7800
- Atari Jaguar
- Atari ST
- ColecoVision
- Commodore 64
- Game Boy
- Game Boy Advance
- Game Boy Color
- Megadrive / Genesis
- MSX (MSX, MSX 2, MSX 2+, MSX Laserdisc)
- NeoGeo
- Nintendo 64
- Nintendo DS
- Nintendo Entertainment System
- PC-Engine / TurboGrafx-16
- Playstation
- Playstation Portable
- Sega CD
- Sega Game Gear
- Sega Master System
- Super Nintendo
- ZX Spectrum
... More platforms will be added in future releases!
Currently supports the following scraping sources (set with '-s'):
- WEB: openretro.org
- WEB: thegamesdb.net
- WEB: worldofspectrum.org
- WEB: adb.arcadeitalia.net (Arcade Database by motoschifo, https://www.youtube.com/c/ArcadeDatabase)
- WEB: screenscraper.fr
- LOCAL: localdb (scrapes exclusively from cached resources, read more under "Local database features")
- LOCAL: import (imports resources located in '[homedir]/.skyscraper/import' into the local database. Read more under "Local data import")
... More scraping sources will be added in future releases!
For Amiga I STRONGLY recommend that you set up your RetroPie to use WHDLoad. Follow this guide:
Install this package:
- $ sudo apt-get install qt5-default
- [enter your 'pi' user password, default is 'raspberry']
Create a folder for the Skyscraper source, download the latest release, compile it and install it:
- $ cd /home/pi
- $ mkdir sources
- $ cd sources
- $ wget https://github.com/muldjord/skyscraper/archive/2.0.1.tar.gz
- $ tar xvzf 2.0.1.tar.gz
- $ cd skyscraper-2.0.1
- $ qmake
- $ make
- $ sudo make install
- [enter your 'pi' user password, default is 'raspberry']
If everything went well you are now ready to run Skyscraper!
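A quick way to confirm the install is to check that the executable can be found on your PATH. This is only a minimal sketch: the 'check_installed' helper name is made up for illustration, and it assumes 'make install' placed the binary in the default location, '/usr/local/bin/Skyscraper'.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a command can be found on the PATH.
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# The default install location is expected to be /usr/local/bin/Skyscraper.
check_installed Skyscraper || echo "Re-check the qmake, make and make install steps."
```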
IMPORTANT!!! In order for Skyscraper to work properly, it is necessary to quit your frontend before running it! If you're running EmulationStation, you can quit it by pressing F4.
Now, I recommend taking a look at the command line options first:
- $ Skyscraper --help
This will give you a description of everything Skyscraper can do if you feel adventurous!
However, Skyscraper was designed to work with default options. If you're using the EmulationStation frontend, basically all you need to do is type:
- $ Skyscraper -p [platform]
Where [platform] must be one of the supported platforms (check '--help' for a list). This will scrape using the default scraping module for that platform.
NOTE: To enable video scraping for the scraping modules that support it, you need to add the '--videos' command line option. This is disabled by default because videos take up a significant amount of disk space.
I recommend scraping with different scraping modules one after the other with the '-s' option:
- $ Skyscraper -p [platform] -s [scraper]
This will scrape using the specified scraping module instead of the default one. Every time you scrape with a new scraping module, all resources from that module will be cached locally. After you've scraped with different scraping modules, always rescrape with:
- $ Skyscraper -p [platform] -s localdb
Scraping with localdb will combine all of your locally cached resources into the most complete results. Remember to always overwrite your gamelist and NOT skip existing entries, unless you have a specific reason to do so. Read on for a more thorough description of localdb and how to prioritize the results.
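The workflow above can be wrapped in a small script. This is a sketch, not part of Skyscraper: the 'run' and 'scrape_all' helper names and the DRY_RUN variable are made up for illustration, and which modules suit a given platform is an assumption you should check with '--help'. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the "scrape with several modules, then combine with localdb" workflow.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# scrape_all PLATFORM MODULE...
scrape_all() {
    platform="$1"
    shift
    for module in "$@"; do
        run Skyscraper -p "$platform" -s "$module"
    done
    # Finish by combining everything that is now cached locally.
    run Skyscraper -p "$platform" -s localdb
}

# Example: the module choice for 'amiga' is an assumption, not a recommendation.
scrape_all amiga openretro screenscraper
```

In dry-run mode the example prints three 'Skyscraper' command lines, the last one using '-s localdb'. Remember to quit your frontend before running it for real.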
Whenever you scrape any platform with any web scraping module, Skyscraper caches each resource locally. A resource can, for instance, be a game 'title' or a game 'screenshot'. Each game can have several versions of each resource cached locally, one of each type per web scraping module. This comes in handy when using the 'localdb' scraping module.
After a while you'll have accumulated a decent amount of locally cached data for any given platform. To make exclusive use of this data, Skyscraper provides the 'localdb' scraping module (set with '-s localdb'). With this source, Skyscraper scrapes only from the locally cached data. Depending on how many sources you've scraped a given platform with, the 'localdb' module will give you almost perfect results, with almost no data missing. By default, every resource type is prioritized by timestamp, but it is also possible to prioritize them by scraping source. So if you prefer the 'description' results from a certain scraper, you can easily make sure that these will be prioritized above any other descriptions available. Read more about how to do this here.
Normally the locally cached data is persistent. This means that it will only allow one instance of any type of resource for any rom per scraping source. If you later wish to update the resources for a certain source, Skyscraper provides the '--updatedb' option. If this flag is set on the command line, any data in the local cache will be updated with the new incoming data. So if rom X has a description that you feel is lacking, and you've noticed that the data from a specific scraping module is more to your liking, simply rescrape the platform with '-s [scraping module] --updatedb' and the locally cached data will be updated. Then prioritize it to make use of it with '-s localdb'. Read more about how to do this here.
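As a sketch of that sequence, the helper below prints the two commands in order without running them. The 'updatedb_commands' name is made up for illustration, and the platform and module in the example are arbitrary choices.

```shell
#!/bin/sh
# updatedb_commands PLATFORM MODULE
# Prints, without running them, the rescrape-then-combine commands described above.
updatedb_commands() {
    # 1. Refresh the cached resources from one scraping module.
    printf 'Skyscraper -p %s -s %s --updatedb\n' "$1" "$2"
    # 2. Rebuild the game list from the local cache. If needed, adjust
    #    [homedir]/.skyscraper/dbs/[platform]/priorities.xml before this step.
    printf 'Skyscraper -p %s -s localdb\n' "$1"
}

updatedb_commands snes thegamesdb
```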
The default folder for all of Skyscraper's locally cached data is the '[homedir]/.skyscraper/dbs' subfolder. In this folder you'll find the individual platform db subfolders. Any platform db folder is self-contained and can be copied to a USB drive, or zipped up and uploaded to share with friends.
Normally Skyscraper uses a default local db folder for each platform. But a friend might have sent you a copy of his local database folder, and you wish to scrape from his data. In this case Skyscraper allows you to force the use of a local database with the '-d [db folder]' command line option. Keep in mind that if your friend has zipped the db folder for convenience, you need to unzip it before use. Skyscraper does not currently support zipped db folders.
If you start copying your local databases to and from friends, or you accumulate some really big local databases that you sleep with at night because you love them so much - ALWAYS remember to back these up from time to time! Skyscraper is software. Software has bugs. And even though I do quite a bit of testing and feel confident in my code, bugs are inevitable from time to time.
Basically what I'm trying to say is that it is entirely your own fault if you've spent 6 months creating a bunch of local dbs and suddenly you overwrite them unintentionally or Skyscraper corrupts the data for some i-have-no-idea-how reason. It could happen. So... PLAN YOUR BACKUPS! And don't come crying to me. :D
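One possible way to take such a backup is a timestamped tar archive of the dbs folder. This is a minimal sketch, not a Skyscraper feature: the 'backup_dbs' helper name and the destination path in the example are made up for illustration.

```shell
#!/bin/sh
# backup_dbs SRC_DIR DEST_DIR
# Archives SRC_DIR (for example "$HOME/.skyscraper/dbs") into a timestamped
# tar.gz inside DEST_DIR and prints the path of the archive.
backup_dbs() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "No such folder: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    archive="$dest/skyscraper-dbs-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the parent of SRC_DIR.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example (hypothetical destination, commented out so nothing is written by accident):
# backup_dbs "$HOME/.skyscraper/dbs" "/media/usb0/backups"
```

Since Skyscraper does not read compressed db folders, an archive like this has to be unpacked again before you point '-d [db folder]' at it.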
In addition to allowing scraping from local resources, Skyscraper also allows you to import your own data into the local cache, which in turn allows you to scrape your roms with it using the '-s localdb' scraping module.
Skyscraper allows you to import various rom resources from the local '[homedir]/.skyscraper/import' folders. Simply place your data inside these folders with the EXACT filename of the roms you wish to connect them to. For instance, if you have a rom called 'Bubble Bobble.nes' you would place your snap for this rom inside '[homedir]/.skyscraper/import/snaps' called 'Bubble Bobble.png'. Other image file formats are also supported.
Now run the scraper with the '-s import' option:
- $ Skyscraper -p [platform] -s import
If you've named the files correctly, the game will show with a green 'YES' for cover/boxart, screenshot/snap and video. If you've imported textual data, it will show the data at the relevant output line. Now, to make use of the imported data, scrape with the '-s localdb' scraper and your resources will be prioritized above all others, as defined in '[homedir]/.skyscraper/dbs/[platform]/priorities.xml'.
- $ Skyscraper -p [platform] -s localdb
Then start your frontend and enjoy your newly imported rom data. :)
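Getting the import filenames right is the part that is easiest to get wrong, so here is a small sketch that derives the name a resource file needs from the rom it belongs to. The 'import_target' helper name is made up for illustration, and it assumes the rom's base name is everything before its last '.'.

```shell
#!/bin/sh
# import_target ROM_FILE RESOURCE_FILE
# Prints the filename RESOURCE_FILE needs inside an import folder so that it
# matches ROM_FILE: the rom's name up to its last '.', plus the resource's extension.
import_target() {
    rom="$(basename "$1")"
    ext="${2##*.}"
    echo "${rom%.*}.$ext"
}

import_target "Bubble Bobble.nes" "my_screenshot.png"
# prints: Bubble Bobble.png

# Possible use (commented out so nothing is copied by accident):
# cp my_screenshot.png "$HOME/.skyscraper/import/snaps/$(import_target 'Bubble Bobble.nes' my_screenshot.png)"
```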
For textual data, you need to first create a file called '[homedir]/.skyscraper/import/definitions.dat'. In this file, you must define the file content format you are providing for each rom. For instance, if your data comes in the form of 1 xml file per rom, and you wish to scrape 'publisher' for this rom, perhaps your input file has a node like '<publisher>This is the publisher</publisher>'. In the 'definitions.dat' file you'd then add a line looking like '<publisher>###PUBLISHER###</publisher>'. The '###PUBLISHER###' tag is recognized by Skyscraper. Read a more detailed description with examples here.
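A minimal sketch of creating such a definitions file from the command line follows. The 'write_definitions' helper name is made up for illustration, and the file contains only the '###PUBLISHER###' tag mentioned above; any other tags are not covered here and should be taken from the detailed description.

```shell
#!/bin/sh
# write_definitions IMPORT_DIR
# Writes a one-line definitions.dat containing only the publisher definition
# given in the text above. Overwrites any existing definitions.dat in IMPORT_DIR.
write_definitions() {
    mkdir -p "$1" || return 1
    cat > "$1/definitions.dat" <<'EOF'
<publisher>###PUBLISHER###</publisher>
EOF
}

# Example (commented out so that an existing file is not overwritten):
# write_definitions "$HOME/.skyscraper/import"
```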
Skyscraper allows you to fully customize how you want the final artwork to appear and what effects should be applied. Check the 'artwork' section in 'config.ini.example' for a full list of available options. I've also created some artwork examples to get you started. Try appending one of the following lines to your Skyscraper command line options.
Append command line option '-c config_artwork01.ini'
Append command line option '-c config_artwork02.ini'
Append command line option '-c config_artwork03.ini'
- Updated 'arcadedb' result parsing to fit new format
- Removed 'mamedb' source files as they were no longer in use
- Slightly changed help text for scraping modules
- 'thegamesdb' now properly uses Qt's XML parser
- 'screenscraper' now properly uses Qt's XML parser
- Started implementing region and lang support for 'screenscraper', but still not enabled
- Back to basics: Removed several web sources. Now only allows the ones I have explicit permission to use.
- Properly implemented official API for 'arcadedb' module
- Added scraping module info to output per result but only when using '--verbose'
- Added check for unreasonably bad scraping runs, making Skyscraper exit if 30 of 30 files miss from the get-go
- Added support for 'coleco' platform
- Added support for 'pcengine' platform
- Added zip support to all platforms per request from users. :)
- Added 'rating' scraping to 'thegamesdb'
- Added 'arcadedb' scraper module with video support
- Vastly improved scraping of 'neogeo' and 'arcade' platforms in general by mapping the filenames to real names from mameMap.csv
- Improved 'neogeo' and 'arcade' search platform matching
- Added textual import with 'import' scraper using '[homedir]/.skyscraper/import/definitions.dat' file
- Added video import with 'import' scraper
- Improved 'uvlist' description scraping
- Now properly handles empty nodes in EmulationStation gamelist.xml export
- Added 'developer' support for 'uvlist' scraper
- Improved html unescaping a lot
- Cleaned up xml escaping
- Added 'import' scraper, scraping from resources located in '[homedir]/.skyscraper/import' folder
- Added 'uvlist' scraping module
- Added rating resource and support
- Added rating support to lemonamiga
- Added rating support to lemon64
- Added rating support to mobygames
- Fixed make install
- Moved all source files to 'src' folder
- '[homedir]/.skyscraper' is now default folder for all files used by Skyscraper
- '/usr/local/bin/Skyscraper' is now default location for Skyscraper executable
- Refined '--help' output a bit
- Fixed lemon64 scraping
- Added 'lemonamiga' scraping module
- Added '--skipped' command line option
- Added 'make install' for correct installation of files
- MAJOR: Fixed and refined 'attractmode' frontend implementation, now works in a basic manner
- 'attractmode' can now skip existing entries
- 'emulationstation' now properly adds brackets to 'name' on skipped entries
- Added check for 'db.xml' when doing '--cleandb'
- Refactored GameEntry variables
- Changed GameEntry from struct to class
- Added 'Overall title similarity' to final output
- Added 'Overall completeness' to final output
- Code refactoring here, there and everywhere
- Now accepts results where we have low editDistance, but high similarity (For instance "Disney's Darkwing Duck" with fileName "Darkwing Duck").
- Added '--nobrackets' option that disables any [] and () tags in the frontend game titles. (Thanks for the feedback 'incunabula')
- Fixed bracket parsing
- Now always uses completeBaseName since some filenames have more than one '.'
- Completely rewrote sorting algorithm. 30 lines became one with a nifty C++11 lambda :D
- Added zip format to GameGear and MSX platforms
- Now uses filenames for output image files again
- Now allows more resources of same type, as long as 'source' differs
- Now allows user to set priorities for local resource sources
- Fixed a bug that would nullify timestamp of local resources
- Optimized LocalDb communication to improve scraping speed
- Added README.md to dbs subfolder
- Added priorities.xml.example file to dbs subfolder. Automatically copies this to new databases when they are created if none already exists.
- Implemented '--cleandb' command line option that removes files with no resource entry
- Implemented '--mergedb' command line option that merges two local databases together
- No longer does sha1 for roms bigger than 50 MB (the Pi runs out of RAM when reading them). Instead does sha1 on the filename for those special cases.
- Removed default platform when scraping. You are now forced to put in a valid platform with '-p [platform]'
- Added more initial info when running Skyscraper
- Added '--unattend' command line option
- Added 'source' attribute to local database resources
- Removed 'mobygames' descriptions from 'openretro' scraper. Now uses native descriptions.
- Improved cover and screenshot scraping for 'openretro' module
- Disabled filling in missing data when scraping from web sources. User is meant to use 'localdb' scraping module for this.
- Implemented date formats to standardize output and better support EmulationStation requirements
- MAJOR: Added support for local database resources
- MAJOR: Added support for video scraping (currently supported in the 'screenscraper' scraping module)
- MAJOR: Added 'localdb' scraping module
- Added video tag in EmulationStation gamelist.xml output. Beware though, the Pi's are having a difficult time showing the videos properly.
- Added several new command line options relevant to the new video and localdb features
- Added cover, screenshot and video as part of the result output with "YES" or "NO" depending on whether they were found or not
- Fixed a bug where image tag in gamelist.xml had wrong path when using non-default path
- Now uses rom or filename (for .uae) sha1 for image filename, in case people have several roms with the same name under subdirs
- Added 'players' scraping for 'mobygames' module and improved screenshot getter even more
- Implemented a slightly hacky fix that removes some (but not all) warnings caused by a known Qt bug
- Improved final sorting algorithm
- Code refactoring / polishing
- Hardcoded screenscraper devid and password, since this seems to be the right way of doing it
- Refactored internal config handling to be MUCH cleaner
- Fixed bug in xml escaping
- Fixed the final sorting algorithm so it no longer outputs double entries
- Added '--nosubdirs' command line argument to exclude subdirs when scraping
- Slightly changed screen output colors and text
- Implemented generic AbstractFrontend class
- Added support for multiple frontends set with '-f'
- Changed default scraper for certain platforms based on new stats
- Corrected max threads from 4 to 8 in command line help text
- Fixed default config value bug
- Added 'msx' platform support
- Added 'psp' platform support
- Added possibility for one input platform to match many different scraper source platforms
- Added 'atarist' support
- Improved mobygames getScreenshot function
- Improved sorting by always moving "The" to end of game title
- Upped allowed number of threads to 8 (Wrooooooom!!!)
- Fixed a nasty race condition related to config file access that caused regular crashes
- Refactored config default loading to be easier to understand
- Fixed a silly memory hog that caused memory to be eaten up (not a leak, just really silly)
- Now allows user to skip existing entries
- Now allows user to set max description length with '-l'
- Refactored thread result communication to be much cleaner
- Fixed a bug where Skyscraper would get caught in an endless loop when scraping 'amiga' using the 'openretro' scraping module.
- Now quits if GamesDatabase has blocked you.
- Added support for user ID and KEY when using scrapers that require it
- Added 'screenscraper' scraping module
- Mobygames scraping module no longer truncates descriptions
- OpenRetro scraping module no longer truncates descriptions
- OpenRetro now always removes '[AGA]' from search results, since it will be appended later
- Now supports multitags of both the '()' and '[]' variety
- Now properly removes html tags from game descriptions in various scraper modules
- Fixed situations where using OpenRetro would result in a few blank covers
- Added 'worldofspectrum' scraping module (Have fun, Dom! :D)
- Now also handles filename parenthesis comments (eg. '(disc 1 of 2)')
- Now properly handles number of threads if the number of files is lower than the number of allowed threads
- Changed default scraping module for a number of platforms based on stats
- Added 'gamesdatabase' scraping module
- Now detects if a game is a sequel and pays more attention to it when looking for matches
- Now properly appends tag such as [AGA] back into the title name when writing the xml
- Added '--stats' command line option for exporting platform scraping stats
- Changed default scraper for a bunch of platforms based on stats
- Added support for the 'apple2' platform
- Added support for the 'atari5200' platform
- Added support for the 'atarijaguar' platform
- Added support for the 'gb' platform
- Added support for the 'gbc' platform
- Added support for the 'n64' platform
- Added support for the 'nds' platform
- Added support for the 'segacd' platform
- Added MameDB scraper module
- Added support for the 'neogeo' platform
- Added support for the 'arcade' platform
- Added support for the 'atari2600' platform
- Added support for the 'atari7800' platform
- Added support for the 'gamegear' platform
- Added support for the 'mastersystem' platform
- Added 'Estimated time' to output
- Redesigned thread initialization
- FINALLY found and fixed memory leak :) Verified with 'ps'
- Probably fixed a memory leak... I hope. :S
- Artwork is now fully customizable (check 'config.ini.example')
- Added artwork dropshadow effect
- Added artwork config examples
- Added support for megadrive / genesis
- HOL scraping module added
- Now properly supports config file setup using '-c' (see 'config.ini.example')
- Now allows user to force a certain scraper module using '-s'
- MobyGames scraping support added
- Added support for psx, nes, snes and zxspectrum
- Now provides 'config.ini.example'
- Lemon64 scraping support added (now default for 'c64')
- Fixed major bug in Amiga scraping which caused it to skip more than half :S
- Fixed minor error in command line descriptions
- Now prints chosen platform when starting
- Removed 'unpublished' command line option, as it was too specific
- Now supports both Amiga and C64 scraping. Automatically chooses best scraper for each platform
- Modularized scraping definitions
- TheGamesDb scraping support added (default for 'c64')
- OpenRetro + MobyGames (for descriptions) scraping support added (default for Amiga)
- Added threaded scraping! Set number of threads with "-t" command line option. Check "--help" for more info
- Added options for gamelist and images folder
- First public release