diff --git a/README b/README
index f89b89b..4fe9fe4 100644
--- a/README
+++ b/README
@@ -9,7 +9,7 @@ COMPILE/INSTALL/RUN
 Windows
 -------
 Windows users can download and run a pre-compiled Windows binary
-[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.10/mandelSSE-win32-2.10.zip).
+[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).
 After decompressing, you can simply execute either one of the two .bat
 files. The 'autopilot' one zooms in a specific location, while the other
@@ -22,27 +22,18 @@ cross-compilation instructions later in this document.
 For Linux/BSD/OSX users
 -----------------------

-Make sure you have libSDL2 installed - then...
+Make sure you have libSDL2 installed. In Debian and its derivatives,
+like Ubuntu, just `sudo apt install libsdl2-dev`.
+
+Then, build the code - with...

     $ ./configure
     $ make

-You can then simply...
+Usage
+-----

-    $ src/mandelSSE -h
-    Usage: ./src/mandelSSE [-a|-m] [-h] [-b] [-f rate] [WIDTH HEIGHT]
-    Where:
-        -h      Show this help message
-        -m      Run in mouse-driven mode
-        -a      Run in autopilot mode (default)
-        -b      Run in benchmark mode (implies autopilot)
-        -v      Force use of AVX
-        -s      Force use of SSE
-        -d      Force use of non-AVX, non-SSE code
-        -f fps  Enforce upper bound of frames per second (default: 60)
-                (use 0 to run at full possible speed)
-
-    If WIDTH and HEIGHT are not provided, they default to: 1024 768
+You can then try these:

     $ src/mandelSSE
     (Runs in autopilot in a 1024x768 window)
@@ -51,8 +42,30 @@ You can then simply...
     (Runs in mouse-driven mode, in a 1280x720 window)
     (left-click zooms-in, right-click zooms out)

-For ultimate speed, disable the frame limiter - by default, you are
-limited to 60fps:
+Option `-h` gives you additional information about how to control
+the Mandelbrot zoomer:
+
+    $ ./src/mandelSSE -h
+
+    Usage: ./src/mandelSSE [-a|-m] [-h] [-b] [-v|-s|-d] [-i iter] [-p pct] [-f rate] [WIDTH HEIGHT]
+    Where:
+        -h      Show this help message
+        -m      Run in mouse-driven mode
+        -a      Run in autopilot mode (default)
+        -b      Run in benchmark mode (implies autopilot)
+        -v      Force use of AVX
+        -s      Force use of SSE
+        -d      Force use of non-AVX, non-SSE code
+        -i iter The maximum number of iterations of the Mandelbrot loop (default: 2048)
+        -p pct  The percentage of pixels computed per frame (default: 0.75)
+                (the rest are copied from the previous frame)
+        -f fps  Enforce upper bound of frames per second (default: 60)
+                (use 0 to run at full possible speed)
+
+    If WIDTH and HEIGHT are not provided, they default to: 1024 768
+
+For ultimate rendering speed, you can disable the frame limiter (option `-f`).
+By default, you are limited to 60fps:

     $ src/mandelSSE -m -f 0 1280 720

@@ -62,19 +75,43 @@ tell SDL you don't care about displaying the fractal:

     $ SDL_VIDEODRIVER=dummy src/mandelSSE -b 512 384

+Be mindful of your CPU's thermal throttling if you are benchmarking :-)
+Note that you can force AVX (-v), SSE (-s) or dumb floating point (-d)
+to see the speed impact made by our usage of special Intel instructions.
+
+You can also control:
+
+- the percentage of pixels actually computed per frame, with option `-p`.
+  If you e.g. pass `-p 0.5`, then 100-0.5 = 99.5% of the pixels will be
+  copied from the previous frame, and only 0.5% will be actually derived
+  through the Mandelbrot computations. Amazingly, this is enough for
+  a decent quality fly-through zoom in the fractal.
+  By default, this is set to 0.75.
+
+- the number of Mandelbrot iterations (option `-i`). By default this is
+  set to 2048 to allow for decent zoom levels, but if you want to see
+  insane speeds, set this to something low, like 128; and disable the
+  frame limiter; i.e. use `-f 0 -i 128`.
+
 WHAT IS THIS, AGAIN?
 ====================

+Long story.
+
 When I got my hands on an SSE enabled processor (an Athlon-XP, back in
 2002), I wanted to try out SSE programming... And over the better part
 of a weekend, I created a simple implementation of a Mandelbrot zoomer
 in SSE assembly. I was glad to see that my code was almost 3 times
 faster than pure C. But that was just the beginning.
+
 Over the last two decades, I kept coming back to this, enhancing it.

 - I learned how to use the GNU autotools, and made it work on most Intel
-  platforms: checked with Linux, Windows (MinGW) and OpenBSD.
+  platforms: checked with Linux, Windows (MinGW) and OpenBSD.
+  A decade later, I also tested it on Raspbian and Armbian; it works
+  fine in ARM machines as well. Autotools also allow me to cross-compile
+  for Windows (more on that below).

 - After getting acquainted with OpenMP, in Nov 2009 I added OpenMP
   #pragmas to run both the C and the SSE code in all cores/CPUs.
   The SSE code had to
@@ -82,7 +119,7 @@ Over the last two decades, I kept coming back to this, enhancing it.
   was worth it. The resulting frame rate - on a tiny Atom 330 running
   Arch Linux - sped up from 58 to 160 frames per second.

-- I then coded it in CUDA - a 75$ GPU card gave almost two orders of
+- I then coded it in CUDA - a 75$ GPU card gave me almost two orders of
   magnitude of speedup!

 - Then in May 2011, I made the code switch automatically from single precision
@@ -90,11 +127,11 @@ Over the last two decades, I kept coming back to this, enhancing it.

 - Around 2012 I added a significant optimization: avoiding fully calculating
   the Mandelbrot lake areas (black color) by drawing at 1/16 resolution and
-  skipping black areas in full res...
+  skipping black areas in the full resolution render.
 - I learned enough VHDL in 2018 to [code the algorithm inside a Spartan3
   FPGA](https://www.youtube.com/watch?v=yFIbjiOWYFY). That was quite a
-  [learning exercise](https://github.com/ttsiodras/MandelbrotInVHDL).
+  [learning experience](https://github.com/ttsiodras/MandelbrotInVHDL).

 - In September 2020 I [ported a fixed-point arithmetic](
   https://github.com/ttsiodras/Blue_Pill_Mandelbrot/) version of the
@@ -104,7 +141,7 @@ Over the last two decades, I kept coming back to this, enhancing it.
 - In October 2020, I implemented what I understood to be the XaoS
   algorithm; that is, re-using pixels from the previous frame to optimally
   update the next one. Especially in deep-dives and large windows, this delivered
-  amazing speedups.
+  amazing speedups; between 2 and 3 orders of magnitude.

 - In July 2022, I optimised further with AVX instructions (+80% speed in
   CoreLoopDouble). I also ported the code to libSDL2, which stopped
@@ -152,11 +189,15 @@ This used to be my main loop, right after I ported to SSE back in 2002:
     jz short nomore ; yes, we're done

     inc ecx
-    cmp ecx, 119
+    cmp ecx, ITERATIONS
     jnz short loop1

-The new AVX code (inside CoreLoopDouble) follows the same motif; except
-that it also includes periodicity checking, and uses the YMM registers.
+The new AVX code (inside CoreLoopDoubleAVX) follows the same motif;
+except that it also includes periodicity checking, and uses the YMM
+registers.
+
+The comments should help you follow what's happening... Basically,
+we compute 4 pixels at a time.
 XaoS
 ----
@@ -206,20 +247,24 @@ Then download the source code of libSDL and compile it as follows:
     $ make
     $ sudo make install

-Finally, come back to this source folder, and compile:
+Finally, come back to this source folder, and configure it like this:

     $ ./configure --host=x86_64-w64-mingw32 \
         --with-sdl-prefix=/usr/local/packages/SDL-2.0.22-win32 \
         --disable-sdltest
     $ make
     $ cp src/mandelSSE.exe \
-        /usr/local/packages/SDL-2.0.22-win32/bin/SDL.dll \
+        /usr/local/packages/SDL-2.0.22-win32/bin/SDL2.dll \
         /some/path/for/Windows/

+You can also get the "ingredients" (DLLs for SDL2, OpenMP, libstd++, etc)
+from the packaged release
+[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).
+
 MISC
 ====

-Since it reports frame rate at the end, you can use this as a benchmark
-for AVX instructions - it puts the AVX registers under quite a load.
+Since it reports frame rate at the end (option `-b`), you can use this as
+a benchmark for AVX instructions - it puts the AVX registers under quite a load.

 I've also coded a [CUDA version](https://www.thanassis.space/mandelcuda-1.0.tar.bz2),
diff --git a/README.md b/README.md
index f89b89b..4fe9fe4 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ COMPILE/INSTALL/RUN
 Windows
 -------
 Windows users can download and run a pre-compiled Windows binary
-[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.10/mandelSSE-win32-2.10.zip).
+[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).
 After decompressing, you can simply execute either one of the two .bat
 files. The 'autopilot' one zooms in a specific location, while the other
@@ -22,27 +22,18 @@ cross-compilation instructions later in this document.
 For Linux/BSD/OSX users
 -----------------------

-Make sure you have libSDL2 installed - then...
+Make sure you have libSDL2 installed. In Debian and its derivatives,
+like Ubuntu, just `sudo apt install libsdl2-dev`.
+
+Then, build the code - with...

     $ ./configure
     $ make

-You can then simply...
+Usage
+-----

-    $ src/mandelSSE -h
-    Usage: ./src/mandelSSE [-a|-m] [-h] [-b] [-f rate] [WIDTH HEIGHT]
-    Where:
-        -h      Show this help message
-        -m      Run in mouse-driven mode
-        -a      Run in autopilot mode (default)
-        -b      Run in benchmark mode (implies autopilot)
-        -v      Force use of AVX
-        -s      Force use of SSE
-        -d      Force use of non-AVX, non-SSE code
-        -f fps  Enforce upper bound of frames per second (default: 60)
-                (use 0 to run at full possible speed)
-
-    If WIDTH and HEIGHT are not provided, they default to: 1024 768
+You can then try these:

     $ src/mandelSSE
     (Runs in autopilot in a 1024x768 window)
@@ -51,8 +42,30 @@ You can then simply...
     (Runs in mouse-driven mode, in a 1280x720 window)
     (left-click zooms-in, right-click zooms out)

-For ultimate speed, disable the frame limiter - by default, you are
-limited to 60fps:
+Option `-h` gives you additional information about how to control
+the Mandelbrot zoomer:
+
+    $ ./src/mandelSSE -h
+
+    Usage: ./src/mandelSSE [-a|-m] [-h] [-b] [-v|-s|-d] [-i iter] [-p pct] [-f rate] [WIDTH HEIGHT]
+    Where:
+        -h      Show this help message
+        -m      Run in mouse-driven mode
+        -a      Run in autopilot mode (default)
+        -b      Run in benchmark mode (implies autopilot)
+        -v      Force use of AVX
+        -s      Force use of SSE
+        -d      Force use of non-AVX, non-SSE code
+        -i iter The maximum number of iterations of the Mandelbrot loop (default: 2048)
+        -p pct  The percentage of pixels computed per frame (default: 0.75)
+                (the rest are copied from the previous frame)
+        -f fps  Enforce upper bound of frames per second (default: 60)
+                (use 0 to run at full possible speed)
+
+    If WIDTH and HEIGHT are not provided, they default to: 1024 768
+
+For ultimate rendering speed, you can disable the frame limiter (option `-f`).
+By default, you are limited to 60fps:

     $ src/mandelSSE -m -f 0 1280 720

@@ -62,19 +75,43 @@ tell SDL you don't care about displaying the fractal:

     $ SDL_VIDEODRIVER=dummy src/mandelSSE -b 512 384

+Be mindful of your CPU's thermal throttling if you are benchmarking :-)
+Note that you can force AVX (-v), SSE (-s) or dumb floating point (-d)
+to see the speed impact made by our usage of special Intel instructions.
+
+You can also control:
+
+- the percentage of pixels actually computed per frame, with option `-p`.
+  If you e.g. pass `-p 0.5`, then 100-0.5 = 99.5% of the pixels will be
+  copied from the previous frame, and only 0.5% will be actually derived
+  through the Mandelbrot computations. Amazingly, this is enough for
+  a decent quality fly-through zoom in the fractal.
+  By default, this is set to 0.75.
+
+- the number of Mandelbrot iterations (option `-i`). By default this is
+  set to 2048 to allow for decent zoom levels, but if you want to see
+  insane speeds, set this to something low, like 128; and disable the
+  frame limiter; i.e. use `-f 0 -i 128`.
+
 WHAT IS THIS, AGAIN?
 ====================

+Long story.
+
 When I got my hands on an SSE enabled processor (an Athlon-XP, back in
 2002), I wanted to try out SSE programming... And over the better part
 of a weekend, I created a simple implementation of a Mandelbrot zoomer
 in SSE assembly. I was glad to see that my code was almost 3 times
 faster than pure C. But that was just the beginning.
+
 Over the last two decades, I kept coming back to this, enhancing it.

 - I learned how to use the GNU autotools, and made it work on most Intel
-  platforms: checked with Linux, Windows (MinGW) and OpenBSD.
+  platforms: checked with Linux, Windows (MinGW) and OpenBSD.
+  A decade later, I also tested it on Raspbian and Armbian; it works
+  fine in ARM machines as well. Autotools also allow me to cross-compile
+  for Windows (more on that below).
 - After getting acquainted with OpenMP, in Nov 2009 I added OpenMP
   #pragmas to run both the C and the SSE code in all cores/CPUs.
   The SSE code had to
@@ -82,7 +119,7 @@ Over the last two decades, I kept coming back to this, enhancing it.
   was worth it. The resulting frame rate - on a tiny Atom 330 running
   Arch Linux - sped up from 58 to 160 frames per second.

-- I then coded it in CUDA - a 75$ GPU card gave almost two orders of
+- I then coded it in CUDA - a 75$ GPU card gave me almost two orders of
   magnitude of speedup!

 - Then in May 2011, I made the code switch automatically from single precision
@@ -90,11 +127,11 @@ Over the last two decades, I kept coming back to this, enhancing it.

 - Around 2012 I added a significant optimization: avoiding fully calculating
   the Mandelbrot lake areas (black color) by drawing at 1/16 resolution and
-  skipping black areas in full res...
+  skipping black areas in the full resolution render.

 - I learned enough VHDL in 2018 to [code the algorithm inside a Spartan3
   FPGA](https://www.youtube.com/watch?v=yFIbjiOWYFY). That was quite a
-  [learning exercise](https://github.com/ttsiodras/MandelbrotInVHDL).
+  [learning experience](https://github.com/ttsiodras/MandelbrotInVHDL).

 - In September 2020 I [ported a fixed-point arithmetic](
   https://github.com/ttsiodras/Blue_Pill_Mandelbrot/) version of the
@@ -104,7 +141,7 @@ Over the last two decades, I kept coming back to this, enhancing it.
 - In October 2020, I implemented what I understood to be the XaoS
   algorithm; that is, re-using pixels from the previous frame to optimally
   update the next one. Especially in deep-dives and large windows, this delivered
-  amazing speedups.
+  amazing speedups; between 2 and 3 orders of magnitude.

 - In July 2022, I optimised further with AVX instructions (+80% speed in
   CoreLoopDouble). I also ported the code to libSDL2, which stopped
@@ -152,11 +189,15 @@ This used to be my main loop, right after I ported to SSE back in 2002:
     jz short nomore ; yes, we're done

     inc ecx
-    cmp ecx, 119
+    cmp ecx, ITERATIONS
     jnz short loop1

-The new AVX code (inside CoreLoopDouble) follows the same motif; except
-that it also includes periodicity checking, and uses the YMM registers.
+The new AVX code (inside CoreLoopDoubleAVX) follows the same motif;
+except that it also includes periodicity checking, and uses the YMM
+registers.
+
+The comments should help you follow what's happening... Basically,
+we compute 4 pixels at a time.

 XaoS
 ----
@@ -206,20 +247,24 @@ Then download the source code of libSDL and compile it as follows:
     $ make
     $ sudo make install

-Finally, come back to this source folder, and compile:
+Finally, come back to this source folder, and configure it like this:

     $ ./configure --host=x86_64-w64-mingw32 \
         --with-sdl-prefix=/usr/local/packages/SDL-2.0.22-win32 \
         --disable-sdltest
     $ make
     $ cp src/mandelSSE.exe \
-        /usr/local/packages/SDL-2.0.22-win32/bin/SDL.dll \
+        /usr/local/packages/SDL-2.0.22-win32/bin/SDL2.dll \
         /some/path/for/Windows/

+You can also get the "ingredients" (DLLs for SDL2, OpenMP, libstd++, etc)
+from the packaged release
+[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).
+
 MISC
 ====

-Since it reports frame rate at the end, you can use this as a benchmark
-for AVX instructions - it puts the AVX registers under quite a load.
+Since it reports frame rate at the end (option `-b`), you can use this as
+a benchmark for AVX instructions - it puts the AVX registers under quite a load.

 I've also coded a [CUDA version](https://www.thanassis.space/mandelcuda-1.0.tar.bz2),
diff --git a/configure b/configure
index e1b9946..a9f35ff 100755
--- a/configure
+++ b/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.71 for mandelSSE 2.9.
+# Generated by GNU Autoconf 2.71 for mandelSSE 2.11.
 #
 # Report bugs to .
 #
@@ -611,8 +611,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='mandelSSE'
 PACKAGE_TARNAME='mandelsse'
-PACKAGE_VERSION='2.9'
-PACKAGE_STRING='mandelSSE 2.9'
+PACKAGE_VERSION='2.11'
+PACKAGE_STRING='mandelSSE 2.11'
 PACKAGE_BUGREPORT='ttsiodras@gmail.com'
 PACKAGE_URL=''

@@ -1348,7 +1348,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures mandelSSE 2.9 to adapt to many kinds of systems.
+\`configure' configures mandelSSE 2.11 to adapt to many kinds of systems.

 Usage: $0 [OPTION]... [VAR=VALUE]...

@@ -1420,7 +1420,7 @@ fi

 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of mandelSSE 2.9:";;
+     short | recursive ) echo "Configuration of mandelSSE 2.11:";;
    esac
   cat <<\_ACEOF

@@ -1533,7 +1533,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-mandelSSE configure 2.9
+mandelSSE configure 2.11
 generated by GNU Autoconf 2.71

 Copyright (C) 2021 Free Software Foundation, Inc.
@@ -1833,7 +1833,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.

-It was created by mandelSSE $as_me 2.9, which was
+It was created by mandelSSE $as_me 2.11, which was
 generated by GNU Autoconf 2.71.  Invocation command line was

   $ $0$ac_configure_args_raw
@@ -3516,7 +3516,7 @@ fi

 # Define the identity of the package.
  PACKAGE='mandelsse'
- VERSION='2.9'
+ VERSION='2.11'


 printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h
@@ -7319,7 +7319,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by mandelSSE $as_me 2.9, which was
+This file was extended by mandelSSE $as_me 2.11, which was
 generated by GNU Autoconf 2.71.  Invocation command line was

   CONFIG_FILES    = $CONFIG_FILES
@@ -7387,7 +7387,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config='$ac_cs_config_escaped'
 ac_cs_version="\\
-mandelSSE config.status 2.9
+mandelSSE config.status 2.11
 configured by $0, generated by GNU Autoconf 2.71,
   with options \\"\$ac_cs_config\\"

diff --git a/configure.ac b/configure.ac
index 4363132..554adb4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,4 +1,4 @@
-AC_INIT([mandelSSE], [2.9], [ttsiodras@gmail.com])
+AC_INIT([mandelSSE], [2.11], [ttsiodras@gmail.com])
 AC_LANG(C++)
 AC_CONFIG_HEADERS([src/config.h])

diff --git a/src/common.cc b/src/common.cc
index f78adb4..37dbef2 100644
--- a/src/common.cc
+++ b/src/common.cc
@@ -18,13 +18,14 @@ void init256colorsMode(const char *windowTitle)
         panic("[x] Couldn't initialize SDL: %d\n", SDL_GetError());
     atexit(SDL_Quit);

-    SDL_Window *window = SDL_CreateWindow(
+    window = SDL_CreateWindow(
         windowTitle,
         SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
         MAXX, MAXY,
         SDL_WINDOW_RESIZABLE);
     if (!window)
         panic("[x] Couldn't create window: %d", SDL_GetError());
+    SDL_GetWindowSize(window, &window_width, &window_height);

     if (!minimum_ms_per_frame)
         renderer = SDL_CreateRenderer(window, -1, 0);
@@ -112,8 +113,8 @@ int kbhit(int *xx, int *yy)
             {
                 window_width = event.window.data1;
                 window_height = event.window.data2;
+                return SDL_WINDOWEVENT;
             }
-            return SDL_WINDOWEVENT;
             break;
         case SDL_MOUSEBUTTONDOWN:
             if (event.button.button == SDL_BUTTON_LEFT) {
diff --git a/src/common.h b/src/common.h
index 90a9f23..f7b887c 100644
--- a/src/common.h
+++ b/src/common.h
@@ -5,9 +5,6 @@

 #include 

-// Number of Mandelbrot iterations per pixel
-#define ITERA 2048
-
 // Number of frames to zoom-in
 #define ZOOM_FRAMES 2500

@@ -20,11 +17,15 @@
 // Fractal resolution
 GLOBAL long MAXX, MAXY;

+// Number of Mandelbrot iterations per pixel
+GLOBAL int iterations;
+
 // 60 fps maximum => minimum milliseconds per frame = 1000/60 = 17
 // Default value set in getopt parsing in main.
 GLOBAL unsigned minimum_ms_per_frame;

 // SDL global state
+GLOBAL SDL_Window *window;
 GLOBAL SDL_Renderer *renderer;
 GLOBAL SDL_Surface *surface;
 GLOBAL int window_width, window_height;
diff --git a/src/mandel.cc b/src/mandel.cc
index 089e703..d3cf5e3 100644
--- a/src/mandel.cc
+++ b/src/mandel.cc
@@ -26,7 +26,7 @@ void usage(char *argv[])
 {
-    printf("Usage: %s [-a|-m] [-h] [-b] [-v|-s|-d] [-f rate] [WIDTH HEIGHT]\n", argv[0]);
+    printf("Usage: %s [-a|-m] [-h] [-b] [-v|-s|-d] [-i iter] [-p pct] [-f rate] [WIDTH HEIGHT]\n", argv[0]);
     puts("Where:");
     puts("\t-h\tShow this help message");
     puts("\t-m\tRun in mouse-driven mode");
@@ -35,6 +35,9 @@ void usage(char *argv[])
     puts("\t-v\tForce use of AVX");
     puts("\t-s\tForce use of SSE");
     puts("\t-d\tForce use of non-AVX, non-SSE code");
+    puts("\t-i iter\tThe maximum number of iterations of the Mandelbrot loop (default: 2048)");
+    puts("\t-p pct\tThe percentage of pixels computed per frame (default: 0.75)");
+    puts("\t \t(the rest are copied from the previous frame)");
     puts("\t-f fps\tEnforce upper bound of frames per second (default: 60)");
     puts("\t \t(use 0 to run at full possible speed)\n");
     puts("If WIDTH and HEIGHT are not provided, they default to: 1024 768");
@@ -46,8 +49,11 @@ int main(int argc, char *argv[])
     int opt, fps = 60;
     bool autoPilot = true, benchmark = false;
     bool forceAVX = false, forceSSE = false, forceDefault = false;
+    double percent = 0.75;

-    while ((opt = getopt(argc, argv, "hmabvsdf:")) != -1) {
+    iterations = 2048;
+
+    while ((opt = getopt(argc, argv, "hmabvsdi:p:f:")) != -1) {
         switch (opt) {
         case 'h':
             usage(argv);
@@ -71,9 +77,19 @@ int main(int argc, char *argv[])
         case 'd':
             forceDefault = true;
             break;
+        case 'i':
+            if (1 != sscanf(optarg, "%d", &iterations))
+                panic("[x] Invalid number of iterations: '%s'", optarg);
+            break;
         case 'f':
             if (1 != sscanf(optarg, "%d", &fps))
-                panic("[x] Not a valid frame rate: '%s'", optarg);
+                panic("[x] Invalid frame rate: '%s'", optarg);
+            break;
+        case 'p':
+            if (1 != sscanf(optarg, "%lf", &percent))
+                panic("[x] Invalid percentage: '%s'", optarg);
+            if (0.05 > percent || 100.0 < percent)
+                panic("[x] Invalid percentage: '%s' (0.05 <= percent <= 100.0)", optarg);
             break;
         default: /* '?' */
             usage(argv);
@@ -117,7 +133,7 @@ int main(int argc, char *argv[])
         puts("[-] e.g. you can use '-a' to enable autopilot.");
     else
         puts("[-] e.g. you can use '-m' to pilot with your mouse.");
-    printf("[-] Autopilot: %s\n", autoPilot ? "On" : "Off");
+    printf("[-]\n[-] Autopilot: %s\n", autoPilot ? "On" : "Off");
     printf("[-] Benchmark: %s\n", benchmark ? "On" : "Off");
     printf("[-] Dimensions: %ld x %ld\n", MAXX, MAXY);
     if (!minimum_ms_per_frame)
@@ -136,10 +152,11 @@ int main(int argc, char *argv[])
         __builtin_cpu_supports("avx") ? CoreLoopDoubleAVX :
         __builtin_cpu_supports("sse") ? CoreLoopDoubleSSE :
         CoreLoopDoubleDefault;
-    printf("[-] Mode: %s\n",
+    printf("[-] Mode: %s\n",
         CoreLoopDouble == CoreLoopDoubleAVX ? "AVX" :
         CoreLoopDouble == CoreLoopDoubleSSE ?
         "SSE" : "non-AVX/non-SSE");
+    printf("[-] Iterations: %d\n", iterations);
 #else
     CoreLoopDouble = CoreLoopDoubleDefault;
     printf("[-] Mode: %s\n", "non-AVX");
@@ -155,9 +172,9 @@ int main(int argc, char *argv[])
     double fps_reported;
     if (autoPilot) {
         srand(time(NULL));
-        fps_reported = autopilot(benchmark);
+        fps_reported = autopilot(percent, benchmark);
     } else
-        fps_reported = mousedriven();
+        fps_reported = mousedriven(percent);
     SDL_Quit();
     printf("[-] Frames/sec: %5.2f\n\n", fps_reported);
     fflush(stdout);
diff --git a/src/sse.cc b/src/sse.cc
index 40419a1..a0f900f 100644
--- a/src/sse.cc
+++ b/src/sse.cc
@@ -52,7 +52,7 @@ void CoreLoopDoubleDefault(double xcur, double ycur, double xstep, unsigned char
     CLEAR_ARRAY(yold);
     CLEAR_ARRAY(k1);

-    while (k < ITERA) {
+    while (k < iterations) {
 #define WORK_ON_SLOT(x) \
         if (!k1[x]) { \
             o1 = rez[x] * rez[x]; \
@@ -64,7 +64,7 @@ void CoreLoopDoubleDefault(double xcur, double ycur, double xstep, unsigned char
             if (o1 + o2 > 4) \
                 k1[x] = k; \
             if (rez[x] == xold[x] && imz[x] == yold[x]) \
-                k1[x] = ITERA; \
+                k1[x] = iterations; \
         }

         WORK_ON_SLOT(0)
@@ -108,7 +108,7 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p
         // x' = x^2 - y^2 + a
         // y' = 2xy + b
         //
-        asm("mov %6,%%ecx\n\t" // ecx is ITERA
+        asm("mov %6,%%ecx\n\t" // ecx is iterations
             "xor %%ebx, %%ebx\n\t" // period = 0
             "movapd %3,%%xmm5\n\t" // 4. 4. ; xmm5
             "movapd %1,%%xmm6\n\t" // a0 a1 ; xmm6
@@ -141,10 +141,10 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p

             "or %%eax,%%eax\n\t" // have both pixels overflowed ?
             "je 2f\n\t" // yes, jump forward to label 2 (hence, 2f) and end the loop
-            "dec %%ecx\n\t" // otherwise, repeat the loop ITERA times...
+            "dec %%ecx\n\t" // otherwise, repeat the loop iterations times...
             "jnz 22f\n\t" // but before redoing the loop, first do periodicity checking

-            // We've done the loop ITERA times.
+            // We've done the loop 'iterations' times.
             // Set non-overflowed outputs to 0 (inside xmm3). Here's how:
             "movapd %%xmm2,%%xmm4\n\t" // xmm4 has all 1s in the non-overflowed pixels...
             "xorpd %5,%%xmm4\n\t" // xmm4 has all 1s in the overflowed pixels (toggled, via xoring with allbits)
@@ -175,7 +175,7 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p
             "2:\n\t"
             "movapd %%xmm3,%0\n\t"
             :"=m"(outputs[0])
-            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"i"(ITERA)
+            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"m"(iterations)
             :"%eax","%ebx","%ecx","xmm0","xmm1","xmm2","xmm3","xmm4","xmm5","xmm6","xmm7","xmm8","xmm9","xmm10","memory");

         int tmp = (int)(outputs[0]);
@@ -192,7 +192,7 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p
         // x' = x^2 - y^2 + a
         // y' = 2xy + b
         //
-        asm("mov %6,%%ecx\n\t" // ecx is ITERA
+        asm("mov %6,%%ecx\n\t" // ecx is iterations
             "xor %%ebx, %%ebx\n\t" // period = 0
             "movapd %3,%%xmm5\n\t" // 4. 4. ; xmm5
             "movapd %1,%%xmm6\n\t" // a0 a1 ; xmm6
@@ -225,10 +225,10 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p

             "or %%eax,%%eax\n\t" // have both pixels overflowed ?
             "je 2f\n\t" // yes, jump forward to label 2 (hence, 2f) and end the loop
-            "dec %%ecx\n\t" // otherwise, repeat the loop ITERA times...
+            "dec %%ecx\n\t" // otherwise, repeat the loop 'iterations' times...
             "jnz 22f\n\t" // but before redoing the loop, first do periodicity checking

-            // We've done the loop ITERA times.
+            // We've done the loop 'iterations' times.
             // Set non-overflowed outputs to 0 (inside xmm3). Here's how:
             "movapd %%xmm2,%%xmm4\n\t" // xmm4 has all 1s in the non-overflowed pixels...
             "xorpd %5,%%xmm4\n\t" // xmm4 has all 1s in the overflowed pixels (toggled, via xoring with allbits)
@@ -259,7 +259,7 @@ void CoreLoopDoubleSSE(double xcur, double ycur, double xstep, unsigned char **p
             "2:\n\t"
             "movapd %%xmm3,%0\n\t"
             :"=m"(outputs[0])
-            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"i"(ITERA)
+            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"m"(iterations)
             :"%eax","%ebx","%ecx","xmm0","xmm1","xmm2","xmm3","xmm4","xmm5","xmm6","xmm7","xmm8","xmm9","xmm10","memory");

         tmp = (int)(outputs[0]);
@@ -284,7 +284,7 @@ void CoreLoopDoubleAVX(double xcur, double ycur, double xstep, unsigned char **p
         // x' = x^2 - y^2 + a
         // y' = 2xy + b
         //
-        asm("mov %6,%%ecx\n\t" // ecx is ITERA
+        asm("mov %6,%%ecx\n\t" // ecx is iterations
             "xor %%ebx, %%ebx\n\t" // period = 0
             "vmovapd %3,%%ymm5\n\t" // 4. 4. 4. 4. ; ymm5
             "vmovapd %1,%%ymm6\n\t" // a0 a1 a2 a3 ; ymm6
@@ -317,10 +317,10 @@ void CoreLoopDoubleAVX(double xcur, double ycur, double xstep, unsigned char **p
             "or %%eax,%%eax\n\t" // have all 4 pixels overflowed ?
             "je 2f\n\t" // yes, jump forward to label 2 (hence, 2f) and end the loop
             //
-            "dec %%ecx\n\t" // otherwise, repeat the loop up to ITERA times...
+            "dec %%ecx\n\t" // otherwise, repeat the loop up to iterations times...
             "jnz 22f\n\t" // but before redoing the loop, first do periodicity checking

-            // We've done the loop ITERA times.
+            // We've done the loop iterations times.
             // Set non-overflowed outputs to 0 (inside ymm3). Here's how:
             "vmovapd %%ymm2,%%ymm4\n\t" // ymm4 has all 1s in the non-overflowed pixels...
             "vxorpd %%ymm12,%%ymm4,%%ymm4\n\t" // ymm4 has all 1s in the overflowed pixels (toggled, via xoring with allbits)
@@ -350,7 +350,7 @@ void CoreLoopDoubleAVX(double xcur, double ycur, double xstep, unsigned char **p
             "vcvttpd2dq %%ymm3, %%xmm0\n\t" // Convert 4 doubles into 4 ints.
             "movapd %%xmm0,%0\n\t"
             :"=m"(outputs[0])
-            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"i"(ITERA)
+            :"m"(re[0]),"m"(im[0]),"m"(fours[0]),"m"(ones[0]),"m"(allbits[0]),"m"(iterations)
             :"%eax","%ebx","%ecx","%ymm0","%ymm1","%ymm2","%ymm3","%ymm4","%ymm5","%ymm6","%ymm7","%ymm8","%ymm9","%ymm10","%xmm0","%ymm11","%ymm12","memory");

         *(*p)++ = outputs[0];
diff --git a/src/xaos.cc b/src/xaos.cc
index 8a7d943..927831c 100644
--- a/src/xaos.cc
+++ b/src/xaos.cc
@@ -203,7 +203,7 @@ void mandel(
     }

     // Armed now with the xlookup and ylookup, we can render the frame.
-#pragma omp parallel for private(xcur, j)
+#pragma omp parallel for private(xcur, j) schedule(dynamic,1)
     for (int i=0; i 200)
-            // If we haven't moved for more than 200ms,
-            // go to sleep - no need to waste the CPU
-            SDL_Delay(minimum_ms_per_frame);
-        else if (SDL_GetTicks() - time_since_we_moved > 50) {
-            // if we haven't moved for 50 to 200ms,
-            // draw an accurate frame (0% reuse from previous frame)
+        if (!moved && (SDL_GetTicks() - time_since_we_moved > 200)) {
+            // if we haven't moved for more than 200ms
+            // then draw an accurate frame (0% reuse from previous frame)
+            if (!drawn_full) {
+                // But only do this ONCE.
+                drawn_full = true;
+                unsigned st = SDL_GetTicks();
+                mandel(xld, yld, xru, yru, 100.0);
+                unsigned en = SDL_GetTicks();
+                ticks += en-st;
+                frames++;
+                if (en - st < minimum_ms_per_frame)
+                    SDL_Delay(minimum_ms_per_frame - en + st);
+            } else
+                SDL_Delay(minimum_ms_per_frame);
+        } else if (moved) {
+            // Otherwise, if we moved, draw a low-accuracy frame: reuse pixels
+            // from previous frame, and only compute 'percent' new ones
+            drawn_full = false;
             unsigned st = SDL_GetTicks();
-            mandel(xld, yld, xru, yru, 100.0);
+            mandel(xld, yld, xru, yru, percent);
             unsigned en = SDL_GetTicks();
             ticks += en-st;
             frames++;
-        } else {
-            // Otherwise draw a low-accuracy frame
-            // (reuse 99.0% of the pixels)
-            unsigned st = SDL_GetTicks();
-            mandel(xld, yld, xru, yru, 1.0);
-            unsigned en = SDL_GetTicks();
-            ticks += en-st;
-            frames++;
-            // Limit frame rate to 60 fps.
+            // Limit frame rate to desired one (default: 60 fps)
             if (en - st < minimum_ms_per_frame)
                 SDL_Delay(minimum_ms_per_frame - en + st);
+            moved = false;
         }
         int result = kbhit(&x, &y);
         if (result == SDL_QUIT)
             break;
         else if (result == SDL_BUTTON_LEFT || result == SDL_BUTTON_RIGHT) {
+            moved = true;
             time_since_we_moved = SDL_GetTicks();
             double ratiox = ((double)x)/window_width;
             double ratioy = ((double)y)/window_height;
@@ -367,10 +378,13 @@ double mousedriven()
             yru -= direction*0.01*ratioy*yrange;
         } else if (result == SDL_WINDOWEVENT) {
             // force a redraw - e.g. the user just resized the window
+            moved = true;
             time_since_we_moved = SDL_GetTicks();
+            SDL_GetWindowSize(window, &window_width, &window_height);
         }
     }

     // Inform point reached, for potential autopilot target
+    printf("[-] Rendered %d frames.\n", frames);
     return ((double)frames)*1000.0/ticks;
 }
diff --git a/src/xaos.h b/src/xaos.h
index ea2ffa7..6f59410 100644
--- a/src/xaos.h
+++ b/src/xaos.h
@@ -1,8 +1,8 @@
 #ifndef __MANDEL_XAOS_H__
 #define __MANDEL_XAOS_H__

-double autopilot(bool benchmark);
-double mousedriven();
+double autopilot(double percent, bool benchmark);
+double mousedriven(double percent);
 void mandel(
     double xld, double yld, double xru, double yru,
     double percentageOfPixelsToRedraw);
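Note on `-p`: the value is a percentage, not a fraction, so the default 0.75 means that 0.75% of the window's pixels are recomputed per frame and the rest are copied from the previous frame (the full-accuracy redraw in `mousedriven()` passes 100.0). The sketch below illustrates only that arithmetic, assuming the pixel count is simply truncated. `pixels_to_compute` and `valid_percent` are hypothetical helpers, not part of mandelSSE, and this diff does not show how `mandel()` itself rounds or chooses which pixels to recompute.

```cpp
// Hypothetical helpers (not part of mandelSSE) illustrating the '-p' semantics.

// 'percent' is a percentage: 0.75 means 0.75% of the pixels, 100.0 means all.
// Assumption: the resulting pixel count is truncated toward zero.
long pixels_to_compute(long width, long height, double percent)
{
    return static_cast<long>(static_cast<double>(width * height) * percent / 100.0);
}

// Same bounds as the '-p' validation added to main() in mandel.cc:
// values below 0.05 or above 100.0 are rejected.
bool valid_percent(double percent)
{
    return !(0.05 > percent || 100.0 < percent);
}
```

With the default 1024x768 window, `pixels_to_compute(1024, 768, 0.75)` gives 5898 out of 786432 pixels per frame.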
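The reworked loop in `mousedriven()` decides on each pass between a partial frame, a single full-accuracy frame, and sleeping. The sketch below models that decision as a pure function, assuming the 200 ms threshold and the `moved`/`drawn_full` flags behave as shown in the diff. `FrameAction`, `ViewState` and `next_action` are hypothetical names. As written in the diff, when nothing has moved and no more than 200 ms have passed, neither branch runs, so no frame is drawn and no delay is inserted before the next `kbhit()` poll; the sketch reports that case as `None`.

```cpp
// Hypothetical model of the redraw policy in the new mousedriven() loop.
enum class FrameAction { Full, Partial, Sleep, None };

struct ViewState {
    bool moved = false;       // set when a click or window event arrives
    bool drawn_full = false;  // set once the 100% frame has been rendered
};

// Returns what the loop would do on this pass and updates the flags
// the same way the diff does.
FrameAction next_action(ViewState &s, unsigned ms_since_last_move)
{
    if (!s.moved && ms_since_last_move > 200) {
        if (!s.drawn_full) {
            s.drawn_full = true;
            return FrameAction::Full;     // mandel(..., 100.0), only once
        }
        return FrameAction::Sleep;        // SDL_Delay(minimum_ms_per_frame)
    } else if (s.moved) {
        s.drawn_full = false;
        s.moved = false;                  // cleared after the partial frame
        return FrameAction::Partial;      // mandel(..., percent)
    }
    return FrameAction::None;             // idle for <= 200 ms: no draw, no delay
}
```

The design keeps the expensive 100% frame to a single render per pause, instead of re-rendering it on every pass as the old 50 to 200 ms branch did.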
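With `ITERA` replaced by the runtime `iterations` global, the iteration cap becomes an ordinary loop bound. The scalar sketch below shows an escape-time loop with such a runtime cap and a periodicity check. The escape test (|z|² > 4) and the convention that a detected cycle counts as "inside" (returning `iterations`) mirror `CoreLoopDoubleDefault`, but the checkpoint scheme (saving z after a doubling number of steps, in the style of Brent's cycle detection) is an assumption, since the diff does not show how `xold`/`yold` are updated. `escape_time` is a hypothetical name.

```cpp
// Hypothetical scalar escape-time loop with a runtime iteration cap.
// Returns the step at which |z|^2 exceeded 4, or 'iterations' if the point
// never escaped or its orbit was caught repeating (treated as inside).
int escape_time(double cr, double ci, int iterations)
{
    double zr = 0.0, zi = 0.0;
    double saved_r = 0.0, saved_i = 0.0;  // periodicity checkpoint (assumed scheme)
    int check_interval = 8, since_save = 0;

    for (int k = 0; k < iterations; ++k) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            return k;                     // escaped
        // z' = z^2 + c, i.e. x' = x^2 - y^2 + a, y' = 2xy + b
        double new_zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        zi = new_zi;
        if (zr == saved_r && zi == saved_i)
            return iterations;            // exact cycle detected: inside
        if (++since_save == check_interval) {
            saved_r = zr;                 // move the checkpoint forward and
            saved_i = zi;                 // double the distance to the next one
            since_save = 0;
            check_interval *= 2;
        }
    }
    return iterations;
}
```

For example, `escape_time(2.0, 0.0, 2048)` escapes after 2 steps, while `c = -1` falls into the 0, -1 cycle and returns 2048 without running the full loop.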