Sir Happenlance and the Spear of Density
Miles

The game is finally officially done and has just been put up for sale on Steam:

https://store.steampowered.com/app/1663410/Happenlance/

I hope you enjoy the game! I will try to write up some technical postmortems soon.

Miles
I'm starting up on devlog videos again after a long hiatus. In this one, I show off all the features we built into the level editor, as well as some of the challenges of making editor tools for a game with extensive 2D parallax.

Phillip Trudeau-Tavara
TL;DR: You have to do a "Low Level Keyboard Hook" and then write the screenshot function yourself.

Hey there Happenlancers, this is a quick aside technical post about windowing & fullscreen - more devlogs are forthcoming, promise!

I recently read "Fullscreen Exclusive is a Lie" by Mason Remaley and was happy to discover that Happenlance already offers most of the functionality recommended in the article's conclusion. It's a good read, and I recommend you try its checks on your own project before jumping into this quick blog post.

In the article, Mason dives into the ways you can enable, and test for, "exclusive fullscreen" mode -- the only window display mode that bypasses Windows's compositor, or roughly, "allows screen tearing". As it turns out, most tests for exclusivity are wrong, but one correct way to test is to take a screenshot with the PrintScreen key. If the screenshot is wrong, then your game is successfully bypassing the Windows compositor. Thanks, Microsoft!
More importantly, it turns out that there's no foolproof way to enable exclusive fullscreen at all on the average player's setup (Windows 10, one monitor, NVIDIA graphics card). So, any code to try and enable it will be a best-effort attempt (namely setting the window style to WS_POPUP).

SDL2, the library Happenlance uses for windowing, does indeed perform this best-effort attempt, so we were automatically getting exclusive fullscreen on our test machines. However, as a result, we noticed that PrintScreen yielded wrong screenshots (without knowing why), and so I ended up writing code to "make PrintScreen work", whatever it took.

Coincidentally, in the final paragraph, Mason has to weigh the tradeoff between allowing exclusivity for some players vs. letting those players actually use the PrintScreen key to take proper screenshots. Since I accidentally solved (sort of?) this tradeoff, I thought I'd at least propose the solution and post the code that I use currently to implement that solution.

In short, it installs a "low-level keyboard hook", normally used to disable the Windows key (as Mason points out), and catches the PrintScreen key before it does anything. Then, it just flags the game to take a screenshot once it's done rendering a frame and write it to disk, and opts not to propagate the PrintScreen keypress down the line to other programs -- every other program on your computer effectively didn't know there was any key pressed at all. Powerful!

Of course, this took some elbow grease. Low-level keyboard hooks are known to cause full-machine stalls when debugging: once the debugger pauses the process that owns the hook, the hook can no longer respond to key events. Plus, if you don't run the hook on a separate thread, then lag in your game will stall the hook until the next event pump, which means your entire computer will get delayed key presses -- unacceptable in a commercial game. Lastly, eating key releases (which you need to do sometimes) is risky: if the hook is installed while a key is held down, other apps never see the release and think the key is still pressed, so keys become "sticky" in a bad way.

What I did was to spawn a simple thread that installs the hook and pumps messages in a power-friendly way (use the blocking GetMessage, not a PeekMessage polling loop!) and also opportunistically unhooks itself and exits the thread if it detects that a debugger is running. I also made sure to prevent the "sticky-keys" issue by copying some of the SDL2 source code to address it. Lastly, in order to early-out the hook procedure when the window is not focused, I needed to make the main thread update a global variable every frame (technically it should be an atomic on less forgiving platforms than x64).

Yuck!!

It's certainly "fun" (a.k.a. gross) having to deal with Windows in this way, but the lack of frame hitching from the PrintScreen capture (which, again, gives you wrong screenshots anyway) was a win in my personal opinion. Plus, I took the opportunity to capture and ignore the Windows key as well, because screw the Windows key.

Here's the key hook code, verbatim:
static bool takeScreenshot = false;
static bool hasInputFocus = false;
#ifdef _WIN32
static u8 preHookKeyState[256]; //the docs explicitly say 256
static bool restartHook = false;
static void snapshotHook() {
    GetKeyboardState(preHookKeyState);
    restartHook = true;
}
static LRESULT CALLBACK LowLevelKeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
    //you don't want to be hooking the keyboard while debugging...
    if (IsDebuggerPresent() || nCode < 0 || nCode != HC_ACTION) return CallNextHookEx(nullptr, nCode, wParam, lParam);
    if (!hasInputFocus) { //can't use GetFocus() - that's only for the current thread
        snapshotHook();
        return CallNextHookEx(nullptr, nCode, wParam, lParam);
    }

    bool eat = false;
    auto p = (PKBDLLHOOKSTRUCT) lParam;
    switch (wParam) {
        case WM_KEYDOWN:  
        case WM_SYSKEYDOWN:
        case WM_KEYUP:    
        case WM_SYSKEYUP: {
            if (p->vkCode == VK_SNAPSHOT) {
                eat = true;
                //you have to track when the key went up because this receives key-repeats
                static bool lastWasUp = true;
                if (restartHook) lastWasUp = !preHookKeyState[p->vkCode];
                if (wParam == WM_KEYDOWN || wParam == WM_SYSKEYDOWN) {
                    if (lastWasUp) takeScreenshot = true;
                    lastWasUp = false;
                } else if (wParam == WM_KEYUP || wParam == WM_SYSKEYUP) {
                    lastWasUp = true;
                }
            } else if (p->vkCode == VK_LWIN || p->vkCode == VK_RWIN) {
                eat = true;
            }
        } break;
    }
    restartHook = false;

    if (wParam == WM_KEYUP || wParam == WM_SYSKEYUP) {
        //Pass on key releases from pre-hook state (big brain move from SDL source):
        //"If the key was down prior to our hook being installed, allow the
        // key up message to pass normally the first time. This ensures other
        // windows have a consistent view of the key state, and avoids keys
        // being stuck down in those windows if they are down when the grab
        // happens and raised while grabbed."
        if (p->vkCode <= 0xFF && preHookKeyState[p->vkCode]) {
            // print_log("Passing on snapshotted key!\n");
            preHookKeyState[p->vkCode] = 0;
            eat = false;
        }
    }
    return eat? 1 : CallNextHookEx(nullptr, nCode, wParam, lParam);
}
#endif
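
The key-repeat handling for VK_SNAPSHOT in the hook above is easier to reason about in isolation. Here is a minimal, portable sketch of just that edge-detection logic, with no Windows calls. `KeyEdgeFilter` and `onEvent` are hypothetical names made up for this illustration, not part of the game's code, and the sketch assumes the key starts out released (the real hook seeds that state from the pre-hook keyboard snapshot instead).

```cpp
// Portable sketch of the key-repeat filter idea used for VK_SNAPSHOT.
// KeyEdgeFilter is a hypothetical name, not something from the game's source.
struct KeyEdgeFilter {
    bool lastWasUp = true; // assumption: the key starts out released

    // Feed every key event (down, repeat, or up) in order.
    // Returns true only for the first "down" after an "up".
    bool onEvent(bool isDown) {
        if (isDown) {
            bool fresh = lastWasUp; // a repeat arrives while lastWasUp is false
            lastWasUp = false;
            return fresh;
        }
        lastWasUp = true; // key released, so the next down is a fresh press
        return false;
    }
};
```

Holding the key down produces a stream of "down" events, but only the first one returns true, so a single press would request a single screenshot.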



Installing the hook at program startup (written in C++; converting to C is an exercise for the reader):

#ifdef _WIN32
        //Install the low-level keyboard hook
        if (!IsDebuggerPresent()) {
            auto hookThread = [](LPVOID) -> DWORD {
                snapshotHook();
                auto hook = SetWindowsHookEx(WH_KEYBOARD_LL, LowLevelKeyboardProc, GetModuleHandle(nullptr), 0);
                //"the thread that installed the hook must have a message loop"
                //https://docs.microsoft.com/en-us/windows/win32/winmsg/lowlevelkeyboardproc
                MSG m;
                while (GetMessage(&m, NULL, 0, 0) > 0) {
                    if (IsDebuggerPresent()) { UnhookWindowsHookEx(hook); break; }
                    TranslateMessage(&m); DispatchMessage(&m);
                }
                return 0;
            };
            CreateThread(NULL, 0, hookThread, NULL, 0, NULL); //ignore if it fails, oh well.
        }
#endif
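
As mentioned earlier, the flags shared between the hook thread and the main thread should technically be atomics. Below is a rough sketch of what that could look like with `std::atomic<bool>`. The function names and the `...Flag` variables are hypothetical stand-ins, not the game's actual code, and relaxed ordering is an assumption that seems sufficient here because each flag is an independent boolean that guards no other data.

```cpp
#include <atomic>

// Hypothetical atomic replacements for the two plain bools shared across threads.
static std::atomic<bool> takeScreenshotFlag{false};
static std::atomic<bool> hasInputFocusFlag{false};

// Main thread, once per frame: publish the current focus state.
void publishFocus(bool focused) {
    hasInputFocusFlag.store(focused, std::memory_order_relaxed);
}

// Hook thread: request a screenshot only while the game has focus.
// Returns true if the key press should be eaten.
bool onPrintScreenPressed() {
    if (!hasInputFocusFlag.load(std::memory_order_relaxed)) return false;
    takeScreenshotFlag.store(true, std::memory_order_relaxed);
    return true;
}

// Main thread, after rendering a frame: consume the request exactly once.
bool consumeScreenshotRequest() {
    return takeScreenshotFlag.exchange(false, std::memory_order_relaxed);
}
```

Using `exchange` to read and clear the flag in one step means a request set by the hook thread between the read and the clear cannot be lost.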

Miles
This one's a doozy. I talk about new editor features, sprite serialization, new enemy behavior, game AI and game design, arrow physics, gif rendering, fast image resizing, and a new method for implementing a foolproof undo system.



As promised at the end of the video, here is the image resizing code as it currently exists:

#include <tmmintrin.h> //requires SSSE3 for _mm_hadd_epi32 and _mm_shuffle_epi8
#include <stdint.h>
#include <stdlib.h> //malloc

typedef struct Pixel { uint8_t r, g, b, a; } Color;
struct Image { Pixel * p; int w, h; inline Pixel * operator[] (int y) { return &p[y * w]; } };

Image downsize2x(Image in) {
    int w = in.w / 2, h = in.h / 2;
    Image out = { (Pixel *) malloc(w * h * sizeof(Pixel)), w, h };
    for (int y = 0; y < h; ++y) {
        int x = 0;
        for (; x < w - 3; x += 4) {
            __m128i p1 = _mm_loadu_si128((__m128i *) &in.p[(y * 2 + 0) * in.w + (x * 2 + 0)]);
            __m128i p2 = _mm_loadu_si128((__m128i *) &in.p[(y * 2 + 0) * in.w + (x * 2 + 4)]);
            __m128i p3 = _mm_loadu_si128((__m128i *) &in.p[(y * 2 + 1) * in.w + (x * 2 + 0)]);
            __m128i p4 = _mm_loadu_si128((__m128i *) &in.p[(y * 2 + 1) * in.w + (x * 2 + 4)]);
            __m128i i1 = _mm_avg_epu8(p1, p3), i2 = _mm_avg_epu8(p2, p4);
            __m128i s1 = _mm_and_si128(_mm_srli_epi32(i1, 1), _mm_set1_epi32(0x7F7F7F7F));
            __m128i s2 = _mm_and_si128(_mm_srli_epi32(i2, 1), _mm_set1_epi32(0x7F7F7F7F));
            __m128i d = _mm_hadd_epi32(s1, s2);
            _mm_storeu_si128((__m128i *) &out.p[y * w + x], d);
        }
        for (; x < w; ++x) {
            Pixel p1 = in.p[(y * 2 + 0) * in.w + (x * 2 + 0)];
            Pixel p2 = in.p[(y * 2 + 0) * in.w + (x * 2 + 1)];
            Pixel p3 = in.p[(y * 2 + 1) * in.w + (x * 2 + 0)];
            Pixel p4 = in.p[(y * 2 + 1) * in.w + (x * 2 + 1)];
            out.p[y * w + x] = {
                (uint8_t) ((p1.r + p2.r + p3.r + p4.r) >> 2),
                (uint8_t) ((p1.g + p2.g + p3.g + p4.g) >> 2),
                (uint8_t) ((p1.b + p2.b + p3.b + p4.b) >> 2),
                (uint8_t) ((p1.a + p2.a + p3.a + p4.a) >> 2),
            };
        }
    }
    return out;
}

Image downsize4x(Image in) {
    int w = in.w / 4, h = in.h / 4;
    Image out = { (Pixel *) malloc(w * h * sizeof(Pixel)), w, h };
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            //unaligned loads: rows are only 16-byte aligned when in.w is a multiple of 4
            __m128i p1 = _mm_loadu_si128((__m128i *) &in.p[(y * 4 + 0) * in.w + (x * 4)]);
            __m128i p2 = _mm_loadu_si128((__m128i *) &in.p[(y * 4 + 1) * in.w + (x * 4)]);
            __m128i p3 = _mm_loadu_si128((__m128i *) &in.p[(y * 4 + 2) * in.w + (x * 4)]);
            __m128i p4 = _mm_loadu_si128((__m128i *) &in.p[(y * 4 + 3) * in.w + (x * 4)]);
            __m128i r = _mm_avg_epu8(_mm_avg_epu8(p1, p2), _mm_avg_epu8(p3, p4));
            __m128i m = _mm_and_si128(_mm_srli_epi32(r, 2), _mm_set1_epi32(0x3F3F3F3F));
            __m128i h1 = _mm_hadd_epi32(m, m);
            __m128i h2 = _mm_hadd_epi32(h1, h1);
            _mm_storeu_si32(&out.p[y * w + x], h2);
        }
    }
    return out;
}

Image downsize8x(Image in) {
    int w = in.w / 8, h = in.h / 8;
    Image out = { (Pixel *) malloc(w * h * sizeof(Pixel)), w, h };
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            //unaligned loads: rows are only 16-byte aligned when in.w is a multiple of 4
            __m128i p01 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 0) * in.w + (x * 8 + 0)]);
            __m128i p02 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 0) * in.w + (x * 8 + 4)]);
            __m128i p03 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 1) * in.w + (x * 8 + 0)]);
            __m128i p04 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 1) * in.w + (x * 8 + 4)]);
            __m128i p05 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 2) * in.w + (x * 8 + 0)]);
            __m128i p06 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 2) * in.w + (x * 8 + 4)]);
            __m128i p07 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 3) * in.w + (x * 8 + 0)]);
            __m128i p08 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 3) * in.w + (x * 8 + 4)]);
            __m128i p09 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 4) * in.w + (x * 8 + 0)]);
            __m128i p10 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 4) * in.w + (x * 8 + 4)]);
            __m128i p11 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 5) * in.w + (x * 8 + 0)]);
            __m128i p12 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 5) * in.w + (x * 8 + 4)]);
            __m128i p13 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 6) * in.w + (x * 8 + 0)]);
            __m128i p14 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 6) * in.w + (x * 8 + 4)]);
            __m128i p15 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 7) * in.w + (x * 8 + 0)]);
            __m128i p16 = _mm_loadu_si128((__m128i *) &in.p[(y * 8 + 7) * in.w + (x * 8 + 4)]);
            __m128i r1 = _mm_avg_epu8(_mm_avg_epu8(p01, p02), _mm_avg_epu8(p03, p04));
            __m128i r2 = _mm_avg_epu8(_mm_avg_epu8(p05, p06), _mm_avg_epu8(p07, p08));
            __m128i r3 = _mm_avg_epu8(_mm_avg_epu8(p09, p10), _mm_avg_epu8(p11, p12));
            __m128i r4 = _mm_avg_epu8(_mm_avg_epu8(p13, p14), _mm_avg_epu8(p15, p16));
            __m128i r = _mm_avg_epu8(_mm_avg_epu8(r1, r2), _mm_avg_epu8(r3, r4));
            __m128i m = _mm_and_si128(_mm_srli_epi32(r, 2), _mm_set1_epi32(0x3F3F3F3F));
            __m128i h1 = _mm_hadd_epi32(m, m);
            __m128i h2 = _mm_hadd_epi32(h1, h1);
            _mm_storeu_si32(&out.p[y * w + x], h2);
        }
    }
    return out;
}

Image bilinear(Image in, float scale) {
    int w = in.w * scale, h = in.h * scale;
    Image out = { (Pixel *) malloc(w * h * sizeof(Pixel)), w, h };
    float xs = in.w / (float) w, ys = in.h / (float) h;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float x0 = (x + 0.5f) * xs, y0 = (y + 0.5f) * ys;
            int x1 = x0, y1 = y0;
            float xf = x0 - x1, yf = y0 - y1;

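            //clamp the +1 neighbors so the last row/column doesn't read past the image
            int x2 = x1 + 1 < in.w ? x1 + 1 : x1, y2 = y1 + 1 < in.h ? y1 + 1 : y1;
            Pixel p1 = in.p[y1 * in.w + x1];
            Pixel p2 = in.p[y1 * in.w + x2];
            Pixel p3 = in.p[y2 * in.w + x1];
            Pixel p4 = in.p[y2 * in.w + x2];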

            __m128 v1 = _mm_cvtepi32_ps(_mm_set_epi32(p1.a, p1.b, p1.g, p1.r));
            __m128 v2 = _mm_cvtepi32_ps(_mm_set_epi32(p2.a, p2.b, p2.g, p2.r));
            __m128 v3 = _mm_cvtepi32_ps(_mm_set_epi32(p3.a, p3.b, p3.g, p3.r));
            __m128 v4 = _mm_cvtepi32_ps(_mm_set_epi32(p4.a, p4.b, p4.g, p4.r));

            __m128 r1 = _mm_add_ps(_mm_mul_ps(v1, _mm_set1_ps(1 - xf)), _mm_mul_ps(v2, _mm_set1_ps(xf)));
            __m128 r2 = _mm_add_ps(_mm_mul_ps(v3, _mm_set1_ps(1 - xf)), _mm_mul_ps(v4, _mm_set1_ps(xf)));
            __m128 c = _mm_add_ps(_mm_mul_ps(r1, _mm_set1_ps(1 - yf)), _mm_mul_ps(r2, _mm_set1_ps(yf)));

            __m128i i = _mm_cvtps_epi32(c);
            __m128i p = _mm_shuffle_epi8(i, _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 12, 8, 4, 0));
            _mm_storeu_si32(&out.p[y * w + x], p);
        }
    }
    return out;
}
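
For sanity-checking the SIMD downsizers above, a plain scalar box filter can be a handy reference. The sketch below is not from the game's code: `RefPixel` and `boxDownsizeRef` are hypothetical names, and it uses `std::vector` rather than the `Image` struct to stay self-contained. Note that the SIMD versions combine `_mm_avg_epu8` (which rounds up) with truncating shifts, so their output may differ from this reference by a few levels per channel. Comparing with a small tolerance is probably the right approach.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Pixel struct, to keep this sketch self-contained.
struct RefPixel { uint8_t r, g, b, a; };

// Averages each n-by-n block of `in` (w by h pixels, row-major) into one pixel.
// Trailing rows/columns that do not fill a whole block are dropped.
std::vector<RefPixel> boxDownsizeRef(const std::vector<RefPixel>& in, int w, int h, int n) {
    int ow = w / n, oh = h / n;
    std::vector<RefPixel> out(ow * oh);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            unsigned r = 0, g = 0, b = 0, a = 0;
            for (int dy = 0; dy < n; ++dy) {
                for (int dx = 0; dx < n; ++dx) {
                    RefPixel p = in[(y * n + dy) * w + (x * n + dx)];
                    r += p.r; g += p.g; b += p.b; a += p.a;
                }
            }
            unsigned count = n * n; // exact integer average, truncated toward zero
            out[y * ow + x] = { uint8_t(r / count), uint8_t(g / count),
                                uint8_t(b / count), uint8_t(a / count) };
        }
    }
    return out;
}
```

Calling it with `n` set to 2, 4, or 8 gives the counterpart of `downsize2x`, `downsize4x`, or `downsize8x` respectively.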
Miles
In this devlog I show off the weekly progress, including level serialization and new editor tools. I also talk at some length about what kind of thought process goes into deciding how to organize and structure the data in a program, and specifically how that applies in the context of this game. I'm sure it will be a familiar topic to most people in HMN. And finally I gripe for a bit about some C++ related problems we've been running into.