Bot Fighting 103. Code Integrity Checks, Code Scrambling
[[This is Chapter 29(e) from “beta” Volume VIII of the upcoming book "Development&Deployment of Multiplayer Online Games", which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the "release" book to those who help with improving; for further details see "Book Beta Testing". All the content published during Beta Testing, is subject to change before the book is published.
To navigate through the "1st beta" of the book, you may want to use Development&Deployment of MOG: Table of Contents.]]
The next section in this epic chapter on Bot Fighting is related to techniques which mess with our own binary code. In general, as we’re messing with our code, this class of techniques tends to be less in the face of the attacker than outright asking OS to do things, and as such – tends to survive attacks a bit better. Still, the devil (as always) is in details, and unless we integrate protection with the rest of our code really well – we’ll end up in a protection which can be disabled in one single point (and as a result, won’t last long).
Code Integrity Checks
One simplistic way of seeing that our executable is still intact, would be to:
- Have a function checkIntegrity() which calculates checksum of our code in-memory (more specifically – of our code segment in RAM) – and compares it to some-value-stored-in-data-segment (let’s name it codeChecksum); for the time being – let’s fill codeChecksum with 0xFFs.1
- Compile our PE executable with checkIntegrity() and codeChecksum in it.
- Calculate checksum value (let’s name it precompiled_checksum_value) of our code segment in compiled PE executable (for Windows, code segment usually looks as .text section in PE .exe); of course, the algorithm of checksum calculation should be exactly the same as the algorithm in checkIntegrity().
- Patch our .exe so that codeChecksum corresponds to the precompiled_checksum_value
- To do it, we can use linker map.
- We will be patching .data or .rdata section, so there won’t be any issues with circular dependencies.
- Bingo! Each time we run checkIntegrity(), we can be sure that our code is not modified.
This approach might even work; however – it has a very serious flaw, which decreases its anti-bot-writer efficiency very significantly:
- We have a function CalcFuncCrc(),2 which takes two pointers (beginning and end of the function).
- We have DebuggeeFunction(), which integrity we want to check.
- We have a fake and empty DebuggeeFunctionEnd(), which resides right after DebuggeeFunction() in code.
- NB: here, we’re essentially relying on compiler and linker to put the functions in exactly the same order to .text section, and on each function being one monolithic block of binary code.
- As [Kulchytskyy] notes, for our assumptions to stand, we have to use /INCREMENTAL:NO linker option (otherwise layout of the generated code will look differently)
- Actually, modern compilers can easily generate functions which are not represented by a monolithic block; still – it won’t cause our code to fail (checks for such a checksum will still work 100% of the time, it is just that this checksum will cover a-bit-different-piece-of-code from what-we-might-intuitively-expect, which is usually good enough).
- Whenever we want to check integrity of our DebuggeeFunction() (which integrity can be violated as a part of hack or by debugger inserting a breakpoint) – we call CalcFuncCrc(), feeding it pointers to DebuggeeFunction() and DebuggeeFunctionEnd() as parameters.
- Then, we compare the result of the CalcFuncCrc() with a pre-calculated global constant.
- To prevent inlining of the DebuggeeFunction(), [Kulchytskyy] uses #pragma auto_inline(off)
- From what I’ve seen, __declspec(noinline) is a more reliable way of achieving it; however – make sure to read on randomized-checks technique below, which allows to avoid disabling inlining most of the time.
- Bingo! We’ve got an ability to have lots of different per-function checksums in our code.
- NB: here, we’re essentially relying on compiler and linker to put the functions in exactly the same order to .text section, and on each function being one monolithic block of binary code.
IF we put lots of such checks (of different functions) into our Client – it will make life of the attacker significantly more difficult. Still, there is a significant room for improvement over the technique shown in [Kulchytskyy]. In particular:
- In [Kulchytskyy], CalcFuncCrc() function (unless it is very small or force-inlined) can be compiled as non-inlined regular function; this, in turn, will allow the attacker, after identifying this function once, to identify all the places of interest (and this is a serious weakness).
- To deal with it, we can either force-inline CalcFuncCrc(),
- Or (even better) we can (and SHOULD) have different functions to calculate function checksums (ideally – these checksum functions should be generated per-function and per-build automatically).
- With [Kulchytskyy], we have to specify manually which of the functions we want to protect – and to jump through the hoops to enable this protection; as a rule of thumb – this is going to lead to too-few-functions-to-be-protected.
- To deal with it, we can (and SHOULD) make our code generation completely automated, and without any changes to source code either.
- With [Kulchytskyy], we have to force no-inlining of the functions of interest, which has significant performance ramifications. It is possible to avoid it.
- With [Kulchytskyy], the result of CalcFuncCrc() is simply checked, and a certain action is taken; as such – this action can be easily disabled (and also such pieces of code are at risk of being easily identified). We can (and SHOULD) use checks which are orders of magnitude less obvious.
To mitigate issues discussed above, the following approach will be a further significant improvement over [Kulchytskyy]:
- “we’re merely considering the whole code segment as a piece of data-which-should-be-constant, and are performing random checks over it.We say that by default, we DON’T really care about protecting specific functions. Instead, we’re merely considering the whole code segment as a piece of data-which-should-be-constant, and are performing random checks over it.
- As such, we’re not even required to calculate checksums over contiguous chunks of bytes; our integrity checks can be anything from a single-DWORD-check, to a check which runs a pseudo-random generator (PRNG), uses PRNG-generated numbers as addresses within code segment, and calculates some kind of obscure (and automatically generated on each build) checksum over data-which-resides-at-those-PRNG-based-addresses.
- That being said, we SHOULD be still able to specifically check those functions of particular interest; these functions can be specified externally to the code – and (unless they’re inlined) can be found in “linker map” – and then used to generate yet another check function.
- In addition, we DON’T want to make our reaction to the checks too obvious – that’s why we’ll use the results of our checksum calculations as an input to our obfuscation routines (more on them below). In a very simplistic example – if we’re passing a number from one part of the program to another one, we can XOR it with a constant (and XOR again with the same constant on receipt). To integrate it with our integrity checks – we can XOR the number with the result of our checksum calculation on one side, and XOR with the pre-calculated constant value on the other side. If (as it should be) our code segment is intact – everything will work like a charm; otherwise – the program will break in an extremely non-obvious manner (and without any obvious hint on where-the-fault-has-originated).
After we do all these things, our development process will look as follows:
- We DON’T make any integrity-check-related changes to our regular manually-written source code <phew />
- We still DO use declarative obfuscation (as discussed in [[TODO]] section below), replacing types with their obf<type> counterparts wherever we feel like it.
- As a part of our build process, we:
- Decide how many of the integrity checks (and of which complexity) we want to generate
- We generate source-level code for random checking routines (those ranging from a single read from the code segment to an obscure-checksum-over-addresses-generated-by-a-PRNG) – and integrate them into our automatically-generated obfuscation code (see [[TODO]] section below for detailed discussion of generated obfuscation).
- Usually, obfuscation can be separated into two different routines – obfuscation and deobfuscation. We’ll use calculated-checksum-from-code-segment for obfuscation, and constant-read-from-data-segment for deobfuscation (or vice versa). For the time being, as we don’t know the value of the checksums yet – all the checksum-constants-in-data-segment will be filled with 0xFF bytes.3
- To calculate checksums at runtime, bytes from code segment can be read in several different ways:
- If we remove relocation tables for our .exe (which as of now, I tend to lean towards) – then we can simply read data from fixed pointer values.
- If relocation is possible – at least in theory, we should find out where the code segment is currently loaded (can be calculated as simple loaded_offset=pointer_to_function-position_of_the_same_function_in_text_segment), and use loaded_offset as a base for all future accesses.
- As a yet another way of reading our code segment – system calls such as ReadProcessMemory() and/or Toolhelp32ReadProcessMemory() can be used. BTW, these functions can be used to read either our own process, or a different one
- One particularly interesting case is cross-reading (and cross-verification) by parent process and child process (especially if parent debugs child as a part of self-debugging).
- We store algorithms of these randomly-generated checksum-calculating routines for future use.
- We compile and link our executable (including those generated random checking routines and generated obfuscation).
- Using those algorithms-stored-above, we calculate checksums based on real contents of the code segment (=”.text section in .exe”).
- Using linker map, we identify addresses of the checksum-constants-in-data-segment – and patch them with the values calculated in a previous step (of course, we’ll have to fix PE checksums afterwards too).
- “Bingo! We’ve got an executable, which automagically performs TONS of integrity checks, which checks are spread all over the executable, and are extremely non-obvious too.Bingo! We’ve got an executable, which automagically performs TONS of integrity checks, which checks are spread all over the executable, and are extremely non-obvious too.
When speaking about 3rd-party protection from reverse engineering, probably the most-popular-selling-point is so-called “code encryption”. Actually, at least 99% of the time4 the key of such “encryption” is contained within the same executable, i.e. easily available to the attacker. And encryption-with-the-key-available-to-attacker usually doesn’t qualify as “encryption”,5 but is merely a rather sophisticated scrambling (with or without compression/packing).
All-at-Once Code Descrambling
Whatever the name is – this Code Scrambling is breakable by design; however, as all the other protection techniques described to date aren’t any better – among other obfuscation techniques there is certainly a potential space for Code Scrambling too.
Simple commercial protections tend to take the whole executable, “encrypt” (actually – scramble) all the code in it, and prepend descrambler to the scrambled code. Then, on the start of the scrambled executable, it is descrambler which gets launched; descrambler code will descramble the code to RAM, and will pass execution to the descrambled code, that’s it. This naïve approach doesn’t fare well against attackers for one Big Fat Reason™:
Specifically: after the code is descrambled – it resides in memory as a whole, and memory can be dumped relatively easily. While “Anti-Dumping” techniques do exist (see, for example, [Ferrie], [Assar], and [Assar2]) – most of them are inherently targeting flaws in naïve dumpers, so at best they provide only temporary protection.
Another drawback of any scrambling techniques is that they usually contain something-which-looks-as-an-encryption-loop, and encryption-loops are known to be detected by antivirus products heuristics, and at least in theory may get your Client executable flagged; while I didn’t see such occurrences when the only-encryption-loop (without other suspicious stuff, more on it in [[TODO]] section below) got signed Client executable flagged by an antivirus – you’d better start testing your game Client under different AV products (to see whether your protection makes your Client flagged as a potential malware) ASAP, and also consider lowering entropy of your scrambled code (more on it in [[TODO]] section below).
On-Demand Code Descrambling“One notable exception (i.e. an anti-dumping technique which works) actually tries to remove that single-point-where-protection-can-be-disabledOne notable exception (i.e. an anti-dumping technique which works) actually tries to remove that single-point-where-protection-can-be-disabled; the idea (described in [Tully]) is to have on-demand descrambling: originally, most of the code pages are effectively empty (or scrambled), and marked as “Guard Pages” (actually, any kind of page-level protection-causing-CPU-exception will do). When CPU attempts to access one of such pages, an exception is generated, which is then caught by descrambler and the corresponding code page is descrambled; after descrambling – execution of the original program is resumed.
To make things a bit further worse to the attacker – we SHOULD use different descrambling routines for different on-demand regions (and descrambling routines should be force-inlined too to prevent them being easily identified).
Attacking properly-implemented on-demand descrambling, while certainly possible, is much more complicated than simplistic dumping. Still, we have to observe that it still has a single point where the protection can be disabled. Specifically, as we have to rely on handling CPU-level exceptions, we have to register our exception handler, and such registered exception handlers (at least those which I seen), are rather easy to find in executable <sad-face />.
BTW, at least in theory it is possible to have section-level on-demand descrambling as an alternative to page-level one described above. If we manage to force linker to avoid merging of .text sections coming from different object files – then each of the sections will become easily descrambleable-on-demand; this would allow us to descramble our stuff logically (=”before we want to access it”), which in turn would eliminate the exposure of the exception handler discussed above.
Technically, such section-level on-demand descrambling seems to be doable using __declspec(code_seg(“segname”)) to separate code into different PE sections (NB: note that for serious C++ code compiled by MSVC, #pragma code_seg will still leave lots of stuff in .text segment, so we have to use much more cumbersome __declspec(code_seg(“segname”)) <sad-face />). And as soon as the code is separated into different segments – they can be descrambled on-demand rather easily; the only remaining caveat is how to identify when we need a certain part code to run – and this is game-dependent and usually not-really-obvious.
A word of caution about the section-level descrambling: while it improves over the page-based one in terms of not-having-obvious-exception-handler – it loses on separating-code-into-logical-blocks (and separating things in an obvious manner is never a good thing). At the moment, I feel that with proper separation into sections, advantages of section-level on-demand descrambling would outweigh issues-it-creates, but honestly, it is just a rather wild guess on my part. And as section-level stuff is significantly more complicated for developers (we’d have to identify dependencies in the code manually – or to write a sophisticated tool doing it for us), most likely I’d still prefer to start with page-based on-demand descrambling (moving to section-level one if the need arises).
From On-Demand Descrambling to On-Demand Keys
As noted above, on-demand scrambling is not too bad (especially if combined with different descrambling routines for different pieces of code), however, for MOGs it can be improved further <smile />. For MOGs, we can have only decryption routines on the Client-Side, and obtain keys for decryption from the Server-Side when the need arises. The idea goes along the following lines:
- We DO have on-demand descrambling (in any of the ways described above), but unlike usual descrambling (with a key hardcoded into the Client), we DON’T store the key in the Client.
- “whenever we need certain code to be descrambled, Client requests appropriate decryption key from the Server-SideInstead, whenever we need certain code to be descrambled, Client requests appropriate decryption key from the Server-Side, and reply arrives as “to get whatever-you-requested, use portion of the encrypted data starting from offset O, with length L, and decrypt it with key K”).
- To make things more interesting, Client-Side SHOULD include not only what it wants to descramble, but also why it wants to descramble it (for example, for guard-page-based descrambling Client could send to the Server-Side address-where-guard-exception-has-occurred – or even all the fields of both EXCEPTION_RECORD and CONTEXT structures of the exception).
If you can manage the complexity of handling this without key-requesting latencies killing your game,6 this approach has very significant advantages even over simpler descrambling-on-demand, namely:
- While the key is no longer hardcoded within the Client-Side executable itself, after the key is released to the Client-Side, all we have is still “mere scrambling”. On the other hand, until the key is released, it is “real encryption”.
- As a result, in this schema (and unlike previously-discussed ones) we SHOULD use real encryption algorithm (and not obfuscation/scrambling); the
- OTOH, in addition to encryption – we still MAY (and IMO SHOULD) use some obfuscation/scrambling, just to make things even more difficult to the attacker.
- More importantly, with this schema Server-Side “knows” which portions of code Client-Side attempts to descramble, and – armed with its own information about which-functions-can-be-legitimately-called-in-current-Client-Side-state (and which guard-page exceptions can be legitimately raised) – it has a potential to detect blatant hacking attempts pretty easily. Examples of potential detections include such scenarios as “hey, non-hacked Client cannot request descrambling of code page 0xABCD before the MOBA match is running”, or even “hey, there is a middle of instruction at address 0xAA00, which means that such an exception can’t legitimately happen”.
- Moreover, if we feel particularly devious, in response to such blatant hacking attempts, our Server-Side can return different (Offset,Length,Key) tuple, which descrambles into the code which is somewhat-similar-but-not-used-in-real-game, so that attacker will spend time hacking something-having-no-resemblance-to-the-really-important-stuff.
- Even more importantly for quite a few games out there, we can avoid revealing our code at all until player’s account is valuable; effectively – it allows to increase attack costs for the player.
- The most obvious example of it is not revealing payment-enabled parts of code until the player has really paid us (i.e. Server-Side won’t release the key to certain parts of the code to non-paying players). And if we identify a hack after the payment has successfully gone through – we can ban not only the hacker, but also his payment method (such as credit card, PayPal account, etc. etc.). This, along with a real-world fact that payment methods are rather expensive to change, will increase attack costs by orders of magnitude.7
- “This approach of 'not revealing code until attack costs are high' is certainly not limited to payments.This approach of “not revealing code until attack costs are high” is certainly not limited to payments. A less obvious example is to have your higher-profile-play which is enabled-only-for-players-with-certain-experience (from certain level, in certain tournaments, etc. etc.) have a differently obfuscated set of protocols (and differently obfuscated pretty-much-everything-else within appropriate portion of the Client-Side too). Even if Game Logic for such higher-profile play is exactly the same, different automagically generated obfuscations (discussed in [[TODO]] section below) will require attacker to break them again; moreover, this time – he’ll have to do it while being under a pressure of losing a not-so-easily-replaceable-account(!).8
- And of course, we DON’T have to reveal all our anti-bot tools until the player reaches certain level of play (so both our risks from his attack and his costs if we catch him, are higher). Just as one example, we could delay some of the AV-like scanning techniques (discussed in [[TODO]] section below) until higher-profile status is reached by player.9
BTW, this technique of key-sent-from-the-Server-Side is logically strictly equivalent to simply shipping the-code-itself from the Server-Side. Practically – sending code tends to cause a bit less traffic (at the cost of executable being a bit larger), but by these-days-standards both these differences are usually negligible anyway.[[TODO: probably worth moving into a separate section, with a discussion of pros and cons over passing keys (code vs key ~= DLL-like vs static-like: arbitrary code vs better integration+more difficult reversing)]]
Code Integrity vs Code Scrambling
As Code Integrity relies on the binary code being constant, and Code Scrambling does change the very same binary code, it is obvious that they’re at odds with each other <sad-face />.
The simplest way to use both Code Integrity+Code Scrambling is to have the code descrambled at the very start of our executable, and then all the code integrity checks will work. Unfortunately, as discussed above, this class of protection is defeated way too easily, and generally, if you have other options, I DON’T recommend it.
Using on-demand descrambling alongside Code Integrity checks is more elaborated, and is not that straightforward:
- If we’re using page-level on-demand descrambling (with a CPU-level exception processed when an undescrambled-yet page is accessed) – Code Integrity will work normally; however, there are some caveats:
- It will lead to descrambling of the code much earlier than otherwise necessary, which in turn will simplify code dumping <sad-face />
- Server-Side hack detection (as discussed in [[TODO]] section above) will become more difficult.
- If we’re using section-level on-demand descrambling (or more generally – any schema when we’re descrambling because we know we want to enable certain functionality, not because of the page access or something) – then we have to make sure that our code-integrity-checking routines are run only for that code which is already descrambled.
- In theory, it can be achieved by building dependency trees involving different sections, and ensuring that obfuscation going to one section, checks only itself and its own prerequisites, but TBH, not only I never seen such systems. I even didn’t hear about somebody contemplating the writing one.
Code Integrity/Scrambling: Summary and Conclusions
Let’s try to summarize our findings on Code Integrity/Scrambling:
- Code Integrity Checks are nice-to-have, though while they do improve resilience to certain classes of attacks, against a dedicated attacker they count only as a rather mild
- With Code Integrity Checks, the best technique I know about is related to interpreting code segment as an array of constant bytes, and using those constants (and randomly generated functions using those constants) as a part of our obfuscation routines which will be discussed in [[TODO]] section below. The basic idea is that if we XOR10 certain data with a randomly-generated-function-calculated-over-the-code on one side of communication, and XOR the result with the pre-calculated-constant-which-should-match-it, on the other side – it means that any change in the portions-of-the-code-touched-by-the-function will cause the program to fall apart very quickly and in completely unexpected ways.
- Code Scrambling techniques, depending on implementation, can be categorized into 3 groups:
- All-at-once Code Descrambling. As it is subject to dumping (and as it can be disabled in one single place), the benefits of such protection are limited (though still, if you can get it from a 3rd-party provider, it is better than nothing).
- On-Demand Code Descrambling – either page-based, or section-based, but with the keys available within the Client. This one is much better than all-at-once descrambling, and is always preferred to all-at-once techniques. In general, if you don’t want to write DIY protection at this level – IMNSHO, choosing 3rd-party tool with an on-demand descrambling should be your choice.
- “This one opens a whole world of possibilities to catch the hackerOn-Demand Code Descrambling with On-Demand Keys coming from the Server-Side(!). This one opens a whole world of possibilities to catch the hacker (formally, adding Server-Side to the picture changes attackers from operating on “Home Turf”, which is generally winnable by the hacker, into being hacker’s “Road Game”, which is generally winnable by gamedevs – see also Vol. I’s chapter on Cheating).
- Moreover – in most cases we can even hide higher-profile functionality (or at least obfuscation for higher-profile functionality) until the player has reached higher-profile status, and ban player’s account (or even his credit card/Paypal/…) if we find he’s hacker – which increases costs of the attack very significantly.
- And as a side note – protection-wise Descrambling with On-Demand Keys is strictly equivalent to sending the-code-itself from the Server-Side.
Taking into account an inherent conflict between Code Scrambling and Code Integrity Checks, we have to make some not-so-obvious choices. As of the end of 2017, if I am charged with protecting a Really Serious Game™ and having a carte-blanche in this regard,11 I would probably arrange my binary-code protections as follows:
- I’d implement page-based On-Demand Code Descrambling with keys coming on demand from the Server-Side. As it is page-based, it will be the one with guard pages and descrambling on exception; and as for keys coming from the Server-Side – I’d have a request for a key having all the information about CPU registers etc. etc. at the point of the exception.
- To enable it for a not-so-slow game – we’d have to descramble whole bunches of related pages (identifying them won’t be trivial, but for quite a few games rather short playtesting12 will reveal these pages).
- Server-Side would look for unusual descrambling request patterns and would feed them to our Anti-Bot Team to see if the pattern is legit – or if it clearly indicates a hack-in-progress.
- In addition to any real encryption – for good measure, I’d use generated obfuscation (see [[TODO]] section below)
- As for the Code Integrity Checks – I’d limit it to original-unscrambled code (or code-descrambled-on-the-program-launch)
- I’d integrate Code Integrity Checks with data-obfuscation-discussed-in-[[TODO]]-section-below, very tightly
This would be a very good foundation (in addition to other measures discussed over the course of this Chapter); in particular, it would enable the following further steps:
- Automated Server-Side Detection of Hack Attempts
- Identifying pages (or sections) which should never be enabled until the player has paid – and banning payment methods on proven hack attempts.
- Section-based Code Descrambling.
- Building dependency hierarchies of sections, which would enable Code Integrity Checks over the whole code.
Disclaimer: while I DO believe that the approach discussed above is very viable – I didn’t have a chance to try it on a large scale, so make sure to take it even with a bigger pinch of salt than usual; ultimately – it is not the advice which matters, but the reasons behind the advice (and only you can decide whether it applies to your particular game, expertise, development team, etc.).
[[To Be Continued…This concludes beta Chapter 29(e) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.
Stay tuned for Chapter 29(f), where I finally hope to get to the serious stuff – Data Obfuscation with Build-Time Polymorphism]]
Don't like this post? Comment↯ below.You do?! Please share:
Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.