Merge pull request #188 from kernelwernel/dev

Dev
kernelwernel · Dec 28, 2024 · dcb937e
2 parents a3bceb6 + 851e7ab
commit dcb937e

Showing 12 changed files with 1,727 additions and 2,456 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,9 +1,11 @@
-# What does this PR do?
+#  MAKE SURE TO READ THE CONTRIBUTION GUIDELINES BEFORE CONTINUING!
+
+## What does this PR do?
 - [ ] Add a new technique
 - [ ] Add a new feature
 - [ ] Fix bugs
 - [ ] Refactoring 
 - [ ] Sync between branches
 - [ ] Other
 
-# Briefly explain what this PR does:
+## Briefly explain what this PR does:
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,12 @@
+# Contribution Guidelines
+
+## I want to add a new technique. How would I do that?
+Adding a technique used to be a complicated process, but it has been replaced with a tool that prompts you for the new technique's details and updates the library code automagically. You can check out the Python script at `auxiliary/add_technique.py`.
+
+
+## I want to make a major change to the library
+If the change is fairly small, a simple PR is fine. But if it involves hundreds of lines of code changes, it's best to create an issue before you even start writing the code, or we can discuss it on Discord (`kr.nl`). I don't bite ;)
+
+
+## I want to contribute, but there's something I don't understand about the library code
+You can create an issue, and I will reply within 24 hours. In reality, we have too much free time on our hands.
diff --git a/README.md b/README.md
@@ -15,7 +15,9 @@ The library is:
 - Very easy to use
 - Cross-platform (Windows + MacOS + Linux)
 - Features up to 130+ unique VM detection techniques [[list](https://github.com/kernelwernel/VMAware/blob/main/docs/documentation.md#flag-table)]
+- Features the most cutting-edge techniques with the most 
 - Able to detect 50+ VM brands including VMware, VirtualBox, QEMU, Hyper-V, and much more [[list](https://github.com/kernelwernel/VMAware/blob/main/docs/documentation.md#brand-table)]
+- Able to beat certain [VM hardeners
 - Compatible with x86 and ARM, with backwards compatibility for 32-bit systems
 - Very flexible, with total fine-grained control over which techniques get executed
 - Able to detect various VM and semi-VM technologies like hypervisors, emulators, containers, sandboxes, and so on
@@ -155,7 +157,9 @@ You can view the full docs [here](docs/documentation.md). All the details such a
 <br>
 
 > There's already loads of projects that have the same goal such as 
-<a href="https://github.com/CheckPointSW/InviZzzible">InviZzzible</a>, <a href="https://github.com/a0rtega/pafish">pafish</a> and <a href="https://github.com/LordNoteworthy/al-khaser">Al-Khaser</a>. But the difference between the aforementioned projects is that they don't provide a programmable interface to interact with the detection mechanisms, on top of having little to no support for non-Windows systems. I wanted the core detection techniques to be accessible programmatically in a cross-platform way for everybody to get something useful out of it rather than providing just a CLI tool. It also contains a larger quantity of techniques, so it's basically just a VM detection library and tool on steroids with maximum flexibility.
+<a href="https://github.com/CheckPointSW/InviZzzible">InviZzzible</a>, <a href="https://github.com/a0rtega/pafish">pafish</a> and <a href="https://github.com/LordNoteworthy/al-khaser">Al-Khaser</a>. The difference is that those projects don't provide a programmable interface to interact with the detection mechanisms, and they have little to no support for non-Windows systems.
+>
+> I wanted the core detection techniques to be accessible programmatically in a cross-platform way, so everybody can get something useful out of it rather than just a CLI tool. It also contains more techniques than those projects, so it's basically a VM detection library and tool on steroids with maximum flexibility.
 
 </details>
 
@@ -164,7 +168,9 @@ You can view the full docs [here](docs/documentation.md). All the details such a
 <summary>How does it compare to paid VM detection libraries? Wouldn't it make it inferior for having it open source?</summary>
 <br>
 
-> There are a few paid software to protect the licensing of other software against against reverse engineers or software cracking, such as <a href="https://docs.sentinel.thalesgroup.com/home.htm">Thales' Sentinel RMS</a> and <a href="https://vmpsoft.com/">VMProtect</a>. Although these are not meant to ONLY be VM detection libraries, they are limited in their capabilities in different ways. Sentinel RMS' VM detection does not have as many VM brands (not to mention the pricing is only meant for corporations, not individuals), and VMProtect has a <a href="https://cyber.wtf/2023/02/09/defeating-vmprotects-latest-tricks/">very limited number of detection techniques</a>, where some of them don't require a lot of effort to bypass with only a few configurations to the VM (the detection mechanism has also been <a href="https://github.com/jmpoep/vmprotect-3.5.1/blob/d8fcb7c0ffd4fb45a8cfbd770c8b117d7dbe52b5/runtime/loader.cc#L2464">leaked</a>, so there's no benefit of having it closed source now). Speaking of which, the only downside to VMAware is that it's fully open source, which makes the job of bypassers easier compared to having it closed source. However, I'd argue that's a worthy tradeoff by having as many VM detection techniques in an open and interactive way, including having valuable community feedback to make the library more effective and accurate.
+> There are a few paid software products that protect the licensing of other software against reverse engineers and software cracking, such as <a href="https://docs.sentinel.thalesgroup.com/home.htm">Thales' Sentinel RMS</a> and <a href="https://vmpsoft.com/">VMProtect</a>. Although these are not meant to ONLY be VM detection libraries, they are limited in their capabilities in different ways. Sentinel RMS' VM detection does not cover as many VM brands (not to mention the pricing is aimed at corporations, not individuals), and VMProtect has a <a href="https://cyber.wtf/2023/02/09/defeating-vmprotects-latest-tricks/">very limited number of detection techniques</a>, some of which can be bypassed with little effort through a few VM configuration changes (the detection mechanism has also been <a href="https://github.com/jmpoep/vmprotect-3.5.1/blob/d8fcb7c0ffd4fb45a8cfbd770c8b117d7dbe52b5/runtime/loader.cc#L2464">leaked</a>, so there's no benefit to keeping it closed source now).
+>
+> Speaking of which, the only downside to VMAware is that it's fully open source, which makes the job of bypassers easier than if it were closed source. However, I'd argue that's a worthwhile tradeoff for having as many VM detection techniques as possible in an open and interactive way, along with valuable community feedback that makes the library more effective and accurate.
 
 </details>
 
@@ -173,14 +179,25 @@ You can view the full docs [here](docs/documentation.md). All the details such a
 <summary>How can the library distinguish between Hyper-V artifacts and an actual Hyper-V VM in the system?</summary>
 <br>
 
-> Hyper-V has an obscure feature where if it's enabled in the host system, the CPU hardware values makes it look like the whole system is running inside Hyper-V, which isn't true. This makes it a challenge to determine whether the hardware values the library is collecting is either a real Hyper-V VM, or just the artifacts of what Hyper-V has left as a consequence of having it enabled in the host system. The reason why this is a problem is because the library might falsely conclude that your the host system is running in Hyper-V, which is a false positive. This is where the **Hyper-X** mechanism comes into play to distinguish between these two. This was designed by <a href="https://github.com/NotRequiem">Requiem</a>
+> Hyper-V has an obscure feature where, if it's enabled on the host system, the CPU hardware values make it look like the whole system is running inside Hyper-V, which isn't true. This makes it a challenge to determine whether the hardware values the library collects come from a real Hyper-V VM or are just artifacts left behind by having Hyper-V enabled on the host system.
+>
+> This is a problem because the library might falsely conclude that your host system is running in Hyper-V, which is a false positive. This is where the **Hyper-X** mechanism comes into play to distinguish between the two. It was designed by <a href="https://github.com/NotRequiem">Requiem</a>.
 
 <p align="center">
 <img src="assets/hyper-x/v5/Hyper-X_version_5.drawio.png" align="center" title="Hyper-X">
 <br>
 </details>
 
 
+<details>
+<summary>How effective are VM hardeners against the lib?</summary>
+<br>
+
+> Hardeners like [VmwareHardenedLoader](https://github.com/hzqst/VmwareHardenedLoader) are not effective against the lib (specifically because that project doesn't account for VM strings on the GPU and has certain design flaws), but this doesn't mean the lib is immune to hardeners. VmwareHardenedLoader is by far the most well-known hardener we're aware of, which is why bypassing it was our main focus. Custom hardeners we may not know about could have a theoretical advantage, but they are substantially more difficult to produce.
+
+</details>
+
+
 <details>
 <summary>Is it possible to spoof the result?</summary>
 <br>
@@ -201,7 +218,9 @@ You can view the full docs [here](docs/documentation.md). All the details such a
 <summary>Why GPL-3.0 and MIT?</summary>
 <br>
 
-> I would've made it strictly MIT so proprietary software can make use of the library, but some of the techniques employed are from GPL 3.0 projects, and I have no choice but to use the same license for legal reasons. This gave me an idea to make an MIT version without all of the GPL code so it can also be used without forcing your code to be open-source. It should be noted that the MIT version removes <b>12</b> techniques out of 116 (as of 1.9 version), and the lesser the number of techniques, the less accurate the overall result might be.
+> I would've made it strictly MIT so proprietary software can make use of the library, but some of the techniques employed are from GPL 3.0 projects, and I have no choice but to use the same license for legal reasons.
+>
+> This gave me the idea to make an MIT version without any of the GPL code, so it can also be used without forcing your code to be open source. Note that the MIT version removes <b>12</b> techniques out of 116 (as of version 1.9), and the fewer techniques there are, the less accurate the overall result might be.
 
 </details>
 

diff --git a/auxiliary/add_technique.py b/auxiliary/add_technique.py
@@ -67,10 +67,15 @@ def prompt():
     if not file_path.endswith(".cpp") and not file_path.endswith(".cc"):
         raise ValueError("file input MUST be a .cpp file")
     with open(file_path, 'r') as file:
+        is_static = False
         for line in file:
             if "#include" in line.lower():
                 raise ValueError("The cpp file will be directly copied to the lib verbatim, do not add #include as this will end up in vmaware.hpp")
-
+            if "static" in line:
+                is_static = True
+
+        if not is_static:
+            raise ValueError("The function must be set as static")
 
     # 3: function name
     function_name = questionary.text("What's the name of the technique function in your .cpp file?").ask()
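The first hunk makes the script reject any line containing `#include` (case-insensitively) and require the substring `static` somewhere in the file. A minimal sketch of the same two checks as a standalone function, assuming the file contents are passed in as a string; `validate_technique_source` is a hypothetical name, not part of `add_technique.py`. Like the script, it uses a plain substring test, so `static` inside a comment would also satisfy it.

```python
def validate_technique_source(source: str) -> None:
    """Raise ValueError if technique code breaks add_technique.py's rules.

    Hypothetical helper: no line may contain #include (any case), and the
    substring "static" must appear somewhere in the file.
    """
    is_static = False
    for line in source.splitlines():
        # the file is pasted verbatim into vmaware.hpp, so includes are rejected
        if "#include" in line.lower():
            raise ValueError("do not add #include, the file is copied into vmaware.hpp verbatim")
        # plain substring test, as in the script: "static" inside a comment also counts
        if "static" in line:
            is_static = True
    if not is_static:
        raise ValueError("The function must be set as static")


# passes: contains "static" and no #include
validate_technique_source("[[nodiscard]] static bool my_technique() {\n    return false;\n}\n")
```

Pulling the checks out of `prompt()` like this would let them be tested without going through the interactive `questionary` prompts.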
@@ -117,8 +122,8 @@ def prompt():
     # 6: description
     description = ""
     while True:
-        text = questionary.text("What's the description of your technique? (50-100 characters)").ask()
-        if len(text) < 50:
+        text = questionary.text("What's the description of your technique? (30-100 characters)").ask()
+        if len(text) < 30:
             print("Too short, try again\n")
             continue
         if len(text) > 100:
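The description prompt now accepts 30 to 100 characters (previously 50 to 100), with both bounds inclusive because the checks are `len(text) < 30` and `len(text) > 100`. A sketch of that bounds check as a pure function; `check_description_length` and its return strings are hypothetical and only stand in for the retry loop shown above.

```python
def check_description_length(text: str, min_len: int = 30, max_len: int = 100) -> str:
    """Classify a technique description as "ok", "too short" or "too long".

    Hypothetical helper; the default bounds mirror the 30-100 character range
    used by the prompt, and both ends are inclusive.
    """
    if len(text) < min_len:
        return "too short"
    if len(text) > max_len:
        return "too long"
    return "ok"


for candidate in ("too short", "Checks whether the CPU brand string contains a hypervisor name"):
    print(len(candidate), check_description_length(candidate))
```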
@@ -195,6 +200,8 @@ def write_header(options):
         lines = file.readlines()
 
     new_code = []
+    update_count = 0
+
 
     for line in lines:
         # if the line is empty, skip
@@ -205,12 +212,17 @@ def write_header(options):
 
         # modify the enum
         if "// ADD NEW TECHNIQUE ENUM NAME HERE" in line:
-            new_code.append("\t\t" + options.enum_name + ",\n")
+            if options.is_gpl:
+                new_code.append("/* GPL */ " + options.enum_name + ",\n")
+            else:
+                new_code.append("\t\t" + options.enum_name + ",\n")
+            update_count += 1
 
 
         # append the technique function to the function list section
         if "// ADD NEW TECHNIQUE FUNCTION HERE" in line:
             full_technique = []
+            new_code.append("\n")
 
             # manage the category string of the technique
             category_list = []
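For GPL techniques, the enum entry is now tagged with a `/* GPL */ ` prefix in place of the usual two-tab indent, presumably so GPL-derived lines remain easy to identify in `vmaware.hpp` (the README notes that the MIT version drops those techniques). A minimal sketch of that branch as a hypothetical helper; `format_enum_entry` and `MY_TECHNIQUE` are illustrative names only.

```python
def format_enum_entry(enum_name: str, is_gpl: bool) -> str:
    """Build the line inserted above "// ADD NEW TECHNIQUE ENUM NAME HERE".

    Hypothetical helper mirroring write_header(): GPL techniques get a
    "/* GPL */ " tag, everything else gets two tabs of indentation.
    """
    prefix = "/* GPL */ " if is_gpl else "\t\t"
    return prefix + enum_name + ",\n"


print(repr(format_enum_entry("MY_TECHNIQUE", is_gpl=True)))   # '/* GPL */ MY_TECHNIQUE,\n'
print(repr(format_enum_entry("MY_TECHNIQUE", is_gpl=False)))  # '\t\tMY_TECHNIQUE,\n'
```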
@@ -255,31 +267,59 @@ def write_header(options):
                 technique_code = technique_file.readlines()
                 full_technique = full_technique + technique_code
 
+
+
             # add the GPL specifier for every line 
             if options.is_gpl:
                 for i in range(len(full_technique)):
-                    full_technique[i] = "/* GPL */ " + full_technique[i]
+                    full_technique[i] = "/* GPL */     " + full_technique[i]
 
             # commit the full technique in the buffer 
-            for technique_line in full_technique:
-                new_code.append("\t" + technique_line)
+            preprocessors = ["#endif", "#elif", "#else", "#if"]
+            if options.is_gpl:
+                for technique_line in full_technique:
+                    if any(sub in technique_line for sub in preprocessors):
+                        new_code.append(technique_line.lstrip())
+                    else:
+                        new_code.append(technique_line)
+            else:
+                for technique_line in full_technique:
+                    if any(sub in technique_line for sub in preprocessors):
+                        new_code.append(technique_line.lstrip())
+                    else:
+                        new_code.append("\t" + technique_line)
+
 
             # extra lines
             new_code.append("\n\n")
+            update_count += 1
 
 
         # modify the technique table with the new technique appended
         if "// ADD NEW TECHNIQUE STRUCTURE HERE" in line:
-            new_code.append(
-                "\t" + 
-                "{ VM::" + 
-                options.enum_name + 
-                ", { " + 
-                str(options.score) + 
-                ", VM::" + 
-                options.function_name +
-                ", false } },\n"
-            )
+            if options.is_gpl:
+                new_code.append(
+                    "/* GPL */ " + 
+                    "{ VM::" + 
+                    options.enum_name + 
+                    ", { " + 
+                    str(options.score) + 
+                    ", VM::" + 
+                    options.function_name +
+                    ", false } },\n"
+                )
+            else:
+                new_code.append(
+                    "\t" + 
+                    "{ VM::" + 
+                    options.enum_name + 
+                    ", { " + 
+                    str(options.score) + 
+                    ", VM::" + 
+                    options.function_name +
+                    ", false } },\n"
+                )
+            update_count += 1
 
 
         # modify the VM::flag_to_string function with the new technique
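The buffer step is meant to leave preprocessor directives unindented while indenting all other technique code with a tab. Since a C/C++ line holds at most one directive, that test needs `any()` (at least one directive present) rather than `all()` (every directive present on the same line, which never happens). A minimal sketch of the non-GPL path; `is_preprocessor_line`, `indent_technique` and the sample lines are hypothetical. Note that `"#if"` is a substring of `#ifdef` and `#ifndef`, so those are matched too, which is likely the desired behaviour.

```python
PREPROCESSORS = ("#endif", "#elif", "#else", "#if")


def is_preprocessor_line(line: str) -> bool:
    # any(): a line only needs to contain one of the directives;
    # "#if" also matches "#ifdef" and "#ifndef" as a substring
    return any(directive in line for directive in PREPROCESSORS)


def indent_technique(lines: list[str]) -> list[str]:
    """Hypothetical helper: tab-indent technique code, but move
    preprocessor directives back to column 0."""
    out = []
    for line in lines:
        if is_preprocessor_line(line):
            out.append(line.lstrip())
        else:
            out.append("\t" + line)
    return out


sample = ["#if (WINDOWS)\n", "static bool f() { return true; }\n", "  #endif\n"]
print(indent_technique(sample))
```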
@@ -291,10 +331,14 @@ def write_header(options):
                 options.enum_name + 
                 "\";\n"
             )
+            update_count += 1
 
         # add the line in the buffer array
         new_code.append(line)
 
+    if update_count != 4:
+        raise ValueError("Not all sections have been updated, check whether the search key values have been modified")
+
 
     # commit the new changes from the buffer array
     with open("../src/vmaware.hpp", "w") as file:
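`write_header()` now counts the marker comments it acts on (enum, technique function, technique table and `VM::flag_to_string`) and raises unless the total is exactly 4, so a renamed or deleted marker fails loudly instead of silently producing an incomplete header. A minimal sketch of the same idea on an in-memory list of lines; `insert_at_markers` and its `{marker: text}` mapping are hypothetical. Unlike the script, which only checks the total, it tracks each marker individually, so a duplicated marker cannot mask a missing one.

```python
def insert_at_markers(lines: list[str], inserts: dict[str, str]) -> list[str]:
    """Insert text just above each marker comment, like write_header().

    Hypothetical helper: `inserts` maps a marker string to the text placed
    before the line containing it. Raises if any marker is never found.
    """
    found = set()
    out = []
    for line in lines:
        for marker, text in inserts.items():
            if marker in line:
                out.append(text)  # new code goes before the marker line
                found.add(marker)
        out.append(line)
    missing = [m for m in inserts if m not in found]
    if missing:
        raise ValueError("markers not found (were they renamed?): " + ", ".join(missing))
    return out


header = ["enum {\n", "\t\t// ADD NEW TECHNIQUE ENUM NAME HERE\n", "};\n"]
print(insert_at_markers(header, {"// ADD NEW TECHNIQUE ENUM NAME HERE": "\t\tMY_TECHNIQUE,\n"}))
```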
@@ -405,7 +449,7 @@ def write_docs(options):
 
             query = "| " + " | ".join(query_list) + " |"
 
-            new_docs.append(query)
+            new_docs.append(query + "\n")
 
 
         # add the line in the buffer array
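The last hunk appends `"\n"` to the generated docs row. Lines read with `readlines()` keep their trailing newline, but the freshly built row had none, so (presumably, assuming the buffer is written back line by line) it would have been glued to whatever line followed it. A small sketch of building a Markdown table row; `markdown_row` and the sample cell values are hypothetical.

```python
def markdown_row(cells: list[str]) -> str:
    # "| a | b |" plus a trailing newline, so the row stays on its own line
    # when the buffer is joined or written out with writelines()
    return "| " + " | ".join(cells) + " |\n"


rows = ["| Flag | Description |\n", "| --- | --- |\n"]
rows.append(markdown_row(["`VM::MY_TECHNIQUE`", "Hypothetical example technique"]))
print("".join(rows), end="")
```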