Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Runtime Bug/Crash] Linux: audio related crash when trying to play a mission #141

Closed
3 of 21 tasks
bryanperris opened this issue Apr 21, 2024 · 61 comments · Fixed by #158
Closed
3 of 21 tasks

[Runtime Bug/Crash] Linux: audio related crash when trying to play a mission #141

bryanperris opened this issue Apr 21, 2024 · 61 comments · Fixed by #158
Assignees
Labels
64bit Issue or pull relates to 64bit bug Something isn't working

Comments

@bryanperris
Copy link
Contributor

Build Version

1fd8876

Operating System Environment

  • Microsoft Windows (32-bit)
  • Microsoft Windows (64-bit)
  • Mac OS X
  • Linux: Fedora 39

CPU Environment

  • x86 (32-bit Intel/AMD)
  • x86_64 (64-bit Intel/AMD)
  • ARM (32-bit)
  • ARM64 (64-bit; sometimes called AArch64)
  • Other (RISC V, PPC...)

Game Environment

Start a new campaign on any level for Descent 3: Retribution

Description

After starting a mission, you wait at the loading screen, then the game crashes

Regression Status

Present since first commit

Steps to Reproduce

Just start playing any level from Descent 3: Retribution, then it crashes on the loading screen before the actual gameplay starts.

  • Game modes affected:
    • Single player
    • Multiplayer competitive
      • Anarchy
      • Hyper-Anarchy
      • Robo-Anarchy
      • Team Anarchy
      • Capture the Flag
      • Bounty
      • Entropy
      • Hoard
      • Monsterball
    • Multiplayer cooperative

Here are more details provided by GDB:
Thread 1 "Descent3" received signal SIGSEGV, Segmentation fault. 0x000000000067747e in hlsSystem::ComputePlayInfo (this=0x1339f20 <Sound_system>, sound_obj_index=1, virtual_pos=0x7fffffffbd70, virtual_vel=0x7fffffffbd64, adjusted_volume=0x7fffffffbd60) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:925 925 sound_seg = BOA_INDEX(sound_seg); Missing separate debuginfos, use: dnf debuginfo-install xorg-x11-drv-nvidia-libs-550.67-1.fc39.x86_64

@bryanperris bryanperris added the bug Something isn't working label Apr 21, 2024
@JeodC
Copy link
Collaborator

JeodC commented Apr 21, 2024

@DanielGibson
Copy link
Contributor

can you also get a backtrace from gdb? (enter bt in the gdb commandline after the crash)

@bryanperris
Copy link
Contributor Author

bryanperris commented Apr 21, 2024

#0  0x000000000067747e in hlsSystem::ComputePlayInfo (this=0x1339f20 <Sound_system>, sound_obj_index=2, virtual_pos=0x7fffffffbd70, virtual_vel=0x7fffffffbd64, 
    adjusted_volume=0x7fffffffbd60) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:925
#1  0x0000000000677f1d in hlsSystem::Emulate3dSound (this=0x1339f20 <Sound_system>, sound_obj_index=2)
    at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1024
#2  0x00000000006789b8 in hlsSystem::Play3dSound (this=0x1339f20 <Sound_system>, sound_index=126, cur_pos=0x7fffffffbe40, cur_obj=0x8fe8b0 <Objects+39312>, 
    priority=1, volume=1, flags=0, offset=0) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1157
#3  0x0000000000678269 in hlsSystem::Play3dSound (this=0x1339f20 <Sound_system>, sound_index=126, priority=1, cur_obj=0x8fe8b0 <Objects+39312>, volume=1, flags=0, 
    offset=0) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1065
#4  0x0000000000412429 in AIInit (obj=0x8fe8b0 <Objects+39312>, ai_class=0 '\000', ai_type=2 '\002', ai_movement=1 '\001')
    at /home/bperris/base/dev/Descent3/Descent3/AImain.cpp:3741
#5  0x0000000000545c7d in ObjInitGeneric (objp=0x8fe8b0 <Objects+39312>, reinit=false) at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:810
#6  0x0000000000546b97 in ObjInitTypeSpecific (objp=0x8fe8b0 <Objects+39312>, reinitializing=false) at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:1117
#7  0x0000000000546ed3 in ObjInit (objp=0x8fe8b0 <Objects+39312>, type=2, id=102, handle=2087, pos=0x7fffffffc034, creation_time=0, parent_handle=-1)
    at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:1206
#8  0x00000000004c21c6 in ReadObject (ifile=0x27de7e0, objp=0x8fe8b0 <Objects+39312>, handle=2087, fileversion=127)
    at /home/bperris/base/dev/Descent3/Descent3/LoadLevel.cpp:1667
#9  0x00000000004c9b58 in LoadLevel (filename=0x169ad50 "Level2.d3l", cb_fn=0x0) at /home/bperris/base/dev/Descent3/Descent3/LoadLevel.cpp:3816
#10 0x00000000004dbd6f in LoadMissionLevel (level=2) at /home/bperris/base/dev/Descent3/Descent3/Mission.cpp:1296
#11 0x000000000049201d in LoadAndStartCurrentLevel () at /home/bperris/base/dev/Descent3/Descent3/gamesequence.cpp:1706
#12 0x0000000000491011 in GameSequencer () at /home/bperris/base/dev/Descent3/Descent3/gamesequence.cpp:1199
#13 0x0000000000472187 in PlayGame () at /home/bperris/base/dev/Descent3/Descent3/game.cpp:832
#14 0x0000000000464a1c in MainLoop () at /home/bperris/base/dev/Descent3/Descent3/descent.cpp:621
#15 0x00000000004648b8 in Descent3 () at /home/bperris/base/dev/Descent3/Descent3/descent.cpp:571
#16 0x00000000005e6df1 in oeD3LnxApp::run (this=0x7fffffffd4a0) at /home/bperris/base/dev/Descent3/Descent3/lnxmain.cpp:255
#17 0x00000000005e6d25 in main (argc=1, argv=0x7fffffffd6a8) at /home/bperris/base/dev/Descent3/Descent3/lnxmain.cpp:755

@winterheart
Copy link
Collaborator

Same as #7, corruptions on loading level.

@bryanperris
Copy link
Contributor Author

Same as #7, corruptions on loading level.

That could be the reason I couldn't send out the guidebot or open a door on level 2

@MaddTheSane
Copy link
Contributor

This is also present on macOS targeting arm64.

@JeodC
Copy link
Collaborator

JeodC commented Apr 22, 2024

Only Windows is being forced to build in 32bit. For Linux, at least, Currently CI builds 64-bit executables and d3-linux.hog. In order to build 32-bit you need add -DBITS=-m32 to cmake configure command.. I'm not certain if it's similar for mac. I suggest trying a 32bit build and see if you get the same issue.

@MaddTheSane
Copy link
Contributor

One problem with trying to get 32-bit binaries working on a modern version of macOS: It doesn't support 32-bit binaries, either AARCH32 or i386.
Support for 32-bit binaries was dropped in 10.15, and Apple Silicon Macs were released with 11.0.

@DanielGibson
Copy link
Contributor

The problem here seems to be that in hlsSystem::ComputePlayInfo() (where the crash happens), sound_seg is -1, which then causes the crash in sound_seg = BOA_INDEX(sound_seg);

BOA_INDEX() is the following macro:
#define BOA_INDEX(x) ((ROOMNUM_OUTSIDE(x) ? (TERRAIN_REGION(x) + Highest_room_index + 1) : x))

in the end, BOA_INDEX(sound_seg) expands to:
(((((sound_seg) & 0x80000000) != 0) ? (((Terrain_seg[0x7FFFFFFF & sound_seg].flags & (32 + 64 + 128)) >> 5) + Highest_room_index + 1) : sound_seg))

0x7FFFFFFF & -1 should be 0x7FFFFFFF (2'147'483'647), which is a way too high index for terrain_segment Terrain_seg[TERRAIN_WIDTH * TERRAIN_DEPTH] = Terrain_seg[256 * 256] = Terrain_seg[65536]

I just provoked the crash with starting a new game, specifically the "Pilot Training" mission.
In my case, sound_obj_index was 0, m_sound_objects[sound_obj_index].m_link_info.object_handle was 8200 and the object returned by ObjGet(m_sound_objects[sound_obj_index].m_link_info.object_handle); had pos = { 2059.34961, -723.258789, 2469.10718 } and roomnum = -1, which is where sound_seg = -1 came from.

No idea why roomnum is -1.

Adding

  if(sound_seg == -1)
    return false;

right before the sound_seg = BOA_INDEX(sound_seg); line prevents the crash, but I have no idea if this is the correct fix, or if the real bug is much earlier when -1 set as the object's roomnum, wherever that happens, and should be fixed there

@JeodC
Copy link
Collaborator

JeodC commented Apr 23, 2024

The problem here seems to be that in hlsSystem::ComputePlayInfo() (where the crash happens), sound_seg is -1, which then causes the crash in sound_seg = BOA_INDEX(sound_seg);

BOA_INDEX() is the following macro: #define BOA_INDEX(x) ((ROOMNUM_OUTSIDE(x) ? (TERRAIN_REGION(x) + Highest_room_index + 1) : x))

in the end, BOA_INDEX(sound_seg) expands to: (((((sound_seg) & 0x80000000) != 0) ? (((Terrain_seg[0x7FFFFFFF & sound_seg].flags & (32 + 64 + 128)) >> 5) + Highest_room_index + 1) : sound_seg))

0x7FFFFFFF & -1 should be 0x7FFFFFFF (2'147'483'647), which is a way too high index for terrain_segment Terrain_seg[TERRAIN_WIDTH * TERRAIN_DEPTH] = Terrain_seg[256 * 256] = Terrain_seg[65536]

I just provoked the crash with starting a new game, specifically the "Pilot Training" mission. In my case, sound_obj_index was 0, m_sound_objects[sound_obj_index].m_link_info.object_handle was 8200 and the object returned by ObjGet(m_sound_objects[sound_obj_index].m_link_info.object_handle); had pos = { 2059.34961, -723.258789, 2469.10718 } and roomnum = -1, which is where sound_seg = -1 came from.

No idea why roomnum is -1.

Adding

  if(sound_seg == -1)
    return false;

right before the sound_seg = BOA_INDEX(sound_seg); line prevents the crash, but I have no idea if this is the correct fix, or if the real bug is much earlier when -1 set as the object's roomnum, wherever that happens, and should be fixed there

I was looking into this earlier tonight, seems I was on the right track after all. It gets initialized at

objp->roomnum = -1;

Along with that are a few other inits with val -1, maybe they need to be changed as well.

@DanielGibson
Copy link
Contributor

DanielGibson commented Apr 23, 2024

But probably it should be set to a proper value elsewhere?
Otherwise, why does it not crash on Win32? The calculations in the macro should be the same, so the index would be far behind the end of the array..
On the other hand, on 32bit Terrain_seg[0x7FFFFFFF] would overflow the address space - sizeof(terrain_segment) is 20 so that would result in the address (char*)&Terrain_seg[0] + 0x7FFFFFFF * 20; that multiplication should overflow and be truncated to 0xFFFFFFEC (if I'm not mistaken), so we'd basically get (char*)&Terrain_seg[0] + 0xFFFFFFEC. That addition would overflow again, and we'd probably end up 20 bytes before the start of the Terrain_seg array - with some luck at least some valid data is there (possibly VisibleTerrainZ or some other global variable), so accessing it doesn't outright crash.
On 64bit, neither the multiplication nor the addition will overflow, because the address space is big enough - so we'll very likely end up at an address that's currently not valid.

Either way, the solution definitely is not to initialize roomnum to a different value in the constructor or something.

Either the code only worked by accident on Win32 (by overflowing to an address that's valid) and adding if(sound_seg == -1) return false; is the correct solution, or roomnum should've been set to another value while loading the level or something, and for some reason wasn't.

@JeodC
Copy link
Collaborator

JeodC commented Apr 23, 2024

Well, it's read in at

objp->roomnum = roomnum;
where objp->roomnum = roomnum.

@DanielGibson
Copy link
Contributor

DanielGibson commented Apr 23, 2024

Yeah, but is that done for all objects, or is this specific object left out for some reason?
Or did cf_ReadInt(ifile); return -1?

By the way, I wrote a hacky testprogram (that's not really valid C, the compiler might optimize void* p2 = &blas[0x7FFFFFFF]; to whatever because it's clearly not within the array, but at least when building without optimizations it worked on my PC so it's good enough for demonstration):

#include <stdio.h>

struct bla { char data[20]; };

struct bla blas[256*256];

int main()
{
	void* p = &blas[0];
	void* p2 = &blas[0x7FFFFFFF];
	printf("p = %p p2 = %p diff = %td\n", p, p2, p2-p);
	return 0;
}

When compiled for 32bit (gcc -o test -m32 test.c), it prints something like:
p = 0x5947f040 p2 = 0x5947f02c diff = -20
for 64bit (gcc -o test test.c) it prints:
p = 0x6151a5ee7040 p2 = 0x615ba5ee702c diff = 42949672940

So my theory that on 32bit platforms one lands 20 bytes before the array seems to hold (didn't test this on Windows with MSVC though - I would expect it to behave the same, but I always forget if some platforms treat pointers as signed or some stupid shit like that)

DanielGibson added a commit to DanielGibson/Descent3 that referenced this issue Apr 23, 2024
sound_seg was -1, which crashes in BOA_INDEX(sound_seg) on 64bit
platforms (by pure luck it doesn't seem to crash on 32bit platforms)
@DanielGibson
Copy link
Contributor

Created a pull request with the fix mentioned above: #158
Needs reviewing from someone who actually understands how the Descent3 source is supposed to work (i.e. @kevinbentley).

@jcoby
Copy link
Contributor

jcoby commented Apr 23, 2024

I did some digging around and we are loading objects with negative room numbers.

From the training mission:
Loaded Object 20 type 6 in room -2147483648 (type 6 is a OBJ_VIEWER)

Retribution level 1 has lots of them:

Loaded Object 2 type 7 in room -2147442039
Loaded Object 16 type 7 in room -2147441782
Loaded Object 12 type 7 in room -2147441526
Loaded Object 12 type 7 in room -2147439982

(type 7 is a OBJ_POWERUP)

Use this patch along with #157 (once it lands) to see these outputs:

index 6434fbd..31361b7 100644
--- a/Descent3/LoadLevel.cpp
+++ b/Descent3/LoadLevel.cpp
@@ -1,5 +1,5 @@
 /*
-* Descent 3
+* Descent 3
 * Copyright (C) 2024 Parallax Software
 *
 * This program is free software: you can redistribute it and/or modify
@@ -1572,7 +1572,7 @@ void ConvertObject(int *type, int *id) {
 // returns 1 if read ok
 int ReadObject(CFILE *ifile, object *objp, int handle, int fileversion) {
   int type, id, old_id, i;
-  int roomnum;
+  int roomnum = 0;
   float door_shields;
   char tempname[OBJ_NAME_LEN + 1] = "";

@@ -1650,6 +1650,7 @@ int ReadObject(CFILE *ifile, object *objp, int handle, int fileversion) {

   // Get the room number
   roomnum = cf_ReadInt(ifile);
+  mprintf((0, "Loaded Object %d type %d in room %d\n", id, type, roomnum));

   // For old files, check if this object is on terrain
   if (fileversion < 49) {

@DanielGibson
Copy link
Contributor

DanielGibson commented Apr 23, 2024

I think negative room numbers are ok, that encodes if it's an (indoor) room or an (outdoor) cell, as far as I can tell, see ROOMNUM_OUTSIDE(), ROOMNUM_CELLNUM_FLAG (which is 0x80000000 = -2147483648 aka INT32_MIN) etc in Descent3/object_external.h

But -1 specifically apparently means "uninitialized", and -1 & ROOMNUM_CELLNUM_MASK = -1 & 0x7fffffff = 0x7fffffff = 2147483647, which is the highest possible (signed 32bit) index.
On the other hand, if for example you have room -2147442039, CELLNUM(-2147442039) = -2147442039 & ROOMNUM_CELLNUM_MASK = -2147442039 & 0x7fffffff = 41609 which is a reasonably small number (remember, we have 256*256 = 65536 terrain segments, which I guess are cells)

@bryanperris
Copy link
Contributor Author

This is the problem I was investigating last weekend, and yes inside rooms are positive, and negative are outside except for the case of -1 which means the object has no room assigned, unlinked. I couldn't figure out why in a 64-bit build, unlinked objects are being passed around, so all I could do it make the sanity check to prevent crash when unlinking an object that has already been unlinked. I cannot also play level 2, as the first door just doesn't open for me, it is a sign things are not properly setup. This happens when loading a demo file, and I know the demo file hasn't changed in a long time, so it confirms this strange bug causes chain of strange things to happen.

@bryanperris
Copy link
Contributor Author

@JeodC
Copy link
Collaborator

JeodC commented Apr 23, 2024

Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.

@jcoby
Copy link
Contributor

jcoby commented Apr 23, 2024

Doing some more digging and the issue is that ObjInit (eventually) calls hlsSystem::Play3dSound (AImain.cpp:3741 in AIInit) but the roomnum isn't set until after ObjInit is called because all of the init code assumes roomnum is -1 during init. Moving the code to set roomnum earlier causes a whole other set of problems.

Commenting out the Play3dSound call seems to fix the problem but I don't know why it's there.

bt:

  * frame #0: 0x000000010033fbd4 Descent3`Debug_ErrorBox(type=2, topstring="Descent 3 Assertion Failed", title="Assertion failed: (0)\n\nFile /Users/jcoby/src/Descent3/sndlib/hlsoundlib.cpp at line 921.", bottomstring="Continue?") at lnxdebug.cpp:88:10
    frame #1: 0x00000001002b46f8 Descent3`AssertionFailed(expstr="0", file="/Users/jcoby/src/Descent3/sndlib/hlsoundlib.cpp", line=921) at error.cpp:222:12
    frame #2: 0x0000000100318640 Descent3`hlsSystem::ComputePlayInfo(this=0x0000000100f12668, sound_obj_index=0, virtual_pos=0x000000016fdfb5a4, virtual_vel=0x000000016fdfb598, adjusted_volume=0x000000016fdfb594) at hlsoundlib.cpp:921:7
    frame #3: 0x0000000100317ff0 Descent3`hlsSystem::Emulate3dSound(this=0x0000000100f12668, sound_obj_index=0) at hlsoundlib.cpp:1033:15
    frame #4: 0x0000000100319ef4 Descent3`hlsSystem::Play3dSound(this=0x0000000100f12668, sound_index=291, cur_pos=0x000000016fdfb670, cur_obj=0x0000000100550e50, priority=1, volume=1, flags=0, offset=0) at hlsoundlib.cpp:1166:17
    frame #5: 0x00000001003197ac Descent3`hlsSystem::Play3dSound(this=0x0000000100f12668, sound_index=291, priority=1, cur_obj=0x0000000100550e50, volume=1, flags=0, offset=0) at hlsoundlib.cpp:1074:10
    frame #6: 0x0000000100015d54 Descent3`AIInit(obj=0x0000000100550e50, ai_class='\0', ai_type='\x02', ai_movement='\x01') at AImain.cpp:3741:49
    frame #7: 0x0000000100195c48 Descent3`ObjInitGeneric(objp=0x0000000100550e50, reinit=false) at ObjInit.cpp:810:5
    frame #8: 0x0000000100196ef0 Descent3`ObjInitTypeSpecific(objp=0x0000000100550e50, reinitializing=false) at ObjInit.cpp:1117:12
    frame #9: 0x0000000100197298 Descent3`ObjInit(objp=0x0000000100550e50, type=2, id=106, handle=8200, pos=0x000000016fdfb9d8, creation_time=0, parent_handle=-1) at ObjInit.cpp:1206:10
    frame #10: 0x00000001000f62f0 Descent3`ReadObject(ifile=0x000060000071b270, objp=0x0000000100550e50, handle=8200, fileversion=127) at LoadLevel.cpp:1668:3
    frame #11: 0x00000001000ffdc8 Descent3`LoadLevel(filename="trainingmission.d3l", cb_fn=0x0000000000000000) at LoadLevel.cpp:3817:11

patch:

index 09b16ec..0bdea95 100644
--- a/Descent3/AImain.cpp
+++ b/Descent3/AImain.cpp
@@ -1,5 +1,5 @@
 /*
-* Descent 3
+* Descent 3
 * Copyright (C) 2024 Parallax Software
 *
 * This program is free software: you can redistribute it and/or modify
@@ -3738,8 +3738,8 @@ bool AIInit(object *obj, ubyte ai_class, ubyte ai_type, ubyte ai_movement) {
     // Accounts for sound loading
     if ((obj->handle & HANDLE_COUNT_MASK) != 0) {
       ai_info->last_played_sound_index = Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index;
-      ai_info->anim_sound_handle = Sound_system.Play3dSound(
-          Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index, SND_PRIORITY_LOW, obj);
+      // ai_info->anim_sound_handle = Sound_system.Play3dSound(
+      //     Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index, SND_PRIORITY_LOW, obj);
     } else {
       ai_info->last_played_sound_index = -1;
     }

@bryanperris
Copy link
Contributor Author

Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.

Yeah probably be the best to have a modern logging system implemented, eg: debug, notice, warning, error, then use trace level messages on the whole level setup system to track every detail of what is really going on, then logs can be compared between windows and linux.

@JeodC
Copy link
Collaborator

JeodC commented Apr 23, 2024

Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.

Yeah probably be the best to have a modern logging system implemented, eg: debug, notice, warning, error, then use trace level messages on the whole level setup system to track every detail of what is really going on, then logs can be compared between windows and linux.

So...imgui?

@bryanperris
Copy link
Contributor Author

Nothing related to graphics, just using a simple backend logging library like log4c, where you simply log some text with a chosen verbosity level, then there is usually a config file to control the loggers: logger for the command line, logger for file, etc.

@jcoby
Copy link
Contributor

jcoby commented Apr 23, 2024

Console logging is available. See pr #157. Or add -DMONO to your cmake config.

The problem doesn't appear to be incorrect room assignments either (see my comment above). It's that the sound system is trying to play a sound before the object is fully initialized and assigned its room number.

I think this is done to play looping sounds but I also can't figure out why this would work on windows and not linux/mac.

@bryanperris
Copy link
Contributor Author

I do understand there is some built in logging to the console functionality, but it would better using a robust logging system where things are organized by log level and category. With all these level script going on, and level setup, be nice to turn on trace logs to see what in detail is happening like which source file and function is doing what, and timestamps too. When people file a bug ticket, they can give us better logging information. The built in logger only offers 2 levels and it only logs to console.

@bryanperris
Copy link
Contributor Author

Hard to be 100% sure, but I'm quite confident that this fix doesn't break other things, because in the end it behaves the same as it would when BOA_IsSoundAudible() returns false (because the sound is in a room too far away) two lines later.

(And if it does break other things, those were broken to begin with and need to be fixed)

I commented out calls to compute play info, and see issues with the in-game scripting, not even guidebot wants to come out, so wondering if there are other wacky memory accesses going on that just works out for windows.

@jcoby
Copy link
Contributor

jcoby commented Apr 23, 2024

Hard to be 100% sure, but I'm quite confident that this fix doesn't break other things, because in the end it behaves the same as it would when BOA_IsSoundAudible() returns false (because the sound is in a room too far away) two lines later.
(And if it does break other things, those were broken to begin with and need to be fixed)

I commented out calls to compute play info, and see issues with the in-game scripting, not even guidebot wants to come out, so wondering if there are other wacky memory accesses going on that just works out for windows.

If you apply the patch from #160 does guidebot work for you? It fixes the scripts on aarch64 for me.

@bryanperris
Copy link
Contributor Author

bryanperris commented Apr 23, 2024

Yes it does, I get the intro animation now too, and the door opens. I didn't get the guidebot menu at all before.

@winterheart winterheart added the 64bit Issue or pull relates to 64bit label Apr 23, 2024
@DanielGibson
Copy link
Contributor

I think we need to short circuit the Play3dSound call if roomnum is -1. If we don't it's going to push the invalid sound onto the sound buffer and it won't get updated to play later on since the link has been created but roomnum will always be -1 unless there is some other code that sets the sound's roomnum later

What do you mean exactly with pushes onto the sound buffer?
If you mean the stuff that's initialized in the m_sound_objects[i] "sound slot", I think that is all undone by this code here:

Descent3/sndlib/hlsoundlib.cpp

Lines 1157 to 1160 in 86a6c13

f_audible = Emulate3dSound(i);
if ((!f_audible) && (!(m_sound_objects[i].m_obj_type_flags & SIF_LOOPING))) {
StopSound(i, SKT_STOP_IMMEDIATELY);
return -1;

If Emulate3dSound(i) returns false (it does when ComputePlayInfo() returns false), StopSound(i, SKT_STOP_IMMEDIATELY); is called, which will do the following:
m_sound_objects[sound_obj_index].m_obj_type_flags = SIF_UNUSED;
This means that m_sound_objects[i] is basically unlinked, and next time Play3dSound() is called, the same i will be found in the "find free sound slot." loop, and will be used as fresh "sound slot".

Or is some other global state also changed that is not undone?

(Short circuiting it probably still is a good idea, because why do all that work if we already know in the first Play3dSound() that we have no valid roomnum..)

@jcoby
Copy link
Contributor

jcoby commented Apr 23, 2024

If you mean the stuff that's initialized in the m_sound_objects[i] "sound slot", I think that is all undone by this code here:

That's what I mean yes.

If I'm reading that bit of code correctly (no guarantees I am) it only stops sounds that aren't audible AND aren't looping (SIF_LOOPING). If we try to play something with a looping sound emitter it won't work unless there is a valid roomnum and I can't find any code that updates a sound's roomnum once it's put into m_sound_objects.

Short circuiting the Play3dSound avoids that problem. I don't know where to find a looping sound attached to an object though.

DanielGibson added a commit to DanielGibson/Descent3 that referenced this issue Apr 23, 2024
sound_seg was -1, which crashes in BOA_INDEX(sound_seg) on 64bit
platforms (by pure luck it doesn't seem to crash on 32bit platforms)

It makes sense to catch this problem much earlier in Play3dSound(),
to avoid executing all the superfluous
@DanielGibson
Copy link
Contributor

You're right, I guess it could happen that we have a looping sound there that's neither properly initialized nor stopped.

I just updated my PR to add an early out, see https://github.com/DescentDevelopers/Descent3/pull/158/files

@JeodC JeodC linked a pull request Apr 23, 2024 that will close this issue
@JeodC
Copy link
Collaborator

JeodC commented Apr 23, 2024

@DanielGibson Are you in the DDN Discord by any chance?

@DanielGibson
Copy link
Contributor

I am now (as caedes), can't promise I'll be too active though

@bryanperris
Copy link
Contributor Author

We are in the same boat as life is busy.

@winterheart
Copy link
Collaborator

I think that issue may be solved with these changes

--- a/Descent3/terrain.h
+++ b/Descent3/terrain.h
@@ -213,7 +213,7 @@ extern terrain_tex_segment Terrain_tex_seg[TERRAIN_TEX_WIDTH * TERRAIN_TEX_DEPTH
 // first object to render after cell has been rendered (only used for SW renderer)
 extern short Terrain_seg_render_objs[];
 
-#define TERRAIN_REGION(x) ((Terrain_seg[0x7FFFFFFF & x].flags & TFM_REGION_MASK) >> 5)
+#define TERRAIN_REGION(x) ((Terrain_seg[0x7FFF & x].flags & TFM_REGION_MASK) >> 5)
 
 extern terrain_sky Terrain_sky;

@DanielGibson
Copy link
Contributor

DanielGibson commented Apr 24, 2024

Why would that be a solution?

That just turns -1 into 0x7FFF, which might be a valid index within the array (and thus wouldn't crash), but that element of the array is still totally unrelated to roomnum -1 which simply is invalid/unassigned/unlinked

@bryanperris
Copy link
Contributor Author

just to make note here: the size of the terrain grid is based on 2 defined constants:
W = 256
H = 256

So I don't think a fixed mask makes sense.

@DanielGibson
Copy link
Contributor

Yes, that's the other side: Terrain_seg[] has 65536 elements, 0x7FFF is 32767, so with the "fix" suggested by winterheart, TERRAIN_REGION(x) would stop working for all indices between 32768 and 65536.

Anyway, room number -1 is a special value with the special meaning of "unassigned" and thus has no corresponding entry in Terrain_seg[] or any other array, so there is no mask that turns it into a correct value.

@winterheart
Copy link
Collaborator

winterheart commented Apr 24, 2024

OK, here some explanations.

Limits of C array indexes is PTRDIFF_MAX that bound to size_t and INT_MAX. On 32-bit systems PTRDIFF_MAX equals (2^32-1) or 0x7fffffff, but on 64-bit systems PTRDIFF_MAX is (2^64-1) or 0x7fffffffffffffff. So long story short, when you try access to array element with PTRDIFF_MAX index, this effectively will result of a[-1] or pointer to array itself. Accessing to it is likely undefined behavior, but we lucky and in result we receive NULL on elements that can be interpreted as 0.
But on 64-bit systems accessing to array with 0x7fffffff (32-bit PTRDIFF_MAX) is ordinary index, but it inaccessible due array limits (256*256), with out-of-bound runtime error and SEGSERV.

In code

#define ROOMNUM_CELLNUM_FLAG 0x80000000
#define ROOMNUM_CELLNUM_MASK 0x7fffffff

is INT_MAX and PTRDIFF_MAX for 32-bit systems.

So, to solve this issue, we need just check objp->roomnum to bound limits (0, 65535) and return zero if we out of limits like this:

int BOA_INDEX(int x) {
  if ((x < 0) || (x > (TERRAIN_WIDTH * TERRAIN_DEPTH))) {
    return (Highest_room_index + 1);
  }
  else {
    return x;
  }
}

@DanielGibson
Copy link
Contributor

How do you manage to have such a high density of wrong information, are you an LLM?

Limits of C array indexes is PTRDIFF_MAX that bound to size_t and INT_MAX.

No. ptrdiff_t is a signed type (so on 32bit systems PTRDIFF_MAX is 0x7FFFFFFF, around 2 million) whoile size_t is unsigned (so on 32bit systems the maximum value, SIZE_MAX, is 0xFFFFFFFF, around 4 million). INT_MAX is the maximum value of int, which is a signed type and 32bits of size on most currently relevant 32 and 64bit platforms, so it's usually 0x7FFFFFFF.
On 64bit platforms, PTRDIFF_MAX is 0x7FFFFFFFFFFFFFFF and SIZE_MAX is 0xFFFFFFFFFFFFFFFF, while INT_MAX (at least on x86_64 and arm64) still is 0x7FFFFFFF.

on 64-bit systems PTRDIFF_MAX is (2^64-1) or 0x7fffffffffffffff

no, it's 0x7FFFFFFFFFFFFFFF, which is (2^63-1)

So long story short, when you try access to array element with PTRDIFF_MAX index, this effectively will result of a[-1] or pointer to array itself.

That arr[PTRDIFF_MAX] ending up being equivalent to arr[-1] is not universally true - it's only true if the element size is a multiple of 2.
If the array's element size is for example 1 (array of bytes) or 3 or any other odd value, arr[PTRDIFF_MAX] will be a very high value, around 2 gigabytes behind the start of the array.

Furthermore, a[-1] is not "pointer to array itself", it's random memory that's array_element_size bytes before the start of the array.

So, to solve this issue, we need just check objp->roomnum to bound limits (0, 65535) and return zero if we out of limits

No, returning 0 doesn't make it right. If anything, BOA_INDEX() should return -1 when called with -1, because some other functions (like BOA_IsSoundAudible()) will interpret that as "invalid value" and abort early.

Furthermore, the code you posted doesn't even do what you wrote above (really, are you an LLM?!), and even apart from that is not equivalent to what the BOA_INDEX macro does:
#define BOA_INDEX(x) ((ROOMNUM_OUTSIDE(x) ? (TERRAIN_REGION(x) + Highest_room_index + 1) : x))
(note that TERRAIN_REGION(x) part, for example)

@DanielGibson
Copy link
Contributor

DanielGibson commented Apr 24, 2024

If I were to turn the BOA_INDEX() macro into a function (and I'm not saying it should be done!), I'd turn it into something like this (untested):

inline int BOA_INDEX(int x) {
	// x == -1 should always be handled before calling BOA_INDEX()
	if(x == -1) {
		// in debug builds, make x == -1 exit D3 with an error,
		// so the problem becomes obvious and the surrounding code can be fixed
		ASSERT(x != -1 && "Make sure not to call BOA_INDEX() with -1!");
		// otherwise (in release builds), return -1 so hopefully
		// the code that uses the result of this function detects
		// the invalid value and handles it accordingly.
		return -1;
	}
	if(ROOMNUM_OUTSIDE(x)) {
		ASSERT((0x7FFFFFFF & x) < sizeof(Terrain_seg)/sizeof(Terrain_seg[0]
			&& "x negative, but not a valid index for Terrain_seg[]!");
		return TERRAIN_REGION(x) + Highest_room_index + 1;
	}
	return x;
}

@winterheart
Copy link
Collaborator

winterheart commented Apr 24, 2024

Daniel, I've tried this case on 32-bit and 64-bit mode, and this is hows code works. If you not comfortable with my explanations, you can attribute this to the fact that before this post I was engaged in continuous debugging until 4 AM, which I could not have done.

Still, -1 is valid value for BOA_INDEX, and TERRAIN_REGION(-1) should return zero to handle UB, as this is expected behavior in 32-bit system. Here another proposed variant (as inlined function):

inline int TERRAIN_REGION(int x) {
  if (x == -1) return 0;
  return ((Terrain_seg[0x7FFFFFFF & x].flags & TFM_REGION_MASK) >> 5);
}

@winterheart
Copy link
Collaborator

As for your variant, BOA_INDEX(-1) shouldn't return -1 because in that case in 32 bit system inner TERRAIN_REGION returns zero, and BOA_INDEX returns Highest_room_index + 1.

@JeodC
Copy link
Collaborator

JeodC commented Apr 24, 2024

Looks to me that returning -1 explicitly indicates to downstream code that the room number is invalid. If we return 0, we may prevent a crash, but as it introduces undefined behavior we may inadvertently cause more subtle bugs to be introduced. Returning -1 is also consistent across architectures. I think we should avoid a band-aid solution here.

@DanielGibson
Copy link
Contributor

Which part of "that code was always buggy" did you not understand?

The goal is not to emulate the broken behavior on 32bit (that just by chance happened to not crash), but to fix the code (even if it appeared to work on 32bit so far, there is no guarantee that it doesn't break with other compilers or whatever - or that it didn't cause actual bugs that were just more subtle than crashes).

And fixing means handling index -1 as one should according to its meaning of "unassigned value", which does not mean "just use whatever is at index 0".

@winterheart
Copy link
Collaborator

Daniel, please calm your tone.

Right after assigning sound_seg and ear_seg in code already has correctness check of values:

  if (!BOA_IsSoundAudible(sound_seg, ear_seg))
    return false;

If there incorrect data (and Highest_room_index + 1 is incorrect data) hlsSystem::ComputePlayInfo will immediately returns with false and stops any sound processing with invalid data. So there already sanity checks in place.

@DanielGibson
Copy link
Contributor

Highest_room_index + 1 is incorrect data

As far as I can tell from BOA_IsSoundAudible()s code, this assumption is wrong.

hlsSystem::ComputePlayInfo will immediately returns with false

and thus, so is this one.

JeodC added a commit that referenced this issue Apr 24, 2024
Fix crash in hlsSystem::ComputePlayInfo(), #141
JeodC pushed a commit that referenced this issue Apr 28, 2024
sound_seg was -1, which crashes in BOA_INDEX(sound_seg) on 64bit
platforms (by pure luck it doesn't seem to crash on 32bit platforms)

It makes sense to catch this problem much earlier in Play3dSound(),
to avoid executing all the superfluous
JeodC added a commit that referenced this issue Apr 28, 2024
Fix crash in hlsSystem::ComputePlayInfo(), #141
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
64bit Issue or pull relates to 64bit bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants