Potential error in the RecFusion SDK 2.2.0 while closing many sensors

Hi, I have encountered a potential error while closing multiple sensors (Orbbec Astra) with the Sensor::close() function:

[Core.Platform] ***** Encountered fatal exception: Access Violation, PID 5684 *****
 0# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
 1# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
 2# RtlDeleteAce in ntdll
 3# RtlRaiseException in ntdll
 4# KiUserExceptionDispatcher in ntdll
 5# 0x00007FF87AB8CB48 in OpenNI2
 6# oniWaitForAnyStream in OpenNI2
 7# ImFusion::OpenNI2RGBDStream::captureThread in ImFusionRGBD
 8# ImFusion::OpenNI2RGBDStream::IR in ImFusionRGBD
 9# ImFusion::FileSensor::~FileSensor in ImFusionRGBD
10# configthreadlocale in ucrtbase
11# BaseThreadInitThunk in KERNEL32
12# RtlUserThreadStart in ntdll

If you read the stack trace closely, you can see that thread management is involved (RtlUserThreadStart), which may be why the error only occurs with more than one sensor. The ~FileSensor destructor also appears (presumably related to closing the sensor), while at the same time a stream is still active (oniWaitForAnyStream / ImFusion::OpenNI2RGBDStream::captureThread), perhaps started internally by code similar to Sensor::start().

This error happens ONLY when working with two or more sensors. Furthermore, I have found that calling Sensor::stop() before Sensor::close() avoids the error. The catch, however, is that I am not using listener-based capturing and never call Sensor::start(). I have seen this behaviour on two laptop configurations (Windows 10 / Windows 10 IoT; Intel i3/i5; with and without an external graphics card). Do you consider this behaviour a bug, given that the API implicitly suggests that start() and stop() should be used only together with listener-based capturing?

Hello Patryk,

this is actually a bug on our side: close() should have called stop() internally. In a future version of the SDK we will deprecate the start() and stop() functions, but for now, please call stop() before calling close().
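Until close() stops the stream by itself, the workaround can be enforced for every sensor with a small RAII guard. This is a minimal sketch, assuming a hypothetical stand-in class FakeSensor with stop()/close() members in place of the SDK's real sensor class (whose exact declarations are not shown in this thread); the guard simply calls stop() and then close() when it goes out of scope:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK's sensor class: it only records which
// calls were made, so the ordering of stop() and close() can be checked.
struct FakeSensor {
    int id;
    std::vector<std::string>* events;
    void stop()  { events->push_back("stop " + std::to_string(id)); }
    void close() { events->push_back("close " + std::to_string(id)); }
};

// RAII guard enforcing the workaround from this thread:
// stop() the stream before close() whenever the guard goes out of scope.
template <typename Sensor>
class SensorGuard {
public:
    explicit SensorGuard(Sensor& sensor) : sensor_(sensor) {}
    ~SensorGuard() {
        sensor_.stop();   // stop the capture first...
        sensor_.close();  // ...then release the device
    }
    SensorGuard(const SensorGuard&) = delete;
    SensorGuard& operator=(const SensorGuard&) = delete;

private:
    Sensor& sensor_;
};

// Example: two sensors shut down at the end of a scope.
std::vector<std::string> shutdownTwoSensors() {
    std::vector<std::string> events;
    FakeSensor a{0, &events};
    FakeSensor b{1, &events};
    {
        SensorGuard<FakeSensor> guardA(a);
        SensorGuard<FakeSensor> guardB(b);
        // ... grab frames here ...
    }  // guards are destroyed in reverse order: b first, then a
    return events;
}
```

Because the guards are destroyed in reverse order of construction, the recorded sequence is "stop 1", "close 1", "stop 0", "close 0": each sensor is stopped before it is closed, even if an exception leaves the scope early.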

Best regards,
Olga

Hello Olga, thank you for this information!

Best regards,
Patryk

Dear @gavranovic, do you know when you will release the new SDK? I am not sure how many people have the same problems, but I encounter many “access violation” exceptions in the SDK code. Honestly, I cannot say whether the cause is the environment I work in (although I work on several machines), the code I write (sometimes it is very simple), or the SDK itself. A new SDK would be a good thing for me to try :slight_smile:

Hi Patryk,

we should release a new SDK by the end of the year at the latest. Can you please give me more information about the issues you are encountering? Maybe a small code sample that reproduces these problems?

Thanks,
Olga

Hi @gavranovic,

I had some problems with parallel reconstruction using listeners (more info in another thread; code example: https://github.com/iniphy/recfusion-parallel-acquisition). I think part of my problems come from the fact that I am using a Windows virtual machine for development. Sometimes I am not able to reproduce the errors on the “native” machine. I will keep you posted if I have something more concrete :slight_smile:

Best regards,
Patryk

Hi @gavranovic,

Here is another potential issue: sometimes I get a different memory access violation in the SDK:

Image recorder thread terminated
[RGBDMultiSequence] Timestamps of at least one sequence are all zero. Frames will be matched by position instead of timestamps
2022-08-18 17:38:32.422 INFO  [9440] [ParallelReconstruction::SaveMeshToDiskStorage@71] Reconstructed mesh is correct (vertices=15518; triangles=24963)
[Core.Platform] ***** Encountered fatal exception: Access Violation, PID 4948 *****
0# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
1# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
2# RtlDeleteAce in ntdll
3# RtlRaiseException in ntdll
4# KiUserExceptionDispatcher in ntdll
5# ImFusion::SurfaceReconstructionData::setActive in ImFusionRGBD

So, the reconstruction completed correctly: Reconstruction::good() returns true and the Mesh is non-empty (as shown in the log above). The error happens in the following code:

LOG_INFO << "Reconstructed mesh is correct (vertices=" << mesh_3d.vertexCount()
         << "; triangles=" << mesh_3d.triangleCount() << ")";
// Perform the post-processing on the mesh
auto decimate_status = mesh_3d.decimate(decimation_min_len, decimation_max_len, false);
LOG_WARNING_IF(!decimate_status) << "Mesh decimation has failed";
if (!mesh_3d.save(string(file_path).c_str(), Mesh::PLY)) {
    throw ScanningError(ErrorCode::RECONSTRUCTED_MESH_SAVE_FAILED);
}

I am pretty sure that Mesh::save() is not the problem: I am working without a license on the virtual machine, so I assume almost nothing happens there (you just check whether a license is present and return otherwise?). So maybe it is Mesh::decimate()? Does it use ImFusion::SurfaceReconstructionData::setActive()?

This error is highly non-deterministic. After a reset of the production PC I won’t get it for many reconstructions, then on another run I get it quite often. Could it be related to the memory allocation/memory layout given by the OS? Maybe something goes wrong at the intersection of your memory management and, I don’t know, address space layout randomization (ASLR)? Or maybe it is just me again :smiley:

Looking forward to hearing from you!
Patryk

Hello Patryk,

as you guessed, Mesh::save doesn’t do anything except check the license. I also doubt that the problem is caused by Mesh::decimate(). ImFusion::SurfaceReconstructionData::setActive() is called inside the Reconstruction class. Could it be that you are using your Reconstruction object from different threads simultaneously? Or do you always access it serially?

Best,
Olga

Hi @gavranovic,

So, I am using the Reconstruction class on two threads, but serially. First, on the spawned thread, I only call Reconstruction::addFrame(); then I synchronize this thread with the main thread (with join()), and only after that do I call Reconstruction::good() followed by Reconstruction::getMesh(). These two calls happen on the main thread only.
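The serial hand-off described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical, non-thread-safe FakeReconstruction class in place of the SDK's Reconstruction; it only shows that join() makes the worker's writes visible to the main thread, so the object is never touched by two threads at once. It does not reproduce or fix the SDK crash itself:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reconstruction object that is NOT thread-safe.
struct FakeReconstruction {
    std::vector<int> frames;
    void addFrame(int frameId) { frames.push_back(frameId); }
    bool good() const { return !frames.empty(); }
    std::size_t frameCount() const { return frames.size(); }
};

// Serial hand-off between two threads: the worker is the only user of `rec`
// until join() returns; afterwards only the calling thread uses it.
std::size_t reconstructOnWorker(int numFrames) {
    FakeReconstruction rec;
    std::thread worker([&rec, numFrames] {
        for (int i = 0; i < numFrames; ++i) {
            rec.addFrame(i);  // worker thread: feed frames only
        }
    });
    worker.join();  // join() synchronizes: the worker's writes are visible here
    // Main thread: query the result only after the worker has finished.
    return rec.good() ? rec.frameCount() : 0;
}
```

As long as no other thread (for example a listener callback) touches the object between addFrame() and join(), this pattern is free of data races on the object.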

PS. I was being dumb because I wasn’t reading closely: ImFusion::SurfaceReconstructionData::setActive() clearly refers to some kind of reconstruction data, while decimate() is a method of the Mesh class :smiley: Thank you for your helpful hint; something must be wrong with the way I use the Reconstruction class.

Hi @gavranovic,

I restructured the code: the Reconstruction object is now created and destroyed within the same function, called on a single thread. Unfortunately, the error persists. Could you guide me in solving this? Could you shed more light on ImFusion::SurfaceReconstructionData::setActive(): why and when it is called, and how many times?

Hello,

I am bumping this topic because we have started to experience the above-mentioned error in ImFusion::SurfaceReconstructionData::setActive in ImFusionRGBD. We are using Windows 10 IoT with an AMD processor and GPU, and three Orbbec Astra sensors. Since we haven’t experienced this particular error in a long time on the Intel platform, I am starting to suspect something in the interplay between the GPU, the OS and OpenCL. The error as seen in the shell:

[Core.Platform] ***** Encountered fatal exception: Access Violation, PID 1308 *****
0# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
1# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
2# RtlDeleteAce in ntdll
3# RtlRaiseException in ntdll
4# KiUserExceptionDispatcher in ntdll
5# ImFusion::SurfaceReconstructionData::setActive in ImFusionRGBD

The RecFusion SDK prints the following information at initialization:

[GL.ContextManager] OpenGL: 4.6.14802 Core Profile Context 22.1.2 30.0.14023.3004
[GL.ContextManager] Vendor: ATI Technologies Inc.
[GL.ContextManager] GPU:    AMD Radeon RX 6500 XT
[GL] Neither GL_NVX_gpu_memory_info nor GL_ATI_meminfo extensions available, can not query total GPU memory information.
[GL.ContextManager] Memory: 0 bytes of 0 bytes available
[Framework] ImFusionLib built without FreeType, OpenGL font rendering support disabled
QCoreApplication::applicationFilePath: Please instantiate the QApplication object first
[Framework] Available Plugins: ImFusionRGB-D, ImFusionStream, ImFusionVision.

In the Windows Event Viewer we can observe the following error description:

Faulting application name: PhysioServer.exe, version: 0.0.0.0, time stamp: 0x65e7979b
Faulting module name: ImFusionRGBD.dll, version: 2.2.0.0, time stamp: 0x61c20b15
Exception code: 0xc0000005
Fault offset: 0x0000000000173ca0
Faulting process id: 0xcac
Faulting application start time: 0x01da851c1c0478d2
Faulting application path: C:\Users\Skaner8\Desktop\PhysioServer_release_RF220\PhysioServer.exe
Faulting module path: C:\Users\Skaner8\Desktop\PhysioServer_release_RF220\plugins\ImFusionRGBD.dll
Report Id: f7e4ba22-2665-4efc-b3fd-b38b3f747388
Faulting package full name: 
Faulting package-relative application ID: 

Exception code 0xc0000005 indicates a memory access violation (see e.g. https://www.autodesk.com/support/technical/article/caas/sfdcarticles/sfdcarticles/A-fatal-error-has-occurred-0xc0000005-EXCEPTION-ACCESS-VIOLATION-and-crash-when-saving-the-file-in-FeatureCAM.html; we also tried running sfc, without result).
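The "Faulting application start time" field can be decoded to see how long the process ran before crashing. This is a small sketch, assuming the field is a Windows FILETIME value (100-nanosecond intervals since 1601-01-01 UTC); the helper name filetimeToUnixSeconds is our own, not part of any SDK:

```cpp
#include <cassert>
#include <cstdint>

// FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
// This offset is the number of such intervals between 1601-01-01 and the
// Unix epoch (1970-01-01 00:00:00 UTC).
constexpr std::uint64_t kFiletimeUnixEpochOffset = 116444736000000000ULL;
constexpr std::uint64_t kIntervalsPerSecond = 10000000ULL;

// Converts a FILETIME value to whole seconds since the Unix epoch,
// truncating the sub-second part. Assumes the value is not before 1970.
constexpr std::int64_t filetimeToUnixSeconds(std::uint64_t filetime) {
    return static_cast<std::int64_t>((filetime - kFiletimeUnixEpochOffset) /
                                     kIntervalsPerSecond);
}
```

Applied to the start time from the report, 0x01da851c1c0478d2, this gives 1712075873, i.e. 2024-04-02 16:37:53 UTC. The event's TimeCreated is 2024-04-02T16:39:39 UTC (Unix time 1712075979), so the process had been running for roughly 106 seconds when it crashed.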

Detailed description in XML:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
  <Provider Name="Application Error" /> 
  <EventID Qualifiers="0">1000</EventID> 
  <Version>0</Version> 
  <Level>2</Level> 
  <Task>100</Task> 
  <Opcode>0</Opcode> 
  <Keywords>0x80000000000000</Keywords> 
  <TimeCreated SystemTime="2024-04-02T16:39:39.7281350Z" /> 
  <EventRecordID>3030</EventRecordID> 
  <Correlation /> 
  <Execution ProcessID="0" ThreadID="0" /> 
  <Channel>Application</Channel> 
  <Computer>DESKTOP-R8RPQSI</Computer> 
  <Security /> 
  </System>
  <EventData>
  <Data>PhysioServer.exe</Data> 
  <Data>0.0.0.0</Data> 
  <Data>65e7979b</Data> 
  <Data>ImFusionRGBD.dll</Data> 
  <Data>2.2.0.0</Data> 
  <Data>61c20b15</Data> 
  <Data>c0000005</Data> 
  <Data>0000000000173ca0</Data> 
  <Data>cac</Data> 
  <Data>01da851c1c0478d2</Data> 
  <Data>C:\Users\Skaner8\Desktop\PhysioServer_release_RF220\PhysioServer.exe</Data> 
  <Data>C:\Users\Skaner8\Desktop\PhysioServer_release_RF220\plugins\ImFusionRGBD.dll</Data> 
  <Data>f7e4ba22-2665-4efc-b3fd-b38b3f747388</Data> 
  <Data /> 
  <Data /> 
  </EventData>
  </Event>

Could you help us in the debugging of this problem?

Can you please create a process dump file and send it to us? You can follow these steps:

1. Download procdump from ProcDump - Sysinternals | Microsoft Learn.
2. Unzip the folder.
3. Open a console inside the unzipped folder.
4. Run ./procdump64 -accepteula to accept the terms and conditions of use.
5. Run the application and immediately run procdump to monitor the process: ./path/to/ApplicationName.exe & ./procdump64 -mp -e ApplicationName.exe
   Alternatively, run procdump first and let it wait for the process to be launched with procdump64 -e -mp -w ApplicationName.exe, then launch the process separately.

This generates a MiniPlus dump (-mp) when there is an unhandled exception or crash (-e). Run ./procdump64 to see more options. The dump file will be written to the unzipped procdump folder.

With this we can debug the issue further. From the information you provided, the crash seems to come from an attempt to set the sensor height. Perhaps one of the sensors reported a zero value, but without the crash dump I cannot say for sure.
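If a sensor reporting a zero size turns out to be the cause, it could be caught before it reaches the reconstruction. This is a minimal sketch, assuming a hypothetical SensorInfo struct filled from whatever resolution getters the SDK's sensor class provides (the exact getter names are not shown in this thread); it only lists the sensors whose reported resolution is not positive:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what each opened sensor reports.
struct SensorInfo {
    std::string name;
    int depthWidth;
    int depthHeight;
    int colorWidth;
    int colorHeight;
};

// Returns the names of sensors that report a non-positive resolution, so the
// application can refuse to start a reconstruction instead of crashing later.
std::vector<std::string> sensorsWithInvalidSize(const std::vector<SensorInfo>& sensors) {
    std::vector<std::string> bad;
    for (const auto& s : sensors) {
        if (s.depthWidth <= 0 || s.depthHeight <= 0 ||
            s.colorWidth <= 0 || s.colorHeight <= 0) {
            bad.push_back(s.name);
        }
    }
    return bad;
}
```

Checking all sensors once after opening them keeps the failure close to its source and makes it easy to log which device misbehaved.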

Please also make sure to install the latest GPU driver and, if it is installed, remove the Windows OpenCL compatibility pack.