Potential error in the RecFusion SDK 2.2.0 while closing many sensors

Hi, I have encountered a potential error while closing multiple sensors (Orbbec Astra) with the Sensor::close() function:

```
[Core.Platform] ***** Encountered fatal exception: Access Violation, PID 5684 *****
 0# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
 1# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
 2# RtlDeleteAce in ntdll
 3# RtlRaiseException in ntdll
 4# KiUserExceptionDispatcher in ntdll
 5# 0x00007FF87AB8CB48 in OpenNI2
 6# oniWaitForAnyStream in OpenNI2
 7# ImFusion::OpenNI2RGBDStream::captureThread in ImFusionRGBD
 8# ImFusion::OpenNI2RGBDStream::IR in ImFusionRGBD
 9# ImFusion::FileSensor::~FileSensor in ImFusionRGBD
10# configthreadlocale in ucrtbase
11# BaseThreadInitThunk in KERNEL32
12# RtlUserThreadStart in ntdll
```

If you read the stack trace closely, you can see that thread management is involved (RtlUserThreadStart), which may be why the error occurs only with more than one sensor. Furthermore, the ~FileSensor destructor appears in the trace (related to closing the sensor?), and at the same time some streaming code is active (oniWaitForAnyStream / ImFusion::OpenNI2RGBDStream::captureThread), perhaps started internally by code similar to Sensor::start().

This error happens ONLY when working with two or more sensors. Furthermore, I have found that calling Sensor::stop() before Sensor::close() avoids the error. The catch is that I am not using listener-based capturing and never call Sensor::start(). I have seen this behaviour on two laptop configurations (Windows 10 / Windows 10 IoT; Intel i3/i5; with and without an external graphics card). Do you consider this behaviour a bug? The API implicitly suggests that start() and stop() should only be used together with listener-based capturing.

Hello Patryk,

This is indeed a bug on our side: close() should call stop() internally. In a future version of the SDK we will deprecate the start() and stop() functions. For now, please call stop() before calling close().

Best regards,
Olga


Hello Olga, thank you for this information!

Best regards,
Patryk

Dear @gavranovic, do you know when you will release the new SDK? I am not sure how many people have the same problems, but I encounter many "access violation" exceptions in the SDK code. Honestly, I cannot say whether the cause is the environment I work in (although I work on several machines), the code I write (which is sometimes very simple), or the SDK itself. A new SDK would be a good thing for me to try :slight_smile:

Hi Patryk,

We should release a new SDK by the end of the year at the latest. Can you please give me more information about the issues you are encountering? Perhaps a small code sample that reproduces these problems?

Thanks,
Olga


Hi @gavranovic,

I had some problems with a parallel reconstruction using listeners (more info in another thread; code example: https://github.com/iniphy/recfusion-parallel-acquisition). I suspect part of my problems stems from using a Windows virtual machine for development: sometimes I am not able to reproduce the errors on the "native" machine. I will keep you posted if I find something more concrete :slight_smile:

Best regards,
Patryk

Hi @gavranovic,

Here is another potential issue. Occasionally, I get a different memory access violation in the SDK:

```
Image recorder thread terminated
[RGBDMultiSequence] Timestamps of at least one sequence are all zero. Frames will be matched by position instead of timestamps
2022-08-18 17:38:32.422 INFO  [9440] [ParallelReconstruction::SaveMeshToDiskStorage@71] Reconstructed mesh is correct (vertices=15518; triangles=24963)
[Core.Platform] ***** Encountered fatal exception: Access Violation, PID 4948 *****
0# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
1# ImFusion::Platform::getLastWindowsErrorMessage in ImFusionCore
2# RtlDeleteAce in ntdll
3# RtlRaiseException in ntdll
4# KiUserExceptionDispatcher in ntdll
5# ImFusion::SurfaceReconstructionData::setActive in ImFusionRGBD
```

So the reconstruction itself succeeded: Reconstruction::good() returns true and the mesh is non-empty (as shown in the log above). The error occurs in the following code:

```cpp
LOG_INFO << "Reconstructed mesh is correct (vertices=" << mesh_3d.vertexCount()
         << "; triangles=" << mesh_3d.triangleCount() << ")";
// Perform the post-processing on the mesh
auto decimate_status = mesh_3d.decimate(decimation_min_len, decimation_max_len, false);
LOG_WARNING_IF(!decimate_status) << "Mesh decimation has failed";
if (!mesh_3d.save(string(file_path).c_str(), Mesh::PLY)) {
    throw ScanningError(ErrorCode::RECONSTRUCTED_MESH_SAVE_FAILED);
}
```

I am fairly sure that Mesh::save() is not the problem: since I am working without a license on the virtual machine, I assume almost nothing happens there (you just check for a license and return otherwise?). So maybe it is Mesh::decimate()? Does it call ImFusion::SurfaceReconstructionData::setActive()?

This error is highly non-deterministic. After restarting the production PC I do not get it for many reconstructions, and then in another session on the PC I get it quite often. Could it be related to memory allocation or to the memory layout provided by the OS? Maybe something goes wrong at the intersection of your memory management and, I don't know, address space layout randomization (ASLR)? Or maybe it is just me again :smiley:

Looking forward to hearing from you!
Patryk

Hello Patryk,

As you guessed, Mesh::save() does nothing except check the license. I also doubt that the problem is caused by Mesh::decimate(). ImFusion::SurfaceReconstructionData::setActive() is called inside the Reconstruction class. Could it be that you are using your Reconstruction object from different threads simultaneously, or do you always access it serially?

Best,
Olga

Hi @gavranovic,

I use the Reconstruction class on two threads, but serially. First, on the spawned thread, I only call Reconstruction::addFrame(). Then I synchronize this thread with the main thread via join(), and only afterwards call Reconstruction::good() followed by Reconstruction::getMesh(); these two calls happen exclusively on the main thread.

PS. I should have read more closely: ImFusion::SurfaceReconstructionData::setActive() clearly refers to some kind of reconstruction data, while decimate() is a method of the Mesh class :smiley: Thank you for the helpful hint; it must be something wrong with the way I use the Reconstruction class.

Hi @gavranovic,

I restructured the code so that the Reconstruction object is created and destroyed within a single function running on one thread. Unfortunately, the error persists. Could you help me track this down? Perhaps you could shed more light on ImFusion::SurfaceReconstructionData::setActive(): why, when, and how many times is it called?