Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Inventory::SyncLoop to Avoid Blocking Agent Stop #445

Open
cborla opened this issue Dec 18, 2024 · 7 comments · May be fixed by #502
Open

Improve Inventory::SyncLoop to Avoid Blocking Agent Stop #445

cborla opened this issue Dec 18, 2024 · 7 comments · May be fixed by #502
Assignees
Labels
level/task Task issue module/agent module/inventory Inventory module type/enhancement Enhancement issue

Comments

@cborla
Copy link
Member

cborla commented Dec 18, 2024

Description

The current implementation of the Inventory::SyncLoop function in the Inventory module may cause unnecessary delays during the agent shutdown process. Specifically, the Scan() method is called after the stop condition (m_stopping) is triggered, leading to potential delays if Scan() is a time-consuming operation.

Problem Details

  • The Scan() function is executed after the condition variable (m_cv.wait_for) returns, even if m_stopping is true.
  • This behavior delays the shutdown of the agent when Scan() involves intensive operations or long-running tasks.
  • This could lead to poor user experience during shutdown and inefficient resource handling.

Steps to Reproduce

  • Start the agent with the Inventory module enabled.
  • Trigger a stop request for the agent while SyncLoop is running.
  • Observe that Scan() executes even after the stop request, delaying the shutdown.

Proposed Solution

Update the SyncLoop function to verify the m_stopping condition after the wait operation and before calling Scan(). This ensures that no unnecessary Scan() operation is performed after a stop request is issued.

Option

void Inventory::SyncLoop()
{
    LogInfo("Module started.");

    if (m_scanOnStart && !m_stopping)
    {
        Scan();
    }

    while (!m_stopping)
    {
        {
            std::unique_lock<std::mutex> lock{m_mutex};
            m_cv.wait_for(lock,
                std::chrono::milliseconds{m_intervalValue}, [&]() { return m_stopping; });
        }

        if (!m_stopping)
        {
            Scan();
        }
    }
    std::unique_lock<std::mutex> lock{m_mutex};
    m_spDBSync.reset(nullptr);
}
@jr0me
Copy link
Member

jr0me commented Dec 19, 2024

As a note for when this issue goes into development, the first scan

    if (m_scanOnStart && !m_stopping)
    {
        Scan();
    }

Can happen while another threads calls Stop() on the Inventory, perhaps this scan should also try to capture the mutex before checking m_stopping, and block the Stop call until the scan finishes.

@xkazinx xkazinx self-assigned this Jan 14, 2025
@wazuhci wazuhci moved this from Backlog to In progress in XDR+SIEM/Release 5.0.0 Jan 14, 2025
@xkazinx xkazinx linked a pull request Jan 16, 2025 that will close this issue
@xkazinx
Copy link
Member

xkazinx commented Jan 16, 2025

Is there something to do in regards to this comment?

@cborla
Copy link
Member Author

cborla commented Jan 16, 2025

Is there something to do in regards to this comment?

Hi @xkazinx
The ideal would be to investigate the case that @jr0me raises, make a test and according to that we see which solution is more convenient.

@wazuhci wazuhci moved this from In progress to Pending review in XDR+SIEM/Release 5.0.0 Jan 17, 2025
@wazuhci wazuhci moved this from Pending review to In review in XDR+SIEM/Release 5.0.0 Jan 21, 2025
@xkazinx
Copy link
Member

xkazinx commented Jan 23, 2025

  • Worked fully on environment setup for this issue:
    • Had an Ubuntu 24 installation where initializing apps stopped working.
    • Ended up doing a clean installation, and after updating, the same thing happening.
    • Tried to install it again, without updating, and when booting the pendrive, the PC would reboot automatically.
    • Went to Windows, installed some VMs on VMWare and it started to freeze after the OS initialization.
    • Reinstalled VMWare, and the VM started to throw initialization errors due to it not probably being reinstalled with errors.
    • Installed VirtualBox, with a VM, which worked, but I couldn't ping from host machine to the VM.
    • Finally, created an Ubuntu22 pendrive, installed the OS, installed VMWare and it seems to be working, which I have to finish configuring to continue.

@xkazinx
Copy link
Member

xkazinx commented Jan 23, 2025

  • After the Ubuntu 22 installation, an Ubuntu 24 VM with an indexer, manager and dashboard was installed, along with another VM with an agent, meant for development, where various issues appeared in the build process, where it now builds after solving them.
  • Environments are now ready for the further development of the issue.

@wazuhci wazuhci moved this from In review to In progress in XDR+SIEM/Release 5.0.0 Jan 24, 2025
@xkazinx
Copy link
Member

xkazinx commented Jan 24, 2025

  • A runtime error was encountered where the SSL certificates weren't functioning between the agent and the mock-server which, after an investigation, it was found that it was due to the config file not being read. After fixing it, the time taken for the process to finish after interrupting the execution varies a lot between different tests made by @LucioDonda @cborla and me:

  • Further investigation has to be made to determine why there is such a different time frame between the tests.

@LucioDonda
Copy link
Member

Based on the previous comment by @xkazinx and the analysis done, the lowest limit can have plenty of variation depending on the host, but IMO the stop signal can't cut Scan() process in the middle.

  • Based on the proposed solution:
        if (!m_stopping)
        {
            Scan();
        }

the m_stopping check is redundant because Scan() calls TryCatchTask that do this same check inside:

void Inventory::TryCatchTask(const std::function<void()>& task) const
{
    try
    {
        if (!m_stopping)
        {
            task();
        }
        else
        {
            LogTrace("No Scanning during stopping");
        }
    }

Regarding the comment of @jr0me this is what's actually happening and that's why we cannot stop the scan earlier.

c.c. @cborla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
level/task Task issue module/agent module/inventory Inventory module type/enhancement Enhancement issue
Projects
Status: In progress
Development

Successfully merging a pull request may close this issue.

4 participants