Skip to content

Missing test results on concurrency runner #4416

@wlldrygla

Description

@wlldrygla

Description of the bug/issue

Missing Test Results on Concurrency Runner

When running tests with the concurrency runner, test results are occasionally lost. This issue appears to be a race condition related to the timing of inter-process communication (IPC) and promise resolution in the main and worker processes. The problem seems to happen about 1 in 10 times.

Detailed Description

In the worker process, the testSuiteFinished function calls this.sendReportToParentWorker(), which uses process.send() to transmit the test report. This is a non-blocking "fire and forget" operation.

Simultaneously, the main process is waiting on a promise (piscina.run().then(...)) that resolves when the worker task completes. The core of the problem is that this promise resolves immediately after the worker task finishes, but before the IPC message with the test report has had a chance to arrive.

This creates a race:

The worker process sends the report via process.send().

The piscina.run() promise in the main process resolves.

The main process proceeds with its logic, assuming all data has been received.

The IPC message with the report arrives too late to be processed by the main process's listening logic, causing the report to be lost.

Logs:

I added some logs to runWorkerTask when the message of testsuite_finished arrives and the promise solves

The problem seems to be, that sometimes, the promise solves before the arrival of the message from ICP (10 ms) and this make the report step have less results than runned tests.

NORMAL BEHAVIOR:
[testsuite_finished - TRIGGER - before] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.086Z
[testsuite_finished - TRIGGER - after] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.089Z
[THEN] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.089Z
┌ ────────────────── ✖  chrome: tests/TEST_MENU/TEST-CASE.ts
...test logs
[END] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  finished with SUCCESS 2025-09-04T14:25:48.333Z
...report

ISSUE:
[THEN] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  2025-09-04T14:25:48.307Z
┌ ────────────────── ✖  chrome: tests/TEST_MENU/TEST-CASE.ts
...test logs
[END] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  finished with SUCCESS 2025-09-04T14:25:48.333Z
after promise on runMultipleTestWorkers 2025-09-04T14:25:48.333Z
Before reportResults 2025-09-04T14:25:48.336Z
Running reportResults 2025-09-04T14:25:48.336Z

  ✨ PASSED. 52 total assertions (1m 30s)
[testsuite_finished - TRIGGER - before] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:25:48.343Z
 Wrote HTML report file to: /home/will/nightwatch/PROJECT-NAME/reports/nightwatch-html-report/index.html
[testsuite_finished - TRIGGER - after] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.350Z

After reportResults 2025-09-04T14:25:48.457Z

The runWorkerTask set promises and receive the message of testsuite_finished

The green line shows the flow that Node does, the two red ones, shows the race condition, because Node keep running after the ICP message is sent, but it still not arrived

Image

Steps to reproduce

Since we cannot reproduce the issue on demand due to its nature, its a race condition and there is no way to delay the ICP message, we've instrumented the test runner with detailed logs to capture the behavior when the problem occurs. Based on these logs, we've verified the following:

Promise and async Behavior are not the cause: We have fixed all known async and Promise-related issues, including ensuring all await calls are correctly placed. The problem persists, indicating it's not a standard test-execution-flow issue.

The Test Completes Successfully: The logs show that the test case itself passes and all test hooks (afterEach, after) execute and complete successfully. This proves that the test is not hanging or failing before the reporting stage.

I tried force unfinished promises, but even with this, Nightwatch just go to next steps when the promise is solved.

Log Evidence

Below are examples of two scenarios where reports are missing, but the logs show the test has fully completed in the worker process.

Scenario 1: All Logs Return, but Report is Missing

The logs show a full test run, including all commands and hooks, completing successfully. The worker process successfully deletes the browser session and completes its [after] hook. The parent process logs a PASSED result, but the report for that specific test suite is not generated or is missing.

chrome ✔ chrome [Test case Three] Get email (15.167s)                                                                                                                                                           │
│   chrome → Running [after]:                                                                                                                                                                                                        │
│   chrome → Running command: end ()                                                                                                                                                                                                                 │
│   chrome → Running command: session ('delete', [Function])                                                                                                                                                                                         │
│   chrome   Request DELETE http                                                                                                             │
│   chrome   Response 200 DELETE http:                                                                                                   │
│   chrome { value: null }                                                                                                                                                                                                                           │
│   chrome → Completed command: end () (312ms)                                                                                                                                                                                                       │
│   chrome → Completed command: session ('delete', [Function]) (310ms)                                                                                                                                                                               │
│   chrome → Completed [after].                                                                                                                                                                                                                      │
│                                                                                                                                                                                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  ✨ PASSED. 51 total assertions (1m 38s)
 Wrote HTML report file to: /home/will/nightwatch/e2e/reports/nightwatch-html-report/index.html

 Wrote Rerun Json report file to: /home/will/nightwatch/e2e/reports/minimal_report.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_two.json
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.xml
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_two.xml

Missing the third test

Scenario 2: Logs are Cut Off, but Container Shows Full Execution

In this scenario, the returned process's log output from worker stops abruptly after the Request DELETE http line, this made me think that even the messages with logs from child worker someties arrive after the promise end. However, we can confirm from the container logs that the worker thread continued to execute and completed the test and teardown.

chrome → Completed command: url          
chrome → Running [afterEach]:                                                                                                                                                                                                   
chrome → Completed [afterEach].                                                                                                                                                                                                 
chrome ✔ chrome [Test case two] Step (7.49s)                                                                                                                                             
chrome → Running [after]:                                                                                                                                                                                                       
chrome BEFORE BROWSER END  - test_case_two                                                                                                                                                                                                    
chrome → Running command: end ()                                                                                                                                                                                                
chrome → Running command: session ('delete', [Function])                                                                                                                                                                        
chrome   Request DELETE http                                                                                                
                                                                                                                                                                                                                                │└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  ✨ PASSED. 52 total assertions (1m 30s)
 Wrote HTML report file to: /home/will/nightwatch/e2e/reports/nightwatch-html-report/index.html

 Wrote Rerun Json report file to: /home/will/nightwatch/e2e/reports/minimal_report.json
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.xml
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_three.xml
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_three.json

Conclusion from Investigation

The data from the logs strongly suggests the problem is not within the test execution itself but in the communication phase. Specifically, the test worker process successfully completes its work and sends the test report message. However, the parent process does not receive this message in time, leading to the reports been generated without one or more tests. The missing assertions in the final report (should be 83 assertions) further support the hypothesis that some test results never reach the parent process. 🕵️‍♂️

Running local on my machine, with 3 tests, I achieved to reproduce this running a lot of times.
In the schedulers, with 30+ tests, running all day each hour, 2~3 times a day we see 1 test missing
In the scheduler with 80+ tests, we already saw more then 4 tests missed.

Sample test

Command to run

Verbose Output

NORMAL BEHAVIOR:
[testsuite_finished - TRIGGER - before] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.086Z
[testsuite_finished - TRIGGER - after] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.089Z
[THEN] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.089Z
┌ ────────────────── ✖  chrome: tests/TEST_MENU/TEST-CASE.ts
...test logs
[END] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  finished with SUCCESS 2025-09-04T14:25:48.333Z
...report

ISSUE:
[THEN] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  2025-09-04T14:25:48.307Z
┌ ────────────────── ✖  chrome: tests/TEST_MENU/TEST-CASE.ts
...test logs
[END] Task  chrome: tests/TEST_MENU/TEST_CASE.ts  finished with SUCCESS 2025-09-04T14:25:48.333Z
after promise on runMultipleTestWorkers 2025-09-04T14:25:48.333Z
Before reportResults 2025-09-04T14:25:48.336Z
Running reportResults 2025-09-04T14:25:48.336Z

  ✨ PASSED. 52 total assertions (1m 30s)
[testsuite_finished - TRIGGER - before] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:25:48.343Z
 Wrote HTML report file to: /home/will/nightwatch/PROJECT-NAME/reports/nightwatch-html-report/index.html
[testsuite_finished - TRIGGER - after] Task  chrome: tests/TEST_MENU/TEST-CASE.ts  2025-09-04T14:12:22.350Z

After reportResults 2025-09-04T14:25:48.457Z




Before the logs on the Nightwatch, I reproduce the issue twice:
One:
chrome ✔ chrome [Test case Three] Get email (15.167s)                                                                                                                                                           │
│   chrome → Running [after]:                                                                                                                                                                                                        │
│   chrome → Running command: end ()                                                                                                                                                                                                                 │
│   chrome → Running command: session ('delete', [Function])                                                                                                                                                                                         │
│   chrome   Request DELETE http                                                                                                             │
│   chrome   Response 200 DELETE http:                                                                                                   │
│   chrome { value: null }                                                                                                                                                                                                                           │
│   chrome → Completed command: end () (312ms)                                                                                                                                                                                                       │
│   chrome → Completed command: session ('delete', [Function]) (310ms)                                                                                                                                                                               │
│   chrome → Completed [after].                                                                                                                                                                                                                      │
│                                                                                                                                                                                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  ✨ PASSED. 51 total assertions (1m 38s)
 Wrote HTML report file to: /home/will/nightwatch/e2e/reports/nightwatch-html-report/index.html

 Wrote Rerun Json report file to: /home/will/nightwatch/e2e/reports/minimal_report.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_two.json
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.xml
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_two.xml


Two:
chrome → Completed command: url          
chrome → Running [afterEach]:                                                                                                                                                                                                   
chrome → Completed [afterEach].                                                                                                                                                                                                 
chrome ✔ chrome [Test case two] Step (7.49s)                                                                                                                                             
chrome → Running [after]:                                                                                                                                                                                                       
chrome BEFORE BROWSER END  - test_case_two                                                                                                                                                                                                    
chrome → Running command: end ()                                                                                                                                                                                                
chrome → Running command: session ('delete', [Function])                                                                                                                                                                        
chrome   Request DELETE http                                                                                                
                                                                                                                                                                                                                                │└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  ✨ PASSED. 52 total assertions (1m 30s)
 Wrote HTML report file to: /home/will/nightwatch/e2e/reports/nightwatch-html-report/index.html

 Wrote Rerun Json report file to: /home/will/nightwatch/e2e/reports/minimal_report.json
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.xml
 Wrote XML report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_three.xml
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_one.json
 Wrote JSON report file to: /home/will/nightwatch/e2e/reports/CHROME_139.0.7258.66__test_case_three.json

Nightwatch Configuration

module.exports = {
  src_folders: "tests",
  globals_path: "globals.js",
  selenium: {
    version2: true,
    host: baseFile.selenoid_host,
    port: baseFile.selenoid_port,
  },
  test_workers: {
    enabled: true,
    workers: "auto",
  },
  test_settings: test_settings,
};

var test_settings = {
  default: {
    globals: variablesFile,
    desiredCapabilities: {
      "selenoid:options": {
        enableVNC: true,
        enableVideo: false,
        use_ssl: false,
        proxy: baseFile.proxy,
      },
    },
    output_folder: "reports/",
    persist_globals: true,
    screenshots: {
      enabled: true,
      on_failure: true,
      path: "screenshots",
    },
    skip_testcases_on_fail: true,
    silent: true,
  },
  end_session_on_fail: false,
  chrome: chromeBrowser([
    "no-sandbox",
    "disable-gpu",
    "headless",
    "window-size=1920,1080",
  ]),
  firefox: firefoxBrowser(),
  safari: safariBrowser(),
  edge: edgeBrowser(),
  iPhone: genericIphoneDevice(),
  android: genericAndroidDevice(),
};

module.exports.test_settings = test_settings;

Nightwatch.js Version

nightwatch-3.12.2

Node Version

v22.18.0

Browser

Chrome 139

Operating System

Linux

Additional Information

Proposed Solution: Acknowledgment Handshake

The proposed solution is to implement a two-way acknowledgment handshake between the child worker and the parent process. This ensures that the child worker only proceeds after it has received a confirmation message from the parent, guaranteeing the test report has been received and processed.

Here is how the implementation would work:

The child worker sends the test report with a unique message ID.

The child worker then waits for a confirmation message from the parent with that same ID.

The parent process, upon receiving the report message, immediately sends back an acknowledgment message.

Once the child worker receives the acknowledgment, it can safely finish its process. A timeout is also included to prevent the child from hanging if the parent process fails.

Code Examples

  1. In the Child Worker Process
    This code modifies the child process to send the report and await a confirmation.
sendReportToParentWorker = async () => {
    return new Promise((resolve, reject) => {
        const messageId = Date.now(); 
        
        const ackListener = (msg) => {
            if (msg.type === 'report_received' && msg.id === messageId) {
                process.removeListener('message', ackListener); 
                resolve(); // Resolve the promise once acknowledged
            }
        };

        process.on('message', ackListener);
        
        process.send({
            type: 'test_report',
            id: messageId,
            payload: this.reporter.testResults.eventDataToEmit
        });
        
        setTimeout(() => {
            process.removeListener('message', ackListener);
                 }, 1000); 
    });
};

The testSuiteFinished method is then updated to await this new function.

testSuiteFinished = async (failures) => {
    // ...
    if (Concurrency.isWorker()) {
        await this.sendReportToParentWorker(); // The child now waits here
    }
    NightwatchEventHub.emit(TestSuiteHook.finished, this.reporter.testResults.eventDataToEmit);
    return failures;
};

By adding the await, the testSuiteFinished function (and the piscina runner) will not complete and resolve its promise until the parent acknowledges receipt of the message, thereby eliminating the race condition.

  1. In the Parent Process
    This code adds a listener to the parent process that handles the incoming report and sends an acknowledgment back.
   switch (result.type) {
        case 'testsuite_finished':
            if (result.id) {
                port2.postMessage({
                    type: 'report_received',
                    id: result.id
                });
            }
            result.itemKey = this.label;
            this.emit('message', result);
            break;

        case 'stdout':
            this.writeToStdOut(result.data);
            break;
    }
};

This change is minimal on the parent's side, as it only needs to react to the incoming message and send a reply, keeping its existing promise-based logic intact.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions