@borjamunozf borjamunozf commented May 11, 2025

This change addresses item #4283

The following changes are proposed:

The shlex split (together with the CompilationDatabase infoByFilePath map) is by itself responsible for an average heap usage of ~4 GB across our different projects. Parsing also takes 35-40 s on average.

  • After digging into the code and benchmarking it, an almost direct improvement to the split function is to avoid the naive string concatenation and its extra heap allocations: collect the pieces of each token in an array and join them once at the end.
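To illustrate the idea, here is a minimal sketch and not the code from this PR: the two function names are hypothetical, and the splitter is deliberately simplified (whitespace and double quotes only, no escapes, returns an array). The first variant grows each token with `+=`; the second pushes the pieces into an array and joins once per token.

```typescript
// Hypothetical, simplified splitter. This is NOT the shlex.split() from this PR.

function isSpace(ch: string): boolean {
    return ch === ' ' || ch === '\t' || ch === '\n';
}

// Naive variant: the current token is extended with `+=` for every character.
function splitConcat(input: string): string[] {
    const tokens: string[] = [];
    let token = '';
    let hasToken = false;
    let inQuotes = false;
    for (let i = 0; i < input.length; i++) {
        const ch = input[i];
        if (ch === '"') {
            inQuotes = !inQuotes;
            hasToken = true;
        } else if (!inQuotes && isSpace(ch)) {
            if (hasToken) {
                tokens.push(token);
                token = '';
                hasToken = false;
            }
        } else {
            token += ch;
            hasToken = true;
        }
    }
    if (hasToken) {
        tokens.push(token);
    }
    return tokens;
}

// Token-array variant: pieces are collected in an array and joined once per token.
function splitWithTokenArray(input: string): string[] {
    const tokens: string[] = [];
    let parts: string[] = [];
    let hasToken = false;
    let inQuotes = false;
    for (let i = 0; i < input.length; i++) {
        const ch = input[i];
        if (ch === '"') {
            inQuotes = !inQuotes;
            hasToken = true;
        } else if (!inQuotes && isSpace(ch)) {
            if (hasToken) {
                tokens.push(parts.join(''));
                parts = [];
                hasToken = false;
            }
        } else {
            parts.push(ch);
            hasToken = true;
        }
    }
    if (hasToken) {
        tokens.push(parts.join(''));
    }
    return tokens;
}

console.log(splitWithTokenArray('g++ -DNAME="a b" -c main.cpp'));
// prints [ 'g++', '-DNAME=a b', '-c', 'main.cpp' ]
```

Both variants return the same tokens, so the change stays behind the same signature; only the way each token is built differs.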

The purpose of this change

  • Reduce heap usage by at least 50-60% for mid-size to large compile databases.
  • Improve response times after the configure step and the processing of the compile database.
  • Keep the function signature and behavior unchanged, avoiding possible side effects.

Other Notes/Information

There is probably room to optimize further, but I think this is an almost direct and easy change to apply.

Tests seem to pass.

A dummy file used for benchmarking:


import { CompileCommand } from './compilationDatabase';
import { fs } from './pr';
import * as v8 from 'v8';
import * as shlex from './shlex';

const databasePaths = [
    '/home/borjamf/workspace/vscode-cmake-tools/src/compile_commands.json',
];

function logHeapUsage() {
    const heapStats = v8.getHeapStatistics();
    console.log('Heap Statistics:');
    console.log(`Total heap size: ${heapStats.total_heap_size / 1024 / 1024} MB`);
    console.log(`Used heap size: ${heapStats.used_heap_size / 1024 / 1024} MB`);
    console.log(`Heap size limit: ${heapStats.heap_size_limit / 1024 / 1024} MB`);
}

async function buildCompileCommands(databasePaths: string[]): Promise<CompileCommand[]> {
    const database: CompileCommand[] = [];
    for (const path of databasePaths) {
        if (!await fs.exists(path)) {
            continue;
        }

        const fileContent = await fs.readFile(path);
        try {
            const content = JSON.parse(fileContent.toString()) as CompileCommand[];
            database.push(...content);
        } catch (e) {
            return database;
        }
    }
    return database;
}

async function main() {
    const cmd_commands: CompileCommand[] = await buildCompileCommands(databasePaths);
    logHeapUsage();

    console.log("Before building the map");
    // The label passed to console.timeEnd() must match the one passed to console.time().
    const timerLabel = 'Total time spent building infoByFilePath: current-shlex';
    console.time(timerLabel);
    const infoByFilePath = cmd_commands.reduce(
        (acc, cur) => acc.set(cur.file, {
            directory: cur.directory,
            file: cur.file,
            output: cur.output,
            command: cur.command,
            // Only split the command string when no pre-split arguments are provided.
            arguments: cur.arguments ? cur.arguments : [...shlex.split(cur.command)]
        }),
        new Map<string, CompileCommand>()
    );

    console.timeEnd(timerLabel);

    console.log("After building the map");
    logHeapUsage();
}

main();

Heap stats with the current split: 4 GB, time 48.452 s

Heap stats after the optimized split: 600 MB, time ~16 s
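For comparing two variants with numbers like the ones above, a small wrapper can help. This is only a sketch: `measure` and `Measurement` are hypothetical names that are not part of the PR, and the only Node.js API used is `v8.getHeapStatistics()`, as in the benchmark file.

```typescript
import * as v8 from 'v8';

// Hypothetical helper, not part of the PR: runs `fn` once and reports
// wall-clock time plus the change in used heap size.
interface Measurement<T> {
    result: T;
    elapsedMs: number;
    heapDeltaMB: number;
}

function measure<T>(fn: () => T): Measurement<T> {
    const heapBefore = v8.getHeapStatistics().used_heap_size;
    const start = Date.now();
    const result = fn();
    const elapsedMs = Date.now() - start;
    const heapAfter = v8.getHeapStatistics().used_heap_size;
    return {
        result,
        elapsedMs,
        // Can be negative if a garbage collection runs while `fn` executes.
        heapDeltaMB: (heapAfter - heapBefore) / 1024 / 1024
    };
}

// Usage: wrap the work to be measured in a closure.
const measurement = measure(() => new Array(1000).fill('x').join(''));
console.log(`elapsed: ${measurement.elapsedMs} ms, heap delta: ${measurement.heapDeltaMB.toFixed(2)} MB`);
```

The heap delta is only indicative, since the garbage collector may run at any point; repeating the measurement on a large compile database gives a more reliable picture.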

@borjamunozf borjamunozf force-pushed the feat/improv-shlex-v1 branch 2 times, most recently from 8758d75 to e636a80 Compare May 11, 2025 18:50
Collaborator

@gcampbell-msft gcampbell-msft left a comment

Good news! I hunted through some historical testing I did of this, and this is identical to some changes I had tested out, but you were able to perform some better benchmarking, so I'm excited to take this!

Before approving and merging though, could you please add a CHANGELOG entry? Please credit yourself following the pattern people have used before.

@gcampbell-msft
Collaborator

@borjamunozf Oh, and there are some formatting errors that the linter is complaining about, that's why the builds are failing. Could you fix this as well?

It's simple enough that I could fix it, but then we would need a different approver, so it'd be best if you could make those changes. Thanks!

@borjamunozf borjamunozf changed the title feat: improve shlex.split() function heap allocs & perf for large compile commands improv: shlex.split() function heap allocs & perf for large compile commands May 12, 2025
@borjamunozf
Contributor Author

> @borjamunozf Oh, and there are some formatting errors that the linter is complaining about, that's why the builds are failing. Could you fix this as well?
>
> It's simple enough that I could fix it, but then we would need a different approver, so it'd be best if you could make those changes. Thanks!

There it goes! Thanks :)

@borjamunozf
Contributor Author

@microsoft-github-policy-service agree

@gcampbell-msft
Collaborator

> @microsoft-github-policy-service agree

Could you try again? It seems like the license/cla still hasn't reported.

@borjamunozf
Contributor Author

@microsoft-github-policy-service agree

@borjamunozf
Contributor Author

I don't know, it seems that it's not picking it up. Not sure if I'm missing something.

That's the only step required, as far as I recall, right?

@borjamunozf
Contributor Author

It does redirect to my home, but perhaps you can trigger this?

https://cla-assistant.io/check/microsoft/vscode-cmake-tools?pullRequest=4458

Mentioned in the COMMON_ISSUES cla

@gcampbell-msft
Collaborator

> It does redirect to my home, but perhaps you can trigger this?
>
> https://cla-assistant.io/check/microsoft/vscode-cmake-tools?pullRequest=4458
>
> Mentioned in the COMMON_ISSUES cla

Did you also try that link? I attempted it and nothing seemed to happen

@borjamunozf borjamunozf force-pushed the feat/improv-shlex-v1 branch from 0b05a04 to 369c025 Compare May 13, 2025 22:17
@borjamunozf
Contributor Author

Finally. I'm dumb, the commits were pushed with my company email account.
Fixed, and it seems the CLA is ready.

Sorry for the bother!

@gcampbell-msft gcampbell-msft enabled auto-merge (squash) May 14, 2025 14:34
@gcampbell-msft gcampbell-msft disabled auto-merge May 14, 2025 20:04
@borjamunozf
Contributor Author

The macOS pipeline always fails in the same tests. I read that you mentioned a flaky test; could this be it?
Should we just wait then?

@gcampbell-msft
Collaborator

@borjamunozf I will force merge this, other builds are failing due to this, and we recently tried to fix this but it doesn't seem to have worked. Thanks for this contribution, merging.

@gcampbell-msft gcampbell-msft merged commit 65880d8 into microsoft:main May 16, 2025
3 of 5 checks passed
fajkomix1990 pushed a commit to fajkomix1990/mati---glowny that referenced this pull request Jul 2, 2025
…ommands (microsoft#4458)

* direct improve shlex heap allocs & perf for large compile commands

* Fix lint issues & update CHANGELOG