Skip to content

Daily Perf Improver: Research and Plan #190

@github-actions

Description

@github-actions

Performance Research and Improvement Plan

Executive Summary

FSharp.Control.AsyncSeq has several critical performance issues that significantly impact its usability with large datasets. The primary concerns are memory leaks in append operations and O(n²) performance in recursive patterns. This research provides a comprehensive analysis and improvement plan.

Current Performance Testing Infrastructure

Testing Setup

  • Performance Test File: tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx
  • Test Framework: Basic F# Script with System.Diagnostics.Stopwatch
  • Test Scale: N=1,000,000 elements
  • Major Gap: No modern benchmarking framework (BenchmarkDotNet)

Historical Performance Results

unfoldIter (N=1,000,000):
- Real: 00:00:08.565, GC gen0: 889, gen1: 2, gen2: 0

unfoldChooseIter (N=1,000,000):  
- Real: 00:00:10.818, GC gen0: 1114, gen1: 3, gen2: 0

replicate (N=1,000,000):
- Real: 00:00:01.530, GC gen0: 156, gen1: 1, gen2: 0

Critical Performance Issues Identified

🔴 HIGH PRIORITY: Memory Leaks in append Operations

  • Issue: Chain of append closures causes memory accumulation (Issue Memory leak when unfolding #35)
  • Impact: O(n) memory usage instead of O(1) for streaming, causes OutOfMemoryException
  • Root Cause: AsyncSeq.fs:append creates generator chains that prevent GC

🔴 HIGH PRIORITY: O(n²) Performance in Recursive Patterns

🟡 MEDIUM PRIORITY: Excessive append Function Calls

🟡 MEDIUM PRIORITY: Parallelism Issues

Performance Characteristics

Bottleneck Analysis

  • Primarily I/O Bound: Designed for async I/O scenarios
  • High CPU Overhead: From continuations and allocations
  • Memory Issues: Major leaks in core operations, high GC pressure
  • Threading: Non-blocking but doesn't effectively utilize multiple cores

Allocation Patterns

  • High Allocation: Each MoveNext() requires "several allocations"
  • GC Pressure: Heavy generation 0 collections (889 gen0 for 1M elements)
  • Long-lived Objects: Continuation chains prevent garbage collection

Typical Workloads

  1. Streaming I/O Processing: Large files, network streams, database cursors
  2. Event Processing: Event streams with backpressure control
  3. ETL Pipelines: Extract-Transform-Load workflows
  4. Paging Data: Consuming paginated APIs
  5. Parallel Processing: Throttled parallel processing

Performance-Critical Operations (Priority Order)

  1. unfoldAsync - Core sequence generation, used by most other functions
  2. append - Used extensively, major memory leak source
  3. collect - Monadic bind, complex state management
  4. mapAsync - Most common transformation operation
  5. iterAsync - Terminal operation, recursion performance issue

Build and Testing Commands

Current Commands

# Build and test
dotnet build -c Release
dotnet test -c Release

# Run performance tests (manual)
dotnet fsi tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx

# Fable tests (JavaScript compatibility)  
cd tests/fable && npm i && npm test

Recommended Performance Setup

# Add benchmarking framework
dotnet add package BenchmarkDotNet

# Profiling tools
dotnet tool install -g dotMemory.Unit
dotnet tool install -g JetBrains.dotTrace.GlobalTools

# Memory/CPU profiling
dotnet trace collect -- dotnet run YourApp.exe

Performance Improvement Plan

Round 1: Foundation & Critical Fixes

Goal: Establish measurement infrastructure and fix critical memory leaks

  • Set up BenchmarkDotNet for systematic benchmarking
  • Fix memory leaks in append operations (Issue Memory leak when unfolding #35)
  • Add performance regression tests to CI
  • Document baseline performance for all core operations
  • Expected Outcome: Enable reliable performance work, fix crashes with large datasets

Round 2: Core Algorithm Optimization

Goal: Address O(n²) performance issues and optimize hot paths

Round 3: Parallelism & Advanced Features

Goal: Enable true parallelism and advanced optimizations

Round 4: Performance Parity & Advanced Optimizations

Goal: Match or exceed alternative implementations

  • Zero-copy operations where possible
  • Operation fusion to eliminate intermediate collections
  • Advanced lazy evaluation improvements
  • Benchmark against IObservable<T> and seq<'T> alternatives
  • Expected Outcome: Performance parity with competing approaches

Environment Requirements

Development Setup

  • .NET 8.0: Current test target framework
  • Release Configuration: Essential for accurate measurement
  • BenchmarkDotNet: Statistical benchmarking framework
  • Profiling Tools: dotTrace (CPU), dotMemory (memory), PerfView (free alternative)

Measurement Guidelines

  • Use dedicated test machine for reproducible results
  • Long-running tests to surface issues that only appear with sustained load
  • Focus on one major bottleneck per improvement round
  • Systematic before/after measurement for all changes

Success Metrics

Primary Metrics

  • Memory Usage: Constant O(1) memory for streaming operations
  • Throughput: Elements processed per second
  • Latency: Time to first element in streaming scenarios
  • GC Pressure: Reduction in generation 0 collections

Baseline Targets (Round 1)

  • No OutOfMemoryException with any dataset size
  • Linear O(n) performance for all operations
  • 50% reduction in allocations per operation
  • Reliable 1-minute benchmark suite

AI-generated content by Daily Perf Improver may contain mistakes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions