Performance Research and Improvement Plan
Executive Summary
FSharp.Control.AsyncSeq has several critical performance issues that significantly impact its usability with large datasets. The primary concerns are memory leaks in append operations and O(n²) performance in recursive patterns. This research provides a comprehensive analysis and improvement plan.
Current Performance Testing Infrastructure
Testing Setup
- Performance Test File: tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx
- Test Framework: Basic F# script with System.Diagnostics.Stopwatch (see the sketch after this list)
- Test Scale: N=1,000,000 elements
- Major Gap: No modern benchmarking framework (BenchmarkDotNet)
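For reference, a minimal sketch of that style of Stopwatch-based harness, extended with a GC gen0 count; the `measure` helper name is illustrative and not taken from the repository:

```fsharp
// Illustrative harness in the style of AsyncSeqPerf.fsx: wall-clock time plus
// the number of gen0 collections that occurred during the run.
open System
open System.Diagnostics

let measure (name: string) (work: Async<unit>) =
    let gen0Before = GC.CollectionCount 0
    let sw = Stopwatch.StartNew ()
    work |> Async.RunSynchronously
    sw.Stop ()
    printfn "%s - Real: %O, GC gen0: %d" name sw.Elapsed (GC.CollectionCount 0 - gen0Before)
```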
Historical Performance Results (N=1,000,000)

| Operation | Real time | GC gen0 | GC gen1 | GC gen2 |
| --- | --- | --- | --- | --- |
| unfoldIter | 00:00:08.565 | 889 | 2 | 0 |
| unfoldChooseIter | 00:00:10.818 | 1114 | 3 | 0 |
| replicate | 00:00:01.530 | 156 | 1 | 0 |
Critical Performance Issues Identified
🔴 HIGH PRIORITY: Memory Leaks in append Operations
- Issue: Chain of append closures causes memory accumulation (#35: Memory leak when unfolding)
- Impact: O(n) memory usage instead of O(1) for streaming; causes OutOfMemoryException
- Root Cause: AsyncSeq.fs:append creates generator chains that prevent GC
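A minimal sketch of the accumulation pattern, using the public AsyncSeq.empty, AsyncSeq.singleton, and AsyncSeq.append functions; this illustrates the closure-chain problem rather than reproducing #35 exactly:

```fsharp
// Illustrative only: building a sequence by repeatedly appending keeps every
// intermediate generator closure alive, so memory grows with n instead of
// staying constant while streaming.
open FSharp.Control

let appendChain (n: int) : AsyncSeq<int> =
    let mutable acc : AsyncSeq<int> = AsyncSeq.empty
    for i in 0 .. n - 1 do
        acc <- AsyncSeq.append acc (AsyncSeq.singleton i)
    acc
```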
🔴 HIGH PRIORITY: O(n²) Performance in Recursive Patterns
- Issue: Computation builder creates nested continuations (#57: Unexpected iteration performance drop when recursive loops are used)
- Impact: 4-second operation becomes 18+ seconds with 4x more data
- Root Cause: AsyncSeq computation expressions with recursive yield!
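A minimal sketch of the recursive shape that triggers this, written with the asyncSeq computation expression; illustrative rather than the exact repro from #57:

```fsharp
// Illustrative only: each recursive yield! wraps the remaining sequence in
// another continuation layer, so yielding n elements does roughly O(n²) work.
open FSharp.Control

let rec countFrom (i: int) (limit: int) : AsyncSeq<int> =
    asyncSeq {
        if i <= limit then
            yield i
            yield! countFrom (i + 1) limit
    }
```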
🟡 MEDIUM PRIORITY: Excessive append Function Calls
- Issue: 3,000-element processing → 13 million append calls (#50: AsyncSeq.append called enormous number of times)
- Impact: Affects all sequence operations
- Root Cause: Implementation relies heavily on append for composition
🟡 MEDIUM PRIORITY: Parallelism Issues
- Issue: mapAsyncParallel doesn't provide true parallelism (#74: Parallel sequence runs consequentially (non-parallel); #95: Lack of parallelism in AsyncSeq.cache); see the sketch after this list
- Impact: Sequential processing instead of parallel
- Root Cause: Implementation serializes work instead of parallelizing
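A sketch of the symptom, assuming AsyncSeq.ofSeq, mapAsyncParallel, and iter; with true parallelism the whole run should take roughly one sleep interval rather than one per element:

```fsharp
// Illustrative only: eight items, each with a simulated 100 ms async step.
// Issue #74 reports this behaving sequentially, i.e. total time scaling with
// the number of items rather than with a single delay.
open FSharp.Control

let demo () =
    [ 1 .. 8 ]
    |> AsyncSeq.ofSeq
    |> AsyncSeq.mapAsyncParallel (fun i -> async {
        do! Async.Sleep 100   // simulated I/O
        return i * i })
    |> AsyncSeq.iter (printfn "%d")
    |> Async.RunSynchronously
```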
Performance Characteristics
Bottleneck Analysis
- Primarily I/O Bound: Designed for async I/O scenarios
- High CPU Overhead: From continuations and allocations
- Memory Issues: Major leaks in core operations, high GC pressure
- Threading: Non-blocking but doesn't effectively utilize multiple cores
Allocation Patterns
- High Allocation: Each MoveNext() requires "several allocations"
- GC Pressure: Heavy generation 0 collections (889 gen0 for 1M elements)
- Long-lived Objects: Continuation chains prevent garbage collection
Typical Workloads
- Streaming I/O Processing: Large files, network streams, database cursors
- Event Processing: Event streams with backpressure control
- ETL Pipelines: Extract-Transform-Load workflows
- Paging Data: Consuming paginated APIs (see the sketch after this list)
- Parallel Processing: Throttled parallel processing
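As an example of the paging workload, a hedged sketch built on AsyncSeq.unfoldAsync; the Page type and the fetchPage parameter are hypothetical stand-ins for a real paginated API client:

```fsharp
// Hypothetical paging client: Page and fetchPage are placeholders, not part of
// the library. unfoldAsync requests pages lazily, one at a time, and the state
// is Some token while pages remain and None once the last page has been read.
open FSharp.Control

type Page<'T> = { Items: 'T list; NextToken: string option }

let pageAll (fetchPage: string option -> Async<Page<'T>>) : AsyncSeq<'T> =
    AsyncSeq.unfoldAsync
        (fun state -> async {
            match state with
            | Some token ->
                let! page = fetchPage token
                // Continue with the next token, or mark the stream as finished.
                let next = page.NextToken |> Option.map Some
                return Some (page.Items, next)
            | None -> return None })
        (Some None)
    |> AsyncSeq.collect (fun items -> AsyncSeq.ofSeq items)
```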
Performance-Critical Operations (Priority Order)
1. unfoldAsync - Core sequence generation, used by most other functions
2. append - Used extensively, major memory leak source
3. collect - Monadic bind, complex state management
4. mapAsync - Most common transformation operation
5. iterAsync - Terminal operation, recursion performance issue
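A hedged sketch of a pipeline exercising several of these hot operations (unfoldAsync to generate, collect to flatten, mapAsync to transform, iterAsync to consume); the element count and transformations are arbitrary:

```fsharp
// Illustrative pipeline over the operations listed above.
open FSharp.Control

let run (n: int) =
    AsyncSeq.unfoldAsync
        (fun i -> async { return if i < n then Some (i, i + 1) else None })
        0
    |> AsyncSeq.collect (fun i -> AsyncSeq.ofSeq [ i; i + 1 ])   // monadic bind / flatten
    |> AsyncSeq.mapAsync (fun i -> async { return i * 2 })
    |> AsyncSeq.iterAsync (fun _ -> async.Return ())
    |> Async.RunSynchronously
```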
Build and Testing Commands
Current Commands
# Build and test
dotnet build -c Release
dotnet test -c Release
# Run performance tests (manual)
dotnet fsi tests/FSharp.Control.AsyncSeq.Tests/AsyncSeqPerf.fsx
# Fable tests (JavaScript compatibility)
cd tests/fable && npm i && npm test
Recommended Performance Setup
# Add benchmarking framework
dotnet add package BenchmarkDotNet
# Profiling tools
dotnet tool install -g dotMemory.Unit
dotnet tool install -g JetBrains.dotTrace.GlobalTools
# Memory/CPU profiling
dotnet trace collect -- dotnet YourApp.dll
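A minimal sketch of what a BenchmarkDotNet benchmark over one of the existing cases could look like; the AsyncSeqBenchmarks type and UnfoldIter method names are illustrative, and the code assumes a console project referencing BenchmarkDotNet and FSharp.Control.AsyncSeq:

```fsharp
// Illustrative BenchmarkDotNet setup; type and member names are placeholders.
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Control

[<MemoryDiagnoser>]
type AsyncSeqBenchmarks () =
    [<Params(1000, 100000)>]
    member val N = 0 with get, set

    [<Benchmark>]
    member this.UnfoldIter () =
        AsyncSeq.unfoldAsync
            (fun i -> async { return if i < this.N then Some (i, i + 1) else None })
            0
        |> AsyncSeq.iter ignore
        |> Async.RunSynchronously

[<EntryPoint>]
let main (_: string[]) =
    BenchmarkRunner.Run<AsyncSeqBenchmarks> () |> ignore
    0
```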
Performance Improvement Plan
Round 1: Foundation & Critical Fixes
Goal: Establish measurement infrastructure and fix critical memory leaks
- Set up BenchmarkDotNet for systematic benchmarking
- Fix memory leaks in append operations (#35: Memory leak when unfolding)
- Add performance regression tests to CI
- Document baseline performance for all core operations
- Expected Outcome: Enable reliable performance work, fix crashes with large datasets
Round 2: Core Algorithm Optimization
Goal: Address O(n²) performance issues and optimize hot paths
- Eliminate O(n²) behavior in recursive patterns (#57: Unexpected iteration performance drop when recursive loops are used)
- Optimize unfoldAsync implementation for better memory usage
- Add fast paths for common operations using AsyncSeqOp
- Reduce per-operation allocations by 50%
- Expected Outcome: 5-10x performance improvement for typical workloads
Round 3: Parallelism & Advanced Features
Goal: Enable true parallelism and advanced optimizations
- Fix mapAsyncParallel to achieve true parallelism (#74: Parallel sequence runs consequentially (non-parallel); #95: Lack of parallelism in AsyncSeq.cache)
- Implement efficient AsyncSeq.cache for multiple consumers (see the sketch after this list)
- Add vectorization support for mathematical operations
- Optimize continuation chains to prevent deep nesting
- Expected Outcome: True parallel processing, better resource utilization
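As an example of the multi-consumer goal, a hedged sketch using the existing AsyncSeq.cache: the expensive source should be evaluated once even though two consumers iterate it, and issue #95 concerns how well concurrent consumers are served:

```fsharp
// Illustrative only: one expensive source, cached, consumed by two iterators.
open FSharp.Control

let shared =
    AsyncSeq.unfoldAsync
        (fun i -> async {
            do! Async.Sleep 10          // simulated expensive I/O per element
            return if i < 100 then Some (i, i + 1) else None })
        0
    |> AsyncSeq.cache

let consumeTwice () =
    [ shared |> AsyncSeq.iter ignore
      shared |> AsyncSeq.iter (printfn "%d") ]
    |> Async.Parallel
    |> Async.Ignore
    |> Async.RunSynchronously
```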
Round 4: Performance Parity & Advanced Optimizations
Goal: Match or exceed alternative implementations
- Zero-copy operations where possible
- Operation fusion to eliminate intermediate collections
- Advanced lazy evaluation improvements
- Benchmark against IObservable<T> and seq<'T> alternatives
- Expected Outcome: Performance parity with competing approaches
Environment Requirements
Development Setup
- .NET 8.0: Current test target framework
- Release Configuration: Essential for accurate measurement
- BenchmarkDotNet: Statistical benchmarking framework
- Profiling Tools: dotTrace (CPU), dotMemory (memory), PerfView (free alternative)
Measurement Guidelines
- Use dedicated test machine for reproducible results
- Long-running tests to surface issues that only appear with sustained load
- Focus on one major bottleneck per improvement round
- Systematic before/after measurement for all changes
Success Metrics
Primary Metrics
- Memory Usage: Constant O(1) memory for streaming operations
- Throughput: Elements processed per second
- Latency: Time to first element in streaming scenarios
- GC Pressure: Reduction in generation 0 collections
Baseline Targets (Round 1)
- No OutOfMemoryException with any dataset size
- Linear O(n) performance for all operations
- 50% reduction in allocations per operation
- Reliable 1-minute benchmark suite
AI-generated content by Daily Perf Improver may contain mistakes.