Skip to content

Conversation

mkpro118
Copy link
Owner

@mkpro118 mkpro118 commented Sep 14, 2025

Summary by CodeRabbit

  • New Features
    • Adds a full semantic analysis subsystem with rich diagnostics (errors/warnings/infos), suggestions and fix hints.
    • Introduces configurable analyzer options (validation mode, features, concurrency, timeouts, diagnostic limits, target provider).
    • Adds symbol table, type resolver with caching, cycle detection and compatibility checks.
    • Models cross-entity relationships and per-phase results with overall analysis summary.
    • Exposes plugin-friendly traits for custom rules, attribute/relationship analyzers, progress reporting and cancellation.

@mkpro118 mkpro118 self-assigned this Sep 14, 2025
Copy link

coderabbitai bot commented Sep 14, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a new public semantic_analyzer module and multiple submodules: configuration/options, diagnostics, analysis context/metadata, relationship graph, symbol table, type resolver, and analyzer traits; exposes key types via re-exports and integrates them into the crate public surface. Parser/scanner unchanged.

Changes

Cohort / File(s) Summary
Core export
src/core/mod.rs
Publicly exports the new semantic_analyzer module (pub mod semantic_analyzer;).
Semantic analyzer root & options
src/core/semantic_analyzer/mod.rs
Adds module declarations and re-exports; introduces AnalyzerOptions, ValidationMode, FeatureOptions, ConcurrencyMode, DatabaseProvider and Default for analyzer configuration.
Analysis context & metadata
src/core/semantic_analyzer/context.rs
Adds AnalysisContext, PhaseResult, AnalysisResult, AnalysisSummary, scope stack/types, metadata/statistics, timing, timeout handling, relationship graph and related types, and helpers for current targets and type-resolution stack.
Diagnostics subsystem
src/core/semantic_analyzer/diagnostics.rs
Adds SemanticDiagnostic, DiagnosticSeverity/DiagnosticCode/DiagnosticCategory, fix hints, related spans, metadata store, DiagnosticCollector, factory helpers and filtering utilities.
Symbol table & symbols
src/core/semantic_analyzer/symbol_table.rs
Implements SymbolTable, symbol kinds and metadata (Symbol, ModelSymbol, FieldSymbol, etc.), built-in types/functions initialization, add/lookup/iterate APIs, and SymbolError.
Type resolver & type system
src/core/semantic_analyzer/type_resolver.rs
Adds TypeResolver, caching, dependency graph, cycle detection, ResolvedType/ScalarType/CompositeType, constraints, compatibility checks, stats, and errors.
Analyzer traits & extensibility
src/core/semantic_analyzer/traits.rs
Defines pluggable traits: PhaseAnalyzer, DeclarationAnalyzer, RelationshipAnalyzer, AttributeAnalyzer, CustomRule, CompositeAnalyzer, ProgressReporting, Cancellable, plus AttributeContext and element type enum.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant SA as SemanticAnalyzer (orchestrator)
  participant Ctx as AnalysisContext
  participant ST as SymbolTable
  participant TR as TypeResolver
  participant DC as DiagnosticCollector

  User->>SA: run(schema, AnalyzerOptions)
  SA->>Ctx: create(options, shared ST/TR/RelGraph)
  SA->>ST: populate symbols (models/enums/datasources)
  SA->>TR: init caches/constraints
  loop phases
    SA->>SA: invoke PhaseAnalyzer.analyze(schema, Ctx)
    SA->>TR: resolve_type(type_ref, ST)
    alt resolved
      TR-->>SA: ResolvedType
    else error
      TR-->>SA: TypeResolutionError
      SA->>DC: add diagnostic
    end
    SA->>Ctx: record_phase_timing/metadata
  end
  SA->>Ctx: take_metadata()
  SA-->>User: AnalysisResult { symbol_table, type_resolver, relationship_graph, diagnostics, metadata, analysis_time }
Loading
sequenceDiagram
  autonumber
  participant TR as TypeResolver
  participant Cache as ResolvedTypesCache
  participant ST as SymbolTable

  TR->>Cache: lookup(type_ref)
  alt cache hit
    Cache-->>TR: ResolvedType
  else cache miss
    TR->>TR: resolve_type_uncached(type_ref)
    alt NamedType
      TR->>ST: lookup_global(name)
      ST-->>TR: Symbol or None
      TR->>TR: map to ResolvedType or UnknownType error
    else List/Optional
      TR->>TR: resolve inner type
    end
    TR->>Cache: store(result)
    Cache-->>TR: ResolvedType
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

I hop through scopes with careful cheer,
New symbols bloom and link quite near.
Diagnostics sparkle, messages bright,
Types resolve in morning light.
I thump my paw: "Analysis — done right!" 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "Semantic analyzer core foundations" succinctly and accurately summarizes the primary change: introduction of a new semantic_analyzer module and its foundational subsystems (context, diagnostics, symbol table, traits, and type resolver). It is concise, focused on the main change, and readable for teammates scanning history. The title avoids noisy details and is not misleading.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be200c8 and c2f11d4.

📒 Files selected for processing (6)
  • src/core/semantic_analyzer/context.rs (1 hunks)
  • src/core/semantic_analyzer/diagnostics.rs (1 hunks)
  • src/core/semantic_analyzer/mod.rs (1 hunks)
  • src/core/semantic_analyzer/symbol_table.rs (1 hunks)
  • src/core/semantic_analyzer/traits.rs (1 hunks)
  • src/core/semantic_analyzer/type_resolver.rs (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (12)
src/core/semantic_analyzer/mod.rs (2)

27-36: Add Default for FeatureOptions for ergonomic config construction

Deriving or implementing Default avoids verbose struct literals downstream.

Apply this diff:

 #[derive(Debug, Clone)]
 pub struct FeatureOptions {
@@
     pub enable_parallelism: bool,
 }
+
+impl Default for FeatureOptions {
+    fn default() -> Self {
+        Self {
+            validate_experimental: true,
+            performance_warnings: true,
+            enable_parallelism: true,
+        }
+    }
+}

71-79: Future-proof DatabaseProvider with non_exhaustive

Marking the enum as non_exhaustive reduces future breaking changes if new providers are added.

Apply this diff:

-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, EnumKindName)]
-pub enum DatabaseProvider {
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, EnumKindName)]
+#[non_exhaustive]
+pub enum DatabaseProvider {
src/core/semantic_analyzer/traits.rs (1)

181-206: Align CompositeAnalyzer with other analyzer traits: add Send + Sync

Other analyzer traits require Send + Sync. Making CompositeAnalyzer Send + Sync avoids accidental non-thread-safe composites when running in parallel.

Apply this diff:

-pub trait CompositeAnalyzer {
+pub trait CompositeAnalyzer: Send + Sync {
src/core/semantic_analyzer/type_resolver.rs (2)

83-117: Use the last name part and handle empty parts safely in resolve_named_type

Indexing parts[0] is brittle and ignores qualified names. Prefer last() and return a clear error on empty names.

Apply this diff:

-        let type_name = &named_type.name.parts[0].text;
+        let type_name = match named_type.name.parts.last() {
+            Some(p) => p.text.clone(),
+            None => {
+                return Err(TypeResolutionError::UnknownType {
+                    name: "<empty>".to_string(),
+                });
+            }
+        };
@@
-        if let Some(symbol) = symbol_table.lookup_global(type_name) {
+        if let Some(symbol) = symbol_table.lookup_global(&type_name) {
@@
-                    Ok(ResolvedType::Model(type_name.clone()))
+                    Ok(ResolvedType::Model(type_name.clone()))
@@
-                    Ok(ResolvedType::Enum(type_name.clone()))
+                    Ok(ResolvedType::Enum(type_name.clone()))
@@
-                    let scalar_type = Self::map_builtin_to_scalar(type_name)?;
+                    let scalar_type = Self::map_builtin_to_scalar(&type_name)?;
@@
-                    Ok(ResolvedType::Alias {
-                        name: type_name.clone(),
-                        target: alias.target_type.clone(),
-                    })
+                    Ok(ResolvedType::Alias { name: type_name.clone(), target: alias.target_type.clone() })
                 }
             }
         } else {
             Err(TypeResolutionError::UnknownType {
-                name: type_name.clone(),
+                name: type_name,
             })
         }

245-271: Handle Optional→Optional in are_compatible to avoid misleading recursion

Explicitly compare inner types when both sides are Optional.

Apply this diff (place before the existing Optional-accepts-inner arm):

         match (from, to) {
             // Same types are always compatible
             (a, b) if a == b => true,
 
+            // Optional to Optional: compare inner types
+            (ResolvedType::Optional(a), ResolvedType::Optional(b)) => {
+                Self::are_compatible(a, b)
+            }
+
             // Optional can accept the inner type
             (inner, ResolvedType::Optional(opt_inner)) => {
                 Self::are_compatible(inner, opt_inner)
             }
src/core/semantic_analyzer/symbol_table.rs (5)

205-210: contains() only checks global/builtins but doc says “any scope”

Either update the doc or include model/enum scopes in the check.

Apply this diff to match the doc:

 /// Check if a symbol exists in any scope.
 #[must_use]
 pub fn contains(&self, name: &str) -> bool {
-    self.lookup_global(name).is_some()
+    self.lookup_global(name).is_some()
+        || self
+            .model_scopes
+            .values()
+            .any(|scope| scope.contains_key(name))
+        || self
+            .enum_scopes
+            .values()
+            .any(|scope| scope.contains_key(name))
 }

291-324: Detect @@id (block primary key) in addition to @id on fields

Current has_primary_key only checks field-level @id. Include block-level @@id.

Apply this diff:

-                has_primary_key: model.members.iter().any(|m| {
-                    if let crate::core::parser::ast::ModelMember::Field(field) = m {
-                        field.attrs.iter().any(|attr| {
-                            attr.name.parts.len() == 1
-                                && attr.name.parts[0].text == "id"
-                        })
-                    } else {
-                        false
-                    }
-                }),
+                has_primary_key: {
+                    let field_pk = model.members.iter().any(|m| {
+                        if let crate::core::parser::ast::ModelMember::Field(field) = m {
+                            field.attrs.iter().any(|attr| {
+                                attr.name.parts.len() == 1
+                                    && attr.name.parts[0].text == "id"
+                            })
+                        } else {
+                            false
+                        }
+                    });
+                    let block_pk = model.attrs.iter().any(|attr| {
+                        attr.name.parts.len() == 1
+                            && attr.name.parts[0].text == "id"
+                    });
+                    field_pk || block_pk
+                },

244-261: Confirm intended behavior on user-defined names shadowing built-ins

add_global_symbol doesn’t prevent shadowing built-ins (e.g., model named String). Decide whether to forbid this; if so, reject when builtin_symbols contains the name.

If you choose to forbid, minimal change:

-        if let Some(existing) = self.global_symbols.get(&name) {
+        if let Some(existing) = self.global_symbols.get(&name)
+            || self.builtin_symbols.contains_key(&name)
+        {
             return Err(SymbolError::DuplicateSymbol {
                 name,
                 scope: "global".to_string(),
                 existing_span: existing.declaration_span.clone(),
                 new_span: symbol.declaration_span.clone(),
             });
         }

Note: you’ll need a synthetic span for built-ins or adjust the error payload.


469-472: Unused TypeAlias variant: add API or remove until needed

There’s no way to insert a TypeAlias today. Either add add_type_alias or drop the variant to avoid dead surface.

Example API:

 impl SymbolTable {
+    /// Add a type alias to the global scope.
+    pub fn add_type_alias(&mut self, name: &str, target: &str, span: SymbolSpan) -> Result<(), SymbolError> {
+        let symbol = Symbol {
+            name: name.to_string(),
+            symbol_type: SymbolType::TypeAlias(TypeAliasSymbol { target_type: target.to_string() }),
+            declaration_span: span,
+            visibility: Visibility::Public,
+            metadata: SymbolMetadata::new(),
+        };
+        self.add_global_symbol(symbol)
+    }
 }

436-457: Categorize built-in functions more precisely (optional)

Everything is marked FunctionCategory::Generator; consider a separate category for time/uuid helpers to enable finer validation later.

src/core/semantic_analyzer/context.rs (2)

191-196: Avoid allocation in is_resolving_type.

contains(&type_name.to_string()) allocates; compare by iterating refs.

-    pub fn is_resolving_type(&self, type_name: &str) -> bool {
-        self.type_resolution_stack.contains(&type_name.to_string())
-    }
+    pub fn is_resolving_type(&self, type_name: &str) -> bool {
+        self.type_resolution_stack.iter().any(|t| t == type_name)
+    }

233-237: Don’t consume AnalysisContext just to take metadata.

Returning metadata by-value forces callers to drop the whole context. Prefer mem::take to keep using the context after reading metadata.

-    pub fn take_metadata(self) -> AnalysisMetadata {
-        self.metadata
-    }
+    pub fn take_metadata(&mut self) -> AnalysisMetadata {
+        std::mem::take(&mut self.metadata)
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54003da and be200c8.

📒 Files selected for processing (7)
  • src/core/mod.rs (1 hunks)
  • src/core/semantic_analyzer/context.rs (1 hunks)
  • src/core/semantic_analyzer/diagnostics.rs (1 hunks)
  • src/core/semantic_analyzer/mod.rs (1 hunks)
  • src/core/semantic_analyzer/symbol_table.rs (1 hunks)
  • src/core/semantic_analyzer/traits.rs (1 hunks)
  • src/core/semantic_analyzer/type_resolver.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/core/semantic_analyzer/traits.rs (1)
src/core/parser/traits.rs (1)
  • children (400-402)
src/core/semantic_analyzer/mod.rs (5)
tests/scanner_parser/mod.rs (1)
  • std (297-297)
src/core/semantic_analyzer/context.rs (2)
  • default (249-251)
  • default (319-321)
src/core/semantic_analyzer/type_resolver.rs (2)
  • default (294-296)
  • default (468-470)
src/core/semantic_analyzer/symbol_table.rs (2)
  • default (265-267)
  • default (565-567)
src/core/semantic_analyzer/diagnostics.rs (2)
  • default (509-511)
  • default (633-635)
src/core/semantic_analyzer/context.rs (2)
src/core/semantic_analyzer/mod.rs (1)
  • default (82-98)
src/core/semantic_analyzer/diagnostics.rs (2)
  • is_error (136-138)
  • is_fatal (154-162)
src/core/semantic_analyzer/diagnostics.rs (1)
src/core/semantic_analyzer/mod.rs (1)
  • default (82-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test nightly on macos-latest --all-features
  • GitHub Check: Test nightly on windows-latest --all-features
  • GitHub Check: Test beta on windows-latest
  • GitHub Check: Test beta on windows-latest --no-default-features
  • GitHub Check: Test nightly on windows-latest
  • GitHub Check: Test stable on windows-latest --no-default-features
  • GitHub Check: Test nightly on windows-latest --no-default-features
  • GitHub Check: Test beta on windows-latest --all-features
  • GitHub Check: Test stable on windows-latest --all-features
  • GitHub Check: Security Audit
🔇 Additional comments (4)
src/core/mod.rs (1)

3-3: LGTM: public export for semantic_analyzer

Wires the new module cleanly with no side effects visible here.

src/core/semantic_analyzer/mod.rs (1)

39-48: Validate concurrency defaults; prefer Sequential or CPU-aware default

Defaulting to Concurrent { max_threads: 4, threshold: 2 } can surprise users and introduce nondeterminism—use Sequential by default or derive max_threads from available CPUs (e.g., num_cpus::get()). Repo search for AnalyzerOptions::default() and concurrency overrides returned no matches, so manual verification is required to ensure no regressions in determinism-sensitive tests. Sanity-check with:
rg -nP --type=rust -C2 '\bAnalyzerOptions::default()|AnalyzerOptions\s*{[^}]*concurrency'
File: src/core/semantic_analyzer/mod.rs (lines 39-48; also applies to 81-99).

src/core/semantic_analyzer/type_resolver.rs (1)

17-26: TypeRef derives Eq + Hash — confirm inner fields are deterministic

TypeRef is #[derive(Debug, Clone, PartialEq, Eq, Hash)] in src/core/parser/ast/mod.rs (≈line 663); safe as a HashMap key. Verify NamedType and any nested fields do not include non-deterministic data (e.g., spans); if they do, remove those fields from Eq/Hash or use a different cache key.

src/core/semantic_analyzer/context.rs (1)

34-50: Clarify concurrency model for AnalysisContext — avoid accidental shared mutable state.

Fields current_scope, metadata, current_model/current_field/current_enum, and type_resolution_stack in src/core/semantic_analyzer/context.rs are plain mutable (no Mutex/RwLock). If an AnalysisContext instance is shared across analyzers/workers this will race. Either:

  • Guarantee one AnalysisContext per worker (document and enforce), or
  • Protect these fields with Mutex/RwLock or use thread-local/interior-mutability and audit call sites.

Comment on lines +52 to +61
/// Relationship graph for tracking model relationships.
#[derive(Debug, Clone, Default)]
pub struct RelationshipGraph {
/// All relationships indexed by ID
pub relationships: HashMap<String, Relationship>,

/// Relationships by source model
pub model_relationships: HashMap<String, Vec<String>>,
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Preserve RelationshipGraph invariants with an API, not struct field access.

Keeping relationships and model_relationships consistent is easy to get wrong if callers write fields directly. Provide an add/remove API and make fields private.

-#[derive(Debug, Clone, Default)]
-pub struct RelationshipGraph {
-    /// All relationships indexed by ID
-    pub relationships: HashMap<String, Relationship>,
-    /// Relationships by source model
-    pub model_relationships: HashMap<String, Vec<String>>,
-}
+#[derive(Debug, Clone, Default)]
+pub struct RelationshipGraph {
+    relationships: HashMap<String, Relationship>,
+    model_relationships: HashMap<String, Vec<String>>,
+}
+
+impl RelationshipGraph {
+    pub fn add(&mut self, rel: Relationship) {
+        let id = rel.id.clone();
+        self.model_relationships
+            .entry(rel.from_model.clone())
+            .or_default()
+            .push(id.clone());
+        self.relationships.insert(id, rel);
+    }
+    pub fn get(&self, id: &str) -> Option<&Relationship> { self.relationships.get(id) }
+    pub fn by_model(&self, model: &str) -> impl Iterator<Item = &Relationship> {
+        self.model_relationships
+            .get(model)
+            .into_iter()
+            .flatten()
+            .filter_map(|id| self.relationships.get(id))
+    }
+}

If external mutation is required, expose read-only accessors and targeted mutators.

Also applies to: 62-75

🤖 Prompt for AI Agents
In src/core/semantic_analyzer/context.rs around lines 52 to 61 (and similarly
for 62 to 75), the RelationshipGraph exposes its HashMap fields publicly which
allows inconsistent updates; change the struct to make relationships and
model_relationships private, add methods like add_relationship(&mut self, id:
String, rel: Relationship) and remove_relationship(&mut self, id: &str) that
update both maps atomically, and add read-only accessors (e.g.,
get_relationship(&self, id: &str) -> Option<&Relationship>,
model_relationships_for(&self, model: &str) -> Option<&Vec<String>>) so external
code cannot mutate internals directly and consistency between the two maps is
always preserved.

Comment on lines +18 to +26
/// Cache of resolved types
resolved_types: HashMap<TypeRef, ResolvedType>,

/// Type dependency graph for circular dependency detection
type_dependencies: HashMap<String, HashSet<String>>,

/// Type constraints and validation rules
type_constraints: HashMap<String, TypeConstraints>,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Cache key uses AST TypeRef; consider canonical String key to avoid heavy clones/hash fragility

Hashing/cloning TypeRef can be costly and may fail if TypeRef lacks Eq/Hash. A canonical string key (e.g., "Int", "User[]", "User?") is simpler and stable.

Example refactor:

-    resolved_types: HashMap<TypeRef, ResolvedType>,
+    resolved_types: HashMap<String, ResolvedType>,
@@
-        if let Some(resolved) = self.resolved_types.get(type_ref) {
+        let key = Self::type_key(type_ref);
+        if let Some(resolved) = self.resolved_types.get(&key) {
             return Ok(resolved.clone());
         }
@@
-        self.resolved_types
-            .insert(type_ref.clone(), resolved.clone());
+        self.resolved_types.insert(key, resolved.clone());
@@
     pub fn clear_cache(&mut self) {
-        self.resolved_types.clear();
+        self.resolved_types.clear();
     }

Add this helper in the impl block:

fn type_key(type_ref: &TypeRef) -> String {
    match type_ref {
        TypeRef::Named(n) => n.name.parts.iter().map(|p| p.text.as_str()).collect::<Vec<_>>().join("."),
        TypeRef::List(inner) => format!("{}[]", Self::type_key(&inner.inner)),
    }
}

Also applies to: 49-57, 273-276

🤖 Prompt for AI Agents
In src/core/semantic_analyzer/type_resolver.rs around lines 18-26 (and also
update usages at ~49-57 and ~273-276): the resolved_types cache currently keys
on AST TypeRef which is expensive and fragile; change the cache key to a
canonical String instead. Add a helper fn type_key(type_ref: &TypeRef) -> String
in the impl that canonicalizes all TypeRef variants (Named -> fully qualified
dot-joined name, List -> "<inner_key>[]", Optional/Nullable -> "<inner_key>?" or
whatever nullable variant exists, etc.), then update the HashMap definition to
HashMap<String, ResolvedType> and change all insert/lookup calls to call
type_key(&type_ref) before using the map. Ensure every place that compared or
cloned TypeRef now uses the string key to avoid heavy clones and remove any need
for TypeRef to implement Hash/Eq.

@mkpro118 mkpro118 force-pushed the semantic-analyzer-core-foundations branch from be200c8 to c2f11d4 Compare September 14, 2025 23:00
@mkpro118 mkpro118 merged commit 8f4293b into main Sep 14, 2025
70 of 71 checks passed
@mkpro118 mkpro118 deleted the semantic-analyzer-core-foundations branch September 14, 2025 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant