-
Notifications
You must be signed in to change notification settings - Fork 0
Semantic analyzer core foundations #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Caution Review failedThe pull request is closed. WalkthroughAdds a new public Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant SA as SemanticAnalyzer (orchestrator)
participant Ctx as AnalysisContext
participant ST as SymbolTable
participant TR as TypeResolver
participant DC as DiagnosticCollector
User->>SA: run(schema, AnalyzerOptions)
SA->>Ctx: create(options, shared ST/TR/RelGraph)
SA->>ST: populate symbols (models/enums/datasources)
SA->>TR: init caches/constraints
loop phases
SA->>SA: invoke PhaseAnalyzer.analyze(schema, Ctx)
SA->>TR: resolve_type(type_ref, ST)
alt resolved
TR-->>SA: ResolvedType
else error
TR-->>SA: TypeResolutionError
SA->>DC: add diagnostic
end
SA->>Ctx: record_phase_timing/metadata
end
SA->>Ctx: take_metadata()
SA-->>User: AnalysisResult { symbol_table, type_resolver, relationship_graph, diagnostics, metadata, analysis_time }
sequenceDiagram
autonumber
participant TR as TypeResolver
participant Cache as ResolvedTypesCache
participant ST as SymbolTable
TR->>Cache: lookup(type_ref)
alt cache hit
Cache-->>TR: ResolvedType
else cache miss
TR->>TR: resolve_type_uncached(type_ref)
alt NamedType
TR->>ST: lookup_global(name)
ST-->>TR: Symbol or None
TR->>TR: map to ResolvedType or UnknownType error
else List/Optional
TR->>TR: resolve inner type
end
TR->>Cache: store(result)
Cache-->>TR: ResolvedType
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (12)
src/core/semantic_analyzer/mod.rs (2)
27-36
: Add Default for FeatureOptions for ergonomic config constructionDeriving or implementing Default avoids verbose struct literals downstream.
Apply this diff:
#[derive(Debug, Clone)] pub struct FeatureOptions { @@ pub enable_parallelism: bool, } + +impl Default for FeatureOptions { + fn default() -> Self { + Self { + validate_experimental: true, + performance_warnings: true, + enable_parallelism: true, + } + } +}
71-79
: Future-proof DatabaseProvider with non_exhaustiveMarking the enum as non_exhaustive reduces future breaking changes if new providers are added.
Apply this diff:
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, EnumKindName)] -pub enum DatabaseProvider { +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, EnumKindName)] +#[non_exhaustive] +pub enum DatabaseProvider {src/core/semantic_analyzer/traits.rs (1)
181-206
: Align CompositeAnalyzer with other analyzer traits: add Send + SyncOther analyzer traits require Send + Sync. Making CompositeAnalyzer Send + Sync avoids accidental non-thread-safe composites when running in parallel.
Apply this diff:
-pub trait CompositeAnalyzer { +pub trait CompositeAnalyzer: Send + Sync {src/core/semantic_analyzer/type_resolver.rs (2)
83-117
: Use the last name part and handle empty parts safely in resolve_named_typeIndexing parts[0] is brittle and ignores qualified names. Prefer last() and return a clear error on empty names.
Apply this diff:
- let type_name = &named_type.name.parts[0].text; + let type_name = match named_type.name.parts.last() { + Some(p) => p.text.clone(), + None => { + return Err(TypeResolutionError::UnknownType { + name: "<empty>".to_string(), + }); + } + }; @@ - if let Some(symbol) = symbol_table.lookup_global(type_name) { + if let Some(symbol) = symbol_table.lookup_global(&type_name) { @@ - Ok(ResolvedType::Model(type_name.clone())) + Ok(ResolvedType::Model(type_name.clone())) @@ - Ok(ResolvedType::Enum(type_name.clone())) + Ok(ResolvedType::Enum(type_name.clone())) @@ - let scalar_type = Self::map_builtin_to_scalar(type_name)?; + let scalar_type = Self::map_builtin_to_scalar(&type_name)?; @@ - Ok(ResolvedType::Alias { - name: type_name.clone(), - target: alias.target_type.clone(), - }) + Ok(ResolvedType::Alias { name: type_name.clone(), target: alias.target_type.clone() }) } } } else { Err(TypeResolutionError::UnknownType { - name: type_name.clone(), + name: type_name, }) }
245-271
: Handle Optional→Optional in are_compatible to avoid misleading recursionExplicitly compare inner types when both sides are Optional.
Apply this diff (place before the existing Optional-accepts-inner arm):
match (from, to) { // Same types are always compatible (a, b) if a == b => true, + // Optional to Optional: compare inner types + (ResolvedType::Optional(a), ResolvedType::Optional(b)) => { + Self::are_compatible(a, b) + } + // Optional can accept the inner type (inner, ResolvedType::Optional(opt_inner)) => { Self::are_compatible(inner, opt_inner) }src/core/semantic_analyzer/symbol_table.rs (5)
205-210
: contains() only checks global/builtins but doc says “any scope”Either update the doc or include model/enum scopes in the check.
Apply this diff to match the doc:
/// Check if a symbol exists in any scope. #[must_use] pub fn contains(&self, name: &str) -> bool { - self.lookup_global(name).is_some() + self.lookup_global(name).is_some() + || self + .model_scopes + .values() + .any(|scope| scope.contains_key(name)) + || self + .enum_scopes + .values() + .any(|scope| scope.contains_key(name)) }
291-324
: Detect @@id (block primary key) in addition to @id on fieldsCurrent has_primary_key only checks field-level @id. Include block-level @@id.
Apply this diff:
- has_primary_key: model.members.iter().any(|m| { - if let crate::core::parser::ast::ModelMember::Field(field) = m { - field.attrs.iter().any(|attr| { - attr.name.parts.len() == 1 - && attr.name.parts[0].text == "id" - }) - } else { - false - } - }), + has_primary_key: { + let field_pk = model.members.iter().any(|m| { + if let crate::core::parser::ast::ModelMember::Field(field) = m { + field.attrs.iter().any(|attr| { + attr.name.parts.len() == 1 + && attr.name.parts[0].text == "id" + }) + } else { + false + } + }); + let block_pk = model.attrs.iter().any(|attr| { + attr.name.parts.len() == 1 + && attr.name.parts[0].text == "id" + }); + field_pk || block_pk + },
244-261
: Confirm intended behavior on user-defined names shadowing built-insadd_global_symbol doesn’t prevent shadowing built-ins (e.g., model named String). Decide whether to forbid this; if so, reject when builtin_symbols contains the name.
If you choose to forbid, minimal change:
- if let Some(existing) = self.global_symbols.get(&name) { + if let Some(existing) = self.global_symbols.get(&name) + || self.builtin_symbols.contains_key(&name) + { return Err(SymbolError::DuplicateSymbol { name, scope: "global".to_string(), existing_span: existing.declaration_span.clone(), new_span: symbol.declaration_span.clone(), }); }Note: you’ll need a synthetic span for built-ins or adjust the error payload.
469-472
: Unused TypeAlias variant: add API or remove until neededThere’s no way to insert a TypeAlias today. Either add add_type_alias or drop the variant to avoid dead surface.
Example API:
impl SymbolTable { + /// Add a type alias to the global scope. + pub fn add_type_alias(&mut self, name: &str, target: &str, span: SymbolSpan) -> Result<(), SymbolError> { + let symbol = Symbol { + name: name.to_string(), + symbol_type: SymbolType::TypeAlias(TypeAliasSymbol { target_type: target.to_string() }), + declaration_span: span, + visibility: Visibility::Public, + metadata: SymbolMetadata::new(), + }; + self.add_global_symbol(symbol) + } }
436-457
: Categorize built-in functions more precisely (optional)Everything is marked FunctionCategory::Generator; consider a separate category for time/uuid helpers to enable finer validation later.
src/core/semantic_analyzer/context.rs (2)
191-196
: Avoid allocation in is_resolving_type.contains(&type_name.to_string()) allocates; compare by iterating refs.
- pub fn is_resolving_type(&self, type_name: &str) -> bool { - self.type_resolution_stack.contains(&type_name.to_string()) - } + pub fn is_resolving_type(&self, type_name: &str) -> bool { + self.type_resolution_stack.iter().any(|t| t == type_name) + }
233-237
: Don’t consume AnalysisContext just to take metadata.Returning metadata by-value forces callers to drop the whole context. Prefer mem::take to keep using the context after reading metadata.
- pub fn take_metadata(self) -> AnalysisMetadata { - self.metadata - } + pub fn take_metadata(&mut self) -> AnalysisMetadata { + std::mem::take(&mut self.metadata) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/core/mod.rs
(1 hunks)src/core/semantic_analyzer/context.rs
(1 hunks)src/core/semantic_analyzer/diagnostics.rs
(1 hunks)src/core/semantic_analyzer/mod.rs
(1 hunks)src/core/semantic_analyzer/symbol_table.rs
(1 hunks)src/core/semantic_analyzer/traits.rs
(1 hunks)src/core/semantic_analyzer/type_resolver.rs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/core/semantic_analyzer/traits.rs (1)
src/core/parser/traits.rs (1)
children
(400-402)
src/core/semantic_analyzer/mod.rs (5)
tests/scanner_parser/mod.rs (1)
std
(297-297)src/core/semantic_analyzer/context.rs (2)
default
(249-251)default
(319-321)src/core/semantic_analyzer/type_resolver.rs (2)
default
(294-296)default
(468-470)src/core/semantic_analyzer/symbol_table.rs (2)
default
(265-267)default
(565-567)src/core/semantic_analyzer/diagnostics.rs (2)
default
(509-511)default
(633-635)
src/core/semantic_analyzer/context.rs (2)
src/core/semantic_analyzer/mod.rs (1)
default
(82-98)src/core/semantic_analyzer/diagnostics.rs (2)
is_error
(136-138)is_fatal
(154-162)
src/core/semantic_analyzer/diagnostics.rs (1)
src/core/semantic_analyzer/mod.rs (1)
default
(82-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test nightly on macos-latest --all-features
- GitHub Check: Test nightly on windows-latest --all-features
- GitHub Check: Test beta on windows-latest
- GitHub Check: Test beta on windows-latest --no-default-features
- GitHub Check: Test nightly on windows-latest
- GitHub Check: Test stable on windows-latest --no-default-features
- GitHub Check: Test nightly on windows-latest --no-default-features
- GitHub Check: Test beta on windows-latest --all-features
- GitHub Check: Test stable on windows-latest --all-features
- GitHub Check: Security Audit
🔇 Additional comments (4)
src/core/mod.rs (1)
3-3
: LGTM: public export for semantic_analyzerWires the new module cleanly with no side effects visible here.
src/core/semantic_analyzer/mod.rs (1)
39-48
: Validate concurrency defaults; prefer Sequential or CPU-aware defaultDefaulting to Concurrent { max_threads: 4, threshold: 2 } can surprise users and introduce nondeterminism—use Sequential by default or derive max_threads from available CPUs (e.g., num_cpus::get()). Repo search for AnalyzerOptions::default() and concurrency overrides returned no matches, so manual verification is required to ensure no regressions in determinism-sensitive tests. Sanity-check with:
rg -nP --type=rust -C2 '\bAnalyzerOptions::default()|AnalyzerOptions\s*{[^}]*concurrency'
File: src/core/semantic_analyzer/mod.rs (lines 39-48; also applies to 81-99).src/core/semantic_analyzer/type_resolver.rs (1)
17-26
: TypeRef derives Eq + Hash — confirm inner fields are deterministicTypeRef is #[derive(Debug, Clone, PartialEq, Eq, Hash)] in src/core/parser/ast/mod.rs (≈line 663); safe as a HashMap key. Verify NamedType and any nested fields do not include non-deterministic data (e.g., spans); if they do, remove those fields from Eq/Hash or use a different cache key.
src/core/semantic_analyzer/context.rs (1)
34-50
: Clarify concurrency model for AnalysisContext — avoid accidental shared mutable state.Fields current_scope, metadata, current_model/current_field/current_enum, and type_resolution_stack in src/core/semantic_analyzer/context.rs are plain mutable (no Mutex/RwLock). If an AnalysisContext instance is shared across analyzers/workers this will race. Either:
- Guarantee one AnalysisContext per worker (document and enforce), or
- Protect these fields with Mutex/RwLock or use thread-local/interior-mutability and audit call sites.
/// Relationship graph for tracking model relationships. | ||
#[derive(Debug, Clone, Default)] | ||
pub struct RelationshipGraph { | ||
/// All relationships indexed by ID | ||
pub relationships: HashMap<String, Relationship>, | ||
|
||
/// Relationships by source model | ||
pub model_relationships: HashMap<String, Vec<String>>, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Preserve RelationshipGraph invariants with an API, not struct field access.
Keeping relationships and model_relationships consistent is easy to get wrong if callers write fields directly. Provide an add/remove API and make fields private.
-#[derive(Debug, Clone, Default)]
-pub struct RelationshipGraph {
- /// All relationships indexed by ID
- pub relationships: HashMap<String, Relationship>,
- /// Relationships by source model
- pub model_relationships: HashMap<String, Vec<String>>,
-}
+#[derive(Debug, Clone, Default)]
+pub struct RelationshipGraph {
+ relationships: HashMap<String, Relationship>,
+ model_relationships: HashMap<String, Vec<String>>,
+}
+
+impl RelationshipGraph {
+ pub fn add(&mut self, rel: Relationship) {
+ let id = rel.id.clone();
+ self.model_relationships
+ .entry(rel.from_model.clone())
+ .or_default()
+ .push(id.clone());
+ self.relationships.insert(id, rel);
+ }
+ pub fn get(&self, id: &str) -> Option<&Relationship> { self.relationships.get(id) }
+ pub fn by_model(&self, model: &str) -> impl Iterator<Item = &Relationship> {
+ self.model_relationships
+ .get(model)
+ .into_iter()
+ .flatten()
+ .filter_map(|id| self.relationships.get(id))
+ }
+}
If external mutation is required, expose read-only accessors and targeted mutators.
Also applies to: 62-75
🤖 Prompt for AI Agents
In src/core/semantic_analyzer/context.rs around lines 52 to 61 (and similarly
for 62 to 75), the RelationshipGraph exposes its HashMap fields publicly which
allows inconsistent updates; change the struct to make relationships and
model_relationships private, add methods like add_relationship(&mut self, id:
String, rel: Relationship) and remove_relationship(&mut self, id: &str) that
update both maps atomically, and add read-only accessors (e.g.,
get_relationship(&self, id: &str) -> Option<&Relationship>,
model_relationships_for(&self, model: &str) -> Option<&Vec<String>>) so external
code cannot mutate internals directly and consistency between the two maps is
always preserved.
/// Cache of resolved types | ||
resolved_types: HashMap<TypeRef, ResolvedType>, | ||
|
||
/// Type dependency graph for circular dependency detection | ||
type_dependencies: HashMap<String, HashSet<String>>, | ||
|
||
/// Type constraints and validation rules | ||
type_constraints: HashMap<String, TypeConstraints>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Cache key uses AST TypeRef; consider canonical String key to avoid heavy clones/hash fragility
Hashing/cloning TypeRef can be costly and may fail if TypeRef lacks Eq/Hash. A canonical string key (e.g., "Int", "User[]", "User?") is simpler and stable.
Example refactor:
- resolved_types: HashMap<TypeRef, ResolvedType>,
+ resolved_types: HashMap<String, ResolvedType>,
@@
- if let Some(resolved) = self.resolved_types.get(type_ref) {
+ let key = Self::type_key(type_ref);
+ if let Some(resolved) = self.resolved_types.get(&key) {
return Ok(resolved.clone());
}
@@
- self.resolved_types
- .insert(type_ref.clone(), resolved.clone());
+ self.resolved_types.insert(key, resolved.clone());
@@
pub fn clear_cache(&mut self) {
- self.resolved_types.clear();
+ self.resolved_types.clear();
}
Add this helper in the impl block:
fn type_key(type_ref: &TypeRef) -> String {
match type_ref {
TypeRef::Named(n) => n.name.parts.iter().map(|p| p.text.as_str()).collect::<Vec<_>>().join("."),
TypeRef::List(inner) => format!("{}[]", Self::type_key(&inner.inner)),
}
}
Also applies to: 49-57, 273-276
🤖 Prompt for AI Agents
In src/core/semantic_analyzer/type_resolver.rs around lines 18-26 (and also
update usages at ~49-57 and ~273-276): the resolved_types cache currently keys
on AST TypeRef which is expensive and fragile; change the cache key to a
canonical String instead. Add a helper fn type_key(type_ref: &TypeRef) -> String
in the impl that canonicalizes all TypeRef variants (Named -> fully qualified
dot-joined name, List -> "<inner_key>[]", Optional/Nullable -> "<inner_key>?" or
whatever nullable variant exists, etc.), then update the HashMap definition to
HashMap<String, ResolvedType> and change all insert/lookup calls to call
type_key(&type_ref) before using the map. Ensure every place that compared or
cloned TypeRef now uses the string key to avoid heavy clones and remove any need
for TypeRef to implement Hash/Eq.
be200c8
to
c2f11d4
Compare
Summary by CodeRabbit