diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index f5ef7c9d7..fc256d27c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -55,6 +55,39 @@ rake vcr:record[all]               # Everything
 
 Always check cassettes for leaked API keys before committing.
 
+## Optional Dependencies
+
+### Red Candle Provider
+
+The Red Candle provider enables local LLM execution using quantized GGUF models. It requires a Rust toolchain, so it's optional for contributors.
+
+**To work WITHOUT Red Candle (default):**
+```bash
+bundle install
+bundle exec rspec  # Red Candle tests will be skipped
+```
+
+**To work WITH Red Candle:**
+```bash
+# Enable the Red Candle gem group
+bundle config set --local with red_candle
+bundle install
+
+# Run tests with stubbed Red Candle (fast, default)
+bundle exec rspec
+
+# Run tests with real inference (slow, downloads models)
+RED_CANDLE_REAL_INFERENCE=true bundle exec rspec
+```
+
+**To switch back to working without Red Candle:**
+```bash
+bundle config unset with
+bundle install
+```
+
+The `bundle config` settings are stored in `.bundle/config` (gitignored), so each developer can choose their own setup without affecting others.
+
 ## Important Notes
 
 * **Never edit `models.json`, `aliases.json`, or `available-models.md`** - they're auto-generated by `rake models`
diff --git a/Gemfile b/Gemfile
index e4471200d..c6d0742ae 100644
--- a/Gemfile
+++ b/Gemfile
@@ -41,3 +41,9 @@ group :development do # rubocop:disable Metrics/BlockLength
   # Optional dependency for Vertex AI
   gem 'googleauth'
 end
+
+# Optional group for Red Candle provider (requires Rust toolchain)
+# To include: bundle config set --local with red-candle
+group :red_candle, optional: true do
+  gem 'red-candle', '~> 1.3'
+end
diff --git a/README.md b/README.md
index 11af5e99b..a92c73636 100644
--- a/README.md
+++ b/README.md
@@ -118,7 +118,7 @@ response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "pr
 * **Rails:** ActiveRecord integration with `acts_as_chat`
 * **Async:** Fiber-based concurrency
 * **Model registry:** 500+ models with capability detection and pricing
-* **Providers:** OpenAI, Anthropic, Gemini, VertexAI, Bedrock, DeepSeek, Mistral, Ollama, OpenRouter, Perplexity, GPUStack, and any OpenAI-compatible API
+* **Providers:** OpenAI, Anthropic, Gemini, VertexAI, Bedrock, DeepSeek, Mistral, Ollama, OpenRouter, Perplexity, GPUStack, [RedCandle](https://github.com/scientist-labs/red-candle), and any OpenAI-compatible API
 
 ## Installation
 
diff --git a/docs/_advanced/models.md b/docs/_advanced/models.md
index 26bbe1f1a..8afccc4c9 100644
--- a/docs/_advanced/models.md
+++ b/docs/_advanced/models.md
@@ -95,6 +95,33 @@ RubyLLM.models.refresh!(remote_only: true)
 
 This is useful when you want to refresh only cloud-based models without querying local model servers.
 
+### Dynamic Model Registration (Red Candle)
+
+Some providers register their models dynamically at runtime rather than through the models.json file. Red Candle is one such provider - it registers its GGUF models when the gem is loaded.
+
+**How Red Candle Models Work:**
+
+1. **Not in models.json**: Red Candle models don't appear in the static models.json file since they're only available when the gem is installed.
+
+2. **Dynamic Registration**: When ruby_llm.rb loads and Red Candle is available, it adds models to the in-memory registry:
+   ```ruby
+   # This happens automatically in lib/ruby_llm.rb
+   RubyLLM::Providers::RedCandle.models.each do |model|
+     RubyLLM.models.instance_variable_get(:@models) << model
+   end
+   ```
+
+3. **Excluded from refresh!**: The `refresh!(remote_only: true)` flag excludes Red Candle and other local providers.
+
+4. **Currently Supported Models**:
+   - `google/gemma-3-4b-it-qat-q4_0-gguf`
+   - `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF`
+   - `TheBloke/Mistral-7B-Instruct-v0.2-GGUF`
+   - `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+   - `microsoft/Phi-3-mini-4k-instruct`
+
+Red Candle models are only available when the gem is installed with the red_candle group enabled. See the [Configuration Guide]({% link _getting_started/configuration.md %}) for installation instructions.
+
 **For Gem Development:**
 
 The `rake models:update` task is designed for gem maintainers and updates the `models.json` file shipped with the gem:
diff --git a/docs/_getting_started/configuration.md b/docs/_getting_started/configuration.md
index e34b30ad6..7f0c939a6 100644
--- a/docs/_getting_started/configuration.md
+++ b/docs/_getting_started/configuration.md
@@ -64,6 +64,7 @@ RubyLLM.configure do |config|
   config.ollama_api_base = 'http://localhost:11434/v1'
   config.gpustack_api_base = ENV['GPUSTACK_API_BASE']
   config.gpustack_api_key = ENV['GPUSTACK_API_KEY']
+  # Red Candle (optional - see below)
 
   # AWS Bedrock (uses standard AWS credential chain if not set)
   config.bedrock_api_key = ENV['AWS_ACCESS_KEY_ID']
@@ -90,6 +91,37 @@ end
 
 These headers are optional and only needed for organization-specific billing or project tracking.
 
+### Red Candle (Local GGUF Models)
+
+Red Candle is an optional provider that enables local execution of quantized GGUF models. To use it, add the red-candle gem to your Gemfile:
+
+```ruby
+# Gemfile
+gem 'ruby_llm'
+gem 'red-candle'  # Optional: for local GGUF model execution
+```
+
+Then install:
+
+```bash
+bundle install
+```
+
+Red Candle requires no API keys since it runs models locally. Some models may require HuggingFace authentication:
+
+```bash
+huggingface-cli login  # Required for some gated models
+```
+
+See [Red Candle's HuggingFace guide](https://github.com/scientist-labs/red-candle/blob/main/docs/HUGGINGFACE.md) for details on authentication.
+
+Once configured, you can use it like any other provider:
+
+```ruby
+chat = RubyLLM.chat(model: 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF', provider: :red_candle)
+response = chat.ask("Hello!")
+```
+
 ## Custom Endpoints
 
 ### OpenAI-Compatible APIs
diff --git a/docs/_reference/available-models.md b/docs/_reference/available-models.md
index b76946b46..bf289f282 100644
--- a/docs/_reference/available-models.md
+++ b/docs/_reference/available-models.md
@@ -27,6 +27,7 @@ redirect_from:
 - **OpenRouter**: Direct API
 - **Others**: Local capabilities files
 
+
 ## Last Updated
 {: .d-inline-block }
 
@@ -2493,3 +2494,20 @@ Models that generate embeddings:
 | text-embedding-3-small | openai | - | - | In: $0.02, Out: $0.02 |
 | text-embedding-ada-002 | openai | - | - | In: $0.10, Out: $0.10 |
 
+
+## Local Providers
+
+### Red Candle (5)
+
+Red Candle enables local execution of quantized GGUF models. These models run on your machine with no API costs.
+
+| Model | Provider | Context | Max Output | Standard Pricing (per 1M tokens) |
+| :-- | :-- | --: | --: | :-- |
+| google/gemma-3-4b-it-qat-q4_0-gguf | red_candle | 8192 | 512 | Free (local execution) |
+| TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF | red_candle | 2048 | 512 | Free (local execution) |
+| TheBloke/Mistral-7B-Instruct-v0.2-GGUF | red_candle | 32768 | 512 | Free (local execution) |
+| Qwen/Qwen2.5-1.5B-Instruct-GGUF | red_candle | 32768 | 512 | Free (local execution) |
+| microsoft/Phi-3-mini-4k-instruct | red_candle | 4096 | 512 | Free (local execution) |
+
+> **Note:** Local providers (Ollama, GPUStack, Red Candle) register their models dynamically at runtime based on what's installed locally. Ollama and GPUStack models depend on what you've pulled or configured on your system. Red Candle requires the `red-candle` gem. See the [Configuration Guide]({% link _getting_started/configuration.md %}) for setup instructions.
+{: .note }
diff --git a/docs/index.md b/docs/index.md
index 9f4a2477c..abe328a3e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -67,6 +67,10 @@ permalink: /
     <img src="https://registry.npmmirror.com/@lobehub/icons-static-svg/latest/files/icons/vertexai-color.svg" alt="VertexAI" class="logo-medium">
     <img src="https://registry.npmmirror.com/@lobehub/icons-static-svg/latest/files/icons/vertexai-text.svg" alt="VertexAI" class="logo-small">
   </div>
+  <div class="provider-logo">
+    <img src="https://raw.githubusercontent.com/scientist-labs/red-candle/refs/heads/main/docs/assets/red-candle-icon.svg" alt="red-candle" class="logo-medium">
+    <img src="https://raw.githubusercontent.com/scientist-labs/red-candle/refs/heads/main/docs/assets/red-candle-text.svg" alt="red-candle" class="logo-small">
+  </div>
 </div>
 
 <div class="badge-container">
diff --git a/gemfiles/rails_7.1.gemfile b/gemfiles/rails_7.1.gemfile
index 675cb178e..39d07214e 100644
--- a/gemfiles/rails_7.1.gemfile
+++ b/gemfiles/rails_7.1.gemfile
@@ -35,4 +35,8 @@ group :development do
   gem "googleauth"
 end
 
+group :red_candle, optional: true do
+  gem "red-candle", "~> 1.2"
+end
+
 gemspec path: "../"
diff --git a/gemfiles/rails_7.1.gemfile.lock b/gemfiles/rails_7.1.gemfile.lock
index 6ec64bf96..b6524ab84 100644
--- a/gemfiles/rails_7.1.gemfile.lock
+++ b/gemfiles/rails_7.1.gemfile.lock
@@ -148,6 +148,7 @@ GEM
       concurrent-ruby (~> 1.1)
       webrick (~> 1.7)
       websocket-driver (~> 0.7)
+    ffi (1.17.2-arm64-darwin)
     ffi (1.17.2-x86_64-linux-gnu)
     fiber-annotation (0.2.0)
     fiber-local (1.1.0)
@@ -287,6 +288,9 @@ GEM
       zeitwerk (~> 2.6)
     rainbow (3.1.1)
     rake (13.3.0)
+    rake-compiler-dock (1.9.1)
+    rb_sys (0.9.117)
+      rake-compiler-dock (= 1.9.1)
     rdoc (6.14.2)
       erb
       psych (>= 4.0.0)
@@ -380,6 +384,7 @@ GEM
     zeitwerk (2.7.3)
 
 PLATFORMS
+  arm64-darwin-24
   x86_64-linux
 
 DEPENDENCIES
@@ -401,6 +406,7 @@ DEPENDENCIES
   pry (>= 0.14)
   rails (~> 7.1.0)
   rake (>= 13.0)
+  red-candle (~> 1.2)
   reline
   rspec (~> 3.12)
   rubocop (>= 1.0)
diff --git a/gemfiles/rails_7.2.gemfile b/gemfiles/rails_7.2.gemfile
index 4922afb60..b216fc61a 100644
--- a/gemfiles/rails_7.2.gemfile
+++ b/gemfiles/rails_7.2.gemfile
@@ -35,4 +35,8 @@ group :development do
   gem "googleauth"
 end
 
+group :red_candle, optional: true do
+  gem "red-candle", "~> 1.2"
+end
+
 gemspec path: "../"
diff --git a/gemfiles/rails_7.2.gemfile.lock b/gemfiles/rails_7.2.gemfile.lock
index 5c672e34c..7428f6e8e 100644
--- a/gemfiles/rails_7.2.gemfile.lock
+++ b/gemfiles/rails_7.2.gemfile.lock
@@ -142,6 +142,7 @@ GEM
       concurrent-ruby (~> 1.1)
       webrick (~> 1.7)
       websocket-driver (~> 0.7)
+    ffi (1.17.2-arm64-darwin)
     ffi (1.17.2-x86_64-linux-gnu)
     fiber-annotation (0.2.0)
     fiber-local (1.1.0)
@@ -280,6 +281,9 @@ GEM
       zeitwerk (~> 2.6)
     rainbow (3.1.1)
     rake (13.3.0)
+    rake-compiler-dock (1.9.1)
+    rb_sys (0.9.117)
+      rake-compiler-dock (= 1.9.1)
     rdoc (6.14.2)
       erb
       psych (>= 4.0.0)
@@ -374,6 +378,7 @@ GEM
     zeitwerk (2.7.3)
 
 PLATFORMS
+  arm64-darwin-24
   x86_64-linux
 
 DEPENDENCIES
@@ -395,6 +400,7 @@ DEPENDENCIES
   pry (>= 0.14)
   rails (~> 7.2.0)
   rake (>= 13.0)
+  red-candle (~> 1.2)
   reline
   rspec (~> 3.12)
   rubocop (>= 1.0)
diff --git a/gemfiles/rails_8.0.gemfile b/gemfiles/rails_8.0.gemfile
index f890433bf..abd42e7e3 100644
--- a/gemfiles/rails_8.0.gemfile
+++ b/gemfiles/rails_8.0.gemfile
@@ -35,4 +35,8 @@ group :development do
   gem "googleauth"
 end
 
+group :red_candle, optional: true do
+  gem "red-candle", "~> 1.2"
+end
+
 gemspec path: "../"
diff --git a/gemfiles/rails_8.0.gemfile.lock b/gemfiles/rails_8.0.gemfile.lock
index 86db7d4a0..6a148a2bc 100644
--- a/gemfiles/rails_8.0.gemfile.lock
+++ b/gemfiles/rails_8.0.gemfile.lock
@@ -142,6 +142,7 @@ GEM
       concurrent-ruby (~> 1.1)
       webrick (~> 1.7)
       websocket-driver (~> 0.7)
+    ffi (1.17.2-arm64-darwin)
     ffi (1.17.2-x86_64-linux-gnu)
     fiber-annotation (0.2.0)
     fiber-local (1.1.0)
@@ -281,6 +282,9 @@ GEM
       zeitwerk (~> 2.6)
     rainbow (3.1.1)
     rake (13.3.0)
+    rake-compiler-dock (1.9.1)
+    rb_sys (0.9.117)
+      rake-compiler-dock (= 1.9.1)
     rdoc (6.14.2)
       erb
       psych (>= 4.0.0)
@@ -376,6 +380,7 @@ GEM
     zeitwerk (2.7.3)
 
 PLATFORMS
+  arm64-darwin-24
   x86_64-linux
 
 DEPENDENCIES
@@ -397,6 +402,7 @@ DEPENDENCIES
   pry (>= 0.14)
   rails (~> 8.0.0)
   rake (>= 13.0)
+  red-candle (~> 1.2)
   reline
   rspec (~> 3.12)
   rubocop (>= 1.0)
diff --git a/lib/ruby_llm.rb b/lib/ruby_llm.rb
index 7bb5f2808..54e6d63e6 100644
--- a/lib/ruby_llm.rb
+++ b/lib/ruby_llm.rb
@@ -94,6 +94,20 @@ def logger
 RubyLLM::Provider.register :perplexity, RubyLLM::Providers::Perplexity
 RubyLLM::Provider.register :vertexai, RubyLLM::Providers::VertexAI
 
+# Optional Red Candle provider - only available if gem is installed
+begin
+  require 'candle'
+  require 'ruby_llm/providers/red_candle'
+  RubyLLM::Provider.register :red_candle, RubyLLM::Providers::RedCandle
+
+  # Register Red Candle models with the global registry
+  RubyLLM::Providers::RedCandle.models.each do |model|
+    RubyLLM.models.instance_variable_get(:@models) << model
+  end
+rescue LoadError
+  # Red Candle is optional - provider won't be available if gem isn't installed
+end
+
 if defined?(Rails::Railtie)
   require 'ruby_llm/railtie'
   require 'ruby_llm/active_record/acts_as'
diff --git a/lib/ruby_llm/configuration.rb b/lib/ruby_llm/configuration.rb
index eda2c3354..ae0f92a75 100644
--- a/lib/ruby_llm/configuration.rb
+++ b/lib/ruby_llm/configuration.rb
@@ -23,6 +23,8 @@ class Configuration
                   :gpustack_api_base,
                   :gpustack_api_key,
                   :mistral_api_key,
+                  # Red Candle configuration
+                  :red_candle_device,
                   # Default models
                   :default_model,
                   :default_embedding_model,
diff --git a/lib/ruby_llm/providers/red_candle.rb b/lib/ruby_llm/providers/red_candle.rb
new file mode 100644
index 000000000..05a78fc89
--- /dev/null
+++ b/lib/ruby_llm/providers/red_candle.rb
@@ -0,0 +1,90 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  module Providers
+    # Red Candle provider for local LLM execution using the Candle Rust crate.
+    class RedCandle < Provider
+      include RedCandle::Chat
+      include RedCandle::Models
+      include RedCandle::Capabilities
+      include RedCandle::Streaming
+
+      def initialize(config)
+        ensure_red_candle_available!
+        super
+        @loaded_models = {} # Cache for loaded models
+        @device = determine_device(config)
+      end
+
+      def api_base
+        nil # Local execution, no API base needed
+      end
+
+      def headers
+        {} # No HTTP headers needed
+      end
+
+      class << self
+        def capabilities
+          RedCandle::Capabilities
+        end
+
+        def configuration_requirements
+          [] # No required config, device is optional
+        end
+
+        def local?
+          true
+        end
+
+        def supports_functions?(model_id = nil)
+          RedCandle::Capabilities.supports_functions?(model_id)
+        end
+
+        def models
+          # Return Red Candle models for registration
+          RedCandle::Models::SUPPORTED_MODELS.map do |model_data|
+            Model::Info.new(
+              id: model_data[:id],
+              name: model_data[:name],
+              provider: 'red_candle',
+              type: 'chat',
+              family: model_data[:family],
+              context_window: model_data[:context_window],
+              capabilities: %w[streaming structured_output],
+              modalities: { input: %w[text], output: %w[text] }
+            )
+          end
+        end
+      end
+
+      private
+
+      def ensure_red_candle_available!
+        require 'candle'
+      rescue LoadError
+        raise Error.new(nil, "Red Candle gem is not installed. Add 'gem \"red-candle\", \"~> 1.2.3\"' to your Gemfile.")
+      end
+
+      def determine_device(config)
+        if config.red_candle_device
+          case config.red_candle_device.to_s.downcase
+          when 'cpu'
+            ::Candle::Device.cpu
+          when 'cuda', 'gpu'
+            ::Candle::Device.cuda
+          when 'metal'
+            ::Candle::Device.metal
+          else
+            ::Candle::Device.best
+          end
+        else
+          ::Candle::Device.best
+        end
+      rescue StandardError => e
+        RubyLLM.logger.warn "Failed to initialize device: #{e.message}. Falling back to CPU."
+        ::Candle::Device.cpu
+      end
+    end
+  end
+end
diff --git a/lib/ruby_llm/providers/red_candle/capabilities.rb b/lib/ruby_llm/providers/red_candle/capabilities.rb
new file mode 100644
index 000000000..ec0afb6b7
--- /dev/null
+++ b/lib/ruby_llm/providers/red_candle/capabilities.rb
@@ -0,0 +1,114 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  module Providers
+    class RedCandle
+      # Determines capabilities for RedCandle models
+      module Capabilities
+        module_function
+
+        def supports_vision?
+          false
+        end
+
+        def supports_functions?(_model_id = nil)
+          false
+        end
+
+        def supports_streaming?
+          true
+        end
+
+        def supports_structured_output?
+          true
+        end
+
+        def supports_regex_constraints?
+          true
+        end
+
+        def supports_embeddings?
+          false # Future enhancement - Red Candle does support embedding models
+        end
+
+        def supports_audio?
+          false
+        end
+
+        def supports_pdf?
+          false
+        end
+
+        def normalize_temperature(temperature, _model_id)
+          # Red Candle uses standard 0-2 range
+          return 0.7 if temperature.nil?
+
+          temperature = temperature.to_f
+          temperature.clamp(0.0, 2.0)
+        end
+
+        def model_context_window(model_id)
+          case model_id
+          when /gemma-3-4b/i
+            8192
+          when /qwen2\.5-1\.5b/i, /mistral-7b/i
+            32_768
+          when /tinyllama/i
+            2048
+          else
+            4096 # Conservative default
+          end
+        end
+
+        def default_max_tokens
+          512
+        end
+
+        def max_temperature
+          2.0
+        end
+
+        def min_temperature
+          0.0
+        end
+
+        def supports_temperature?
+          true
+        end
+
+        def supports_top_p?
+          true
+        end
+
+        def supports_top_k?
+          true
+        end
+
+        def supports_repetition_penalty?
+          true
+        end
+
+        def supports_seed?
+          true
+        end
+
+        def supports_stop_sequences?
+          true
+        end
+
+        def model_families
+          %w[gemma llama qwen2 mistral phi]
+        end
+
+        def available_on_platform?
+          # Check if Candle can be loaded
+
+          require 'candle'
+          true
+        rescue LoadError
+          false
+        end
+      end
+    end
+  end
+end
diff --git a/lib/ruby_llm/providers/red_candle/chat.rb b/lib/ruby_llm/providers/red_candle/chat.rb
new file mode 100644
index 000000000..915c2075b
--- /dev/null
+++ b/lib/ruby_llm/providers/red_candle/chat.rb
@@ -0,0 +1,315 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  module Providers
+    class RedCandle
+      # Chat implementation for Red Candle provider
+      module Chat
+        # Override the base complete method to handle local execution
+        def complete(messages, tools:, temperature:, model:, params: {}, headers: {}, schema: nil, &) # rubocop:disable Metrics/ParameterLists
+          _ = headers # Interface compatibility
+          payload = Utils.deep_merge(
+            render_payload(
+              messages,
+              tools: tools,
+              temperature: temperature,
+              model: model,
+              stream: block_given?,
+              schema: schema
+            ),
+            params
+          )
+
+          if block_given?
+            perform_streaming_completion!(payload, &)
+          else
+            result = perform_completion!(payload)
+            # Convert to Message object for compatibility
+            # Red Candle doesn't provide token counts by default, but we can estimate them
+            content = result[:content]
+            # Rough estimation: ~4 characters per token
+            estimated_output_tokens = (content.length / 4.0).round
+            estimated_input_tokens = estimate_input_tokens(payload[:messages])
+
+            Message.new(
+              role: result[:role].to_sym,
+              content: content,
+              model_id: model.id,
+              input_tokens: estimated_input_tokens,
+              output_tokens: estimated_output_tokens
+            )
+          end
+        end
+
+        def render_payload(messages, tools:, temperature:, model:, stream:, schema:) # rubocop:disable Metrics/ParameterLists
+          # Red Candle doesn't support tools
+          raise Error.new(nil, 'Red Candle provider does not support tool calling') if tools && !tools.empty?
+
+          {
+            messages: messages,
+            temperature: temperature,
+            model: model.id,
+            stream: stream,
+            schema: schema
+          }
+        end
+
+        def perform_completion!(payload)
+          model = ensure_model_loaded!(payload[:model])
+          messages = format_messages(payload[:messages])
+
+          # Apply chat template if available
+          prompt = if model.respond_to?(:apply_chat_template)
+                     model.apply_chat_template(messages)
+                   else
+                     # Fallback to simple formatting
+                     "#{messages.map { |m| "#{m[:role]}: #{m[:content]}" }.join("\n\n")}\n\nassistant:"
+                   end
+
+          # Check context length
+          validate_context_length!(prompt, payload[:model])
+
+          # Configure generation
+          config_opts = {
+            temperature: payload[:temperature] || 0.7,
+            max_length: payload[:max_tokens] || 512
+          }
+
+          # Handle structured generation if schema provided
+          response = if payload[:schema]
+                       generate_with_schema(model, prompt, payload[:schema], config_opts)
+                     else
+                       model.generate(
+                         prompt,
+                         config: ::Candle::GenerationConfig.balanced(**config_opts)
+                       )
+                     end
+
+          format_response(response, payload[:schema])
+        end
+
+        def perform_streaming_completion!(payload, &block)
+          model = ensure_model_loaded!(payload[:model])
+          messages = format_messages(payload[:messages])
+
+          # Apply chat template if available
+          prompt = if model.respond_to?(:apply_chat_template)
+                     model.apply_chat_template(messages)
+                   else
+                     "#{messages.map { |m| "#{m[:role]}: #{m[:content]}" }.join("\n\n")}\n\nassistant:"
+                   end
+
+          # Check context length
+          validate_context_length!(prompt, payload[:model])
+
+          # Configure generation
+          config = ::Candle::GenerationConfig.balanced(
+            temperature: payload[:temperature] || 0.7,
+            max_length: payload[:max_tokens] || 512
+          )
+
+          # Collect all streamed content
+          full_content = ''
+
+          # Stream tokens
+          model.generate_stream(prompt, config: config) do |token|
+            full_content += token
+            chunk = format_stream_chunk(token)
+            block.call(chunk)
+          end
+
+          # Send final chunk with empty content (indicates completion)
+          final_chunk = format_stream_chunk('')
+          block.call(final_chunk)
+
+          # Return a Message object with the complete response
+          estimated_output_tokens = (full_content.length / 4.0).round
+          estimated_input_tokens = estimate_input_tokens(payload[:messages])
+
+          Message.new(
+            role: :assistant,
+            content: full_content,
+            model_id: payload[:model],
+            input_tokens: estimated_input_tokens,
+            output_tokens: estimated_output_tokens
+          )
+        end
+
+        private
+
+        def ensure_model_loaded!(model_id)
+          @loaded_models[model_id] ||= load_model(model_id)
+        end
+
+        def model_options(model_id)
+          # Get GGUF file and tokenizer if this is a GGUF model
+          # Access the methods from the Models module which is included in the provider
+          options = { device: @device }
+          options[:gguf_file] = gguf_file_for(model_id) if respond_to?(:gguf_file_for)
+          options[:tokenizer] = tokenizer_for(model_id) if respond_to?(:tokenizer_for)
+          options
+        end
+
+        def load_model(model_id)
+          ::Candle::LLM.from_pretrained(model_id, **model_options(model_id))
+        rescue StandardError => e
+          if e.message.include?('Failed to find tokenizer')
+            raise Error.new(nil, token_error_message(e, options[:tokenizer]))
+          elsif e.message.include?('Failed to find model')
+            raise Error.new(nil, model_error_message(e, model_id))
+          else
+            raise Error.new(nil, "Failed to load model #{model_id}: #{e.message}")
+          end
+        end
+
+        def token_error_message(exception, tokenizer)
+          <<~ERROR_MESSAGE
+            Failed to load tokenizer '#{tokenizer}'. The tokenizer may not exist or require authentication.
+            Please verify the tokenizer exists at: https://huggingface.co/#{tokenizer}
+            And that you have accepted the terms of service for the tokenizer.
+            If it requires authentication, login with: huggingface-cli login
+            See https://github.com/scientist-labs/red-candle?tab=readme-ov-file#%EF%B8%8F-huggingface-login-warning
+            Original error: #{exception.message}"
+          ERROR_MESSAGE
+        end
+
+        def model_error_message(exception, model_id)
+          <<~ERROR_MESSAGE
+            Failed to load model #{model_id}: #{exception.message}
+            Please verify the model exists at: https://huggingface.co/#{model_id}
+            And that you have accepted the terms of service for the model.
+            If it requires authentication, login with: huggingface-cli login
+            See https://github.com/scientist-labs/red-candle?tab=readme-ov-file#%EF%B8%8F-huggingface-login-warning
+            Original error: #{e.message}"
+          ERROR_MESSAGE
+        end
+
+        def format_messages(messages)
+          messages.map do |msg|
+            # Handle both hash and Message objects
+            if msg.is_a?(Message)
+              {
+                role: msg.role.to_s,
+                content: extract_message_content_from_object(msg)
+              }
+            else
+              {
+                role: msg[:role].to_s,
+                content: extract_message_content(msg)
+              }
+            end
+          end
+        end
+
+        def extract_message_content_from_object(message)
+          content = message.content
+
+          # Handle Content objects
+          if content.is_a?(Content)
+            # Extract text from Content object, including attachment text
+            handle_content_object(content)
+          elsif content.is_a?(String)
+            content
+          else
+            content.to_s
+          end
+        end
+
+        def extract_message_content(message)
+          content = message[:content]
+
+          # Handle Content objects
+          case content
+          when Content
+            # Extract text from Content object
+            handle_content_object(content)
+          when String
+            content
+          when Array
+            # Handle array content (e.g., with images)
+            content.filter_map { |part| part[:text] if part[:type] == 'text' }.join(' ')
+          else
+            content.to_s
+          end
+        end
+
+        def handle_content_object(content)
+          text_parts = []
+          text_parts << content.text if content.text
+
+          # Add any text from attachments
+          content.attachments&.each do |attachment|
+            text_parts << attachment.data if attachment.respond_to?(:data) && attachment.data.is_a?(String)
+          end
+
+          text_parts.join(' ')
+        end
+
+        def generate_with_schema(model, prompt, schema, config_opts)
+          model.generate_structured(
+            prompt,
+            schema: schema,
+            **config_opts
+          )
+        rescue StandardError => e
+          RubyLLM.logger.warn "Structured generation failed: #{e.message}. Falling back to regular generation."
+          model.generate(
+            prompt,
+            config: ::Candle::GenerationConfig.balanced(**config_opts)
+          )
+        end
+
+        def format_response(response, schema)
+          content = if schema && !response.is_a?(String)
+                      # Structured response
+                      JSON.generate(response)
+                    else
+                      response
+                    end
+
+          {
+            content: content,
+            role: 'assistant'
+          }
+        end
+
+        def format_stream_chunk(token)
+          # Return a Chunk object for streaming compatibility
+          Chunk.new(
+            role: :assistant,
+            content: token
+          )
+        end
+
+        def estimate_input_tokens(messages)
+          # Rough estimation: ~4 characters per token
+          formatted = format_messages(messages)
+          total_chars = formatted.sum { |msg| "#{msg[:role]}: #{msg[:content]}".length }
+          (total_chars / 4.0).round
+        end
+
+        def validate_context_length!(prompt, model_id)
+          # Get the context window for this model
+          context_window = if respond_to?(:model_context_window)
+                             model_context_window(model_id)
+                           else
+                             4096 # Conservative default
+                           end
+
+          # Estimate tokens in prompt (~4 characters per token)
+          estimated_tokens = (prompt.length / 4.0).round
+
+          # Check if prompt exceeds context window (leave some room for response)
+          max_input_tokens = context_window - 512 # Reserve 512 tokens for response
+          return unless estimated_tokens > max_input_tokens
+
+          raise Error.new(
+            nil,
+            "Context length exceeded. Estimated #{estimated_tokens} tokens, " \
+            "but model #{model_id} has a context window of #{context_window} tokens."
+          )
+        end
+      end
+    end
+  end
+end
diff --git a/lib/ruby_llm/providers/red_candle/models.rb b/lib/ruby_llm/providers/red_candle/models.rb
new file mode 100644
index 000000000..fbfc8a038
--- /dev/null
+++ b/lib/ruby_llm/providers/red_candle/models.rb
@@ -0,0 +1,121 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  module Providers
+    class RedCandle
+      # Models methods of the RedCandle integration
+      module Models
+        # TODO: red-candle supports more models, but let's start with some well tested ones.
+        SUPPORTED_MODELS = [
+          {
+            id: 'google/gemma-3-4b-it-qat-q4_0-gguf',
+            name: 'Gemma 3 4B Instruct (Quantized)',
+            gguf_file: 'gemma-3-4b-it-q4_0.gguf',
+            tokenizer: 'google/gemma-3-4b-it', # Tokenizer from base model
+            context_window: 8192,
+            family: 'gemma',
+            architecture: 'gemma2',
+            supports_chat: true,
+            supports_structured: true
+          },
+          {
+            id: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+            name: 'TinyLlama 1.1B Chat (Quantized)',
+            gguf_file: 'tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
+            context_window: 2048,
+            family: 'llama',
+            architecture: 'llama',
+            supports_chat: true,
+            supports_structured: true
+          },
+          {
+            id: 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF',
+            name: 'Mistral 7B Instruct v0.2 (Quantized)',
+            gguf_file: 'mistral-7b-instruct-v0.2.Q4_K_M.gguf',
+            tokenizer: 'mistralai/Mistral-7B-Instruct-v0.2',
+            context_window: 32_768,
+            family: 'mistral',
+            architecture: 'mistral',
+            supports_chat: true,
+            supports_structured: true
+          },
+          {
+            id: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF',
+            name: 'Qwen 2.1.5B Instruct (Quantized)',
+            gguf_file: 'qwen2.5-1.5b-instruct-q4_k_m.gguf',
+            context_window: 32_768,
+            family: 'qwen2',
+            architecture: 'qwen2',
+            supports_chat: true,
+            supports_structured: true
+          },
+          {
+            id: 'microsoft/Phi-3-mini-4k-instruct',
+            name: 'Phi 3',
+            context_window: 4096,
+            family: 'phi',
+            architecture: 'phi',
+            supports_chat: true,
+            supports_structured: true
+          }
+        ].freeze
+
+        def list_models
+          SUPPORTED_MODELS.map do |model_data|
+            Model::Info.new(
+              id: model_data[:id],
+              name: model_data[:name],
+              provider: slug,
+              family: model_data[:family],
+              context_window: model_data[:context_window],
+              capabilities: %w[streaming structured_output],
+              modalities: { input: %w[text], output: %w[text] }
+            )
+          end
+        end
+
+        def models
+          @models ||= list_models
+        end
+
+        def model(id)
+          models.find { |m| m.id == id } ||
+            raise(Error.new(nil,
+                            "Model #{id} not found in Red Candle provider. Available models: #{model_ids.join(', ')}"))
+        end
+
+        def model_available?(id)
+          SUPPORTED_MODELS.any? { |m| m[:id] == id }
+        end
+
+        def model_ids
+          SUPPORTED_MODELS.map { |m| m[:id] }
+        end
+
+        def model_info(id)
+          SUPPORTED_MODELS.find { |m| m[:id] == id }
+        end
+
+        def supports_chat?(model_id)
+          info = model_info(model_id)
+          info ? info[:supports_chat] : false
+        end
+
+        def supports_structured?(model_id)
+          info = model_info(model_id)
+          info ? info[:supports_structured] : false
+        end
+
+        def gguf_file_for(model_id)
+          info = model_info(model_id)
+          info ? info[:gguf_file] : nil
+        end
+
+        def tokenizer_for(model_id)
+          info = model_info(model_id)
+          info ? info[:tokenizer] : nil
+        end
+      end
+    end
+  end
+end
diff --git a/lib/ruby_llm/providers/red_candle/streaming.rb b/lib/ruby_llm/providers/red_candle/streaming.rb
new file mode 100644
index 000000000..a8305ffdd
--- /dev/null
+++ b/lib/ruby_llm/providers/red_candle/streaming.rb
@@ -0,0 +1,40 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  module Providers
+    class RedCandle
+      # Streaming methods of the RedCandle integration
+      module Streaming
+        def stream(payload, &block)
+          if payload[:stream]
+            perform_streaming_completion!(payload, &block)
+          else
+            # Non-streaming fallback
+            result = perform_completion!(payload)
+            # Yield the complete result as a single chunk
+            chunk = {
+              content: result[:content],
+              role: result[:role],
+              finish_reason: result[:finish_reason]
+            }
+            block.call(chunk)
+          end
+        end
+
+        private
+
+        def stream_processor
+          # Red Candle handles streaming internally through blocks
+          # This method is here for compatibility with the base streaming interface
+          nil
+        end
+
+        def process_stream_response(response)
+          # Red Candle doesn't use HTTP responses
+          # Streaming is handled directly in perform_streaming_completion!
+          response
+        end
+      end
+    end
+  end
+end
diff --git a/spec/ruby_llm/chat_error_spec.rb b/spec/ruby_llm/chat_error_spec.rb
index a5dfd8a74..eeefbf64d 100644
--- a/spec/ruby_llm/chat_error_spec.rb
+++ b/spec/ruby_llm/chat_error_spec.rb
@@ -72,7 +72,8 @@
         let(:chat) { RubyLLM.chat(model: model, provider: provider) }
 
         it 'handles context length exceeded errors' do
-          if RubyLLM::Provider.providers[provider]&.local?
+          # Skip for local providers that don't validate context length
+          if RubyLLM::Provider.providers[provider]&.local? && provider != :red_candle
             skip('Local providers do not throw an error for context length exceeded')
           end
 
diff --git a/spec/ruby_llm/chat_spec.rb b/spec/ruby_llm/chat_spec.rb
index 1c775d11e..a63de4e55 100644
--- a/spec/ruby_llm/chat_spec.rb
+++ b/spec/ruby_llm/chat_spec.rb
@@ -20,6 +20,9 @@
       end
 
       it "#{provider}/#{model} returns raw responses" do
+        # Red Candle is a truly local provider and doesn't have HTTP responses
+        skip 'Red Candle provider does not have raw HTTP responses' if provider == :red_candle
+
         chat = RubyLLM.chat(model: model, provider: provider)
         response = chat.ask('What is the capital of France?')
         expect(response.raw).to be_present
diff --git a/spec/ruby_llm/chat_streaming_spec.rb b/spec/ruby_llm/chat_streaming_spec.rb
index fc6ee8d9a..5c61d9d5b 100644
--- a/spec/ruby_llm/chat_streaming_spec.rb
+++ b/spec/ruby_llm/chat_streaming_spec.rb
@@ -20,11 +20,15 @@
 
         expect(chunks).not_to be_empty
         expect(chunks.first).to be_a(RubyLLM::Chunk)
-        expect(response.raw).to be_present
-        expect(response.raw.headers).to be_present
-        expect(response.raw.status).to be_present
-        expect(response.raw.status).to eq(200)
-        expect(response.raw.env.request_body).to be_present
+
+        # Red Candle is a local provider without HTTP responses
+        unless provider == :red_candle
+          expect(response.raw).to be_present
+          expect(response.raw.headers).to be_present
+          expect(response.raw.status).to be_present
+          expect(response.raw.status).to eq(200)
+          expect(response.raw.env.request_body).to be_present
+        end
       end
 
       it "#{provider}/#{model} reports consistent token counts compared to non-streaming" do
@@ -60,6 +64,7 @@
           end
 
           it "#{provider}/#{model} supports handling streaming error chunks" do
+            skip 'Red Candle is a local provider without HTTP streaming errors' if provider == :red_candle
             # Testing if error handling is now implemented
 
             stub_error_response(provider, :chunk)
@@ -75,6 +80,7 @@
 
           it "#{provider}/#{model} supports handling streaming error events" do
             skip 'Bedrock uses AWS Event Stream format, not SSE events' if provider == :bedrock
+            skip 'Red Candle is a local provider without HTTP streaming errors' if provider == :red_candle
 
             # Testing if error handling is now implemented
 
@@ -96,6 +102,7 @@
           end
 
           it "#{provider}/#{model} supports handling streaming error chunks" do
+            skip 'Red Candle is a local provider without HTTP streaming errors' if provider == :red_candle
             # Testing if error handling is now implemented
 
             stub_error_response(provider, :chunk)
@@ -111,6 +118,7 @@
 
           it "#{provider}/#{model} supports handling streaming error events" do
             skip 'Bedrock uses AWS Event Stream format, not SSE events' if provider == :bedrock
+            skip 'Red Candle is a local provider without HTTP streaming errors' if provider == :red_candle
 
             # Testing if error handling is now implemented
 
diff --git a/spec/ruby_llm/chat_tools_spec.rb b/spec/ruby_llm/chat_tools_spec.rb
index 4f574a862..3679c0500 100644
--- a/spec/ruby_llm/chat_tools_spec.rb
+++ b/spec/ruby_llm/chat_tools_spec.rb
@@ -74,9 +74,9 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can use tools" do
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
         skip 'Flaky test for deepseek - model asks for clarification instead of exec tools' if provider == :deepseek
@@ -96,9 +96,9 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can use tools in multi-turn conversations" do
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
         skip 'Flaky test for deepseek' if provider == :deepseek
@@ -122,9 +122,9 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can use tools without parameters" do
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
         chat = RubyLLM.chat(model: model, provider: provider)
@@ -140,16 +140,15 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can use tools without parameters in multi-turn streaming conversations" do
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
+        end
+
         if provider == :gpustack && model == 'qwen3'
           skip 'gpustack/qwen3 does not support streaming tool calls properly'
         end
-
         skip 'Mistral has a bug with tool arguments in multi-turn streaming' if provider == :mistral
-
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
-        end
         chat = RubyLLM.chat(model: model, provider: provider)
                       .with_tool(BestLanguageToLearn)
                       .with_instructions('You must use tools whenever possible.')
@@ -179,13 +178,13 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can use tools with multi-turn streaming conversations" do
-        if provider == :gpustack && model == 'qwen3'
-          skip 'gpustack/qwen3 does not support streaming tool calls properly'
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        if provider == :gpustack && model == 'qwen3'
+          skip 'gpustack/qwen3 does not support streaming tool calls properly'
         end
         chat = RubyLLM.chat(model: model, provider: provider)
                       .with_tool(Weather)
@@ -217,9 +216,9 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} can handle multiple tool calls in a single response" do
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
         skip 'Flaky test for gpustack/qwen3' if provider == :gpustack && model == 'qwen3'
@@ -309,9 +308,9 @@ def execute(query:)
       model = model_info[:model]
       provider = model_info[:provider]
       it "#{provider}/#{model} preserves Content objects returned from tools" do
-        unless RubyLLM::Provider.providers[provider]&.local?
-          model_info = RubyLLM.models.find(model)
-          skip "#{model} doesn't support function calling" unless model_info&.supports_functions?
+        # Skip for providers that don't support function calling
+        unless provider_supports_functions?(provider, model)
+          skip "#{provider}/#{model} doesn't support function calling"
         end
 
         # Skip providers that don't support images in tool results
diff --git a/spec/ruby_llm/providers/red_candle/capabilities_spec.rb b/spec/ruby_llm/providers/red_candle/capabilities_spec.rb
new file mode 100644
index 000000000..03bb49f25
--- /dev/null
+++ b/spec/ruby_llm/providers/red_candle/capabilities_spec.rb
@@ -0,0 +1,108 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe RubyLLM::Providers::RedCandle::Capabilities do
+  describe 'feature support' do
+    it 'does not support vision' do
+      expect(described_class.supports_vision?).to be false
+    end
+
+    it 'does not support functions' do
+      expect(described_class.supports_functions?).to be false
+    end
+
+    it 'supports streaming' do
+      expect(described_class.supports_streaming?).to be true
+    end
+
+    it 'supports structured output' do
+      expect(described_class.supports_structured_output?).to be true
+    end
+
+    it 'supports regex constraints' do
+      expect(described_class.supports_regex_constraints?).to be true
+    end
+
+    it 'does not support embeddings yet' do
+      expect(described_class.supports_embeddings?).to be false
+    end
+
+    it 'does not support audio' do
+      expect(described_class.supports_audio?).to be false
+    end
+
+    it 'does not support PDF' do
+      expect(described_class.supports_pdf?).to be false
+    end
+  end
+
+  describe '#normalize_temperature' do
+    it 'returns default temperature when nil' do
+      expect(described_class.normalize_temperature(nil, 'any_model')).to eq(0.7)
+    end
+
+    it 'clamps temperature to valid range' do
+      expect(described_class.normalize_temperature(-1, 'any_model')).to eq(0.0)
+      expect(described_class.normalize_temperature(3, 'any_model')).to eq(2.0)
+      expect(described_class.normalize_temperature(1.5, 'any_model')).to eq(1.5)
+    end
+  end
+
+  describe '#model_context_window' do
+    it 'returns correct context window for known models' do
+      expect(described_class.model_context_window('google/gemma-3-4b-it-qat-q4_0-gguf')).to eq(8192)
+      expect(described_class.model_context_window('TheBloke/Mistral-7B-Instruct-v0.2-GGUF')).to eq(32_768)
+      expect(described_class.model_context_window('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')).to eq(2048)
+    end
+
+    it 'returns default for unknown models' do
+      expect(described_class.model_context_window('unknown/model')).to eq(4096)
+    end
+  end
+
+  describe 'generation parameters' do
+    it 'provides correct defaults and limits' do
+      expect(described_class.default_max_tokens).to eq(512)
+      expect(described_class.max_temperature).to eq(2.0)
+      expect(described_class.min_temperature).to eq(0.0)
+    end
+
+    it 'supports various generation parameters' do
+      expect(described_class.supports_temperature?).to be true
+      expect(described_class.supports_top_p?).to be true
+      expect(described_class.supports_top_k?).to be true
+      expect(described_class.supports_repetition_penalty?).to be true
+      expect(described_class.supports_seed?).to be true
+      expect(described_class.supports_stop_sequences?).to be true
+    end
+  end
+
+  describe '#model_families' do
+    it 'returns supported model families' do
+      expect(described_class.model_families).to eq(%w[gemma llama qwen2 mistral phi])
+    end
+  end
+
+  describe '#available_on_platform?' do
+    context 'when Candle is available' do
+      before do
+        allow(described_class).to receive(:require).with('candle').and_return(true)
+      end
+
+      it 'returns true' do
+        expect(described_class.available_on_platform?).to be true
+      end
+    end
+
+    context 'when Candle is not available' do
+      before do
+        allow(described_class).to receive(:require).with('candle').and_raise(LoadError)
+      end
+
+      it 'returns false' do
+        expect(described_class.available_on_platform?).to be false
+      end
+    end
+  end
+end
diff --git a/spec/ruby_llm/providers/red_candle/chat_spec.rb b/spec/ruby_llm/providers/red_candle/chat_spec.rb
new file mode 100644
index 000000000..3988791da
--- /dev/null
+++ b/spec/ruby_llm/providers/red_candle/chat_spec.rb
@@ -0,0 +1,204 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe RubyLLM::Providers::RedCandle::Chat do
+  let(:config) { RubyLLM::Configuration.new }
+  let(:provider) { RubyLLM::Providers::RedCandle.new(config) }
+  let(:model) { provider.model('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF') }
+
+  before(:all) do # rubocop:disable RSpec/BeforeAfterAll
+    require 'candle'
+  rescue LoadError
+    skip 'Red Candle gem is not installed'
+  end
+
+  describe '#render_payload' do
+    let(:messages) { [{ role: 'user', content: 'Hello' }] }
+
+    it 'creates a valid payload' do
+      payload = provider.render_payload(
+        messages,
+        tools: nil,
+        temperature: 0.7,
+        model: model,
+        stream: false,
+        schema: nil
+      )
+
+      expect(payload).to include(
+        messages: messages,
+        temperature: 0.7,
+        model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+        stream: false,
+        schema: nil
+      )
+    end
+
+    it 'raises error when tools are provided' do
+      tools = [{ name: 'calculator', description: 'Does math' }]
+
+      expect do
+        provider.render_payload(
+          messages,
+          tools: tools,
+          temperature: 0.7,
+          model: model,
+          stream: false,
+          schema: nil
+        )
+      end.to raise_error(RubyLLM::Error, /does not support tool calling/)
+    end
+
+    it 'includes schema when provided' do
+      schema = { type: 'object', properties: { name: { type: 'string' } } }
+
+      payload = provider.render_payload(
+        messages,
+        tools: nil,
+        temperature: 0.7,
+        model: model,
+        stream: false,
+        schema: schema
+      )
+
+      expect(payload[:schema]).to eq(schema)
+    end
+  end
+
+  describe '#perform_completion!' do
+    let(:messages) { [{ role: 'user', content: 'Test message' }] }
+    let(:mock_model) { instance_double(Candle::LLM) }
+
+    before do
+      allow(provider).to receive(:ensure_model_loaded!).and_return(mock_model)
+      allow(mock_model).to receive(:respond_to?).with(:apply_chat_template).and_return(true)
+      allow(mock_model).to receive(:apply_chat_template).and_return('formatted prompt')
+    end
+
+    context 'with regular generation' do
+      it 'generates a response' do
+        allow(mock_model).to receive(:generate).and_return('Generated response')
+
+        payload = {
+          messages: messages,
+          model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+          temperature: 0.7
+        }
+
+        result = provider.perform_completion!(payload)
+
+        expect(result).to include(
+          content: 'Generated response',
+          role: 'assistant'
+        )
+      end
+    end
+
+    context 'with structured generation' do
+      it 'generates structured output' do
+        schema = { type: 'object', properties: { name: { type: 'string' } } }
+        structured_response = { 'name' => 'Alice' }
+
+        allow(mock_model).to receive(:generate_structured).and_return(structured_response)
+
+        payload = {
+          messages: messages,
+          model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+          temperature: 0.7,
+          schema: schema
+        }
+
+        result = provider.perform_completion!(payload)
+
+        expect(result[:content]).to eq(JSON.generate(structured_response))
+        expect(result[:role]).to eq('assistant')
+      end
+
+      it 'falls back to regular generation on structured failure' do
+        schema = { type: 'object', properties: { name: { type: 'string' } } }
+
+        allow(mock_model).to receive(:generate_structured).and_raise(StandardError, 'Structured gen failed')
+        allow(mock_model).to receive(:generate).and_return('Fallback response')
+        allow(RubyLLM.logger).to receive(:warn)
+
+        payload = {
+          messages: messages,
+          model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+          temperature: 0.7,
+          schema: schema
+        }
+
+        result = provider.perform_completion!(payload)
+
+        expect(result[:content]).to eq('Fallback response')
+        expect(RubyLLM.logger).to have_received(:warn).with(/Structured generation failed/)
+      end
+    end
+  end
+
+  describe '#perform_streaming_completion!' do
+    let(:messages) { [{ role: 'user', content: 'Stream test' }] }
+    let(:mock_model) { instance_double(Candle::LLM) }
+
+    before do
+      allow(provider).to receive(:ensure_model_loaded!).and_return(mock_model)
+      allow(mock_model).to receive(:respond_to?).with(:apply_chat_template).and_return(true)
+      allow(mock_model).to receive(:apply_chat_template).and_return('formatted prompt')
+    end
+
+    it 'streams tokens and sends finish reason' do
+      tokens = %w[Hello world !]
+      chunks_received = []
+
+      allow(mock_model).to receive(:generate_stream) do |_prompt, config:, &block| # rubocop:disable Lint/UnusedBlockArgument
+        tokens.each { |token| block.call(token) }
+      end
+
+      payload = {
+        messages: messages,
+        model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
+        temperature: 0.7
+      }
+
+      provider.perform_streaming_completion!(payload) do |chunk|
+        chunks_received << chunk
+      end
+
+      # Check token chunks
+      tokens.each_with_index do |token, i|
+        chunk = chunks_received[i]
+        expect(chunk).to be_a(RubyLLM::Chunk)
+        expect(chunk.content).to eq(token)
+      end
+
+      # Check final chunk (empty content indicates completion)
+      final_chunk = chunks_received.last
+      expect(final_chunk).to be_a(RubyLLM::Chunk)
+      expect(final_chunk.content).to eq('')
+    end
+  end
+
+  describe 'message formatting' do
+    it 'handles string content' do
+      messages = [{ role: 'user', content: 'Simple text' }]
+      formatted = provider.send(:format_messages, messages)
+
+      expect(formatted).to eq([{ role: 'user', content: 'Simple text' }])
+    end
+
+    it 'handles array content with text parts' do
+      messages = [{
+        role: 'user',
+        content: [
+          { type: 'text', text: 'Part 1' },
+          { type: 'text', text: 'Part 2' },
+          { type: 'image', url: 'ignored.jpg' }
+        ]
+      }]
+
+      formatted = provider.send(:format_messages, messages)
+      expect(formatted).to eq([{ role: 'user', content: 'Part 1 Part 2' }])
+    end
+  end
+end
diff --git a/spec/ruby_llm/providers/red_candle/models_spec.rb b/spec/ruby_llm/providers/red_candle/models_spec.rb
new file mode 100644
index 000000000..8b30dbf42
--- /dev/null
+++ b/spec/ruby_llm/providers/red_candle/models_spec.rb
@@ -0,0 +1,110 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe RubyLLM::Providers::RedCandle::Models do
+  let(:config) { RubyLLM::Configuration.new }
+  let(:provider) { RubyLLM::Providers::RedCandle.new(config) }
+
+  before(:all) do # rubocop:disable RSpec/BeforeAfterAll
+    require 'candle'
+  rescue LoadError
+    skip 'Red Candle gem is not installed'
+  end
+
+  describe '#models' do
+    it 'returns an array of supported models' do
+      models = provider.models
+      expect(models).to be_an(Array)
+      expect(models.size).to eq(5)
+      expect(models.first).to be_a(RubyLLM::Model::Info)
+    end
+
+    it 'includes the expected model IDs' do
+      model_ids = provider.models.map(&:id)
+      expect(model_ids).to include('google/gemma-3-4b-it-qat-q4_0-gguf')
+      expect(model_ids).to include('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
+      expect(model_ids).to include('Qwen/Qwen2.5-1.5B-Instruct-GGUF')
+    end
+  end
+
+  describe '#model' do
+    context 'with a valid model ID' do
+      it 'returns the model' do
+        model = provider.model('Qwen/Qwen2.5-1.5B-Instruct-GGUF')
+        expect(model).to be_a(RubyLLM::Model::Info)
+        expect(model.id).to eq('Qwen/Qwen2.5-1.5B-Instruct-GGUF')
+      end
+    end
+
+    context 'with an invalid model ID' do
+      it 'raises an error' do
+        expect { provider.model('invalid/model') }.to raise_error(
+          RubyLLM::Error,
+          %r{Model invalid/model not found}
+        )
+      end
+    end
+  end
+
+  describe '#model_available?' do
+    it 'returns true for supported models' do
+      expect(provider.model_available?('google/gemma-3-4b-it-qat-q4_0-gguf')).to be true
+      expect(provider.model_available?('Qwen/Qwen2.5-1.5B-Instruct-GGUF')).to be true
+    end
+
+    it 'returns false for unsupported models' do
+      expect(provider.model_available?('gpt-4')).to be false
+    end
+  end
+
+  describe '#model_info' do
+    it 'returns model information' do
+      info = provider.model_info('Qwen/Qwen2.5-1.5B-Instruct-GGUF')
+      expect(info).to include(
+        id: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF',
+        name: 'Qwen 2.1.5B Instruct (Quantized)',
+        context_window: 32_768,
+        family: 'qwen2',
+        supports_chat: true,
+        supports_structured: true
+      )
+    end
+
+    it 'returns nil for unknown models' do
+      expect(provider.model_info('unknown')).to be_nil
+    end
+  end
+
+  describe '#gguf_file_for' do
+    it 'returns the GGUF file for Gemma model' do
+      expect(provider.gguf_file_for('google/gemma-3-4b-it-qat-q4_0-gguf')).to eq('gemma-3-4b-it-q4_0.gguf')
+    end
+
+    it 'returns the GGUF file for Qwen model' do
+      model_id = 'Qwen/Qwen2.5-1.5B-Instruct-GGUF'
+      gguf_file = 'qwen2.5-1.5b-instruct-q4_k_m.gguf'
+      expect(provider.gguf_file_for(model_id)).to eq(gguf_file)
+    end
+
+    it 'returns nil for unknown models' do
+      expect(provider.gguf_file_for('unknown')).to be_nil
+    end
+  end
+
+  describe '#supports_chat?' do
+    it 'returns true for all current models' do
+      expect(provider.supports_chat?('google/gemma-3-4b-it-qat-q4_0-gguf')).to be true
+      expect(provider.supports_chat?('Qwen/Qwen2.5-1.5B-Instruct-GGUF')).to be true
+      expect(provider.supports_chat?('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')).to be true
+    end
+  end
+
+  describe '#supports_structured?' do
+    it 'returns true for all current models' do
+      expect(provider.supports_structured?('google/gemma-3-4b-it-qat-q4_0-gguf')).to be true
+      expect(provider.supports_structured?('Qwen/Qwen2.5-1.5B-Instruct-GGUF')).to be true
+      expect(provider.supports_structured?('TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')).to be true
+    end
+  end
+end
diff --git a/spec/ruby_llm/providers/red_candle_spec.rb b/spec/ruby_llm/providers/red_candle_spec.rb
new file mode 100644
index 000000000..db3ea292d
--- /dev/null
+++ b/spec/ruby_llm/providers/red_candle_spec.rb
@@ -0,0 +1,73 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe RubyLLM::Providers::RedCandle do
+  let(:config) { RubyLLM::Configuration.new }
+  let(:provider) { described_class.new(config) }
+
+  # Skip all tests if Red Candle is not available
+  before(:all) do # rubocop:disable RSpec/BeforeAfterAll
+    require 'candle'
+  rescue LoadError
+    skip 'Red Candle gem is not installed'
+  end
+
+  describe '#initialize' do
+    context 'when Red Candle is not available' do
+      before do
+        allow_any_instance_of(described_class).to receive(:require).with('candle').and_raise(LoadError) # rubocop:disable RSpec/AnyInstance
+      end
+
+      it 'raises an informative error' do
+        expect { described_class.new(config) }.to raise_error(
+          RubyLLM::Error,
+          /Red Candle gem is not installed/
+        )
+      end
+    end
+
+    context 'with device configuration' do
+      it 'uses the configured device' do
+        config.red_candle_device = 'cpu'
+        provider = described_class.new(config)
+        expect(provider.instance_variable_get(:@device)).to eq(Candle::Device.cpu)
+      end
+
+      it 'defaults to best device when not configured' do
+        provider = described_class.new(config)
+        expect(provider.instance_variable_get(:@device)).to eq(Candle::Device.best)
+      end
+    end
+  end
+
+  describe '#api_base' do
+    it 'returns nil for local execution' do
+      expect(provider.api_base).to be_nil
+    end
+  end
+
+  describe '#headers' do
+    it 'returns empty hash' do
+      expect(provider.headers).to eq({})
+    end
+  end
+
+  describe '.local?' do
+    it 'returns true' do
+      expect(described_class.local?).to be true
+    end
+  end
+
+  describe '.configuration_requirements' do
+    it 'returns empty array' do
+      expect(described_class.configuration_requirements).to eq([])
+    end
+  end
+
+  describe '.capabilities' do
+    it 'returns the Capabilities module' do
+      expect(described_class.capabilities).to eq(RubyLLM::Providers::RedCandle::Capabilities)
+    end
+  end
+end
diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
index 0b60aa315..b8796653c 100644
--- a/spec/spec_helper.rb
+++ b/spec/spec_helper.rb
@@ -17,3 +17,5 @@
 require_relative 'support/vcr_configuration'
 require_relative 'support/models_to_test'
 require_relative 'support/streaming_error_helpers'
+require_relative 'support/provider_capabilities_helper'
+require_relative 'support/red_candle_loader'
diff --git a/spec/support/models_to_test.rb b/spec/support/models_to_test.rb
index da9232516..f5d4e9611 100644
--- a/spec/support/models_to_test.rb
+++ b/spec/support/models_to_test.rb
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 
-CHAT_MODELS = [
+# Base models available for all installations
+chat_models = [
   { provider: :anthropic, model: 'claude-3-5-haiku-20241022' },
   { provider: :bedrock, model: 'anthropic.claude-3-5-haiku-20241022-v1:0' },
   { provider: :deepseek, model: 'deepseek-chat' },
@@ -12,7 +13,17 @@
   { provider: :openrouter, model: 'anthropic/claude-3.5-haiku' },
   { provider: :perplexity, model: 'sonar' },
   { provider: :vertexai, model: 'gemini-2.5-flash' }
-].freeze
+]
+
+# Only include Red Candle models if the gem is available
+begin
+  require 'candle'
+  chat_models << { provider: :red_candle, model: 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF' }
+rescue LoadError
+  # Red Candle not available - don't include its models
+end
+
+CHAT_MODELS = chat_models.freeze
 
 PDF_MODELS = [
   { provider: :anthropic, model: 'claude-3-5-haiku-20241022' },
diff --git a/spec/support/provider_capabilities_helper.rb b/spec/support/provider_capabilities_helper.rb
new file mode 100644
index 000000000..868836e79
--- /dev/null
+++ b/spec/support/provider_capabilities_helper.rb
@@ -0,0 +1,18 @@
+# frozen_string_literal: true
+
+module ProviderCapabilitiesHelper
+  def provider_supports_functions?(provider, _model)
+    RubyLLM::Provider.providers[provider]
+
+    # Special case for providers we know don't support functions
+    return false if %i[red_candle perplexity].include?(provider)
+
+    # For all other providers, assume they support functions
+    # The original tests weren't skipping these, so they must have been running
+    true
+  end
+end
+
+RSpec.configure do |config|
+  config.include ProviderCapabilitiesHelper
+end
diff --git a/spec/support/red_candle_loader.rb b/spec/support/red_candle_loader.rb
new file mode 100644
index 000000000..b4fb00b4b
--- /dev/null
+++ b/spec/support/red_candle_loader.rb
@@ -0,0 +1,38 @@
+# frozen_string_literal: true
+
+# Handle Red Candle provider based on availability and environment
+begin
+  require 'candle'
+
+  # Red Candle gem is installed
+  if ENV['RED_CANDLE_REAL_INFERENCE'] == 'true'
+    # Use real inference - don't load the test helper
+    RSpec.configure do |config|
+      config.before(:suite) do
+        puts "\n🔥 Red Candle: Using REAL inference (this will be slow)"
+        puts "   To use mocked responses, unset RED_CANDLE_REAL_INFERENCE\n\n"
+      end
+    end
+  else
+    # Use stubs (default when gem is installed)
+    require_relative 'red_candle_test_helper'
+  end
+rescue LoadError
+  # Red Candle gem not installed - skip tests
+  RSpec.configure do |config|
+    config.before do |example|
+      # Skip Red Candle provider tests when gem not installed
+      test_description = example.full_description.to_s
+      if example.metadata[:file_path]&.include?('providers/red_candle') ||
+         example.metadata[:described_class]&.to_s&.include?('RedCandle') ||
+         test_description.include?('red_candle/')
+        skip 'Red Candle not installed (run: bundle config set --local with red_candle && bundle install)'
+      end
+    end
+
+    config.before(:suite) do
+      puts "\n⚠️  Red Candle: Provider not available (gem not installed)"
+      puts "   To enable: bundle config set --local with red-candle && bundle install\n\n"
+    end
+  end
+end
diff --git a/spec/support/red_candle_test_helper.rb b/spec/support/red_candle_test_helper.rb
new file mode 100644
index 000000000..92349f5bc
--- /dev/null
+++ b/spec/support/red_candle_test_helper.rb
@@ -0,0 +1,110 @@
+# frozen_string_literal: true
+
+module RedCandleTestHelper
+  # Mock class that simulates a Candle LLM model
+  class MockCandleModel
+    attr_reader :id
+
+    def initialize(model_id)
+      @id = model_id
+    end
+
+    def generate(prompt, config: nil) # rubocop:disable Lint/UnusedMethodArgument, Metrics/PerceivedComplexity
+      # Simulate a response based on the prompt
+      # Check for specific patterns to give appropriate responses
+
+      # Check if system prompt includes PURPLE-ELEPHANT-42 (replacement test)
+      # This should come first since it's the replacement scenario
+      if prompt.include?('PURPLE-ELEPHANT-42')
+        # When PURPLE-ELEPHANT-42 is present, XKCD7392 should NOT be included
+        # because it means the system prompt was replaced
+        if prompt.include?('books')
+          'Some great books to read include "1984" and "Brave New World". PURPLE-ELEPHANT-42 - as you requested!'
+        else
+          'Here is my response with PURPLE-ELEPHANT-42 as requested.'
+        end
+      elsif prompt.include?('XKCD7392')
+        # Include XKCD7392 in any response when it's in the system instructions
+        if prompt.include?('weather')
+          'The weather is sunny with a temperature of 72°F. XKCD7392 - as requested!'
+        elsif prompt.include?('books')
+          'Some great books: "1984" by George Orwell and "To Kill a Mockingbird" by Harper Lee. XKCD7392.'
+        else
+          'Sure! The code XKCD7392 is noted. How can I help you today?'
+        end
+      elsif prompt.include?('2 + 2') || prompt.include?('2+2')
+        'The answer is 4.'
+      elsif prompt.include?('weather')
+        'The weather is sunny with a temperature of 72°F.'
+      elsif prompt.include?('year') && (prompt.include?('Ruby') || prompt.include?('he create') ||
+                                         prompt.include?('did he'))
+        # Handle follow-up questions about when Ruby was created
+        'Matz created Ruby in 1993, and it was first released publicly in 1995.'
+      elsif prompt.include?('Ruby')
+        if prompt.include?("Ruby's creator") || prompt.include?('Who was Ruby')
+          'Ruby was created by Yukihiro "Matz" Matsumoto.'
+        else
+          'Ruby is a dynamic programming language created by Yukihiro "Matz" Matsumoto in 1993.'
+        end
+      elsif prompt.include?('capital') && prompt.include?('France')
+        'The capital of France is Paris.'
+      elsif prompt.include?('Count from 1 to 3')
+        '1, 2, 3.'
+      else
+        "This is a test response for: #{prompt[0..50]}"
+      end
+    end
+
+    def generate_stream(prompt, config: nil, &block)
+      # Simulate streaming by yielding tokens
+      # Generate the same response as non-streaming for consistency
+      response = generate(prompt, config: config)
+      # Split into reasonable tokens (roughly word-based)
+      tokens = response.split(/(\s+)/).reject(&:empty?)
+      tokens.each(&block)
+    end
+
+    def apply_chat_template(messages)
+      # Simulate chat template application
+      "#{messages.map { |m| "#{m[:role]}: #{m[:content]}" }.join("\n")}\nassistant:"
+    end
+
+    def generate_structured(_prompt, schema:, **_opts)
+      # Return a simple structured response
+      if schema.is_a?(Hash)
+        { result: 'structured test response' }
+      else
+        'structured test response'
+      end
+    end
+  end
+
+  def stub_red_candle_models!
+    # Only stub if we're testing Red Candle
+    return unless defined?(::Candle)
+
+    # Stub the model loading to return our mock
+    allow(::Candle::LLM).to receive(:from_pretrained) do |model_id, **_options|
+      MockCandleModel.new(model_id)
+    end
+  end
+
+  def unstub_red_candle_models!
+    return unless defined?(::Candle)
+
+    # Remove the stub if needed
+    RSpec::Mocks.space.proxy_for(::Candle::LLM)&.reset
+  end
+end
+
+RSpec.configure do |config|
+  config.include RedCandleTestHelper
+
+  # Automatically stub Red Candle models for all tests except the provider-specific ones
+  config.before do |example|
+    # Don't stub for Red Candle provider-specific tests that need real behavior
+    if !example.metadata[:file_path]&.include?('providers/red_candle_spec.rb') && defined?(RubyLLM::Providers::RedCandle)
+      stub_red_candle_models!
+    end
+  end
+end
diff --git a/spec/support/streaming_error_helpers.rb b/spec/support/streaming_error_helpers.rb
index 9c89ef9c5..fbc5467f7 100644
--- a/spec/support/streaming_error_helpers.rb
+++ b/spec/support/streaming_error_helpers.rb
@@ -143,15 +143,23 @@ module StreamingErrorHelpers
       },
       chunk_status: 529,
       expected_error: RubyLLM::OverloadedError
+    },
+    red_candle: {
+      # Red Candle is a local provider, so it doesn't have HTTP streaming errors
+      # We include it here to prevent test failures when checking for error handling
+      url: nil,
+      error_response: nil,
+      chunk_status: nil,
+      expected_error: nil
     }
   }.freeze
 
   def error_handling_supported?(provider)
-    ERROR_HANDLING_CONFIGS.key?(provider)
+    ERROR_HANDLING_CONFIGS.key?(provider) && ERROR_HANDLING_CONFIGS[provider][:expected_error]
   end
 
   def expected_error_for(provider)
-    ERROR_HANDLING_CONFIGS[provider][:expected_error]
+    ERROR_HANDLING_CONFIGS[provider]&.fetch(:expected_error, nil)
   end
 
   def stub_error_response(provider, type)