Skip to content

Conversation

altmannmarcelo
Copy link
Contributor

This commit adds support for C-style comments supported by MySQL. It parses and consumes the optional version number after the ! character.

@altmannmarcelo altmannmarcelo force-pushed the c_style_comment branch 3 times, most recently from 5089fd5 to 4123881 Compare September 17, 2025 15:26
This commit adds support for C-style comments supported by MySQL.
It parses and consumes the optional version number after the `!`
character and leading whitespace.
}
#[test]
fn tokenize_multiline_comment_with_c_style_comment() {
let sql = String::from("0/*! word */1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at their docs, I'm wondering if/how we support these examples?

SELECT /*! STRAIGHT_JOIN */ col1 FROM table1,table2
/*!50110 KEY_BLOCK_SIZE=1024 */
SELECT /*! BKA(t1) */ FROM T

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iffyio - Parsing the c_style comment unblocks sqlparser to not discard those as if they were a normal comment. Support for each hint will have to be added in a case by case bases. For example #2033 - MySQL adds a c-style comment if you run SHOW CREATE TABLE:

mysql> SHOW CREATE TABLE b;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                    |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| b     | CREATE TABLE `b` (
  `ID` int DEFAULT NULL,
  `b` char(1) DEFAULT NULL /*!80023 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,008 sec)

Without the current patch, the invisible keyword will be discarded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so to clarify I'm rather wondering regarding the parser behavior for hints that aren't singe words e.g. /*!50110 KEY_BLOCK_SIZE=1024 */ - can we demonstrate the behavior with test cases for such scenarios?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iffyio thanks flagging this. I have fixed the issue and now we properly return individual tokens inside a C-style hint comment.

Documented the C-style comments with an example.
}
#[test]
fn tokenize_multiline_comment_with_c_style_comment() {
let sql = String::from("0/*! word */1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so to clarify I'm rather wondering regarding the parser behavior for hints that aren't singe words e.g. /*!50110 KEY_BLOCK_SIZE=1024 */ - can we demonstrate the behavior with test cases for such scenarios?

Added the pending tokens structure to properly return all tokens
inside a c-style hint comment.
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@altmannmarcelo sorry for the delay in getting back to this, left some comments!

Comment on lines +901 to +903
/// Returns true if the dialect supports hint and C-style comments
/// e.g. `/*! hint */`
fn supports_c_style_hints(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns true if the dialect supports hint and C-style comments
/// e.g. `/*! hint */`
fn supports_c_style_hints(&self) -> bool {
/// Returns true if the dialect supports optimizer hints in multiline comments
/// e.g. `/*!50110 KEY_BLOCK_SIZE = 1024*/`
fn supports_multiline_comment_hints(&self) -> bool {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah to clarify the suggested change, c-style comment I think is rather understood as /* */ - i.e. multiline comment in general, the feature here I think is the optimizer hints within comments as mysql calls them

chars: &mut State,
prev_token: Option<&Token>,
) -> Result<Option<Token>, TokenizerError> {
// Return any previously injected tokens first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl wise I'm thinking we can do something like the following that would be less invasive to the tokenizer:

If we take as an example /*!50110 KEY_BLOCK_SIZE = 1024*/ - today that will get parse as a Token::MultilineComment("!50110 KEY_BLOCK_SIZE = 1024") - which we receive here in the tokenizer loop.

Then the idea would be to essentially re-tokenize that string and add to the tokens buffer.

So something like this:

// we check if this is a token containing optimizer hints
match token {
	Token::Whitespace(Whitespace::MultiLineComment(comment)) if self.dialect.supports_multiline_comment_hints() && s.starts_with("!") => {
    // re-tokenize the hints
	buf.extend(tokenize_comment_hints(comment, span)?);
	token => {
		buf.push(TokenWithSpan{ token, span })
	}
}

// here we reuse existing tokenizer machinery to re-tokenize the hints.
fn tokenize_comment_hints(&self, hints: String, span: Span, tokens: &mut Vec<TokenWithSpan>) -> Result<()> {
	let mut state = State {
		peekable: hints.chars().peekable(),
		line: span.start.line,
		col: span.start.column,
	};

	while let Some(token) = self.next_token(&mut state, None)? {
		tokens.push(token);
	}

	Ok(())
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants