-
Notifications
You must be signed in to change notification settings - Fork 653
Add support for C-style comments #2034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5089fd5
to
4123881
Compare
This commit adds support for C-style comments supported by MySQL. It parses and consumes the optional version number after the `!` character and leading whitespace.
4123881
to
e5de267
Compare
} | ||
#[test] | ||
fn tokenize_multiline_comment_with_c_style_comment() { | ||
let sql = String::from("0/*! word */1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at their docs, I'm wondering if/how we support these examples?
SELECT /*! STRAIGHT_JOIN */ col1 FROM table1,table2
/*!50110 KEY_BLOCK_SIZE=1024 */
SELECT /*! BKA(t1) */ FROM T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iffyio - Parsing the c_style comment unblocks sqlparser to not discard those as if they were a normal comment. Support for each hint will have to be added in a case by case bases. For example #2033 - MySQL adds a c-style comment if you run SHOW CREATE TABLE:
mysql> SHOW CREATE TABLE b;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| b | CREATE TABLE `b` (
`ID` int DEFAULT NULL,
`b` char(1) DEFAULT NULL /*!80023 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,008 sec)
Without the current patch, the invisible keyword will be discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so to clarify I'm rather wondering regarding the parser behavior for hints that aren't singe words e.g. /*!50110 KEY_BLOCK_SIZE=1024 */
- can we demonstrate the behavior with test cases for such scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iffyio thanks flagging this. I have fixed the issue and now we properly return individual tokens inside a C-style hint comment.
Documented the C-style comments with an example.
} | ||
#[test] | ||
fn tokenize_multiline_comment_with_c_style_comment() { | ||
let sql = String::from("0/*! word */1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so to clarify I'm rather wondering regarding the parser behavior for hints that aren't singe words e.g. /*!50110 KEY_BLOCK_SIZE=1024 */
- can we demonstrate the behavior with test cases for such scenarios?
Added the pending tokens structure to properly return all tokens inside a c-style hint comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@altmannmarcelo sorry for the delay in getting back to this, left some comments!
/// Returns true if the dialect supports hint and C-style comments | ||
/// e.g. `/*! hint */` | ||
fn supports_c_style_hints(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Returns true if the dialect supports hint and C-style comments | |
/// e.g. `/*! hint */` | |
fn supports_c_style_hints(&self) -> bool { | |
/// Returns true if the dialect supports optimizer hints in multiline comments | |
/// e.g. `/*!50110 KEY_BLOCK_SIZE = 1024*/` | |
fn supports_multiline_comment_hints(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah to clarify the suggested change, c-style comment I think is rather understood as /* */
- i.e. multiline comment in general, the feature here I think is the optimizer hints within comments as mysql calls them
chars: &mut State, | ||
prev_token: Option<&Token>, | ||
) -> Result<Option<Token>, TokenizerError> { | ||
// Return any previously injected tokens first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl wise I'm thinking we can do something like the following that would be less invasive to the tokenizer:
If we take as an example /*!50110 KEY_BLOCK_SIZE = 1024*/
- today that will get parse as a Token::MultilineComment("!50110 KEY_BLOCK_SIZE = 1024")
- which we receive here in the tokenizer loop.
Then the idea would be to essentially re-tokenize that string and add to the tokens buffer.
So something like this:
// we check if this is a token containing optimizer hints
match token {
Token::Whitespace(Whitespace::MultiLineComment(comment)) if self.dialect.supports_multiline_comment_hints() && s.starts_with("!") => {
// re-tokenize the hints
buf.extend(tokenize_comment_hints(comment, span)?);
token => {
buf.push(TokenWithSpan{ token, span })
}
}
// here we reuse existing tokenizer machinery to re-tokenize the hints.
fn tokenize_comment_hints(&self, hints: String, span: Span, tokens: &mut Vec<TokenWithSpan>) -> Result<()> {
let mut state = State {
peekable: hints.chars().peekable(),
line: span.start.line,
col: span.start.column,
};
while let Some(token) = self.next_token(&mut state, None)? {
tokens.push(token);
}
Ok(())
}
This commit adds support for C-style comments supported by MySQL. It parses and consumes the optional version number after the
!
character.