Dev - with these changes we are getting stability in our application, which is using the C++ version of libcql. #12
Conversation
…s what was intended.
cql::cql_session_impl_t::add_to_connection_pool returns an unlocked/unprotected cql_connections_collection_t* that is then used in cql_session_impl_t::connect; this adds a mutex around that access, since the pointer could be invalidated by activity in other threads. Also, callbacks on bad connections in cql_session_impl_t::allocate_connection were producing many valgrind errors, so those connections are now held in a dump list. That introduces some leakage, but we can come back later and clear the list out, say on some time interval. The valgrind errors look to have been associated with segfaults. Note that all of these issues surfaced in an environment where the Cassandra servers were going up and down; that is where they came to light for us.
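A minimal sketch of the locking fix described above, with illustrative names (`connections_mutex_`, `pools_`) standing in for the actual libcql members:

```cpp
#include <map>
#include <string>
#include <boost/thread/mutex.hpp>

// Illustrative stand-ins for the libcql types involved.
struct cql_connection_t {};
typedef std::map<std::string, cql_connection_t> cql_connections_collection_t;

class cql_session_impl_t {
public:
    void connect(const std::string& keyspace) {
        // Hold the mutex across both obtaining the collection pointer
        // and using it; otherwise another thread mutating the pool can
        // invalidate the pointer between the two steps.
        boost::mutex::scoped_lock lock(connections_mutex_);
        cql_connections_collection_t* connections =
            add_to_connection_pool(keyspace);
        // ... establish/select connections while still holding the lock ...
        (void)connections;
    }

private:
    cql_connections_collection_t*
    add_to_connection_pool(const std::string& keyspace) {
        return &pools_[keyspace];
    }

    boost::mutex connections_mutex_;
    std::map<std::string, cql_connections_collection_t> pools_;
};
```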
Great work, thanks for sharing it. Too bad we had simultaneously addressed some of the issues you mention :) I will take a closer look at the connections going away, since a general solution is certainly needed here. As for the futures getting locked up: timed waits are the best option for now. We are working on a retry/timeout mechanism, but it's not there yet. Anyway, I really appreciate your contribution. If you have any further patches, it would be great if you could share them!
Thanks Juliusz. I will be glad to share anything else we do, but for now things are working well with these changes. Do you have your alternative fixes on dev or master now? We will look to resume on your build once we can. Regards
The fixes are currently on both dev and master, but I wouldn't expect them to work for you 'out of the box', since they don't address all of the aforementioned issues yet. Cheers,
Simple selection of the local datacenter.
… rack, though not really using it at the moment). We need this for our own DC-aware round robin, which works by comparing datacenter names. We can drop all of this once we get a real DC-aware option from the DataStax client. The issue being addressed here is that the datacenter and rack sometimes came up with their default value "unset", which could make all of the hosts look remote. Now we can detect this "unknown" case and refresh the hosts if needed. Note that it is hard to tell whether this is safe to do during execution; it seems like it should be, since cql_control_connection already responds to hosts being added to and dropped from the cluster.
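A hedged sketch of the idea: filter hosts by datacenter name and flag the "unset" default so the caller knows to refresh host metadata first. The `host_info` struct and `select_local_hosts` function are illustrative, not the actual libcql types:

```cpp
#include <string>
#include <vector>

// Illustrative host record; the real libcql host metadata differs.
struct host_info {
    std::string address;
    std::string datacenter; // "unset" until system.peers/local is read
};

// Keep only hosts whose datacenter matches the locally selected one.
// If any host still reports the default "unset" datacenter, signal the
// caller to refresh host metadata before trusting the result.
bool select_local_hosts(const std::vector<host_info>& all_hosts,
                        const std::string& local_dc,
                        std::vector<host_info>* local_hosts) {
    for (std::vector<host_info>::const_iterator it = all_hosts.begin();
         it != all_hosts.end(); ++it) {
        if (it->datacenter == "unset") {
            return false; // metadata incomplete: refresh hosts and retry
        }
        if (it->datacenter == local_dc) {
            local_hosts->push_back(*it);
        }
    }
    return true;
}
```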
Hi! Thanks for your pull request. We just released a new beta version of the driver: https://github.com/datastax/cpp-driver/tree/1.0. Work continues on that branch. I will be moving this branch to "deprecated".
Add support for running with relocatable
- Added negative number tests for float, int, double, bigint vectors
- Added special value tests (NaN, Infinity, MIN/MAX boundaries)
- Removed dangerous fallbacks in VectorIterator parsing
- Added proper error handling for unknown types/dimensions
- All numeric types now handle full value ranges correctly
- Tests confirm sign preservation and edge case handling
- Verified 15 primitive types working (tinyint through duration)
- Complex types fully functional:
  - vector<frozen<list<T>>> ✓
  - vector<frozen<set<T>>> ✓
  - vector<frozen<map<K,V>>> ✓
  - vector<frozen<tuple<...>>> ✓
- Go driver interoperability confirmed with all types
- Type validation enforced for data integrity

Comprehensive testing shows all advertised types work correctly. Some types (date/time/timestamp/varint) are not exposed in the C API, but the implementation is ready if needed. Next priority: ANN search implementation for ML/AI use cases.
…ncy (log datastax#12)

Found the root cause of why the C++ driver receives "unknown" for complex vector element types while the Go driver receives full type information.

Root cause:
- Server sends: org.apache.cassandra.db.marshal.VectorType(ListType(Int32Type), 2)
- Go driver: has recursive parsing via typeInfoFromJavaString for custom types
- C++ driver: the VectorType parser calls DataTypeClassNameParser::parse_one("ListType(Int32Type)")
- The parser doesn't recognize "ListType" without the full class name and creates CustomType("unknown")

Investigation confirmed:
- NOT a protocol version issue (Go gets the same metadata in v4 and v5)
- NOT the server sending different data to different drivers
- IS a C++ driver limitation in metadata parsing

Decision: keep Option 5 as the production solution
- A proper fix would require significant refactoring of the metadata parsing system
- Option 5 provides a working solution with server-side validation
- Write operations work correctly; the read iterator has known limitations

Testing verified:
- Option 5 continues to work correctly for writes
- Bidirectional compatibility maintained with the Go driver for non-frozen types
- All unit tests passing
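To illustrate the gap, here is a hedged sketch of the kind of recursive class-name handling the Go driver does and the C++ parser lacks. The `qualify` and `describe` functions are purely illustrative, not driver code, and for simplicity handle only a single type parameter (a real parser would also split on commas for types like MapType(K,V)):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Prepend the Java marshal package when a bare class name like
// "ListType" appears, so nested names resolve the same way as
// fully qualified ones.
std::string qualify(const std::string& name) {
    const std::string prefix = "org.apache.cassandra.db.marshal.";
    return name.compare(0, prefix.size(), prefix) == 0 ? name : prefix + name;
}

// Recursively normalize a marshal class name such as
// "ListType(Int32Type)" by qualifying the outer name and
// recursing into the parenthesized parameter.
std::string describe(const std::string& class_name) {
    std::size_t open = class_name.find('(');
    if (open == std::string::npos) {
        return qualify(class_name);
    }
    std::string outer = qualify(class_name.substr(0, open));
    std::string inner = class_name.substr(open + 1,
                                          class_name.rfind(')') - open - 1);
    return outer + "(" + describe(inner) + ")";
}

int main() {
    // Prints the fully qualified form of the nested type, instead of
    // falling back to an "unknown" custom type.
    std::cout << describe("ListType(Int32Type)") << std::endl;
    return 0;
}
```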
In particular, our testing showed issues when our Cassandra db was going through a rolling restart, or in dev - with a single Cassandra server - when that db went through repeated restarts.
We were seeing two threading issues:
- src/cql/internal/cql_session_impl.cpp had callbacks firing on a connection which may have gone away. This was pretty noisy under valgrind with the local Cassandra db getting restarted. The solution here is a bit rigid - it just holds the connection indefinitely in a dump list (see the sketch after this list) - but that works for our case since we restart the servers on a weekly basis. If a timeout that could be trusted were available, it could be used to purge the list.
- cql::cql_session_impl_t::connect works on a cql_connections_collection_t* connections, a raw pointer that could become invalidated since it is not held under a lock.
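A hedged sketch of the dump-list idea from the first issue (`connection_dump_list`, `retire`, and `purge_all` are illustrative names, not the actual libcql API):

```cpp
#include <list>
#include <boost/shared_ptr.hpp>
#include <boost/thread/mutex.hpp>

struct cql_connection_t {}; // illustrative stand-in

class connection_dump_list {
public:
    // Park a bad connection instead of destroying it, so any callbacks
    // still in flight dereference live memory rather than a freed object.
    void retire(const boost::shared_ptr<cql_connection_t>& conn) {
        boost::mutex::scoped_lock lock(mutex_);
        retired_.push_back(conn);
    }

    // If a trustworthy timeout existed, it could drive a purge like
    // this to bound the leak; for now the list only grows between
    // (weekly) service restarts.
    void purge_all() {
        boost::mutex::scoped_lock lock(mutex_);
        retired_.clear();
    }

private:
    boost::mutex mutex_;
    std::list<boost::shared_ptr<cql_connection_t> > retired_;
};
```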
Outside of this, we also saw issues when we called wait() on the returned futures. During a rolling restart, that would sometimes lock up. We had to go with wait_for() to be robust in our production environment. We wonder whether that could be addressed in the lib by detecting a timeout.
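A minimal sketch of the bounded-wait pattern, assuming a Boost version where shared_future::wait_for is available (Boost 1.50+ with Boost.Chrono); the 30-second budget is an arbitrary illustration, not a value from the driver:

```cpp
#include <iostream>
#include <boost/chrono.hpp>
#include <boost/thread/future.hpp>

// Wait on a future with a bounded timeout instead of blocking forever.
// Returns true when the result arrived in time.
template <typename T>
bool bounded_wait(boost::shared_future<T>& result) {
    boost::future_status status =
        result.wait_for(boost::chrono::seconds(30));
    if (status != boost::future_status::ready) {
        // Treat a timeout like a failed call: log it and let the
        // caller retry or fail over, rather than hanging the thread
        // the way an unconditional wait() did during rolling restarts.
        std::cerr << "query timed out; skipping wait()" << std::endl;
        return false;
    }
    return true;
}
```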
With these changes, our service, which makes hundreds of Cassandra calls per second, can now survive rolling restarts of our Cassandra db. Of course, it's up to you to decide whether they have any value to DataStax.
Thanks for the lib; using Cassandra through the DataStax C++ client is very helpful.