Conversation


@cbracher cbracher commented Feb 4, 2014

In particular, our testing showed issues when our Cassandra db was going through a rolling restart, and in dev (with a single Cassandra server) when that db went through repeated restarts.

We were seeing two threading issues:

1. src/cql/internal/cql_session_impl.cpp invokes callbacks on a connection that may have gone away. This is quite noisy under valgrind when the local Cassandra db is being restarted. The solution here is a bit rigid, in that it just holds the connection indefinitely, but that works for our case since we restart the servers on a weekly basis. If a timeout that could be trusted were available, it could be used to purge the list.

2. cql::cql_session_impl_t::connect works on a cql_connections_collection_t* connections, a raw pointer that could become invalidated by other threads since it is not taken under a lock.
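The "hold the connection" workaround in issue 1 can be sketched roughly like this (illustrative types, not the driver's real ones): rather than destroying a connection that may still receive callbacks, park its shared_ptr in a dump list so the object outlives any late callback.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Stand-in for the driver's connection type; illustrative only.
struct connection {};

// Retired connections are kept alive here instead of being freed,
// so late callbacks never touch destroyed memory. Entries are held
// indefinitely; with a trustworthy timeout you could purge old ones.
class connection_dump_list {
public:
    void retire(std::shared_ptr<connection> conn) {
        std::lock_guard<std::mutex> lock(mutex_);
        retired_.push_back(std::move(conn));
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return retired_.size();
    }
private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<connection>> retired_;
};
```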

Outside of this, we also saw issues when we called wait() on the returned futures. During a rolling restart, that would sometimes lock up. We had to switch to wait_for() in order to be robust in our production environment. I wonder whether that could be addressed in the lib by detecting a timeout.
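The wait_for() pattern we ended up with looks roughly like this (a minimal sketch using std::future; `wait_with_timeout` is a name we made up, not a driver API):

```cpp
#include <chrono>
#include <future>
#include <stdexcept>
#include <string>

// Bounded wait on a future so a rolling restart cannot block the
// caller forever. wait_for() returns a status instead of blocking
// indefinitely the way wait() / get() alone would.
std::string wait_with_timeout(std::future<std::string>& f,
                              std::chrono::milliseconds timeout) {
    if (f.wait_for(timeout) == std::future_status::ready) {
        return f.get();
    }
    throw std::runtime_error("query timed out");
}
```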

With these changes, our service, which makes hundreds of Cassandra calls per second, can now survive rolling restarts of our Cassandra db. Of course, it's up to you to decide whether they have any value to DataStax.

Thanks for the lib; using Cassandra through the DataStax C++ client is very helpful.

Charles Bracher added 2 commits January 23, 2014 18:28
cql::cql_session_impl_t::add_to_connection_pool passes out an unlocked/unprotected
pointer to cql_connections_collection_t, which cql_session_impl_t::connect then
works on.

Adding a mutex above that, since this pointer could become invalidated by
activity in other threads.

Also, callbacks on bad connections in cql_session_impl_t::allocate_connection
were resulting in many valgrind errors. We now hold on to these connections in a
dump list. That introduces some memory leakage, but we can come back later and
make it clear the list out, say on some time interval. These valgrind errors
look to have been associated with segfaults.

Note that all of these issues surfaced in an environment where the Cassandra
servers were going up and down; that is where they came to light for us.
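The mutex-above-the-collection idea in this commit can be sketched like so (a minimal illustration; the class and member names are stand-ins, not the driver's actual types):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Every access to the connections collection goes through the mutex,
// so no caller ever holds a raw pointer into it that another thread
// could invalidate. The shared_ptr<int> payload is a placeholder for
// a real connection object.
class connection_pool {
public:
    void add(const std::string& host, std::shared_ptr<int> conn) {
        std::lock_guard<std::mutex> lock(mutex_);
        connections_[host] = std::move(conn);
    }
    std::shared_ptr<int> get(const std::string& host) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = connections_.find(host);
        return it == connections_.end() ? nullptr : it->second;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return connections_.size();
    }
private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<int>> connections_;
};
```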
@jstasiewicz
Contributor

Great work, thanks for sharing it. Too bad we had simultaneously addressed some of the issues you mention :)

I will take a closer look at the connections going away, since a general solution is certainly needed here. And regarding the futures getting locked up: timed waits are the best option for now. We are working on a retry/timeout mechanism, but it's not there yet.

Anyway, I really appreciate your contribution. If you have any further patches, it would be great if you could share them!

@cbracher
Author

cbracher commented Feb 6, 2014

Thanks, Juliusz.

I will be glad to share anything else we do, but for now things are working well with these changes.

Do you have your alternative fixes on dev or master now? We will look to get back onto your build once we can.

regards
Charles

@jstasiewicz
Contributor

The fixes are currently on both dev and master, but I wouldn't expect them to work for you out of the box, since they don't address all of the aforementioned issues yet.
In any case, I would recommend keeping an eye on master, because a lot of development is happening now.

Cheers,
JS

Charles Bracher added 2 commits March 6, 2014 21:13
simple selection of the local datacenter (… rack, though not really using it at
the moment). Need this for our own dc-aware round robin, which is managed by
comparing datacenter names. We can drop all of this once we get a real dc-aware
option from the datastax client. The issue being addressed here is that
sometimes the datacenter and rack were coming up with their default value of
"unset", which could make all of the hosts look remote. Now we can detect this
"unknown" case and refresh the hosts if needed. Note that it is difficult to
tell whether this is safe to do during execution. It seems like it should be,
since cql_control_connection already responds to hosts being added to and
dropped from the cluster.
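The datacenter-name comparison described in this commit can be sketched roughly as follows (illustrative names; "unset" is the default value mentioned above, everything else is made up for the sketch):

```cpp
#include <string>

enum class host_distance { local, remote, unknown };

// A host is local when its datacenter name matches ours. A default
// "unset" (or empty) name means the metadata has not come up yet, so
// we report "unknown" rather than wrongly treating the host as remote;
// callers can then refresh the host list.
host_distance classify(const std::string& host_dc,
                       const std::string& local_dc) {
    if (host_dc.empty() || host_dc == "unset" ||
        local_dc.empty() || local_dc == "unset") {
        return host_distance::unknown;
    }
    return host_dc == local_dc ? host_distance::local
                               : host_distance::remote;
}
```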
@mpenick
Contributor

mpenick commented Jun 11, 2014

Hi! Thanks for your pull request. We just released a new beta version of the driver: https://github.com/datastax/cpp-driver/tree/1.0. Work continues on that branch. I will be moving this branch to "deprecated".

@mpenick mpenick closed this Jun 11, 2014
mpenick pushed a commit that referenced this pull request Aug 3, 2016
mpenick added a commit that referenced this pull request Jul 19, 2018
jul-stas added a commit to jul-stas/cpp-driver that referenced this pull request Nov 26, 2020
Add support for running with relocatable
millerjp added a commit to millerjp/cpp-driver that referenced this pull request Sep 8, 2025
- Added negative number tests for float, int, double, bigint vectors
- Added special value tests (NaN, Infinity, MIN/MAX boundaries)
- Removed dangerous fallbacks in VectorIterator parsing
- Added proper error handling for unknown types/dimensions
- All numeric types now handle full value ranges correctly
- Tests confirm sign preservation and edge case handling
millerjp added a commit to millerjp/cpp-driver that referenced this pull request Sep 8, 2025
- Verified 15 primitive types working (tinyint through duration)
- Complex types fully functional:
  - vector<frozen<list<T>>> ✓
  - vector<frozen<set<T>>> ✓
  - vector<frozen<map<K,V>>> ✓
  - vector<frozen<tuple<...>>> ✓
- Go driver interoperability confirmed with all types
- Type validation enforced for data integrity

Comprehensive testing shows all advertised types work correctly.
Some types (date/time/timestamp/varint) not exposed in C API but
implementation is ready if needed.

Next priority: ANN search implementation for ML/AI use cases.
millerjp added a commit to millerjp/cpp-driver that referenced this pull request Sep 8, 2025
…ncy (log datastax#12)

Found root cause of why C++ driver receives "unknown" for complex vector element types
while Go driver receives full type information.

Root Cause:
- Server sends: org.apache.cassandra.db.marshal.VectorType(ListType(Int32Type), 2)
- Go driver: Has recursive parsing via typeInfoFromJavaString for custom types
- C++ driver: VectorType parser calls DataTypeClassNameParser::parse_one("ListType(Int32Type)")
- Parser doesn't recognize "ListType" without full class name, creates CustomType("unknown")

Investigation confirmed:
- NOT a protocol version issue (Go gets same metadata in v4 and v5)
- NOT server sending different data to different drivers
- IS a C++ driver limitation in metadata parsing

Decision: Keep Option 5 as production solution
- Proper fix would require significant refactoring of metadata parsing system
- Option 5 provides working solution with server-side validation
- Write operations work correctly, read iterator has known limitations

Testing verified:
- Option 5 continues to work correctly for writes
- Bidirectional compatibility maintained with Go driver for non-frozen types
- All unit tests passing
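The recursive parsing this commit says the Go driver does (and the C++ driver lacks) can be sketched like this: split a Java class-name type string such as `VectorType(ListType(Int32Type), 2)` into an outer name and its parameters, recursing into nested parentheses. This is an illustration of the technique only, not the driver's actual parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct type_node {
    std::string name;               // e.g. "ListType"
    std::vector<type_node> params;  // nested type parameters
};

static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(' ');
    const std::size_t e = s.find_last_not_of(' ');
    return b == std::string::npos ? "" : s.substr(b, e - b + 1);
}

// Recursively parse "Outer(Inner1, Inner2(...))" into a type tree.
// Commas are only split at depth 1 so nested argument lists stay intact.
type_node parse_type(const std::string& raw) {
    const std::string s = trim(raw);
    const std::size_t open = s.find('(');
    if (open == std::string::npos) return {s, {}};
    type_node node{s.substr(0, open), {}};
    int depth = 0;
    std::size_t start = open + 1;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '(') {
            ++depth;
        } else if (s[i] == ')') {
            if (--depth == 0)
                node.params.push_back(parse_type(s.substr(start, i - start)));
        } else if (s[i] == ',' && depth == 1) {
            node.params.push_back(parse_type(s.substr(start, i - start)));
            start = i + 1;
        }
    }
    return node;
}
```

With this shape, `ListType(Int32Type)` parses into a `ListType` node with one `Int32Type` parameter instead of collapsing to an "unknown" custom type.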