Dev - with these changes we are getting stability in our application, which is using the C++ version of libcql. #12
Conversation
…s what was intended.
cql::cql_session_impl_t::add_to_connection_pool returns an unlocked/unprotected cql_connections_collection_t* that is then used in cql_session_impl_t::connect; this adds a mutex around that access, since the pointer could be invalidated by activity in other threads. Also, callbacks on bad connections in cql_session_impl_t::allocate_connection were producing many valgrind errors, so those connections are now held in a dump list. That introduces some leakage, but we can come back later and clear the list out, say on some time interval. The valgrind errors look to have been associated with segfaults. Note that all of these issues surfaced in an environment where the Cassandra servers were going up and down; that is where they came to light for us.
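A minimal sketch of the locking fix described above, with illustrative names (`connections_mutex_`, `pools_`) standing in for the actual libcql members:

```cpp
#include <map>
#include <string>
#include <boost/thread/mutex.hpp>

// Illustrative stand-ins for the libcql types involved.
struct cql_connection_t {};
typedef std::map<std::string, cql_connection_t> cql_connections_collection_t;

class cql_session_impl_t {
public:
    void connect(const std::string& keyspace) {
        // Hold the mutex across both obtaining the collection pointer
        // and using it; otherwise another thread mutating the pool can
        // invalidate the pointer between the two steps.
        boost::mutex::scoped_lock lock(connections_mutex_);
        cql_connections_collection_t* connections =
            add_to_connection_pool(keyspace);
        // ... establish/select connections while still holding the lock ...
        (void)connections;
    }

private:
    cql_connections_collection_t*
    add_to_connection_pool(const std::string& keyspace) {
        return &pools_[keyspace];
    }

    boost::mutex connections_mutex_;
    std::map<std::string, cql_connections_collection_t> pools_;
};
```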
Great work, thanks for sharing it. Too bad we had simultaneously addressed some of the issues you mention :) I will take a closer look at the connections going away, since a general solution is certainly needed here. As for the futures getting locked up: timed waits are the best option for now. We are working on a retry/timeout mechanism, but it's not there yet. Anyway, I really appreciate your contribution. If you have any further patches, it would be great if you could share them!
Thanks Juliusz. I will be glad to share anything else we do, but for now things are working well with these changes. Do you have your alternative fixes on dev or master now? We will look to resume on your build once we can. Regards
The fixes are currently on both dev and master, but I wouldn't expect them to work for you 'out of the box', since they don't address all of the aforementioned issues yet. Cheers,
Simple selection of the local datacenter.
… rack, though not really using it at the moment). We need this for our own DC-aware round robin, which works by comparing datacenter names. We can drop all of this once we get a real DC-aware option from the DataStax client. The issue being addressed here is that the datacenter and rack sometimes came up with their default value "unset", which could make all of the hosts look remote. Now we can detect this "unknown" case and refresh the hosts if needed. Note that it is hard to tell whether this is safe to do during execution; it seems like it should be, since cql_control_connection already responds to hosts being added to and dropped from the cluster.
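A hedged sketch of the idea: filter hosts by datacenter name and flag the "unset" default so the caller knows to refresh host metadata first. The `host_info` struct and `select_local_hosts` function are illustrative, not the actual libcql types:

```cpp
#include <string>
#include <vector>

// Illustrative host record; the real libcql host metadata differs.
struct host_info {
    std::string address;
    std::string datacenter; // "unset" until system.peers/local is read
};

// Keep only hosts whose datacenter matches the locally selected one.
// If any host still reports the default "unset" datacenter, signal the
// caller to refresh host metadata before trusting the result.
bool select_local_hosts(const std::vector<host_info>& all_hosts,
                        const std::string& local_dc,
                        std::vector<host_info>* local_hosts) {
    for (std::vector<host_info>::const_iterator it = all_hosts.begin();
         it != all_hosts.end(); ++it) {
        if (it->datacenter == "unset") {
            return false; // metadata incomplete: refresh hosts and retry
        }
        if (it->datacenter == local_dc) {
            local_hosts->push_back(*it);
        }
    }
    return true;
}
```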
Hi! Thanks for your pull request. We just released a new beta version of the driver: https://github.com/datastax/cpp-driver/tree/1.0. Work continues on that branch. I will be moving this branch to "deprecated".
Add support for running with relocatable
- Added negative number tests for float, int, double, bigint vectors
- Added special value tests (NaN, Infinity, MIN/MAX boundaries)
- Removed dangerous fallbacks in VectorIterator parsing
- Added proper error handling for unknown types/dimensions
- All numeric types now handle full value ranges correctly
- Tests confirm sign preservation and edge case handling
- Verified 15 primitive types working (tinyint through duration)
- Complex types fully functional:
  - vector<frozen<list<T>>> ✓
  - vector<frozen<set<T>>> ✓
  - vector<frozen<map<K,V>>> ✓
  - vector<frozen<tuple<...>>> ✓
- Go driver interoperability confirmed with all types
- Type validation enforced for data integrity

Comprehensive testing shows all advertised types work correctly. Some types (date/time/timestamp/varint) are not exposed in the C API, but the implementation is ready if needed. Next priority: ANN search implementation for ML/AI use cases.
…ncy (log datastax#12)

Found the root cause of why the C++ driver receives "unknown" for complex vector element types while the Go driver receives full type information.

Root cause:
- Server sends: org.apache.cassandra.db.marshal.VectorType(ListType(Int32Type), 2)
- Go driver: has recursive parsing via typeInfoFromJavaString for custom types
- C++ driver: the VectorType parser calls DataTypeClassNameParser::parse_one("ListType(Int32Type)")
- The parser doesn't recognize "ListType" without the full class name and creates CustomType("unknown")

Investigation confirmed:
- NOT a protocol version issue (Go gets the same metadata in v4 and v5)
- NOT the server sending different data to different drivers
- IS a C++ driver limitation in metadata parsing

Decision: keep Option 5 as the production solution
- A proper fix would require significant refactoring of the metadata parsing system
- Option 5 provides a working solution with server-side validation
- Write operations work correctly; the read iterator has known limitations

Testing verified:
- Option 5 continues to work correctly for writes
- Bidirectional compatibility maintained with the Go driver for non-frozen types
- All unit tests passing
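To illustrate the gap, here is a hedged sketch of the kind of recursive class-name handling the Go driver does and the C++ parser lacks. The `qualify` and `describe` functions are purely illustrative, not driver code, and for simplicity handle only a single type parameter (a real parser would also split on commas for types like MapType(K,V)):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Prepend the Java marshal package when a bare class name like
// "ListType" appears, so nested names resolve the same way as
// fully qualified ones.
std::string qualify(const std::string& name) {
    const std::string prefix = "org.apache.cassandra.db.marshal.";
    return name.compare(0, prefix.size(), prefix) == 0 ? name : prefix + name;
}

// Recursively normalize a marshal class name such as
// "ListType(Int32Type)" by qualifying the outer name and
// recursing into the parenthesized parameter.
std::string describe(const std::string& class_name) {
    std::size_t open = class_name.find('(');
    if (open == std::string::npos) {
        return qualify(class_name);
    }
    std::string outer = qualify(class_name.substr(0, open));
    std::string inner = class_name.substr(open + 1,
                                          class_name.rfind(')') - open - 1);
    return outer + "(" + describe(inner) + ")";
}

int main() {
    // Prints the fully qualified form of the nested type, instead of
    // falling back to an "unknown" custom type.
    std::cout << describe("ListType(Int32Type)") << std::endl;
    return 0;
}
```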
In particular, our testing showed issues when our Cassandra db was going through a rolling restart, or in dev - with a single Cassandra server - when that db went through repeated restarts.
We were seeing two threading issues:
- src/cql/internal/cql_session_impl.cpp had callbacks firing on a connection which may have gone away. This was pretty noisy under valgrind with the local Cassandra db getting restarted. The solution here is a bit rigid - it just holds the connection indefinitely in a dump list (see the sketch after this list) - but that works for our case since we restart the servers on a weekly basis. If a timeout that could be trusted were available, it could be used to purge the list.
- cql::cql_session_impl_t::connect works on a cql_connections_collection_t* connections, a raw pointer that could become invalidated since it is not held under a lock.
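A hedged sketch of the dump-list idea from the first issue (`connection_dump_list`, `retire`, and `purge_all` are illustrative names, not the actual libcql API):

```cpp
#include <list>
#include <boost/shared_ptr.hpp>
#include <boost/thread/mutex.hpp>

struct cql_connection_t {}; // illustrative stand-in

class connection_dump_list {
public:
    // Park a bad connection instead of destroying it, so any callbacks
    // still in flight dereference live memory rather than a freed object.
    void retire(const boost::shared_ptr<cql_connection_t>& conn) {
        boost::mutex::scoped_lock lock(mutex_);
        retired_.push_back(conn);
    }

    // If a trustworthy timeout existed, it could drive a purge like
    // this to bound the leak; for now the list only grows between
    // (weekly) service restarts.
    void purge_all() {
        boost::mutex::scoped_lock lock(mutex_);
        retired_.clear();
    }

private:
    boost::mutex mutex_;
    std::list<boost::shared_ptr<cql_connection_t> > retired_;
};
```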
Outside of this, we also saw issues when we called wait() on the returned futures. During a rolling restart, that would sometimes lock up. We had to go with wait_for() to be robust in our production environment. We wonder whether that could be addressed in the lib by detecting a timeout.
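A minimal sketch of the bounded-wait pattern, assuming a Boost version where shared_future::wait_for is available (Boost 1.50+ with Boost.Chrono); the 30-second budget is an arbitrary illustration, not a value from the driver:

```cpp
#include <iostream>
#include <boost/chrono.hpp>
#include <boost/thread/future.hpp>

// Wait on a future with a bounded timeout instead of blocking forever.
// Returns true when the result arrived in time.
template <typename T>
bool bounded_wait(boost::shared_future<T>& result) {
    boost::future_status status =
        result.wait_for(boost::chrono::seconds(30));
    if (status != boost::future_status::ready) {
        // Treat a timeout like a failed call: log it and let the
        // caller retry or fail over, rather than hanging the thread
        // the way an unconditional wait() did during rolling restarts.
        std::cerr << "query timed out; skipping wait()" << std::endl;
        return false;
    }
    return true;
}
```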
With these changes, our service, which makes hundreds of Cassandra calls per second, can now survive rolling restarts of our Cassandra db. Of course, it's up to you to decide whether they have any value to DataStax.
Thanks for the lib; using Cassandra through the DataStax C++ client is very helpful.