16 changes: 14 additions & 2 deletions src/sdam/description/topology.rs

```diff
@@ -653,9 +653,16 @@ impl TopologyDescription {
             || (topology_max_set_version == server_set_version
                 && *topology_max_election_id > server_election_id)
         {
+            // Stale primary.
             self.servers.insert(
                 server_description.address.clone(),
-                ServerDescription::new(&server_description.address),
+                ServerDescription::new_from_error(
+                    server_description.address,
+                    Error::invalid_response(
+                        "primary marked stale due to electionId/setVersion \
+                         mismatch",
+                    ),
+                ),
             );
             self.record_primary_state();
             return Ok(());
```
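The hunk above replaces the blank `ServerDescription::new` with `ServerDescription::new_from_error`. A primary that loses the electionId/setVersion comparison therefore becomes Unknown *and* records an explanatory error. The following self-contained sketch models that comparison with plain Rust tuples. It is not the driver's code, and several things in it are assumptions:

- `ElectionId`, `is_stale_primary`, `oid`, and the trimmed-down `ServerDescription` are hypothetical stand-ins.
- Both electionId and setVersion are assumed to be present. The spec's null handling is omitted.
- It follows the SDAM spec's ordering rule: servers with maxWireVersion >= 17 compare `(electionId, setVersion)`, while older servers compare `(setVersion, electionId)`. The older ordering is the one visible in the hunk's condition.

```rust
/// ObjectId bytes. Byte-wise lexicographic comparison matches ObjectId ordering.
type ElectionId = [u8; 12];

#[derive(Debug, Clone, PartialEq)]
enum ServerType {
    Unknown,
    RsPrimary,
}

/// Hypothetical, trimmed-down stand-in for the driver's `ServerDescription`.
#[derive(Debug, Clone)]
struct ServerDescription {
    address: String,
    server_type: ServerType,
    error: Option<String>,
}

impl ServerDescription {
    /// Same idea as `new_from_error`: an Unknown server that records why it was reset.
    fn new_from_error(address: String, message: &str) -> Self {
        ServerDescription {
            address,
            server_type: ServerType::Unknown,
            error: Some(message.to_string()),
        }
    }
}

/// Hypothetical check: is a primary reporting (set_version, election_id) stale
/// compared with the topology's maximums? Assumes every value is present.
fn is_stale_primary(
    server_max_wire_version: i32,
    max_set_version: i32,
    max_election_id: ElectionId,
    set_version: i32,
    election_id: ElectionId,
) -> bool {
    if server_max_wire_version >= 17 {
        // MongoDB 6.0+ (wire version 17): electionId first, setVersion breaks ties.
        (election_id, set_version) < (max_election_id, max_set_version)
    } else {
        // Older servers: setVersion first, electionId breaks ties.
        (set_version, election_id) < (max_set_version, max_election_id)
    }
}

/// Builds an ElectionId whose last byte is `last`, e.g. oid(2) = 000000000000000000000002.
fn oid(last: u8) -> ElectionId {
    let mut bytes = [0u8; 12];
    bytes[11] = last;
    bytes
}

fn main() {
    // The topology has already seen electionId ...02; server a now reports ...01.
    if is_stale_primary(21, 1, oid(2), 1, oid(1)) {
        let reset = ServerDescription::new_from_error(
            "a:27017".to_string(),
            "primary marked stale due to electionId/setVersion mismatch",
        );
        println!("{:?}", reset);
    }
}
```

Rust compares tuples and arrays lexicographically, so the two spec orderings reduce to a single `<` each.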
```diff
@@ -688,7 +695,12 @@ impl TopologyDescription {
         }
 
         if let ServerType::RsPrimary = self.servers.get(&address).unwrap().server_type {
-            let description = ServerDescription::new(&address);
+            let description = ServerDescription::new_from_error(
+                address.clone(),
+                Error::invalid_response(
+                    "primary marked stale due to discovery of newer primary",
+                ),
+            );
             self.servers.insert(address, description);
         }
     }
```
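The second hunk applies the same treatment when a newer primary is discovered. The previous primary is reset to Unknown with an explanatory error instead of a blank description. The sketch below shows that effect on a plain `HashMap`. `record_new_primary` and the trimmed-down types are hypothetical. The real logic lives in `TopologyDescription`, which also performs the electionId/setVersion checks first.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum ServerType {
    Unknown,
    RsPrimary,
}

/// Hypothetical, trimmed-down stand-in for the driver's `ServerDescription`.
#[derive(Debug, Clone)]
struct ServerDescription {
    server_type: ServerType,
    error: Option<String>,
}

/// Hypothetical helper: record `new_primary` as RsPrimary and reset every other
/// server still marked RsPrimary to Unknown, keeping the reason as an error.
fn record_new_primary(servers: &mut HashMap<String, ServerDescription>, new_primary: &str) {
    for (address, description) in servers.iter_mut() {
        if address != new_primary && description.server_type == ServerType::RsPrimary {
            *description = ServerDescription {
                server_type: ServerType::Unknown,
                error: Some("primary marked stale due to discovery of newer primary".to_string()),
            };
        }
    }
    servers.insert(
        new_primary.to_string(),
        ServerDescription { server_type: ServerType::RsPrimary, error: None },
    );
}

fn main() {
    let mut servers = HashMap::new();
    servers.insert(
        "a:27017".to_string(),
        ServerDescription { server_type: ServerType::RsPrimary, error: None },
    );
    // Server b reports itself as the new primary.
    record_new_primary(&mut servers, "b:27017");
    println!("{:?}", servers["a:27017"]);
}
```

Keeping the reason on the reset description is what lets the updated spec tests assert an `error` substring for `a:27017`.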
239 changes: 239 additions & 0 deletions src/test/spec/json/server-discovery-and-monitoring/README.md
@@ -0,0 +1,239 @@
# Server Discovery And Monitoring Tests

______________________________________________________________________

The YAML and JSON files in this directory tree are platform-independent tests that drivers can use to prove their
conformance to the Server Discovery And Monitoring Spec.

Additional prose tests, which cannot be represented as spec tests, are described below and MUST be implemented.

## Version

Files in the "specifications" repository have no version scheme. They are not tied to a MongoDB server version.

## Format

Each YAML file has the following keys:

- description: A textual description of the test.
- uri: A connection string.
- phases: An array of "phase" objects. A phase of the test optionally sends inputs to the client, then tests the
client's resulting TopologyDescription.

Each phase object has the following keys:

- description: (optional) A textual description of this phase.
- responses: (optional) An array of "response" objects. If not provided, the test runner should construct the client and
perform assertions specified in the outcome object without processing any responses.
- applicationErrors: (optional) An array of "applicationError" objects.
- outcome: An "outcome" object representing the TopologyDescription.

A response is a pair of values:

- The source, for example "a:27017". This is the address the client sent the "hello" or legacy hello command to.
- A hello or legacy hello response, for example `{ok: 1, helloOk: true, isWritablePrimary: true}`. If the response
includes an electionId it is shown in extended JSON like `{"$oid": "000000000000000000000002"}`. The empty response
`{}` indicates a network error when attempting to call "hello" or legacy hello.
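A runner has to tell the two kinds of response apart before handing them to the driver. The sketch below makes two assumptions. First, a parsed response is available as a simple field map. Second, `Document`, `HeartbeatOutcome`, and `classify_response` are hypothetical names that belong to neither the spec nor any driver API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a parsed hello reply (field name -> value text).
type Document = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum HeartbeatOutcome {
    Reply(Document),
    NetworkError,
}

/// Interprets one (address, response) pair from a test phase.
fn classify_response(address: &str, response: Document) -> (String, HeartbeatOutcome) {
    let outcome = if response.is_empty() {
        // `{}` in the test file means the hello call failed with a network error.
        HeartbeatOutcome::NetworkError
    } else {
        HeartbeatOutcome::Reply(response)
    };
    (address.to_string(), outcome)
}

fn main() {
    let mut reply = Document::new();
    reply.insert("ok".to_string(), "1".to_string());
    reply.insert("isWritablePrimary".to_string(), "true".to_string());
    println!("{:?}", classify_response("a:27017", reply));
    // An empty document stands for a network error.
    println!("{:?}", classify_response("b:27017", Document::new()));
}
```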

An "applicationError" object has the following keys:

- address: The source address, for example "a:27017".
- generation: (optional) The error's generation number, for example `1`. When absent this value defaults to the pool's
current generation number.
- maxWireVersion: The `maxWireVersion` of the connection the error occurs on, for example `9`. Added to support testing
the behavior of "not writable primary" errors on \<4.2 and >=4.2 servers.
- when: A string describing when this mock error should occur. Supported values are:
  - "beforeHandshakeCompletes": Simulate this mock error as if it occurred during a new connection's handshake for an
    application operation.
  - "afterHandshakeCompletes": Simulate this mock error as if it occurred on an established connection for an
    application operation (i.e. after the connection pool check out succeeds).
- type: The type of error to mock. Supported values are:
  - "command": A command error. Always accompanied by a "response".
  - "network": A non-timeout network error.
  - "timeout": A network timeout error.
- response: (optional) A command error response, for example `{ok: 0, errmsg: "not primary"}`. Present if and only if
`type` is "command". Note the server only returns "not primary" if the "hello" command has been run on this
connection. Otherwise the legacy error message is returned.
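The `generation` and `maxWireVersion` keys exist to exercise two rules of the SDAM spec's application-error handling:

- Errors from an older pool generation are ignored.
- A "not writable primary" error clears the pool only for pre-4.2 servers (maxWireVersion 7 or lower) or for "node is shutting down" errors.

This sketch encodes just those two rules. The topologyVersion comparison and all other error classes are omitted. `ErrorAction` and `handle_not_primary_error` are hypothetical names.

```rust
/// Hypothetical outcome of handling a "not writable primary" application error.
#[derive(Debug, PartialEq)]
enum ErrorAction {
    /// The error came from an older pool generation and is ignored.
    IgnoreStale,
    /// Mark the server Unknown but keep its connection pool.
    MarkUnknown,
    /// Mark the server Unknown and clear its connection pool.
    MarkUnknownAndClearPool,
}

fn handle_not_primary_error(
    error_generation: Option<u32>,
    pool_generation: u32,
    max_wire_version: i32,
    is_shutdown_error: bool,
) -> ErrorAction {
    // An absent generation defaults to the pool's current generation.
    let generation = error_generation.unwrap_or(pool_generation);
    if generation < pool_generation {
        return ErrorAction::IgnoreStale;
    }
    // maxWireVersion 8 corresponds to MongoDB 4.2.
    if is_shutdown_error || max_wire_version <= 7 {
        ErrorAction::MarkUnknownAndClearPool
    } else {
        ErrorAction::MarkUnknown
    }
}

fn main() {
    // Generation omitted: it defaults to the pool's current generation (1 here).
    println!("{:?}", handle_not_primary_error(None, 1, 9, false));
    println!("{:?}", handle_not_primary_error(None, 1, 7, false));
}
```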

In non-monitoring tests, an "outcome" represents the correct TopologyDescription that results from processing the
responses in the phases so far. It has the following keys:

- topologyType: A string like "ReplicaSetNoPrimary".
- setName: A string with the expected replica set name, or null.
- servers: An object whose keys are addresses like "a:27017", and whose values are "server" objects.
- logicalSessionTimeoutMinutes: null or an integer.
- maxSetVersion: absent or an integer.
- maxElectionId: absent or a BSON ObjectId.
- compatible: absent or a bool.

A "server" object represents a correct ServerDescription within the client's current TopologyDescription. It has the
following keys:

- type: A ServerType name, like "RSSecondary". See [ServerType](../server-discovery-and-monitoring.md#servertype) for
details pertaining to async and multi-threaded drivers.
- error: An optional string that must be a substring of the message on the `ServerDescription.error` object.
- setName: A string with the expected replica set name, or null.
- setVersion: absent or an integer.
- electionId: absent, null, or an ObjectId.
- logicalSessionTimeoutMinutes: absent, null, or an integer.
- minWireVersion: absent or an integer.
- maxWireVersion: absent or an integer.
- topologyVersion: absent, null, or a topologyVersion document.
- pool: (optional) A "pool" object.
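The `error` key is what the updated outcomes in this PR assert. Here is a minimal sketch of that substring check. It assumes that an absent key means the runner makes no assertion about the error. `server_error_matches` is a hypothetical helper.

```rust
/// Hypothetical assertion helper for the optional "error" key of a "server" object.
fn server_error_matches(expected: Option<&str>, actual: Option<&str>) -> bool {
    match expected {
        // Key absent from the test file: nothing to check.
        None => true,
        // Key present: the driver must have recorded an error whose message contains it.
        Some(needle) => actual.map_or(false, |message| message.contains(needle)),
    }
}

fn main() {
    let actual = Some("primary marked stale due to discovery of newer primary");
    println!("{}", server_error_matches(Some("discovery of newer primary"), actual));
}
```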

A "pool" object represents a correct connection pool for a given server. It has the following keys:

- generation: This server's expected pool generation, like `0`.

In monitoring tests, an "outcome" contains a list of SDAM events that should have been published by the client as a
result of processing hello or legacy hello responses in the current phase. Any SDAM events published by the client
during its construction (that is, prior to processing any of the responses) should be combined with the events published
during processing of hello or legacy hello responses of the first phase of the test. A test MAY explicitly verify events
published during client construction by providing an empty responses array for the first phase.

## Use as Unit Tests

### Mocking

Drivers should be able to test their server discovery and monitoring logic without any network I/O, by parsing the hello
(or legacy hello) responses and application errors from the test file and passing them into the driver code. Parts of the
client and monitoring code may need to be mocked or subclassed to achieve this.
[A reference implementation for PyMongo 3.10.1 is available here](https://github.com/mongodb/mongo-python-driver/blob/3.10.1/test/test_discovery_and_monitoring.py).

### Initialization

For each file, create a fresh client object initialized with the file's "uri".

All files in the "single" directory include a connection string with one host and no "replicaSet" option. Set the
client's initial TopologyType to Single, however that is achieved using the client's API. (The spec says "The user MUST
be able to set the initial TopologyType to Single" without specifying how.)

All files in the "sharded" directory include a connection string with multiple hosts and no "replicaSet" option. Set the
client's initial TopologyType to Unknown or Sharded, depending on the client's API.

All files in the "rs" directory include a connection string with a "replicaSet" option. Set the client's initial
TopologyType to ReplicaSetNoPrimary. (For most clients, parsing a connection string with a "replicaSet" option
automatically sets the TopologyType to ReplicaSetNoPrimary.) Some of the files in "rs" are suffixed with "pre-6.0".
These files test the `updateRSFromPrimary` behavior prior to maxWireVersion 17; no special handling should be required
for these tests.
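The directory rules above fit in a small lookup. This is a sketch only. `TopologyType` here is a local enum, and `initial_topology_type` is a hypothetical helper. Mapping "sharded" to Unknown rather than Sharded is one of the two choices the text allows.

```rust
#[derive(Debug, PartialEq)]
enum TopologyType {
    Single,
    Unknown,
    ReplicaSetNoPrimary,
}

/// Hypothetical mapping from a test file's directory to the client's initial TopologyType.
fn initial_topology_type(directory: &str) -> Option<TopologyType> {
    match directory {
        // One host, no replicaSet option: the client must be forced into Single.
        "single" => Some(TopologyType::Single),
        // Multiple hosts, no replicaSet option: Unknown (or Sharded, depending on the API).
        "sharded" => Some(TopologyType::Unknown),
        // A replicaSet option in the URI implies ReplicaSetNoPrimary.
        "rs" => Some(TopologyType::ReplicaSetNoPrimary),
        // Other directories (e.g. "unified") are not run by this unit-test runner.
        _ => None,
    }
}

fn main() {
    for directory in ["single", "sharded", "rs"] {
        println!("{} -> {:?}", directory, initial_topology_type(directory));
    }
}
```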

Set up a listener to collect SDAM events published by the client, including events published during client construction.

### Test Phases

For each phase in the file:

1. Parse the "responses" array. Pass in the responses in order to the driver code. If a response is the empty object
`{}`, simulate a network error.
2. Parse the "applicationErrors" array. For each element, simulate the given error as if it occurred while running an
application operation. Note that it is sufficient to construct a mock error and call the procedure which updates
the topology, e.g. `topology.handleApplicationError(address, generation, maxWireVersion, error)`.

For non-monitoring tests, once all responses are processed, assert that the phase's "outcome" object is equivalent to
the driver's current TopologyDescription.

For monitoring tests, once all responses are processed, assert that the events collected so far by the SDAM event
listener are equivalent to the events specified in the phase.

Some fields such as "logicalSessionTimeoutMinutes", "compatible", and "topologyVersion" were added later and haven't
been added to all test files. If these fields are present, test that they are equivalent to the fields of the driver's
current TopologyDescription or ServerDescription.

For monitoring tests, clear the list of events collected so far.

Continue until all phases have been executed.
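For monitoring tests, the event bookkeeping in these steps can be sketched as follows. The driver is replaced by a hypothetical `Phase::published` list of the events it would emit. The event names are illustrative strings. A real runner would collect events from the driver's SDAM event listener instead.

```rust
/// Hypothetical phase: the events the client publishes while this phase's
/// responses are processed, and the events the test file expects.
struct Phase {
    published: Vec<&'static str>,
    expected: Vec<&'static str>,
}

/// Runs monitoring phases and returns the index of the first phase whose
/// collected events differ from its expected events.
fn run_monitoring_phases(construction_events: &[&str], phases: &[Phase]) -> Result<(), usize> {
    // Events published while the client is constructed stay in the log, so
    // they are compared together with the first phase's events.
    let mut collected: Vec<String> = construction_events.iter().map(|e| e.to_string()).collect();
    for (index, phase) in phases.iter().enumerate() {
        // Stand-in for passing responses and applicationErrors to the driver.
        collected.extend(phase.published.iter().map(|e| e.to_string()));
        if collected != phase.expected {
            return Err(index);
        }
        // Clear the collected events before the next phase.
        collected.clear();
    }
    Ok(())
}

fn main() {
    let phases = vec![
        Phase {
            published: vec!["serverDescriptionChangedEvent"],
            expected: vec!["topologyOpeningEvent", "serverDescriptionChangedEvent"],
        },
        Phase {
            published: vec!["topologyDescriptionChangedEvent"],
            expected: vec!["topologyDescriptionChangedEvent"],
        },
    ];
    println!("{:?}", run_monitoring_phases(&["topologyOpeningEvent"], &phases));
}
```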

## Integration Tests

Integration tests are provided in the "unified" directory and are written in the
[Unified Test Format](../../unified-test-format/unified-test-format.md).

## Prose Tests

The following prose tests cannot be represented as spec tests and MUST be implemented.

### Streaming Protocol Tests

Drivers that implement the streaming protocol (multi-threaded or asynchronous drivers) must implement the following
tests. Each test should be run against a standalone, replica set, and sharded cluster unless otherwise noted.

Some of these cases should already be tested with the old protocol; in that case just verify the test cases succeed with
the new protocol.

1. Configure the client with heartbeatFrequencyMS set to 500, overriding the default of 10000. Assert the client
processes hello and legacy hello replies more frequently (approximately every 500ms).

### RTT Tests

Run the following test(s) on MongoDB 4.4+.

1. Test that RTT is continuously updated.

   1. Create a client with `heartbeatFrequencyMS=500`, `appName=streamingRttTest`, and subscribe to server events.

   2. Run a find command to wait for the server to be discovered.

   3. Sleep for 2 seconds. This must be long enough for multiple heartbeats to succeed.

   4. Assert that each `ServerDescriptionChangedEvent` includes a non-zero RTT.

   5. Configure the following failpoint to block hello or legacy hello commands for 500ms, which should add extra
      latency to each RTT check:

      ```javascript
      db.adminCommand({
          configureFailPoint: "failCommand",
          mode: {times: 1000},
          data: {
              failCommands: ["hello"], // or the legacy hello command
              blockConnection: true,
              blockTimeMS: 500,
              appName: "streamingRttTest",
          },
      });
      ```

   6. Wait for the server's RTT to exceed 250ms. Eventually the average RTT should also exceed 500ms, but we use 250ms
      to speed up the test. Note that the
      [Server Description Equality](../server-discovery-and-monitoring.md#server-description-equality) rule means that
      ServerDescriptionChangedEvents will not be published for RTT-only changes. This test may need to use a
      driver-specific helper to obtain the latest RTT instead. If the RTT does not exceed 250ms after 10 seconds,
      consider the test failed.

   7. Disable the failpoint:

      ```javascript
      db.adminCommand({
          configureFailPoint: "failCommand",
          mode: "off",
      });
      ```
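Step 6 says the average RTT should eventually exceed 500ms, even though the test only waits for 250ms. The sketch below shows how quickly the average crosses 250ms once each check is blocked for roughly 500ms. It assumes the driver keeps an exponentially weighted moving average with alpha = 0.2, the weighting the Server Selection spec prescribes. `update_average_rtt`, `samples_to_exceed`, and the 10ms baseline are illustrative assumptions.

```rust
/// Weight given to the newest sample in the moving average.
const ALPHA: f64 = 0.2;

/// Exponentially weighted moving average of round-trip times in milliseconds.
fn update_average_rtt(previous: Option<f64>, sample_ms: f64) -> f64 {
    match previous {
        // The first sample initializes the average.
        None => sample_ms,
        Some(average) => ALPHA * sample_ms + (1.0 - ALPHA) * average,
    }
}

/// How many consecutive samples of `sample_ms` it takes for an average that
/// starts at `start_ms` to exceed `threshold_ms`, or None if it never will.
fn samples_to_exceed(start_ms: f64, sample_ms: f64, threshold_ms: f64) -> Option<usize> {
    if sample_ms <= threshold_ms {
        // The average only approaches `sample_ms`, so it can never pass the threshold.
        return None;
    }
    let mut average = start_ms;
    let mut count = 0;
    while average <= threshold_ms {
        average = update_average_rtt(Some(average), sample_ms);
        count += 1;
    }
    Some(count)
}

fn main() {
    // Assume a ~10ms baseline RTT and checks blocked for ~500ms by the failpoint.
    println!("{:?}", samples_to_exceed(10.0, 500.0, 250.0));
}
```

Under these assumptions, the average goes 108, 186.4, 249.12, then 299.3ms, so four blocked samples are enough to pass 250ms. A sample of exactly 500ms would only approach 500ms. The real blocked checks take slightly longer than `blockTimeMS`, which is why the average can eventually exceed it.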

### Heartbeat Tests

1. Test that `ServerHeartbeatStartedEvent` is emitted before the monitoring socket is created.

   1. Create a mock TCP server (example shown below) that pushes a `client connected` event to a shared array when a
      client connects and a `client hello received` event when the server receives the client hello and then closes
      the connection:

      ```javascript
      const { createServer } = require('net');

      const events = [];
      const server = createServer(clientSocket => {
          events.push('client connected');

          clientSocket.on('data', () => {
              events.push('client hello received');
              clientSocket.destroy();
          });
      });
      server.listen(9999);
      ```

   2. Create a client with `serverSelectionTimeoutMS: 500` and listen to `ServerHeartbeatStartedEvent` and
      `ServerHeartbeatFailedEvent`, pushing the event name to the same shared array as the mock TCP server.

   3. Attempt to connect the client to the previously created TCP server, catching the error when the client fails to
      connect.

   4. Assert that the first four elements in the array are:

      ```javascript
      ['serverHeartbeatStartedEvent', 'client connected', 'client hello received', 'serverHeartbeatFailedEvent']
      ```
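The same ordering check can be expressed with only the Rust standard library. This is a sketch under several assumptions:

- A hand-written client stands in for the driver's monitor. A real test would subscribe to the driver's heartbeat events instead.
- The bytes written are a placeholder, not a real hello command.
- The server binds to an OS-assigned port rather than 9999.
- `run_heartbeat_scenario` is a hypothetical name.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs the mock-server scenario and returns the shared event log.
fn run_heartbeat_scenario() -> std::io::Result<Vec<String>> {
    let events = Arc::new(Mutex::new(Vec::<String>::new()));

    // Mock TCP server: record the connection, record the first bytes received,
    // then close the socket without replying (like the JavaScript example).
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: let the OS pick
    let address = listener.local_addr()?;
    let server_events = Arc::clone(&events);
    let server = thread::spawn(move || {
        if let Ok((mut socket, _)) = listener.accept() {
            server_events.lock().unwrap().push("client connected".to_string());
            let mut buffer = [0u8; 1024];
            if let Ok(n) = socket.read(&mut buffer) {
                if n > 0 {
                    server_events.lock().unwrap().push("client hello received".to_string());
                }
            }
            // Dropping `socket` here closes the connection.
        }
    });

    // Stand-in for the driver's monitor: the started event is recorded
    // before the monitoring socket is opened.
    events.lock().unwrap().push("serverHeartbeatStartedEvent".to_string());
    let mut stream = TcpStream::connect(address)?;
    stream.write_all(b"hello")?; // placeholder bytes, not a real hello command
    let mut reply = [0u8; 1024];
    // The server closes without replying, so this read reports EOF or an error.
    let _ = stream.read(&mut reply);
    events.lock().unwrap().push("serverHeartbeatFailedEvent".to_string());

    server.join().expect("mock server thread panicked");
    let log = events.lock().unwrap().clone();
    Ok(log)
}

fn main() -> std::io::Result<()> {
    println!("{:?}", run_heartbeat_scenario()?);
    Ok(())
}
```

The order is deterministic for two reasons. The started event is pushed before `connect`, so the server cannot accept first. The client's `read` only returns once the server closes the socket, which happens after both server-side events are recorded.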
```diff
@@ -16,7 +16,7 @@
              "b:27017"
            ],
            "minWireVersion": 0,
-           "maxWireVersion": 6
+           "maxWireVersion": 21
          }
        ]
      ],
@@ -50,15 +50,16 @@
              "b:27017"
            ],
            "minWireVersion": 0,
-           "maxWireVersion": 6
+           "maxWireVersion": 21
          }
        ]
      ],
      "outcome": {
        "servers": {
          "a:27017": {
            "type": "Unknown",
-           "setName": null
+           "setName": null,
+           "error": "primary marked stale due to discovery of newer primary"
          },
          "b:27017": {
            "type": "RSPrimary",
```
```diff
@@ -15,7 +15,7 @@ phases: [
                     setName: "rs",
                     hosts: ["a:27017", "b:27017"],
                     minWireVersion: 0,
-                    maxWireVersion: 6
+                    maxWireVersion: 21
                 }]
         ],
 
@@ -52,7 +52,7 @@ phases: [
                     setName: "rs",
                     hosts: ["a:27017", "b:27017"],
                     minWireVersion: 0,
-                    maxWireVersion: 6
+                    maxWireVersion: 21
                 }]
         ],
 
@@ -63,7 +63,8 @@ phases: [
                 "a:27017": {
 
                     type: "Unknown",
-                    setName:
+                    setName:,
+                    error: "primary marked stale due to discovery of newer primary"
                 },
 
                 "b:27017": {
```
```diff
@@ -20,7 +20,7 @@
              "$oid": "000000000000000000000001"
            },
            "minWireVersion": 0,
-           "maxWireVersion": 6
+           "maxWireVersion": 21
          }
        ]
      ],
@@ -67,7 +67,7 @@
              "$oid": "000000000000000000000002"
            },
            "minWireVersion": 0,
-           "maxWireVersion": 6
+           "maxWireVersion": 21
          }
        ]
      ],
@@ -76,7 +76,8 @@
          "a:27017": {
            "type": "Unknown",
            "setName": null,
-           "electionId": null
+           "electionId": null,
+           "error": "primary marked stale due to discovery of newer primary"
          },
          "b:27017": {
            "type": "RSPrimary",
@@ -114,7 +115,7 @@
              "$oid": "000000000000000000000001"
            },
            "minWireVersion": 0,
-           "maxWireVersion": 6
+           "maxWireVersion": 21
          }
        ]
      ],
@@ -123,7 +124,8 @@
          "a:27017": {
            "type": "Unknown",
            "setName": null,
-           "electionId": null
+           "electionId": null,
+           "error": "primary marked stale due to electionId/setVersion mismatch"
          },
          "b:27017": {
            "type": "RSPrimary",
```