[CI] Add deepep tests to CI #7872
Merged
32 commits (all by ch-wan):
dd5f880 Add deepep tests to CI
6ee70f7 update
2a2020b fix
574b4cf update
c616f38 move all tests to 8-gpu-runner
0e18a10 update install script
dbe5e17 update install script
5ce2c06 update install script
4395325 fix
b3b89fe update
cfff6b1 fix
aca4af9 fix
5cbcad4 fix
45dd430 update BLOCK_D for ci
1ccdddc fix cuda graph max bs
ed3d555 Merge commit 'b6b6268ccf1d992cac417b995fd250281c17912a' into cheng/ci…
9112f1f fix
f3ab595 try 4 gpu and fix eagle
af4b52e fix
1a09e0c add large test
e0103ae fix
f3c2b84 recover original tests
ace18b3 fix name
4e92d4d update dependency
6b5442b update dependency
311fc26 format
f85c702 recover format
f76e0d1 fix dependency
ebd4aba fix
f1b659a Merge branch 'main' into cheng/ci/deepep-test
0598f36 remove throughput check
755611e Apply suggestions from code review
New file (`@@ -0,0 +1,75 @@`):

```bash
#!/bin/bash
# Install DeepEP and its dependencies (GDRCopy, NVSHMEM) in CI.
set -euxo pipefail

bash scripts/ci_install_dependency.sh

if python3 -c "import deep_ep" >/dev/null 2>&1; then
    echo "deep_ep is already installed or importable. Skipping installation."
    exit 0
fi
```
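The early-exit guard treats DeepEP as installed only if `import deep_ep` actually succeeds, which also exercises loading its compiled extension, not just locating the package. A minimal Python sketch of the same check (the helper name `is_importable` is hypothetical, not part of the script):

```python
import importlib


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can actually be imported.

    A real import (rather than importlib.util.find_spec) also surfaces
    failures raised while loading compiled extensions. Catching Exception,
    not only ImportError, mirrors the shell check, which treats any
    non-zero exit of `python3 -c "import deep_ep"` as "not installed".
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        return False
    return True
```

For example, `is_importable("deep_ep")` would return False on a runner where the package is present but a shared library it links against cannot be loaded, so the script would proceed to (re)install it.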
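```bash
export GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/
export NVSHMEM_DIR=/opt/nvshmem/install
# ${VAR:+:$VAR} appends ":$VAR" only when VAR is set and non-empty, so `set -u`
# does not abort on an unset LD_LIBRARY_PATH and no trailing ":" is left behind.
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH="${NVSHMEM_DIR}/bin:$PATH"
export CUDA_HOME=/usr/local/cuda
```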
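Under `set -u`, expanding an unset `$LD_LIBRARY_PATH` aborts the script, and an empty value leaves a trailing `:`, an empty search-path component that the dynamic loader generally interprets as the current directory. A small sketch of a safe prepend, assuming Linux-style `:`-separated paths (the helper `prepend_path` is hypothetical):

```python
from typing import Optional


def prepend_path(entry: str, current: Optional[str]) -> str:
    """Prepend `entry` to a ':'-separated search path.

    If `current` is unset (None) or empty, return `entry` alone so that no
    empty path component is introduced.
    """
    if not current:
        return entry
    return f"{entry}:{current}"
```

Usage would look like `os.environ["LD_LIBRARY_PATH"] = prepend_path("/opt/nvshmem/install/lib", os.environ.get("LD_LIBRARY_PATH"))`.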
```bash
# Install system dependencies (refresh the package index first)
apt-get update
apt-get install -y curl wget git sudo libibverbs-dev rdma-core infiniband-diags openssh-server perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1 build-essential cmake

# Install GDRCopy v2.4.4 by building its .deb packages from source
rm -rf /opt/gdrcopy && mkdir -p /opt/gdrcopy
mkdir -p /opt/nvshmem
cd /opt/gdrcopy
git clone https://github.com/NVIDIA/gdrcopy.git .
git checkout v2.4.4
apt-get install -y nvidia-dkms-535
apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
apt-get install -y check libsubunit0 libsubunit-dev
cd packages
CUDA=/usr/local/cuda ./build-deb-packages.sh
dpkg -i gdrdrv-dkms_*.deb
dpkg -i libgdrapi_*.deb
dpkg -i gdrcopy-tests_*.deb
dpkg -i gdrcopy_*.deb
```
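The four `dpkg -i` globs rely on the underscore after each package name: `gdrcopy_*.deb` does not also match `gdrcopy-tests_*.deb`, so each command installs one package in the intended order. The sketch below checks that with Python's `fnmatch`, which follows the same glob rules as the shell for these patterns; the filenames in the test are hypothetical, since the exact names produced by `build-deb-packages.sh` are not shown here.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Install order used by the script; each pattern is passed to `dpkg -i`.
GDRCOPY_DEB_PATTERNS = [
    "gdrdrv-dkms_*.deb",
    "libgdrapi_*.deb",
    "gdrcopy-tests_*.deb",
    "gdrcopy_*.deb",
]


def match_debs(filenames: List[str], patterns: List[str]) -> Dict[str, List[str]]:
    """Map each glob pattern to the sorted list of filenames it matches."""
    return {p: sorted(f for f in filenames if fnmatch(f, p)) for p in patterns}
```

If any pattern matched more than one file (for example, leftover packages from an earlier build), `dpkg -i` would receive all of them, which is one reason the script starts from a fresh `/opt/gdrcopy`.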
```bash
# Provide the unversioned libmlx5.so name if only libmlx5.so.1 is present
if [ ! -e "/usr/lib/x86_64-linux-gnu/libmlx5.so" ]; then
    ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
fi
apt-get update && apt-get install -y libfabric-dev
```
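`[ -e path ]` follows symlinks, so a dangling `libmlx5.so` link makes the test succeed, and the subsequent `ln -s` then fails because the link name already exists (aborting the script under `set -e`). A sketch of an idempotent variant that also repairs a dangling link (the helper `ensure_symlink` is hypothetical):

```python
import os


def ensure_symlink(target: str, link_path: str) -> bool:
    """Make `link_path` point to `target` unless it already resolves.

    Returns True if a new link was created.
    """
    if os.path.exists(link_path):  # follows links, like `[ -e ]`
        return False
    if os.path.islink(link_path):  # dangling link: `ln -s` would fail here
        os.remove(link_path)
    os.symlink(target, link_path)
    return True
```

For the script's case this would be called as `ensure_symlink("/usr/lib/x86_64-linux-gnu/libmlx5.so.1", "/usr/lib/x86_64-linux-gnu/libmlx5.so")`.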
```bash
# Clone DeepEP at a pinned commit
rm -rf /root/.cache/deepep
git clone https://github.com/deepseek-ai/DeepEP.git /root/.cache/deepep
cd /root/.cache/deepep && git checkout eef7ab50fa5cf0ab1dd3fce4c6493c90bdf290ac
```
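DeepEP is checked out by a full 40-character commit SHA, which always names the same commit, whereas GDRCopy is checked out by the tag `v2.4.4`, and tags can in principle be moved. A tiny sketch of a guard a CI script could use to insist on full SHAs, assuming a SHA-1 repository (the helper `is_pinned_commit` is hypothetical):

```python
import re

# A full SHA-1 commit id: exactly 40 lowercase hex digits.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_commit(ref: str) -> bool:
    """True if `ref` is a full commit SHA rather than a branch, tag, or short id."""
    return _FULL_SHA.fullmatch(ref) is not None
```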
```bash
# Install NVSHMEM 3.2.5 from source
cd /opt/nvshmem
wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.2.5/source/nvshmem_src_3.2.5-1.txz
tar -xf nvshmem_src_3.2.5-1.txz
# `rm -rf` tolerates both a missing nvshmem directory (first run) and an existing one (re-run)
rm -rf nvshmem && mv nvshmem_src nvshmem
```
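A plain `rm nvshmem` fails under `set -e` both when the directory does not exist yet and when it does (plain `rm` refuses to remove directories). A Python sketch of the "discard the old tree, move the new one into place" step, assuming `dst` is a directory on the same filesystem as `src` (the helper `replace_dir` is hypothetical):

```python
import os
import shutil


def replace_dir(src: str, dst: str) -> None:
    """Move directory `src` to `dst`, discarding any previous `dst`.

    Equivalent to `rm -rf dst && mv src dst` for directories: it works
    whether `dst` is absent (first run) or populated (re-run).
    """
    shutil.rmtree(dst, ignore_errors=True)
    os.rename(src, dst)
```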
```bash
# Apply DeepEP's NVSHMEM patch, then configure an IBGDA- and GDRCopy-enabled build for sm_90
cd nvshmem
git apply /root/.cache/deepep/third-party/nvshmem.patch
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/opt/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90
cd build
make -j"$(nproc)" install
```
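The `VAR=value \ ... cmake` form places the `NVSHMEM_*` switches only in the environment of that single `cmake` command: IBGDA and GDRCopy are turned on, the other listed features are turned off, and `-DCMAKE_CUDA_ARCHITECTURES=90` targets compute capability 9.0 (Hopper). A sketch that assembles the same environment and argument list without running anything (the names `NVSHMEM_BUILD_ENV` and `nvshmem_configure_command` are hypothetical):

```python
import os
from typing import Dict, List, Tuple

# Feature switches the script sets for the NVSHMEM configure step.
NVSHMEM_BUILD_ENV = {
    "NVSHMEM_SHMEM_SUPPORT": "0",
    "NVSHMEM_UCX_SUPPORT": "0",
    "NVSHMEM_USE_NCCL": "0",
    "NVSHMEM_MPI_SUPPORT": "0",
    "NVSHMEM_IBGDA_SUPPORT": "1",
    "NVSHMEM_PMIX_SUPPORT": "0",
    "NVSHMEM_TIMEOUT_DEVICE_POLLING": "0",
    "NVSHMEM_USE_GDRCOPY": "1",
}


def nvshmem_configure_command(
    install_prefix: str = "/opt/nvshmem/install", cuda_arch: str = "90"
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) for the configure step without executing it."""
    env = {**os.environ, **NVSHMEM_BUILD_ENV}  # inherit PATH etc., add switches
    argv = [
        "cmake", "-S", ".", "-B", "build/",
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
        f"-DCMAKE_CUDA_ARCHITECTURES={cuda_arch}",
    ]
    return env, argv
```

The result could be passed to `subprocess.run(argv, env=env, cwd="/opt/nvshmem/nvshmem", check=True)`; keeping the switches in one dict makes the enabled feature set easy to diff between CI configurations.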
```bash
# Install DeepEP
cd /root/.cache/deepep && git checkout eef7ab50fa5cf0ab1dd3fce4c6493c90bdf290ac && python3 setup.py install
```
```bash
# Verify configuration
echo "=== NCCL Configuration ==="
nvidia-smi topo -m
nvidia-smi nvlink -s
echo "=== Verify GDRCOPY ==="
gdrcopy_copybw
echo "=== Verify NVSHMEM ==="
nvshmem-info -a
```
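Because of `set -e`, the verification section stops at the first failing command, so a failing `gdrcopy_copybw` hides the result of the NVSHMEM check. Failing fast is arguably right for a CI job; for diagnosing a broken runner, a sketch that runs every check and reports all exit codes could look like this (`VERIFY_COMMANDS` and `run_checks` are hypothetical names):

```python
import subprocess
from typing import Dict, List, Optional

# The same checks as the script's verification section.
VERIFY_COMMANDS: Dict[str, List[str]] = {
    "gpu_topology": ["nvidia-smi", "topo", "-m"],
    "nvlink_status": ["nvidia-smi", "nvlink", "-s"],
    "gdrcopy": ["gdrcopy_copybw"],
    "nvshmem": ["nvshmem-info", "-a"],
}


def run_checks(commands: Dict[str, List[str]]) -> Dict[str, Optional[int]]:
    """Run every command, recording its exit code instead of stopping early.

    Output goes straight to the console; a missing executable is recorded
    as None.
    """
    results: Dict[str, Optional[int]] = {}
    for name, argv in commands.items():
        try:
            results[name] = subprocess.run(argv, check=False).returncode
        except FileNotFoundError:
            results[name] = None
    return results
```

On a GPU runner, `run_checks(VERIFY_COMMANDS)` would return one entry per check, for example showing that only `nvshmem` failed.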
New file (`@@ -0,0 +1,146 @@`):

```python
import unittest
from types import SimpleNamespace

import requests

from sglang.srt.utils import kill_process_tree
from sglang.test.few_shot_gsm8k import run_eval as run_eval_few_shot_gsm8k
from sglang.test.test_utils import (
    DEFAULT_DEEPPEP_MODEL_NAME_FOR_TEST,
    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    DEFAULT_URL_FOR_TEST,
    CustomTestCase,
    popen_launch_server,
)


class TestDeepseek(CustomTestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_DEEPPEP_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
            other_args=[
                "--trust-remote-code",
                "--tp",
                "8",
                "--enable-dp-attention",
                "--dp",
                "8",
                "--moe-dense-tp-size",
                "1",
                "--enable-dp-lm-head",
                "--enable-deepep-moe",
                "--enable-two-batch-overlap",
                "--ep-num-redundant-experts",
                "32",
                "--ep-dispatch-algorithm",
                "dynamic",
                "--eplb-algorithm",
                "deepseek",
                "--cuda-graph-bs",
                "256",
                "--max-running-requests",
                "2048",
            ],
        )

    @classmethod
    def tearDownClass(cls):
        kill_process_tree(cls.process.pid)

    def test_gsm8k(self):
        args = SimpleNamespace(
            num_shots=8,
            data_path=None,
            num_questions=1250,
            parallel=1250,
            max_new_tokens=512,
            host="http://127.0.0.1",
            port=int(self.base_url.split(":")[-1]),
        )
        metrics = run_eval_few_shot_gsm8k(args)
        print(f"Eval accuracy of GSM8K: {metrics=}")

        self.assertGreater(metrics["accuracy"], 0.93)
        self.assertGreater(metrics["output_throughput"], 3800)
```
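The GSM8K tests hard-code the host and recover the port with `int(self.base_url.split(":")[-1])`, which breaks if the URL ever gains a trailing slash or path (`"8000/"` is not an integer). A sketch using the standard library's URL parser instead; the helper `host_and_port` is hypothetical, and port 8000 in the test is only an example value, not the actual `DEFAULT_URL_FOR_TEST`:

```python
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(base_url: str) -> Tuple[str, int]:
    """Split e.g. "http://127.0.0.1:8000" into ("http://127.0.0.1", 8000).

    The host keeps its scheme because that is the form the eval args use.
    """
    parts = urlsplit(base_url)
    if parts.port is None:
        raise ValueError(f"no explicit port in {base_url!r}")
    return f"{parts.scheme}://{parts.hostname}", parts.port
```

Inside a test this would read `host, port = host_and_port(self.base_url)`, deriving both values from the same URL the server was launched with.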
```python
class TestDeepseekMTP(CustomTestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_DEEPPEP_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
            other_args=[
                "--trust-remote-code",
                "--tp",
                "8",
                "--enable-dp-attention",
                "--dp",
                "8",
                "--moe-dense-tp-size",
                "1",
                "--enable-dp-lm-head",
                "--enable-deepep-moe",
                "--enable-two-batch-overlap",
                "--ep-num-redundant-experts",
                "32",
                "--ep-dispatch-algorithm",
                "dynamic",
                "--eplb-algorithm",
                "deepseek",
                "--cuda-graph-bs",
                "64",  # TODO: increase it to 128 when TBO is supported in draft_extend
                "--max-running-requests",
                "512",
                "--speculative-algorithm",
                "NEXTN",
                "--speculative-num-steps",
                "1",
                "--speculative-eagle-topk",
                "1",
                "--speculative-num-draft-tokens",
                "2",
            ],
        )

    @classmethod
    def tearDownClass(cls):
        kill_process_tree(cls.process.pid)

    def test_gsm8k(self):
        args = SimpleNamespace(
            num_shots=8,
            data_path=None,
            num_questions=1250,
            parallel=1250,
            max_new_tokens=512,
            host="http://127.0.0.1",
            port=int(self.base_url.split(":")[-1]),
        )
        metrics = run_eval_few_shot_gsm8k(args)
        print(f"Eval accuracy of GSM8K: {metrics=}")

        self.assertGreater(metrics["accuracy"], 0.93)

        server_info = requests.get(self.base_url + "/get_server_info")
        avg_spec_accept_length = server_info.json()["internal_states"][0][
            "avg_spec_accept_length"
        ]
        print(
            f"###test_gsm8k:\n"
            f"accuracy={metrics['accuracy']:.3f}\n"
            f"{avg_spec_accept_length=:.3f}\n"
        )
        self.assertGreater(avg_spec_accept_length, 1.9)


if __name__ == "__main__":
    unittest.main()
```
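`TestDeepseek` and `TestDeepseekMTP` pass identical DeepEP/EPLB flags and differ only in CUDA-graph batch size, request cap, and the speculative-decoding flags. A possible refactor sketch that builds both `other_args` lists from one definition; the names `COMMON_DEEPEP_ARGS`, `MTP_ARGS`, and `deepep_server_args` are hypothetical and not part of the test file:

```python
from typing import List, Sequence

# Flags shared verbatim by both test classes.
COMMON_DEEPEP_ARGS = [
    "--trust-remote-code",
    "--tp", "8",
    "--enable-dp-attention",
    "--dp", "8",
    "--moe-dense-tp-size", "1",
    "--enable-dp-lm-head",
    "--enable-deepep-moe",
    "--enable-two-batch-overlap",
    "--ep-num-redundant-experts", "32",
    "--ep-dispatch-algorithm", "dynamic",
    "--eplb-algorithm", "deepseek",
]

# Extra flags used only by the MTP (NEXTN speculative decoding) test.
MTP_ARGS = [
    "--speculative-algorithm", "NEXTN",
    "--speculative-num-steps", "1",
    "--speculative-eagle-topk", "1",
    "--speculative-num-draft-tokens", "2",
]


def deepep_server_args(
    cuda_graph_bs: int, max_running_requests: int, extra: Sequence[str] = ()
) -> List[str]:
    """Build the `other_args` list for popen_launch_server."""
    return [
        *COMMON_DEEPEP_ARGS,
        "--cuda-graph-bs", str(cuda_graph_bs),
        "--max-running-requests", str(max_running_requests),
        *extra,
    ]
```

With this, the two classes would pass `other_args=deepep_server_args(256, 2048)` and `other_args=deepep_server_args(64, 512, MTP_ARGS)`, so a change to the shared DeepEP configuration only has to be made once.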
Review comment: Consider defining `BLOCK_D` based on `is_in_ci()` as a constant at the file level to improve readability and maintainability.
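A sketch of what the suggestion could look like. The file that uses `BLOCK_D` is not shown, so everything here is hypothetical: the stand-in `is_in_ci()` reads an assumed `SGLANG_IS_IN_CI` environment variable rather than importing the project's helper, and the two block sizes are placeholder values, not the ones from the PR.

```python
import os


def is_in_ci() -> bool:
    """Hypothetical stand-in for the project's is_in_ci() helper."""
    return os.environ.get("SGLANG_IS_IN_CI", "").lower() in ("1", "true")


# Placeholder values: the actual BLOCK_D sizes are not shown in this diff.
BLOCK_D_CI = 64
BLOCK_D_DEFAULT = 128


def select_block_d(in_ci: bool) -> int:
    """Pick the block size for CI runners versus regular runs."""
    return BLOCK_D_CI if in_ci else BLOCK_D_DEFAULT


# Resolved once at import time, as a file-level constant.
BLOCK_D = select_block_d(is_in_ci())
```

Keeping the decision in one named constant means call sites just use `BLOCK_D`, and the CI-specific choice is visible at the top of the file instead of being repeated inline.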