-
Notifications
You must be signed in to change notification settings - Fork 410
apply monkey patch to instance instead of class method to avoid conflicts in mixed usage scenario #772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Taking llama as an example, _apply_liger_kernel_to_instance and revert_liger_kernel_to_llama can restore all environment modules.
@shimizust, @Tcc0403 and @vaibhavjindal pls help review. |
@YangKai0616 Thanks! Great catch! Although there are some parts of monkey patch that are difficult to revert, patching lce_forward to any model instance is indeed something we can fix right now. Can you also apply your solution to all models and add By the way, I'm aware that there are multiple scenarios where |
@Tcc0403 Hi, I have modified the code as per your requirements and fixed some bugs along the way. Please review the code. The attached image shows the results of my test_monkey_patch.py test. Since I need to test the functionality of different libraries in the same process, including liger_kernel. It is inevitable to restore the module configuration of the environment. Thx! |
Thank you for the quick updates!
Would you like to share what bugs you ran into? It could be a reflection of our test deisgn. Are they existing bugs before this PR?
Sorry if I got you confused. I was trying to explain the revert functions in our library has some known limitations against patching class modules, so one should not rely on them. However, patching and reverting model instances should leave NO side effects, as you proposed. Any fixes or suggestions working towards this goal are welcome! Your PR can be a great starting point for us to examine our patching methods along with testings. |
@Tcc0403 Hi, the bug encountered was still caused by changes to the global module configuration. During testing, I modified this line in the code. Without adding this patch, this section would be altered, affecting subsequent tests, specifically I think this bug already existed before this PR was submitted. Additionally, the apply_liger_kernel_to_paligemma also applies this patch, but it was not tested. |
Do you think we need to patch
Noted. Thanks for your share. |
I think this is necessary. My test results show that importlib.reload(torch.nn) does not revert the patch's effect on layer_norm. You can see that the newly loaded model is still LigerLayerNorm. This is the same issue as with the previous forward problem. |
Thanks for the test. Let's handle it in this PR as well. |
Hi,please review the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Hi @vaibhavjindal , pls help approve the execution of workflows. |
@YangKai0616 Fix the coding style with |
@vaibhavjindal and @Tcc0403, Sorry for my oversight. Please review it again. |
@YangKai0616 It seems I'll fix its coding style in another PR. |
Maybe the reason for the error is CI. I didn't change the dev/modal/benchmark.py file in the first CI, but it still went wrong. And I downloaded the code of the main branch, then executed make checkstyle and it reported the same error. |
@YangKai0616 Yes there's our mistake. Thanks for the contribution! |
In some cases, revert_liger_kernel_to_XX can’t restore the replacement of model.forward by monkey_patch. For scenarios involving a passed model instance, modifying the assignment logic of XX_lce_forward can prevent such issues.
Using llama as an example to update the monkey_patch.py. Then revert_liger_kernel_to_llama logic can restore all environment module configurations.
Summary
When I use pytest to test the test_sft_trainer_with_liger and test_train_offloading functions from test_sft_slow.py under the same process, the monkey patch applied by the liger kernel in the first executed function affects the subsequent function's test.
Even though I used the revert_liger_kernel_to_XX logic to restore the module configuration of the environment, I found that model.forward still retains a reference to XX_lce_forward.
When the model passed to apply_liger_kernel_to_XX is an instance, using MethodType to modify only the forward method of the instantiated model can avoid this issue.
I wrote a simple test script to reproduce the bug:
The above code produces the following output:
Obviously,
reload_model: <bound method lce_forward of LlamaForCausalLM(
should not be lce_forward here.Testing Done
After modifying monkey_patch.py, reload_model can correctly output as
reload_model: <bound method LlamaForCausalLM.forward of LlamaForCausalLM(
.XPU or CUDA