Open
Description
Hello,
I detected this issue while running parallel unit tests driven by separate processes. Constructing multiple encoders in parallel causes a race condition on the downloaded tokenizer file. I believe a file lock may be needed in `public_encodings.rs`, and the file should not be re-downloaded if it is already cached, since the repeated download is a source of latency.
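To illustrate the locking and caching behaviour described above, here is a minimal sketch in Python rather than Rust. It is not the library's code: `ensure_cached` and the `fetch` callback are hypothetical names, and it assumes a POSIX system where `fcntl.flock` is available. The idea is to take an exclusive lock on a sidecar `.lock` file, re-check the cache once the lock is held, and publish the download with an atomic rename so readers never see a partial file.

```python
import fcntl
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(path: Path, fetch: Callable[[], bytes]) -> Path:
    """Return `path`, calling `fetch` only if the file is not cached yet.

    Hypothetical helper: `fetch` stands in for whatever downloads the
    tokenizer file. POSIX only, because it relies on fcntl.flock.
    """
    # Fast path: the file only ever appears via an atomic rename below,
    # so if it exists it is complete and no lock is needed.
    if path.exists():
        return path

    path.parent.mkdir(parents=True, exist_ok=True)
    lock_path = path.with_name(path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-check under the lock: another process may have finished
            # the download while this one was waiting.
            if not path.exists():
                fd, tmp_name = tempfile.mkstemp(dir=path.parent)
                with os.fdopen(fd, "wb") as tmp:
                    tmp.write(fetch())
                # Atomic when source and target are on the same filesystem.
                os.replace(tmp_name, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return path
```

With this pattern, processes that lose the race wait on the lock and then take the cached copy instead of downloading again. A Rust version in `public_encodings.rs` would presumably follow the same shape, but that is an assumption on my part.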
Additionally, I have a request. Some of our systems have no internet access, so we need to pass the tokenizer file in to the Encoding rather than have it downloaded. Would it be possible to surface a Python API for this purpose? `load_harmony_encoding` would need an overload such as `load_harmony_encoding_from_file`.
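As a rough illustration of what a from-file loader would need to do first, the sketch below reads a vocabulary from a local path with no network access. It assumes the tokenizer file uses the tiktoken text format (one `base64(token) rank` pair per line), which I have not verified for this library; `read_tiktoken_ranks` is a hypothetical helper, not an existing API.

```python
import base64
from pathlib import Path


def read_tiktoken_ranks(path: Path) -> dict[bytes, int]:
    """Parse a local vocabulary file into a token -> rank mapping.

    Assumption: each non-empty line is "<base64 token> <integer rank>",
    as in tiktoken's .tiktoken files.
    """
    ranks: dict[bytes, int] = {}
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            token_b64, rank = line.split()
            ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks
```

A proposed `load_harmony_encoding_from_file` could accept such a path and build the encoding from the parsed ranks, skipping the download step entirely.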
Thank you.