-
Notifications
You must be signed in to change notification settings - Fork 86
Closed
Description
Hello, question similar to #84 but solution provided in that issue doesn't work. It looks like for sqlalchemy + asyncpg we need to provide bytes value instead of string.
I have defined table like:
from uuid import UUID
from pgvector.sqlalchemy import BIT
from sqlalchemy import ForeignKey
from sqlalchemy.orm import Mapped, mapped_column
class MediaFileVectors(Base):
__tablename__ = "media_file_vectors"
id: Mapped[UUID] = mapped_column(ForeignKey("xxx.id"), primary_key=True)
original_file_dhash_vh_512: Mapped[BIT | None] = mapped_column(
BIT(length=512), nullable=True
)
When I'm doing insert like:
from uuid import uuid4
from pgvector.utils.bit import Bit
from sqlalchemy.dialects.postgresql import insert
dhash_vh_512_vector: list[bool] = [True, ...]
vector_query_ = insert(MediaFileVectors).values(
{
"media_file_id": uuid4(),
"original_file_dhash_vh_512": Bit(dhash_vh_512_vector).to_text(),
}
)
vector_query = vector_query_.on_conflict_do_update(
index_elements=[MediaFileVectors.media_file_id],
set_={
"original_file_dhash_vh_512": vector_query_.excluded.original_file_dhash_vh_512,
},
).returning(MediaFileVectors)
since Bit(dhash_vh_512_vector).to_text()
gives me string value like 1012...
I'm getting:
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.DataError'>: invalid input for query argument $2: '101010101010101010101010101010101010101... (a bytes-like object is required, not 'str')
[SQL: INSERT INTO media_file_vectors (media_file_id, original_file_dhash_vh_512) VALUES ($1::UUID, $2) ON CONFLICT (media_file_id) DO UPDATE SET original_file_dhash_vh_512 = excluded.original_file_dhash_vh_512 RETURNING media_file_vectors.media_file_id, media_file_vectors.original_file_dhash_vh_512]
[parameters: (UUID('13a6ed84-f4db-46af-bb6f-811cd7fef298'), '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101 ... (214 characters truncated) ... 01010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')]
(Background on this error at: https://sqlalche.me/e/20/dbapi)
/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py:780: DBAPIError
Error suggest that I should pass bytes directly instead. So when changing value from string to Bit(dhash_vh_512_vector).to_binary()
I'm getting other error:
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 544 does not match type bit(512)
[SQL: INSERT INTO media_file_vectors (media_file_id, original_file_dhash_vh_512) VALUES ($1::UUID, $2) ON CONFLICT (media_file_id) DO UPDATE SET original_file_dhash_vh_512 = excluded.original_file_dhash_vh_512 RETURNING media_file_vectors.media_file_id, media_file_vectors.original_file_dhash_vh_512]
[parameters: (UUID('1dd0b94b-654f-4eb3-82dd-5336017564bf'), b'\x00\x00\x02\x00\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa')]
(Background on this error at: https://sqlalche.me/e/20/dbapi)
/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py:780: DBAPIError
After changing BIT vector length to 544 it passes but I'm not sure why I'm getting 544 vector length.
What is the correct way of inserting such value?
Metadata
Metadata
Assignees
Labels
No labels