Skip to content

Inserting binary vectors using SQLalchemy + asyncpg #110

@kamilglod

Description

@kamilglod

Hello, question similar to #84 but solution provided in that issue doesn't work. It looks like for sqlalchemy + asyncpg we need to provide bytes value instead of string.

I have defined table like:

from uuid import UUID
from pgvector.sqlalchemy import BIT
from sqlalchemy import ForeignKey
from sqlalchemy.orm import Mapped, mapped_column


class MediaFileVectors(Base):
    __tablename__ = "media_file_vectors"

    id: Mapped[UUID] = mapped_column(ForeignKey("xxx.id"), primary_key=True)
    original_file_dhash_vh_512: Mapped[BIT | None] = mapped_column(
        BIT(length=512), nullable=True
    )

When I'm doing insert like:

from uuid import uuid4
from pgvector.utils.bit import Bit
from sqlalchemy.dialects.postgresql import insert

dhash_vh_512_vector: list[bool] = [True, ...]
vector_query_ = insert(MediaFileVectors).values(
    {
        "media_file_id": uuid4(),
        "original_file_dhash_vh_512": Bit(dhash_vh_512_vector).to_text(),
    }
)
vector_query = vector_query_.on_conflict_do_update(
    index_elements=[MediaFileVectors.media_file_id],
    set_={
        "original_file_dhash_vh_512": vector_query_.excluded.original_file_dhash_vh_512,
    },
).returning(MediaFileVectors)

since Bit(dhash_vh_512_vector).to_text() gives me string value like 1012... I'm getting:

sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.DataError'>: invalid input for query argument $2: '101010101010101010101010101010101010101... (a bytes-like object is required, not 'str')
[SQL: INSERT INTO media_file_vectors (media_file_id, original_file_dhash_vh_512) VALUES ($1::UUID, $2) ON CONFLICT (media_file_id) DO UPDATE SET original_file_dhash_vh_512 = excluded.original_file_dhash_vh_512 RETURNING media_file_vectors.media_file_id, media_file_vectors.original_file_dhash_vh_512]
[parameters: (UUID('13a6ed84-f4db-46af-bb6f-811cd7fef298'), '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101 ... (214 characters truncated) ... 01010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')]
(Background on this error at: https://sqlalche.me/e/20/dbapi)

/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py:780: DBAPIError

Error suggest that I should pass bytes directly instead. So when changing value from string to Bit(dhash_vh_512_vector).to_binary() I'm getting other error:

sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 544 does not match type bit(512)
[SQL: INSERT INTO media_file_vectors (media_file_id, original_file_dhash_vh_512) VALUES ($1::UUID, $2) ON CONFLICT (media_file_id) DO UPDATE SET original_file_dhash_vh_512 = excluded.original_file_dhash_vh_512 RETURNING media_file_vectors.media_file_id, media_file_vectors.original_file_dhash_vh_512]
[parameters: (UUID('1dd0b94b-654f-4eb3-82dd-5336017564bf'), b'\x00\x00\x02\x00\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa')]
(Background on this error at: https://sqlalche.me/e/20/dbapi)

/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py:780: DBAPIError

After changing BIT vector length to 544 it passes but I'm not sure why I'm getting 544 vector length.

What is the correct way of inserting such value?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions