Thanks for the feedback. I've updated that paragraph to clarify the product placement is for Llama. I don't really think Meta needs our help too much...
But let's take splitting as an example. Does it happen in the Python part or the Postgres part? Is it a feature of the Python SDK or is it a feature of pgml? I couldn't understand this from the docs.
Most managers and organizations miss the fact that engineers are often motivated by solving puzzles like this, because it's fun. If you want to tackle big challenges quickly, make it fun for the engineers. Working long hours on projects like this doesn't burn engineers out; it sharpens their skills, broadens their knowledge and grows the organization's capabilities long term.
This sort of cultural difference, exploring and letting work be fun, is one of the big things that accounts for the difference in velocity between big co's and little co's. Does your work give you energy to the point where not only do you love doing it, you want to tell everyone else about it?
This. And also "shiny new thing" vs. "don't touch it if it works" -- oftentimes it's fun to rewrite some part of a project even if it works OK. As a manager, I just need to limit the blast radius and make sure the refactoring reaches the end rather than being left half-baked.
You can do all of that in a single SQL query: pgml.embed() to generate the embeddings, pgml.train() to fit a custom reranker with XGBoost, and pgml.predict() to estimate a conversion score for each search result based on click-through rate or another objective.
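A rough sketch of those steps, spelled out separately rather than as one combined query; the table, column and model names here are hypothetical stand-ins, not from the docs:

    -- 1. Embed the incoming query ('intfloat/e5-small' is just an example model).
    SELECT pgml.embed('intfloat/e5-small', 'red running shoes') AS query_embedding;

    -- 2. Train an XGBoost reranker on historical search data; assumes a
    --    search_results table whose feature columns predict click_through_rate.
    SELECT * FROM pgml.train(
        project_name  => 'search_reranker',
        task          => 'regression',
        relation_name => 'search_results',
        y_column_name => 'click_through_rate',
        algorithm     => 'xgboost'
    );

    -- 3. Score candidates; the feature array must match the training
    --    columns in order (feature names are illustrative).
    SELECT pgml.predict('search_reranker', ARRAY[result_rank::REAL, title_length::REAL]) AS score
    FROM search_results
    LIMIT 10;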
If you'd like free hosting, feel free to reach out. I'm one of the founders at postgresml.org.
This is an SDK built to interact with PostgresML, which provides ML & AI _inside_ a Postgres database. Clients in this case don't perform inference; the server does. You can run the open source server locally or connect to one running in the cloud.
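As an illustration, any Postgres client can ask the server to run a model without downloading anything itself; a minimal sketch, with "gpt2" as a stand-in model:

    -- Runs on the database server, which hosts the model; the client
    -- only sends SQL over the wire and gets rows back.
    SELECT pgml.transform(
        task   => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
        inputs => ARRAY['PostgresML is']
    ) AS output;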
If anyone would like a free PostgresML T-shirt, we just did our first run. Feel free to email me with your shipping info and size. It'd also be nice to get to know you a bit if your email address isn't obvious.
Quantization allows PostgresML to fit larger models in less RAM, and quantized models run inference significantly faster on NVIDIA, Apple and Intel hardware. Half-precision floating point and quantized versions of your favorite LLMs downloaded from Huggingface are now available.
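In SQL it looks something like this; the model name and the torch_dtype/device_map arguments below are illustrative assumptions, not a spec:

    -- Load the model in half precision to roughly halve its memory
    -- footprint (model name and arguments are examples, not requirements).
    SELECT pgml.transform(
        task => '{
            "task": "text-generation",
            "model": "tiiuae/falcon-7b-instruct",
            "torch_dtype": "bfloat16",
            "device_map": "auto"
        }'::JSONB,
        inputs => ARRAY['What is quantization?'],
        args   => '{"max_new_tokens": 64}'::JSONB
    ) AS output;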
We've been working on a Python SDK[1] for PostgresML to make it easier for application developers to get the performance and scalability benefits of integrated memory for LLMs by combining embedding generation, vector recall and LLM tasks from HuggingFace in a single database query.
This work builds on our previous efforts, which showed a 10x performance improvement from generating the LLM embedding[2] from input text and tuning vector recall[3] in a single process, avoiding excessive network transit.
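Under the hood, this is the kind of single query the SDK composes; a minimal sketch assuming a documents table with a pgvector embedding column (table, column and model names are illustrative):

    -- Embed the query and perform vector recall (pgvector's <=> cosine
    -- distance) in one round trip, so the embedding never crosses the
    -- network. Schema is hypothetical.
    SELECT id, body
    FROM documents
    ORDER BY embedding <=> pgml.embed('intfloat/e5-small', 'how do I shard postgres?')::vector
    LIMIT 5;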
We'd love your feedback on our roadmap[4] for this extension, especially if you have other use cases for an ML application database. So far, we've implemented our best practices for scalable vector storage as a reference implementation for interacting with a Postgres-based ML application database.
Full disclosure: I work on PostgresML, and I'm not a MindsDB expert.
They both do ML in the database, but we've been at least as focused on scalability for Postgres workloads as on ML functionality, with our PgCat project. It's a pooler that provides load balancing, sharding, failover, etc., so application and inference workloads can scale horizontally across large clusters of many machines.
OTOH MindsDB interconnects with just about every data source out there, which you may consider an advantage.