I prize local execution for confidentiality and trust reasons, but this mindset seems to put me in a distinct minority. I've just accepted that I will end up having to build my own solutions from time to time.
That said, 'local' can mean a number of different things. On-device local, LAN local, intranet local... You get the idea. I chose to go with an approach of: 'assume resources are constrained and build for that'.
The result was a local-first agentic system (https://github.com/dibrale/Regions) that uses explicit resource sharing and execution patterns to make use of arbitrarily distributed compute. That way, local can be whatever I want it to be, so long as there's an endpoint.
Existing agentic LLM frameworks didn't quite give me the mix of parallelization, modularity and control I wanted, so I decided to write my own. The structure I came up with is good for having many small LLM runners and more deterministic components interact with one another in a deliberate, resource-managed way using concurrency groups.
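To give a rough idea of what I mean by a concurrency group, here is a minimal sketch using plain asyncio. It is not the actual Regions API: `ConcurrencyGroup`, `fake_llm_call`, the group name and the limit of 2 are all made up for illustration. The idea is simply that every task sharing a constrained resource goes through the same group, and the group caps how many of them run at once.

```python
import asyncio


class ConcurrencyGroup:
    """Hypothetical helper: caps how many tasks sharing one resource run at once."""

    def __init__(self, name, limit):
        self.name = name
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # tasks currently holding a slot
        self.peak = 0    # highest number of simultaneous tasks observed

    async def run(self, coro_fn, *args):
        # Wait for a free slot, run the coroutine, then release the slot.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await coro_fn(*args)
            finally:
                self.active -= 1


async def fake_llm_call(prompt):
    # Stand-in for a request to a small LLM runner behind some endpoint.
    await asyncio.sleep(0.01)
    return f"reply to {prompt}"


async def main():
    # Assumption: this endpoint can only serve two requests at a time.
    small_gpu = ConcurrencyGroup("small_gpu", limit=2)
    prompts = [f"q{i}" for i in range(6)]
    replies = await asyncio.gather(
        *(small_gpu.run(fake_llm_call, p) for p in prompts)
    )
    return replies, small_gpu.peak


replies, peak = asyncio.run(main())
print(peak, replies)
```

All six calls are submitted together, but no more than two are ever in flight on the shared resource. A real setup would presumably have one group per endpoint or device, with limits chosen to match what each can handle.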
I ended up adding a lot of docstrings in case I have to step away from my hobbies and dust this code off at a later point, but maybe the framework can be useful for someone else in the meantime? I've added demos and put it up under the MIT license should you wish to try it out.
I apologize in advance if the framework is difficult to get running - my testing options are limited. The GUI works, but is basically vibecoded because I don't know React, so forgive me if that component is particularly temperamental. Ideally, I would like to continue servicing this code and improving my own skills, so any comments on this project - both positive and excoriating - are greatly appreciated.
If I recall correctly, the difference between the SAM models is just a tradeoff between parameter count and accuracy. I have the parameter counts listed under 'Installation', but the relative quality of the models would be task-dependent and subjective.
I would think that part of the motivation for releasing the smaller models in addition to the larger ones would be use in video image segmentation and mobile filters. The smaller models might actually be more fit for purpose for those applications than the biggest one. However, I'd recommend the biggest model (vit_h) for desktop or laptop image processing.