ML is still mostly used by tech companies, but non-technical people are often those that would benefit the most from it, e.g. realtors training small models to get better house estimates, marketing folks optimizing their ad campaigns, Shopify sellers sourcing new products, etc.
Existing no-code ML solutions are quite expensive and way too complex for small companies: a long sign-up flow, required reading of 3 tutorials and a 26-step setup, spinning up cloud servers, learning a lot of jargon... and the UX is often optimized for technical people, not digital marketers or Shopify sellers.
ML Console takes a fresh shot at this problem: 100% client-side ML training. This is enabled by modern web technologies (WASM, WebGL), which allow us to process data and train models with a minor performance overhead.
AFAIK, this is the fastest way to train an ML model, compared to any other solution (cloud-based or not):
1) No lengthy boilerplate code for every new project.
2) No downloads.
3) No sign-ups or credit-card checks.
4) No lengthy tutorials.
5) No need to spin up cloud instances and shuttle data back and forth between the client and the backend.
Best of all, this means we never see your data, as all computations run locally. This also allows us to be cheaper than all our competitors: it's free :) We'll provide a subscription service offering more advanced features later on.
We're still very early stage, so any feedback would be greatly appreciated!
For this to be usable to me (level of knowledge: I can do all of what's done on this page in Python / R, but don't have a PhD in stats or anything), I would need:
- Some sense of how training and validation is done
- Model weights
- Something that helps interpret fitting / overfitting
I'm not sure it's super useful for someone below my level of knowledge (or maybe someone at my level who just doesn't know Python?). It seems like a random marketing person, Shopify seller, etc. would need:
- Better understanding of when to use regression vs. classification
- Help interpreting whether the MAE / loss is good or bad
- Some automatic way to prevent overfitting
- Guidance on what constitutes good data and how to structure it for input
- Examples of how it might be applied to their use case
- Knowledge of how often models fail / how much they should be indexing on the model
1) The target user is: a freelance marketer, a small/medium enterprise without dedicated data scientists, or technical people (e.g. engineers) from other fields without extensive stats/ML knowledge.
2) RE: lack of hand-holding.
This is likely the biggest challenge for this project: showing people how to think about ML, and how to use it to derive value for their business, without going through hours of training or lengthy tutorials.
> - Guidance on what constitutes good data and how to structure it for input
> - Examples of how it might be applied to their use case
Most users are stuck at this initial phase (preparing the data).
One thing I'm working on now is adding use-case-focused guides: short articles explaining how a realtor would go about building a model to help them roughly value houses (including data collection). I hope this helps with these two points.
> Help interpreting whether the MAE / loss is good or bad
There are a couple of things I'm working on that might help:
1) Show metric improvement relative to a baseline (e.g. MAE for a model that always predicts the mean).
2) Show both train and test curves. The current curve is only on test data.
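To make that baseline comparison concrete, here's a minimal Python sketch (hypothetical numbers, and Python rather than the app's JavaScript):

```python
# Compare a model's MAE against a trivial baseline that always
# predicts the mean. If the model's MAE is not clearly below the
# baseline's, the model has learned little from the features.

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices (in $1000s) and model predictions.
y_test = [250, 310, 190, 420, 275]
model_preds = [240, 300, 210, 400, 290]

baseline_mean = sum(y_test) / len(y_test)  # in practice: mean of the *training* targets
baseline_preds = [baseline_mean] * len(y_test)

print("model MAE:   ", mae(y_test, model_preds))     # 15.0
print("baseline MAE:", mae(y_test, baseline_preds))  # 60.8
```

Here the model's MAE is roughly a quarter of the always-predict-the-mean baseline, which is the kind of relative framing a non-expert can act on.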
> Better understanding of when to use regression vs. classification
> Some sense of how training and validation is done
I'm currently redesigning the UX around a step-by-step flow (for initial users at least), that should give a bit of room to explain things along the way (e.g. what classification/regression means for total beginners).
> Some automatic way to prevent overfitting
Medium-term, there'll be a mode that continuously trains models to tune hyperparameters, which should help avoid overfitting. Until then it's mostly hand-picked parameters (including regularization), and having tested this on Kaggle challenges, it still sometimes beats my hand-written ML code :)
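The idea of picking hyperparameters by held-out validation error can be sketched like this (a toy 1-D ridge model in Python with made-up data, not the app's actual tuner):

```python
# Validation-based hyperparameter tuning: fit a 1-D ridge model for
# several regularization strengths and keep the one with the lowest
# error on held-out validation data.

def fit_ridge_1d(xs, ys, lam):
    # Closed-form ridge solution for y ~ w * x (no intercept).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: training labels are noisy on the high side,
# so a bit of regularization generalizes better here.
train_x, train_y = [1, 2, 3, 4], [2.5, 4.4, 6.6, 8.9]
val_x, val_y = [5, 6], [10.0, 12.0]

candidates = [0.0, 0.1, 1.0, 10.0]
best_lam = min(candidates,
               key=lambda lam: mse(val_x, val_y, fit_ridge_1d(train_x, train_y, lam)))
print("best lambda:", best_lam)
```

The same loop generalizes to any hyperparameter (layer sizes, learning rate, ...); the key point is that the selection criterion is error on data the model never trained on.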
> Model weights
You can already download the model weights (download icon next to the model name); or do you mean feature importances? That's a planned feature, but it's not straightforward to implement in a generic way, so it might take a month or two before it's shipped.
Ah, for what it’s worth, I looked for a while and didn’t see the download icon until you told me it was there, and I consider myself pretty good at picking up new user interfaces relative to your average user (e.g. I became competent at Photoshop, Excel, etc. without handholding). It wasn’t intuitive that that’s where the model weights would be stored, so my brain never looked for a download icon.
A bigger issue is I realized the weights downloaded do not include the data preprocessing... So proper model export will unfortunately take more work before it's fully ready to use.
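To illustrate why a complete export needs more than weights, here's a Python sketch (a hypothetical single-weight model, not ML Console's actual export format) where the normalization stats are bundled alongside the weights:

```python
import json

# Raw weights alone aren't a usable export: predictions also depend on
# preprocessing (here, feature standardization). A portable export has
# to bundle the normalization stats with the weights.

def make_export(xs, ys):
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - mean) / std for x in xs]
    # Least-squares weight on the *standardized* feature (no intercept).
    w = sum(z * y for z, y in zip(zs, ys)) / sum(z * z for z in zs)
    return {"feature_mean": mean, "feature_std": std, "weight": w}

def predict(export, x):
    # Without feature_mean/feature_std, this step would be impossible.
    z = (x - export["feature_mean"]) / export["feature_std"]
    return export["weight"] * z

export = make_export([100, 200, 300], [-1.0, 0.0, 1.0])
blob = json.dumps(export)   # what a complete export file could contain
model = json.loads(blob)
print(predict(model, 250))
```

Anyone handed only `weight` (without the mean/std used during training) would feed raw feature values into the model and get garbage out.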
As a random "marketing" person, I agree. I can blindly fumble my way to train a model, but a little more hand-holding would help me understand context and instill more confidence in the results I magically get by clicking things.
Agreed. Although the app is much easier to use than other ML apps I've tried, it's still a bit too confusing/"magical" for non-technical folks.
As mentioned on the parent comment, this is the #1 priority, and I hope a redesigned flow (e.g. step by step from loading the data, picking the target, then features, and explaining things at each step) will help here. I'll also have a page with concrete use cases, including marketing.
Please reach out (email on the website) if you have some ML use case you'd like to solve with ML Console, happy to help you prepare your data etc.
This is wonderful. I am an ordinary Joe and I find this more than useful. This is 'the' smartest idea for this stage of machine learning. We are there, I guess: just push some buttons and get the model, then use it for whatever you need it for. Simple as that. Somebody please take all the academics away and leave the stage for more practical people. I mean, of course all academics deserve to be appreciated, and we do appreciate them, but it is time for ordinary people. I don't want to understand every tiny detail about how machine learning works. This tool seems helpful, it is free, and it uses the most convenient tool we have: the browser! I say wow.
A thousand times yes! :) This is my entire bet with this project: People are gate-keeping ML usage behind "you need to first understand the math and take a course in statistics", yet no one would ask you to "understand how compilers and CPUs work" before writing a simple mobile app.
Knowing the underlying math and engineering is of course still useful for those that want to go the extra step (squeezing out .5% accuracy, or training a model on 1TB of data), but for all normal people out there ML should be as simple as loading up a webpage and dropping your data.
Great product - I love that you can actually use it immediately. Will give it a try. Some comments below though.
There's a whole ecosystem of tools now being built around explainability, bias-awareness, and more, given that your models can only be as good as your datasets. I'd go as far as arguing that building models isn't the hard part; it's building really good datasets. I come from a computer-vision background, but I think this definitely applies to all types of tabular / numerical data as well. How do you see a tool like yours playing into that realization?
Thanks! Ease of use (e.g. no lengthy signups / waiting minutes for cloud jobs to schedule) is one of the main features, so I'm happy you appreciate that!
RE: bias/explainability, that's indeed the main argument against asking people to use ML as a black box.
I think fully understanding the very nuanced biases that can sneak into data (e.g. selection bias from some events being over-represented in records, say) does require a keen sense for data, which an automated tool likely cannot provide. My bet is that this is not enough of a reason to block people from at least dipping their toe in the ML world, and that we can do more education down the line (e.g. about evaluating systems in real life, to at least catch underperformance before looking for its source) to address these cases.
On explainability, some basic tools (e.g. feature importance) are in the roadmap, I hope to get to it in this quarter!
RE:creating datasets being the hardest part of the job, not modeling... well, I think you're 100% right. And that's a tougher nut to crack.
One thing I'm planning to do to help here is to provide a number of "templates", i.e. concrete use cases that people can piggyback on: for example, explain to realtors that they can estimate house prices by creating a spreadsheet with features A, B, C and D. I can't do this for every imaginable use case, but I hope this is enough to at least inspire people on how to think about data and ML.
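For illustration, such a template spreadsheet might look like the following (the column names are hypothetical examples, not prescribed by ML Console):

```python
import csv
import io

# A hypothetical "realtor" template: the kind of spreadsheet a user
# could assemble before uploading. One row per sold house; the last
# column ("price") is the target the model learns to predict.
rows = [
    {"sqft": 1400, "bedrooms": 3, "year_built": 1995, "zip": "94110", "price": 950000},
    {"sqft": 2100, "bedrooms": 4, "year_built": 2008, "zip": "94117", "price": 1400000},
    {"sqft": 800,  "bedrooms": 1, "year_built": 1972, "zip": "94103", "price": 620000},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sqft", "bedrooms", "year_built", "zip", "price"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The structure is the important part: consistent columns, one example per row, and a single target column, which is exactly the shape most tabular ML tools expect.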
We ourselves use BigML and AutoML from google, and we have engineers in our team who can do TF. It's just less of a pain in the ass, and the models are very reliable.
I'll give this a try, though I'm not sure what strategy it's using (I'm assuming an ensemble?)
Also, you're asking the user whether the objective should be classification or regression; I think that should be automatic, since those are already buzzwords for your typical marketing person :)
BigML and AutoML are great! And I myself write a lot of custom TF code.
I see this as addressing a different stage (rapid/cheap prototyping) and audience (less technical people, smaller enterprises); it'll never be as powerful & state-of-the-art as advanced ML tools by cloud providers, and that's OK :)
> I'll give this a try, though I'm not sure what strategy it's using (I'm assuming an ensemble?)
It's embarrassingly simple right now: a fully-connected DNN with a static architecture (hand-picked because it worked fine for all datasets I tried), but you can also enable ensembling and/or change the network's architecture (just click on "Show advanced settings")
> Also, you're asking the user whether the objective should be classification or regression; I think that should be automatic, since those are already buzzwords for your typical marketing person :)
Good point, but it's not always something that can be determined automatically. I did add some heuristics to warn people when we think they are wrong (e.g. running classification on a numeric variable with too many unique values, usually a sign that it should be a regression problem instead).
One thing that might help here is that the redesigned flow shows step-by-step instructions where one step is to pick the task (regression vs classification), and we'll add some text there to help people understand the difference. I'll also consider the heuristic approach going forward (and make sure it doesn't override cases where users have already manually set the task).
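A heuristic along those lines could look roughly like this (a Python sketch with a made-up threshold, not the app's actual rule):

```python
# Guess the task type from the target column: non-numeric targets are
# classification; numeric targets with many distinct values are likely
# regression. The max_classes cutoff (20) is an arbitrary example.

def guess_task(target_values, max_classes=20):
    numeric = all(isinstance(v, (int, float)) for v in target_values)
    n_unique = len(set(target_values))
    if not numeric:
        return "classification"
    return "regression" if n_unique > max_classes else "classification"

print(guess_task(["spam", "ham", "spam"]))   # string labels -> classification
print(guess_task([0, 1, 0, 1, 1]))           # few numeric values -> classification
print(guess_task(list(range(100))))          # many numeric values -> regression
```

As the comment thread notes, a guess like this should only warn or pre-fill a default, never silently override a task the user picked on purpose.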
Try either of those solutions, and you'll be hit with a "Schedule a call/demo" wall, have to chat with their sales team, etc.
You can train a model in <30s with ML Console. Two clicks: one to pick a dataset, and a second on "train a model". No wait time or sign-up required.
There's a very good reason for this: ML Console uses a very different approach, as all computations (data pre-processing, model training, etc.) are written in JavaScript and run on the client.
This makes the app orders of magnitude more responsive than any competitor (and hosting it is so cheap that I'm offering it for free during this beta phase, something these services wouldn't be able to afford).
With that being said, cloud-based solutions like these will always have their place when you need to train on terabytes of data, or squeeze out 1% higher accuracy with some state-of-the-art models.