Google creating a new language, only to shut it down in three years? Swift will still be around in 15 years. It would be Microsoft's F#, or Google's Dart, all over again. It's a monumental task to create a language that people want to use, and an even bigger task to create the tools around it (IDEs, language specifications, cross-platform frameworks, package management).
I know this is a hot take, but... to be frank, I doubt Google has the capability. They created Dart and Go (a.k.a. "generics aren't necessary"). They created TensorFlow 1, which is totally different from TensorFlow 2.
Swift may not be the best, but it is becoming such a large part of Apple that it will have backing no matter the internal politics.
The language is not where the battle will be; it will be the tooling.
I would disagree. First, Go has been tremendously successful, and yes, it will have generics soon. The ML community really needs a better language than Python; the current alternatives (Julia, Nim, R) are alright but seem to miss the mark in this arena. I see few data scientists excited about Swift: it's too heavy-handed and too deeply embedded in the Apple/iOS community.
People are searching for a better language in this space, and it's something that often needs corporate backing. Google is aware of this problem and hired Chris Lattner to fix it; it just hasn't panned out, so I guess we'll keep using Python for now.
Nim is a language with good performance, and I had a good experience porting an enterprise Python application to Nim (for the performance gain). For a new user the obvious risk is Nim's newness, but the Nim team was very helpful and prompt whenever I posted a question. It's a very complete and surprisingly issue-free language.
Hopefully Arraymancer will help increase its reach; I wish the implementors all the best.
I don't know that the ML community necessarily _needs_ a better language than Python for scripting ML model training. Python is decent for scripting, and a lot of people are pretty happy with it. Model training scripts are pretty short anyway; whatever language you write them in, it's just a few function calls. Most of the work is in cleaning and feature-engineering the data up front.
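To illustrate how short these scripts are, here's a minimal scikit-learn sketch (the file name and column names are made up); the actual training really is just a couple of calls:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load an already-cleaned dataset; the hard work happened upstream.
    df = pd.read_csv("features.csv")
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # The "model training" part is just a few function calls.
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)
    print(accuracy_score(y_test, model.predict(X_test)))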
Perhaps a more interesting question is whether the ML community needs a better language than C++ for _implementing_ ML packages. TensorFlow, PyTorch, CNTK, ONNX: all this stuff is implemented in C++ with Python bindings and wrappers. If there were a better language for implementing the learning routines, could it help narrow the divide between the software engineers who build the tools and the data scientists who use them?
I think the ML community really needs a better language than Python, but not because of the ML part; that works really well. It's because of the data engineering part (which is 80-90% of most projects), where Python really struggles, being slow and lacking true parallelism (multiprocessing is suboptimal).
That said, I love Python as a language. But if it doesn't fix these issues, in the (very) long run it's inevitable that the data science community will move to a better solution. Python 4 should focus 100% on JIT compilation.
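To make the multiprocessing point concrete, a minimal sketch (the workload is made up): every task and result has to be pickled across a process boundary, because the GIL rules out a plain shared-memory thread pool for CPU-bound work.

    from multiprocessing import Pool

    def score(row):
        # CPU-bound work per record. When the work per task is small,
        # pickling arguments/results to and from worker processes can
        # cost more than the computation itself.
        return sum(x * x for x in row)

    if __name__ == "__main__":
        rows = [list(range(100))] * 100_000
        with Pool() as pool:
            results = pool.map(score, rows, chunksize=1_000)
        print(len(results))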
I've found it generally best to push as much of that data prep work down to the database layer as you possibly can. For small/medium datasets that usually means doing it in SQL; for larger data it may mean using Hadoop/Spark tools to scale horizontally.
I really try to take advantage of the database to avoid ever having to munge very large CSVs in pandas. So 80-90% of my work is done in query languages in a database, and the remaining 10-20% is in Python (or sometimes R) once my data is cooked down to a size that easily fits in local RAM. If the data is still too big, I just sample it.
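As a sketch of what that looks like in practice (using sqlite3 here; the table and column names are made up), the aggregation runs inside the database and only the cooked-down result lands in pandas:

    import sqlite3

    import pandas as pd

    conn = sqlite3.connect("warehouse.db")

    # Filter and aggregate inside the database; only the small
    # result set crosses into local RAM.
    df = pd.read_sql(
        """
        SELECT customer_id,
               COUNT(*)    AS n_orders,
               SUM(amount) AS total_spend
        FROM orders
        WHERE order_date >= '2019-01-01'
        GROUP BY customer_id
        """,
        conn,
    )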
The argument is that Python being slow and single-threaded isn't the biggest problem with Python in data engineering. The biggest problem is the need to process data that doesn't fit in RAM on any single machine, so you need on-disk data structures and algorithms that can process them efficiently. If your strategy for data engineering is to load whole CSV files into RAM, replacing Python with a faster language will raise your vertical scaling limit a bit, but beyond a certain scale it won't help anymore and you'll have to switch to a distributed processing model anyway.
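Even within Python you can avoid the load-everything-into-RAM strategy by streaming, e.g. with pandas' chunked CSV reader (a sketch; the file and column names are made up):

    import pandas as pd

    total, count = 0.0, 0
    # Stream the file in bounded-size chunks instead of loading it
    # whole; memory use stays flat no matter how large the file is.
    for chunk in pd.read_csv("huge.csv", chunksize=1_000_000):
        total += chunk["amount"].sum()
        count += len(chunk)

    print("mean amount:", total / count)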
Can you get things done in Python/C++? Sure. But the two-language problem is a well-known issue, and Python has a number of problems. People certainly want a better option, and Google investing as much as it did validates that notion.
Yes, so to me the key question is not whether Swift can replace Python's role, but whether it can replace C++'s role, thereby making Python's role unnecessary and solving the two-language problem in the process.
I think we can all agree that C++ is a dragon that needs to be slain here. Swift could potentially get close to that for most needs, but I still wouldn't bet on data scientists writing Swift.
As a data scientist, most of my projects have been individual--I'm generally the only person writing and reading my code. No one tells me which language I have to use. Python and R are the most popular, and I use either one depending on which has better packages for the task at hand. I don't use Julia because I don't see enough of a benefit to switching at this point. But I really don't care, they're just tools, and I will use any language, Julia, Swift, whatever, if I see enough of a benefit to learning it. I would just take a day or two and learn enough of it to write my scripts in it.
So I think that's the good news: because of the more independent nature of the work, you can generally win data scientists over to a new language one at a time; you don't necessarily need to win over an entire organization at once.
Getting a company or a large open-source project to switch from C++ to Swift or Rust or whatever seems much harder.
Ideally they'd get behind a strict subset of typed Python that could be compiled the same way Cython is. Numba, PyTorch JIT, and JAX are already handling a decent chunk of the language.
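PyTorch JIT is a good example of that: TorchScript compiles a typed subset of Python, using the annotations to reject anything outside the subset. A minimal sketch:

    import torch

    @torch.jit.script
    def scaled_relu(x: torch.Tensor, alpha: float) -> torch.Tensor:
        # TorchScript checks the type annotations and compiles this
        # function; dynamic Python outside the subset is rejected.
        return torch.relu(x) * alpha

    print(scaled_relu(torch.randn(3), 2.0))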
RPython is not intended for humans to write programs in; it's for implementing interpreters. If you're after a faster Python, you should use PyPy, not RPython.
Numba gives you JIT compilation annotations for parallel vector operations--it's a little bit like OpenMP for Python, in a way.
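Something along these lines (a minimal sketch): decorate a function and Numba compiles it to machine code, with prange splitting the loop across threads much like an OpenMP parallel-for.

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def sum_of_squares(a):
        # Compiled to native code; iterations of the prange loop are
        # distributed across threads, and the += is treated as a
        # parallel reduction.
        total = 0.0
        for i in prange(a.shape[0]):
            total += a[i] * a[i]
        return total

    print(sum_of_squares(np.arange(1_000_000, dtype=np.float64)))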
I just look forward to having a proper JIT as part of regular Python. PyPy still seems to be an underdog, and JIT research for dynamic languages on GraalVM and OpenJ9 seems more focused on Ruby, which is why I kind of hope Julia puts some pressure on the ecosystem.
For what it's worth, I think learning Julia would be a fantastic investment. I think it made me a much better programmer because it substantially lowers the barrier between 'developers' and 'users'.
I also don't think I could ever go back to using a language that doesn't have multiple dispatch, and I don't think any language out there has a comparable out-of-the-box REPL experience.
Julia is really nice. The only problems are: 1) its focus is on being used from the REPL; while you can run a .jl script from the CLI, it feels wrong because of 2) its lack of a good AOT compilation/runtime/static-binary option. Its JIT compiler is really good, but you pay a price on startup and first runs, which is why Julians usually just keep long-running REPLs they never close. And 3) the ecosystem is still immature; there are some amazing parts, but also a lot of empty or poor parts.
If I'm a person who wants to do some data science or whatever, and I have very little software background, I want there to be libraries that do basically everything I ever want to do, and those libraries need to be very easy to get support for. I want to be able to Google every single error message I ever see, no matter how trivial or language-specific, and find a reasonable explanation. I also want the environment to work more or less out of the box (admittedly, Python has botched this one, since so many machines now have both a python2 and a python3 install).
Julia punches well above its weight in libraries, especially for data science, and it has the best online community I've ever been a part of. Googling an error in Julia definitely won't give you nearly as many Stack Overflow hits, but the community Discourse, Slack, and Zulip channels are amazingly responsive and helpful.
I think a big advantage of Julia is that it has an unusually high ratio of domain experts to newbies, and those domain experts are very helpful, caring people. It's quite easy to get tailored, detailed, personalized help from someone.
This advantage will probably fade as the community grows, but at least for now, it's fantastic.
I've a lot more experience with Julia than any other language (and am a huge fan/am heavily invested). My #2 is R, which has a much more basic type system than Julia.
So -- as I don't have experience with languages with much richer type systems like Rust or Haskell -- it's hard to imagine what's missing, or conceive of tools other than a hammer.
Mind elaborating (or pointing me to a post or article explaining the point)?
I found multiple dispatch odd at first, but after adapting my mindset a bit, I really like it. For example, it makes it really easy to drop functions specialized for your own types into existing libraries. It's a win for code reuse.
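For anyone wondering what that looks like from the Python side, the closest approximation I know of is the third-party multipledispatch package (the types below are made up): the implementation is chosen from the runtime types of all arguments, so you can register a specialization for your own type without touching the library's source.

    from multipledispatch import dispatch

    class Dense: pass
    class Sparse: pass

    @dispatch(Dense, Dense)
    def combine(a, b):
        return "dense kernel"

    # "Dropping in" your specialization: the existing generic function
    # gains a new method for your type without modifying its source.
    @dispatch(Sparse, Dense)
    def combine(a, b):
        return "sparse-aware kernel"

    print(combine(Sparse(), Dense()))  # -> sparse-aware kernel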
What do you mean by "types are too shallow"?
Yes, the JIT can be slow to boot, but I think this is an area they're going to be focusing on.
"the tooling is poor" Not sure I agree here. I think it's great that I can easily see various stages from LLVM IR to x86 asm of a function if I want to.
Boot times still aren't ideal, but I find it takes about 0.1 seconds to launch a Julia REPL now. Time to first plot is still a bit painful due to JIT overhead, but that's coming down very aggressively (there will be a big improvement in 1.5, the next release, with different compilation levels for different modules), and we now have PackageCompiler.jl for bundling packages into your sysimage so they don't need to be recompiled every time you reboot Julia.
I also think the tooling is quite strong, we have an amazingly powerful type system, and I would classify discovering multiple dispatch as a religious experience.
> I see few data scientists excited about Swift: it's too heavy-handed and too deeply embedded in the Apple/iOS community.
Which is unfortunate, because it would probably be the best language if it were controlled by a non-profit foundation, like Python is. As it stands, it's basically unusable.
Why do you think Swift would be the best language? I do a lot of C#, and so far I haven't seen anything in Swift that would make it feel better. In fact, at this moment even Java is making leaps forward, so it will quickly catch up on syntax.
And C# and Java have the benefit of a JIT VM by default, meaning you build once for all platforms, unless you need AOT for whatever rare reason (which they also have).
I'd say the culture is very, very different. Java/C# heads are in love with OOP and create layers upon layers everywhere, hiding as much state and as many methods as they can ("you can't use this operator, you'll shoot yourself in the foot!"), and rarely writing pure functions. It's just a long way from how math works.
Not saying it wouldn't work, it definitely would, but I think I'd rather switch professions than deal with Maven and Eclipse in 2020.
Swift culture is more about having immutable structs that are in turn extended via extensions, with heavy use of copy-on-write when mutable structs are needed. It's a small difference, but it's there.
I fail to see how culture is related to the language.
You have a weird notion of mutability by default in either Java or .NET. The former is notorious for the builder pattern for that exact reason. Does Swift have special syntax for copy-plus-update, like F#'s { someStruct with X = 10 }?
Never had problems with Maven. How is Swift different?
People haven't been using Eclipse much for a while. There's IntelliJ IDEA for Java and ReSharper for C#.
I might be wrong, but as I understand it the builder pattern is mostly used as a solution to keep mutable state from being accidentally shared. That's duct-taping around the complexity instead of removing it.
I don't really know why, but the coding patterns (what I call culture) that are popular in each language are very, very different, even when the languages support the same feature set.
My understanding is that C#'s behavior around structs is identical, except it isn't called "copy-on-write". From what I can see, the behavior is identical to copy-always from a language-spec standpoint. Whether an actual copy is created is down to the code generator, but the semantics are the same.
Because JS wasn't designed for scientific computing the way R and Julia are. The best-case scenario is that you reimplement all the libraries Python has, but then you're just replacing Python with another generic scripting language instead of a language built for the purpose. Why would data scientists bother switching to JS when Python already has those libraries, and Julia and R have better numeric and data-analysis support baked in?
And if Python, Julia, and R don't cut it, there's no reason to think another scripting language would. Instead you'd be looking at a statically typed, compiled language with excellent support for parallelism.
JavaScript is a mess of a language in general. But even if that was not true, it is definitely not designed for numerical computation, nor symbolic manipulation, nor high performance, nor reliability.
Going from Python to JavaScript is a step backwards, not forward.
Feels like there's too much impedance mismatch, as integers aren't first-class citizens in JS. You can use array buffers, but... I imagine you'd want precise control over numerical representations everywhere to fully auto-differentiate code.
I tested this to a certain extent, and it's not a toy. It's a well-thought-out product from a very talented team, and it has the ease of coding that we love about JavaScript. It can run in browsers!
That said, for enterprise-scale numerical computing we should note the strengths of a statically compiled language with easy installation and deployment, like Go, Rust, or Nim.
I have similar thoughts. I think TypeScript could allow a lot of the "bolt-on" type checking that I find appealing in static languages, and most of these libraries are just interfaces to the same C/C++ framework, so there's no reason you couldn't create TypeScript bindings.
I have to agree. Google suffers from a really bad case of ADHD when it comes to creating anything for consumption by outside developers. There's a long list, and Dart and GWT are just two that stand out, because there are large codebases out there that suffered from those decisions.
Frankly, I'm surprised that Go has made it this far. I mean, it's a great language, I get it, but Google is fickle when it comes to things like this.
Dart's relevancy is heavily dependent on Flutter's success.
Seeing how the Chrome team is pushing for PWAs, the Android team has woken up and is now delivering Jetpack Compose, and we still don't know what is going to happen with Fuchsia, the question remains.
What makes you think Dart and Go have been unsuccessful?
Both are relatively young languages with rapidly growing adoption. Dart was on a downward trend but has seen a rejuvenation in the last few years thanks to Flutter.
I was looking into Dart the other day, because I guess Google is insisting on using it for Flutter, and… it's someone's 2011 idea of a better JavaScript, except it's worse than modern ES2020 JavaScript? Why would anyone prefer Dart to modern JS/TypeScript? It's just not as good.
I was just looking at the code samples on their website, and I was really unimpressed. Why learn a new language if there's nothing distinctive or better about it, you know? It's just a better ES5 and a worse TypeScript.