The fastest way to detect a vowel in a string

ok_dad · 2025-06-13T19:14:59 1749842099

> The regex methods are unbelievably fast regardless of string length.

I immediately pinned regex as the winner, and here is why:

In Python, you can almost always count on specialized functionality in the stdlib to be faster than any Python code you write, because most of it has probably been optimized in CPython by now.

Second, I have to ask, why would someone think that regular expressions, a tool specifically designed to search strings, would be slower than any other tool? Of course it's going to be the fastest! (At least in a language like Python.)

Kranar · 2025-06-13T19:19:55 1749842395

But it's not the fastest, it's actually incredibly slow and in general regexs are very slow. The key insight is that string processing in Python is very slow, and that regex here outperforms everything else because it's the only approach that is implemented in C.

With that insight it should follow that using another implementation in C should outperform even the regex, and indeed the following simple Python method that this article for whatever reason ignored vastly outperforms everything:

    def contains_vowel_find(s):
        for c in "aeiouAEIOU":
            if s.find(c) != -1:
                return True
        return False

That's because s.find(c) is implemented in C.

In my benchmark this approach is 10 times faster than using a regex:

https://gist.github.com/kranar/24323e81ea1c34fb56aff621f6c09...

a_e_k · 2025-06-14T04:07:24 1749874044

Playing around, I found that the generator approach was competitive with this if you permuted the loop nest, and often even slightly faster (at the 1000 character length):

    def any_gen_perm(s):
        return any(c in s for c in "aeiouAEIOU")

I think the crux is that you want the inner loop inside the fast C-implemented primitive to be the one iterating over the longer string, and to leave the outer loop in Python to iterate over the shorter string. With both my version and yours, the Python loop only iterates and calls into the C search loop 10 times, so there's less interpreter overhead.

I suspect that permuting the loop nest in similar variations will also see a good speed up, and indeed trying just now:

    def loop_in_perm(s):
        for c in "aeiouAEIOU":
            if c in s:
                return True
        return False

seems to give the fastest result yet. (Around twice as fast as the permuted generator expression and your find implementation on my machine, with 100 and 1000 character strings.)

IAmBroom · 2025-06-16T19:44:28 1750103068

For English samples, this should prove faster:

     def loop_in_perm(s):
        for c in "eaoiuEAOIU":
            if c in s:
                return True
        return False

a_e_k · 2025-06-26T20:40:55 1750970455

Oh, that's a nice optimization!

azhenley · 2025-06-14T19:06:18 1749927978

This is beautiful!!! Adding it to the end of the post right now.

jonstewart · 2025-06-13T20:03:58 1749845038

The more you learn about regexes, the more you learn that you can’t make general statements about their performance. Performance is contingent upon the engine, its algorithms, its implementation, the patterns, and the input.

tylerhou · 2025-06-14T09:35:37 1749893737

> in general regexs are very slow

I really don't think this is true. If you assume that the string is ASCII, a well-optimized regex for a pattern of this type should be a tight loop over a few instructions that loads the next state from a small array. The small array should fit completely in cache as well. This is basically branchless. I expect that you could process 1 character per cycle on modern CPUs.

If the string is short (~100 characters or less, guessing), I expect this implementation to outperform the find() implementation by far as find() almost certainly will incur at least one more branch mispredict than the regex. For longer strings, it depends on the data, as the branchless regex implementation will scan through the whole string, so find() will be faster if there is an vowel early on in the string. Find still might be faster even if there are no vowels; the exact time in that case depends on microarchitecture.

For non-ASCII, things are a bit trickier, but one can construct a state machine that is not too much larger.

Kranar · 2025-06-14T17:46:55 1749923215

Sure, if you make a bunch of assumptions and manually implement how you think a regex will compile your code, allowing the optimizer to take your compile time implementation and make a finely tuned algorithm specifically for one use case, you can make something that outperforms what is kind of a dumb algorithm.

But if you use an actual regex the way people actually use them, using either the one provided by their standard library or one that is readily available then regex is pretty slow. Certainly, in principle a regex is neither fast or slow, it's a declarative description of a set of strings. Any claim about its performance rests on particular implementations.

For example you wrote out a benchmark that presumably is a lot faster than a naive search... but notice you didn't use the standard library <regex>, instead you manually implemented an algorithm to perform a search and then just decided that this is what a regex implementation would have done anyways (ignoring that by implementing it directly in source code, the optimizer can then fine tune it).

So now... there is a 10% chance that you genuinely didn't know that C++ has a <regex> library that you could have used, or you did in fact know that C++ has such a library and you chose not to you use it because... you also know it's very slow.

I did the benchmark myself using the standard library regex, boost regex, and PCRE2 regex, and all of them are about 5-15x slower than the simple loop:

https://gist.github.com/kranar/a3187cba00d57ba630b74b84f09b4...

tylerhou · 2025-06-15T03:01:06 1749956466

> Sure, if you make a bunch of assumptions and manually implement how you think a regex will compile your code, allowing the optimizer to take your compile time implementation and make a finely tuned algorithm specifically for one use case,

But that's not what is happening. The only thing that the optimizer does is unrolls the loop that advances the state. Otherwise, the sequence of instructions is standard: two loads and some pointer arithmetic. There is no finely tuned algorithm. https://godbolt.org/z/fqdb5bssc

> But if you use an actual regex the way people actually use them, using either the one provided by their standard library or one that is readily available then regex is pretty slow.

Yes, most regex implementations are not designed to be efficient, they are designed for ergonomics (including supporting exponential-time features like lookaround). (Both boost::regex and PCRE2 support exponential-time features; std::regex is just not optimized at all.) This is well-known. https://swtch.com/~rsc/regexp/regexp1.html. That's why I said "well-optimized regex." Most regexes, in practice, are not well-optimized!

Maybe this is a semantic distinction between what you and I consider a "regex." I interpreted "regex" as how people in CS theory understand it; i.e., a finite automata that decides whether a string is in a given language. I think that's perfectly reasonable here, as the given problem is finding the fastest algorithm that decides whether a string is in the language generated by ASCII characters without the vowels. I think it's reasonable for me to use a (not hand-optimized) implementation that obeys that definition in my comparison.

Kranar · 2025-06-16T03:04:19 1750043059

>The only thing that the optimizer does is unrolls the loop that advances the state.

The idea that you'd dismiss loop unrolling as some kind of non-issue is incredibly baffling, you need only compare the benchmark built without optimizations enabled to the one with optimizations enabled, and you'll see that the unoptimized build is about half as fast!

Speaking of Python the most recent Python release has some significant performance increases in some scenarios because of some additional loop unrolling that has been exploited.

>That's why I said "well-optimized regex." Most regexes, in practice, are not well-optimized!

So then what exactly is the point of your argument? I point out that using regex's are slow, which almost anyone would reasonably assume means that using regex libraries tend to be slow.

Is the entire point of your comment that you can take a regex and then manually implement a special purpose algorithm for it that isn't slow? Was that the point of your comment, because if so you could have made that clear initially and saved both of us a lot of time.

>I think that's perfectly reasonable here, as the given problem is finding the fastest algorithm that decides whether a string is in the language generated by ASCII characters without the vowels.

No it's not reasonable at all because if you take your definition then there is nothing to use. Being a regular language is not something you can "use" or "not use", it's a property of a set of strings, you don't get to use it. The language consisting of all strings that have at least one vowel in them is a regular language, case closed, usage has nothing to do with it. In the sense that you're using it, any algorithm whatsoever that returns true or false for a string that matches some arbitrary regular language can be considered an implementation of a regex, for some fixed regular language. But of course this is a completely trivial argument which is why there's absolutely no point in discussing it.

In the non-trivial sense that everyone else uses it... regex's are libraries that people use that let them write a string representing a pattern and then the library returns an object that can be used to test whether or not a string matches that pattern, or can be used to search for a substring that matches a pattern along with a host of functionality. The claim in my original post is that these libraries, including the one used by the blog post, tend to be slow. They are, as you point out, designed for ergonomics and convenience, not for performance.

It is not at all reasonable to argue that some arbitrary implementation you divined for a particular regular language constitutes some argument that regular expressions are a high performance means of performing string matching, but at least you have clarified your position and shown it to be absolutely trivial and meaningless, and hopefully anyone who has read your post won't be misled into thinking you meant that regex libraries that are commonly used by actual engineers like Python's, or PCRE2, or boost or the standard library are fast... but I have my doubts about that.

tylerhou · 2025-06-14T11:50:49 1749901849

Putting benchmarks where my mouth is: https://github.com/tylerhou/benchmarks/blob/main/vowels-benc...

Writing implementations in C++, using a reasonable encoding of how an efficient regex compiler might compile the vowel regex, the regex implementation outperforms the loop-based implementations significantly in all cases except for long strings with vowels on M1 Max MBP.

Strings were generated randomly following the regex [0-9A-Za-z]. Short strings were between 5-20 chars. Long strings were 5000 chars.

The speedups are:

- Short strings & with vowels: regex is 2.6x faster

- Short strings & no vowels: regex is 5.9x faster

- Long strings & with vowels: loop is 329x faster. This is expected, as because the regex implementation is branchless, it must always search through the whole string.

- Long strings & no vowels: regex is 1.5x faster.

If you add an early return to the regex implementation, the regex with early return becomes strictly faster than the loop versions. The early return is slower than the no early return for the short strings & no vowels case because the extra branch has a cost. Full outputs in a comment at the bottom of the linked file.

SleepyMyroslav · 2025-06-14T12:08:46 1749902926

Why do you loop over haystack multiple times though? If you iterate over long string once and write fixed loop with vowels checks in way that will be friendly to autovectorize optimization it might be faster and more idiomatic C or C++.

tylerhou · 2025-06-14T12:25:35 1749903935

Thanks for reminding me, I meant to also benchmark the interchanged version. I updated the file above with the new benchmark results.

The original commenter called `find()` once per vowel, so that's why I benchmarked the regex against the less-idiomatic code.

The interchanged version (loop over haystack outside, over vowels inside) is (mildly) faster than all the regex versions except for short strings & no vowels.

One thing to note is that all the loop versions are not easily generalizable to non-ASCII strings, while the regex version is fairly easily.

SleepyMyroslav · 2025-06-14T16:46:21 1749919581

I poked that loop in compiler explorer for few minutes and I think its allergic to autovectorization. Lets assume that properly vectorized regexp from proper language in the future wins :)

azhenley · 2025-06-13T20:02:29 1749844949

Very nice find (pun intended). I added an update at the end with your solution and a link to this comment.

ok_dad · 2025-06-13T21:16:35 1749849395

I've always found regexes to do a string search faster than other methods, but there is always more to learn! Thanks for the lesson today.

arp242 · 2025-06-13T20:05:50 1749845150

> Second, I have to ask, why would someone think that regular expressions, a tool specifically designed to search strings, would be slower than any other tool? Of course it's going to be the fastest! (At least in a language like Python.)

I know that in Go doing string operations "manually" is almost always faster than regexps. In a quick check, about 10 times faster for this case (~120ns vs. ~1150ns for 100 chars where the last is a vowel).

Of course Python is not Go, but I wouldn't actually expect simple loops in Python to be that much slower – going from "10x faster" to "2x slower" is quite a jump.

Perhaps "yeah duh obvious" if you're familiar with Python and its performance characteristics, but many people aren't. Or at least I'm not. Based on my background, I wouldn't automatically expect it.

silisili · 2025-06-13T21:01:22 1749848482

It's worth noting here that Go has a notoriously slow regexp implementation.

high_na_euv · 2025-06-13T19:47:08 1749844028

>Second, I have to ask, why would someone think that regular expressions, a tool specifically designed to search strings, would be slower than any other tool? Of course it's going to be the fastest! (At least in a language like Python.)

Flexibly, search string flexibly.

Why you think that we cannot achieve faster search in non flexible fashion?

glangdale · 2025-06-14T00:20:27 1749860427

To paraphrase Arthur Dent, "this must be a strange new usage of the word 'fastest' with which I am not previously familiar".

The fastest way to detect a vowel in a string on any reasonable architecture (Intel or AMD equipped with SIMD of some kind) is using 3-4 instructions which will process 16/32/64 (depends on SIMD length) bytes at once. Obviously access to these will require using a Python library that exposes SIMD.

Leaving SIMD aside, a flat byte array of size 256 will outperform a bitmap since it's always faster to look up bytes in an array than bits in a bitmap, and the size is trivial.

windward · 2025-06-26T13:52:36 1750945956

I don't think the size is trivial, it's 4 times the cache length on x64

glangdale · 2025-07-01T08:03:33 1751357013

I'm not sure what you mean here. A single cache line is 64B, and this table would thus occupy 4 cache lines, but typical x86 cache sizes are 32K or for more recent cores 48K. Whether consuming 1/512th or 1/768th of your level 1 cache is excessive is a value judgement, but most people wouldn't think so.

Rendello · 2025-06-14T03:27:08 1749871628

A few years ago I came across the article "SIMD-friendly algorithms for substring searching". The "Generic SIMD" section into and example is small and quite understandable. I modified it to implement an LZ77 sliding window in Zig.*

http://0x80.pl/notesen/2016-11-28-simd-strfind.html

* Or rather, I tried my best. I burnt out on that project because I kept jumping back and forth between making a proper DEFLATE implementation or something bespoke. The SIMD stuff was really tough and once I got it "working", I figured I got all I needed from the project and let it go.

https://github.com/rendello/compressor/blob/dev/src/str_matc...

DannyB2 · 2025-06-13T19:12:41 1749841961

Assume use of 8 bit characters. Declare a constant 256 entry array pre-filled with all False except for the five (or six) vowel characters. This is baked into the code and not initialized at runtime.

Now for each character c in the input string, simply do an array index and see if it is true (a vowel) or not. This avoids either five conditionals, or a loop over the string 'aeiou'. The vowel test is constant time regardless of the character value.

masklinn · 2025-06-14T06:35:54 1749882954

In Python this is going to be slow, as you’ll have a ton of Python code in your hot loop.

It’s also going to be even more broken than TFA if any non-ascii character is present in the string.

absurdo · 2025-06-14T14:56:51 1749913011

Waiting for a few follow-ups:

“What every programmer should know about vowels”

“You and your vowels”

“What the vowels have wrought”

“We’re hiring! Vowels (YC 26)”

And last but not least:

“Things I Won’t Work With: Vowels”

haiku2077 · 2025-06-14T19:20:28 1749928828

"Falsehoods programmers believe about vowels"

layer8 · 2025-06-13T18:18:48 1749838728

This doesn’t even cover strings like “fly” and “naïve”. ;)

cenamus · 2025-06-13T18:41:49 1749840109

Yeah, should be called "how to detect aeiou in a string".

zerocrates · 2025-06-13T18:43:52 1749840232

well... it'll work for naïve. twice over, even

s09dfhks · 2025-06-13T18:49:40 1749840580

What am I missing about fly

ninkendo · 2025-06-13T18:57:07 1749841027

The article thinks aeiou are all the vowels, and forgets about y.

SwiftyBug · 2025-06-13T19:13:40 1749842020

Is y considered a vowel in English? In my native language it is not.

horsawlarway · 2025-06-13T19:20:42 1749842442

Sometimes.

Which is actually part of why I clicked into the article - I expected it to get into the complexity of trying to detect if 'y' was a vowel as part of the search, and instead got a mostly banal python text search article.

You can see the technical rules for when 'Y' is a vowel in english here:

https://www.merriam-webster.com/grammar/why-y-is-sometimes-a...

Y is considered to be a vowel if…

The word has no other vowel: gym, my.

The letter is at the end of a word or syllable: candy, deny, bicycle, acrylic.

The letter is in the middle of a syllable: system, borborygmus.

lcnPylGDnU4H9OF · 2025-06-13T19:49:09 1749844149

I dunno about those rules; there are exceptions. Yttrium. Yggdrasil. Probably others, but those from off the top of my head.

I think the best way to define when Y is a vowel is when it's not a consonant. Basically, if you make the sound that Y represents in the word "yes", it's a consonant. Otherwise, it's a vowel. (At least, no exceptions come to mind.)

DougN7 · 2025-06-14T05:56:57 1749880617

Wow, how would you code that?!

lcnPylGDnU4H9OF · 2025-06-14T13:05:43 1749906343

Heh, good question. I’d probably use a heuristic like what’s discussed in this thread and hope that it works. :D

layer8 · 2025-06-13T19:23:04 1749842584

It depends on the word, and possibly even on dialect. There isn’t a 1:1 mapping between characters and vowels. Vowels are fundamentally a phonetic concept, not a graphemic one.

This is also why Unicode doesn’t have a “vowel” character property. Otherwise you could use a regex like `\p{Vowel}`.

csb6 · 2025-06-14T00:18:25 1749860305

Yes, this is true. Many people seem to think of words as written text and not as spoken words, so they focus on analyzing the characters is a word and not how it sounds.

I had a linguistics professor say something like “Writing is parasitic on speech”

mystified5016 · 2025-06-13T19:56:37 1749844597

Y and W are sometimes used as vowels. They are technically consonants, but in some cases produce a vowel sound.

The rule in English is not "all true words contain a vowel" it's that all words contain a vowel sound.

Except the ones that don't, because English is a very messy language.

IAmBroom · 2025-06-16T19:50:48 1750103448

Hmm...

nemomarx · 2025-06-13T19:20:07 1749842407

The list of vowels in American English is "A E I O U and sometimes Y" as taught to school children. It would be a vowel in Ypsilanti for example.

xigoi · 2025-06-14T06:39:40 1749883180

Letters are not vowels; sounds are vowels. The letter Y sometimes represents a vowel sound and sometimes a consonant sound.

tough · 2025-06-13T19:17:17 1749842237

same, i is a vowel, y is a consonant

TIL: When y forms a diphthong—two vowel sounds joined in one syllable to form one speech sound, such as the "oy" in toy, "ay" in day, and "ey" in monkey—it is also regarded as a vowel. Typically, y represents a consonant when it starts off a word or syllable, as in yard, lawyer, or beyond.

adrian_b · 2025-06-14T07:17:48 1749885468

In English it is pretty much unpredictable whether Y is a vowel or a consonant.

While for older English words there is a complex set of rules mentioned by another poster for determining whether Y is a vowel, as mentioned by yet another poster, English also includes more recent borrowings from languages with other spelling rules for Y.

At its origin, Y was a vowel, not a consonant. It was added to the Latin alphabet for writing the front rounded vowel that is written "ü" in German, "u" in French or "y" in Scandinavian languages.

It is very unfortunate that in English, and in some other languages that have followed English, Y has been reassigned to write consonant "i". This has created a lot of problems due to the mismatches between the spelling rules of different languages. The rule that is most consistent with the older usage would have been to use J for consonant "i", like in German and other languages inspired by it. However in many Romance languages the pronunciation of consonant "i" has changed in time, leading to other 3 phonetic values for the letter J, like in English (i.e. Old French), like in French/Portuguese and like in Spanish.

So the result is that both for Y and for J there are great differences in pronunciation between the European languages, and the many words using such letters that have been borrowed between languages create a lot of complexity in spelling rules.

tough · 2025-06-14T07:31:18 1749886278

Thank you for the extended explanation, really interesting stuff!

holycrapwhodat · 2025-06-13T19:22:50 1749842570

https://www.youtube.com/watch?v=1_it0G2KcrM

mystified5016 · 2025-06-13T20:44:11 1749847451

Things don't have to be complicated or pretty to be fast or good. For a UTF-8 or ASCII (English) string:

  for(char c in string)
    if(c & 1 == 0)
      continue;
    switch(c & 0x1F)
      case(a & 0x1F)
        return true; //a or A
      case(e & 0x1F)
        return true; //e or E
      case(i & 0x1F)
        return true; //i or I
      case(o & 0x1F)
        return true; //o or O
      case(u & 0x1F)
        return true; //u or U
      default
        continue;

Checking for consonants is about as free as it gets. 50-70% of characters in English text are consonants. By checking one bit, you can eliminate that many checks across the whole string. This should also more or less apply to any text encoding; this technique comes from an artifact of the alphabet itself. It just so happens that all English vowels (including Y and W) fall on odd indices within the alphabet.

Characters aren't magic! They're just numbers. You can do math and binary tricks on them just like you would with any other primitive type. Instead of thinking about finding letters within a string, sometimes you get better answers by asking "how do I find one of a set of numbers within this array of numbers?". It seems to me that a lot of programmers consider these to be entirely disjoint problems. But then again, I'm an embedded programmer and as far as I'm concerned characters are only ever 8 bits wide. String problems are numeric problems for me.

While I don't want to discourage people from exploring problem spaces, do understand that the problem space of ASCII has been trodden to the bedrock. Many problems like "does this string contain a vowel" have been optimally solved for decades. Your explorations should include looking at how we solved these problems in the 20th century, because those solutions are likely still extremely relevant.

pjz · 2025-06-13T21:04:43 1749848683

The problem with the 'characters are just numbers' approach is that they're not _just_ numbers.. with the advent of unicode, they're _sequences of numbers_... so bytes thus can mean different things when part of a a sequence than when standalone.

That said, since they're numbers, we should use the most efficient checks for them... which are likely vectorized SIMD assembly instructions particular to your hardware. And which I've seen no one mention.

glangdale · 2025-06-14T00:55:31 1749862531

Yes, that's it. Vectorized SIMD annihilates this problem, a space I've been working in since 2006 and it wasn't all that new even then. A close second would be a heavily optimized (pipelined and less branchy) table or bitvector lookup. Doing anything that involves lots of control flow, like the grandparent post, will be slow as a wet week with our without bit manipulation tricks due to the inherently unpredictable nature of the branches (subject to our input).

guluarte · 2025-06-13T19:13:20 1749842000

need to add a bloom filter

okibry · 2025-06-14T03:27:14 1749871634

LOL. I found that take time to find amazon affiliate link on this blog is so funny.

luma · 2025-06-13T18:50:42 1749840642

If your approach to "the fastest way to do x" starts with writing some python, you're already not taking the assignment seriously.

tines · 2025-06-13T19:08:42 1749841722

I think it’s reasonable to use algorithmic complexity as a definition for “fast”.

kiru_io · 2025-06-13T19:08:59 1749841739

Can you elaborate or show a better way?

fedsocpuppet · 2025-06-13T20:22:43 1749846163

   julia> using Random, BenchmarkTools
   julia> function isvowel(c)
            idx = (c | 0x20) - Int('a')
            return (0x00104111 & (1 << idx)) != 0
         end

   julia> hasvowel(str) = any(c -> isvowel(Int(c)), str)

   julia> @btime hasvowel(s) setup=(s=randstring("bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ0123456789", 10))
   15.739 ns (0 allocations: 0 bytes)
   false

some 77 thousand times faster

camel-cdr · 2025-06-13T21:13:35 1749849215

The usual way to do this in SIMD is via vpermb(lut, str) == str, or potentially slightly faster vpshufb(str>>1, lut) == str.

str|0x32 if you want case-insensitive.

This works because the lowest five bits, more specifically bits two to five, of the vowels are distinct.

The lut is zeroed except for the indices corresponding to the vowels, which contain it self: lut[vowel&31]=vowel or lut[(vowel&31)>>1]=vowel

teo_zero · 2025-06-14T07:20:06 1749885606

> The lut is zeroed except for the indices corresponding to the vowels, which contain it self

So in the comparison vs. the original str, vowels get a true and consonants get a false. Doesn't the "==" then returns true if all the chars are vowels?

camel-cdr · 2025-06-14T09:20:30 1749892830

I used the C operators as short cuts to denote SIMD operations. So the == would be element wise and return a mask of the vowels in the input vector.

ninkendo · 2025-06-13T20:31:24 1749846684

I was curious if there were any interesting bit-masking patterns in ASCII for vowels that could be exploited, so I shamelessly asked ChatGPT and it gave a pretty nice insight:

- Setting bit 5 high forces lowercase for the input letter

- masking with `& 31` gives an index from 0-25 for the input letter

Then you can the `bt` instruction (in x86_64) to bit-test against the set of bits for a,e,i,o,u (after lowercasing) and return whether it matches, in a single instruction.

It came up with this, which I thought was pretty nice: https://godbolt.org/z/KjMdz99be

I'm sure there's other cool ways to test multiple vowels at once using AVX2 or AVX-512, I didn't really get that far. I just thought the bit-test trick was pretty sweet.

Chat transcript is here (it failed pretty spectacularly the first couple times, tripping over AT&T syntax and getting an off-by-one error, but still pretty good) https://chatgpt.com/share/684c8b39-a9c4-8012-8bb6-74e1f8b6d0...

mystified5016 · 2025-06-13T20:54:32 1749848072

See my other comment.

You can do extremely trivial binary checks on ASCII, UTF-8 (and most? other) encoding schemes. All vowels including W and Y contain a 1 in the lowest bit. Then by comparing bits 1-4, you can do a trivial case-insensitive comparison. You detect case by checking bit 5. 0 is upper, 1 is lower.

rurban · 2025-06-14T10:59:18 1749898758

How does that fly with foreign scripts? It certainly won't work with Greek, Arabic, Korean, Chinese, Japanese and the Indian languages.

IAmBroom · 2025-06-16T19:56:23 1750103783

Again, this isn't really about detecting vowels; it is exactly about detecting upper- and lower-case examples of the characters a, e, i, o, and u.

Language is not an issue; ø and é are not considered "vowels" by this article (nor is y).

rurban · 2025-06-16T20:44:12 1750106652

Sure, there is only ASCII, and nobody needs more. ø and é are not considered "vowels", ha!

tough · 2025-06-13T19:16:35 1749842195

Rust, Go, C?

dmoy · 2025-06-13T19:53:32 1749844412

Or even Lua if you don't want to go to a statically compiled typed language?

yongjik · 2025-06-13T20:16:50 1749845810

TBH any mainstream language (and most non-mainstream ones) will be better than Python at this. Python can be pretty useful, but it's not known for having reasonable performance.

jonstewart · 2025-06-13T20:02:34 1749844954

[flagged]

shruggedatlas · 2025-06-13T23:07:38 1749856058

This comment seems needlessly unkind

azhenley · 2025-06-13T20:10:17 1749845417

We are hiring.

kazinator · 2025-06-13T20:05:38 1749845138

Idea: let's nerd-snipe one of those, and then you can have their office.