I would encourage not directly using the user-uploaded images. But uploading dir...

JimDabell · on May 4, 2024

> For example you should almost certainly be stripping geo location. But in general I would recommend stripping everything non-essential.

Including animation in most cases. Otherwise somebody can use a single frame with a very long duration that will be reviewed by moderators, then follow that frame with a different frame containing objectionable material, which will eventually be shown to people.
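
One way to catch this at upload time is to count frames and reject (or flatten to the first frame) anything with more than one. A minimal Python sketch, assuming GIF input only; it walks the GIF block structure without decoding any pixels. `count_gif_frames` is a hypothetical helper name, not part of any library, and it deliberately rejects files that lack a trailer byte:

```python
def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of length-prefixed sub-blocks; return the offset after the 0 terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size


def count_gif_frames(data: bytes) -> int:
    """Count image descriptors (frames) in a GIF. Raises ValueError on malformed input."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    try:
        packed = data[10]          # flags byte of the logical screen descriptor
        pos = 13                   # 6-byte header + 7-byte screen descriptor
        if packed & 0x80:          # global color table present
            pos += 3 * (1 << ((packed & 0x07) + 1))
        frames = 0
        while True:
            block = data[pos]
            pos += 1
            if block == 0x3B:      # trailer: end of file
                return frames
            if block == 0x21:      # extension: one label byte, then sub-blocks
                pos = _skip_sub_blocks(data, pos + 1)
            elif block == 0x2C:    # image descriptor: one frame
                frames += 1
                local_packed = data[pos + 8]
                pos += 9
                if local_packed & 0x80:  # local color table present
                    pos += 3 * (1 << ((local_packed & 0x07) + 1))
                # +1 skips the LZW minimum code size byte before the pixel data
                pos = _skip_sub_blocks(data, pos + 1)
            else:
                raise ValueError("unexpected GIF block")
    except IndexError:
        raise ValueError("truncated GIF") from None
```

An upload handler could then treat `count_gif_frames(data) > 1` as "animated" and either refuse the file or keep only the first frame when re-encoding.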

joshstrange · on May 4, 2024

Wow, that's something I had never considered. I did run into a bug with someone uploading a gif to our servers and our resize script spitting out a file per resized gif frame (base_1.png, base_2.png, ...) that I had to fix but I'd never have thought of your example. We just took the first gif frame in our case, which would have been safe from this thankfully.

qingcharles · on May 4, 2024

I often find you can upload animated GIFs to sites that don't allow them by just renaming them to PNG first.
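
That trick works when a site trusts the file extension. A rough Python sketch of checking the leading magic bytes instead; the function names are made up for illustration, and only four common formats are covered:

```python
from typing import Optional

# Leading byte signatures for a few common image formats.
_SIGNATURES = (
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("gif", b"GIF87a"),
    ("gif", b"GIF89a"),
    ("jpeg", b"\xff\xd8\xff"),
)


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return 'png', 'gif', 'jpeg', 'webp', or None based on the file's first bytes."""
    for kind, magic in _SIGNATURES:
        if data.startswith(magic):
            return kind
    # WebP is a RIFF container: 'RIFF', 4 size bytes, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only if the extension agrees with what the bytes actually are."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    ext = {"jpg": "jpeg"}.get(ext, ext)
    return sniff_image_type(data) == ext
```

With this, a GIF renamed to `cat.png` fails the check, because the content still starts with `GIF89a`.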

tetris11 · on May 4, 2024

> recent WebP vulnerability

For anyone wondering:

https://blog.cloudflare.com/uncovering-the-hidden-webp-vulne...

> The vulnerability allows an attacker to create a malformed WebP image file that makes libwebp write data beyond the buffer memory allocated to the image decoder. By writing past the legal bounds of the buffer, it is possible to modify sensitive data in memory, eventually leading to execution of the attacker's code.

chefkd · on May 4, 2024

I'm not the sharpest tool in the shed, and I'm just learning Python right now. Can the wizards explain how this would work? Isn't data stored in memory super random, so the attacker would need to upload an image multiple times to get sensitive data? Or does the WebP image itself have code that will be run and can retrieve the data?

dmoy · on May 5, 2024

Generally with a buffer overrun, you're trying to put executable code in, and get it to a spot where it'll be run, allowing you to do further stuff.

Here is a canonical old article on one type of buffer overflow:

http://www.phrack.org/issues/49/14.html#article

(The WebP one, iirc, is not a stack buffer overflow but rather a heap overflow, but that'll give you a general sense of what's going on, even if that article is a bit ... dated)

mmsc · on May 4, 2024

>Re-encoding the image is a good idea to make it harder to distribute exploits. For example imagine the recent WebP vulnerability.

And now your server which is doing re-encoding is pwned. Gotta segment the server doing that somehow, or use pledge, capsicum, or seccomp I guess.

dpkirchner · on May 4, 2024

This is a pretty good use case for serverless endpoints, assuming the volume is pretty low.

hamandcheese · on May 4, 2024

Serverless + making sure the serverless function doesn't have any privileges. More often than not I see people use serverless in the name of security, and then give the function write access to prod resources.

cyberpunk · on May 4, 2024

Ideal use case for a cheeky OpenBSD machine...

Not that OpenBSD is actually unhackable or anything, but I doubt many attackers would guess you're running imagemagick on OpenBSD in your image pipeline.

I rather like it for such use cases; it has the added benefit that it never, ever seems to die. The other week I found a 6.0 machine I set up, doing some kind of risky Kafka processing, that had an uptime of 6 years (since migrated).

Moru · on May 4, 2024

I had to reboot one of our servers last year, also just over 6 years. Reboot because physical move to another server hotel, not because it wasn't working :-)

It is running ImageMagick to optimize images, create different resolutions and re-encode them. It's only open for us to upload manually on behalf of our customers though; they can't upload themselves. Input anything, output jpg, very easy to use.

Hamuko · on May 4, 2024

I've run into a security issue where a serverless function had a pretty large range of AWS access and a pentester was able to utilise that.

dpkirchner · on May 4, 2024

That's a bad use case for serverless :)

afiori · on May 4, 2024

For video you might want to use the GPU, but for images this sounds like a good use case for Wasm

remram · on May 5, 2024

Or WebAssembly: https://hacks.mozilla.org/2020/02/securing-firefox-with-weba...

forgotusername6 · on May 4, 2024

Draw image to canvas in the browser. Read image from canvas. Upload. If you are completely paranoid you could then upload raw pixel data only and construct whatever image format you wanted server side.

samus · on May 4, 2024

The user could push the file to the endpoint directly without using the client-side functionality.

gopher_space · on May 5, 2024

I think the idea is that a browser needs to do this conversion to display images, so you'd be writing a http client that will run server-side and process each uploaded file as if it were a http response. It'd need to be transmissible via screenshot at this point, right?

samus · on May 5, 2024

By running a web browser on the server you are at square one again: running the encoder on the server, which is risky.

gopher_space · on May 5, 2024

In this case isn't the browser decoding the image and drawing a bitmap? That sounds more like a critical vulnerability in e.g. Chrome.

In my mind the chain of events looks like:

- Alice uploads an image file to Bob's file server
- Charlie's image converter server invokes a local browser with the address of the file on Bob's server, resulting in a .bmp (or .png? a little out of my element here) saved to Charlie's server.
- Doris can now pull images from Charlie's server knowing they've been vetted by a major browser.

Does that make sense?

samus · on May 5, 2024

Yes, which outsources the problem to Charlie. Charlie has to firewall and lock down that browser as much as possible to reduce the risk of Alice pwning Charlie, else Charlie's server could (among other things) be made to taint all output files. Browsers are not magically immune; they also just call decoder libraries.

beeboobaa3 · on May 4, 2024

Did you forget to never trust the client? Please tell me you haven't built any products using this philosophy.

Arch485 · on May 4, 2024

It's not a terrible idea... This doesn't "trust" the client, it just interprets the data that the client sent as an array of pixel values. In a memory safe language (e.g. JS, C#, Go, Rust, ...), that would make it basically impossible to pwn: the worst thing an attacker could do is upload an arbitrary image.
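
A sketch of what the server side of that could look like in Python, assuming the client sends a width, a height, and tightly packed 8-bit RGBA bytes. `rgba_to_png` and the `max_side` limit are hypothetical, and the PNG is written by hand with `zlib` so that no image decoder ever runs on the upload:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def rgba_to_png(width: int, height: int, pixels: bytes, max_side: int = 4096) -> bytes:
    """Encode raw RGBA bytes as a PNG after validating the claimed dimensions."""
    if not (0 < width <= max_side and 0 < height <= max_side):
        raise ValueError("dimensions out of range")
    if len(pixels) != width * height * 4:
        raise ValueError("pixel data does not match width * height * 4")
    stride = width * 4
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + pixels[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), then
    # compression, filter and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

The only parsing done on attacker-controlled bytes is a length comparison, which is the point of the approach. It does nothing about what the picture shows.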

cqqxo4zV46cp · on May 5, 2024

My sibling comments are the terrible sort of ‘instead of applying your own logic, blindly and sometimes incorrectly pattern match against a checklist of ‘best practices’’ internet lectures. Image processing exploits almost always exist at the decompression stage. Once you’re processing a bitmap of pixels, there is a whole lot less that can go wrong. Having decompression happen at the client has an obvious performance impact. Ignoring that though, given that it implicitly involves changing the server-side API, it’s not “trusting the client”. It very clearly offloads a bunch of risk. The underlying premise is that raw image data is harder to shove an exploit in. People are so quick to lecture others before thinking twice.

forgotusername6 · on May 5, 2024

Thanks for one of the only rational replies to this thread. I often hold back from commenting at all on HN as the replies are so often full of low effort non-constructive criticisms.

beeboobaa3 · on May 5, 2024

Cool you have a bitmap. Now what? You're going to distribute all that child porn people enclosed in the bitmap (byte array) that doesn't render as a valid image?

smsm42 · on May 4, 2024

It's just not secure - anything you do on the client can be trivially circumvented.

TylerE · on May 4, 2024

The image handling libraries in those langs are almost all written in C/C++. If you’re just wrapping ffmpeg or imagemagick or libpng you’re not really protected from much of anything.

False security, if anything.

cyberpunk · on May 4, 2024

The image is still being posted somewhere, right? What guarantee do you have that it was your wasm blob doing the post vs some j33t haxxors curl command from his kali vm?

remram · on May 5, 2024

So you're just making your own image format with no compression?

A 1080p image with 8 bits per channel would be 6 MB. Real mobile-friendly...
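
The arithmetic behind that figure, as a quick Python check (assuming 1920x1080 and no row padding; `raw_size_bytes` is just an illustrative helper):

```python
def raw_size_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of an uncompressed, unpadded pixel buffer in bytes."""
    return width * height * channels * bits_per_channel // 8


rgb = raw_size_bytes(1920, 1080)               # 6,220,800 bytes, about 6.2 MB
rgba = raw_size_bytes(1920, 1080, channels=4)  # 8,294,400 bytes with an alpha channel
print(rgb, rgba)
```

Canvas pixel data read back in the browser is RGBA, so the upload described above would be closer to 8 MB than 6 MB.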

SkyPuncher · on May 4, 2024

We run these services on isolated machines.

hypeatei · on May 4, 2024

How isolated? If it's an image processing service then it can't live in a vacuum and needs to talk to something else at a certain point.

SkyPuncher · on May 5, 2024

Only accept in-bound requests. Don't provide it with any special credentials.

Someone might still be able to get an RCE on them and burn a bit of money, but they certainly wouldn't be able to move laterally.

vitro · on May 4, 2024

> 3. Generating different sizes as you mentioned can be useful.

For years now I've used nginx's image filter [1], which handles file resizing quite nicely. Resized images are cached by nginx. For some use cases it works very well and I no longer need to specify sizes beforehand; you just ask for the size by crafting your URL properly.

[1] https://nginx.org/en/docs/http/ngx_http_image_filter_module....

TylerE · on May 4, 2024

How does that scale if, say, image access is relatively random and your data set exceeds server RAM by a couple orders of magnitude?

vitro · on May 5, 2024

Cannot say. Scaled images are saved and served from the filesystem, not kept in RAM. One possible solution would be to use a CDN.

hju22_-3 · on May 4, 2024

Do also note that re-encoding can also be used as part of the exploit. E.g. Team Fortress 2 recently had one that exploited a similar system.

kevincox · on May 4, 2024

I don't think the exploit in that case was re-encoding. What happened is an image with very large dimensions was uploaded. When this was decoded into a raw pixel buffer on the client it used tons of memory. It was effectively a zip bomb attack.

In fact re-encoding probably would have solved this as the server could enforce the expected dimensions and rescale or reject the image.
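
As a sketch of that kind of server-side enforcement, here is a Python check that reads the declared dimensions from a PNG header before any decoder allocates a pixel buffer. It assumes PNG input only, and the limits are hypothetical values, not recommendations:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) from the IHDR chunk without decoding the image."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte type 'IHDR',
    # then width and height as big-endian 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])


def check_png_size(data: bytes, max_side: int = 8192, max_pixels: int = 40_000_000) -> tuple:
    """Reject images whose declared size would be too expensive to decode."""
    width, height = png_dimensions(data)
    if width == 0 or height == 0:
        raise ValueError("empty image")
    if width > max_side or height > max_side or width * height > max_pixels:
        raise ValueError("image too large: %dx%d" % (width, height))
    return width, height
```

A header can lie about the data that follows, so this is a cheap first filter in front of a decoder that enforces its own limits, not a replacement for one.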

toast0 · on May 4, 2024

There've been exploits in image, video, and audio codecs... Which is why it's important to protect your users, but also your servers...

Best to sandbox/jail/etc as tightly as possible, and limit the codecs to only what you need. You can configure the ffmpeg builds pretty granularly... default will include too much.

beeboobaa3 · on May 4, 2024

I care more about protecting my servers than my users. If one user attacks another that's not really my problem, blame can easily be shifted to the browser. But if someone hacks my servers and leaks everyone's data, that's my problem.

So not encoding is probably the safer way to go for the business.

mthoms · on May 4, 2024

Yikes.

What if the other user getting attacked is you or another admin on your team?

Now the attacker has admin access and can compromise your servers and “leak everyone’s data” just fine.

I don’t think you’ve thought this through.

bawolff · on May 4, 2024

Trust me, your website will always get blamed, regardless of whether it's a user's fault.

If websites get blamed when users reuse passwords, they are definitely getting blamed if you distribute malicious files.

theendisney · on May 5, 2024

You can think that but you should never say or write it. :p

beeboobaa3 · on May 5, 2024

It was just a joke, your honor.

theendisney · on May 5, 2024

I got the picture.

sdsd · on May 4, 2024

>Re-encoding the image is a good idea to make it harder to distribute exploits.

Famously, the Dangerous Kitten hacking toolset was distributed with the classic zip-in-a-jpeg technique, because imageboards used to not re-encode images.
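
The trick works because image decoders stop reading at the end of the image data, while ZIP readers locate the archive from a record at the end of the file. A small Python sketch of detecting such a file with the standard library; `has_embedded_zip` is a hypothetical name:

```python
import io
import zipfile


def has_embedded_zip(data: bytes) -> bool:
    """True if the bytes end with something a ZIP reader would accept as an archive."""
    # zipfile.is_zipfile looks for the ZIP end-of-central-directory record,
    # so it also matches archives appended after other content.
    return zipfile.is_zipfile(io.BytesIO(data))
```

Re-encoding makes this check mostly unnecessary, since anything appended after the image data is simply not carried over into the new file.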

https://web.archive.org/web/20120322025258/http://partyvan.i...

cpeterso · on May 4, 2024

Didn’t some photo gallery service (Google Photos or Takeout?) make the news because they re-encoded users’ images but didn’t preserve the original files? People who relied on the service as a safe backup lost their original quality images. So in some cases, you may want to re-encode/optimize uploaded images for display but also archive the original files somewhere.

cqqxo4zV46cp · on May 5, 2024

It obviously depends on the use case.

busymom0 · on May 4, 2024

> Can you create a size limit on the pre-signed URL

Yes, a pre-signed URL can have the `Content-Length` set and Amazon S3 checks it. However, note that this is true for Amazon S3 but not for others like BackBlaze or R2. Last time I tried, BackBlaze didn't support it.

gehen88 · on May 4, 2024

Only with createPresignedPost, not with getSignedUrl.

perpil · on May 4, 2024

Presigned post lets you set content-length-range: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-authen... You can specify content length on presigned PUTs, but it needs to be set as a header and added to the signed headers for SIGv4. It can't be set as a query param.

smackeyacky · on May 4, 2024

This is good advice. Just pick a resize dimension and try to resize everything that comes in. If it fails it's not an image.

You can hang an event off S3 and have a lambda that does the work / warns you of a bad upload.
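
A rough Python sketch of such a Lambda handler, assuming the standard S3 event notification shape. The `resize` argument is a hypothetical callable (it would download and resize the object, raising on anything that is not an image), and no AWS SDK calls are shown:

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list:
    """Extract (bucket, key) pairs for newly created objects from an S3 event."""
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in event notifications.
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found


def handler(event, context, resize=None):
    """Try to resize every uploaded object; report the ones that fail."""
    rejected = []
    for bucket, key in uploaded_objects(event):
        try:
            if resize is not None:
                resize(bucket, key)  # hypothetical: raises if the file is not an image
        except Exception:
            rejected.append((bucket, key))
    return {"rejected": rejected}
```

Per the comments upthread, the function's role should have access to the upload and output buckets and nothing else.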

kijin · on May 4, 2024

Resizing also helps reduce costs. The latest phones can generate ridiculously large images with 200+ megapixels. You really don't want to dump that kind of behemoth in your S3 bucket and serve it as somebody's profile pic.

Videos will add even more to your AWS bill if you're not careful. Re-encode that 4K cat video as soon as it comes in, or wire up a CDN to do it for you.

perpil · on May 4, 2024

> Can you create a size limit on the pre-signed URL?

Yes, if you use the POST method you can set the content-length-range property in your presigned URL form inputs to limit min and max bytes. https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-authen...
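
For illustration, a Python sketch of the POST policy document that carries that condition. The bucket name and key prefix are placeholders, and the SigV4 credential, date and signature fields that a real form also needs are left out. In practice you would let an SDK helper (e.g. the createPresignedPost mentioned above) build and sign this:

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key_prefix, max_bytes, ttl_seconds=300, now=None):
    """Build an S3 POST policy that limits where and how much can be uploaded."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ttl_seconds)
    return {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            # Minimum and maximum upload size in bytes.
            ["content-length-range", 1, max_bytes],
        ],
    }


def encode_policy(policy: dict) -> str:
    """Base64-encode the policy JSON, as expected in the form's 'policy' field."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


policy = build_post_policy("example-bucket", "uploads/", 5 * 1024 * 1024)
print(encode_policy(policy))
```

S3 rejects the upload if the body falls outside the stated range, so the limit is enforced without the bytes ever passing through your own server.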

smsm42 · on May 4, 2024

There's also an old bug where browsers re-interpreted an image as HTML (even with the correct MIME type set), and this allowed exploits to be hosted on user-upload sites. Not sure if any modern browser still has this problem, but it used to be a concern. Recoding the image usually broke those exploits. Though it could also break some metadata - e.g. if you go from JPEG to PNG you could lose EXIF data.

Frost1x · on May 5, 2024

I’m pretty sure, if memory serves, that some older versions of IE (Internet Explorer; I imagine some people here may be young enough to not be aware of it) would let you serve any sort of file over HTTP with an arbitrary extension (e.g. ‘.jpeg’), but would then sniff the MIME type and handle the file based on its actual content. So you could literally give a .exe a jpeg file extension, and IE would let you run arbitrary executables on Windows users’ systems with the full permissions of the user running it (which was often admin). You could also do this with other file types, but an executable was the most glaring example of how bad the issue was. Obviously you could embed this in HTML as well…
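
The usual mitigation today is to tell the browser not to second-guess the declared type. A minimal Python sketch of the response headers for serving an upload; `image_response_headers` and the allowlist are illustrative, not from any framework:

```python
# Content types we are willing to serve inline; everything else is a download.
_CONTENT_TYPES = {
    "png": "image/png",
    "jpeg": "image/jpeg",
    "gif": "image/gif",
    "webp": "image/webp",
}


def image_response_headers(kind: str) -> dict:
    """Headers for serving a user upload without letting the browser sniff its type."""
    headers = {"X-Content-Type-Options": "nosniff"}
    if kind in _CONTENT_TYPES:
        headers["Content-Type"] = _CONTENT_TYPES[kind]
    else:
        # Unknown or risky types are never rendered in the page's origin.
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = "attachment"
    return headers
```

SVG is left off the allowlist on purpose here, so it falls through to the download branch.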

blincoln · on May 5, 2024

This still generally works with SVGs.

badrabbit · on May 4, 2024

You can tell a lot from the exif metadata of images so that's one reason (user privacy) to always re-encode images.

arrowsmith · on May 4, 2024

Sounds complicated. Any recommendations for existing libraries I can use to handle this?

JimDabell · on May 4, 2024

https://www.libvips.org

RobotToaster · on May 4, 2024

I know lemmy uses pict-rs https://git.asonix.dog/asonix/pict-rs/

bufferoverflow · on May 5, 2024

> Re-encoding the image is a good idea to make it harder to distribute exploits

Which immediately makes you vulnerable to zip-bomb attacks.

If you want to be more safe, you first have to do all kinds of checks of the image headers.

deadbabe · on May 5, 2024

#1 sounds interesting. Any examples of this being used?