Can you outline what specific problems you have? I'm going to guess upgrades of the Java runtime are not the issue, but upgrades of packages are? What kind of packages: an assorted list of cobbled-together libraries, or half the Internet's worth of BOMs like Spring Boot? And I'm going to take a leap and say you have not been keeping up to date for a few years at least?
> Senior roles are expected to be able to independently reach quality results without too much extra attention/help from their direct management or technical team leaders.
I'd add that in a number of companies this is more of a mid-level role, with the senior level having significant impact within the team and some impact outside it (like driving initiatives involving other teams). It is difficult to show this after working solo for a while, and some companies might not know how to approach it and drop candidates with experience like that. I guess one angle to address this would be to show how you worked on longer-term projects and coordinated with founders/whoever.
There are sometimes also program managers who organize relevant projects across the company. For example, with GDPR compliance a program manager might coordinate between legal and various products, identify risks and report progress to C-level, and each product will have a project kick-off to get it into compliance (depending on the effort, the person overseeing this project and working with the program might be a dedicated project manager, an engineering manager, some engineer, etc.).
"The ideal candidate is somebody deeply familiar with the tech stack we are currently using, preferably having spent at least 6 months working on our code base."
Seriously though, the performance of an interviewee in such conditions is mostly a function of how well the interviewer can "onboard" the interviewee within 4 hours, or how much the interviewer is willing to actively guide the interviewee to the solution (as opposed to letting them figure it out).
Even with a single interviewer this easily introduces bias. I've interviewed many, many people over the years, and sometimes I wonder whether I'm giving slightly more or fewer hints based on how much I subjectively "like" the candidate. I mean, I try hard to be fair, but I don't follow a strict script, and the variability occasionally makes me doubt myself.
For such open-ended tasks as 4 hours of essentially pair programming, I don't know how any interviewer could be objective and fair. Especially presuming that you wouldn't be re-using the same task once it's actually been solved/fixed... (otherwise that's just another artificial problem)
There are a lot more use cases in the taxi and food delivery space than in video streaming, at least by an order of magnitude. Consider the various user personas for one, legal considerations and so on. Technically each use case might be less demanding than video streaming, but overall the system is much more complex.