> In talking to another person in the industry dealing with the issue, they noted that technically 2020 had 53 weeks, and this is the 53rd week. As such, the suspect Sony data file issue might actually be tied to that complexity.
Wouldn't be surprised
On a positive note, this is the first year in four where I didn't see CI errors related to the year currently being 2021 but the 'week-year' still being 2020.
(Did find one recent test that hardcoded 2020 instead of just taking the current year though, but at least that doesn't take any further investigation)
Relatedly, I had my Android phone set to the en-US locale. I noticed Calendar was showing 2020 as having 52 weeks, and all week numbers of 2021 were off by one. I changed to the en-GB (UK English) locale and the issue was resolved.
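On the JVM, the week rules a locale implies can be inspected with `java.time.temporal.WeekFields.of(Locale)`. The sketch below assumes Java 9+ with the default CLDR locale data; Android's Calendar app may use entirely different (e.g. ICU-based) code, so this only illustrates the same kind of mismatch rather than reproducing what the phone does. The class `LocaleWeeks` and its `weekNumber` helper are hypothetical names for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical demo class: compares week numbers under two locales' week rules.
public class LocaleWeeks {

    // Week number of `date` within its week-based year, using the rules of `locale`.
    static int weekNumber(LocalDate date, Locale locale) {
        WeekFields wf = WeekFields.of(locale);
        return date.get(wf.weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        LocalDate monday = LocalDate.of(2021, 1, 4);
        for (Locale locale : new Locale[] {Locale.US, Locale.UK}) {
            WeekFields wf = WeekFields.of(locale);
            // Print the locale's first day of week, minimal days in week 1,
            // and the resulting week number for Monday 2021-01-04.
            System.out.printf(Locale.ROOT, "%s: weeks start %s, min %d day(s) in week 1, 2021-01-04 is week %d%n",
                    locale, wf.getFirstDayOfWeek(), wf.getMinimalDaysInFirstWeek(),
                    weekNumber(monday, locale));
        }
    }
}
```

With CLDR data, en-US maps to Sunday-start weeks with a 1-day minimum and en-GB to Monday-start weeks with a 4-day minimum (the ISO rules), so Monday 2021-01-04 comes out as week 2 and week 1 respectively: the same off-by-one described above.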
Which raises the question: what on Earth week numbering system was the en-US locale using? I am not aware of any US-specific week numbering system, so it would seem logical for the en-US locale to use the ISO system.
> In US weeks more of W53/2020 lies in 2020 than in ISO weeks
Your error is in assuming that every week belongs to exactly one week-year, but that's only true in ISO week numbering.
In ISO, week-years never overlap, and whichever week contains January 4th is the first of the year, so 2020-12-28/2021-01-03 is 2020-W53 and 2021-01-04/2021-01-10 is 2021-W01.
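A minimal sketch of the ISO rule using `java.time.temporal.IsoFields`, which exposes the ISO week-based year and week number directly; it also hand-rolls the "week containing January 4th" rule to show the two agree. The class and method names (`IsoWeeks`, `isoLabel`, `isoWeek1Monday`) are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;

// Hypothetical demo class: ISO 8601 week labels via java.time.temporal.IsoFields.
public class IsoWeeks {

    // Formats a date as "YYYY-Www" using the ISO week-based year and week number.
    static String isoLabel(LocalDate date) {
        int year = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "%d-W%02d", year, week);
    }

    // Hand-rolled version of the rule: week 1 of year Y is the
    // Monday-to-Sunday week that contains January 4th of Y.
    static LocalDate isoWeek1Monday(int year) {
        return LocalDate.of(year, 1, 4).with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2020-12-28", "2021-01-03", "2021-01-04", "2021-01-10"}) {
            System.out.println(s + " -> " + isoLabel(LocalDate.parse(s)));
        }
        System.out.println("ISO 2021-W01 starts on " + isoWeek1Monday(2021));
    }
}
```

Because each date maps to exactly one (week-based year, week) pair, the ISO labels never overlap; the price is that a few days at the start or end of a calendar year carry the neighbouring year's number (e.g. 2021-01-03 is 2020-W53).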
But in the US whichever week contains Jan 1st is the first… and whichever week contains December 31st is the last of the previous year. Meaning the same week can be both X-W53 and X+1-W01 (aka the US system has overlapping / partial weeks at both start and end of year).
This is exactly the case this year (and most years, really: every year in which January 1st isn't a Sunday). The week from December 27th (Sunday) to January 2nd (Saturday) contains both January 1st and December 31st, meaning it's both the last week of 2020 and the first week of 2021.
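The overlapping labels can be sketched with `WeekFields.SUNDAY_START.weekOfYear()`, which numbers each date within its own calendar year (week 1 being the Sunday-start week containing January 1st). Treating that field as "the US system" is an assumption based on the convention described above, and `UsWeeks`/`usWeekLabels` are hypothetical names.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.time.temporal.WeekFields;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: labels for a Sunday-to-Saturday week under the
// "week containing Jan 1 is week 1" convention.
public class UsWeeks {

    // Sunday-start weeks, week 1 needs only 1 day in the year.
    static final WeekFields US = WeekFields.SUNDAY_START;

    // Returns one label for an ordinary week, two for a week spanning New Year.
    static List<String> usWeekLabels(LocalDate date) {
        LocalDate sunday = date.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY));
        LocalDate saturday = sunday.plusDays(6);
        List<String> labels = new ArrayList<>();
        // weekOfYear() is relative to each date's own calendar year,
        // so the two ends of a straddling week get different labels.
        labels.add(label(sunday));
        if (saturday.getYear() != sunday.getYear()) {
            labels.add(label(saturday));
        }
        return labels;
    }

    static String label(LocalDate d) {
        return String.format(Locale.ROOT, "%d-W%02d", d.getYear(), d.get(US.weekOfYear()));
    }

    public static void main(String[] args) {
        System.out.println(usWeekLabels(LocalDate.of(2020, 12, 31)));
        System.out.println(usWeekLabels(LocalDate.of(2021, 1, 5)));
    }
}
```

The week of 2020-12-31 gets both `2020-W53` and `2021-W01`, while an ordinary week such as that of 2021-01-05 gets a single label (`2021-W02`).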
Right, I figured something like this had to be in play. But by whom and where is this US week numbering system actually defined?
I can see it is probably implemented in Java by java.time.temporal.WeekFields#SUNDAY_START, but there is no reference to any standard that actually defines it in the documentation. Just this (oddly worded) claim: "This week definition is in use in the US and other European countries."
> But by whom and where is this US week numbering system actually defined?
I don't think there is any formal definition; it's just customary. AFAIK Americans don't really use week numbering, so it's not really a thing. OTOH, ISO defines a very strict week-based calendaring system.
Anyway, SUNDAY_START defines that weeks start on Sunday and that the first week-of-year needs to contain only a single day. So I expect you can never get the "overlapping" information from the JDK: it just considers W01 to be whichever week contains 01-01 and forgets about the overlapping last week of the previous year, if any.
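To see what the JDK actually reports for 2020-12-31 under `WeekFields.SUNDAY_START`, the sketch below prints its calendar-year field (`weekOfYear()`) next to its week-based-year fields (`weekBasedYear()`, `weekOfWeekBasedYear()`), and counts the weeks in a week-based year by walking back from December 31st. The class `WeekBasedYears` and helper `weeksInWeekBasedYear` are hypothetical; the counting approach is just one simple way to do it.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;

// Hypothetical demo class: how a WeekFields definition assigns the overlap week.
public class WeekBasedYears {

    // Number of weeks in week-based year `year` under `wf`: walk back from
    // Dec 31 to the last date that still belongs to that week-based year.
    static int weeksInWeekBasedYear(int year, WeekFields wf) {
        LocalDate d = LocalDate.of(year, 12, 31);
        while (d.get(wf.weekBasedYear()) != year) {
            d = d.minusDays(1);
        }
        return d.get(wf.weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        LocalDate nye = LocalDate.of(2020, 12, 31);
        WeekFields us = WeekFields.SUNDAY_START;
        // weekOfYear() is relative to the calendar year 2020.
        System.out.println("weekOfYear          = " + nye.get(us.weekOfYear()));
        // The week-based fields fold the straddling week into 2021.
        System.out.println("weekBasedYear       = " + nye.get(us.weekBasedYear()));
        System.out.println("weekOfWeekBasedYear = " + nye.get(us.weekOfWeekBasedYear()));
        System.out.println("2020 weeks, SUNDAY_START: " + weeksInWeekBasedYear(2020, us));
        System.out.println("2020 weeks, ISO:          " + weeksInWeekBasedYear(2020, WeekFields.ISO));
    }
}
```

Under SUNDAY_START the week-based fields put 2020-12-31 in week 1 of 2021, so week-based year 2020 ends with week 52 (matching the "2020 had 52 weeks" Calendar display above), whereas `weekOfYear()` still reports 53 for that date; under `WeekFields.ISO`, 2020 has 53 weeks.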
So we have the en-US locale returning arbitrary week numbers from a system that is not really well-defined or even used by anybody, instead of using the one and only standardized week numbering system. If a person asks a computer for "week number 23 of 2021", I think the only sensible answer in any culture would be to use the ISO system. The only locale-dependent thing should be which ISO week number a Sunday falls in.
My point is that the "customary US system" seems to have nothing to do with week numbering as such. Has anyone actually heard someone say "23rd week of 2021" or such? This system only seems to make sense when talking about ordinal weeks around the New Year, i.e. terms like "first week of 2021", "second week of 2021" or "last week of 2020". But those are ambiguous, even in cultures that use ISO weeks.
In my country ISO weeks are used extensively in business, but if you were talking about the "first week of 2021" (ordinal), I would think you meant the week in which January 1 falls, the same as in the "US customary" system. But if you said "week 1 of 2021" (cardinal), I would immediately understand it as ISO week 1 of 2021, which I know from experience is not always the same as the "first week of 2021".
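When the two coincide can be checked mechanically. Assuming the ordinal "first week of Y" means the Monday-start week containing January 1st, it equals ISO week 1 exactly when January 1st has ISO week number 1, which happens when January 1st falls on Monday through Thursday. A sketch with the hypothetical class `FirstWeek`:

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Locale;

// Hypothetical demo class: is "the first week of Y" (the Monday-start week
// containing January 1st) the same week as ISO week 1 of Y?
public class FirstWeek {

    static boolean firstWeekIsIsoWeek1(int year) {
        LocalDate jan1 = LocalDate.of(year, 1, 1);
        // Jan 1 is in ISO week 1 only if it falls on Monday through Thursday;
        // otherwise it belongs to the last ISO week (52 or 53) of the previous year.
        return jan1.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR) == 1;
    }

    public static void main(String[] args) {
        for (int y = 2019; y <= 2024; y++) {
            LocalDate jan1 = LocalDate.of(y, 1, 1);
            System.out.printf(Locale.ROOT, "%d: Jan 1 is %s, ISO %d-W%02d, same week: %b%n",
                    y, jan1.getDayOfWeek(), jan1.get(IsoFields.WEEK_BASED_YEAR),
                    jan1.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR), firstWeekIsIsoWeek1(y));
        }
    }
}
```

For 2021, January 1st was a Friday and fell in 2020-W53, so the "first week of 2021" and "week 1 of 2021" were different weeks.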
My experience in the US is that week numbering is used principally by finance to keep track of weekly operations, which usually means only payroll. So it makes perfect sense to me that the week containing Jan. 1 should always be week 1; in fact, anything else would be rather unintuitive. This results in some confusion around the new year, but that is kind of an intrinsic problem with weekly payroll, and it's one of the reasons payroll closing tends to be a specific and somewhat complex process, as the year totals are calculated more or less independently of the pay schedule.
Outside of payroll, I have seen very little use of week numbers for any purpose in the US. It's far more common to write "the week of Jan. 18", where the date given is typically, but not always, the Sunday or Monday of that week. Week numbers are not displayed on most calendars, so most people would be pretty frustrated if you told them a week number.
> If a person asks a computer for a "week number 23 of 2021", the only sensible solution in any culture I think would be to use the ISO system.
No, because it wouldn't match the expectations of Americans, or of anyone else not using ISO calendars, regardless of the fact that they don't routinely use week numbers.
It's an argument that maybe week numbers shouldn't be available in all locales, but not that they should be renumbered to match another locale, even if that other locale's system makes more sense.
But... haven't we established that the en-US locale's week numbering system is not really based on anything sensible? So how can the user have any sensible expectation of what they get back when they ask for a week number?
I tried to research the glibc version control history of how it came to be like this. I think it's completely arbitrary, and at times en-US did use ISO 8601 weeks.
> the year currently being 2021 but the 'week-year' still being 2020
Strava seems to have a similar problem. When I pull up leaderboard data for a segment I've just done, "this month" and "this year" and "all time" show my result but "this week" doesn't. Checking now for three segments, each "this week" only includes the part of the week that was in December.
BTW, for any other Strava users here, be aware that it's also pretty stupid about years in which you go from one age group to another (in my case from 45-54 to 55-64). AFAICT only your annual CR gets considered for leaderboards, and if that's before your birthday then it'll act as though you've never even done that segment in your new age group.