It is amazing to me how one little bad file absolutely fucked shit up worldwide. I cannot wait to see the post mortem on this one, and I really think they're going to have to make it public given the widespread effects. Also - quite the lesson in "maybe monopolies aren't great!" (A lesson we will absolutely NOT learn.)
As we all wondered how this got out into the wild like it did, this post from Mastodon summed it up so well:
speaking as a tester; the test team were promised a week, then had it shortened to 3 days and ended up with three hours left to test.
After 2 hours of testing someone started the roll out.
Yep, I have 110% BEEN THERE. Nothing like going balls to the wall to get a test series done because someone needs the file yesterday, finishing up, and discovering the file had been sent out hours before. I also have to wonder if they are trying to use AI to design and execute tests.
I got lucky and did not get the Blue Screen Of Death on my work laptop, but our network was pretty hosed most of the day. Around 1PM, our "critical systems" were back online and that is how I discovered that the tools and systems I need to actually do my job are not considered critical, and still weren't online when I bounced at 3PM.
Why they did not just say "Look, if you're not getting the BSOD and don't need IT for that, but all your other shit is broken and being worked on, just call it a day" is beyond me. I basically got paid to sit and stare lovingly at my cats all day. (Oh no.)
And the thing is - once I hit the halfway mark of the day, there was no way I was going to be able to get back into "work mode" even if they did get things back online before I left - my brain is GREAT at getting into work mode as soon as I put on my headphones and open my work laptop, but staying there is dependent on actually having work to do. The work mode ship sailed around noon.
One thing I realized last night - they had sent out emails saying "If you're getting the BSOD, try rebooting, and call IT" - but if you're getting the BSOD, you're not going to be able to get to your email...
I had a hair appointment yesterday and realized that I had no idea if their payment systems would be working and that my "power outage cash stash*" was empty. Everything was broken at work, so I popped out to see if the ATMs were working (thankfully they were) and decided to hit up Starbucks as long as I was out.
Oh my gawd, the poor folks working at Starbucks. I did a mobile order and thought "great, things are working for them!" My first hint that something was amiss should have been that instead of saying "Your order will be ready at 8:45" it just said "Your order will be ready soon" - I just wrote that off to the fact that they recently upgraded their time estimate bit, and new things can glitch.
Oh no, not a glitch. "Soon" is how the app says "Fuck if we know!" There was a 10-15 minute delay in orders going from the app to the stores. The printers weren't working, so they were having to transcribe the orders off the store iPads, and orders were just getting eaten by the system. (Mine was!) Plus I am sure they had extra volume from folks like me going "Welp, can't work, might as well make a Starbucks run." Bless them, they were working SO HARD and I was happy to see that later the online ordering was just shut off for their sake. (Same crew today and I left them a BIG tip, which I should have done yesterday, but I flaked big time on that.)
Seems like lots of things are back online today, which is good. I suspect the affected airlines are going to be messed up for a couple days more, because all the planes are in the wrong places.
Huge, HUGE props to all the IT folks all over the place trying to fix this. I hope you got paid overtime for it, or got some kind of good reward. And I hope everyone is sending invoices to Crowdstrike for this mess.
* A takeaway from the 2012 derecho. So many places could only take cash for days. The cash stash has been appropriately replenished.