Archives for 2023
Experimenting With A Drawer
Yes, it could work, but I need to remove the hose garage from the vacuum first.
Corner Jigs
Abuse Chisels
Long Drawer of Push Blocks and Sticks
Drawer of Router Bits
Entrance Hall Drawer #1
Kitchen Drawer #1
How much cache you got on you?
Holy crap that’s a lot of requests per second. 40 million per second, more than 2 billion in a minute, on the test hardware.
Have managed to make the back-end never hit the database for any reads except during start-up, and all database writes go through a WAL queue rather than directly to the database.
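To make that read/write split concrete, here is a minimal sketch of the idea, not the actual implementation. It assumes a hypothetical `items` table, uses SQLite as a stand-in for MS SQL, and uses an in-process `deque` as a stand-in for the WAL queue (a real WAL would be durable on disk).

```python
import sqlite3
from collections import deque


class WriteBehindCache:
    """Serve reads from memory; queue writes for a later batch flush."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = deque()  # stand-in for the WAL queue (not durable here)
        # The only read against the database: load everything at start-up.
        self.items = dict(conn.execute("SELECT id, name FROM items"))

    def get(self, item_id):
        # Reads never touch the database.
        return self.items.get(item_id)

    def put(self, item_id, name):
        # Update the in-memory copy first so readers see the new value,
        # then record the write for a later flush.
        self.items[item_id] = name
        self.pending.append((item_id, name))

    def flush(self):
        # Apply all queued writes in one transaction; return how many were written.
        batch = list(self.pending)
        self.pending.clear()
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)", batch
            )
        return len(batch)
```

In this sketch `flush()` would be called by a background job on a schedule, so the request path only ever touches memory and the queue.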
The requests are a mixture of request/response objects (which contain DTOs and collections of DTOs in the response) and media such as images and short videos. I don’t know how close we were to maxing out a dual 100Gbit NIC. From my understanding it is some exotic dual 100Gbit NIC with four 25Gbit ports on each NIC, but I’m not sure on that as I haven’t physically seen the hardware with my own eyes. There was still plenty of bandwidth to spare, and quite a bit of CPU to spare too.
Now I will admit, these are synthetic tests, i.e. traffic replay of earlier live sessions, coming from another machine, without a bunch of network hardware in between, so I suspect real-world production usage will be considerably lower.
Configuration is:
MS SQL -> WAL cache -> Kestrel -> In-Memory Query & Object Cache -> NGINX -> Apache Traffic Server
Technically the Kestrel server application is doing its own in-memory/in-process object and object collection caching so that I am not dumbly relying on external caches to do the heavy lifting for me. The Kestrel server application is effectively just a cache filler once it gets going and has answered the initial burst of queries and built internal collections. And the only time Kestrel gets queried (after that initial burst) is if something hasn’t migrated to a more forward position (higher tier?) cache.
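The lookup order described here can be sketched roughly as follows. This only models the "check the most forward tier first, fall back to the origin, then fill the tiers on the way out" behaviour. The `Tier` and `TieredCache` names are hypothetical, and real tiers (Apache Traffic Server, NGINX, the in-process cache) are separate systems with their own eviction rules.

```python
class Tier:
    """One cache layer; a plain dict stands in for the real store."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class TieredCache:
    def __init__(self, tiers, origin):
        self.tiers = tiers    # ordered from most forward to closest to the origin
        self.origin = origin  # callable key -> value; stand-in for the server app

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier.store:
                value = tier.store[key]
                # Migrate the value into every more forward tier.
                for forward in self.tiers[:i]:
                    forward.store[key] = value
                return value, tier.name
        # Nothing cached anywhere: ask the origin once and fill every tier.
        value = self.origin(key)
        for tier in self.tiers:
            tier.store[key] = value
        return value, "origin"
```

After the first request for a key, the origin callable is not invoked again for that key unless every tier has dropped it.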
Update:
I have since moved from synthetic replay tests to production requests, and our numbers ticked up a bit due to the way we shape our traffic, so we’re just cresting 42M RPS, or around 2.5B per minute, on a single machine.
I cannot go into a lot of detail on implementation. But I am very happy with the throughput.
There’s still a little headroom, but I suspect I am going to be hitting limitations of the OS, network stack and drivers at that point.
I feed everything through eight separate 25Gbit connections, though tests have shown that I could get far better results by feeding through 32 x 10Gbit connections. Results better than “yeah, but 320Gbit is more than 200Gbit” would suggest. This would be due to packet offload, PCIe bandwidth, RAM <=> NIC DMA constraints, etc, etc.
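For reference, the naive arithmetic behind that comparison, using only the figures quoted in this post, looks like this. It shows nominal link capacity only and says nothing about the offload, PCIe, or DMA effects mentioned above.

```python
def aggregate_gbit(links, gbit_per_link):
    """Nominal aggregate capacity of a set of identical links, in Gbit/s."""
    return links * gbit_per_link


current = aggregate_gbit(8, 25)    # eight 25Gbit connections
wider = aggregate_gbit(32, 10)     # thirty-two 10Gbit connections
ratio = wider / current            # nominal gain from capacity alone

requests_per_minute = 42_000_000 * 60  # 42M RPS expressed per minute

print(current, wider, ratio, requests_per_minute)
```

The nominal gain is 1.6x; the point made above is that the measured improvement was larger than that ratio alone would predict.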
The majestic monolith literally handles everything, from media to database to server. There are two other containers for logging and housekeeping.
I focused on what I can get out of the hardware. I optimized at each layer, which was the low-hanging fruit, then optimized the more esoteric parts, e.g. read-once/WAL database access pattern, automatically denormalizing data once it enters a caching layer, storing cached data in different data structures depending on its usage, deduplicating (but not normalizing) data where it made sense, updating in-memory cached objects on database writes, deferring all database writes to post-processed batches (housekeeping container), making the client smarter so that it didn’t ask for complex joins and other operations that couldn’t be cached, and so on. Not trusting the client, but assuming the client was smart.
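As an illustration of two of those ideas, denormalizing on entry to the cache and deduplicating without normalizing, here is a small sketch. The post/author shape is invented for the example and is not the real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    id: int
    name: str


@dataclass
class CachedPost:
    id: int
    title: str
    author: Author  # embedded, so serving a post needs no join


def build_cache(post_rows, author_rows):
    """Join posts to authors once, when the rows enter the cache."""
    # Deduplicate: one shared, immutable Author instance per author id.
    authors = {author_id: Author(author_id, name) for author_id, name in author_rows}
    cache = {}
    for post_id, title, author_id in post_rows:
        # Denormalize: embed the author in the cached object instead of its id.
        cache[post_id] = CachedPost(post_id, title, authors[author_id])
    return cache
```

Every cached post carries its author directly, yet posts by the same author point at the same object in memory, so the join cost is paid once and the duplication costs only a reference.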
There were a lot of bottlenecks that I was able to kick out by being given carte blanche to “fix it, I don’t care what it takes, make it so we can hit reddit numbers.” And then finally just tuning the hell out of the OS, the Mellanox cards, and so on.
Organizational Obsession
Cave of memories
Years ago, 45 years ago, I cut my teeth (programming teeth) on what are now very old microprocessors. The MOS (later Commodore) 6502 (not so obscure), Zilog Z80 (not so obscure), the Commodore 6501 (kind of obscure) and the MOS 7600 (not so obscure).
There was a question on Hacker News a while back about the 7600 and I was sure it was a CPU I had written code for. I was a little fuzzy on the details, recalling how the paddle inputs worked from a different game console.
Single channel sound, four outputs to control an RF modulator, four joystick inputs and a light gun input. Back then, CPUs weren’t exactly what you called sophisticated! No interrupts, no NMI, no external RAM, no real internal RAM, two X & Y registers, I don’t think we even had indirect addressing, it was uphill, both ways, in the snow.
My memory is a little hazy, but I think the 7600 was legitimately the second CPU I learned to write code for, and probably the first CPU I got paid to write for. I recall it came in a few different editions, with hard-coded programs in an internal ROM. This is where my memory gets really bad, because I am not sure if I am confusing the 7600 and another CPU, but one of those CPUs had this weird piggyback socket on it to which you could attach an EPROM, or an ICE, and it would let you run your own code instead of what was built into the masked ROM on the microprocessor. There was also a machine that had the EPROM window that you could wipe with a strong ultraviolet light, and again, I don’t recall if that was the 7600 or another CPU. I don’t recall if the ROM on the microprocessor of this special edition had anything in it, or if it was the standard edition mask with a special package. I seem to recall the different 7600 models had different graphics built in, and you picked the variant based on what game you were trying to create.
Being a pack rat, and prone to stashing away any useful paperwork that isn’t nailed down, I was digging around at our storage locker for some other reason and, in one of my several boxes of memories, found old programming books and datasheets I haven’t looked at in decades. Among them were an article with schematics for building a game console around the 7600 Video Game Array microprocessor, a service manual for repairing the consoles, a few schematics from old game consoles that used the 7600, and a datasheet with pinouts.
The reason I got the job of working with the 7600, at such a tender young age, was because I worked for cheap, was easily exploited in my youthful naivety, and had been hardware hacking on broken 1970s game consoles of various stripes for about four or five years by that point. That summer holiday job in the middle of the school year lasted all of three months before my Dad said to the proprietor “You have to pay my son if you want him to continue working for you.”
I never did get my money.
Contortionist
“So how tightly coupled is the code?” asked my colleague.
“Our architecture diagram is used as the illustration on page 73 of the Kama Sutra,” I replied.
“Where should we be?”
“Brexit.”