Sometime around April 2024, I thought quantum computing could be hugely impactful, so I started working on it.
Later I realized this might not be the case. There's an algorithms crisis, and my prediction is that quantum computers will probably end up like particle accelerators or gravitational wave detectors or huge telescopes:
essentially another device for probing physics with.
Quantum error correction is good for many things. From a practical perspective, it's necessary for building
a quantum computer. From a theoretical perspective,
it has many deep connections to physics and theoretical computer science and whatnot.
I refused to do dirty work for a grad student, or whatever other boring stuff the labs I talked to had in mind for me.
So I went off on my own.
One of the first papers I found on using RL to discover quantum error correcting codes, and the only open-source one, was by Jan.
I messed around with it a bit and sent him an email. He replied, and a year later, after I got my paper out, I emailed him again and we chatted.
He had completely forgotten about my first email and my existence.
His work used a circuit-level representation, and he wrote a custom jax simulator which was very fast.
They briefly mentioned how one could constrain weight, but their method didn't work. They released a new paper
on pi day which cited mine; it's tied with a paper by my collaborator for being the first paper to cite mine.
I'll count my collaborator as first, since for some reason Jan's paper doesn't show up in Google Scholar, because
they used some different bib style or something.
Anyways, I tried a ton of stuff, and sometime in June 2024 I came across this weight reduction paper by Hastings from way back around 2016.
In between I tried out a bunch of different representations: the boundary maps of chain complexes
derived from simplicial complexes, plain bitflips, actions of the symplectic group on projections of maximal isotropic subspaces, and some other stuff.
None of them worked, so I finally tried the weight reduction idea, and all the results magically came out in the span of three weeks.
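As an aside, the chain-complex representation is easy to make concrete: over F2, any pair of boundary maps whose composition vanishes hands you a CSS code, since the stabilizer commutation condition is exactly the boundary-of-a-boundary-is-zero condition. A minimal sketch with a toy complex (my own toy example, not anything from the paper):

```python
import numpy as np

# Toy chain complex over F2: a triangle (3 vertices, 3 edges, 1 face).
# partial1 maps edges to their endpoint vertices, partial2 maps the
# face to its boundary edges.
partial1 = np.array([[1, 0, 1],
                     [1, 1, 0],
                     [0, 1, 1]])      # vertices x edges
partial2 = np.array([[1], [1], [1]])  # edges x faces

# Chain complex condition: the boundary of a boundary vanishes mod 2.
assert not (partial1 @ partial2 % 2).any()

# CSS construction: X checks from partial1, Z checks from partial2^T.
# Commutation, Hx @ Hz.T = 0 mod 2, is then automatic.
Hx, Hz = partial1, partial2.T
assert not (Hx @ Hz.T % 2).any()
```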
It's important to mention that what we call the distance of a code is the minimum distance between all distinct codewords,
and the number of codewords grows exponentially with code size. Previous approaches just focused on designing codes with good distance from scratch,
and didn't consider weight constraints at all. The difficulty with designing a code with good distance from scratch was mostly in the reward design: rewards were either too sparse or needed too much space.
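To make the pain concrete, here's what computing distance actually costs for even a toy classical linear code (my own illustration, nothing from the paper; for a linear code the minimum pairwise distance equals the minimum weight of a nonzero codeword, but you still have to sweep all 2^k - 1 of them):

```python
import itertools
import numpy as np

def min_distance(G):
    """Brute-force distance of a binary linear code with generator G."""
    k, n = G.shape
    best = n
    for msg in itertools.product([0, 1], repeat=k):  # 2^k messages
        if not any(msg):
            continue
        # min pairwise distance = min weight of a nonzero codeword
        best = min(best, int((np.array(msg) @ G % 2).sum()))
    return best

# [7,4] Hamming code: distance 3, but you only know that after
# checking every one of the 15 nonzero codewords.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
print(min_distance(G))  # 3
```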
Methods directly using distance as a reward got stuck around distance 9. There are huge differences between distance-n codes and distance-(n+1) codes, and it's
really hard for a model to figure out how close it is to a distance-(n+1) code without going through every single codeword. I tried training a model to predict the minimum distance of a given code using a dataset of a few hundred thousand codes I made,
and its accuracy was terrible. That makes sense: all it takes is one pair of codewords with low distance to tank the distance of the whole code, and there are exponentially many codewords for it to go wrong on. There was a trend, which might still be going,
of using neural networks to solve NP-hard problems. I think they churned out a few papers at NeurIPS and probably the other venues, but my take is that they're terrible at solving things that are "delicate" and vary wildly under small perturbations.
To get around the sparse reward problem, other approaches just stored every codeword. They got stuck at distance 5 and projected needing something like 80GB of VRAM to design a distance-10 code (they still got published in a Nature sister journal anyway: https://www.nature.com/articles/s41534-024-00920-y).
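The back-of-envelope is easy to reproduce (hypothetical parameters and my own assumption about the storage layout, not their reported numbers):

```python
# every extra encoded bit doubles the codeword table, so memory that
# looks fine at small distances becomes hopeless a few sizes later
n, k = 100, 40                 # hypothetical code parameters
bytes_total = 2**k * n // 8    # n bits per stored codeword
print(f"{bytes_total / 2**40:.1f} TiB")  # 12.5 TiB
```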
In my approach I completely revisited the objective and reward design.
My approach worked by first taking an existing code with good distance, then modifying it to reduce its weight (another property that's important for implementing error correcting codes in practice;
there's a tension between high distance and low weight, and having one normally comes at the expense of the other).
It turned out that I didn't even have to design a code with good distance from scratch: it's a lot easier to take a code with good distance and bad weight and then reduce its weight.
The RL agent quickly learned to avoid and undo moves that decrease distance.
This sidestepped the whole difficulty of learning to increase distance, with its sparse rewards or exponential space usage, by instead learning not to decrease distance from dense rewards.
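The reward had roughly this shape (a minimal sketch of the idea, not the paper's actual implementation; max_check_weight, apply_move, and distance_ok are hypothetical stand-ins):

```python
import numpy as np

def max_check_weight(H):
    # weight of the heaviest check (max row weight of the check matrix)
    return int(H.sum(axis=1).max())

def apply_move(H, move):
    # toy move: add row i to row j mod 2; the paper's actual move set
    # and code representation are richer than this
    i, j = move
    H = H.copy()
    H[j] = (H[j] + H[i]) % 2
    return H

def step(H, move, distance_ok):
    # distance_ok is whatever cheap check you have that the starting
    # code's distance survived the move (hypothetical stand-in here).
    # Weight improvements pay off immediately; a move that breaks the
    # distance is reverted and punished, so the agent only ever learns
    # "don't lose distance" from a dense signal, never "gain distance"
    # from a sparse one.
    w_before = max_check_weight(H)
    candidate = apply_move(H, move)
    if not distance_ok(candidate):
        return H, -1.0  # revert, dense negative reward
    return candidate, float(w_before - max_check_weight(candidate))

H = np.array([[1, 1, 0],
              [1, 1, 1]])
new_H, reward = step(H, (0, 1), lambda _: True)  # stub distance check
print(reward)  # 1.0: max check weight dropped from 3 to 2
```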
My approach could design codes with distances up to 40 on 16 cores with basically constant space, and it also introduced weight constraints, which nobody had added before because they couldn't even get past distance 10.
It felt a bit surreal that everybody got stuck on something I could get better results on just by sidestepping it entirely, optimizing a different quantity while I was at it.
I was up until like 4am working every day during this time, so it was not that magical, and more sheer force of will. I also had some feeling like I "cheated" on the problem, since the results all came out quite straightforwardly after making some leaps in perspective.
I suppose this type of feeling isn't very abnormal. Once you find something that is substantially better than other methods, I guess it is normal to feel like you found some "unfair" advantage.
The implementation part happened around late November and early December of 2024.
The first draft was ready mid-December; I slowed down to write it up with my coauthor and put it on arXiv on Feb 20, 2025.
It took an unfathomable number of stupid detours along the way to get it done, since I was on my own and without much guidance (not that I wanted any). My coauthor helped with the writing; I ideated and executed alone.
After it all I've become very disappointed in the state of RL algorithms, so I'm looking at work on learned optimizers and learned RL update rules, since I think a good way to learn smartly is to learn to learn.
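To gesture at what learning to learn means here (a generic toy in jax, my own sketch, not any particular paper's method): the hand-designed update rule gets replaced by a tiny network that maps per-parameter features to updates, and that network is itself trained by an outer loop.

```python
import jax
import jax.numpy as jnp

def init_opt(key):
    # tiny per-parameter MLP that stands in for the update rule
    k1, k2 = jax.random.split(key)
    return {"w1": 0.1 * jax.random.normal(k1, (2, 16)),
            "w2": 0.1 * jax.random.normal(k2, (16, 1))}

def learned_update(opt, grad, param):
    # features per parameter: its gradient and its current value
    feats = jnp.stack([grad, param], axis=-1)   # (..., 2)
    h = jnp.tanh(feats @ opt["w1"])             # (..., 16)
    return (h @ opt["w2"])[..., 0]              # per-parameter update

# inner loop: param <- param + learned_update(opt, grad, param)
# outer loop: differentiate the final inner-loop loss w.r.t. opt to
# train the optimizer itself -- which is where the compute bill and
# the alchemy come in.
```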
I doubt it'll be successful without copious amounts of compute and alchemy.