iev.ee | ian erik varatalu

i'm a phd candidate at taltech in estonia, working at the high-assurance software laboratory. my research is mostly about automata theory and string algorithms, with a focus on high-performance regex engines, but also anything else to do with text and streams. if you need to search through your email, process tons of documentation, or find that one file from your computer, i'm your guy (!). i've been looking for more industrial applications to my research post phd, so feel free to reach out if you are looking for consulting or collaboration on anything related to my work.

research and projects

i work with my advisors Juhan Ernits (taltech) and Margus Veanes (microsoft research) on Derivative-Based Symbolic Automata, an incremental approach to hard problems by pushing heavy computation into lazy step-by-step unwinding of automata - i.e., solving only as much as needed, when needed. we've applied this to regex matching, model checking, and string constraint solving, with promising results.

the main project i've been working on for the past few years is RE#, a high-performance regex engine in F# and Rust. RE# supports extended regex features (lookarounds, intersection, complement) while being fast and memory efficient thanks to derivative-based algorithms and optimizations. some discoveries we made with RE# helped improve the performance of the .NET NonBacktracking regex engine, originally developed by my advisor Margus Veanes, as well as inspire other projects, like the LLM tokenizer for the AICI project at microsoft (the thing that helps constrain LLM outputs).

i'm a big fan of F# and functional programming in practice, and i'm constantly working on various smaller projects and libraries in my spare time, which you can find on my github. some highlights include fflat, a tiny native compiler for F#, built on top of bflat, a plugin for obsidian, a note taking app i use daily. i enjoy microoptimizing things and do a lot of benchmarking both in research and other projects, so if you like reading assembly or want 'the fastest possible' implementations of something, you might find something interesting through my repos.

papers

interests

automata theory and formal languages
functional programming (F#)
compilers and systems programming (Rust)
minimalism and cybersecurity
linux, nixos, embedded systems
dogs, coffee, lifting?

presentations

some day i'll put some presentations here

ian erik varatalu

research and projects

papers

interests

presentations

recent posts

what 262,715 regex questions on stack overflow haven't answered (part 2)

what 262,715 regex questions on stack overflow haven't answered

finding all regex matches has always been O(n²). even in the engines built to prevent it

symbolic derivatives and the rust rewrite of RE#