The Summer Research Institute (SuRI) is a premier venue for discussing recent research results. Each year we invite top faculty in a given topic to present their research to an interested crowd of students, faculty, researchers, and fellows from EPFL, the Vaud area, Switzerland, and abroad.
This year, our focus was security, systems, and formal methods. This systems focus allowed us to explore upcoming problems in consensus, reliability, distributed systems, verification, and cyber defense. Over two days we heard from ten amazing speakers, complemented by mentoring and poster sessions and a neat walking tour through Lausanne.
Natacha Crooks: Scaling Decentralized Trust with Databases
Natacha started with background on Byzantine fault tolerance and how it applies to distributed and shared ledgers. Basil, the fault-tolerant system from her group, consists of three core components: concurrency control to detect concurrent conflicting operations on replicas, an atomic commit component that ensures consistency across replicas and shards, and a fallback component that ensures Byzantine participants cannot stall honest clients. Following the description of Basil, Natacha introduced Pesto, a query engine on top of Basil that runs complex SQL benchmarks without code modifications at 3-5x better performance than previous approaches. Interestingly, they demonstrated that the system is only 10-35% slower than classic, non-Byzantine systems. Fault tolerance matters, and append-only ledgers are just databases.
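To give a feel for the quorum math underlying Byzantine fault tolerance, here is a minimal sketch of the classic BFT bound (n = 3f + 1 replicas, quorums of 2f + 1). This illustrates the general principle only, not Basil's actual protocol, whose quorum sizes and commit path differ; the function names are hypothetical.

```python
# Minimal sketch of a Byzantine quorum decision: with n = 3f + 1
# replicas, any two quorums of 2f + 1 votes intersect in at least
# f + 1 replicas, so at least one honest replica is in the overlap.
from collections import Counter

def bft_quorum_decision(votes, f):
    """Return the value backed by a Byzantine quorum, or None."""
    n = len(votes)
    assert n >= 3 * f + 1, "need at least 3f+1 replicas to tolerate f faults"
    quorum = 2 * f + 1
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= quorum else None

# Example: 4 replicas tolerate f = 1 Byzantine replica.
votes = ["commit", "commit", "commit", "abort"]  # one faulty vote
print(bft_quorum_decision(votes, f=1))  # -> commit
```

With only split votes and no 2f + 1 majority, the function returns None, which is where a fallback component (like Basil's) must step in so honest clients are not stalled.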
Aurojit Panda: Runtime Verification of Distributed Systems
We have protocol specifications, but how do we prove the correctness of a given implementation? The key is to specify the protocol in a way that aids the search. A central finding is that static analysis can enable lightweight runtime verification that does not affect failure properties and has low runtime overhead. The system cannot detect bugs outside of the specification or performance bugs; detectable bugs must be visible at runtime and instantiatable. The biggest challenge remains the lack of specifications and the corresponding lack of clarity about what it means for a program to be correct.
Caroline Trippel: Design and Formal Verification of Hardware-Software Security Contracts
As it turns out, modern hardware is incredibly complex and side channels abound. Sadly, I missed part of the talk due to lunch organization. Caroline presented several recent works from her group on enforcing security guarantees under different attacker models, in particular against micro-architectural side channels.
Peter Pietzuch: Improving Cloud Security with Hardware Memory Capabilities
Clouds have performance/isolation trade-offs and require strong isolation between tenants. Workloads require efficient sharing between isolation domains to maintain performance and limit overhead, and there is redundant memory content at all levels. Peter's research therefore aims to rethink the cloud stack with new hardware or custom-tailored hardware extensions.
One goal is to break down isolation barriers by enabling memory capabilities. That way, systems can avoid the MMU tax and reduce the cost of address translation. Another angle is de-privileging system software and shrinking TCBs by breaking systems into smaller parts. The underlying goal is to minimize disruption to existing applications while accepting changes to the privileged system software.
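The capability idea can be sketched conceptually: a pointer that carries its own bounds and permissions, checked on every access, so two domains can share a buffer without the MMU mediating. This is a toy illustration in the spirit of CHERI-style capabilities, not Peter's actual hardware interface; the class and field names are made up.

```python
# Conceptual sketch of a memory capability: a reference that carries
# bounds and permissions, enforced on every load. Real capability
# hardware does this check in the pipeline, not in software.
class Capability:
    def __init__(self, memory, base, length, perms=("r",)):
        self.memory, self.base = memory, base
        self.length, self.perms = length, perms

    def load(self, offset):
        # Out-of-bounds or unpermitted accesses trap, so a capability
        # to a shared region can never reach a neighbor's data.
        if "r" not in self.perms or not (0 <= offset < self.length):
            raise PermissionError("capability violation")
        return self.memory[self.base + offset]

memory = bytearray(b"secret|shared-data")
cap = Capability(memory, base=7, length=11)   # grants "shared-data" only
print(bytes(cap.load(i) for i in range(11)))  # -> b'shared-data'
try:
    cap.load(-1)  # cannot reach the secret before the region
except PermissionError:
    print("capability violation caught")
```

The point is that sharing becomes a matter of handing out a narrower capability rather than remapping pages, which is where the savings over the "MMU tax" come from.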
Peter presented two papers: CAP-VMs from OSDI'22, which enables strong isolation with fine-grained data sharing, and ORC from OSDI'23, which looks at concrete memory deduplication with semantic object reuse.
Lesly-Ann Daniel: Hardware-software co-designs for microarchitectural security
Mitigations struggle against micro-architectural side channels. The complexity of modern processors opens up a large space of potential attack vectors. Lesly-Ann focused on recent works that address these side channels.
She presented ProSpeCT from USENIX Security'23, a defense that hardens constant-time code against Spectre attacks. The challenge with Spectre mitigations is that one must enumerate all possible branches that could be vulnerable and harden them individually. For constant-time code, the software should not be responsible for tracking potential speculation. The goal is to prove that software is constant time and then use hardware-software contracts to ensure the underlying guarantees.
After these lessons on protecting against Spectre, Lesly-Ann turned to securing constant-time execution through compiler extensions. At a high level, Libra from CCS'24 balances executions for complex processors. The key idea is to transform the code so that secret-dependent executions produce the same observations, orchestrating micro-architectural state in a carefully balanced way.
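To make the constant-time property concrete, here is a small illustrative example (not from the talk): a naive comparison leaks how many leading bytes of a secret match through its timing, while the constant-time variant always touches every byte and has no secret-dependent branch.

```python
# Why constant-time code matters: the naive version exits early, so
# its runtime reveals the length of the matching prefix; the
# constant-time version accumulates differences without branching on
# secret data. In practice, use hmac.compare_digest for this.
def naive_compare(secret, guess):
    for a, b in zip(secret, guess):
        if a != b:            # early exit: timing depends on the secret
            return False
    return len(secret) == len(guess)

def constant_time_compare(secret, guess):
    if len(secret) != len(guess):
        return False
    diff = 0
    for a, b in zip(secret, guess):
        diff |= a ^ b         # no data-dependent branch
    return diff == 0

assert constant_time_compare(b"token", b"token")
assert not constant_time_compare(b"token", b"tokex")
```

Spectre undermines exactly this kind of guarantee, because the processor may speculatively execute branches the constant-time source code never takes; that is the gap hardware-software contracts like ProSpeCT's aim to close.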
Vyas Sekar: Enabling AI-based Autonomous Security Operations
Due to their complexity, it is extremely challenging to evaluate the security of autonomous systems. As a case study, Vyas and his team implemented a mock environment for the Equifax breach. The key challenge was to model the different defenses and detection systems, which resulted in a complex configuration of intertwined systems.
After discussing the design of the MHBench system to evaluate the cyber range, Vyas talked about deception insights, novel LLM-based attackers, robust and stealthy attackers, and opportunities for LLM-based defenders.
In systems, there are only three novel ideas: use an abstraction layer, use a level of indirection, or create a cache.
Vyas discussed MHBench as a general platform to implement different strategies for attackers and deception based on static and dynamic properties and a variety of capabilities such as decoy hosts or files along with various attack strategies that mimic APTs and replicate real-world settings.
To evaluate the MHBench system, they used LLMs and existing pen-testing queries to try to exfiltrate data from the network. Interestingly, LLMs were unable to carry out end-to-end multi-stage attacks. This was mostly due to the LLMs having to generate lots of coherent low-level code and their inability to keep track of state at that level. But using high-level abstractions, they were able to guide the LLM to produce an end-to-end attack. Their tool Incalmo is open source and available to try.
Going forward, a key question will be how to leverage these results to scale up evaluation and to test your own systems.
Bozidar Radunovic: The future of mobile cellular connectivity is programmable
Cellular networks have become complex and run multiple layers of software on diverse hardware. One key difference is that for wireless, you can communicate with several clients at the same time due to separation in the frequency domain.
Bozidar went into details on how to build and deploy such a network, including running large parts on an open-source stack. Even though the systems are open source, the code is complex, which makes it difficult to change individual aspects without breaking other tangential features.
To tackle this complexity, Bozidar introduced eBPF-based hooks that allow measuring diverse properties of the network. Similar to the Linux kernel, these hooks enable introspection without requiring heavy changes to the underlying system. Using customizable hooks (codelets), they analyzed diverse interference attacks with these lightweight probes as a sample application, but many other use cases are possible.
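The hook-and-codelet pattern can be sketched in a few lines: the stack exposes named hook points, and small observer codelets attach to them without modifying the stack itself. This is a toy illustration of the general idea, not Bozidar's actual framework or its API; the hook names are invented.

```python
# Toy sketch of hook-based introspection (the idea behind eBPF-style
# codelets): a stack fires named hooks, and attached codelets collect
# metrics without changing the stack's code.
from collections import defaultdict

hooks = defaultdict(list)

def attach(hook_point):
    """Decorator that registers a codelet at a named hook point."""
    def register(codelet):
        hooks[hook_point].append(codelet)
        return codelet
    return register

def fire(hook_point, event):
    """Called from inside the stack wherever a hook is placed."""
    for codelet in hooks[hook_point]:
        codelet(event)

stats = {"drops": 0}

@attach("mac.packet_dropped")
def count_drops(event):   # codelet: observe only, never interfere
    stats["drops"] += 1

fire("mac.packet_dropped", {"ue_id": 17})
fire("mac.packet_dropped", {"ue_id": 42})
print(stats["drops"])  # -> 2
```

Real eBPF adds a verifier and sandboxing so codelets provably cannot crash or slow the host, which is what makes the approach safe inside software with real-time deadlines.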
Looking at the data, interestingly many base stations have very low utilization, even during peak times. There is an immense opportunity to save energy by switching off idle cells and ensuring coverage through nearby cells. The challenge is that RAN software has real-time deadlines (on the order of tens of milliseconds), CPU load can spike at millisecond granularity, and RAN software is a black box, all of which makes this environment hard to optimize.
Key message: cellular networks are exciting, and it's easy to build a network yourself. Small changes can have a huge impact, and network programmability can enable next-gen apps!
Ivan De Oliveira Nunes: Provable Execution in Real-Time Embedded Systems
Embedded systems are used in diverse contexts: they interact with the environment through sensors and signal status to other systems. Securing deeply embedded systems is challenging as code often runs bare-metal without any security mitigations. Ivan's key question is whether it is possible to guarantee some level of service even if part of the embedded system is compromised.
Protecting embedded systems is hard as there is generally no virtual memory abstraction and all applications run, along with the OS, in the same address space. Newer embedded systems may provide secure enclaves that share the same CPU but allow security-critical operations to run in isolation.
In a nutshell, they allow potentially dangerous interference to go through but log everything using a monitor in the trusted enclave. This works because passive checks guarantee integrity (but not availability). A trusted verifier can then assess whether the execution was correct.
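One common way to get this kind of tamper-evident log is a hash chain, sketched below. This is a generic illustration of integrity-without-availability, not Ivan's actual design: the monitor appends entries whose digests chain together, so a trusted verifier can later detect any modification, even though a compromised system could still stop logging altogether.

```python
# Sketch of an append-only, tamper-evident log: each entry's digest
# covers the previous digest, so changing any entry breaks the chain.
import hashlib

GENESIS = b"\x00" * 32

def append(log, entry):
    prev = log[-1][1] if log else GENESIS
    digest = hashlib.sha256(prev + entry.encode()).digest()
    log.append((entry, digest))

def verify(log):
    """Recompute the chain; any tampered entry makes this fail."""
    prev = GENESIS
    for entry, digest in log:
        if hashlib.sha256(prev + entry.encode()).digest() != digest:
            return False
        prev = digest
    return True

log = []
append(log, "write GPIO pin 4")
append(log, "read sensor 2")
assert verify(log)
log[0] = ("write GPIO pin 5", log[0][1])  # tamper with an entry
assert not verify(log)
```

Note what this does and does not give you: the verifier can prove the recorded history is intact (integrity), but nothing forces the compromised side to keep appending (availability), matching the trade-off described above.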
Tianyin Xu: Software Reliability in Emerging Cloud Computing Paradigms
Tianyin gave a great overview and deep dive into software reliability. I sadly missed this talk due to the organization of the walking tour.
Ana Klimovic: Rethinking Cloud System Software and Abstractions for True Elasticity in the Cloud-Native Era
Last but not least, Ana talked about performance optimizations for cloud applications. She especially highlighted her Dandelion system and how to further improve cloud workloads and systems. As with Tianyin's talk, I sadly missed parts of this one due to admin duties.
Concluding thoughts
Overall, SuRI was a great success. Over 100 attendees, 10 speakers, and 5 organizers enjoyed amazing talks and asked lots of questions. We hope that the discussions at SuRI will trigger lots of new collaborations, and we look forward to next year!