Semester Project and Thesis

The SPRING lab offers project opportunities for BSc, MSc, and PhD students. We encourage interested students to have a look at the Thesis & Project Guidelines from the MLO lab, which will give you a sense of what you can expect from us and what we expect from students.

Last Update: 18th April 2024

How to apply

Please apply via the Google form (login may be required). You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in the area.

External students, i.e., students who are from neither EPFL nor ETHZ, should get in touch with the supervisors of the project(s) via email.

Applications are processed in two rounds. For each round, we collect applications until the deadline; we then get back to selected applicants during the corresponding “First Contact From Supervisors” period. If you do not hear from us during the indicated period, it most likely means that we no longer have space.

We will mark each project once it is taken. We strongly recommend applying as soon as possible for best consideration, since we expect most projects to be taken after the first round. However, we will leave the form open after the second round and consider all applications if projects are still available at that time.

Early deadline: 6th July 2024

First Contact From Supervisors: 8th July 2024 - 22nd July 2024

Late deadline: 23rd August 2024

First Contact From Supervisors: 26th August 2024 - 6th September 2024

If you encounter any technical issue, please get in touch with Saiid El Hajj Chehade.

Projects on Cryptography

CRYPTO1: A Library for Lattice Parameter Selection

Many constructions for post-quantum-secure cryptographic tools and privacy-enhancing technologies rely on hardness assumptions based on lattice problems. However, setting the parameters of the underlying lattice problem to achieve a desired level of security is non-trivial, and is currently accessible only to a handful of experts.

To date, the best tool for lattice parameter selection is the lattice estimator [1]; however, this tool is written in SageMath and cannot easily be used as-is in production-ready codebases. Additionally, it does not cover important variants of standard assumptions (e.g., MSIS, vSIS).

As part of our effort to port lattice-based cryptographic primitives from research to practice, you will design and implement a Rust library for lattice parameter selection. Concretely, this library should: (i) provide an intuitive and hard-to-misuse interface, (ii) internally use precise estimates of the hardness of lattice problems (e.g., by reducing to the shortest vector problem), and (iii) compute estimates for variants of standard problems of interest (e.g., new lattice assumptions, or standard assumptions with additional leakage). The library would then be integrated into lattirust (to be open-sourced soon), our Rust library for lattice-based cryptography.
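
To make requirement (i) concrete, here is a minimal sketch of what a hard-to-misuse parameter-selection API could look like in Rust. All names (LweParams, security_level, for_security) and the cost heuristic are hypothetical illustrations, not part of lattirust or the lattice estimator.

```rust
/// Hypothetical parameter set for an LWE-style assumption.
/// All names and the cost model below are illustrative only.
#[derive(Debug, Clone, Copy)]
struct LweParams {
    n: usize,   // secret dimension
    log_q: u32, // bit size of the ciphertext modulus q
    sigma: f64, // standard deviation of the error distribution
}

impl LweParams {
    /// Very rough security estimate (in bits), standing in for a call
    /// into a precise estimator. A real library would reduce to SVP
    /// and use calibrated lattice-reduction cost models.
    fn security_level(&self) -> f64 {
        // Toy heuristic: security grows with n, shrinks with log q.
        // This is NOT a sound estimate; it only shows the interface shape.
        3.4 * (self.n as f64) / (self.log_q as f64)
    }

    /// Hard-to-misuse entry point: pick the smallest dimension n
    /// achieving `target_bits` of security for a fixed modulus and noise.
    fn for_security(target_bits: f64, log_q: u32, sigma: f64) -> Self {
        let mut p = LweParams { n: 256, log_q, sigma };
        while p.security_level() < target_bits {
            p.n += 256; // step through implementation-friendly dimensions
        }
        p
    }
}

fn main() {
    let p = LweParams::for_security(128.0, 54, 3.19);
    println!("chosen params: {:?} (~{:.0} bits)", p, p.security_level());
}
```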

Requirements

Applying to this project

This research project/master’s project (PDM) is aimed at one MSc student. The student will work with Christian Knabenhans.

[1] https://github.com/malb/lattice-estimator

CRYPTO2: Systematization of Knowledge of Private Information Retrieval protocols

Private Information Retrieval (PIR) [1] protocols allow a client to retrieve an element of a public database held by a server without revealing its query to the server. In addition, the communication cost between the client and the server is much smaller than the size of the database. As such, PIR is a very useful building block for privacy-enhanced applications such as private messaging and private search/lookups.
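
For intuition, the classic two-server scheme of Chor et al. illustrates the privacy property in a few lines of Rust: each server sees a uniformly random index subset that is independent of the queried index. Note that this toy variant sends n-bit queries and so does not achieve the sublinear communication that practical PIR targets; the code below is a sketch, not any specific protocol from the literature.

```rust
// Toy two-server PIR over a database of bits.
// Privacy: each server sees a (random-looking) subset of indices that is
// statistically independent of the queried index i.

fn server_answer(db: &[u8], query: &[bool]) -> u8 {
    // XOR of all database bits selected by the query.
    db.iter()
        .zip(query)
        .filter(|(_, &q)| q)
        .fold(0u8, |acc, (&bit, _)| acc ^ bit)
}

fn main() {
    let db: Vec<u8> = vec![1, 0, 1, 1, 0, 0, 1, 0]; // public database of bits
    let i = 3; // index the client wants to retrieve privately

    // Client: pick a random subset q1, and q2 = q1 with index i flipped.
    // (A fixed "random" subset is used here to keep the sketch dependency-free.)
    let q1: Vec<bool> = vec![true, false, true, false, true, true, false, false];
    let mut q2 = q1.clone();
    q2[i] = !q2[i];

    // Each server answers independently; neither learns i on its own.
    let a1 = server_answer(&db, &q1);
    let a2 = server_answer(&db, &q2);

    // The two subsets differ only at index i, so the XOR of the
    // answers is exactly db[i].
    assert_eq!(a1 ^ a2, db[i]);
    println!("recovered db[{}] = {}", i, a1 ^ a2);
}
```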

The past few years have seen an explosion of PIR protocols with a flurry of different assumptions and efficiency characteristics. Comparing these protocols is non-trivial, as they can be analyzed from many different angles (cryptographic assumption, client/server processing time, client/server storage overhead, communication bandwidth, pre-processing time/size, behaviour under updates/churn, size/format of database entries (e.g., small entries vs text vs images), etc.).

In this project, you will review existing and recent PIR protocols and establish a formal way of comparing them (e.g., by targeting a few specific use cases). Ideally, you will (re-)implement these protocols in Rust, using our lattirust library (to be open-sourced soon).

Requirements

Applying to this project

This research project/master’s project (PDM) is aimed at one/two BSc/MSc student(s). The student(s) will work with Christian Knabenhans.

[1] https://en.wikipedia.org/wiki/Private_information_retrieval



Projects on Machine Learning

ML1: Reconstruction Attacks using Diffusion Models

Perceptual hashing algorithms are widely used to detect edited copies of targeted content, such as CSAM or non-consensually shared intimate images, on social media platforms. A perceptual hashing algorithm maps an image to a fixed-size vector representation, called a perceptual hash, which captures the main features of the image. Perceptual hashes differ from cryptographic hashes in that they are robust to small transformations applied to the image, such as grayscaling and resizing. Perceptual hashes are believed to be privacy-preserving because of the signal loss with respect to the original image: they are typically very low-dimensional and consist of bits.
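
For intuition, here is a minimal Rust sketch of one of the simplest perceptual hashes, the difference hash (dHash); deployed algorithms such as PhotoDNA are substantially more elaborate. The sketch assumes the image has already been decoded and downscaled to a 9x8 grayscale grid.

```rust
/// Minimal difference hash (dHash): compare horizontally adjacent
/// pixels of a 9x8 grayscale thumbnail, yielding a 64-bit hash.
/// Decoding/resizing to 9x8 is assumed to have happened already.
fn dhash(gray_9x8: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = gray_9x8[row * 9 + col];
            let right = gray_9x8[row * 9 + col + 1];
            // One bit per adjacent-pixel comparison; small edits
            // (grayscaling, mild resizing) rarely flip many bits.
            hash = (hash << 1) | u64::from(left > right);
        }
    }
    hash
}

/// Robustness is usually measured via Hamming distance between hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn main() {
    let thumb = [0u8; 72]; // placeholder thumbnail (all-black image)
    let h = dhash(&thumb);
    println!("hash = {:016x}, distance to itself = {}", h, hamming(h, h));
}
```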

The goal of this project is to design and evaluate reconstruction attacks against perceptual hashes, i.e., attacks that aim to recover a version of the original image given its hash, and to explore different adversary assumptions. The starting point will be to replicate an existing proof of concept [1]. The student will then explore more advanced reconstruction techniques, including generative models such as diffusion-based approaches.

Requirements

Applying to this project

This research project/master’s project (PDM) is aimed at one MSc student. The student will work with Ana-Maria Cretu.

[1] Anish Athalye. Inverting PhotoDNA (2021).



Projects on System Security

SYSTEM1: Constant-Time Fully Homomorphic Encryption/Decryption/Key Generation

Fully Homomorphic Encryption (FHE) allows for arbitrary computations to be performed on encrypted data, and is a very useful building block for more complex privacy-preserving protocols. FHE schemes are now fast enough to be deployed in practice, and are poised to be more widely deployed than ever in 2024–2026 thanks to the emergence of FHE hardware accelerators, various standardization efforts, and user-friendly compilers. FHE schemes are implemented in several libraries, but most do not offer constant-time key generation, encryption, and decryption operations, which can lead to devastating key-recovery attacks in practice [1-3].

Ensuring that these algorithms are actually constant-time is nigh impossible when using a high-level language (e.g., C, C++, Rust) and relies heavily on trusting the compiler and platform, which is an extremely brittle guarantee in practice. In this project, you will implement FHE key generation / encryption / decryption using the low-level Jasmin language, which automatically proves secret-independence of implementations using the EasyCrypt framework. A Jasmin implementation of ML-KEM (a key encapsulation mechanism based on similar cryptographic assumptions as FHE schemes) will provide a starting point for your implementation, but you will need to implement additional FHE-specific operations (e.g., arithmetic over polynomial rings, random sampling from truncated Gaussians, number-theoretic transforms). Ideally, your implementation will be bit-for-bit compatible with an existing FHE library, and will thus be a major step forward for more secure deployments of FHE in the real world.
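
To illustrate the coding style at stake, the Rust sketch below contrasts a branchless constant-time conditional select with a naive branching one. As the paragraph above notes, writing such patterns in a high-level language is not enough on its own, since the compiler may reintroduce secret-dependent branches; this is precisely the gap that a Jasmin implementation closes.

```rust
/// Branchless conditional select: returns `a` if `choice == 1`,
/// `b` if `choice == 0`, without a secret-dependent branch.
/// This shows the coding style only; in a high-level language the
/// compiler may still undo such patterns, which is exactly why
/// Jasmin-level secret-independence guarantees are desirable.
fn ct_select(choice: u64, a: u64, b: u64) -> u64 {
    // mask is all-ones when choice == 1, all-zeros when choice == 0
    let mask = choice.wrapping_neg();
    (a & mask) | (b & !mask)
}

/// Naive (leaky) version: the branch makes the execution path, and
/// thus timing and cache behaviour, depend on the secret `choice`.
fn leaky_select(choice: u64, a: u64, b: u64) -> u64 {
    if choice == 1 { a } else { b }
}

fn main() {
    assert_eq!(ct_select(1, 7, 9), 7);
    assert_eq!(ct_select(0, 7, 9), 9);
    assert_eq!(leaky_select(0, 7, 9), ct_select(0, 7, 9));
    println!("both selects agree; only one avoids secret-dependent branches");
}
```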

Requirements

Applying to this project

This research project/master’s project (PDM) is aimed at one MSc student. The student will work with Christian Knabenhans.

[1] F. Aydin, E. Karabulut, S. Potluri, E. Alkim, and A. Aysu, “RevEAL: Single-Trace Side-Channel Leakage of the SEAL Homomorphic Encryption Library”. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, Mar. 2022

[2] Wei Cheng, Jean-Luc Danger, Sylvain Guilley, Fan Huang, Amina Bel Korchi, et al., “Cache-Timing Attack on the SEAL Homomorphic Encryption Library”. 11th International Workshop on Security Proofs for Embedded Systems (PROOFS 2022), Sep. 2022

[3] F. Aydin and A. Aysu, “Leaking secrets in homomorphic encryption with side-channel attacks”. J. Cryptogr. Eng., Jan. 2024



Projects on Web Security

WEB1: Replay user interactions for interactive web measurement

The vast majority of web privacy studies [1] rely on automated crawls to perform scalable, large-scale measurement studies and extrapolate conclusions about real visits. Underlying most of them is an implicit assumption that these simulated visits are representative of real users.

Today, however, the web is becoming more and more interactive [2] as websites turn into webapps and full ecosystems, requiring complex and intentional user interactions to navigate them (e.g., searching for a keyword, filling in a form, navigating a menu). This raises concerns about the ability of crawling tools (OpenWPM [3], Puppeteer [4], etc.) to resemble real user visits.

As shown by many works disputing the representativeness of crawls [5], the limits of crawlers cause a disparity in the metrics of interest to studies (such as the total number of contacted advertising and tracking service (ATS) domains, or the total count of fingerprinting scripts), and in turn weaken the generalizability of their insights to real users.

A main limitation of crawlers is their lack of interactivity with the webpage. Most crawlers are fully static [1], while a few interact in preset ways (e.g., logging in [6]). A new wave of generative web agents [7] has emerged with complex techniques, but the replay of recorded user interactions has been skipped altogether. Replaying user interactions is more efficient, more faithful to the user population, and deterministic across repetitions.

In this project, you will explore the problem of recording and replaying user interactions for web measurement: working through alternative designs from related work and the language-modeling field, devising evaluation frameworks that validate these designs in terms of feasibility and advantage over static crawling, and potentially launching a pilot study.
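
As a starting point for the design space, here is a sketch of what a recorded interaction trace and a replay loop could look like. The event schema and the dispatch stand-in are hypothetical; a real replayer would drive a browser (e.g., via Puppeteer or a CDP session) and handle stale selectors, dynamic content, and timing drift between record and replay.

```rust
/// Hypothetical recorded interaction: what a browser extension or an
/// instrumented driver might log during a real user visit.
#[derive(Debug, Clone)]
enum Interaction {
    Click { selector: String },
    Type { selector: String, text: String },
    Scroll { y: i64 },
    Wait { millis: u64 },
}

/// Stand-in for a driver command. A real implementation would issue
/// the corresponding browser-automation calls instead of printing.
fn dispatch(event: &Interaction) {
    match event {
        Interaction::Click { selector } => println!("click {selector}"),
        Interaction::Type { selector, text } => println!("type {text:?} into {selector}"),
        Interaction::Scroll { y } => println!("scroll to y={y}"),
        Interaction::Wait { millis } => println!("wait {millis}ms"),
    }
}

fn main() {
    // A recorded trace for one page visit; replaying it is deterministic
    // across repetitions, unlike a generative agent's actions.
    let trace = vec![
        Interaction::Click { selector: "#search".into() },
        Interaction::Type { selector: "#search".into(), text: "privacy".into() },
        Interaction::Wait { millis: 500 },
        Interaction::Scroll { y: 1200 },
    ];
    for event in &trace {
        dispatch(event);
    }
}
```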

Requirements

Applying to this project

This research project is aimed at one MSc student. The student will work with Saiid El Hajj Chehade.

[1] Aleksei Stafeev and Giancarlo Pellegrino. SoK: State of the Krawlers – Evaluating the Effectiveness of Crawling Algorithms for Web Security Measurements.

[2] https://publications.waset.org/10013485/evolution-of-web-development-techniques-in-modern-technology

[3] https://github.com/openwpm/OpenWPM

[4] https://pptr.dev/

[5] David Zeber, Sarah Bird, Camila Oliveira, Walter Rudametkin, Ilana Segall, Fredrik Wollsén, and Martin Lopatka. 2020. The Representativeness of Automated Web Crawls as a Surrogate for Human Browsing. In Proceedings of The Web Conference 2020 (WWW ‘20). Association for Computing Machinery, New York, NY, USA, 167–178.

[6] Yana Dimova, Tom Van Goethem, and Wouter Joosen. Everybody’s Looking for SSOmething: A large-scale evaluation on the privacy of OAuth authentication on the web. PETS.

WEB2: Evaluating the effectiveness of network volume metrics for web privacy insights

To measure the effectiveness of PETs or the capabilities of attacks, or to argue about the viability of crawlers, researchers need to quantify privacy impact with respect to a set of representative page visits. Most studies use aggregate counting metrics (such as the total number of third-party domains contacted, or the total number of API calls) as a proxy for privacy leakage (either exposure of the user to trackers or fingerprintability of the user, depending on the adversarial model).

However, in many cases quantity is not all that matters. For example, more leakage can be caused by a smaller set of high-prevalence trackers and the connections between them. Also, some studies show that crawlers contact more third-party domains than real users do, which is counter-intuitive and weakens the representativeness of this metric [1]. This could be partly due to the statefulness of user browser sessions compared to primarily stateless crawlers, which exposes the latter to more ATS traffic, e.g., for cookie syncing [2].

In this project, you will tackle this problem in the context of a web evaluation study of various ad blockers, by building a representation of ATS inter-relations and their impact on the website, exploring various new metrics for privacy leakage, and comparing them to the baseline across a dataset of popular webpages.
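
As a minimal illustration of why raw counts and structure-aware metrics can disagree, the sketch below compares the baseline count of distinct third-party domains with a prevalence-weighted score over toy per-page data. The weighting is an invented placeholder for illustration, not a metric from the literature.

```rust
use std::collections::{HashMap, HashSet};

/// Per-page sets of contacted third-party (ATS) domains, standing in
/// for real crawl data.
fn example_visits() -> Vec<HashSet<&'static str>> {
    vec![
        HashSet::from(["tracker-a.com", "tracker-b.com"]),
        HashSet::from(["tracker-a.com"]),
        HashSet::from(["tracker-a.com", "cdn-c.com"]),
    ]
}

fn main() {
    let visits = example_visits();

    // Baseline metric: total number of distinct third-party domains.
    let distinct: HashSet<&str> = visits.iter().flatten().copied().collect();
    println!("baseline: {} distinct domains", distinct.len());

    // Placeholder structure-aware metric: weight each domain by its
    // prevalence (fraction of pages it appears on), so one tracker seen
    // everywhere counts for more than many trackers each seen once.
    let mut appearances: HashMap<&str, usize> = HashMap::new();
    for page in &visits {
        for &d in page {
            *appearances.entry(d).or_insert(0) += 1;
        }
    }
    let n = visits.len() as f64;
    let weighted: f64 = appearances.values().map(|&c| (c as f64 / n).powi(2)).sum();
    println!("prevalence-weighted score: {weighted:.2}");
}
```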

Requirements

Applying to this project

This research project is aimed at one MSc student. The student will work with Saiid El Hajj Chehade.

[1] David Zeber, Sarah Bird, Camila Oliveira, Walter Rudametkin, Ilana Segall, Fredrik Wollsén, and Martin Lopatka. 2020. The Representativeness of Automated Web Crawls as a Surrogate for Human Browsing. In Proceedings of The Web Conference 2020 (WWW ‘20). Association for Computing Machinery, New York, NY, USA, 167–178.

[2] Steven Englehardt and Arvind Narayanan. 2016. Online Tracking: A 1-million-site Measurement and Analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ‘16). Association for Computing Machinery, New York, NY, USA, 1388–1401.

WEB3: Reproducing/extending the privacy measurement and analysis of 1 million sites

The web is an ever-evolving ecosystem, with new practices introduced rapidly in every direction. Unfortunately, while some improve the user's web experience, other advances push trackers and advertisers to new heights, increasing the privacy risks for users. In 2016, Englehardt and Narayanan [1] conducted a large-scale study to measure the prevalence and prominence of tracking on the web. This study has been widely cited and has influenced the design of privacy-enhancing technologies. Among other things, they found that only 123 out of 81,000 trackers appear on more than 1% of webpages. Later studies have shown that the landscape has been changing over the last 7 years; we would like to investigate whether the same trends still hold today, or whether new trends have emerged.

In this project, you will reproduce the study of Englehardt and Narayanan [1] and extend it to the present day. You will crawl a large number of websites, measure the prevalence and prominence of tracking, and analyze the results. You will also extend the study by investigating new metrics proposed by more recent work and discussing the implications of the results.
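
For reference, the two headline metrics from [1] are easy to state: prevalence is the fraction of crawled sites on which a third party appears, and prominence weights each appearance by the inverse rank of the embedding site, so presence on popular sites counts more. The sketch below computes both over toy crawl data (the domains and ranks are made up).

```rust
use std::collections::HashMap;

fn main() {
    // Toy crawl result: (site rank, third parties observed on that site).
    let crawl: Vec<(usize, Vec<&str>)> = vec![
        (1, vec!["tracker-a.com", "tracker-b.com"]),
        (2, vec!["tracker-a.com"]),
        (100, vec!["tracker-b.com", "cdn-c.com"]),
    ];
    let total_sites = crawl.len() as f64;

    let mut prevalence: HashMap<&str, f64> = HashMap::new();
    let mut prominence: HashMap<&str, f64> = HashMap::new();
    for (rank, third_parties) in &crawl {
        for &tp in third_parties {
            // Prevalence: fraction of crawled sites embedding the tracker.
            *prevalence.entry(tp).or_insert(0.0) += 1.0 / total_sites;
            // Prominence (Englehardt & Narayanan): sum of 1/rank over the
            // sites where the tracker appears.
            *prominence.entry(tp).or_insert(0.0) += 1.0 / (*rank as f64);
        }
    }

    for (tp, prev) in &prevalence {
        println!("{tp}: prevalence {:.2}, prominence {:.2}", prev, prominence[tp]);
    }
}
```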

Requirements

Applying to this project

This research project is aimed at one BSc/MSc student. The student will work with Saiid El Hajj Chehade.

[1] Steven Englehardt and Arvind Narayanan. 2016. Online Tracking: A 1-million-site Measurement and Analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ‘16). Association for Computing Machinery, New York, NY, USA, 1388–1401.