Applicant
Prof. Dr. Burkhard Rost
Chair: I12 – Department for Bioinformatics and Computational Biology
Technische Universität München
Project Overview
Traditional high-performance computing (HPC) requires specifically designed communication interfaces for scientific simulation or machine-learning tools. Because of its massively parallel nature, HPC commonly relies on highly automated inter-process communication, which is often implemented through files. With hundreds of interwoven processes, this produces an explosion of small files that strains traditional file systems. To alleviate this strain, we developed a high-performance file system that stores many small files in a unified and scalable way in a background database while remaining transparent to the processes that use it.
To this end, we leveraged FUSE, MongoDB, and Rust to create a user-space file system that works efficiently with small flat files and can potentially store additional metadata. We evaluated the file system with several read and write tests and by integrating it into our PredictProtein web service. To facilitate easy usage, the code is packaged as a Docker image and can be deployed using docker-compose.
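The idea of keeping each small file as a single database record, with optional metadata, can be illustrated with a minimal Rust sketch. This is not the project's code: it makes no FUSE or MongoDB calls, an in-memory HashMap stands in for the background database, and all names (FileRecord, SmallFileStore, and their methods) are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// One stored file: raw bytes plus free-form metadata.
/// Hypothetical layout, not the project's actual schema.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRecord {
    pub content: Vec<u8>,
    pub metadata: HashMap<String, String>,
}

/// In-memory stand-in for the background database, keyed by full path.
/// A real implementation would forward these calls to a database instead.
#[derive(Default)]
pub struct SmallFileStore {
    records: HashMap<String, FileRecord>,
}

impl SmallFileStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Write (or overwrite) a whole file as one record.
    /// Metadata already attached to the path is kept on overwrite.
    pub fn write(&mut self, path: &str, content: &[u8]) {
        let record = self
            .records
            .entry(path.to_string())
            .or_insert_with(|| FileRecord {
                content: Vec::new(),
                metadata: HashMap::new(),
            });
        record.content = content.to_vec();
    }

    /// Read a whole file, or None if the path is unknown.
    pub fn read(&self, path: &str) -> Option<&[u8]> {
        self.records.get(path).map(|r| r.content.as_slice())
    }

    /// Attach a metadata entry; returns false if the file does not exist.
    pub fn set_metadata(&mut self, path: &str, key: &str, value: &str) -> bool {
        match self.records.get_mut(path) {
            Some(record) => {
                record.metadata.insert(key.to_string(), value.to_string());
                true
            }
            None => false,
        }
    }

    /// Look up a single metadata entry for a file.
    pub fn metadata(&self, path: &str, key: &str) -> Option<&str> {
        self.records.get(path)?.metadata.get(key).map(String::as_str)
    }

    /// List all stored paths that start with `prefix`, sorted.
    pub fn list(&self, prefix: &str) -> Vec<&str> {
        let mut paths: Vec<&str> = self
            .records
            .keys()
            .map(String::as_str)
            .filter(|p| p.starts_with(prefix))
            .collect();
        paths.sort_unstable();
        paths
    }
}
```

In this sketch a directory listing is just a prefix query over path keys, so creating many small files adds records to one store rather than entries to an on-disk directory tree. How the actual file system maps paths and metadata onto MongoDB documents is not specified here.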
During our project, we collaborated closely with the Leibniz Supercomputing Centre, which provided guidance and computational resources.