Learning of structured data for generative applications in forensics

Degree: Bachelor / Master
Status: Available
Supervisor(s): Verena Lachner, MSc

Description

Digital forensics deals with the scientific reconstruction of digital traces for use in a court of law. Over the past two decades, this field has seen a significant increase in the use of machine learning.
The way models learn from digital data, and then organize and represent what they have learned, has developed rapidly. Initially, forensic applications of machine learning focussed on tensor data, such as digital images. About a decade ago, research progressed towards tokenized data, particularly in natural language processing. Only recently has tabular data been considered as well. However, forensics encompasses many other forms of structured data that have yet to be fully explored within the framework of machine learning.

The objective of this thesis is to investigate, using toy problems, which architectural elements of deep learning models are particularly well suited to learning structured data in the context of digital forensics. The student makes a justified choice of architectural elements (documenting their approach), conducts their own experiments, evaluates the results, and gives recommendations for future investigations.

References

  • Gu, J., Wang, Z., Kuen, J., et al. Recent Advances in Convolutional Neural Networks. Pattern Recognition, 77, (2018), 354–377.
  • Vaswani, A., Shazeer, N., Parmar, N., et al. Attention Is All You Need. Advances in Neural Information Processing Systems, (2017), 5998–6008.
  • Shwartz-Ziv, R. and Armon, A. Tabular Data: Deep Learning Is Not All You Need. Information Fusion, 81, (2022), 84–90.
  • Pagnoni, A., Pasunuru, R., Rodriguez, P., et al. Byte Latent Transformer: Patches Scale Better Than Tokens. arXiv preprint arXiv:2412.09871, (2024).