Recent advances in adversarial machine learning include practical attacks that aim to “steal” (extract) an ML model using only legitimate queries, in some cases with access to nothing more than the predicted class labels. Such attacks have been demonstrated on decision trees, binary classifiers, 2-layer neural networks, and convolutional neural networks. Each attack requires a different number of queries and achieves a different level of success, measured either as task accuracy or as fidelity to the victim model (the rate at which the extracted model's predictions agree with the victim's).
The objective of this project is to quantify the trade-off between the number of queries and the accuracy of the extracted model across different model architectures, depths, parameter spaces, etc. The first step is to quantify this trade-off for the attacks demonstrated in the literature. The second step is to generate a series of models varying in input space, layer depth, layer width, parameter precision, etc., and then attempt to extract each model, measuring both the accuracy achieved and the number of queries required.
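As a rough illustration of the kind of measurement the second step calls for, the sketch below sweeps a query budget and records the fidelity of the extracted model. Everything in it is an assumption made for illustration rather than part of the project: the victim is a toy linear binary classifier, the attacker draws random Gaussian queries and sees labels only, the surrogate is a logistic regression trained by plain gradient descent, and the names `make_victim`, `extract`, `fidelity`, and `tradeoff` are hypothetical.

```python
import numpy as np


def make_victim(dim, rng):
    """Hypothetical victim: a hidden linear binary classifier (label-only API)."""
    w = rng.normal(size=dim)
    b = 0.1 * rng.normal()

    def predict(x):
        return (x @ w + b > 0).astype(int)

    return predict


def extract(victim, dim, n_queries, rng, epochs=300, lr=0.5):
    """Fit a logistic-regression surrogate using only the victim's labels."""
    x = rng.normal(size=(n_queries, dim))  # assumed query distribution
    y = victim(x)                          # one label per query
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        z = np.clip(x @ w + b, -30.0, 30.0)  # clip to keep exp() stable
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y                          # gradient of the log-loss w.r.t. z
        w -= lr * (x.T @ grad) / n_queries
        b -= lr * grad.mean()
    return lambda q: (q @ w + b > 0).astype(int)


def fidelity(victim, surrogate, dim, rng, n_test=5000):
    """Fraction of fresh inputs on which surrogate and victim agree."""
    x = rng.normal(size=(n_test, dim))
    return float(np.mean(victim(x) == surrogate(x)))


def tradeoff(dim=20, budgets=(10, 100, 1000), seed=0):
    """Map each query budget to the fidelity reached with that budget."""
    rng = np.random.default_rng(seed)
    victim = make_victim(dim, rng)
    return {
        n: fidelity(victim, extract(victim, dim, n, rng), dim, rng)
        for n in budgets
    }


if __name__ == "__main__":
    for n, f in tradeoff().items():
        print(f"queries={n:5d}  fidelity={f:.3f}")
```

For the actual study, the toy victim would be replaced by the generated models (varying input space, depth, width, and parameter precision) and the surrogate training by the extraction attack under evaluation. The outer loop over query budgets, with fidelity or task accuracy recorded for each, would stay the same.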