Evasion attacks against watermarking techniques found in MLaaS systems
Deep neural networks have had an enormous impact on many application domains in computer science, considerably outperforming previous state-of-the-art machine learning techniques. To achieve this performance, however, neural networks require large quantities of data and substantial computational resources, which make them costly to build. The high cost of building a good deep neural network model creates a need to protect this investment against copyright infringement. The legitimate owner of a machine learning model wants to reliably track and detect a malicious adversary who tries to steal the intellectual property embodied in the model. This threat is especially relevant to Machine Learning as a Service (MLaaS) systems, where a provider exposes APIs through which clients interact with its trained proprietary deep learning models. Recently, this problem has been tackled by introducing the concept of watermarking into deep neural networks: the legitimate owner embeds some secret information (a watermark) in a given model. By remotely interacting with a model through input queries and checking for this watermark, the owner can detect copyright infringement and prove ownership of a model that was stolen or copied illegally. In this paper, we focus on assessing the robustness and reliability of state-of-the-art deep neural network watermarking schemes. In particular, we show that a malicious adversary, even in scenarios where the watermark is difficult to remove, can still evade the legitimate owner's verification of copyright infringement, thus avoiding detection of the model theft.
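To make the verification-and-evasion setting concrete, the black-box interaction described above can be sketched as follows. This is a minimal toy illustration, not the schemes or attacks evaluated in the paper: the trigger-set construction, the accuracy threshold, and the query-detection rule are all assumptions chosen for clarity.

```python
import random

random.seed(0)
NUM_CLASSES = 10

# Assumed trigger-set watermark: a secret set of inputs, each paired with a
# pre-assigned label that an independently trained model would rarely produce.
TRIGGER_SET = {(i, i + 100): i % NUM_CLASSES for i in range(20)}

def stolen_model(x):
    """Toy stolen model that memorised the watermark (answers triggers correctly)."""
    return TRIGGER_SET.get(x, sum(x) % NUM_CLASSES)

def verify_ownership(model, trigger_set, threshold=0.9):
    """Black-box verification: the owner queries the suspect model on the
    trigger set and claims ownership if the agreement rate exceeds a threshold
    far above chance (1/NUM_CLASSES)."""
    hits = sum(model(x) == y for x, y in trigger_set.items())
    return hits / len(trigger_set) >= threshold

def evasive_model(x):
    """Evasion wrapper: instead of removing the watermark, the adversary tries
    to detect verification queries and answer them randomly, so the owner's
    check fails. Membership in TRIGGER_SET here is a stand-in for a real
    detector of atypical / out-of-distribution queries."""
    if x in TRIGGER_SET:
        return random.randrange(NUM_CLASSES)
    return stolen_model(x)
```

Calling `verify_ownership(stolen_model, TRIGGER_SET)` succeeds, while wrapping the same stolen model in `evasive_model` makes verification fail with overwhelming probability, even though the watermark is still embedded in the underlying weights.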