Abstract:
Deep-learning-based watermarking frameworks have been extensively studied recently. Such a framework typically consists of an encoder, a noise layer, and a decoder. By training with different sets of distortions in the noise layer, the network can achieve robustness to different distortions. However, this framework has a major drawback: the noise layer must be differentiable, otherwise the network cannot be trained end-to-end. In practice, many distortions are non-differentiable, so the framework cannot be applied to them. To address this limitation, this paper proposes a triple-phase watermarking framework for practical distortions. The proposed framework consists of three phases: a noise-free initial phase, a mask-guided frequency enhancement phase, and an adversarial-training phase. Phase 1 initializes an encoder that embeds the watermark with high visual quality and a decoder that extracts it. To generate high-quality watermarked images, we design a just noticeable difference (JND)-mask image loss in phase 1 to guide the encoder. In phase 2, based on an investigation of the encoded features and the distortions, we propose a mask-guided frequency enhancement algorithm that strengthens the encoded features so that they survive distortion, leaving enough features to be learned in phase 3. Phase 3 trains a stronger decoder to extract the watermark from images that have undergone practical distortions. Together, the three phases handle the non-differentiability problem and make the whole network trainable. Various experiments indicate the superior performance of the proposed scheme in terms of robustness to both traditional differentiable image-processing distortions and practical non-differentiable distortions.
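
The differentiability requirement on the noise layer can be illustrated with a minimal sketch. The example below is not taken from the paper: it assumes uniform quantization (rounding to a fixed step) as a stand-in for a practical non-differentiable distortion, and the names `quantize`, `scale`, and `finite_diff` are hypothetical helpers. It estimates the derivative of each operation numerically to show that a smooth operation passes a gradient back toward the encoder, while the quantizer passes none.

```python
def scale(x, gain=0.5):
    """A smooth, differentiable distortion (assumed example): multiply by a gain."""
    return gain * x


def quantize(x, step=8.0):
    """Round x to the nearest multiple of step.

    Assumed stand-in for a non-differentiable distortion; the function is
    piecewise constant, so its derivative is zero wherever it is defined.
    """
    return step * round(x / step)


def finite_diff(f, x, eps=1e-3):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Gradient that would flow back through each "noise layer" at pixel value 10.0.
grad_smooth = finite_diff(scale, 10.0)    # close to the gain, 0.5
grad_quant = finite_diff(quantize, 10.0)  # 0.0: both probes round to the same level

print("smooth distortion gradient:", grad_smooth)
print("quantization gradient:", grad_quant)
```

Because the quantizer returns no usable gradient, an encoder placed before it cannot be updated by backpropagation through the distortion. A decoder, by contrast, only needs the distorted images as inputs, which is consistent with phase 3 training the decoder on images after practical distortions.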