SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask

Publisher:
Association for Computing Machinery (ACM)
Publication Type:
Journal Article
Citation:
ACM Transactions on Multimedia Computing, Communications, and Applications
Filename | Description | Size
SAMControl.pdf | Accepted version | 37.93 MB
To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, the inadequacies in the inversion process can lead to insufficient retention of source details. In this paper, we propose a method called SAMControl (Soft Attention Mask) to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method.
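The contrast the abstract draws between a fixed-threshold mask and a soft attention mask can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the sigmoid form of the soft mask, its `center` and `sharpness` parameters, and the convention that a mask value near 1 keeps the source feature are all assumptions made for illustration. In SAMControl the mask is described as learned per image and per diffusion step, which this fixed-parameter example does not attempt to reproduce.

```python
import math

def hard_mask(attn, threshold=0.5):
    """Binary mask: 1.0 where attention exceeds a fixed threshold, else 0.0."""
    return [1.0 if a > threshold else 0.0 for a in attn]

def soft_mask(attn, center=0.5, sharpness=10.0):
    """Soft mask in (0, 1) via a sigmoid of the attention value.

    `center` and `sharpness` are hypothetical knobs standing in for
    whatever a learned mask would adapt; they are not from the paper.
    """
    return [1.0 / (1.0 + math.exp(-sharpness * (a - center))) for a in attn]

def blend(source, target, mask):
    """Per-position mix: mask near 1 keeps source, near 0 keeps target (assumed convention)."""
    return [m * s + (1.0 - m) * t for s, t, m in zip(source, target, mask)]

# Toy 1-D "attention map" and feature values.
attn = [0.1, 0.45, 0.55, 0.9]
source = [1.0, 1.0, 1.0, 1.0]
target = [0.0, 0.0, 0.0, 0.0]

hard = blend(source, target, hard_mask(attn))
soft = blend(source, target, soft_mask(attn))
print(hard)  # positions just below and just above the threshold flip abruptly
print(soft)  # the same positions receive intermediate weights
```

In this toy setting, the attention values 0.45 and 0.55 land on opposite sides of the hard threshold and are treated as entirely target or entirely source, while the soft mask assigns them graded weights. That abrupt flip near the boundary is one way to read the segmentation difficulty the abstract attributes to fixed-threshold masks.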