Field |
Value |
Language |
dc.contributor.author |
Zhang, Y |
|
dc.contributor.author |
Wang, C (ORCID: https://orcid.org/0000-0003-1297-768X) |
|
dc.contributor.author |
Fang, F |
|
dc.contributor.author |
Zhuge, Y |
|
dc.contributor.author |
Fan, H |
|
dc.contributor.author |
Chang, X (ORCID: https://orcid.org/0000-0002-7778-8807) |
|
dc.contributor.author |
Deng, C |
|
dc.contributor.author |
Yang, Y |
|
dc.date.accessioned |
2025-03-02T23:51:34Z |
|
dc.date.available |
2025-03-02T23:51:34Z |
|
dc.identifier.citation |
ACM Transactions on Multimedia Computing Communications and Applications |
|
dc.identifier.issn |
1551-6857 |
|
dc.identifier.issn |
1551-6865 |
|
dc.identifier.uri |
http://hdl.handle.net/10453/185478 |
|
dc.description.abstract |
To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, the inadequacies in the inversion process can lead to insufficient retention of source details. In this paper, we propose a method called SAMControl (Soft Attention Mask) to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method. |
|
dc.language |
en |
|
dc.publisher |
Association for Computing Machinery (ACM) |
|
dc.relation.ispartof |
ACM Transactions on Multimedia Computing Communications and Applications |
|
dc.relation.isbasedon |
10.1145/3702999 |
|
dc.rights |
info:eu-repo/semantics/restrictedAccess |
|
dc.subject |
0803 Computer Software, 0805 Distributed Computing, 0806 Information Systems |
|
dc.subject.classification |
Artificial Intelligence & Image Processing |
|
dc.subject.classification |
4603 Computer vision and multimedia computation |
|
dc.subject.classification |
4606 Distributed computing and systems software |
|
dc.subject.classification |
4607 Graphics, augmented reality and games |
|
dc.title |
SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask |
|
dc.type |
Journal Article |
|
utslib.for |
0803 Computer Software |
|
utslib.for |
0805 Distributed Computing |
|
utslib.for |
0806 Information Systems |
|
pubs.organisational-group |
University of Technology Sydney |
|
pubs.organisational-group |
University of Technology Sydney/Faculty of Engineering and Information Technology |
|
pubs.organisational-group |
University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science |
|
pubs.organisational-group |
University of Technology Sydney/UTS Groups |
|
pubs.organisational-group |
University of Technology Sydney/UTS Groups/Australian Artificial Intelligence Institute (AAII) |
|
utslib.copyright.status |
in_progress |
* |
dc.date.updated |
2025-03-02T23:51:30Z |
|
pubs.publication-status |
Published online |
|