While prior methods struggle with trimap inaccuracies or single-object assumptions, MaGGIe offers efficient instance matting and enhanced temporal consistency.

Evolution of Matting: From Traditional Sampling to MaGGIe's Instance Approach

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

2. Related Works

There are many ways to categorize matting methods; here, we review previous works based on their primary input types. A brief comparison between prior methods and our MaGGIe is shown in Table 1.

Image Matting. Traditional matting methods [4, 24, 25] rely on color sampling to estimate the foreground and background, often producing noisy results due to the lack of high-level object features. Deep learning-based methods [9, 11, 31, 37, 46, 47, 54] have significantly improved results by integrating image and trimap inputs or by learning both high-level and fine-grained features. However, these methods often struggle with trimap inaccuracies and assume single-object scenarios. Recent approaches [5, 6, 22] require only image inputs but face challenges when multiple salient objects are present. MGM [56] and its extension MGM-in-the-wild [39] introduce binary mask-based matting, addressing multi-salient-object issues and reducing trimap dependency. InstMatt [49] further customizes this approach for multi-instance scenarios with a complex refinement algorithm. Our work extends these developments, focusing on efficient, end-to-end instance matting with binary mask guidance. Image matting also benefits from diverse datasets [22, 26, 27, 29, 33, 50, 54], supplemented by background augmentation from sources such as BG20K [29] or COCO [35]. We likewise leverage currently available datasets to build a robust benchmark for mask-guided human instance matting.
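As background for these dataset pipelines, the sketch below illustrates the standard compositing relation I = αF + (1 − α)B that matting inverts and that background augmentation (e.g., pasting extracted foregrounds onto BG20K or COCO backgrounds) applies directly. It is a minimal illustration with our own assumed function name and array shapes, not code from any of the cited works.

```python
# Minimal sketch of the compositing equation I = alpha * F + (1 - alpha) * B.
# Shapes and the random placeholder data are illustrative assumptions.
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend a foreground onto a background with a per-pixel alpha matte.

    foreground, background: float arrays in [0, 1], shape (H, W, 3)
    alpha: float array in [0, 1], shape (H, W, 1)
    """
    return alpha * foreground + (1.0 - alpha) * background

# Usage with random arrays standing in for real images:
H, W = 256, 256
fg = np.random.rand(H, W, 3)
bg = np.random.rand(H, W, 3)
alpha = np.random.rand(H, W, 1)
img = composite(fg, bg, alpha)
assert img.shape == (H, W, 3)
```

Matting methods estimate alpha (and often F) given only I, which is why synthetic composites with known ground-truth alpha are so widely used for training.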

Video Matting. Temporal consistency is a key challenge in video matting. Trimap-propagation methods [17, 45, 48] and background-knowledge-based approaches such as BGMv2 [33] aim to reduce trimap dependency. Recent techniques [28, 32, 34, 53, 57] incorporate ConvGRU, attention-based memory matching, or transformer architectures for temporal feature aggregation. SparseMat [50] is unique in focusing on fusing outputs for consistency. Our approach builds on these foundations, combining feature and output fusion for enhanced temporal consistency in alpha mattes. Video matting datasets remain scarce because such data is difficult to collect: VideoMatte240K [33] and VM108 [57] focus on composited videos, while CRGNN [52] is the only one offering natural videos for human matting. To address the gap in instance-aware video matting datasets, we propose adapting existing public datasets for training and evaluation, particularly for human subjects.
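To make the idea of output-level fusion concrete, the toy sketch below smooths a sequence of per-frame alpha mattes by blending each prediction with the previous fused result, damping frame-to-frame flicker. This exponential smoothing is a deliberately simple stand-in of our own; it is not the fusion scheme of SparseMat or MaGGIe, which combine feature- and output-level cues.

```python
# Toy output-level temporal fusion: exponential smoothing of alpha mattes.
# A hypothetical illustration, not the fusion used by any cited method.
import numpy as np

def fuse_alpha_sequence(alphas: list[np.ndarray], momentum: float = 0.7) -> list[np.ndarray]:
    """Smooth a sequence of per-frame alpha mattes, each (H, W) in [0, 1]."""
    fused = [alphas[0]]
    for a in alphas[1:]:
        # Blend the current prediction with the previously fused matte.
        fused.append(momentum * fused[-1] + (1.0 - momentum) * a)
    return fused

# Usage on synthetic flickering mattes:
frames = [np.clip(0.5 + 0.3 * np.random.randn(64, 64), 0.0, 1.0) for _ in range(5)]
smooth = fuse_alpha_sequence(frames)
print(len(smooth), smooth[0].shape)  # 5 (64, 64)
```

Note the trade-off this exposes: naive smoothing lags behind fast motion, which is why learned methods aggregate temporal features rather than simply averaging outputs.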

Table 1. Comparing MaGGIe with previous works in image and video matting. Our work is the first instance-aware framework producing alpha mattes from binary masks with both feature- and output-level temporal consistency in constant processing time.


:::info Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research (seoh,[email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::

