Our method supports segmentation at multiple granularities, ranging from fine (left) to coarse (right).
Part-level point cloud segmentation has recently attracted significant attention in 3D computer vision. Nevertheless, existing research faces two major challenges: native 3D models generalize poorly due to data scarcity, while introducing 2D pre-trained knowledge often leads to inconsistent segmentation across different views. To address these challenges, we propose S²AM3D, which integrates 2D segmentation priors with 3D-consistent supervision. We design a point-consistent part encoder that aggregates multi-view 2D features through native 3D contrastive learning, producing globally consistent point features. We then propose a scale-aware prompt decoder that enables real-time adjustment of segmentation granularity via a continuous scale signal. In addition, we introduce a large-scale, high-quality part-level point cloud dataset with more than 100k samples, providing ample supervision for model training. Extensive experiments demonstrate that S²AM3D achieves leading performance across multiple evaluation settings, exhibiting exceptional robustness and controllability on complex structures and parts with large size variations.
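To give a feel for the "native 3D contrastive learning" that enforces cross-view consistency, the following is a minimal sketch using a standard InfoNCE loss over corresponding points seen in two views. The specific loss, temperature, and feature dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project features onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(anchor, positive, temperature=0.07):
    # anchor, positive: (N, C) features of the same N points observed in two
    # views; point i's positive is row i, all other rows act as negatives.
    # (Standard InfoNCE -- an assumption standing in for the paper's loss.)
    logits = anchor @ positive.T / temperature        # (N, N) similarity logits
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # cross-entropy on the diagonal

# Toy check: features that agree across views incur a lower loss than random ones.
rng = np.random.default_rng(0)
view_a = l2_normalize(rng.standard_normal((32, 16)))
view_b = l2_normalize(view_a + 0.05 * rng.standard_normal((32, 16)))  # near-duplicate view
aligned = info_nce(view_a, view_b)
shuffled = info_nce(view_a, l2_normalize(rng.standard_normal((32, 16))))
print(aligned, shuffled)
```

Minimizing such a loss pulls each point's multi-view features together while pushing apart features of different points, which is what yields globally consistent per-point features.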
S²AM3D pipeline. Left: with 3D supervision and contrastive learning, the input point cloud is encoded into per-point feature vectors. Right: given a text prompt p and a scale value s, the scale is mapped by a sinusoidal embedding to FiLM parameters that perform channel-wise modulation and produce a scale-enhanced feature representation. The prompt feature is then indexed from this representation and interacts with the global features via bi-directional cross-attention, after which an MLP and a sigmoid layer produce a probability mask.
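The scale-conditioning step above (sinusoidal embedding of s → FiLM parameters → channel-wise modulation) can be sketched as follows. The embedding dimension, frequency range, and the linear heads producing the FiLM parameters are hypothetical placeholders for the learned layers in the decoder.

```python
import numpy as np

def sinusoidal_embedding(s, dim=16):
    # Standard sinusoidal encoding of the continuous scale value s
    # (the frequency range is an assumption; the paper does not specify it).
    freqs = np.exp(np.linspace(0.0, np.log(1000.0), dim // 2))
    return np.concatenate([np.sin(s * freqs), np.cos(s * freqs)])

def film_modulate(point_feats, s, W_gamma, b_gamma, W_beta, b_beta):
    # point_feats: (N, C) per-point features from the encoder.
    # Linear heads (hypothetical weights) map the scale embedding to
    # per-channel FiLM parameters, applied identically to every point.
    emb = sinusoidal_embedding(s)
    gamma = emb @ W_gamma + b_gamma    # (C,) channel-wise scale
    beta = emb @ W_beta + b_beta       # (C,) channel-wise shift
    return point_feats * gamma + beta  # broadcasts over the N points

# Toy usage with random weights: different scale signals yield different
# scale-enhanced features over the same point cloud.
rng = np.random.default_rng(0)
N, C, D = 5, 8, 16
feats = rng.standard_normal((N, C))
W_g, b_g = 0.1 * rng.standard_normal((D, C)), np.ones(C)
W_b, b_b = 0.1 * rng.standard_normal((D, C)), np.zeros(C)
fine = film_modulate(feats, 0.1, W_g, b_g, W_b, b_b)
coarse = film_modulate(feats, 0.9, W_g, b_g, W_b, b_b)
print(fine.shape)  # (5, 8)
```

Because s enters only through gamma and beta, the granularity can be changed at inference time by re-running this cheap modulation, without re-encoding the point cloud.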
Our method supports real-time interactive segmentation with adjustable granularity. The following videos demonstrate the interactive segmentation process on various 3D objects.
Below are 16 examples of part segmentation results. Each model can be rotated and zoomed interactively. The colored regions represent different segmented parts.
Using an automated data processing pipeline, we collect a dataset of over 100,000 point cloud instances spanning 400 categories, annotated with approximately 1.2 million fine-grained part labels.
If you find this work useful, please consider citing:
@article{su2025s2am3d,
  author  = {Su, Han and Huang, Tianyu and Wan, Zichen and Wu, Xiaohe and Zuo, Wangmeng},
  title   = {S²AM3D: Scale-controllable Part Segmentation of 3D Point Cloud},
  journal = {arXiv preprint arXiv:2512.00995},
  year    = {2025},
}