This paper investigates the joint use of a large number of independent, identical sensors for multi-object detection and estimation (MODE), referred to as massive sensor MODE. This problem differs significantly from conventional target tracking with only a few sensors. In theory, massive sensor data allow very accurate estimation, although this may not hold in practice, and they also impose a heavy computational burden on traditional filter-based trackers. Instead, we propose a clustering method that fuses the massive sensor data in the same state space; it is shown to filter out clutter and to estimate the target states without using any traditional filter. This non-Bayesian solution, referred to as massive sensor observation-only (O2) inference, needs neither a target or clutter model nor knowledge of the system noises. It can therefore handle challenging scenarios with little prior information, and it does so at low computational cost. Simulations using massive homogeneous (independent and identically distributed) sensors demonstrate the validity and superiority of the proposed approach.