Advanced multi-sensor systems are expected to overcome the challenges of object recognition and state estimation in harsh environments with poor or even no prior information, but they bring new challenges of their own, mainly in data fusion and computational burden. In contrast to the prevailing Markov–Bayes framework, which underlies a large variety of stochastic filters and their approximations, we propose a clustering-based methodology for multi-sensor multi-object detection and estimation (MODE), named clustering for filtering (C4F), which abandons unrealistic assumptions about the objects, background and sensors. Because it relies on cluster analysis of the input multi-sensor data, the C4F approach needs no prior knowledge about the latent objects (neither their number nor their dynamics), can handle time-varying uncertainties in the background and sensors such as noise, clutter and missed detections, and is computationally fast. It thus offers an inherently robust and computationally efficient alternative to conventional Markov–Bayes filters for scenarios with little prior knowledge but rich observation data. Simulations on representative scenarios with both complete and little prior information demonstrate the superiority of the C4F approach.
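To make the idea of clustering pooled multi-sensor data concrete, the following is a minimal illustrative sketch and not the C4F algorithm itself. It assumes 2-D position measurements, groups them by a simple distance-threshold (single-linkage) rule, declares an object wherever a group contains measurements from at least `min_sensors` distinct sensors, and takes the group centroid as the state estimate. The function name `cluster_measurements` and the parameters `eps` and `min_sensors` are hypothetical choices made for this illustration.

```python
from math import dist


def cluster_measurements(points, eps, min_sensors):
    """Group pooled measurements and return one centroid per accepted group.

    points      -- list of (sensor_id, (x, y)) tuples pooled from all sensors
    eps         -- assumed linking distance: points closer than this are joined
    min_sensors -- assumed minimum number of distinct sensors per object
    """
    unvisited = set(range(len(points)))
    estimates = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        # Grow the group by repeatedly absorbing unvisited neighbours.
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if dist(points[i][1], points[j][1]) <= eps]
            unvisited.difference_update(near)
            cluster.extend(near)
            frontier.extend(near)
        # Groups seen by too few sensors are treated as clutter.
        sensors = {points[i][0] for i in cluster}
        if len(sensors) >= min_sensors:
            xs = [points[i][1][0] for i in cluster]
            ys = [points[i][1][1] for i in cluster]
            estimates.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return sorted(estimates)


# Three sensors, two objects, one clutter point seen only by sensor 1.
scans = [
    (0, (0.1, -0.1)), (0, (10.2, 9.9)),
    (1, (-0.2, 0.0)), (1, (9.9, 10.1)), (1, (5.0, -5.0)),
    (2, (0.1, 0.1)), (2, (9.9, 10.0)),
]
estimates = cluster_measurements(scans, eps=1.0, min_sensors=2)
print(len(estimates), estimates)
```

Note that the number of objects is not supplied anywhere: it emerges from the number of accepted groups, and no motion model is used. A sensor that misses an object only reduces the size of that object's group, which is the kind of robustness to missed detections the abstract refers to.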