A complex multimedia object is an information unit composed of multiple media types, such as text, images, audio, and video. Applications that involve huge sets of such objects exceed the human capacity to synthesize useful information. The search for similarities and dissimilarities among objects is commonly performed through cluster analysis, which tries to find groups in unlabeled data sets. Applied to sets of complex multimedia objects, such analysis carries a special restriction: the method must analyze the multiple media types present in the objects. This paper proposes a clustering ensemble that jointly assesses the several media types present in this kind of object. The proposed ensemble was applied to cluster webpages by constructing text and image clustering prototypes. Hubert's statistic was used to evaluate the ensemble's performance, showing that the proposed method creates clustering structures more similar to the real classification than those obtained with a joint feature vector.
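
To make the evaluation criterion concrete, the following is a minimal sketch of one common form of Hubert's statistic, the normalized Γ, used to compare a clustering against a reference classification. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which variant of the statistic was used, and the function names `co_membership` and `normalized_hubert_gamma` are hypothetical. Here the normalized Γ is computed as the Pearson correlation between the pairwise co-membership indicators of the two partitions, so it ranges from -1 to 1, and higher values mean closer agreement.

```python
from itertools import combinations
from math import sqrt


def co_membership(labels):
    """Return, for every pair of objects (i < j), 1 if they share a label, else 0."""
    return [1 if a == b else 0 for a, b in combinations(labels, 2)]


def normalized_hubert_gamma(labels_a, labels_b):
    """Normalized Hubert's Gamma between two partitions of the same objects.

    Assumption: the statistic is the Pearson correlation between the
    co-membership indicators of the two partitions (one common definition).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    x = co_membership(labels_a)
    y = co_membership(labels_b)
    m = len(x)  # number of object pairs, n * (n - 1) / 2
    if m == 0:
        raise ValueError("at least two objects are required")
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / m
    var_x = sum((xi - mean_x) ** 2 for xi in x) / m
    var_y = sum((yi - mean_y) ** 2 for yi in y) / m
    if var_x == 0 or var_y == 0:
        # A partition with one cluster, or with all singletons, has no variance.
        raise ValueError("statistic is undefined for a degenerate partition")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]          # hypothetical real classification
    same_grouping = [1, 1, 0, 0]      # same groups, different label names
    other_grouping = [0, 1, 0, 1]     # groups that cut across the reference
    print(normalized_hubert_gamma(reference, same_grouping))
    print(normalized_hubert_gamma(reference, other_grouping))
```

Because the statistic depends only on which pairs of objects are grouped together, it is unaffected by how the clusters are named, which is what allows an unlabeled clustering to be compared with a labeled classification.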