Retrieval and classification methods for textured 3D models: a comparative study

This paper presents a comparative study of six methods for the retrieval and classification of textured 3D models, which have been selected as representative of the state of the art. To better analyse and control how methods deal with specific classes of geometric and texture deformations, we built a collection of 572 synthetic textured mesh models, in which each class includes multiple texture and geometric modifications of a small set of null models. Results show a challenging, yet lively, scenario and also reveal interesting insights into how to deal with texture information according to different approaches, possibly working in the CIELab as well as in modifications of the RGB colour space.


Introduction
Thanks to advances in geometric modelling techniques and to the availability of cheaper, yet effective, 3D acquisition devices, we are witnessing a dramatic increase in the amount of available 3D data [1,36]. How to accurately and efficiently retrieve and classify these data has become an important problem in computer vision, pattern recognition, computer graphics and many other fields. Most methods proposed in recent years analyse geometric and/or topological properties of 3D models [4,22,73], that is, they focus on shape. Nevertheless, most sensors are able to acquire not only the 3D shape but also its texture; this is the case, for instance, of the Microsoft Kinect device. Also, image-based modelling and multiple-view stereo techniques enable the recovery of geometric and colourimetric information directly from images [66].
Characterizing 3D shapes based on both geometric and colourimetric features can be of great help when defining algorithms for the analysis and the comparison of 3D data. Texture and colourimetric features contain rich information about the visual appearance of real objects: perceptual studies demonstrated that colour plays a significant role in low- and high-level vision [72]. Thus, colourimetric information plays an important role in many shape analysis applications, such as matching and correspondence; it can also provide additional clues for retrieval in case of partial or inaccurate shape scans [28]. An example is given by face recognition, where the combination of geometric and colourimetric properties is a way to achieve better trustworthiness under uncontrolled environmental conditions (illumination, pose changes, uncooperative subjects) [26].
The attention towards texture properties has grown considerably over the last few years, as demonstrated by the number of techniques for the analysis of geometric shape and texture attributes that have been recently proposed [33,46,54,64,75,80]. Since 2013, a retrieval contest [9] has been run under the umbrella of the SHREC initiative [76] to evaluate the performance of existing methods for 3D shape retrieval when dealing with textured models. The contest provided the first opportunity to analyse a number of state-of-the-art algorithms, their strengths as well as their weaknesses, using a common test collection allowing for a direct comparison of algorithms. In 2014, the contest ran over a larger benchmark and was extended to include a classification task [3]. The two events had a positive outcome: indeed, they saw the participation of six groups in 2013 and eight groups in 2014.
In this context, we present here a comparative study on the retrieval and classification performance of six state-of-the-art methods in the field of textured 3D shape analysis. The present contribution builds on the dedicated SHREC'14 benchmark [3], and extends the associated track in three main respects:
• Most of the algorithms tested in [3] have been reimplemented with some modifications for performance improvement. Additionally, a new method has been included in the comparative study in order to have a sufficiently detailed picture of the state-of-the-art scenario;
• To help the reader compare methods beyond their algorithmic aspects, Section 4.7 presents a taxonomy of methods highlighting the emerging shape structure, the scale at which the shape description is captured, the colour space that is considered to analyse texture properties, and how this information is combined with the geometric one;
• The analysis of methods has been strengthened by exploiting the peculiar composition of the dataset, which has been populated by considering multiple modifications of a set of null shapes. This has made it possible to evaluate how algorithms cope with specific geometric and colourimetric modifications.
The remainder of the paper is organized as follows. In Section 2 we introduce the related literature. Section 3 describes the collection of textured 3D models and how the comparative study has been organized, while in Section 4 we describe the methods implemented and discuss their main characteristics. Experimental results are presented, analysed and discussed in Section 5, while conclusive remarks and possible future developments are outlined in Section 6.

Related literature
While the combination of shape and colour information is quite popular in image retrieval [24] and processing [30,45], most methods for 3D object retrieval and classification do not take colourimetric information into account [4,73].
The first attempts to devise 3D descriptors for textured objects adopt a 3D feature-vector description and combine it with the colourimetric information, where the colour is treated as a general property without considering its distribution over the shape. For example, Suzuki et al. [71] complemented the geometry description with a colour representation in terms of the parameters of the Phong model [57]. Similarly, Ruiz et al. [64] combined geometric similarity based on Shape Distributions [53] with colour similarity computed through the comparison of colour distribution histograms, while in Starck and Hilton [69] the colourimetric and the 3D shape information were concatenated into a histogram.
In the field of image recognition, a popular description strategy is to consider local image patches that describe the behaviour of the texture around a group of pixels. Examples of these descriptions are the Local Binary Patterns (LBP) [52], the Scale Invariant Feature Transform (SIFT) [47], the Histogram of Oriented Gradients (HOG) [13] and the Spin Images [27]. The generalization of these descriptors to 3D textured models has been explored in several works, such as the VIP description [79], the meshHOG [80] and the Textured Spin-Images [12,54]. Further examples are the colour-CHLAC features computed on 3D voxel data proposed by Kanezaki et al. [28]; the sampling method introduced by Liu et al. [46], which selects points in regions of high variation in either geometry or colour and defines a signature based on feature vectors computed at these points; and the CSHOT descriptor [75], meant to solve the surface matching problem based on local features, i.e. by point-to-point correspondences obtained by matching shape- and colour-based local invariant descriptors of feature points.
Symmetry is another aspect used to characterize local and global shape properties [51]. For instance, Kazhdan et al. [29] introduced the Spherical Harmonic descriptor to code the shape according to its rotational symmetry around axes centred at the centre of mass. In [3], the Spherical Harmonic descriptor has been proposed in combination with colourimetric descriptors to analyse textured 3D models. Giachetti and Lovato [25] introduced the Multiscale Area Projection Transform (MAPT) to couple the local degree of radial symmetry (in a selected scale range) with a saliency notion related to high shape symmetry, following an approach similar to the Fast Radial Symmetry [48] used in image processing. Colour-weighted variations of MAPT, merging geometric and texture information, have been presented in [3,9].
In recent years, close attention has been paid to non-rigid 3D shape matching and retrieval. To deal with non-rigid deformations (bendings) it is necessary to adopt shape descriptions that are invariant to isometric shape deformations. A suitable metric for comparing non-rigid shapes is the geodesic one; indeed, 3D shape descriptions based on geodesics, such as geodesic distance matrices [68] or geodesic skeleton paths [38], have been successfully adopted for non-rigid shape comparison, see also [44]. In addition to the geodesic distance, more sophisticated choices are possible, such as the diffusion or the commute-time distance [77]. On the basis of the fact that these distances are well approximated through the Laplace-Beltrami operator, several spectral descriptors were proposed to characterize the geometric features of non-rigid 3D shapes [37], such as the ShapeDNA [62], the Heat Kernel Signature [23,70], the Wave Kernel Signature [2], the Global Point Signature [65] and the Spectral Graph Wavelet Signature [40]. In the context of textured 3D meshes, the Photometric Heat Kernel Signatures [31,32,33] fuse geometry and colour in a local-global description. The underlying idea is to use the diffusion framework to embed the shape into a high-dimensional space where the embedding coordinates represent the photometric information. Following the same intuition, in [5] the authors generalized the geodesic distance to a hybrid shape description able to couple geometry and texture information.
Other invariance classes can be relevant in applications, possibly including non-isometric transformations such as topological deformations or local and global scaling. In this case, topological approaches [16,21] offer a modular framework in which it is possible to plug in multiple shape properties in the form of different real functions, so as to describe shapes and measure their (dis)similarity up to different notions of invariance. Examples of these descriptions are Reeb graphs [6,61], size functions [7], persistence diagrams [42,11] and persistence spaces [10]. Recently, topological descriptors have been shown to be a viable option for comparing shapes endowed with colourimetric information [5].

The benchmark
In this section, we describe the benchmark adopted in the proposed comparative analysis. The dataset and the ground truth are available by following the instructions at http://www.ge.imati.cnr.it/?q=shrec14.

The dataset
The dataset is made of 572 watertight mesh models, see Figure 1, grouped in 16 geometric classes of 32 or 36 instances. Each geometric class represents a type of geometric shape (e.g., humans, birds, trees, etc.). Besides the geometric classification, models are also classified in 12 texture classes. Each texture class is characterized by a precise pattern (e.g., marble, wood, mimetic, etc.).
The collection is built on top of a set of null models, that is, base meshes endowed with two or three different textures. All the other elements in the dataset come as the result of applying a shape transformation to one of the null shapes, so that a geometric and a texture deformation are randomly combined case by case.
The geometric deformations include the addition of Gaussian noise, mesh re-sampling, shape bending, shape stretching and other non-isometric transformations that do not necessarily preserve the metric properties of shapes (e.g. the Riemannian metric).
As for texture deformations, they include topological changes and scaling of texture patterns, as well as affine transformations in the RGB colour channels, resulting, e.g., in lighting and darkening effects, or in a sort of pattern blending. While the topological texture deformation has been applied manually, affine transformations admit an analytic formulation. A numerical parameter allows each analytic formulation to be tuned, thus making it possible to automatically generate a family of texture deformations. In our dataset, texture transformations are grouped in five families, each family being the result of three different parameter values.
Figure 2 illustrates some geometric and texture deformations in action. The added value in working with a dataset built in this way is that particular weaknesses and strengths of the algorithms can be better detected and analysed. Indeed, methods can be evaluated in specific tasks, for example retrieval against (simulated) illumination changes or degradation of the texture pattern. This is actually part of the proposed comparative study, see Section 5.1.1 for more details.
Together with the dataset, a training set made of 96 models classified according to both geometry (16 classes) and texture (12 classes) has been made available for methods requiring a parameter tuning phase.

The retrieval and classification tasks
In our analysis we distinguished two tasks: retrieval and classification. For each task, at most three runs for each method have been considered for evaluation, being the result of either different parameter settings or more substantial method variations.
Retrieval task. Each model is used as a query against the rest of the dataset, with the goal of retrieving the most relevant objects. For a given query, a retrieved object is considered highly relevant if the two models share both geometry and texture; marginally relevant if they share only geometry; and not relevant otherwise. For this task, a 572 × 572 dissimilarity matrix was required, each element (i, j) recording the dissimilarity value between models i and j in the whole dataset.
Classification task. The goal is to assign each query to both its geometric and its texture class. To this aim, a nearest neighbour (1-NN) classifier has been derived from the dissimilarity matrices used in the retrieval task. For each run, the output consists of two classification matrices, a 572 × 16 one for the geometric classification and a 572 × 12 one for the texture classification. In these matrices, the element (i, j) is set to 1 if i is classified in class j (that is, the nearest neighbour of model i belongs to class j), and 0 otherwise.
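The derivation of a classification matrix from a dissimilarity matrix can be sketched as follows. This is a minimal illustration (not the benchmark's actual code); the function name and the 0-based class labels are our own conventions.

```python
import numpy as np

def nn_classification(dissim, labels, n_classes):
    """Derive a 1-NN classification matrix from a dissimilarity matrix.

    dissim: (n, n) array, dissim[i, j] = dissimilarity between models i and j.
    labels: length-n sequence of ground-truth class indices (0-based).
    Returns an (n, n_classes) 0/1 matrix whose entry (i, j) is 1 iff the
    nearest neighbour of model i (excluding i itself) belongs to class j.
    """
    labels = np.asarray(labels)
    d = np.array(dissim, dtype=float)
    np.fill_diagonal(d, np.inf)            # a model cannot be its own neighbour
    nn = np.argmin(d, axis=1)              # index of each model's nearest neighbour
    out = np.zeros((len(labels), n_classes), dtype=int)
    out[np.arange(len(labels)), labels[nn]] = 1
    return out
```

Each row of the output has exactly one nonzero entry, matching the 0/1 classification matrices described above.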

The evaluation measures
The following measures have been used to evaluate the retrieval and classification performances of each method.

Retrieval evaluation measures
3D retrieval evaluation has been carried out according to standard measures, namely precision-recall curves, mean average precision, Nearest Neighbour, First Tier, Second Tier, Normalized Discounted Cumulated Gain and Average Dynamic Recall [76].
Precision-recall curves and mean average precision. Precision and recall are common measures to evaluate information retrieval systems. Precision is the fraction of retrieved items that are relevant to the query. Recall is the fraction of the items relevant to the query that are successfully retrieved. Denoting by A the set of relevant objects and by B the set of retrieved objects,

Precision = |A ∩ B| / |B|,  Recall = |A ∩ B| / |A|.

Note that the two values always range from 0 to 1. For a visual interpretation of these quantities it is useful to plot a curve in the recall vs. precision reference frame. We can interpret the result as follows: the larger the area below such a curve, the better the performance under examination. In particular, the precision-recall plot of an ideal retrieval system would result in a constant curve equal to 1. As a compact index of precision vs. recall, we consider the mean average precision (mAP), which corresponds to the area under the precision-recall curve: from the above considerations, it follows that the maximum mAP value is equal to 1.
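The quantities above can be sketched in a few lines of Python. This is a generic illustration of precision, recall and average precision over a ranked list, not the evaluation code used for the benchmark.

```python
def precision_recall(relevant, ranking):
    """Precision and recall after each position of a ranked result list.

    relevant: set of relevant item ids (the set A).
    ranking:  list of item ids in retrieval order (the ranked set B).
    """
    hits, prec, rec = 0, [], []
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
        prec.append(hits / k)               # |A ∩ B_k| / |B_k|
        rec.append(hits / len(relevant))    # |A ∩ B_k| / |A|
    return prec, rec

def average_precision(relevant, ranking):
    """Average of the precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)
```

Averaging `average_precision` over all queries yields the mAP figure reported in the experiments.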
Nearest Neighbour, First Tier and Second Tier. These evaluation measures aim at checking the fraction of models of the query's class that also appear within the top k retrievals. Here, k can be 1, the size of the query's class, or double the size of the query's class. Specifically, for a class with |C| members, k = 1 for the nearest neighbour (NN), k = |C| − 1 for the first tier (FT), and k = 2(|C| − 1) for the second tier (ST). Note that all these values necessarily range from 0 to 1.
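For concreteness, the three tier measures for a single query can be computed as below (an illustrative sketch; the function name and calling convention are ours, and the query itself is assumed to be excluded from the ranking).

```python
def tier_scores(class_size, ranking_classes, query_class):
    """Nearest Neighbour, First Tier and Second Tier for one query.

    class_size:       |C|, number of models in the query's class (query included).
    ranking_classes:  class labels of the retrieved models, best match first,
                      with the query itself excluded from the list.
    """
    rel = class_size - 1                    # relevant items besides the query
    nn = 1.0 if ranking_classes[0] == query_class else 0.0
    ft = sum(c == query_class for c in ranking_classes[:rel]) / rel
    st = sum(c == query_class for c in ranking_classes[:2 * rel]) / rel
    return nn, ft, st
```

All three values are normalized by |C| − 1, so they lie in [0, 1] as stated above.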
Average dynamic recall. The idea is to measure how many of the items that should have appeared before or at a given position in the result list actually have appeared. The average dynamic recall (ADR) at a given position averages this measure up to that position. Precisely, for a given query let A be the set of highly relevant (HR) items, and let B be the set of at least marginally relevant (MR) items, so that A ⊆ B. The ADR is computed as

ADR = (1/|B|) Σ_{i=1}^{|B|} r_i,

where r_i is defined as

r_i = |{first i retrieved items} ∩ A| / i,  if i ≤ |A|,
r_i = |{first i retrieved items} ∩ B| / i,  otherwise.

Normalized discounted cumulated gain. It is first convenient to introduce the discounted cumulated gain (DCG).
Its definition is based on two assumptions. First, highly relevant items are more useful if appearing earlier in a search engine result list (i.e., if they have higher ranks); second, highly relevant items are more useful than marginally relevant items, which are in turn more useful than not relevant items. Precisely, the DCG at a position p is defined as

DCG_p = rel_1 + Σ_{i=2}^{p} rel_i / log_2(i),

with rel_i the graded relevance of the result at position i.
Obviously, the DCG is query-dependent. To overcome this problem, we normalize the DCG to get the normalized discounted cumulated gain (NDCG). This is done by sorting the elements of a retrieval list by relevance, which produces the maximum possible DCG up to position p, also called the ideal DCG (IDCG) up to that position. For a query, the NDCG is computed as

NDCG_p = DCG_p / IDCG_p.

It follows that, for an ideal retrieval system, we would have NDCG_p = 1 for all p.
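A compact sketch of ADR and NDCG follows. It is an illustration under the assumption that the whole relevant set appears in the ranked list (as happens when every query is ranked against the full dataset); the relevance grades 'HR'/'MR' are our own encoding.

```python
import math

def average_dynamic_recall(grades):
    """ADR for one query; grades lists the relevance of each ranked item
    ('HR', 'MR' or None), best match first."""
    n_hr = sum(g == 'HR' for g in grades)               # |A|
    n_rel = sum(g in ('HR', 'MR') for g in grades)      # |B|
    total = 0.0
    for i in range(1, n_rel + 1):
        if i <= n_hr:
            total += sum(g == 'HR' for g in grades[:i]) / i
        else:
            total += sum(g in ('HR', 'MR') for g in grades[:i]) / i
    return total / n_rel

def dcg(rels, p):
    """Discounted cumulated gain at position p (rels[i] is rel_{i+1})."""
    return rels[0] + sum(rels[i] / math.log2(i + 1) for i in range(1, p))

def ndcg(rels, p):
    """DCG normalized by the ideal DCG of the relevance-sorted list."""
    return dcg(rels, p) / dcg(sorted(rels, reverse=True), p)
```

For an ideal ranking the sorted list coincides with the retrieved one, so `ndcg` returns 1 at every position, as noted above.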

Classification performance measures.
We also consider a set of performance measures for classification, namely the confusion matrix, sensitivity, specificity and the Matthews correlation coefficient [19,49].
Confusion matrix. Each classification performance can be associated with a confusion matrix CM, that is, a square matrix whose order is equal to the number of classes (according to either the geometric or the texture classification) in the dataset. For a row i in CM, the element CM(i, i) gives the number of items which have been correctly classified as elements of class i; similarly, the elements CM(i, j), with j ≠ i, count items which have been misclassified, resulting as elements of class j rather than elements of class i. Thus, the confusion matrix CM of an ideal classification system should be a diagonal matrix, such that the element CM(i, i) equals the number of items belonging to class i.
Sensitivity, specificity and Matthews correlation coefficient. These statistical measures are classical tools for the evaluation of classification performances. Sensitivity (also called the true positive rate) measures the proportion of true positives which are correctly identified as such (e.g. the percentage of cats correctly classified as cats). Specificity (also known as the true negative rate) measures the proportion of true negatives which are correctly identified as such (e.g. the percentage of non-cats correctly classified as non-cats). A perfect predictor is 100% sensitive and 100% specific. The Matthews correlation coefficient (MCC) takes into account true and false positives and negatives, and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted classifications; it returns a value between −1 and 1. A coefficient of 1 represents a perfect classification, 0 a classification no better than random, and −1 total disagreement between classification and observation.
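The classification measures above can be sketched as follows; this is a generic one-vs-rest illustration (function names and label conventions are ours), not the benchmark's evaluation code.

```python
import math
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """CM(i, j) counts items of class i that were classified as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def binary_stats(y_true, y_pred, positive):
    """Sensitivity, specificity and MCC for one class, obtained by treating
    that class as 'positive' and all the others as 'negative'."""
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    tp = np.sum(t & p)
    tn = np.sum(~t & ~p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    sens = tp / (tp + fn)                   # true positive rate
    spec = tn / (tn + fp)                   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc
```

For a multi-class problem these per-class values are typically averaged over all classes.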

Description of the methods
Six methods for textured 3D shape retrieval and classification have been implemented. In this section we describe them in detail, focusing also on their possible variations and on the choice of the parameters adopted to implement the runs used in our comparative evaluation.

The methods are the following:
1. Histograms of Area Projection Transform and Colour Data and Joint Histograms of MAPT and RGB data (runs GG1, GG2, GG3), Section 4.1. These runs are based on a multi-scale geometric description able to capture local and global symmetries, coupled with histograms of the normalized RGB channels;
2. Spectral geometry-based methods for textured 3D shape retrieval (runs LBG1, LBG2, LBG3 and LBGtxt), Section 4.2. These runs combine an intrinsic, spectral descriptor with the concatenated histogram of the RGB values;
3. Colour + Shape descriptors (runs Ve1, Ve2, Ve3), Section 4.3. These runs adopt combinations (with different weights) of the histogram of the RGB values with a geometric descriptor represented by the eigenvalues of the geodesic distance matrix;
4. Textured shape distribution, joint histograms and persistence (runs Gi1, Gi2, Gi3), Section 4.4. These runs combine several geometric, colourimetric and hybrid descriptors: namely, the spherical harmonics descriptor, the shape distributions of the geodesic distances weighted with the colourimetric attributes, and a persistence-based description based on the CIELab space;
5. Multiresolution Representation Local Binary Pattern Histograms (run TAS), Section 4.5. This run captures the geometric information through the combination of a multi-view approach with local binary patterns, and combines it with the concatenated histograms of the CIELab colour channels;
6. PHOG: Photometric and geometric functions for textured shape retrieval (runs BCGS1, BCGS2, BCGS3), Section 4.6. These runs combine a shape descriptor based on geometric functions; a persistence-based descriptor built on a generalized notion of geodesic distance that combines geometric and colourimetric information; and a purely colourimetric descriptor based on the CIELab colour space.

Histograms of Area Projection Transform and Colour Data and Joint Histograms of MAPT and RGB data (runs GG1-3)
Computing the similarity between textured meshes is achieved according to two different approaches based on histograms of the Multiscale Area Projection Transform (MAPT) [25]. MAPT originates from the Area Projection Transform (APT), a spatial map that measures the likelihood of the 3D points inside the shape of being centres of spherical or cylindrical symmetry. For a shape represented by a surface mesh S, the APT is computed at a 3D point x for a radius of interest r as

APT(x, S, r, σ) = Area(T_r(S, n) ∩ k_σ(x)),

where T_r(S, n) is the surface parallel to S, shifted along the inward normal vector n for a distance r, and k_σ(x) is a sphere of radius σ centred at x. APT values at different radii are normalized to have a scale-invariant behaviour, creating the Multiscale APT (MAPT). A discrete version of the MAPT function is implemented following [25]. Roughly, the map is estimated on a grid of voxels with side length s and for a set of corresponding sampled radius values r_1, ..., r_t. This grid partitions the mesh's bounding box, but only the voxels belonging to the inner region of the mesh are considered when creating the histogram.
Histograms of MAPT are very good global shape descriptors, achieving state-of-the-art results on the SHREC 2011 non-rigid watertight contest dataset [43]. For that retrieval task, the MAPT function was computed using 8 different scales (radius values) and the map values were quantized in 12 bins; finally, the 8 histograms were concatenated, creating a unique descriptor of length 96. The voxel size and the radius values were chosen differently for each model, proportionally to the cube root of the object volume, in order to have the same descriptor for scaled versions of the same geometry. The value of c was always set to 0.5.
To deal with textured meshes, the MAPT approach has been modified in two different ways, so as to also exploit the colour information.

Histograms of MAPT and Colour Data
MAPT histograms are computed with the same radii and sampling grid values as in [25]: the isotropic sampling grid is proportional to the cube root of the volume V of each model, that is, of side length s = ∛V / 30, and the sampled radii are integer multiples of s (10 values from 2s to 11s). The radius σ is taken, as in the original paper, equal to r_i/2 for all the sampled r_i. Furthermore, for each mesh the histogram of colour components is computed. With this procedure each mesh is described by two histograms, the first one representing the geometric information and the second one representing the texture information. The total dissimilarity between two shapes S_1, S_2 is then assessed using a convex combination of the two histogram distances:

d(S_1, S_2) = γ d_geo(S_1, S_2) + (1 − γ) d_clr(S_1, S_2),   (1)

where 0 ≤ γ ≤ 1, d_geo(S_1, S_2) is the normalized Jeffrey divergence between the two MAPT histograms of S_1 and S_2, and d_clr(S_1, S_2) corresponds to the normalized χ²-distance of the two colour histograms. The choice of γ in (1) allows the user to decide the relevance of colour information in the retrieval process.
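The distance combination above can be sketched as follows. The Jeffrey divergence and χ² distance are given here in one common formulation (the normalization used in the actual runs may differ slightly), and the small `eps` guards against empty bins.

```python
import numpy as np

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetrised KL (Jeffrey) divergence between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = (p + q) / 2 + eps
    return float(np.sum(p * np.log((p + eps) / m) + q * np.log((q + eps) / m)))

def chi2_distance(p, q, eps=1e-12):
    """Chi-squared distance between two histograms (with the 1/2 convention)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(0.5 * np.sum((p - q) ** 2 / (p + q + eps)))

def combined_dissimilarity(d_geo, d_clr, gamma=0.6):
    """Convex combination of geometric and colour distances, as in Eq. (1)."""
    return gamma * d_geo + (1 - gamma) * d_clr
```

With γ = 0.6, as in runs GG1 and GG2, geometry weighs slightly more than colour in the final score.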
Results shown in Section 5 are obtained by applying two different pre-processing steps to the RGB values, both adopted to have a colour representation that is invariant to illumination changes.
The first method, resulting in run GG1, is a simple contrast stretching for each RGB channel, mapping the min-max range of each channel to [0, 1]. In this case the colour quantization is set to 4 bins for each normalized RGB channel, and γ is set to 0.6.
The second method, corresponding to run GG2, is the greyworld representation [20], in which each RGB value is divided by its corresponding channel mean value:

(R, G, B) → (R/R̄, G/Ḡ, B/B̄),

where R̄, Ḡ and B̄ are the mean values of the three channels. Here, γ = 0.6 and the 4 histogram bins are centred in the mean value and are linearly distributed within a range of [0, 2] for each greyworld RGB channel.
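The two colour normalizations can be sketched in a few lines. This is an illustrative implementation over a per-vertex colour array (the function names are ours); the guard in `contrast_stretch` avoids division by zero for constant channels.

```python
import numpy as np

def contrast_stretch(rgb):
    """Map the min-max range of each RGB channel to [0, 1] (as in run GG1).

    rgb: (n_vertices, 3) array of per-vertex colour values.
    """
    lo = rgb.min(axis=0)
    hi = rgb.max(axis=0)
    return (rgb - lo) / np.where(hi > lo, hi - lo, 1.0)

def greyworld(rgb):
    """Divide each channel by its mean value (as in run GG2); the resulting
    values cluster around 1, hence the [0, 2] histogram range."""
    return rgb / rgb.mean(axis=0)
```

Both transforms aim at a colour representation that is invariant to (affine) illumination changes, as stated above.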

Joint Histograms of MAPT and greyworld RGB data
To get run GG3, a new descriptor has been designed by concatenating the original APT histogram with those obtained from the 3 components of the selected (normalized) colour space. For each voxel, the APT is evaluated at a certain radius; the procedure is then repeated for all the radius values (r_1, ..., r_t), and the t histograms are finally linearized and concatenated. In the present paper, a sampling grid with side length s = ∛V / 18 has been used for each model, together with 9 sampled radii that are integer multiples of s (t = 9 values from 2s to 10s). The APT target set has been divided into 8 bins. As for the colour components, the above greyworld RGB representation has been adopted, and each channel has been quantized in 4 bins in the range [0, 2]. The dissimilarity between two meshes is obtained with the normalized Jeffrey divergence [15] between the two corresponding linearized and concatenated sets of joint histograms.

Spectral geometry-based methods for textured 3D shape retrieval (runs LBG1-3, LBGtxt)
This method is built on the spectral geometry-based framework proposed in [37], suitably adapted for textured 3D shape representation and retrieval.
The spectral geometry approach, which is based on the eigendecomposition of the Laplace-Beltrami operator (LBO), provides a rich set of eigenbases invariant to isometric transformations. Also, these eigenbases serve as ingredients for two further steps: feature extraction, detailed in Section 4.2.1, and spatially sensitive shape comparison via intrinsic spatial pyramid matching [39], discussed in Section 4.2.2. The cotangent weight scheme [14] was used to discretize the LBO. The first m eigenvalues λ_i and associated eigenfunctions ϕ_i can be computed by solving the generalized eigenvalue problem

C ϕ_i = λ_i A ϕ_i,

where A is a positive-definite diagonal area matrix and C is a sparse symmetric weight matrix. In the proposed implementation, m is set to 200.
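The generalized eigenvalue problem above can be solved with a sparse eigensolver. The sketch below (our own, not the authors' code) uses shift-invert around a small negative shift so that the smallest eigenvalues, including the zero eigenvalue of the Laplacian, are returned; building the actual cotangent matrices C and A from a mesh is assumed done elsewhere.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_eigenpairs(C, A, m):
    """Solve C phi = lambda A phi for the m smallest eigenvalues.

    C: sparse symmetric (cotangent) weight matrix.
    A: positive-definite diagonal area (mass) matrix.
    Shift-invert around sigma < 0 keeps the factorization nonsingular even
    though C itself has a zero eigenvalue.
    """
    vals, vecs = eigsh(C, k=m, M=A, sigma=-1e-6, which='LM')
    order = np.argsort(vals)
    return vals[order], vecs[:, order]
```

On a toy graph Laplacian this recovers the known small eigenvalues; for a mesh with n vertices one would call it with m = 200 as in the paper.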

Feature extraction
The first step consists in the computation of an informative descriptor at each vertex of a triangle mesh representing a shape. The spectral graph wavelet signature [40] is used to capture geometry information, and colour histograms are used to encode texture information.
Geometry information. In general, any of the spectral descriptors with the eigenfunction-squared form reviewed in [41] can be considered in this framework for an isometry-invariant representation. Here, the spectral graph wavelet signature (SGWS) is adopted as local descriptor. SGWS provides a general and flexible interpretation for the analysis and design of spectral descriptors. For a vertex x of a triangle mesh, it is defined as

SGWS_t(x) = Σ_i g(t, λ_i) ϕ_i(x)²,

with g(t, ·) a wavelet generating kernel at scale t. In a bid to capture both global and local geometry, a multi-resolution shape descriptor is derived by setting g(t, λ_i) as a cubic spline wavelet generating kernel and considering the scaling function (see [40, Eq. (20)] for a precise formulation of g). This leads to the multiscale descriptor defined as SGWS(x) = {SGWS_t(x), t = 1, ..., T}, with T the chosen resolution and SGWS_t(x) the shape signature at the resolution level t. In the proposed implementation, T is set to 2.
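Given the eigenpairs of the LBO, the eigenfunction-squared form above is straightforward to evaluate. The sketch below is generic: any kernel g(t, λ) can be plugged in (the paper's cubic spline wavelet kernel is in [40, Eq. (20)]; a heat-kernel-style exponential is used in the test purely as a stand-in).

```python
import numpy as np

def sgws(eigvals, eigvecs, kernel, scales):
    """Spectral signature SGWS_t(x) = sum_i g(t, lambda_i) * phi_i(x)**2.

    eigvals: (m,) eigenvalues of the LBO.
    eigvecs: (n_vertices, m) matrix whose columns are the eigenfunctions.
    kernel:  callable g(t, lam).
    Returns an (n_vertices, len(scales)) array of per-vertex signatures.
    """
    sig = []
    for t in scales:
        g = np.array([kernel(t, lam) for lam in eigvals])
        sig.append((eigvecs ** 2) @ g)     # one value per vertex at scale t
    return np.stack(sig, axis=1)
```

Concatenating the columns over the T scales yields the multiscale descriptor SGWS(x) described above.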
Texture information. Colour histograms (CH) are used to characterize texture information on the surface. Each channel is discretized into 5 bins.

Shape comparison via intrinsic spatial pyramid matching
To incorporate the spatial information, the Intrinsic Spatial Pyramid Matching (ISPM) [39] is considered. ISPM can provably imitate the popular spatial pyramid matching (SPM) [35] to partition a mesh in a consistent and easy way. Then, Bag-of-Features (BoF) and Locality-constrained Linear Coding (LLC) [78] can be used to characterize the partitioned regions. The isocontours of the second eigenfunction (Figure 3) are considered to partition the shape into R regions, with R = 2^(l−1) for the partition at a resolution level l. Indeed, the second eigenfunction is the smoothest mapping from the manifold to the real line, making this intrinsic partition quite stable. Thus, the shape description is given by the concatenation of R sub-histograms of SGWS and CH along eigenfunction values on the real line. To account for the two possible sign choices of the eigenfunction, the concatenation is also evaluated with the histogram order inverted, and the scheme with the minimum cost is considered as the better matching. Therefore, the descriptive power of SGWS and CH is enhanced by incorporating this spatial information.
Fig. 3 The isocontours of the second eigenfunction.
Given a SGWS+CH descriptor densely computed at each vertex of a mesh, quantization via the codebook model approach is adopted to obtain a compact histogram shape representation. The classical k-means method is used to learn a dictionary Q = {q_1, ..., q_K}, where the words are obtained as the K centroids of the k-means clusters. In the proposed implementation, K = 100. In order to assign the descriptor to a word in the vocabulary, approximated LLC is performed for fast encoding; then max-pooling is applied to each region. Finally, the ISPM-induced histograms for shape representation are derived.
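The codebook quantization step can be sketched as follows. For simplicity this illustration uses hard assignment to the nearest codeword in place of the approximated LLC coding and max-pooling used in the paper; it is our own sketch, not the authors' implementation.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Bag-of-features histogram via hard assignment.

    descriptors: (n_vertices, d) per-vertex descriptors (e.g. SGWS+CH).
    codebook:    (K, d) dictionary of codewords (e.g. k-means centroids).
    Returns a normalized length-K histogram of codeword occurrences.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)                       # nearest codeword per vertex
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In the ISPM scheme this histogram is computed once per region and the R region histograms are concatenated.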
The dissimilarity between two shapes is given by the L_2 distance between the associated ISPM-induced histograms. Geometry and texture information are handled separately, and the final dissimilarity score is a combination of the geometric and the texture distance.

The runs
The proposed approach has been implemented to derive three different runs for the retrieval task:
- LBG1 represents the LLC strategy with partition level l = 1 for geometric information;
- LBG2 represents the LLC strategy with partition level l = 3 for geometric information;
- LBG3 is a weighted combination of geometric and texture information, namely the LLC strategy with partition level l = 3 for SGWS and partition level l = 5 for colour histograms, with coefficients 0.8 and 0.2, respectively.
For the classification task, two nearest neighbour classifiers are derived: a geometric one from LBG2 and a texture one from the texture contribution of LBG3. In what follows, the latter is referred to as LBGtxt.

Colour + Shape descriptors (runs Ve1-3)
This method is a modification of the "3D Shape + colour" descriptor proposed in [9]. To describe a textured 3D shape S represented by a surface mesh, two main steps are considered:
1. Let G be an n × n geodesic distance matrix, where n is the number of vertices in the mesh and the element G(i, j) denotes the geodesic distance from vertex i to vertex j. Building on G, the centralised geodesic matrix [50] is defined as D = (I − 1_n) G (I − 1_n), where I is the identity matrix and 1_n denotes an n × n matrix having each component equal to 1/n. Following [68], a spectral representation of the geodesic distance is finally adopted as shape descriptor, that is, a vector of eigenvalues Eig(D) = (λ_1(D), ..., λ_n(D)), where λ_i(D) is the ith largest eigenvalue. As in [9], the first 40 eigenvalues are used as shape descriptor. The vectors of eigenvalues Eig(D_1), Eig(D_2) associated with two shapes S_1, S_2 are compared through the mean normalized Manhattan distance.
2. To incorporate texture information in the shape descriptor, the RGB colour histograms are considered as in [9]. Accordingly, the distance d_clr(S_1, S_2) between the texture descriptors associated with S_1, S_2 is given by the Earth mover's distance (EMD) between the corresponding RGB colour histograms. For two histograms p and q, the EMD measures the minimum work that is required to move the region lying under p to that under q. Mathematically, it is defined as the total flow that minimizes the transport from p to q. We refer the reader to [63] for a comprehensive review of the EMD formulation, and to [58] for an application to shape retrieval. To concretely evaluate the EMD, the fast implementation introduced in [56] has been used with a thresholded ground distance.
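The spectral geodesic descriptor of step 1 can be sketched as follows, assuming the centralisation takes the standard double-centering form D = (I − 1_n) G (I − 1_n); this is our own illustration, not the run's actual code.

```python
import numpy as np

def geodesic_spectrum(G, n_eigs=40):
    """Eigenvalue descriptor of the centralised geodesic distance matrix.

    G: (n, n) symmetric matrix of pairwise geodesic distances.
    Returns the n_eigs largest eigenvalues of D = J G J, with J = I - 1_n
    and 1_n the matrix whose entries all equal 1/n.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    D = J @ G @ J                              # centralised geodesic matrix
    eigs = np.linalg.eigvalsh(D)               # D is symmetric
    return np.sort(eigs)[::-1][:n_eigs]        # largest eigenvalues first
```

Since the spectrum is invariant under vertex relabelling, the descriptor does not depend on the order in which the mesh vertices are stored.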
Last, the final distance between S_1 and S_2 is defined as d(S_1, S_2) = p · d_shape(S_1, S_2) + (1 − p) · d_clr(S_1, S_2), where p is a parameter controlling the trade-off between shape and colour information. In the experiments, p = 0.75 (run Ve1), p = 0.85 (run Ve2), and p = 0.95 (run Ve3), following the paradigm that geometric shape properties should weigh more than colourimetric ones in the way humans interpret similarity between shapes. An illustration of the proposed description for a textured shape is given in Fig. 4.

The CIELab colour space well represents how human eyes perceive colours: uniform changes of coordinates in the CIELab space correspond to uniform changes in the colour perceived by the human eye. This does not happen with some other colour spaces, for example the RGB space. In the CIELab colour space, tones and colours are held separately: the L channel specifies the luminosity, or the black and white tones, whereas the a channel specifies the colour as either a green or a magenta hue, and the b channel specifies the colour as either a blue or a yellow hue.
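The CIELab coordinates used by several of the methods below can be obtained from sRGB through the standard sRGB → XYZ → Lab chain; a minimal sketch (D65 white point assumed; the function name is ours):

```python
import numpy as np

def rgb_to_cielab(rgb):
    """Convert an sRGB triple in [0, 1] to CIELab (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # inverse sRGB companding
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # linear RGB -> XYZ (standard sRGB matrix, D65)
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = M @ lin
    # normalise by the D65 reference white
    xyz /= np.array([0.95047, 1.0, 1.08883])
    # XYZ -> Lab
    eps, kappa = 216 / 24389, 24389 / 27
    f = np.where(xyz > eps, np.cbrt(xyz), (kappa * xyz + 16) / 116)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return L, a, b
```

For white input this returns L ≈ 100 with a and b near zero; pure green yields a negative a, and pure blue a negative b, matching the channel semantics described above.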
Run Gi1. The Textured Shape Distribution (TSD) descriptor is a colour-aware variant of the classical Shape Distributions (SD) descriptor [53]. TSD consists of the distribution of colour-aware geodesic distances, computed between a number of sample points scattered over the surface mesh representing the 3D model. The surface mesh is embedded in the 3-dimensional CIELab colour space, so that each vertex has (L, a, b) coordinates. Then, in order to obtain colour-aware geodesic distances, a metric has to be defined in the embedding space. To this end, the length of an edge is defined as the distance between its endpoints, namely the CIE94 distance defined for CIELab coordinates [18]. This distance is used instead of the classical Euclidean distance because it was specifically designed for the CIELab space and employs specific weights to respect perceptual uniformity [18]. The colour-aware geodesic distances are then computed in the embedding space with the metric induced by the CIE94 distance, between pairs of points sampled over the surface mesh. In the current implementation, a set of 1024 points was sampled following a farthest-point criterion, and the Dijkstra algorithm was used to compute the colourimetric geodesic distances between pairs of samples.
The final descriptor encodes the distribution of these distances. In the current implementation, the distribution was discretized using a histogram of 64 bins, and histograms were compared using the L2 norm; the distance between two models is therefore the L2 norm between the corresponding histograms. Although TSD encodes the distribution of colour distances, it also takes into account the connectivity of the underlying model, as distances are computed by walking on the surface. In this sense, TSD can be considered a hybrid descriptor, capturing both colourimetric and geometric information.
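A minimal sketch of the TSD pipeline, under simplifying assumptions (plain Euclidean distance between Lab coordinates instead of the CIE94 distance, and an explicit edge list instead of a full mesh structure; the helper names are ours):

```python
import heapq
import numpy as np

def colour_geodesics(edges, lab, sources):
    """Dijkstra over a mesh graph whose edge lengths are distances between
    the endpoints' CIELab coordinates (Euclidean here for simplicity)."""
    n = len(lab)
    adj = [[] for _ in range(n)]
    for i, j in edges:
        w = float(np.linalg.norm(np.asarray(lab[i]) - np.asarray(lab[j])))
        adj[i].append((j, w))
        adj[j].append((i, w))
    out = {}
    for s in sources:
        dist = [np.inf] * n
        dist[s] = 0.0
        pq = [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        out[s] = dist
    return out

def tsd_histogram(edges, lab, samples, bins=64):
    """Normalised distribution of colour-aware geodesic distances
    between all pairs of sample points."""
    geo = colour_geodesics(edges, lab, samples)
    vals = [geo[s][t] for k, s in enumerate(samples) for t in samples[k + 1:]]
    hist, _ = np.histogram(vals, bins=bins)
    return hist / max(hist.sum(), 1)
```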
Run Gi2. Though TSD retains some information about the shape of 3D models, in terms of the connectivity of the mesh representing the object, it still loses most of the geometric information, as it does not take into account the length of the edges in the Euclidean space. This information can be recovered by using a joint distribution, which takes into account both colourimetric geodesic distances and classical geodesic distances computed on the surface embedded in the Euclidean space. In this run, the joint distribution has been discretized by computing a 16 × 16 bi-dimensional joint histogram (JH) for each 3D model; the L2 norm is used for comparison. The distance matrix is the sum of the distance matrix obtained using the TSD descriptor and the one obtained using the JH descriptor.

Run Gi3. In [5] the authors proposed a signature which combines geometric, colourimetric, and hybrid descriptors. In line with this idea, run Gi3 combines TSD with a geometric descriptor, namely the popular Spherical Harmonic (SH) descriptor [29], and a colourimetric descriptor, namely the persistence-based descriptor of the PHOG signature in [5], using the CIELab colour space coordinates. The distance matrix corresponding to this run is the sum of the three distance matrices obtained using the TSD descriptor, the SH descriptor, and the persistence-based descriptor of PHOG, respectively.
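The 16 × 16 joint histogram used in run Gi2 can be sketched as follows (the helper name is ours; the inputs are paired colour-aware and Euclidean geodesic distances for the same sample pairs):

```python
import numpy as np

def joint_histogram(colour_dists, euclid_geo_dists, bins=16):
    """Joint distribution of colour-aware vs. classical geodesic
    distances, normalised and flattened for L2 comparison."""
    H, _, _ = np.histogram2d(colour_dists, euclid_geo_dists, bins=bins)
    H = H / max(H.sum(), 1)
    return H.ravel()
```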

Multi-resolution Representation Local Binary Pattern Histograms (run TAS)
The Multi-resolution Representation Local Binary Pattern Histograms (MRLBPH) is proposed here as a novel 3D model feature that captures texture features of images rendered from 3D models, by analysing multi-resolution representations using Local Binary Patterns (LBP) [52].
Figure 5 illustrates the generation of MRLBPH. A 3D model is normalized via Point SVD [74] to be contained in a unit geodesic sphere. From each vertex of the sphere, depth and colour buffer images with 256 × 256 resolution are rendered; a total of 38 viewpoints are defined. The depth channel and each CIELab colour channel are then processed as detailed in what follows.
To obtain multi-resolution representations, a Gaussian filter is applied to an image with varying standard deviation parameters. The standard deviation parameter σ_l at level l is evaluated as σ_l = σ_0 + α · l, where σ_0 is the initial value of the standard deviation parameter and α is the incremental parameter. This rule has been derived from the optimal standard deviation parameters obtained through preliminary experiments. In the proposed implementation, σ_0 = 0.8 and α = 0.6, while the number of levels Λ is set to 4.
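A sketch of the multi-resolution stack, assuming the additive rule σ_l = σ_0 + α · l (the exact formula is garbled in the source, so this reading is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiresolution(img, sigma0=0.8, alpha=0.6, levels=4):
    """Gaussian multi-resolution stack: one smoothed copy per level,
    with standard deviation sigma0 + alpha * l at level l."""
    return [gaussian_filter(img, sigma=sigma0 + alpha * l)
            for l in range(levels)]
```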
For each scale image, an LBP histogram is evaluated. To incorporate spatial location information, the image is partitioned into 2 × 2 blocks and the LBP histogram of each block is computed; the LBP histogram of the scale image is obtained by concatenating the histograms of these blocks. Let g_c denote the image value at an arbitrary pixel (u, v), and let g_1, ..., g_8 be the image values of its eight neighbouring pixels. The LBP value is then calculated as LBP = ∑_{i=1..8} s(t, g_i − g_c) · 2^{i−1}, where s(t, g) is a threshold function defined as 0 if g < t and 1 otherwise. In the proposed implementation, the threshold value t is set to 0, and the LBP values are quantized into 64 bins.
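A basic 8-neighbour LBP with the threshold function s(t, g) described above can be sketched as follows (border pixels are simply dropped, and the bit ordering is an arbitrary choice of ours):

```python
import numpy as np

def lbp_image(img, t=0):
    """8-neighbour LBP codes: bit i is 1 when (neighbour_i - centre) >= t."""
    img = np.asarray(img, dtype=float)
    # eight neighbour offsets, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=int)
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (nb - centre >= t).astype(int) << bit
    return out

def lbp_histogram(img, bins=64):
    """LBP codes (0..255) quantised into 64 bins and L1-normalised."""
    codes = lbp_image(img) // (256 // bins)
    hist = np.bincount(codes.ravel(), minlength=bins)[:bins]
    return hist / max(hist.sum(), 1)
```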
An MRLBP histogram is generated by merging the histograms of the scale images through the selection of the maximum value of each histogram bin. Let h_i^{(l)} be the i-th LBP histogram element of the scale image at level l. The i-th MRLBP histogram element h_i is defined as h_i = max_{l = 0, ..., Λ} h_i^{(l)}. The MRLBP histogram is finally normalized using the L1 norm.
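The bin-wise maximum merge and L1 normalisation can be sketched as:

```python
import numpy as np

def mrlbp_histogram(level_histograms):
    """Merge per-level LBP histograms by taking the bin-wise maximum,
    then L1-normalise the result."""
    merged = np.max(np.stack(level_histograms), axis=0)
    return merged / max(merged.sum(), 1e-12)
```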
The feature vector associated with a 3D model is obtained by calculating the MRLBP histograms of the depth and CIELab channels for each viewpoint.
To compare two shapes S_1 and S_2, the Hungarian method [34] is applied to all dissimilarities between the associated MRLBP histograms. In evaluating the final dissimilarity score, the histograms of the depth and CIELab channels are combined by calculating the weighted sum of the per-channel dissimilarities. Let d_d, d_L, d_a, and d_b denote the dissimilarity of each channel, and let w_d, w_L, w_a, and w_b be the corresponding weights. The dissimilarity between S_1 and S_2 is then defined as D(S_1, S_2) = w_d d_d + w_L d_L + w_a d_a + w_b d_b. In this implementation, w_d is set to 0.61, and w_L, w_a, and w_b are each set to 0.13. The dissimilarity between two histograms is measured with the Jeffrey divergence [15].

PHOG: Photometric and geometric functions for textured shape retrieval (runs BCGS1-3)

The combination of colourimetric and geometric properties represented in terms of scalar and multi-variate functions has been explored in PHOG [5], a shape signature consisting of three parts:
- A colourimetric descriptor. CIELab colour coordinates (normalized L, a, b channels) are seen as either scalar or multi-variate functions defined over the shape. The CIELab colour space is considered due to the perceptual uniformity of this colour representation;
- A hybrid descriptor. Shape and texture are jointly analysed by opportunely weighting the colourimetric information (L, a, b channels) with respect to the underlying geometry and topology;
- A geometric descriptor relying on a set of functions representing as many geometric shape properties.
Functions are first clustered; then, a representative function is chosen for each cluster.The goal here is to select functions that are mutually independent, thus complementing each other via the geometric information they carry with them.
Figure 6 shows a pictorial representation for the generation of a PHOG signature.
Run BCGS1. Following the original PHOG setting, the colourimetric description is included in the persistence framework. The a, b coordinates are used to jointly define a bivariate function over a given shape, whereas L is used as a scalar function; in this way, colour and intensity are treated separately. Precisely, for a shape represented by a triangle mesh S, the two functions f_L : S → R and f_{a,b} : S → R² are considered, the former taking each point x ∈ S to the L-channel value at x, the latter to the pair given by the a- and b-channel values at x. The values of f_L and f_{a,b} are then normalized to range in the interval [0, 1]. Last, S is associated with the 0th persistence diagram Dgm(f_L) and the 0th persistence space Spc(f_{a,b}): these descriptors encode the evolution of the connectivity in the sublevel sets of f_L and f_{a,b} in terms of birth and death (i.e. merging) of connected components; see [5] for more details.
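To make the persistence machinery concrete, the 0th persistence diagram of a scalar function (such as f_L) sampled at the vertices of a mesh graph can be computed with a union-find sweep; a simplified sketch (edges given explicitly, components merged by the elder rule, essential classes paired with infinity):

```python
import math

def persistence_diagram_0(values, edges):
    """0-dimensional sublevel-set persistence of a scalar function on a
    graph: (birth, death) pairs of connected components."""
    n = len(values)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    birth = {v: values[v] for v in range(n)}  # birth value per root
    pairs = []
    # process edges by the value at which they enter the sublevel set
    for i, j in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        t = max(values[i], values[j])
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        # elder rule: the component born later dies at the merge
        young, old = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
        pairs.append((birth[young], t))
        parent[young] = old
    # surviving components never die (essential classes)
    for v in range(n):
        if find(v) == v:
            pairs.append((birth[v], math.inf))
    return sorted(pairs)
```

On a path graph with values 0, 2, 1, 3 this detects the local minimum born at 1 dying at 2, plus the essential component born at the global minimum 0.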
The hybrid description comes from a geodesic distance f_geod : S → R defined in a higher-dimensional embedding space, similarly to the approach proposed in [31, 33], and used as a real-valued function in the persistence framework to associate S with the persistence diagram Dgm(f_geod). The definition of the joint geometric and colourimetric integral geodesic distance is straightforward, and it is implemented through Dijkstra's algorithm, which is based on edge lengths.
The geometric description is based on the DBSCAN clustering technique [17]. Once a set of functions {f_i : S → R} is selected (from an original set of 70 geometric functions; see [5] for the complete list), a matrix M_DM(S) is used to store the distances between all the possible couples of functions; each entry measures the distance between f_i and f_j in terms of their gradients ∇_t f_i and ∇_t f_j over the triangles t of the mesh S (the exact entry formula is given in [5]).
To assess the similarity between two shapes S_1 and S_2, the corresponding colourimetric, hybrid and geometric descriptions are compared. In particular, the colourimetric distance d_clr(S_1, S_2) is the normalized sum of the Hausdorff distance between the 0th persistence diagrams of f_L and that between the 0th persistence spaces of f_{a,b}.

Variations. Several variations of the PHOG framework are possible, for instance exploring the use of different distances between feature vectors or dealing with variations of the three (colourimetric, hybrid, geometric) shape descriptions. For the current implementation the following changes have been proposed:
- Run BCGS2. The original hybrid description is replaced by a histogram-based representation of the geodesic distance. While getting rid of the additional geometric contribution provided by persistence, the hybrid perspective is maintained, as the considered geodesic distance takes into account both geometric and texture information.
- Run BCGS3. The stability properties of persistence diagrams and spaces imply robustness against small variations in the L, a, b values. This also holds when colour perturbations are widely spread over the surface model, as in the case of slight illumination changes. On the other hand, colour histograms behave well against localized colourimetric noise, even when characterized by large variations in the L, a, b values, since in this case the colour distribution is not greatly altered. In this view, the idea is to replace the hybrid contribution with CIELab colour histograms, so as to improve the robustness properties of the persistence-based description. Histograms are obtained as the concatenation of the L, a, b colour channels.
In runs BCGS2-3, histograms are compared through the Earth Mover's Distance (EMD). The DBSCAN clustering technique for selecting representative geometric functions is replaced by the one used in [8], which is based on the replicator dynamics technique [55]. The modified geometric descriptors are compared via the EMD as well, after converting the M_DM matrices into feature vectors.
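For one-dimensional histograms such as the concatenated L, a, b channels, the EMD can be evaluated directly with SciPy, using the bin centres as ground positions (a plain sketch, not the thresholded fast implementation of [56]):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd_histograms(p, q, bin_centres=None):
    """1-D Earth Mover's distance between two histograms: the bins act
    as point masses located at their centres."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if bin_centres is None:
        bin_centres = np.arange(len(p), dtype=float)
    return wasserstein_distance(bin_centres, bin_centres, p, q)
```

Moving all mass from the first bin to the third costs 2 (unit mass over two bins of ground distance), as expected.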

Taxonomy of the methods
The methods detailed above can be considered representative of the variety of 3D shape retrieval and classification techniques overviewed in Section 2. Indeed, they range from local feature vector descriptions coded as histograms of geometric and/or colour properties, to spectral and topology-based descriptions, including also a spatial pyramid matching framework. In what follows, we group the properties of these methods on the basis of the key characteristics they exhibit, e.g., the geometric and colourimetric structure they capture, the scale level at which the shape description is formalized, or the colour space chosen for texture analysis. These characteristics are briefly described below and summarized in Table 1.
Intrinsic vs. extrinsic. Studying the geometric shape of a 3D model relies on the definition of a suitable metric between its points. Among the possible options, two choices appear quite natural.
The first is to consider the Euclidean distance, which reflects the extrinsic geometry of a shape. Extrinsic shape properties are related to how the shape is laid out in Euclidean space, and are therefore invariant to rigid transformations, namely rotations, translations and reflections.
A second choice is to measure the geodesic distance between points, that is, to consider the intrinsic geometry of a shape.Intrinsic shape properties are invariant to those transformations preserving the intrinsic metric, including rigid shape deformations but also non-rigid ones such as shape bendings.
Global vs. multi-scale. Shape descriptors generally encode information about local or global shape properties. Local properties reflect the structure of a shape in the vicinity of a point of interest, and are usually unaffected by the geometry or the topology outside that neighbourhood. Global properties, on the other hand, capture information about the whole structure of a shape.
Another option is to deal with shape properties at different scales, thus providing a unifying interpretation of local and global shape description.Such an approach is usually referred to as multi-scale.
The methods presented in this contribution can be classified as global and multi-scale ones, both for geometric and texture information, see Table 1 for details.
RGB vs. non-RGB. The most natural colour space for the analysis of colourimetric shape properties appears to be RGB. This was indeed the choice made by the methods associated with runs LBGtxt and Ve(1-3). However, other options are possible: runs GG(1-3) are based on normalized and averaged RGB channels, while the methods related to runs TAS, BCGS(1-3) and Gi(1-3) study colourimetric shape properties in the CIELab colour space.
Feature vectors vs. topology. Once geometric and texture shape properties have been analysed and captured, they have to be properly represented through suitable shape descriptors. The most popular approach is to use feature vectors [73], and most of the methods implemented in this paper adopt this description framework. Feature vectors generally encode shape properties expressed by functions defined on the shape, and are usually represented as histograms. While being very efficient to compute, histograms might lose part of the structural information about the considered shape property. To overcome this limitation, it is possible to consider the shape connectivity directly at the function level, as done for instance by shape distributions (runs Gi1 and Gi2). Alternatively, one can move to more informative histogram variations: bi-dimensional histograms, such as the geodesic distance matrix used in runs Ve(1-3) and the mutual distance matrix adopted in runs BCGS(1-3), or concatenated histograms obtained at different resolution levels, as in the case of runs GG(1-3), LBG(1-3, txt) and TAS.
A different way to preserve the structure of geometric information is provided by descriptors rooted in topology (runs BCGS(1-3) and Gi3). Indeed, they keep track of the spatial distribution of a considered shape property, and possibly encode the mutual relations among shape parts of interest, that is, regions that are highly characterized by the considered property. The reader is referred to Table 1 for details about the approaches adopted by the methods under evaluation.
Hybrid vs. combined. Finally, methods can be distinguished by the way geometric and texture information are merged, which can be done either a priori or a posteriori. The first case results in a hybrid shape descriptor, as for runs GG3 and, partially, runs BCGS(1-3) and Gi(1-3). In the second case, a pure geometric and a pure texture descriptor are obtained and compared separately, and the final dissimilarity score is a weighted combination of the two distances.

Comparative analysis
The methods detailed in Section 4 have been evaluated through the comparative study presented in what follows. Each run has been processed in terms of the output specified in Section 3.2 and according to the evaluation measures described in Sections 3.3.1 and 3.3.2.

Retrieval performances.
Following [76], the retrieval performance of each run has been evaluated according to the following relevance scale: if a retrieved object shares both shape and texture with the query, it is highly relevant; if it shares only shape, it is marginally relevant; otherwise, it is not relevant. Note that, because of the multi-level relevance assessment of each query, most of the evaluation measures have been split up as well: highly relevant evaluation measures relate to the highly relevant items only, while relevant evaluation measures are based on all the relevant items (highly relevant + marginally relevant).

Highly relevant evaluation.
In the highly relevant scenario, the main goal is to evaluate the performance of algorithms when models vary by both geometric shape and texture.
Figure 7 shows the performances of the six methods in terms of the average precision-recall curve, obtained as the average of the precision-recall curves computed over all the queries. To ease visualization, the plot in Figure 7 includes only the best run for each method, that is, the run with the highest mean average precision (mAP) score. We recall that, for an ideal retrieval system, the mAP score equals 1.

Fig. 7 Highly relevant precision-recall curves for the best run of each method.
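For reference, the mAP score used throughout this evaluation can be computed from ranked binary relevance lists as follows (a standard formulation; variable names are ours):

```python
import numpy as np

def average_precision(relevant_flags):
    """AP of one ranked list: mean of precision@k over the ranks at
    which relevant items appear."""
    flags = np.asarray(relevant_flags, dtype=bool)
    if not flags.any():
        return 0.0
    hits = np.cumsum(flags)
    ranks = np.arange(1, len(flags) + 1)
    return float(np.mean((hits / ranks)[flags]))

def mean_average_precision(ranked_lists):
    """mAP: average of the per-query AP scores."""
    return float(np.mean([average_precision(r) for r in ranked_lists]))
```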
The runs of Figure 7 have also been analysed in terms of the weighted average mAP score computed over the 48 classes that represent the highly relevant scenario, where weights are determined by the size of the classes; see the second column of Table 2. Moreover, we also considered the percentage of classes whose mAP score is larger than some threshold values, namely 0.40, 0.55, 0.70 and 0.85; see columns 3-6 of Table 2.
Table 2 Highly relevant analysis for the runs in Figure 7: weighted average mAP score (first column), and how many of the 48 highly relevant classes have a mAP score exceeding 0.40, 0.55, 0.70 and 0.85 (third to last columns, respectively; results are reported in percentage points). The best two results are in gold and silver text, respectively.

To further analyse retrieval performances against texture deformations, we restrict the mAP analysis to each specific class of colourimetric transformations described in Section 3. More precisely, we first let the algorithms run exclusively on the set of null models; then, we add only those elements that result from one of the five texture transformations used to generate the entire dataset. Note that, according to the procedure used to create the benchmark, each texture deformation is always applied together with a geometric one, hence it still makes sense to apply the highly relevant paradigm along the evaluation process. Table 3 summarizes the results.

Looking at how the performances degrade across the different families of transformations, it can be noted that all methods appear not too sensitive to transformations of type 2, while the worst results are distributed among transformations of type 1 (runs Gi3, LBG3, TAS and Ve2) and type 3 (runs BCGS3 and GG1).
Table 4 reports the best highly relevant performances in terms of the Nearest Neighbour, First Tier and Second Tier evaluation measures; additionally, its last column records the ADR measure. All the scores, which range from 0 (worst case) to 1 (ideal performance), are averaged over all the models in the dataset. Finally, Figure 8 shows the best run of each method according to the NDCG measure as a function of the rank p. In the present evaluation, the NDCG values for all queries are averaged to obtain a measure of the average performance of each submitted run; recall that, for an ideal run, NDCG ≡ 1. The NDCG measure weighs geometric retrieval performances more than texture ones: indeed, geometric shape similarity is involved in the definition of both relevant and highly relevant items. This means that runs characterized by moderate geometric retrieval performances are penalized more than others (see also Section 5.1.2). This is the case of run GG1, which is definitely tuned for the highly relevant rather than the relevant scenario.
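One standard formulation of the NDCG measure, with graded relevance (e.g. 2 for highly relevant, 1 for marginally relevant, 0 otherwise), can be sketched as follows; the benchmark's exact gain and discount conventions may differ:

```python
import math

def ndcg(relevances, p=None):
    """NDCG@p: discounted cumulative gain of the ranked list, divided by
    the DCG of the ideal (sorted) ordering."""
    p = len(relevances) if p is None else p

    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:p]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```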
Discussion. Trying to interpret the outcome of the highly relevant evaluation, we are led to the following considerations:
• The algorithm design associated with runs BCGS3 and Gi3 proposes a similar combination of geometric and texture information. Indeed, both methods rely on a hybrid shape description, in which the texture contribution is partly based on a geometric-topological analysis of colourimetric properties and is carried out in the CIELab colour space. In other words, a "structured" analysis of the colour channels is paired with the choice of a colour space that reflects human colour perception better than RGB does. Moreover, the geometric-topological approach keeps track of the underlying connectivity of 3D models, thus providing additional information about the spatial distribution of colourimetric shape properties;
• Run GG1 represents a combined shape descriptor, whose texture contribution is based on a normalized version of the RGB colour space. Such a choice seems to imply good robustness against affine texture transformations. Incidentally, although only the best runs are presented to ease readability and visualization, runs GG2 and GG3, which are based on the greyworld RGB channel normalization, exhibit results comparable with those of run GG1;
• As for runs LBG and Ve, the texture description is accomplished through standard histograms of RGB colour channels, although the LBG runs incorporate some additional information by applying a multi-resolution approach to shape sub-parts. However, it seems that dealing with colourimetric information in other colour spaces, such as CIELab or variations of the RGB colour space, allows for a representation of colour that is more robust to the texture deformations proposed in this benchmark.

Relevant evaluation.
In this section we analyse the performances of the methods with respect to their capability of retrieving relevant items; in this case, shape (dis)similarity depends only on geometric shape properties. In analogy to Section 5.1.1, Figure 9 shows the best runs of all methods in terms of the average precision-recall curve, and Table 5 reports the mAP scores of those runs, now averaged over the 16 geometric classes composing the dataset. Also in this case, we report for how many classes the mAP score exceeds the values 0.40, 0.55, 0.70 and 0.85. Judging from these scores (see also Table 6, first column), the overall relevant performance is still not ideal. This is actually not surprising, since most of the methods considered for evaluation have been specifically tuned to deal with both texture and geometric shape modifications. Nevertheless, the relevant evaluation can be used as a lever for further comments about the benchmark and the considered methods. As an overall comment, it is worth mentioning that a large part of the geometric shape modifications are not metric-preserving: the geometric deformations used to create the benchmark alter both the extrinsic and the intrinsic properties of shapes. This suggests the need for more general techniques for 3D shape analysis and comparison, which is actually one of the most recent trends in the field, see e.g. [60, 59].
More in detail, we observe that:
• The good performance of run LBG2 relies on two main motivations. First, this run is completely determined by a geometric contribution, and is therefore not affected by any other, possibly misleading, information about texture shape properties. Second, the method represented by run LBG2 is spectral-based, and is therefore able to capture intrinsic shape properties; as a consequence, it is invariant to rigid shape transformations, as well as to some non-rigid deformations such as pose variations and bendings, which are all present in the dataset. Finally, differently from runs Gi3 and Ve2, whose geometric contribution is intrinsic as well, the descriptive power of the spectral-based approach is improved by the additional spatial information provided by the intrinsic spatial pyramid matching (see Section 4.2 for details);
• A similar reasoning about invariance under rigid and non-rigid deformations also holds for run BCGS1, whose associated method can be considered "mostly" intrinsic. Indeed, its geometric contribution relies on a collection of mainly intrinsic descriptors, considering either spectral-based functions, geodesic distances or the Gaussian curvature;
• The relatively good performance of run TAS can be explained by the fact that it mixes an extrinsic approach with a "view-based" strategy, widely acknowledged as the most powerful and practical approach for rigid 3D shape retrieval [67]. Even when a 3D model is articulated, non-uniformly deformed or partially occluded, the number of views (38, in this implementation) should limit the noise possibly generated in the images captured around an object;
• Run GG2 mainly focuses on radial and spherical local symmetries. While the approach appears robust when the analysis is restricted to a single class of geometric transformations, in the most general case (i.e. the whole dataset) results are affected by a non-optimal trade-off between the geometric and colourimetric contributions. In this way, the relevant retrieval performance is partially conditioned by the inter-class texture variations.

Classification performances.
In the classification task, each run results in two classification matrices, one for geometry and one for texture, derived from the 1-NN classifier associated with the dissimilarity matrices used for the retrieval task. Hence, for a classification matrix C, the element C(i, j) is set to 1 if model i is assigned to class j, meaning that j is the class of the nearest neighbour of i, and 0 otherwise.
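The construction of the 0/1 classification matrix from a dissimilarity matrix can be sketched as follows (a model is excluded from being its own neighbour; helper name and signature are ours):

```python
import numpy as np

def classification_matrix(dissim, labels, n_classes):
    """1-NN classification matrix: row i has a single 1 in the column of
    the class of model i's nearest neighbour."""
    D = np.asarray(dissim, dtype=float).copy()
    np.fill_diagonal(D, np.inf)  # a model cannot be its own neighbour
    C = np.zeros((D.shape[0], n_classes), dtype=int)
    for i in range(D.shape[0]):
        C[i, labels[int(np.argmin(D[i]))]] = 1
    return C
```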
Figures 10 and 11 represent the confusion matrices for the best runs of each method.For visual purposes, we have normalized the matrices with respect to the number of elements in each class, so that possible values range from 0 to 1.
Tables 7 and 8 provide a quantitative interpretation of the visual information contained in the confusion matrices. Indeed, the true positive rate (TPR), the true negative rate (TNR) and the Matthews correlation coefficient (MCC) can be directly computed from the elements of a confusion matrix.
More precisely, given a confusion matrix CM and a class ī, the associated TPR, TNR and MCC can be derived as follows. It is first convenient to introduce the number of true positives TP, false negatives FN, false positives FP and true negatives TN, defined as TP = CM(ī, ī), FN = ∑_{j ≠ ī} CM(ī, j), FP = ∑_{j ≠ ī} CM(j, ī), TN = ∑_{i,j} CM(i, j) − (TP + FN + FP). Then, TPR = TP / (TP + FN), TNR = TN / (TN + FP), and MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).
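The per-class scores above translate directly into code:

```python
import math

def class_scores(CM, c):
    """Per-class TPR, TNR and MCC from a confusion matrix CM
    (rows: true class, columns: predicted class)."""
    n = len(CM)
    TP = CM[c][c]
    FN = sum(CM[c][j] for j in range(n) if j != c)
    FP = sum(CM[i][c] for i in range(n) if i != c)
    TN = sum(CM[i][j] for i in range(n) for j in range(n)) - TP - FN - FP
    tpr = TP / (TP + FN) if TP + FN else 0.0
    tnr = TN / (TN + FP) if TN + FP else 0.0
    denom = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    mcc = (TP * TN - FP * FN) / denom if denom else 0.0
    return tpr, tnr, mcc
```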
In Tables 7 and 8, the reported values are averaged over all the considered classes (16 geometric and 12 texture ones).

Discussion. Since 1-NN classifiers are used, the classification results somewhat resemble the nearest neighbour performances registered in Tables 4 and 6. Note, however, that the dataset classifications considered in this task are a purely geometric and a purely texture one.
In particular, the latter does not coincide with the classification adopted in the highly relevant retrieval task, since geometric similarity is not involved. Also, the results in Tables 7 and 8 are averaged over the dataset classes: this explains the slight discrepancy between the geometric TPR and the relevant NN measure, which is averaged over all the elements in the dataset. As shown by Figure 10, the geometric confusion matrices reveal a good classification performance of the methods: all matrices appear almost diagonal, meaning that almost all elements in the dataset are correctly classified. This qualitative intuition is confirmed by the TPR scores reported in Table 7. Furthermore, the TNR values are even higher, revealing that all methods are close to the optimal performance in detecting true negatives (e.g., "non-tables" correctly identified as such). Much in the same way, the MCC measure assigns scores very close to 1 to all methods.
Nevertheless, Figure 11 shows that, while the geometric classifications of the considered runs are roughly comparable, in the texture scenario a sort of transition occurs, with three methods (BCGS3, GG3, Gi2) performing substantially better than the others. The numerical details in Table 8 highlight that the main differences are at the TPR and MCC level, revealing that the confusion highlighted in Figure 11 essentially concerns the localization of true positives (e.g., "tables" correctly identified as tables).
Finally, it is worth noting how, in the texture classification task, best performances still come from those methods dealing with texture information in colour spaces which differ from the standard RGB one, that is, CIELab and the greyworld normalized RGB colour space.

Discussion and conclusions
In this paper, we have provided a detailed analysis and evaluation of state-of-the-art retrieval and classification algorithms dealing with an emerging type of content, namely textured 3D objects, which we believe deserves attention from the research community. The increasing availability of textured models in Computer Graphics, the advances in 3D acquisition technology able to capture textured 3D shapes, and the importance of colour features in 3D shape analysis applications together call for shape descriptors that also take colourimetric information into consideration.
Beyond the extensive analysis carried out throughout the paper, we hope that the experimental results presented here may offer interesting hints for further investigation. We list a few as follows:
• On the one hand, the retrieval performances are positive for all methods, in either the relevant or the highly relevant scenario, or both. On the other hand, the NDCG and ADR measures are specifically designed for interpreting a multi-level dataset classification such as this one, and thus offer an evaluation complementary to the above. As can be seen from Figure 8 and Table 4, results in this sense are quite far from optimal: the best possible value for the ADR score is 1, while the highest registered scores fluctuate around 0.5; similarly, the highest possible area under an NDCG curve equals 1, while the best scores in this contribution are around 0.75. In other words, the benchmark was challenging, and calls for further improvements and new strategies able to deal with non-isometric geometric deformations, as well as affine texture deformations;
• The results achieved by some of the proposed runs suggest that a structured colourimetric analysis could be more informative than a purely histogram-based one. However, there is probably still a long road ahead in this sense. For example, an interesting question is how to generalize, in a reasonable colourimetric sense, well-known extrinsic and intrinsic geometric properties, as done for instance by the colour-aware geodesic distances; see Section 4.4 for details;
• An issue deserving further investigation is to understand which approach is preferable for textured shape analysis, a combined or a hybrid one. For instance, the aforementioned colour-aware geodesic distances and the topological approach to the analysis of colour information appear to be promising hybrid solutions for extracting useful information from texture shape attributes. Nevertheless, it should be noted that such approaches
have been complemented with purely geometric and colourimetric contributions, in order to achieve satisfactory retrieval and classification performances.Another meaningful is the one provided by MAPT-based algorithms, which obtained toprank results in the highly relevant scenario: however, a combined approach generally performed better than a hybrid one in a retrieval context, being the opposite in the classification task.Based on the above remarks, it seems that the overall picture is still quite unclear, calling for a deeper understanding of how geometric and colourimetric shape properties can be jointly analysed; • Some of the best retrieval and classification results have been accomplished through the use of the CIELab colour space, as well as variations of the more classical RGB colour space.Indeed, the CIELab space well represents human perception of colour, and hence appears as a more natural choice.As for the considered variations of the RGB colour space, it seems that they allow to better cope with the affine colourimetric transformations that have been included in the proposed benchmark.Obviously, these are not the only possible alternative to the RGB colour space.Also, the affine colourimetric transformations considered here are just a subset of all the possible texture modifications.Therefore, it could be interesting to investigate how changing the choice of colour representation might affect performance results in retrieving and classifying textured 3D models, as well as which particular choice in the colour space is better suited to face with certain classes of texture deformations.
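To make the colour-space point above concrete, the conversion from sRGB to CIELab follows standard formulas (sRGB gamma expansion, the sRGB-to-XYZ matrix, then the Lab transform under a D65 white point). The following is a minimal sketch of that pipeline, not the conversion code used by any of the evaluated methods:

```python
def srgb_to_cielab(r, g, b):
    """Convert an sRGB triple (components in [0, 1]) to CIELab,
    assuming the D65 reference white. Standard formulas only;
    illustrative, not taken from any method in the study."""
    def linearise(c):  # undo the sRGB gamma encoding
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(c) for c in (r, g, b))
    # Linear RGB -> CIE XYZ (sRGB primaries, D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 white point

    def f(t):  # piecewise cube-root used by the Lab transform
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Unlike RGB, Euclidean distances in this space approximate perceived colour differences, which is one reason CIELab-based descriptors can behave more robustly under illumination-like texture changes.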

Fig. 1 The collection of textured 3D models used in the comparative study.

Fig. 4 The proposed method includes a shape descriptor built from the geodesic distance matrix and a colour descriptor built from the histogram representation of RGB colour information. Details are given in Section 4.3.
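The colour half of such a descriptor can be as simple as a joint RGB histogram over the per-vertex colours. A minimal sketch of that generic idea (the method in Section 4.3 differs in its details):

```python
import numpy as np

def rgb_histogram_descriptor(colours, bins=4):
    """Joint RGB histogram of per-vertex colours: a generic sketch of a
    histogram-based colour descriptor, not the exact one of Section 4.3.
    `colours` is an (n, 3) array with components in [0, 1]."""
    hist, _ = np.histogramdd(colours, bins=(bins, bins, bins),
                             range=[(0.0, 1.0)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()  # normalise so models of any size compare
```

Two such descriptors can then be compared with any histogram distance (L1, chi-squared, ...), independently of the geometric descriptor they are paired with.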

Fig. 6 Generating a PHOG signature. First row: a shape S (left), the function f_geod (centre) and the corresponding persistence diagram Dgm(f_geod). Second row: the mutual distance matrix MDM(S), the function f_L and the corresponding persistence diagram Dgm(f_L).

Fig. 8 Performance of the best runs w.r.t. the NDCG measure (run GG1 is almost entirely covered by runs Gi3, LBG3 and TAS).
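The NDCG curves of Fig. 8 follow the standard definition: the discounted cumulative gain of the graded relevances in ranked order, normalised by the gain of the ideal ordering. A minimal sketch following that standard definition (not necessarily the benchmark's exact implementation):

```python
import numpy as np

def ndcg(relevances):
    """NDCG@1..n for one ranked retrieval list.

    `relevances` holds the graded relevance of the retrieved items in
    ranked order (e.g. 0 = irrelevant, 1 = relevant, 2 = highly
    relevant). A perfect ranking yields 1.0 at every cut-off."""
    rel = np.asarray(relevances, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1/log2(rank+1)
    dcg = np.cumsum(rel * discounts)
    ideal = np.cumsum(np.sort(rel)[::-1] * discounts)      # best ordering
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(ideal > 0, dcg / ideal, 0.0)

# A ranking that places a highly relevant model first:
curve = ndcg([2, 1, 0, 1])
```

Because the gain is graded, NDCG rewards methods that rank highly relevant models (same null model, same texture) above merely relevant ones, which binary measures such as FT and ST cannot distinguish.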

Fig. 9 Relevant precision-recall curves for the best run of each method.

Table 4
Best NN, FT, ST and ADR values for each method. Numbers in parentheses indicate the run achieving the corresponding value. For each evaluation measure, the best two results are in gold and silver text, respectively.

Table 5
Relevant analysis for the runs in Figure 9: weighted average mAP score (first column), and how many of the 16 relevant classes have a mAP score exceeding 0.40, 0.55, 0.70 and 0.85 (third to last columns, respectively; results are reported in percentage points). The best two results are in gold and silver text, respectively.

Table 6 reports the best relevant retrieval performances according to the Nearest Neighbour, First Tier and Second Tier evaluation measures. All scores are averaged over all the models in the dataset. Discussion. Apart from the nearest neighbour scores (Table

Table 6
Best NN, FT and ST values of each method. Numbers in parentheses indicate the run achieving the corresponding value. For each evaluation measure, the best two results are in gold and silver text, respectively.
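For reference, NN, FT and ST are the usual shape-retrieval measures: NN checks whether the closest non-query model shares the query's class, while FT (resp. ST) is the fraction of the query's class retrieved among the top |C|-1 (resp. 2(|C|-1)) results, with |C| the class size. A minimal sketch following these standard definitions (the benchmark's exact implementation may differ):

```python
import numpy as np

def nn_ft_st(dist, labels):
    """Nearest Neighbour, First Tier and Second Tier scores, averaged
    over all queries. `dist` is an (n, n) distance matrix between
    models and `labels[i]` the class of model i."""
    n = len(labels)
    labels = np.asarray(labels)
    nn = ft = st = 0.0
    for q in range(n):
        order = np.argsort(dist[q])
        order = order[order != q]        # drop the query itself
        same = labels[order] == labels[q]
        c = same.sum()                   # |C| - 1 relevant models
        nn += same[0]
        ft += same[:c].sum() / c
        st += same[:2 * c].sum() / c
    return nn / n, ft / n, st / n
```

A well-separated dataset (every class tightly clustered in descriptor space) scores 1.0 on all three; the gap below 1.0 quantifies how often relevant models are pushed out of the first or second tier.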

Table 7
Averaged geometric TPR, TNR and MCC for the best run of all considered methods. The best two results are in gold and silver text, respectively.

Table 8
Averaged texture TPR, TNR and MCC for the best run of all considered methods. The best two results are in gold and silver text, respectively.
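The classification measures of Tables 7 and 8 follow their textbook definitions on binary confusion-matrix counts; in particular, the Matthews correlation coefficient balances all four cells, which makes it informative even when classes are of unequal size. A minimal sketch of these standard formulas:

```python
import math

def tpr_tnr_mcc(tp, fp, tn, fn):
    """True-positive rate, true-negative rate and Matthews correlation
    coefficient from binary confusion-matrix counts (standard formulas)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0   # sensitivity / recall
    tnr = tn / (tn + fp) if tn + fp else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return tpr, tnr, mcc
```

MCC ranges from -1 to 1, with 0 for a classifier no better than chance, so it complements the pair of rates rather than duplicating them.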