ARTICLE IN PRESS Neurocomputing 71 (2008) 2587– 2595


Semantic content ranking through collaborative and content clustering

Marios C. Angelides, Anastasis A. Sofokleous

Brunel University, Uxbridge, Middlesex UB8 3PH, UK


Abstract

Available online 7 May 2008

COSMOS-7 models semantic content in MPEG-7, such as objects and events and their spatio-temporality. When a user queries a COSMOS-7 model, the output is usually presented as a sequence of relevant, albeit unranked, video segments through which the user must sift. In this paper, we report how we use Self-Organising Neural Networks (SONNs) to cluster and rank the video segments through consideration of user preferences and knowledge gained from usage of the same content by similar users and of similar content by the same user.

Keywords: MPEG-7 content modelling and filtering; Collaborative and content clustering; Self-Organising Neural Networks

1. Introduction

COSMOS-7 is an MPEG-7 semantic meta-data modelling and filtering scheme [2]. It uses MPEG-7's Multimedia Description Schemes (MDSs), which semantically describe objects and events and the spatio-temporal relationships between them [17,18]. Like most other multimedia content modelling and filtering schemes, such as [30,32], it does not model low-level syntactic features but rather high-level semantic features that are meaningful to the user [14]. A COSMOS-7 filtering manager allows the user both to retrieve meta-data detail of relevant content and to discard the meta-data of irrelevant content. Retrieving content from COSMOS-7 returns all video segments that satisfy the user's filter criteria in temporal order, rather than in order of relevance to the user's preferences. A user may choose to view some of these video segments but will not always select those best suited to his preferences, and choosing to view all the filtered video segments is not a viable alternative.

This paper proposes an add-on ranking module to COSMOS-7 that uses Self-Organising Neural Networks (SONNs) to cluster and rank the video segments through consideration of user preferences and knowledge gained from usage of the same content by similar users and of similar content by the same user. The ranking module can be added to COSMOS-7 or any other open system. The module uses two SONNs: the first clusters users into optimal groups through consideration of user attributes and the second clusters content through consideration of content features. The use of the add-on was first reported in [3]. This paper builds upon that work by extending the usage history model to include user and content ranking information, by re-engineering the add-on ranking module to take account of this, and by comparing the SONNs' performance against that of the average linkage (AVL) and the k-means algorithms.

The rest of the paper is organised as follows. Section 2 describes how COSMOS-7 models content semantics in MPEG-7 and COSMOS-7's content filtering approach. Section 3 gives a detailed account of the clustering and ranking module. Section 4 compares the clustering performance of SONNs against that of the AVL and k-means algorithms. Section 5 concludes the paper.

Corresponding author: M.C. Angelides. Tel.: +44 1895 265990; fax: +44 1895 269726. E-mail addresses: [email protected] (M.C. Angelides), [email protected] (A.A. Sofokleous). doi:10.1016/j.neucom.2007.10.029

2. Overview of COSMOS-7

In this section we first specify the attributes that need to be modelled in order to provide an all-inclusive description of the semantic content of a video stream, and then describe how COSMOS-7 models these attributes using specific MPEG-7 MDS tools in order to produce such a rich and multi-faceted content model. We then describe how the COSMOS-7 filtering manager extracts content following a user's retrieval request.

A semantic content model has to be tightly integrated with the video streams using referencing strategies. In this way, the filtering process may determine the meaning conveyed by the media within the archive and compare it against the specified filtering criteria. In [2], we identified four key aspects that are necessary in order to achieve this: events and their temporal relationships, and objects and their spatial relationships.

Events within the semantic content model represent the context for objects that are present within the video stream at various structural levels. Events are occurrences within the video stream that divide it into shorter, semantically continuous content segments involving specific objects, and can frequently serve to represent object behaviour.

Temporal relationships between events enable the semantic content model to express the dynamism in content that is apparent at these higher levels, thereby enabling filtering of non-static semantic content, more in line with "What will or did happen here?". Again, this may occur on both a general and a specific level.

Objects and object properties within the semantic content model refer to objects that are readily identifiable within the video stream. The term 'object' refers to any visible or hidden object depicted within a video frame at any necessary level of detail, from entire objects and groups of objects down to the bottommost component objects. Objects may themselves exist within a structural hierarchy, thereby enabling inheritance of properties from higher-level objects.

Spatial relationships between objects within the semantic content model concern the relative location of objects rather than the absolute location that comes from a co-ordinate-based representation. This enables reference to be made to the relationships between objects within the content and can provide for three-dimensional spatial representations, including those concerning hidden objects, which are difficult to derive from co-ordinate representations. Spatial relationships have a duration because they hold across multiple frames and, because of object motion, the spatial relationship between two objects may change over time within the same segment.

The semantic content aspects described above have generic applicability, since virtually all domains require some representation of objects and/or events, including some relationships between them; examples include entertainment-on-demand [4], multimedia news [29], and organisational content [27].
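The four semantic aspects above can be sketched as a minimal data model. This is an illustrative Python sketch under our own naming, not the COSMOS-7/MPEG-7 schema; all class and field names are assumptions for the purpose of the example:

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    """An identifiable (visible or hidden) object; children model the structural hierarchy."""
    name: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)   # component objects inherit context

@dataclass
class Event:
    """An occurrence delimiting a semantically continuous segment of the stream."""
    name: str
    start_frame: int
    end_frame: int
    objects: list = field(default_factory=list)    # objects involved in the event

@dataclass
class TemporalRelation:
    """A relationship such as 'before' or 'during' between two events."""
    kind: str
    a: Event
    b: Event

@dataclass
class SpatialRelation:
    """A relative location such as 'left-of'; holds over a frame range and may change over time."""
    kind: str
    a: Obj
    b: Obj
    start_frame: int
    end_frame: int

# A goal event involving a player object with an inheritable property
player = Obj("player", {"team": "home"})
goal = Event("goal", start_frame=1200, end_frame=1350, objects=[player])
```

The point of the sketch is only that each aspect is a first-class entity that the filtering process can match against independently.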
Hence, when these semantic aspects are accommodated by a content modelling scheme, the resulting model can be used to model semantic content for most mainstream domains and user groups and, consequently, facilitate filtering for those domains and user groups. Fig. 1 shows the key MPEG-7 MDSs that are used by COSMOS-7 for each semantic aspect and illustrates how they are interrelated.

A user will very often only be interested in certain video content. For example, when watching a soccer game the user may only be interested in goals and free kicks. Identifying and retrieving subsets of video content in this way requires the user's preferences for content to be stated, so that content within a digital video resource may then be filtered against those preferences. While newer approaches, such as those using agents, are emerging [35], filtering in video streams usually uses content-based filtering methods, which analyse content features so that these may then be filtered against the user's content requirements [9,10,34].

Fig. 2 illustrates the process used by the filter manager. The specific clauses relating to the different aspects are extracted and then mapped to the COSMOS-7 terms that represent these aspects, which in turn become a homogeneous filter for the COSMOS-7 content model. The filtering process begins with specific reasoning logic that matches the filter criteria for each type of semantic content aspect to corresponding elements found in the content model. In this way, the returned content can be considered a subset of the full content model; the filtering process thus reduces the content model to those segments relevant to the filtering criteria.

Content filters are specified using the FILTER keyword together with one or more of the following: EVENT, TMPREL, OBJ, SPLREL, and VIDEO. These may be joined together using the logical operator AND. This specifies what the filter is to return. Consequently, the output of the filtering process may be semantic content information, video segments containing that information, or both. The criteria are specified using

the WHERE keyword together with one or more of the following clauses: EVENT, TMPREL, OBJ, and SPLREL. These clauses enable events, temporal relationships, objects and their properties, and spatial relationships, respectively, to be specified on the COSMOS-7 representation. Clauses may be joined together using the logical operators AND, NOT and OR; they are terminated with semi-colons and grouped with braces.
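As an illustration only, a filter returning video segments of goals or free kicks involving a particular player might be written as below. The exact clause syntax is defined in [2]; the identifiers and clause bodies here are hypothetical and show only how the keywords above combine:

```
FILTER VIDEO AND EVENT
WHERE {
    EVENT goal OR EVENT freekick;
} AND {
    OBJ player;
}
```

Each braced group is one criterion; the groups are joined with AND, and the FILTER line determines that both the matching video segments and the event meta-data are returned.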

3. Ranking semantic content

SONNs have been used widely in various applications for clustering video segments [20,22,28] and for content retrieval, such as query by example [8], because they exhibit a higher degree of stability (their learning rate decreases to zero after a finite number of learning iterations and no pattern changes its cluster thereafter), yield higher clustering accuracy (the right number of classes with reference to ground truth classes) and yet retain plasticity (the ability to adapt to new data without having to re-cluster) in comparison to approaches such as k-means and AVL [6,11,21,31]. SONNs, k-means and AVL can all be used for clustering multidimensional spaces. However, SONNs rank better in terms of stability, plasticity and accuracy because of their training and design. During training, SONNs use the input vector of each cycle to update the winning output neuron and its neighbouring neurons. Gradually, the structure of the input space is mapped onto the weight vectors, as similar input data are mapped onto nearby neurons. Reordering and optimisation of the neurons terminates after a finite number of learning iterations, when the SONN learning rate reduces to zero. Thereafter, a SONN will only need to reorder or optimise its neurons if new input data are not similar to those already mapped onto neurons. In order to remove, or at least reduce, noise, SONNs average the input data. Not having to reorder or optimise every time new but similar data are input to the SONN yields stability and, at the same time, plasticity. This is equivalent to developing rules over time for classifying data. Cluster accuracy is directly proportional to cluster stability and plasticity once learning reduces to zero. Contrary to SONNs, k-means and AVL record all input data, as they need to use them to re-order and optimise the clusters [23]. When new data are input, a re-clustering exercise takes place which takes into account all data and existing clusters.
This does not always yield the same clusters, which in turn undermines cluster stability, plasticity and accuracy. The accuracy of k-means and AVL also depends on the choice of initial classes, which is a random assignment. For example, k-means may produce relatively unsuitable clusters as a result of the initial random clustering assignment and the greediness of the algorithm during re-clustering, which does not allow cluster refinement or correction [5,33]. One clustering application of SONNs, in which the authors demonstrate that SONNs can detect patterns in massive amounts of unorganised and non-uniform data with some degree of noise, is reported in [19]. Their nSOM method utilises two SONNs to organise non-uniform data and, through experimentation, they show that SONNs can efficiently cluster multidimensional data with noise drawn from multiple spaces. Another application that uses multiple SONNs, each trained with a different dataset, for annotating new video segments and images based on similar annotated content is PicSOM [15,16,24]. Data are manually annotated initially and then divided into a set of training data and ground truth classes. PicSOM uses Tree-Structured Self-Organising Neural Maps (TS-SONMs) [13] to organise objects, such as still images and video segments, hierarchically from coarse top levels to detailed bottom levels. Integrating

Fig. 1. MPEG-7 MDS in use in COSMOS-7 [3].

multilevel tree-structure image representations with SONNs enables a simple but structured retrieval of images. A similar approach is presented in [7], where a SONN retrieves images organised on a two-level tree. The top level describes the features of the whole image, whereas the second level describes features,

such as colour, shape and size, of segmented regions of the image, mapped as child nodes. During retrieval, the SONN returns N related images to the user, who then selects those that are relevant. All the examples presented in this section show a high degree of stability, clustering accuracy and plasticity.
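The SONN behaviour described in this section — updating the winning neuron and its neighbours so that similar inputs map onto nearby neurons, with a learning rate that decays to zero — can be sketched in a few lines. This is a minimal one-dimensional self-organising map in Python; the grid size, decay schedule and neighbourhood function are our own illustrative choices, not those of the systems cited above:

```python
import math
import random

def train_som(data, n_neurons=6, epochs=50, lr0=0.5, radius0=2.0):
    """Train a 1-D self-organising map on a list of numeric vectors."""
    dim = len(data[0])
    random.seed(0)
    # weight vectors randomly initialised in [0, 1]
    w = [[random.random() for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)          # learning rate decays towards zero
        radius = radius0 * (1 - epoch / epochs)  # neighbourhood shrinks with it
        for x in data:
            # winning neuron = weight vector closest to the input (squared Euclidean)
            bmu = min(range(n_neurons),
                      key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, w[j])))
            # move the winner and its grid neighbours towards the input
            for j in range(n_neurons):
                d = abs(j - bmu)
                if d <= radius:
                    h = math.exp(-d * d / (2 * (radius + 1e-9) ** 2))
                    w[j] = [wj + lr * h * (xi - wj) for xi, wj in zip(x, w[j])]
    return w

def assign(x, w):
    """Cluster of x = index of its nearest neuron."""
    return min(range(len(w)), key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, w[j])))

# two well-separated groups of input vectors end up on different neurons
data = [[0.1, 0.1], [0.12, 0.09], [0.9, 0.9], [0.88, 0.91]]
w = train_som(data)
```

Once the learning rate has decayed, `assign` classifies new but similar vectors without retraining, which is the stability-with-plasticity property argued for above.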

Fig. 2. Filtering in COSMOS-7 [3].

Our review of research has led to the development of an add-on ranking module to COSMOS-7 which can be applied to the filtered results. COSMOS-7 yields video segments that may match the search criteria exactly but the user's current content information requirements only partially. The module uses two SONNs to cluster users and video segments with reference to a user's content information preferences and usage history (i.e. according to previous knowledge acquired from usage of the same content by similar users and of similar content by the same user). Fig. 3a depicts a SONN for clustering users into optimal groups through consideration of user attributes, and Fig. 3b depicts a SONN for clustering content through consideration of content features. Fig. 3a features the multiple input vectors, which carry all users' preferences, such as genre (e.g. sports) and presentation (e.g. modality, resolution, etc.), to n input nodes and w output nodes of user classes on a 2D grid. Fig. 3b features the multiple input vectors, which carry all content features, such as genre (e.g. drama, action, etc.), parental guidance (U, 12, 15, 18) and objects (e.g. actors, roles, etc.), to n input nodes and w output nodes of content classes on a 2D grid. Both grids are depicted orthogonally for simplicity and clarity.

Fig. 4 depicts the architecture of the add-on ranking module. The add-on ranking module consists of two filters, a collaborative and a content filter, which are responsible for ranking video segments based on collaborative and content

filtering and recommendation principles. Collaborative filtering and recommendation principles dictate that a user should be recommended items that are preferred by users with similar tastes and preferences, whereas content filtering and recommendation principles use a user's previous history to recommend new items. Filters rely heavily on prior knowledge, in this case about both users and content, in order to recommend accurately. COSMOS-7 asks every new user to complete a survey which involves rating a number of video segments selected from different clusters. This information is recorded under the user's content preferences in their initial user model, which will aid the collaborative filter in allocating the user to a cluster. However, initial user models may not be sufficiently populated with content preferences and, as a result, may not provide sufficient information to the collaborative filter to allocate a new user to the right cluster or to rate segments. Consequently, the collaborative filter will rely largely on the usage history of the other users in the cluster to which the user has initially been allocated in order to determine whether they have been assigned to the right cluster. The initial model evolves over time as the user continues to consume and rate video segments. When user Uf requests content retrieval from COSMOS-7, the collaborative filter takes as input the set of k unranked video segments, VS1, VS2, …, VSk, whose initial rating value is nil: VSet = {(VS1, R(VS1) = nil), …, (VSk, R(VSk) = nil)}.

Fig. 3. (a) User cluster SONN, (b) Content cluster SONN.

3.1. Map users to attributes

The collaborative filter maps users to attributes in a two-dimensional matrix. Attributes, described in XML user description files, include personal information, such as age, sex and location, and preferences, such as conversion, genre and modality. A user's personal information and preferences are described using MPEG-7 and stored in the usage history library. The matrix will be used by the SONN to cluster users. An XML parser reads each user model and, using the UserAttributes.xml file, assigns a numeric value to each user attribute that is not already numeric. This file associates a numeric XPath value with each alphanumeric attribute, {attribute name, XPath}. Whilst an XPath may contain alphanumeric data, such as the name of an attribute, e.g. "…//User/Preferences/…[@AttributeName]", a SONN will only operate on numeric data. Using a Java XML library, each XML user model is parsed and the numeric attribute values are input into the matrix, where the first column lists all user attributes and each column thereafter lists the value of each attribute for one user. Hence, column i+1 is the input vector of the ith user.

3.2. Cluster users using SONNs

Each matrix column of attributes will be used by the SONN to cluster users into groups of users with similar tastes and preferences. The SONN is implemented in MATLAB. The numeric input vectors for all users are passed to the SONN, which uses the columns to assign users to classes. In order to create a competitive network in MATLAB, newc is used to set the number of neurons, the learning rate and the boundaries of the input values.

3.3. Calculate R(VSi)

In order to assign a utility value to each VSi, the system checks how each is rated by similar users, and the aggregate function R(VSi) calculates its utility value:

$$
R(VS_i) = \begin{cases}
\text{nil}, & \text{if } \forall U_n \in G(U_f):\; r(VS_i, U_n) = \text{nil} \\[6pt]
\dfrac{\sum_{U_n \in G(U_f),\, r(VS_i, U_n) \neq \text{nil}} r(VS_i, U_n)}{\left|\{U_n : U_n \in G(U_f) \wedge r(VS_i, U_n) \neq \text{nil}\}\right|}, & \text{otherwise}
\end{cases}
$$

Fig. 4. The add-on ranking module architecture.

where G(Uf) returns all users that belong to the same cluster as user Uf and r(VSi, Un) returns the utility value set by Un for video segment VSi; the latter returns nil if Un has not yet rated VSi. That is, if none of the users in the cluster have rated VSi, R(VSi) returns nil; otherwise, the rating value of video segment VSi for user Uf is calculated as the average of the ratings of those users in the same user group G who have rated VSi. One disadvantage is that new video segments, or even existing ones which have not yet been rated by anybody, cannot be considered by the collaborative filter. Thus, until these segments are rated by some users, the filter will not deliver any rating values for them. In order to overcome this shortcoming, new or non-rated segments are considered by the content filter, whose objective is to recommend segments which are similar to those which the user has consumed previously. Ranking values are stored in the XML Ranking History of each user.
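The collaborative aggregate described above is straightforward to compute. A sketch in Python, in which the cluster assignments and the rating history are assumed to be plain dictionaries standing in for COSMOS-7's XML usage history library:

```python
def collaborative_rating(vs, user, clusters, ratings):
    """Average rating of segment `vs` over users in `user`'s cluster who have rated it.

    clusters: user -> cluster id.
    ratings:  (segment, user) -> rating value; an absent key means nil (not rated).
    Returns None (nil) if nobody in the cluster has rated `vs`.
    """
    # G(Uf): all users in the same cluster as `user` (including `user` itself)
    peers = [u for u, c in clusters.items() if c == clusters[user]]
    # keep only non-nil ratings of vs by those peers
    values = [ratings[(vs, u)] for u in peers if (vs, u) in ratings]
    return sum(values) / len(values) if values else None

clusters = {"Uf": 0, "U1": 0, "U2": 0, "U3": 1}
ratings = {("VS1", "U1"): 4, ("VS1", "U2"): 2, ("VS1", "U3"): 5}
```

Here U3's rating of VS1 is ignored because U3 sits in a different cluster, so VS1 is rated (4 + 2) / 2 = 3.0 for Uf, while an unrated segment yields nil.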

3.4. Map videos to features

In order to cluster the video segments, the filter maps the segments to features in a two-dimensional matrix. Video features, described using MPEG-7 and stored in the content history library, include genre, such as action and drama, parental guidance, video semantics, such as events and objects, and video structure, such as modality, bit rate and format. In similar fashion to its counterpart that maps users to attributes, an XML parser maps video features to a matrix using the VideoFeatures.xml file. Using a Java XML library, each XML video is parsed and the numeric feature values are input into the matrix, where the first column lists all features and each column thereafter lists the value of each feature for one video segment. Hence, column i+1 is the input vector of the ith video segment.

3.5. Cluster new videos using SONNs

Each matrix column of features will be used by the SONN to cluster video segments into groups of video segments with similar features. The second SONN is also implemented in MATLAB, in similar fashion to the first. The numeric input vectors for all video segments are passed to the SONN, which uses the columns to assign video segments to classes.

3.6. Calculate R(VSi)

In order to assign a utility value to each non-rated VSi, VSet′ = {VSi | VSi ∈ VSet and R(VSi) = nil}, the system checks how similar segments are rated by the same user, and the aggregate function R(VSi) calculates their utility value:

$$
R(VS_i) = \frac{\sum_{VS_n \in G(VS_i)} r(VS_n, U_f)}{\left| G(VS_i) \right|}
$$


where G(VSi) returns the cluster of VSi and r(VSn, Uf) returns the rating value set by Uf for segment VSn. The rating value of segment VSi for user Uf is thus calculated as the average rating value of similar segments rated by user Uf. The segments rated by the content filter are merged with those rated by the collaborative filter, and the result is also used to update the usage library. The user can now choose to view either selected segments or all those segments which have a utility value. Ranking values are stored in the XML Ranking History of each user. The combined result of the collaborative and content filters is a relative ordering of the video segments by their ranking values, as recorded in the XML Ranking History.

Fig. 5 gives the pseudo-code of the clustering and ranking add-on for a set of unranked video segments, VSet, for user Uk. The method applyfilter(…) returns VSet, which consists of a set of unranked video segments. The module first clusters all users and extracts user Uk's cluster G(Uk). It then rates each video segment while marking out those that cannot be rated, such as those that are new. In order to rate the new segments, the module clusters the video segments and then extracts the cluster of each non-rated segment.
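The content-filter pass, and its merge with the collaborative pass, can be sketched in the same style. This is an illustrative Python sketch, not the Fig. 5 pseudo-code itself; segment clusters and the user's rating history are again plain dictionaries standing in for the content history and the XML Ranking History:

```python
def content_rating(vs, user, seg_clusters, user_ratings):
    """Average of `user`'s ratings over segments in the same content cluster as `vs`.

    seg_clusters: segment -> cluster id.
    user_ratings: (segment, user) -> rating value; an absent key means nil.
    """
    group = [s for s, c in seg_clusters.items() if c == seg_clusters[vs]]  # G(VSi)
    values = [user_ratings[(s, user)] for s in group if (s, user) in user_ratings]
    return sum(values) / len(values) if values else None

def rank(vset, user, collab, seg_clusters, user_ratings):
    """Rate every segment: collaborative filter first, content filter for the rest."""
    rated = {}
    for vs in vset:
        r = collab(vs)  # collaborative rating, or None if nobody in the cluster rated vs
        if r is None:
            r = content_rating(vs, user, seg_clusters, user_ratings)
        rated[vs] = r
    # relative order of rateable segments by ranking value, highest first
    return sorted((vs for vs in rated if rated[vs] is not None),
                  key=lambda vs: rated[vs], reverse=True)

seg_clusters = {"VS1": 0, "VS2": 0, "VS3": 1}
user_ratings = {("VS1", "Uf"): 5}
# no collaborative ratings exist, so the content filter must rate what it can
order = rank(["VS2", "VS3"], "Uf", lambda vs: None, seg_clusters, user_ratings)
```

The new segment VS2 inherits a rating from its cluster-mate VS1, while VS3 remains nil because Uf has rated nothing in its cluster, mirroring the behaviour described above.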


4. Evaluating the add-on

The performance of COSMOS-7, both as a content modelling and a filtering tool, has been evaluated extensively and the results have been reported in [2]. This section evaluates the performance of the add-on prototype to COSMOS-7 as a clustering and ranking tool: the former through ground truth classes and the latter by calculating average user satisfaction with a group of users. Ground truth classes cannot be used for evaluating the ranking performance of the add-on because, unlike the user and content clusters, which remain fairly stable over the entire course of interaction, rankings will vary as new users and video segments enter the system, the filtering criteria change as a result, and the usage history evolves accordingly. For example, during one round of ranking, a new video segment Vi may be ranked top with reference to a user filter fk, but may slide to the bottom during another round as a result of newer segments entering the system. We first evaluate the add-on as a ranking tool with our regular group of evaluators before we consider its clustering performance. End-user satisfaction is a subjective rating captured at the time of request.

We have tested the ranking performance of the add-on on a Windows 2003 server with a 2.2 GHz Pentium, 1024 MB of RAM and a 160 GB hard disk. We selected 40 non-specialist users from the staff of the theatre company that participated in the evaluation of COSMOS-7 reported in [2]. For the user evaluation of COSMOS-7's content modelling and filtering reported in [2], the users undertook our short course in content modelling with a number of videos from their repository of theatre productions. For the evaluation reported in this section, we have chosen 40 videos whose COSMOS-7 content models we have helped the users populate to completion. The COSMOS-7 content models describe, in XML, video content semantics such as genre, duration, language, target age, actors, and video segments with their start and end frames. The initial user models were populated from a user survey which included personal questions, such as age, sex and location, and questions on preferences, such as genre and modality.

Eighteen user models and eighteen COSMOS-7 models were used for training the two SONNs. The number of output nodes on the 2D grid was set to six, both for the user and the content clusters. Fig. 6 describes the training process of the user cluster SONN; the content cluster SONN is trained in a similar way. The weight vector of each SONN neuron is randomly initialised with values between 0 and 1.

Fig. 5. Pseudocode for ranking VSet for user Uk.

Fig. 6. Training of the user cluster SONN.

Fig. 7. Difference in output between COSMOS-7 with and without the add-on ranking.

Fig. 8. Difference in average user satisfaction over time between COSMOS-7 with and without ranking.

Fig. 7 shows the difference in output and Fig. 8 shows the difference in average user satisfaction between COSMOS-7 with and without the add-on ranking module. As the number of users and content grows and usage history evolves, user and content clusters and content ranking values are refined which in turn results in increasing user satisfaction.

Fig. 9. Average user satisfaction between COSMOS-7 with and without add-on ranking.

Fig. 10. Average standard deviation over time.

Fig. 9 shows the difference in average user satisfaction over 40 segments between COSMOS-7 with and without the add-on ranking module. Whilst user satisfaction is higher with the add-on, rating values decrease in proportion to segment relevance. Fig. 10 shows the average standard deviation across user evaluations over time. The following metric calculates the difference between predicted and actual user ratings as a mean absolute error:

$$
AE = \frac{\sum_{i=1}^{|VSet|} \left| R(VSet[i]) - UR(VSet[i]) \right|}{|VSet|}
$$

where R(VSet[i]) is the system rating and UR(VSet[i]) is the user rating for the ith segment. The metric's drawback is that it considers all segments regardless of the user's interests.

We have tested the clustering performance of the two SONNs in comparison to the AVL and k-means methods with reference to ground truth classes of content and user clusters, sourced from what our research regards as the ideal MPEG-7 content and user models [1]. AVL clusters users through a process of merging pairs of classes until the specified number of clusters, k, is reached; the pair of classes to be merged is selected according to the minimum average distance between pairs of users across the two classes (i.e. the nearest-neighbour criterion). Initially, the algorithm turns each user into a single-user class. k-means clusters users through a process of finding, in each class, the user with the minimum Euclidean distance to the rest of the users in that class, i.e. the centroid, and then reassigning each user to the cluster of their nearest centroid. Initially, the algorithm clusters the users randomly into k classes.

Fig. 11 shows the difference in the results obtained by applying each of the three clustering methods. The figure depicts the accuracy of all three methods in determining a set number of clusters, k, with reference to the ground truth classes. The results show that whilst for a low number of clusters SONNs do not outperform AVL and k-means, for six clusters and over SONNs yield consistently higher clustering accuracy; SONNs allocate more video segments to the right cluster than k-means and AVL do. The results also reveal that the performance of the three methods converges as the number of clusters increases, with SONNs remaining the only method to perform significantly better. Whilst our own data set cannot cater for a higher number of clusters, further experimentation may seek to determine an optimal number of clusters.
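The "right cluster" accuracy used in this comparison can be computed as purity against the ground truth classes: each discovered cluster is credited with its best-matching ground-truth label. A sketch in Python; the labels and cluster ids below are illustrative, not our experimental data:

```python
from collections import Counter

def purity(predicted, truth):
    """Fraction of items covered by the majority ground-truth label of their cluster.

    predicted: cluster id per item (e.g. output of a SONN, k-means or AVL run).
    truth:     ground-truth class label per item, in the same order.
    """
    total = 0
    for cluster in set(predicted):
        # ground-truth labels of the items assigned to this cluster
        labels = [t for p, t in zip(predicted, truth) if p == cluster]
        total += Counter(labels).most_common(1)[0][1]  # size of the majority label
    return total / len(truth)

truth     = ["drama", "drama", "action", "action", "news"]
predicted = [0, 0, 1, 1, 1]
```

Here cluster 1 mixes a news segment in with the action segments, so purity is 4/5 = 0.8; a clustering that reproduces the ground truth exactly scores 1.0.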
The significance of collaborative and content clustering is that content retrieved from schemes such as COSMOS-7 can be ranked

Fig. 11. Comparing the SONN performance against AVL and k-means.


according to knowledge gained from usage of the same content by similar users and of similar content by the same user.

5. Concluding discussion

Querying COSMOS-7 will return information-rich content that matches closely the user's search criteria but not the user's current content information requirements. This paper proposes a ranking module as an add-on to COSMOS-7 that uses SONNs to cluster video content through consideration of knowledge gained from usage of the same content by similar users and of similar content by the same user. This enables COSMOS-7 to take into account, in addition to the search criteria, a user's current content information requirements and, therefore, to return content that matches both. The module uses one SONN for clustering users into optimal groups through consideration of user attributes and another SONN for clustering content through consideration of content features. In addition, COSMOS-7's usage history model is extended to include user and content ranking information. We are currently investigating other evolutionary approaches to content ranking, such as genetic programming, and building user profiles that consist of user data, such as the user's current location and demographics, which might provide useful insight into location or movement tracking and habitual preferences [25,26]. Future work with COSMOS-7 may investigate "intelligent content selection" for cases where a perfect match cannot be found and the system needs to prioritise the searching and filtering criteria and user preferences in order to select a video automatically [12].

References

[1] H. Agius, M.C. Angelides, Closing the content-user gap in MPEG-7: the hanging basket model, Multimedia Systems 13 (2) (2007) 155–172.
[2] M.C. Angelides, H. Agius, An MPEG-7 scheme for semantic content modelling and filtering of digital video, ACM Multimedia Systems 11 (4) (2006) 320–339.
[3] M.C. Angelides, A.A. Sofokleous, M. Parmar, Classified ranking of semantic content filtered output using self-organizing neural networks, in: Lecture Notes in Computer Science 4132: Proceedings of the 16th International Conference on Artificial Neural Networks (ICANN 2006) Part II, 2006, pp. 55–64.
[4] J. Assfalg, M. Bertini, C. Colombo, A.D. Bimbo, Semantic annotation of sports videos, IEEE Multimedia 9 (2) (2002) 52–60.
[5] A. Azimi, M.R. Delavar, Quality assessment in spatial clustering of data mining, in: Proceedings of the Fifth International Symposium on Spatial Data Quality (ISSDQ 2007), 2007.
[6] S. Ben-David, U. von Luxburg, D. Pal, A sober look on clustering stability, in: Proceedings of the 19th Annual Conference on Learning Theory (COLT), 2006, pp. 5–19.
[7] T.W.S. Chow, M.K.M. Rahman, S. Wu, Content-based image retrieval by using tree-structured features and multi-layer self-organizing map, Pattern Anal. Appl. 9 (1) (2006) 1–20.
[8] D. Deng, Content-based image collection summarization and comparison using self-organizing maps, Pattern Recognition 40 (2) (2007) 718–727.
[9] M. Eirinaki, M. Vazirgiannis, Web mining for web personalization, ACM Trans. Internet Technol. (TOIT) 3 (1) (2003) 1–27.
[10] A.M. Ferman, J.H. Errico, P. van Beek, M.I. Sezan, Content-based filtering and personalization using structured metadata, in: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002, p. 393.
[11] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Comput. Surveys (CSUR) 31 (3) (1999) 264–323.
[12] B. Köhncke, W. Balke, Personalized digital item adaptation in service-oriented environments, in: Proceedings of the First IEEE-CS International Workshop on Semantic Media Adaptation and Personalization (SMAP 2006), 2006, pp. 91–96.
[13] P. Koikkalainen, Progress with the tree-structured self-organizing map, in: Proceedings of the 11th European Conference on Artificial Intelligence (ECAI 94), 1994, pp. 211–215.
[14] I. Koprinska, S. Carrato, Temporal video segmentation: a survey, Signal Process: Image Commun. 16 (5) (2001) 477–500.
[15] M. Koskela, J. Laaksonen, Semantic annotation of image groups with self-organizing maps, in: Proceedings of the Fourth International Conference on Image and Video Retrieval (CIVR 2005), 2005, pp. 518–527.
[16] M. Koskela, J. Laaksonen, Semantic concept detection from news videos with self-organizing maps, in: Proceedings of the Third IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI 2006), 2006, pp. 591–599.


[17] J.M. Martínez, MPEG-7 overview of MPEG-7 description tools, part 2, IEEE Multimedia 9 (3) (2002) 83–93.
[18] J.M. Martínez, R. Koenen, F. Pereira, MPEG-7: the generic multimedia content description standard, part 1, IEEE Multimedia 9 (2) (2002) 78–87.
[19] H. Matsushita, Y. Nishio, Competing behavior of two kinds of self-organizing maps and its application to clustering, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E90-A (4) (2007) 865–871.
[20] A. Naftel, S. Khalid, Classifying spatiotemporal object trajectories using unsupervised learning in the coefficient feature space, Multimedia Systems 12 (3) (2006) 227–238.
[21] S.K. Rangarajan, V. Phoha, K.S. Balagani, R.R. Selmic, S.S. Iyengar, Adaptive neural network clustering of web users, Computer 37 (4) (2004) 34–40.
[22] A. Rauber, D. Merkl, The SOMLib digital library system, in: Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, 1999, pp. 323–342.
[23] R. Rojas, Neural Networks: A Systematic Introduction, Springer, Berlin, 1996, p. 502.
[24] M. Sjoberg, J. Laaksonen, V. Viitaniemi, Using image segments in PicSOM CBIR system, in: Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA 2003), 2003, pp. 1106–1113.
[25] A.A. Sofokleous, M.C. Angelides, C. Schizas, Mobile computing for m-commerce, in: M. Pagani (Ed.), Encyclopedia of Multimedia Technology and Networking, Idea Group Publishing, Hershey, PA, USA, 2005, pp. 622–628.
[26] A. Sofokleous, M. Angelides, C. Schizas, Mobile computing: technology challenges, constraints, and standards, in: I.K. Ibrahim (Ed.), Handbook of Research on Mobile Multimedia, Idea Group Publishing, Hershey, PA, USA, 2006, pp. 1–10.
[27] T.T. Thuong, C. Roisin, A multimedia model based on structured media and sub-elements for complex multimedia authoring and presentation, International Journal of Software Engineering and Knowledge Engineering 12 (5) (2002) 473–500.
[28] F. Trentini, M. Hagenbuchner, A. Sperduti, F. Scarselli, A.C. Tsoi, A self-organising map approach for clustering of XML documents, in: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'06), 2006, pp. 1805–1812.
[29] R. Troncy, Integrating structure and semantics into audio-visual documents, in: Proceedings of the Second International Semantic Web Conference (ISWC 2003), 2003, pp. 566–581.
[30] B.L. Tseng, C.-Y. Lin, J.R. Smith, Using MPEG-7 and MPEG-21 for personalizing video, IEEE Multimedia 11 (1) (2004) 42–52.
[31] A. Ultsch, Self organizing neural networks perform different from statistical k-means clustering, in: Proceedings of GfKl, 1995, pp. 1–13.
[32] P. van Beek, J.R. Smith, T. Ebrahimi, T. Suzuki, J. Askelof, Metadata-driven multimedia access, IEEE Signal Process. Mag. 20 (2) (2003) 40–52.
[33] J. Vesanto, J. Himberg, M. Siponen, O. Simula, Enhancing SOM based data visualization, in: Proceedings of the Fifth International Conference on Soft Computing and Information/Intelligent Systems, 1998, pp. 64–67.
[34] M. Wallace, G. Stamou, Towards a context aware mining of user interests for consumption of multimedia documents, in: Proceedings of the 2002 IEEE International Conference on Multimedia and Expo (ICME'02), 2002, pp. 733–736.
[35] L. Wenyin, Z. Chen, F. Lin, H. Zhang, W.Y. Ma, Ubiquitous media agents: a framework for managing personally accumulated multimedia files, Multimedia Systems 9 (2) (2003) 144–156.

Marios C. Angelides is Professor of Computing at Brunel University. He is a Chartered Engineer, a Chartered Fellow of the British Computer Society and a member of the ACM and the IEEE Computer Society. He holds a B.Sc. in Computing and a Ph.D. in Information Systems, both from the London School of Economics, where he began his career as a lecturer in information systems in the late 1980s. He has been researching multimedia for over 15 years and the application of MPEG standards for the last seven. He has published his work extensively in top-tier journals such as ACM Multimedia Systems, ACM Personal and Ubiquitous Computing, Multimedia Tools and Applications, IEEE Multimedia, The Computer Journal, Data and Knowledge Engineering, Decision Support Systems, and Information and Management. In the last four years, he has guest edited for IEEE Multimedia, Multimedia Tools and Applications, The Computer Journal and ACM Multimedia Systems. He currently serves on the editorial boards of several journals, including Multimedia Tools and Applications, for which he has been an editorial board member since it began publication.

Anastasis A. Sofokleous is a Ph.D. graduate in Information Systems and Computing from Brunel University. He is a member of the IEEE Computer Society and the British Computer Society. He holds a B.Sc. and an M.Sc., both in Computer Science, from the University of Cyprus. His doctoral research focussed on the application of MPEG-21 to the adaptation of video, and his early research findings have been published in refereed journals, edited books and conference proceedings.
