APPROACH 2


 

 Emergent approach

 

Quick Menu:

1-Collaborative Filtering

2-Co-occurrence Analysis

3-Results of Co-occurrence Analysis

4-Issues of Co-occurrence Analysis

 

This approach, also discussed in [2], is the opposite of the first one. Here no prior taxonomy is assumed; rather, the algorithms try to let a taxonomy emerge from the database by clustering songs according to similarity measures. This approach therefore relies on unsupervised learning.

Also, unlike the prescriptive approach, genre is not assumed to be defined intrinsically here. Similarity measures are based not on intrinsic attributes but on cultural similarity derived from text documents. Two techniques can be used in this approach: Collaborative Filtering (CF) and Co-occurrence Analysis.

 

Before we describe each technique and the results obtained, we should mention that some authors have computed similarity measures based on intrinsic attributes, but only to assess similarity between individual music pieces; they did not consider the genre recognition problem. Moreover, we already saw in the previous section that genre and intrinsic attributes have little correlation. Hence only similarity measures based on cultural similarity from text documents are considered in this approach.

 

 

COLLABORATIVE FILTERING:

 

Collaborative Filtering was developed to build systems that can intelligently recommend music pieces to their subscribers. As its name implies, it filters the music pieces to recommend to a given subscriber (Filtering) by making use of all the other subscribers (Collaborative). The idea is based on the fact that different people have different tastes in music. If we assign a profile to a given user, the system can look for all agents whose profiles are similar to that user's. It can then recommend to the user items that these agents like but that the user does not yet know.
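
To make the profile-matching idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the system described in [2]: the ratings, the user names, the use of cosine similarity to compare profiles, and the function names `cosine` and `recommend` are all invented for this example.

```python
from math import sqrt

# Hypothetical profiles: user -> {title: rating between 0 and 1}.
# All names and values are invented for illustration.
ratings = {
    "alice": {"Song A": 1.0, "Song B": 1.0, "Song C": 0.0},
    "bob":   {"Song A": 1.0, "Song B": 1.0, "Song D": 1.0},
    "carol": {"Song C": 1.0, "Song E": 1.0},
}

def cosine(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    shared = set(p) & set(q)
    dot = sum(p[t] * q[t] for t in shared)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def recommend(user, ratings, top_k=3):
    """Rank titles the user does not know, weighted by how similar
    the users who liked them are to this user."""
    profile = ratings[user]
    scores = {}
    for other, other_profile in ratings.items():
        if other == user:
            continue
        sim = cosine(profile, other_profile)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for title, rating in other_profile.items():
            if title not in profile and rating > 0:
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(recommend("alice", ratings))
```

In this toy data, only bob has a profile similar to alice's, so the only candidate is the one title he likes that she has not heard.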

 

However, this approach has two main limitations. The first is that the system gets stuck if the user's profile is very complex. The second, and more important, is that in the long run the system tends to be biased towards "hits", that is, music pieces that are known and liked by a large fraction of the population. This reduces the chances of survival for less well-known pieces.

 

Now you might ask: how on earth is this related to Music Genre Recognition? Well, in the end, it is not! People thought that this technique for selecting and recommending pieces to users could be linked to some kind of genre similarity. However, it has been shown that it is difficult to extract any relevant relationship between users' tastes or buying patterns and genres. That is why we now turn our attention to a technique introduced by the authors of [2], called Co-occurrence Analysis.

 

 

CO-OCCURRENCE ANALYSIS:

 

This technique is meant to extract similarity between music titles or artists, as a step towards recognizing the genre of the pieces or artists being compared. If you recall the small experiment we did in the section 'What? - Introduction', we attempted to define the classical and rock genres by comparing Bach's prelude to Elvis' song. The experiment gave strong evidence that these two pieces belong to different genres, but did not tell us which genre each piece actually belongs to. We suggested repeating the experiment over a large number of songs and establishing relationships between different pieces in order to arrive at a closer and closer definition of genre. Co-occurrence analysis is based on the same idea of finding similarities between music pieces, but it does so by analyzing the occurrence of these pieces in different music sources (radio playlists, compilation CDs...).

 

For example, suppose we start our analysis with data from a radio program. The program's playlist is not arbitrary: it is chosen so that the music pieces in it do not break the identity of the program. If you tune in to Radio Nostalgie, for instance, you will never hear a contemporary Hard Rock song; you will only hear oldies. We could then build a matrix whose rows and columns correspond to music titles, where the value of cell (i, j) is the number of times title i and title j appeared together as neighbors in a Radio Nostalgie program. We can do the same with all music sources combined, so that the value of cell (i, j) is the number of times title i and title j appeared together on the same music web page or in the same compilation.
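
The counting step can be sketched in a few lines of Python. This is only an illustration: the sources and titles below are invented, the function name `cooccurrence_matrix` is hypothetical, and the sketch counts two titles as co-occurring whenever they appear in the same source (it does not model adjacency within a radio playlist). As a further choice made for this sketch, the diagonal cell (i, i) counts the number of sources in which title i appears.

```python
from itertools import combinations

# Hypothetical music sources (playlists, compilations, web pages),
# each given as a list of titles. Invented for illustration.
sources = [
    ["Title A", "Title B", "Title C"],
    ["Title A", "Title B"],
    ["Title C", "Title D"],
]

def cooccurrence_matrix(sources):
    """Return (titles, matrix), where matrix[i][j] is the number of
    sources in which titles[i] and titles[j] appear together."""
    titles = sorted({t for source in sources for t in source})
    index = {t: i for i, t in enumerate(titles)}
    n = len(titles)
    matrix = [[0] * n for _ in range(n)]
    for source in sources:
        present = sorted(set(source))  # count each title once per source
        for t in present:
            matrix[index[t]][index[t]] += 1  # occurrences of the title itself
        for a, b in combinations(present, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    return titles, matrix

titles, matrix = cooccurrence_matrix(sources)
for title, row in zip(titles, matrix):
    print(title, row)
```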

 

It should be clear by now why this technique is called 'co-occurrence analysis'.

 

Now, to be more precise, we should not only add 1 to the cell value whenever two titles co-occur, but also compute a distance function. The distance function (or distance metric) should take into account the fact that two titles may never co-occur directly, but may do so indirectly, by each co-occurring with a third title. Moreover, to get a good similarity extraction from this process, the experiment should be restricted to a closed corpus of titles S = (T1, ..., TN) that can later be compared with human similarity judgments. In other words, the matrix should not grow without bound. If the matrix contained arbitrarily many titles, the probability of getting "bad" similarities would tend to increase (much like the well-known tip among professors that the more you write in an exam, the higher the risk of giving a wrong answer...). Not only would we get bad similarities, but we might also fail to find all the "good" ones (the more irrelevant titles are considered, the harder it is to find all the "good" similarities).

 

So, given the corpus S, an N-by-N matrix is constructed. The value of cell (i, i), that is, the co-occurrence of a title with itself, is simply the number of times title i occurred in the corpus. The matrix is clearly symmetric, so each title, whether taken as a row or as a column, can be viewed as a vector of size N. This vector is then normalized to get fair results.
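
The exact distance function is not given here, so the following Python sketch uses an assumed stand-in: each row is normalized to unit length and two titles are compared with the dot product of their normalized vectors (cosine similarity). The matrix values and the names `normalize` and `similarity` are hypothetical. The point of the example is that titles 0 and 1 never co-occur directly, yet they still get a non-zero similarity because both co-occur with title 2.

```python
from math import sqrt

# Hypothetical 3-by-3 co-occurrence matrix for a closed corpus of
# three titles. Titles 0 and 1 never co-occur directly
# (matrix[0][1] == 0), but both co-occur with title 2.
matrix = [
    [3, 0, 2],
    [0, 3, 2],
    [2, 2, 4],
]

def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)

def similarity(matrix, i, j):
    """Cosine similarity between the co-occurrence vectors of titles i and j.
    This is an assumed measure, not necessarily the one used in [2]."""
    u, v = normalize(matrix[i]), normalize(matrix[j])
    return sum(a * b for a, b in zip(u, v))

print(similarity(matrix, 0, 1))  # indirect link through title 2
print(similarity(matrix, 0, 2))  # direct co-occurrence
```

A distance can then be taken as 1 minus this similarity, so that titles with identical co-occurrence patterns are at distance 0.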

 

RESULTS OF CO-OCCURRENCE ANALYSIS:

 

As mentioned earlier, this experiment was carried out by the authors of [2]. They found that 70% of the clusters constructed with this technique show interesting similarities, meaning that specific music genres could be distinguished quite well. The complete results can be found in [6].

 

 

ISSUES OF CO-OCCURRENCE ANALYSIS:

 

-The first issue they encountered is that the clusters were not labeled, so it is difficult to characterize the nature of the extracted similarities.

 

-The second issue is that this method works only for titles appearing in the sources used.

 

The nice thing about this technique is that it is able to extract high-level similarities between titles and artists, and hence the genres deduced have a more operational and abstract definition.

 

For those who made it to this line, thank you for your patience. We hope you enjoyed reading this tutorial and learned something from it. Now you can proceed to the applet:

 

APPLET

 

 


 

