Solving spelling variations in song titles and artists
Consider the following variations that users could scrobble (note: the underlying song is exactly the same in every case):
Artist Name // Song Title
1) Jay Ant // Fully Focused
2) Fly Commons // Fully Focused
3) Fly Commons x Christoph Andersson // Fully Focused (feat. Jay Ant & G-Eazy)
4) Jay Ant // Fully Focused ft. G-Eazy (prod. Christoph Andersson & Fly Commons)
5) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
6) Jay Ant x G-Eazy // Fully Focused
7) G-Eazy // Fully Focused (feat. Jay Ant)
Assume that the following variations have been scrobbled with the artist and song in opposite fields:
8) Fully Focused (feat. Jay Ant) // Fly Commons
9) Fully Focused (feat. G-Eazy) // Jay Ant
10) Fully Focused // Fly Commons x Jay Ant
At the time of scrobbling, each scrobble would first be checked against 'artists'; if the artist doesn't exist, a new artist is created with a unique ID. It would then be checked against 'songs': if a song with the same title and a matching artist ID already exists, its scrobble count is increased by 1; otherwise a new song is created with a unique ID and the corresponding artist ID. After a new song has been created, a new camp (a group of songs treated as spellings of the same song) also needs to be created, with this song as its only member.
The first time a given user scrobbles a song, that song is given 1 point.
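The scrobble-time checks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Library` and `Song` classes, the in-memory dictionaries standing in for 'artists' and 'songs', the single shared ID counter, and the choice to count the creating scrobble as scrobble number 1 are all hypothetical, not part of the proposal.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Song:
    song_id: int
    artist_id: int
    title: str
    scrobbles: int = 0
    points: int = 0
    scrobbled_by: set = field(default_factory=set)  # user IDs seen so far


class Library:
    """Hypothetical in-memory stand-in for the 'artists', 'songs' and camp tables."""

    def __init__(self):
        self._ids = count(1)  # assumption: one counter hands out every unique ID
        self.artists = {}     # artist name -> artist ID
        self.songs = {}       # (artist ID, title) -> Song
        self.camp_of = {}     # song ID -> camp ID
        self.camps = {}       # camp ID -> set of member song IDs

    def scrobble(self, user_id, artist_name, title):
        # 1. Look up the artist, creating it on first sight.
        artist_id = self.artists.get(artist_name)
        if artist_id is None:
            artist_id = next(self._ids)
            self.artists[artist_name] = artist_id

        # 2. Look up the song by (artist ID, title), creating it if needed.
        key = (artist_id, title)
        song = self.songs.get(key)
        if song is None:
            song = Song(next(self._ids), artist_id, title)
            self.songs[key] = song
            # 3. A brand-new song starts its own single-member camp.
            camp_id = next(self._ids)
            self.camps[camp_id] = {song.song_id}
            self.camp_of[song.song_id] = camp_id

        # 4. Every scrobble increments the count (assumption: including the first).
        song.scrobbles += 1
        # A point is awarded only the first time this user scrobbles the song.
        if user_id not in song.scrobbled_by:
            song.scrobbled_by.add(user_id)
            song.points += 1
        return song
```

Note that the lookup is an exact string match, so "Jay Ant // Fully Focused" and "Fly Commons // Fully Focused" become two songs in two camps, which is exactly the situation the autocorrect suggestions are meant to resolve.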
Initially, each song would have a default autocorrect spelling identical to what was scrobbled. Each song would have an autocorrect field containing one or more song IDs, which may be its own ID or those of other songs. Each user would be able to assign an autocorrect spelling suggestion* for both the title and the artist of a song they have scrobbled.
The following example assumes that each of the ten users above has manually declared a desired autocorrect spelling for the song they scrobbled. Below is a list of what each song will now be corrected to:
Artist Name // Song Title
1) Jay Ant // Fully Focused (feat. G-Eazy)
2) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
3) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
4) Jay Ant // Fully Focused (feat. G-Eazy)
5) Jay Ant // Fully Focused ft. G-Eazy (prod. Fly Commons)
6) Jay Ant // Fully Focused (feat. G-Eazy)
7) G-Eazy // Fully Focused (feat. Jay Ant) [this user opted not to make an autocorrect suggestion]
Corrections for the variations that were scrobbled with the artist and song in opposite fields:
8) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
9) Jay Ant // Fully Focused (feat. G-Eazy)
10) Fly Commons // Fully Focused (feat. Jay Ant)
A diagram showing how relationships are formed by submitting new autocorrect suggestions
In the end, four camps are generated for the spellings of this song. All songs that are members of the same camp will be considered the same song, with each camp representing one unique song.
Camp 1
- Jay Ant // Fully Focused
- Jay Ant // Fully Focused (feat. G-Eazy)
- Fully Focused (feat. G-Eazy) // Jay Ant
- Jay Ant // Fully Focused ft. G-Eazy (prod. Christoph Andersson & Fly Commons)
Camp 2
- Fly Commons x Christoph Andersson // Fully Focused (feat. Jay Ant & G-Eazy)
- Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
- Jay Ant // Fully Focused ft. G-Eazy (prod. Fly Commons)
- Jay Ant x G-Eazy // Fully Focused
- Fully Focused (feat. Jay Ant) // Fly Commons
Camp 3 (Remains unchanged)
- G-Eazy // Fully Focused (feat. Jay Ant)
Camp 4
- Fully Focused // Fly Commons x Jay Ant
- Fly Commons // Fully Focused (feat. Jay Ant)
The next step is choosing a camp leader, which would be done by adding together each song's points and votes. The camp leader is the song with the highest combined total, while all other members are considered variations.
Points would be determined by the number of users who have scrobbled the specific song: 1 point is given to a song the first time a user scrobbles it. If a user provides an autocorrect title, 1 point is taken away from the original song and given to the corrected song.
Votes would work similarly to Reddit, with upvoting and downvoting. A point and a vote are weighted the same, and each user is allowed one vote per song regardless of whether they have scrobbled it before.
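As a rough sketch of the leader selection described above, the snippet below adds each song's points to its net votes and picks the highest total. The `Ballot` class and `camp_leader` function are hypothetical names, and two details are assumptions the text does not settle: that a song's vote total is upvotes minus downvotes, and that a tie goes to the earliest-listed member.

```python
from collections import defaultdict


class Ballot:
    """One vote per user per song; voting again replaces the earlier vote."""

    def __init__(self):
        self._votes = defaultdict(dict)  # song -> {user ID: +1 or -1}

    def cast(self, user_id, song, value):
        if value not in (1, -1):
            raise ValueError("a vote is +1 (upvote) or -1 (downvote)")
        self._votes[song][user_id] = value

    def net(self, song):
        # Assumption: net votes = upvotes minus downvotes.
        return sum(self._votes.get(song, {}).values())


def camp_leader(camp, points, ballot):
    """Return the member of `camp` with the highest points + net votes."""
    # Points and votes carry the same weight, so they are simply added.
    # Ties go to the earliest-listed member; the text defines no tie-break.
    return max(camp, key=lambda song: points.get(song, 0) + ballot.net(song))


# Illustrative numbers only, not taken from the example in the text.
camp = ["Jay Ant // Fully Focused", "Jay Ant // Fully Focused (feat. G-Eazy)"]
points = {camp[0]: 2, camp[1]: 2}
ballot = Ballot()
ballot.cast("u1", camp[1], +1)   # upvote for the "feat." spelling
ballot.cast("u2", camp[0], -1)   # downvote for the bare title
leader = camp_leader(camp, points, ballot)
variations = [song for song in camp if song != leader]
```

With these made-up numbers the "feat. G-Eazy" spelling totals 3 against 1 and becomes the leader, leaving the bare title as its only variation.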
The page of the camp leader would show a list of variations, while the page of each variation would show a link to the camp leader's page. Biographical info and comments on pages of all spelling variations should be preserved.
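The linking rule for pages could be expressed as a small helper like the one below. The function name `page_view` and the dictionary it returns are hypothetical; biographical info and comments are not modelled here, since the text only says they should be preserved on every page.

```python
def page_view(song, camp, leader):
    """Describe what a song's page links to, given its camp and the camp leader."""
    if song == leader:
        # The leader's page lists every other member of the camp as a variation.
        return {"song": song, "variations": [m for m in camp if m != leader]}
    # A variation's page only links back to the camp leader.
    return {"song": song, "camp_leader": leader}
```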
A diagram showing an overview of the camps and how leaders/variations would be displayed
Each time a user saves an autocorrect suggestion, the original scrobble and the correction will be compared to the existing camps. There are three possible outcomes for each song:
- The original scrobble matches a member of an existing camp, but the corrected scrobble does not.
- The original and corrected scrobbles match a member in the same camp.
- The original and corrected scrobbles match a member in two different camps.
Note: Whenever a new song is scrobbled, a camp is created for it, so the original scrobble always matches a member of some camp.
Scenario 1: A new song is created for the corrected scrobble with 1 point and added to the camp of the original scrobble, and the original scrobble loses 1 point.
Scenario 2: The corrected scrobble gains 1 point and the original scrobble loses 1 point.
Scenario 3: The corrected scrobble gains 1 point, the original scrobble loses 1 point, and all members of the original scrobble's camp are moved into the corrected scrobble's camp, merging the two camps.
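The three scenarios can be sketched as a single function. This is a minimal illustration, not a definitive implementation: songs are represented as plain "Artist // Title" strings, the founding song's string doubles as the camp ID, and the names `new_song` and `apply_correction` are hypothetical.

```python
def new_song(camp_of, camps, points, song):
    """Register a first-time scrobble: 1 point and a single-member camp."""
    camp_of[song] = song   # assumption: the founding song's key is the camp ID
    camps[song] = {song}
    points[song] = 1


def apply_correction(camp_of, camps, points, original, corrected):
    """Apply one saved autocorrect suggestion; return the scenario number (1-3)."""
    src = camp_of[original]        # always present: every scrobbled song has a camp
    dst = camp_of.get(corrected)

    if dst is None:
        # Scenario 1: the corrected spelling is new, so it joins the original's camp.
        points[corrected] = 0
        camp_of[corrected] = src
        camps[src].add(corrected)
        scenario = 1
    elif dst == src:
        # Scenario 2: both spellings already share a camp; only points move.
        scenario = 2
    else:
        # Scenario 3: move every member of the original's camp into the
        # corrected spelling's camp, which merges the two camps.
        for member in camps.pop(src):
            camp_of[member] = dst
            camps[dst].add(member)
        scenario = 3

    # In every scenario the point moves from the original to the correction.
    points[corrected] += 1
    points[original] -= 1
    return scenario


camp_of, camps, points = {}, {}, {}
plain = "Jay Ant // Fully Focused"
prod = "Jay Ant // Fully Focused ft. G-Eazy (prod. Christoph Andersson & Fly Commons)"
swapped = "Fully Focused (feat. G-Eazy) // Jay Ant"
target = "Jay Ant // Fully Focused (feat. G-Eazy)"
for song in (plain, prod, swapped):
    new_song(camp_of, camps, points, song)

outcomes = [apply_correction(camp_of, camps, points, s, target)
            for s in (plain, prod, swapped)]
```

Running the three corrections in this order gives scenario 1 followed by two merges (scenario 3), and leaves a single camp containing the same four spellings listed under Camp 1, with all three points sitting on the corrected spelling.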