Official Project Update #2
Jan. 4th, 2023 09:31 pm
It may seem all quiet on this front but the project is indeed progressing! I've found groups in new languages (Scots Gaelic, Irish Gaelic, Hausa, Bengali, Kannada) and sorted through heaps of categories of music artists and bands. (So many Backstreet Boys and NSYNC groups!)
More recently, I've been sorting through TV shows. Anyone wanting to tag a tab of their favorite show might just be able to now. Especially large were Buffy (over 800 groups!), Star Trek (at least 500), X-Files and General Hospital (300-400 each), with dedicated tabs (between 100 and 300 groups) also for Stargate, Xena, Power Rangers, Days of Our Lives, and The Simpsons. Other shows have to share tabs but there are still decent-sized categories for many. It's wonderful to see how many of the groups we saved!
Next up are a few more general/miscellaneous TV show categories, and then I'll be heading into a few Family & Home categories (family-specific groups, genealogy, and home building, to name a few) and the food & drink categories (which will mostly be lots and lots of recipes).
A note on spam groups:
A small but definite percentage of what got saved in the rush of everything was actually a whole bunch of spam groups. These are easily identifiable because their descriptions are what you might call "keysmash" - a mishmash of characters all jumbled together. After sorting out hundreds of these, I've observed there are really two distinct types.
Groups of the more common type were all created in 2011-2012, and follow this pattern:
Name is 5-6 characters.
Descriptions are a string of 13-19 characters (most are on the shorter side).
Summaries are 5-6 characters again, a different string from the name.
Characters are a mix of letters and numbers.
Groups of the less common type were all created in 2009, and follow this pattern:
Name is 13-14 characters.
Descriptions are often quite long, made up of spam "words" of varying length separated by spaces (though a few have only two or three short "words").
Summaries are identical to the group name.
Characters are only letters, no numbers involved.
I've run across a couple of groups that look very much like the first pattern described - except the group name and summary are 4-7 characters, and the groups were created in 2010. I only know of two, though there are likely a few more that were sorted onto Spam tabs early on, before I realized there were multiple distinct patterns at all.
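For anyone who'd like to pre-flag likely spam before sorting by hand, the patterns above could be roughly sketched as a small Python check. This is only an illustration under assumptions: the `Group` record and its field names (`name`, `description`, `summary`, `year_created`) are hypothetical and not the project's actual data layout, "a mix of letters and numbers" is approximated as "ASCII letters and digits only", and the 2010 variant is assumed to keep the 13-19 character description of the first pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    # Hypothetical record; these field names are assumptions for illustration.
    name: str
    description: str
    summary: str
    year_created: int


def _alnum(text: str) -> bool:
    """True if text is non-empty and contains only ASCII letters/digits."""
    return text.isascii() and text.isalnum()


def _alpha(text: str) -> bool:
    """True if text is non-empty and contains only ASCII letters."""
    return text.isascii() and text.isalpha()


def spam_pattern(group: Group) -> Optional[str]:
    """Return a label for the spam pattern a group matches, or None."""
    name, desc, summ = group.name, group.description, group.summary

    # Checks shared by the more common pattern and its 2010 variant:
    # one unbroken 13-19 character description, summary differs from name,
    # and every field is letters/digits only.
    short_keysmash = (
        13 <= len(desc) <= 19
        and summ != name
        and all(_alnum(s) for s in (name, desc, summ))
    )

    # More common type: created 2011-2012, name and summary 5-6 characters.
    if (
        short_keysmash
        and group.year_created in (2011, 2012)
        and 5 <= len(name) <= 6
        and 5 <= len(summ) <= 6
    ):
        return "common"

    # Rare variant: created 2010, name and summary 4-7 characters.
    if (
        short_keysmash
        and group.year_created == 2010
        and 4 <= len(name) <= 7
        and 4 <= len(summ) <= 7
    ):
        return "variant-2010"

    # Less common type: created 2009, 13-14 character name, summary identical
    # to the name, description made of letter-only "words" split by spaces.
    words = desc.split()
    if (
        group.year_created == 2009
        and 13 <= len(name) <= 14
        and summ == name
        and _alpha(name)
        and len(words) > 0
        and all(_alpha(w) for w in words)
    ):
        return "less-common"

    return None
```

A check like this would only be a first pass. Anything it flags (or misses) would still want a human look, since a real group could happen to have a short alphanumeric name.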
Actual stats:
Now up to 20.00% sorted and 1.17% tagged.
Available tabs:
English: 743
Spanish: 14
Portuguese: 6
Italian: 84
German: 1
French: 2
Chinese: 2
Indonesian/Malay: 25
Arabic: 15
Persian: 5
Turkish: 14
Romanian: 3
Unknown: 62
Spam: 12
Something fun:
I was amused to see that, among the three groups in the Rick Astley category, one was titled "Rick_Roll", and its description was, primarily - you guessed it - the words to "Never Gonna Give You Up." XD