Official Project Update #19
Jun. 10th, 2025 12:59 pm
Tagging has slowed somewhat, as people have gotten busy with school and work. There are still over 5000 tabs to tag. However, experienced taggers have tagged nearly 50 apiece. If we had 100 taggers tackle the project right now and stick with it the way some of our volunteers have, it would be finished very shortly!
So this is once more a call to those of you who are interested in seeing this complete: If you want to help, consider either tagging a tab, advertising/recruiting others to tag (see the boost text from this post if you need it), or both!
I spent several years working on the metadata at the expense of working on projects that would help support me financially, so I've had to step back from actively working on it for now. However, if all the tabs get tagged, I will absolutely be there to sort the actual data. I'd love to see it organized and uploaded!
Actual stats:
Now up to 4.02% tagged.
Available tabs by language (sorted by count, highest first):
English: 3208
Unknown*: 544
Spanish: 387
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 84
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam**: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2
Single tabs available:
African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh
All compilation tabs are still available as well.
* As stated in a previous post, the "Unknown" groups are by and large not unknown as far as language goes, but they can't be tagged without looking at the actual messages. Most are clearly in English.
** As stated in a previous post, the spam groups will also need a quick look to confirm that they are spam. Many follow clear patterns, though, so once one or two groups in a pattern have been checked, most of the rest won't need to be. I have no idea what language, if any, most of them are in. My suspicion is that they were created just for email address harvesting, but we may never know for sure.
Something fun:
I've been using Random.org to generate ID numbers for groups in the metadata database, to give those in the Discord server a randomized peek at the variety of groups we knew about (and mostly saved). Here's one of the fan groups that came up in that process:
BetteMidlerADivineGathering
created 2004-10-23
/Entertainment & Arts/Celebrities/
Looking for a place to stop and talk about the Divine Miss M? Here's your place! Feel free to join, post messages, pictures, files, or even chat! EVERYONE is welcome!
Just remember...keep this place clean and free of hate and negativity. Bashing of Bette Midler or members of this group will NOT be tolerated. Of course everyone is entitled to their own opinions, but if you have something negative to say to another group member, please keep it away from the group...email them personally.
And remember to HAVE FUN!!!
Thanks,
Dusty
61 members