Sticky: How can I help?

Nov. 26th, 2022 06:42 pm
doranwen: picture of a book with the word logophile (logophile)
[personal profile] doranwen

Who can help?



To finish getting all of the groups tagged and sorted will take a lot of volunteers. We can find a role for anyone who is willing to devote time to this. We can especially use people with at least one of the following skills or traits:

- detail-oriented
- know a language other than English
- have an area of expertise (such as a fandom or a genre of music)
- good at performing difficult searches of the Internet to track down information
- comfortable with installing a program and learning to use it
- good at recruiting others


Later on we may be able to use someone who is skilled with scripts.


Can I help out right now?



Absolutely! Tagging is in full swing and we need all the help we can get. Just head over to our Discord server and follow the directions to get started. (If you don't do Discord, you're welcome to PM me or join the #yahoogroups channel on the Hackint IRC server. I don't check IRC often, though, so if you do that, leave a comment here to let me know to look for you there.)

For more information on what tagging involves, or to find text you can share to boost this, see the following post: https://yahoogroups.dreamwidth.org/6497.html
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen

When will this project be done?


When the work is done. Can you help get it there? No one is paying anyone to do this, so it will take all of our efforts combined.

How can I help?


Several ways!

1) Offer to tag a tab. Any language we have works, though if you speak one other than English, that's first priority. The smaller languages often have smaller tabs, too, so you could be done in a very short time!

2) Tag Unknown groups. There are several on our Discord server that are waiting for a volunteer to read the messages (or a summary of them) and decide on the best tag. This is a very low time commitment (you can do just one or two groups as you have time!), but the efforts will add up. It's a great way to get started tagging if you find an entire tab intimidating.

3) Tag an Unknown tab. It would save me a lot of time if someone were willing to take a whole tab of Unknown groups and open the data to see what it is. We have a visual tutorial for Sylpheed, a simple freeware email client that will natively import the mbox files that are Yahoo's standard message format. It's not hard to use! You just need an actual computer (a phone or tablet won't work for this). And you could make use of the thread feature in our Unknown groups channel to enlist help if you're stuck on one.

4) Volunteer to be pinged to use Google Translate on messages for Unknown groups. Some groups are in languages we don't have a volunteer for, but which Google Translate will handle correctly. All you have to do here is copy/paste them into Google Translate and copy/paste the translated messages back into the thread, so that tagging volunteers can read them in English. Easy! It will save us so much time if we have someone to ping for this.

5) Help with languages we don't have a volunteer for. For instance, we currently have Romanian and Farsi groups waiting for someone to read them and summarize the content. (Indonesian and Arabic are likely to turn up and we currently have no volunteers for those two either.) Google Translate adds an additional step that slows things down. If you can read them natively and help, offer to be pinged whenever there are groups in your language.

6) Volunteer to reach out to find speakers of the more obscure languages. A number of groups were in languages that Google Translate does not have in its data banks; we cannot read these groups at all. In some cases, we're not even sure of the language identification. Someone has to find speakers of each of those languages to ask them to read the messages and summarize them so we can tag them.
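For anyone comfortable with a little scripting, the "open the data to see what it is" step from item 3 can also be sketched in Python. This is only an illustration, not the project's workflow (that is the Sylpheed tutorial). It assumes you already have an mbox file for a group on your computer; the function name and the example file name are made up for this sketch.

```python
import mailbox
from email.header import decode_header, make_header


def peek_subjects(mbox_path, limit=10):
    """Return the subject lines of the first `limit` messages in an mbox file.

    Hypothetical helper for illustration only: a quick look at what a
    group talked about, without importing it into an email client.
    """
    subjects = []
    # create=False: fail loudly instead of creating an empty file on a typo
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            if len(subjects) >= limit:
                break
            raw = message.get("Subject", "(no subject)")
            # Subjects may be RFC 2047 encoded (common for non-English groups)
            subjects.append(str(make_header(decode_header(raw))))
    finally:
        box.close()
    return subjects


if __name__ == "__main__":
    # "example_group.mbox" is a placeholder file name, not a real archive file
    for subject in peek_subjects("example_group.mbox", limit=5):
        print(subject)
```

A handful of subject lines is often enough to guess a group's topic and language. If it isn't, the message bodies are the next thing to look at, which is where an email client is more comfortable.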

If you're interested in getting this project complete and you haven't volunteered to help, please consider changing that, and finding one of the ways above that you can contribute to the project. :)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
While we can use any language skill (there are tabs in over a hundred different languages!), we have a particular need for help with Farsi. Google Translate expects it to be written in its version of the Arabic script, but much of the Farsi on Yahoo Groups was written in the Latin alphabet, which Google Translate won't process for that language. (My last attempt was guessed as "Bengali" despite being clearly Farsi, and was naturally rendered as almost complete gibberish.) This means we absolutely need a native speaker to read the text and tell us what's being said.

Are you able to do this? Do you know someone who can? Please pass this request on and let them know to check out our Discord server to volunteer their help.

One of the Unknown groups currently being worked on is in Farsi and we're stuck without help from someone who can read it.

EDIT: I have been able to make use of this converter but it's an extra step that slows everything down. Native speakers are more accurate, plus they recognize and understand context details that can make all the difference with tagging.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I've now set up a leaderboard on the Discord server that shows the number of groups each volunteer has tagged (only complete tabs count). If anyone is competitive, the number to beat for top spot is 11,707 groups! (That's 1.05% of all the groups.)
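For the curious, those two figures together imply a rough size for the whole collection. Here is a small sketch of that arithmetic, assuming the 1.05% was rounded to two decimal places (that rounding is my assumption; the function name is made up for illustration):

```python
def estimate_total(count, percent, decimals=2):
    """Estimate a total from `count` being `percent` percent of it.

    Returns (low, mid, high). The spread comes only from assuming that
    `percent` was rounded to `decimals` decimal places.
    """
    half_step = 0.5 * 10 ** -decimals
    if percent - half_step <= 0:
        raise ValueError("percent is too small to estimate a total from")
    low = count / ((percent + half_step) / 100)
    mid = count / (percent / 100)
    high = count / ((percent - half_step) / 100)
    return low, mid, high


low, mid, high = estimate_total(11707, 1.05)
print(f"roughly {low:,.0f} to {high:,.0f} groups (midpoint {mid:,.0f})")
```

So the metadata covers somewhere around 1.1 million groups, give or take the rounding.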

I've also set up the Unknown group tagging on the server for any volunteers who are interested in helping a little but don't feel up to tagging an entire tab. Each Unknown group is given a thread in the channel for Unknown group tagging, in which I copy and paste a few of the messages we have, or even a screenshot of the files within the files.zip, if that will be helpful. Anyone who can is invited to read, give additional information or insight into the topic, and/or attempt to tag the group, either on the spreadsheet directly (they can DM me for the link) or by simply listing in the thread which tags they would use. Contributions will be noted on a leaderboard just for Unknown tagging.

If you haven't tagged any groups yet, could you set aside just 5 minutes a day to help us with the Unknown groups?*

* The small-scale help described above is limited to Discord users; if you don't use Discord, you'll need to claim an entire tab and work with the data yourself.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Tagging has slowed somewhat, as people have gotten busy with school and work. There are still over 5000 tabs to tag. However, experienced taggers have tagged nearly 50 apiece. If we had 100 taggers tackle the project right now and stick with it the way some of our volunteers have, it would be finished very shortly!
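The arithmetic behind that claim is simple enough to sketch. This assumes an even split of the work and uses 5000 as the tab count, which is a lower bound ("over 5000"); the function name is made up for illustration.

```python
import math


def tabs_per_tagger(total_tabs, taggers):
    """Tabs each tagger would need to finish if the work were split evenly."""
    if taggers <= 0:
        raise ValueError("need at least one tagger")
    # Round up: a leftover partial share is still a whole tab for someone
    return math.ceil(total_tabs / taggers)


remaining_tabs = 5000  # "over 5000 tabs", so treat this as a lower bound
for taggers in (10, 50, 100, 200):
    print(f"{taggers:>3} taggers -> {tabs_per_tagger(remaining_tabs, taggers)} tabs each")
```

With 100 taggers the share comes out to about 50 tabs each, which is what our experienced taggers have already managed.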

So this is once more a call to those of you who are interested in seeing this complete: If you want to help, consider either tagging a tab, advertising/recruiting others to tag (see the boost text from this post if you need it), or both!

I spent several years working on the metadata at the expense of working on projects that would help support me financially, so I've had to step back from actively working on it for now. However, if all the tabs get tagged, I will absolutely be there to sort the actual data. I'd love to see it organized and uploaded!


Actual stats:

Now up to 4.02% tagged.

Available tabs (by language, sorted by descending number of tabs):

English: 3208
Unknown*: 544
Spanish: 387
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 84
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam**: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2

Single tabs available:
African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh

All compilation tabs are still available as well.

* As stated in a previous post, the "Unknown" groups are by and large not unknown as far as language goes, but they can't be tagged without looking at the actual messages. Most are clearly in English.

** As stated in a previous post, the spam groups will also need to be looked at a little to confirm spam status, but many are in clear patterns and most of the groups won't need to be looked at once one or two are. I have no idea what language, if any, most of those are in. My suspicion is that they were created just for email address harvesting, but we may never know for sure.


Something fun:

I've been using Random.org to generate id numbers for groups in the metadata database to give those in the Discord server a randomized peek at the variety of groups we knew about (and mostly saved). Here's one of the fan groups that came up in that process:

BetteMidlerADivineGathering
created 2004-10-23
/Entertainment & Arts/Celebrities/
Looking for a place to stop and talk about the Divine Miss M? Here's your place! Feel free to join, post messages, pictures, files, or even chat! EVERYONE is welcome!
Just remember...keep this place clean and free of hate and negativity. Bashing of Bette Midler or members of this group will NOT be tolerated. Of course everyone is entitled to their own opinions, but if you have something negative to say to another group member, please keep it away from the group...email them personally.
And remember to HAVE FUN!!!
Thanks,
Dusty
61 members
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
We've got some great taggers at work, but we still need far more to finish. If you're interested in seeing it get done, please consider tagging a tab if you haven't already. Joining the Discord server and following the instructions is the easiest way to do it, but you can also PM me here, message me on IRC's Hackint network (find me in #yahoosucks), or even message me on Reddit (Doranwen there as well).

If you can't tag a tab, maybe you can post the boost text from this post somewhere to get more eyes on it? Or just point people to this comm and say "hey, they could really use some help"? :)

If you know of anyone who might enjoy working on one of the smaller languages, that would be great as well. Many of the single tabs have only a handful of groups on them, making it a fairly easy and quick job for a native speaker. We're happy to help them get started!


Actual stats:

Now up to 3.73% tagged.

Available tabs (by language, sorted by descending number of tabs):

English: 3219
Unknown: 544
Spanish: 388
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 89
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2

Single tabs available:
African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Breton, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh

All compilation tabs are still available as well.

Note: The "Unknown" groups are by and large not unknown as far as language goes, but they can't be tagged without looking at the actual messages. Most are clearly in English.

The spam tabs will also need to be looked at a little to confirm spam status, but many are in clear patterns and most of the groups won't need to be looked at once one or two are. I have no idea what language, if any, most of those are in. My suspicion is that they were created just for email address harvesting, but we may never know for sure.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The tagging process is moving out of beta mode; while minor improvements may be made over time, we are hopeful that no big changes will need to be made from here on.

We therefore invite any and all willing volunteers to come help. Attention to detail, the ability to work in Google Sheets/Docs, and the willingness to ask questions when in doubt are all you need.


What does tagging involve?



Tagging is the process of reading through group metadata (names, descriptions, and a couple other useful fields) and deciding what the group is about. Then you copy/paste the correct information into specific columns. You will be assigned a 'tab' of groups to keep the process manageable. Each tab has between 100 and 300 groups on it, from one or more categories on Yahoo. You can work at your own pace and drop in when you have time.

The guidelines for the process are in a Google Doc and cover most of the situations you'll encounter. If you run across something difficult, we encourage you to paste it in the correct channel on our Discord server and ask for advice.


Can I ask to tag something specific?



Yes, but you're limited to a) categories that Yahoo actually had, and b) things that haven't been tagged yet. Not all fandoms that were popular during that time had their own categories, and except for the very large ones (such as Harry Potter, Star Trek, Dragon Ball, or Backstreet Boys), most will be on a tab with other fandoms. (When you volunteer to claim a tab, you're volunteering to tag all of the groups on that tab, not only the categories you're interested in.) Nonfandom areas also vary wildly in their size; there are enormous quantities of classmate and alumni groups, software groups, genealogy groups, romance groups, recycling groups, and adoption groups, for instance, but there may only be one or two groups for a particular automotive make or health condition.

Tagging isn't exceptionally difficult, however, and once you successfully complete one tab, you will probably find the second one even easier. Plus, you never know what gems you may run across while you do it!


What if I'm not comfortable tagging groups? Is there any other way to help?



Yes, there are many other ways to help.

1. Importing
We need people who are willing to install an email program called Sylpheed on their computers to import Yahoo Groups into a format that will help us tag. Not everyone can install the program, so if you can, importing for others is a real help. That way we have a steady flow of groups for people to tag.

2. Languages
If you can read a language other than English, you can help on the server by opting into a role for your language. Then if someone is trying to tag groups in that language, they can call on you for help with anything confusing.

3. Boost!
We also need more visibility on this project, which brings us to…


Can I advertise this somewhere I know?



Yes, please! There are about a million groups left to tag, on over 5000 tabs. The more volunteers we have, the fewer tabs each person has to tag.

If it's useful, feel free to share this text in quotes (change the first word to whatever number is accurate, since time marches on - the project started in 2019):

Five years ago, with little notice, Yahoo announced Yahoo Groups was being deleted. An army of archivists swung into action and saved nearly a million groups in all - 14 terabytes of data.

The next step of the Save Yahoo Groups project is tagging the groups.

We need volunteers who:

* are detail-oriented and careful

* ask lots of questions when in doubt

* can use Google Sheets/Docs at a basic level

Also helpful:

* able to read languages other than English

* able to install a simple program and follow a visual guide to importing mbox files

* extensive knowledge about a particular subject

If this interests you, check out our Dreamwidth community and volunteer to help: https://yahoogroups.dreamwidth.org/profile
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The metadata sorting is finally done! You have no idea how happy I am to have that part off my plate.

So what's next? Well, for the next week or so you won't see anything here, as I encourage beta taggers to finish up outstanding tabs (and deal with a few tasks of my own that I've been putting off). There are also several main categories that haven't had a thorough testing by beta taggers (Recreation & Sports, Hobbies & Crafts, Health & Wellness, and Schools & Education), so if you've been waiting to help, here's your invitation to pick one of those and try tagging a tab of it.

Once the main categories have been tested, then I'll post a message that you can link to, so you can invite people you know who might be interested in doing this. The more volunteers who help, the sooner this will be done and can be uploaded to the Internet Archive.

The three key things we need in volunteers (besides the time to complete the task) are:
  • carefulness / attention to detail

  • a willingness to ask questions when in any doubt, and

  • the ability to use Google Sheets / Docs at a basic level.


Specialty knowledge in an area (including being able to read another language) is a bonus but not necessary. Discord is the easiest way to connect with us but Dreamwidth PMs, Google Chat, or even IRC can work instead if need be.

Actual stats:

Now up to 2.89% tagged.

Available tabs (by language, sorted by descending number of tabs):

English: 3264
Unknown: 544
Spanish: 389
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 89
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2


Besides the list of available tabs above, we have one tab each (often well under 100 groups - many don't even have 10!) of the following languages:

African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Breton, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh


There are also special compilation tabs with one or two groups each of a variety of lesser-used languages, one tab each for Africa, Asia, Europe, North America, Oceania, and South America. These tabs are very small, all being 20 groups or fewer.

Of special note is a tab of around 200 groups from this family of languages (Zo, Tedim, Hakha, Mara, etc.). If you know of anyone who can read any of these, please put them in contact with us! Google Translate can only recognize and understand Mizo; the rest must be manually identified (a very difficult task for someone who doesn't speak or read any of them). Translating them is next to impossible. (Some don't even have dictionaries available online.)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I finished all the German categories and have been going through the Portuguese ones. Oddly enough, unlike the other categories, the Portuguese ones started out somewhat jumbled, though the order seems to have mostly stabilized: it now goes through the top-level categories one at a time (but in reverse alphabetical order, chunk by chunk). After the Portuguese, the only language directory that's left is French, which I suspect will be relatively small overall.

I did find a new language—Cape Verdean Creole—I think! Google Translate thought it was Brazilian Portuguese but it's definitely not and all research that volunteers did for me seems to turn up Cape Verdean Creole, so I'm assuming that must be it. Hopefully we can find someone who can read it in order to tag it properly.

This will be, I believe, the final update for metadata sorting until I'm finished. If my calculations are correct, the 95% mark will occur in the middle of the final category, which is the massive NULL one (groups for which we have no information on category path, category, or categoryid). Sorting this category will go remarkably fast, because there are only three possibilities: we have at least a partial description which can be used to tag (likely to be very rare), we have no info but we have the GMD so the mboxes can be looked at (also probably rare), or we have no info at all (the most likely). The first can stay on the appropriate language tab, the second and third go to Unknown. After that, I have only the final 1-2% to handle - finalizing all the smaller language tabs (including some that turned out to have far more groups than I imagined and will require splitting into multiple tabs). I can't wait to be done with this stage!
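The three-way rule for the NULL category can be written out as a tiny sketch. The field names and the "language" value are hypothetical stand-ins for illustration, not the actual columns in the metadata.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NullCategoryGroup:
    """Hypothetical record for a group with no category information."""
    name: str
    description: Optional[str]  # full or partial description, if any survived
    has_gmd: bool               # True if we have the GMD, so mboxes can be looked at
    language: str = "English"   # assumed to be identified already


def sort_null_group(group: NullCategoryGroup) -> Tuple[str, str]:
    """Return (destination tab, what a tagger can do with the group)."""
    if group.description and group.description.strip():
        # Case 1: there is something to tag from, so it stays with its language
        return group.language, "tag from description"
    if group.has_gmd:
        # Case 2: nothing to read in the metadata, but the messages exist
        return "Unknown", "look at the mboxes"
    # Case 3: no information at all
    return "Unknown", "no information available"


print(sort_null_group(NullCategoryGroup("example_a", "A partial descr", True)))
print(sort_null_group(NullCategoryGroup("example_b", None, True)))
print(sort_null_group(NullCategoryGroup("example_c", "", False)))
```

The second and third cases land on the same tab; the only difference is whether anyone will ever be able to open the messages.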


Actual stats:

Now up to 90.00% sorted and 2.89% tagged.

Available tabs:

English: 3254
Spanish: 388
Portuguese: 269
Italian: 88
German: 76
French: 8
Chinese: 59
Indonesian/Malay: 130
Arabic: 54
Persian: 16
Turkish: 63
Romanian: 35
Unknown: 281
Spam: 31


Something fun:

Yahoo Groups were used to host hundreds of groups containing downloads of custom content for The Sims. Such groups could be found in many languages, as the group "the_sims_downloads" shows:

Comunidade destinada à downloads do jogo The Sims 1,2,3 e no futuro 4.
Este grupo aqui no yahoo, é para fazer downloads dos jogos da série The Sims(TS1,TS2 e no futuro TS3).
Aqui terão vários tipos de downloads e também dicas sobre os jogos da série The Sims.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I'm continuing to sort jumbled categories—mostly English, with patches of Spanish and the occasional Chinese category. I've hit quite a few categories for fanfic for specific fandoms as well as categories for fanfic for broad media types (TV shows, movies, comics & animation, music artists, etc.).

I've also hit the German categories, which are very easy to sort, as they have little to pull off (a little English, a little other languages, a little Unknown - but very, very little). I've sorted all of the German categories that were in order and am currently going through the jumbled ones they added afterwards.


Actual stats:

Now up to 85.00% sorted and 2.89% tagged.

Available tabs:

English: 3254
Spanish: 388
Portuguese: 14
Italian: 88
German: 61
French: 8
Chinese: 59
Indonesian/Malay: 130
Arabic: 54
Persian: 16
Turkish: 63
Romanian: 35
Unknown: 279
Spam: 31


Something fun:

The group "dr202" was just one of many that offered manuals or other downloads for keyboards (at least some of which may be the only places such files can be found anymore):

Information, Mods, Patches, pictures, manuals etc. A place for people to post information regarding electronic instruments. sh-32, dw8000, dx200, an200, dr202, sk-1, theremin, arp, yamaha, korg, moog, casio, ANYTHING! :)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I'm still sorting jumbled categories, often sets of related ones, such as sports by location, or individual family groups by letter. I've had sets of dog breeds, soccer/football categories, and schools in various countries to sort, along with scattered TV shows and movies, and lots of actors and music artists/bands. There have also been quite a few role playing categories, as well as categories specifically for fanfic for various fandoms.

I found a couple new natural languages—Luganda and Ladino—as well as some constructed languages.


Actual stats:

Now up to 80.01% sorted and 2.81% tagged.

Available tabs:

English: 3112
Spanish: 349
Portuguese: 13
Italian: 88
German: 5
French: 8
Chinese: 52
Indonesian/Malay: 127
Arabic: 52
Persian: 15
Turkish: 61
Romanian: 35
Unknown: 252
Spam: 31


Something fun:

Anyone into history or costuming might find "HistoricCostuming_EdwardianWW1" interesting (it was one out of a whole series, with groups for every era imaginable):

This group is for the discussion of, sharing research about and reconstruction of Historic Costume/Fashion during the Edwardian period through World War 1 (1900-1920AD). This is a public group that any serious person is welcome to take part in. I encourage using the resources available on the Yahoo Groups site to share information.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I've continued sorting through jumbles of categories - mostly English, with the occasional Spanish or Chinese category thrown in. (Oddly enough, I can't remember seeing any Italian categories mixed in - perhaps they'd planned enough room in the categoryid numbers to fit them all in before the English ones began?) I've been seeing lots of actors and music artists, quite a few movies and TV shows, random nonfandom categories that hadn't been created originally, and role playing categories for various fandoms that were created either after the fact or at the same time as the category for the fandom.

Found more new languages—Konkani and Gilbertese!


Actual stats:

Now up to 75.00% sorted and 2.71% tagged.

Available tabs:

English: 2920
Spanish: 315
Portuguese: 12
Italian: 88
German: 5
French: 7
Chinese: 44
Indonesian/Malay: 119
Arabic: 49
Persian: 15
Turkish: 57
Romanian: 34
Unknown: 228
Spam: 30


Something fun:

For anyone who loves fonts, the group "fontmaniacs" sounded like fun:

If you're one of those crazy folks who spends hours on the web and in newsgroups searching for more and more and more wonderfully diverse ways of making letters on paper, then this club is for you. Come here to trade fonts, talk about creating them, or just talk about collecting them.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
This has still been a big jumble of English categories - everything from military groups to dog breed groups to various actors, music artists, and authors (of which they had a set by letter, like the previous sets for actors, music artists, movies, and TV shows). I also hit the Chinese categories, which are slower to sort, because I have to separate three things: Unknown groups, groups that are in both English and Chinese, and groups that are only in Chinese but use English for fandom titles. Since I can't easily tell what the categories are unless I copy/paste them into Google Translate, any Chinese speakers offering to tag (and I hope there are some eventually!) will definitely need to look up which cat_ids they want, so I can identify the correct tab. Fortunately there weren't many Chinese groups initially - mostly in fandom categories, especially computer & video games - so I got through them and back to the jumble of mostly English categories with the occasional Spanish one. Chinese categories do pop up now and then later on, but in smaller and more manageable batches.

Found a few new languages - Venetian, Xhosa, Lojban, Monda, Aranese, and Coptic!


Actual stats:

Now up to 70.00% sorted and 2.67% tagged.

Available tabs:

English: 2692
Spanish: 296
Portuguese: 11
Italian: 88
German: 5
French: 7
Chinese: 44
Indonesian/Malay: 114
Arabic: 47
Persian: 14
Turkish: 53
Romanian: 33
Unknown: 208
Spam: 30


Something fun:

Some group descriptions just make me chuckle, like that of "Cat_Trek":

These are the voyages of the starship Catstongue. Their mission: to boldly meow where no cat has meowed hitherto. Come join the crew as we explore the deepest darkest furtherest reaches of space and time in our neverending quest for the lost planet of Catalonia. You will be assigned a Trek identity, and you can contribute to the continuing adventures being recorded in the Captain's Log. OR you can just join to have a good read! The latest cast list can be found in the shared files, under the cast list folder! New members might like to look here before they choose a character for themselves.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The last batch had me sorting through Spanish categories for Business & Finance, Computers & Internet, and Arts & Entertainment. This batch had the Spanish equivalents of Music, more Arts & Entertainment, Family & Home, Games, Government & Politics, Health, Hobbies & Crafts, Sports & Recreation, Religion & Beliefs, Schools & Education, Science, and Romance & Relationships (which definitely had some explicit content). And then it went back to jumbled English categories (with the rare Spanish one), with lots of blocks of regional categories on various topics, as well as sets of categories by letter for movies and for TV shows.

I found some new languages—Aymara, Occitan, Waray, and something I couldn't identify whatsoever. If anyone knows what language this is in, let me know!
FREECOM - Kayuttuq ulsa Ulinda karg Seesco irfdy FREE COMMUNICATING WORLD!
Asklad i fardad um in!

It was in an Australian cultural category, but that doesn't necessarily mean it has anything to do with Australia, given the high percentage of groups that were miscategorized.


Actual stats:

Now up to 65.00% sorted and 2.67% tagged.

Available tabs:

English: 2491
Spanish: 286
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 107
Arabic: 44
Persian: 13
Turkish: 50
Romanian: 31
Unknown: 189
Spam: 30


Something fun:

As an LOTR fan, the group "ainulindale" sounds absolutely fascinating:

Esta lista pretende ser una herramienta de trabajo para la creación de un corpus de música tolkienista, así como creaciónn en danzas tradicionales o medievales.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Among the many varied categories this time was a whole series of categories for music artists in general, by letter. A had lots of Aaliyah and Avril Lavigne, for instance, while M had My Chemical Romance and Michael Jackson among many, many others. There was also a similar series of categories for actors, and for computer & video games, though the last set was much smaller overall.

I also finally hit the Spanish categories, as you can tell by how the number of available Spanish tabs shot up suddenly. These are far easier to sort than the English ones. I do have to be careful to spot the occasional Catalan and Basque groups mixed in (and once in a blue moon, Galician), but there's little Unknown, no spam so far, and virtually none of the languages so prevalent in the English categories (Indonesian, Arabic, Turkish, Persian, etc.); there are very few other languages at all. (I can't skim Spanish as easily as I do English, so I'm less certain whether someone will be able to tag a group based on what's there, and I have chosen to err on the side of assuming they can.) The Spanish fandom categories are almost entirely Spanish with a very rare Catalan group; I mostly see the odd other languages in nonfandom categories.

I did find a new language - Mayan!


Actual stats:

Now up to 60.01% sorted and 2.67% tagged.

Available tabs:

English: 2390
Spanish: 126
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 102
Arabic: 43
Persian: 13
Turkish: 49
Romanian: 31
Unknown: 184
Spam: 30


Something fun:

Sometimes the interesting part about a group was who created it or was part of it, like with "heroesiiimapmakers":

Greetings. This club is for Heroes of Might and Magic III, IV and V fan's that like making their own maps. Here you can get and exchange tips and tricks on how to make a great map for the game. I am a former Level Designer for New World Computing's Heroes of Might and Magic II and III series. Though I no longer do it professionally but I still love making maps as a hobby. This club is not affiliated with New World Computing, The 3DO Company or Ubisoft in any way. This is strictly group for fans.
Now that Heroes V is here let's meet here often and share our initial impressions of the new game.
[personal profile] doranwen
I'm still wading through a wild variety of categories, many highly specific. There was, for instance, a category under Star Wars/Characters that was just for Chewbacca. Most of this batch, however, consisted of subcategories under Regional, along lines such as Cultures & Community, Religion & Beliefs, Government & Politics, or Schools & Education. There were also large numbers of small categories under Business & Finance, automotive makes, music artists, and actors, as well as various fandoms that must have had their categories created later.

I also found a couple new languages—Nepali and Chuukese! And a couple new spam types in the /Government & Politics/Intelligence/ category (less interesting, lol).


Actual stats:

Now up to 55.01% sorted and 2.65% tagged.

Available tabs:

English: 2228
Spanish: 32
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 99
Arabic: 42
Persian: 13
Turkish: 48
Romanian: 30
Unknown: 173
Spam: 30


Something fun:

Anyone old enough to have had a typewriter (or whose parents did) may find the group "ibmselectrics" interesting:

Finally!! A place where people can worship, complain, and just plain mingle about the machine that hogged 75% of the typewriter market at one time, the one and only, IBM Selectric!
[personal profile] doranwen
Halfway done with sorting! I hope the second half goes quicker than the first.

These categories have been quite a wild jumble of mostly smaller categories that Yahoo created later. More Cultures & Community, more Regional, more Health & Wellness, more Music, more Entertainment & Arts, more Computers & Internet, more Science… and yes, more Schools & Education and Romance & Relationships. I've even run across a new language: Georgian!

As I sort, I'm reminded afresh of some of the reasons why this project is important. True, some groups, like the ones filling the Technical Support category (for long-obsolete devices and operating systems), are relics of another time and are largely useless to modern users. But others - such as the ones sharing tubes for Paint Shop Pro (which, as far as I can tell, can still use the old files) - may still be relevant to people today. Not to mention the sheer amount of creative work of all types that is preserved only in the messages and files of various groups.


Actual stats:

Now up to 50.01% sorted and 2.45% tagged.

Available tabs:

English: 2001
Spanish: 29
Portuguese: 16
Italian: 87
German: 3
French: 5
Chinese: 8
Indonesian/Malay: 88
Arabic: 39
Persian: 12
Turkish: 43
Romanian: 29
Unknown: 155
Spam: 26


Something fun:

Someone will surely find the group "lostcities" interesting:

This group explores the legends and reports of lost cities, lost continents, lost communities and lost peoples around the world, and highlights the real-life expeditions that have set out to find them.

Did Plato's Atlantis exist, and if so where was it? What happened to British explorer Percy Fawcett, who vanished in the Amazon while searching for a lost city? Is it time for a revival of "lost race" novelists like Canada's James DeMille ("A Strange Manuscript Found in a Copper Cylinder" - 1888) and America's William Starbuck Mayo ("Kaloolah, Or Journeyings to the Djebel Kumri" - 1849)?

The illustration at right is Maxfield Parrish's "City of Brass" (1909), a bookplate rendition of the Saharan lost city depicted in the Thousand and One Nights.

Facebook group: http://www.facebook.com/group.php?gid=247655422288
[personal profile] doranwen
This update has been a bit slow in coming, mainly due to life busyness. However, I at least finally finished sorting through Schools & Education groups, have sorted through Science groups, and then waded through Romance & Relationships (which had an inordinate amount of porn, naturally). The Romance & Relationships/Adult category, in particular, had approximately a 2:1 ratio of porn to non-porn, the latter being a mix of alternative lifestyle groups, fandom groups (which might have explicit fanfic or fanart), and random other groups that somehow ended up in there (such as a Nigerian ladies' golf association group!). Unfortunately, not all of the non-porn groups were saved, but at least a decent number were, mainly the fandom ones.

The Romance & Relationships/Romance category was possibly the most tedious of all categories so far. Not only did I have to pull off at least two to three times as many groups as got to stay on the sheet I was working from (so much Unknown, porny Unknown, and Arabic, plus a good number with encoding issues), but I also couldn't tell many of those types apart from a quick scan due to the sheer quantity of HTML tags present. I frequently had to double-click to expand the description in order to tell if it was purely Unknown, if there was Arabic script or encoding symbols, etc.

Given that, I was relieved to finally finish the Romance & Relationships categories and move on to all the categories that Yahoo created after the fact. When put in sequential order, the earlier categoryid numbers belong to categories that sit next to each other and are all related under the same main category. However, Yahoo's people in charge of Groups clearly realized after creating them that more categories were needed. From that point on, the categories come in smaller blocks: occasionally a stretch of 10 or 20 related categories, but often just one or two, completely isolated from anything else related. It's meant a lot more variety in sorting, which is delightful after the tedium of Schools & Education and Romance & Relationships. I even found a group in a new language: Tongan.

I also discovered a new type of spam group, found only in the Schools & Education/Other category so far. These groups have a keysmash-style name and summary of 5 to 14 characters (digits included) and a description that reads "Dont know anything". I find it somewhat ironic that such a description is found in a spam group type that's only in a Schools & Education category…
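For anyone who wants to screen for this spam type with a script, here is a rough sketch of how the pattern above might be checked. It is not how the sorting is actually done (that is by hand on a spreadsheet), and everything in it is an assumption for illustration: the function name, the three field names, and the idea that a "keysmash" can be approximated as 5 to 14 ASCII letters and digits with no spaces.

```python
import re

# Assumption: a "keysmash" is approximated as 5-14 ASCII letters/digits,
# no spaces or punctuation. Real groups of this type may not all fit.
KEYSMASH = re.compile(r"[A-Za-z0-9]{5,14}")

# The one fixed description reported for this spam type.
SPAM_DESCRIPTION = "Dont know anything"


def looks_like_dont_know_spam(name: str, summary: str, description: str) -> bool:
    """Hypothetical helper: True only if all three fields fit the pattern."""
    return (
        KEYSMASH.fullmatch(name.strip()) is not None
        and KEYSMASH.fullmatch(summary.strip()) is not None
        and description.strip() == SPAM_DESCRIPTION
    )
```

The name check on its own is very loose (plenty of legitimate group names are 5 to 14 letters), so the fixed description does most of the work here. Anything flagged this way would still want a human look before being moved to Spam.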


Actual stats:

Now up to 45.06% sorted and 1.75% tagged.

Available tabs:

English: 1803
Spanish: 26
Portuguese: 15
Italian: 86
German: 3
French: 4
Chinese: 8
Indonesian/Malay: 75
Arabic: 37
Persian: 11
Turkish: 38
Romanian: 27
Unknown: 144
Spam: 22


Something fun:

The group "MEDTC-DISCUSS" looked a bit out of place in the Teaching and Methods category, but would almost certainly be of interest to someone…

This is a moderated discussion group about publications and other resources on medieval and early modern clothing, dress accessories, and textiles (including tools and processes).

Selections specifically concern clothing and textiles as a subset of material culture (not furniture or pottery, for example). Time and place focus is Europe and the Mediterranean, approximately 500 to 1600 CE. Emphasis is on scholarly and academic work, as opposed to "craft" or theatrical resources. Works under discussion include monographs, journal articles, theses, archaeology reports, and other published (and unpublished) resources, in any language, as well as events such as symposia, conferences, and museum exhibits substantially devoted to clothing and textiles of this period. Posts may be in any language, though English is preferred.

Membership requires moderator permission. Spam will not be tolerated. List members are invited to send notifications of any sources that they have encountered. Active scholars are encouraged to send announcements of their own publications and presentations. Please keep posts on-topic; discussion of personal projects (other than publications), reproduction techniques, supplies, social or re-enactment events, etc. should be taken to more appropriate fora.

This is NOT a SCA or reenactment list but that of academia. References to reenactment organizations... and one's membership in them... are strongly discouraged.
[personal profile] doranwen
Well, I hit some of the real tedium. Schools & Education, which includes two of the five largest categories, is generally for class groups, whether of the "everyone taking this section of Chemistry" or the "everyone graduating from this school in this year" type. They weren't very interesting to sort and will probably be less interesting to actually tag, but they have to be gone through. The smaller categories specifically for educators will be more interesting, I suspect, but the classmates/alumni sorts are really, really not, and there are a LOT of them. That, plus real life projects and events, is why this update has been later than usual.

I didn't find any new languages, but I did find a group in Armenian which used the Armenian alphabet. That was interesting! The other two Armenian groups I'd seen didn't, and I haven't seen Armenian enough to recognize it without the alphabet, but with it, I was able to ID it quickly; the alphabet is quite distinctive.


Actual stats:

Now up to 40.04% sorted and 1.57% tagged.

Available tabs:

English: 1614
Spanish: 24
Portuguese: 13
Italian: 86
German: 3
French: 4
Chinese: 7
Indonesian/Malay: 67
Arabic: 28
Persian: 10
Turkish: 34
Romanian: 25
Unknown: 108
Spam: 20


Something fun:

While most of the classmates groups were quite boring, there was the occasional oddity, like the group called "school-wedgies":
Do you get wedgies at school? Do you give wedgies at school?
If you do this is the group for you!
[personal profile] doranwen
It's been a busy few weeks! First I sorted through hobbies & craft groups, everything from collecting autographs to soapmaking to model trains to ham radio to knitting. Then I sorted categories of groups related to various issues and causes, from human rights to the environment to community service/volunteering. Then it was sports and outdoor hobbies - cars, hiking, baseball, soccer/football, etc. After that it was a whole lot of religion & belief-related groups - atheism, Buddhism, Christianity, Islam, and much more. I still have more religion & belief categories to sort through before I'm done with them.

Being out of the cultural categories, I'm not encountering many new languages (only Yiddish this time), though the religious categories had the occasional language which I've seen fewer than ten groups for.



Actual stats:

Now up to 35.07% sorted and 1.52% tagged. (Yay for being over 1/3 done with the metadata sorting now!)

Available tabs:

English: 1409
Spanish: 20
Portuguese: 10
Italian: 86
German: 2
French: 4
Chinese: 7
Indonesian/Malay: 46
Arabic: 27
Persian: 9
Turkish: 25
Romanian: 10
Unknown: 101
Spam: 20



Something fun:

There was a group called "potatocannons", with the following description:
This is a club for all you potato projectile loving people. This club is dedicated to the furtherment of potato cannon science.
[personal profile] doranwen
I've been sorting through games, games, and more games. Particularly notable was the sheer number of groups for Sims CC and Freedom Force meshes and skins. (I was one of the ones who specifically hunted for Sims CC groups so that at least does not surprise me one bit.) And there were an enormous number of RPGs, more than I ever knew existed before. I have literally dozens of tabs of purely RPGs awaiting tagging.

After the games were a host of categories for law and lawyers, military, and politics. Then a set of categories for health-related groups—doctors, support groups for medical conditions, fitness & weight-related, pregnancy-related, etc. I've just finished those and have begun sorting groups for hobbies and collecting.


Actual stats:

Now up to 30.01% sorted and 1.40% tagged.

Available tabs:

English: 1169
Spanish: 18
Portuguese: 9
Italian: 85
German: 2
French: 3
Chinese: 7
Indonesian/Malay: 38
Arabic: 22
Persian: 8
Turkish: 22
Romanian: 9
Unknown: 93
Spam: 17


Something fun:

The group "allyourbasearebelongtous" made me chuckle, with description below:

In A.D. 2101
War was beginning.
Captain: What happen ?
Operator: Somebody set up us the bomb
Operator: We get signal
Captain: What !
Operator: Main screen turn on
Captain: It's You !!
Cats: How are you gentlemen !!
Cats: All your base are belong to us
Cats: You are on the way to destruction
Captain: What you say !!
Cats: You have no chance to survive make your time
Cats: HA HA HA HA ....
Cats: Take off every 'zig'
Captain: You know what you doing
Captain: Move 'zig'
Captain: For great justice
Put related pictures in the Photos section.


Fortunately, we were able to save this group's Photos section. (This was only true of ~5% of the groups we saved.)

(There was also a group devoted to the Zero Wing game, titled "someonesetupusthebomb".)