AFK Data

A website dedicated to providing accessible statistics and interactive visualizations.

Safari Week 2023

Exposition

As a long-time shiny Pokemon hunter and full-time Analytics Engineer, I’ve always thought it would be interesting to see aggregated statistics for shiny hunting community events. So for Safari Week 2023, I challenged myself to create a visualization that displays easily digestible information about people’s shiny hunts, such as which Pokemon they hunted and how many encounters it took to find the shiny. Since the shiny hunting community communicates mainly on Twitter, I used Tweepy, a Python wrapper for Twitter’s API, to extract everyone’s shiny hunts. To decide which tweets to extract, I pulled tweets containing the hashtags “#safariweek” or “#safariweek2023”.

Of course, this isn’t perfect: it excludes valid shinies that were posted without either hashtag. The tradeoff is probably worth it, though, because it also excludes shinies that were found during Safari Week but aren’t actual safari shinies. Another limitation is that Twitter’s API only lets applications search tweets from the last seven days. Since I started this project on June 6th, I missed some safari shinies that people had posted before May 31st. Furthermore, I only started including tweets with the hashtag “#safariweek” on June 11th (before that, I had limited my query to “#safariweek2023”), which means I may be missing tweets from before June 4th that used “#safariweek” but not “#safariweek2023”.
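To make the hashtag query and the seven-day limit concrete, here is a minimal sketch using Tweepy’s v2 `Client.search_recent_tweets` with `tweepy.Paginator`. It is not the actual script behind this page: the helper names (`build_query`, `earliest_retrievable`, `fetch_tweets`), the retweet exclusion, and the requested tweet fields are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hashtags from this post. The "#" search operator matches whole hashtags,
# so "#safariweek" alone would not match "#safariweek2023".
HASHTAGS = ("#safariweek", "#safariweek2023")

# Recent search only reaches back seven days from when the query runs.
SEARCH_WINDOW = timedelta(days=7)


def build_query(hashtags):
    """Build a query matching any of the hashtags.

    Excluding retweets (-is:retweet) is an assumption, to avoid counting a hunt twice.
    """
    return "(" + " OR ".join(hashtags) + ") -is:retweet"


def earliest_retrievable(run_time):
    """Return the oldest timestamp a search run at run_time can still reach."""
    return run_time - SEARCH_WINDOW


def fetch_tweets(bearer_token, hashtags=HASHTAGS, limit=1000):
    """Fetch matching tweets with Tweepy's v2 Client (requires `pip install tweepy`)."""
    import tweepy  # imported here so the helpers above run without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    paginator = tweepy.Paginator(
        client.search_recent_tweets,
        query=build_query(hashtags),
        tweet_fields=["created_at", "author_id"],  # illustrative field choice
        max_results=100,  # the per-page maximum for recent search
    )
    # flatten() walks through the pages and yields individual tweets.
    return list(paginator.flatten(limit=limit))


if __name__ == "__main__":
    # A query run on June 11th can no longer see tweets from before June 4th.
    print(earliest_retrievable(datetime(2023, 6, 11)))
```

The `earliest_retrievable` helper makes the cutoff above explicit: a run on June 11th reaches back only to June 4th, which is why switching hashtags partway through the event left a gap.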

Once I had extracted these tweets, I looked through them and realized that people in the shiny hunting community have VERY different ways of tweeting about their shinies! Some people include the Pokemon name and the number of encounters, but a lot of people leave out one or both of those pieces of information. So instead of writing a hard-coded text extraction program, I decided to use Davinci, OpenAI’s text model, to extract the information I wanted from these tweets. After some initial extractions, I quickly realized the model was too general to do a good job with this hyper-specific use case. So I fine-tuned the model on a dataset I had labeled manually, thereby “teaching” it the exact kind of output I wanted. You can see some examples of the model’s extractions below:
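For readers curious about what a manually labeled fine-tuning dataset can look like, here is a minimal sketch in the style of OpenAI’s legacy prompt/completion format for Davinci-era models: JSONL with one `prompt`/`completion` object per line. The separator, the stop sequence, and the helper names are illustrative assumptions, not the dataset actually used here. The labels follow the output format shown in the examples below (“Name: count” or “No Pokemon found”).

```python
import json

# Separator and stop sequence in the style of the legacy prompt/completion
# fine-tuning format; the exact strings are illustrative choices.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"


def to_training_example(tweet_text, label):
    """Turn one manually labeled tweet into a prompt/completion record."""
    return {
        # The fixed separator marks where the tweet ends and the answer begins.
        "prompt": tweet_text.strip() + PROMPT_SEPARATOR,
        # Completions start with a space and end with the stop sequence.
        "completion": " " + label + STOP_SEQUENCE,
    }


def to_jsonl(labeled_tweets):
    """Serialize (tweet_text, label) pairs as JSONL: one JSON object per line."""
    return "\n".join(
        json.dumps(to_training_example(text, label)) for text, label in labeled_tweets
    ) + "\n"


if __name__ == "__main__":
    print(to_jsonl([
        ("Yesterday we were able to find this majestic Magneton! #safariweek2023",
         "Magneton: 0"),
        ("Second shiny of #safariweek2023, managed to actually record this one fully this time",
         "No Pokemon found"),
    ]), end="")
```

At inference time, the same separator would be appended to each new tweet and the stop sequence passed as the Completions API’s `stop` parameter, so the model’s answers come back in the trained format.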

Shiny Rhyhorn in LeafGreen after 20,884 RE's! Caught after 1 ball! My 2nd shiny of safari week.#SafariWeek2023 #SafariWeek pic.twitter.com/1zg4pv58R2
— Dark Amps (@AmpsDark) June 8, 2023

Extracted information: “Rhyhorn: 20884” → Correct!

Second shiny of #safariweek2023, managed to actually record this one fully this time pic.twitter.com/daFEK4VHob
— CannedWolfMeat (@Canned_Wolfmeat) June 8, 2023

Extracted information: “No Pokemon found” → Correct! The text doesn’t mention a specific Pokemon or an encounter count

Yesterday we were able to find this majestic Magneton! #safariweek2023 pic.twitter.com/wupgnwbFTQ
— Joesby (@Joesby07) June 8, 2023

Extracted information: “Magneton: 0” → Correct! Pokemon name was found but no encounters were given

SAFARI WEEK BB! 1hr in and I caught a Shiny Lapras and Fail a Shiny Abra! What is my luck tonight!!!#safariweek2023 & @pkmnpride https://t.co/YHUTv3vU2A pic.twitter.com/TF3r98MLUA
— Sentyal | PokéTuber (@Sentyal) June 3, 2023

Extracted information: “Lapras: 0 and Abra: 0” → Correct! The model successfully extracted two Pokemon from one tweet!

#safariweek2023 hunters, time for a Safari Week check in!!

I’ll go first:

Total Encounters: 12,616
Wins:0
Fails:1
Most recent Shiny: Riolu, 6224

Targets: Riolu, exeggcute, Pinsir, chansey, anything in Emerald.#shinypokemon #shiny #pokemon
— gr3atScotty (@gr3atscotty) June 10, 2023

Extracted information: “Riolu: 6224” → Correct! The model didn’t get confused by the 12,616 total encounters and picked the correct number, 6224

Spinda is chilling in the irl safari.

Shout out to those of us who hunted for safari week and came up short. It's still great to hunt alongside the community!#spinda #safariweek2023 pic.twitter.com/oNSajELZdI
— The Daily Spinda (@TheDailySpinda) June 7, 2023

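Extracted information: “Spinda: 0” → Incorrect, but it’s understandable why the model got confused: the tweet names Spinda, but it’s about a hunt that came up short rather than a shiny that was found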

My wife found her first 8192 shiny today! Welcome Puff Daddy♀ who will be very loved. She doesn't have twitter but with #safariweek2023 wrapping up she wanted me to share with everyone. #pokemon #ShinyPokemon pic.twitter.com/maFgkHlnca
— Beanhurst (@beanhurstpkmn) June 10, 2023

Extracted information: “Puff Daddy: 8192” → Incorrect; Puff Daddy is (unfortunately) not a valid Pokemon name, and 8192 is not the number of encounters (it refers to the full 1/8192 shiny odds). Although the tweet technically shows a shiny Bibarel, the extraction should simply have been “No Pokemon found”, since the text contains neither a valid Pokemon name nor an encounter number.
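Turning the model’s text answers into structured rows is a small parsing step. Here is a minimal sketch, assuming the completions follow exactly the formats shown above (“Name: count”, several pairs joined by “ and ”, or “No Pokemon found”). The function name and the choice to store a count of 0 as `None` (since 0 means “no encounter number given”) are illustrative assumptions.

```python
NO_MATCH = "No Pokemon found"


def parse_extraction(completion):
    """Parse a completion such as "Lapras: 0 and Abra: 0" into (name, encounters) pairs.

    "No Pokemon found" yields an empty list. A count of 0 means the tweet gave no
    encounter number, so it is returned as None (an illustrative choice).
    """
    text = completion.strip()
    if text == NO_MATCH:
        return []
    pairs = []
    for part in text.split(" and "):
        # Split on the LAST colon so names like "Type: Null" stay intact.
        name, sep, count = part.rpartition(":")
        count = count.strip()
        if not sep or not name.strip() or not count.isdecimal():
            raise ValueError(f"unrecognized extraction: {part!r}")
        encounters = int(count)
        pairs.append((name.strip(), encounters if encounters > 0 else None))
    return pairs


if __name__ == "__main__":
    for answer in ["Rhyhorn: 20884", "Lapras: 0 and Abra: 0", "No Pokemon found"]:
        print(answer, "->", parse_extraction(answer))
```

Raising `ValueError` on anything unexpected, rather than guessing, makes malformed answers easy to route to manual review.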

As you can see, the model does a surprisingly (and scarily?) good job of extracting the information I need! Of course, the model isn’t perfect, so I’ve had to manually go through some of the extractions and update them to match the tweet text more accurately. However, if a tweet is formatted in a straightforward way, the model extracts the data correctly almost 100% of the time. By a straightforward format, I mean a tweet that is in English, where the Pokemon name is easy to identify and the encounter number sits close to the Pokemon name and/or a phrase indicating encounters (e.g. “encounters”, “REs”, “RAs”). If you notice any incorrect information in the chart(s) above, or if you’d like a tweet included in the chart(s), please feel free to fill out the form below.
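One way to narrow down which extractions need a manual pass is to flag suspicious ones automatically. The sketch below is hypothetical: the name list is a small sample (a real check would load every Pokemon name), and treating a count equal to the full shiny odds as suspicious is an assumed heuristic inspired by the “Puff Daddy: 8192” example above.

```python
# Hypothetical sample; a real check would load every Pokemon name.
KNOWN_POKEMON = {
    name.casefold()
    for name in ("Rhyhorn", "Magneton", "Lapras", "Abra", "Riolu", "Spinda", "Bibarel")
}

# Full-odds denominators: 1/8192 in Generations 2-5, 1/4096 from Generation 6 on.
# A count equal to one of these is more likely the odds than an encounter total.
ODDS_DENOMINATORS = {8192, 4096}


def review_reasons(name, encounters):
    """Return the reasons an extracted (name, encounters) pair needs a manual look."""
    reasons = []
    # casefold() makes the lookup case-insensitive ("riolu" matches "Riolu").
    if name.casefold() not in KNOWN_POKEMON:
        reasons.append(f"unknown Pokemon name: {name}")
    if encounters in ODDS_DENOMINATORS:
        reasons.append(f"encounter count {encounters} looks like shiny odds")
    return reasons


if __name__ == "__main__":
    for name, encounters in [("Rhyhorn", 20884), ("Puff Daddy", 8192)]:
        print(name, review_reasons(name, encounters) or "looks fine")
```

Checks like these would catch the Puff Daddy case but not the Spinda one, where the name is valid and only the context is wrong, so a human pass is still needed.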

After extracting and cleaning the data, I applied a few transformations to make it easier for Highcharts to consume. You can find the script that performs all of these extraction, cleaning, and transformation steps (and more) here.
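As an illustration of what “digestible for Highcharts” can mean, here is a minimal sketch that aggregates extracted rows into Highcharts-style series data: `{"name", "y"}` point objects for a column chart, plus pre-binned `[x, y]` pairs for an encounter histogram. The input shape (a list of `(name, encounters)` tuples with `None` for an unknown count), the 1,000-encounter bin size, and the function names are assumptions, not the transformations the actual script performs.

```python
import json
from collections import Counter


def shinies_per_pokemon(pairs):
    """Count shinies per Pokemon as Highcharts point objects, most common first."""
    counts = Counter(name for name, _ in pairs)
    # most_common() keeps first-seen order among Pokemon with equal counts.
    return [{"name": name, "y": count} for name, count in counts.most_common()]


def encounter_bins(pairs, bin_size=1000):
    """Bucket known encounter counts into [bin_start, count] pairs, sorted by bin.

    Pairs with no encounter count (None) are skipped.
    """
    bins = Counter(
        (encounters // bin_size) * bin_size
        for _, encounters in pairs
        if encounters is not None
    )
    return [[start, bins[start]] for start in sorted(bins)]


def chart_options_json(pairs):
    """Build a minimal Highcharts-style options object for a column chart, as JSON."""
    options = {
        "xAxis": {"type": "category"},  # use each point's name as its x-axis label
        "series": [
            {"type": "column", "name": "Shinies", "data": shinies_per_pokemon(pairs)},
        ],
    }
    return json.dumps(options)
```

Pre-binning in Python keeps the front end simple; Highcharts also offers a histogram series type that can bin raw values in the browser instead.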

My future goals for this project are to keep training the model so it can extract other pieces of information (such as the game the Pokemon was hunted in, or whether the hunter successfully caught the Pokemon), and to create more ways to visualize shiny hunters’ efforts. If you have any ideas or would like to see something specific, don’t hesitate to reach out (Twitter DMs usually work best)!

Request Form

Credits

Matt Brandl for organizing and coordinating Safari Week
Msikma for the shiny Pokemon icons used above
Tweepy for making a great, easy-to-use Twitter API library for Python
OpenAI for creating our AI overlords
Highcharts for their super pretty visualizations
✨ The shiny Pokemon community for being awesome (and for tweeting their shinies) ✨
