A website dedicated to providing accessible statistics and interactive visualizations.
As a long-time shiny Pokemon hunter and full-time Analytics Engineer, I’ve always thought it would be interesting to see aggregated statistics for shiny hunting community events. So for Safari Week 2023, I challenged myself to create a visualization that presents easily digestible information about people’s shiny hunts, such as which Pokemon they hunted and how many encounters it took to find the shiny. Since the shiny hunting community mainly communicates through Twitter, I used Tweepy, a Python wrapper for Twitter’s API, to extract everyone’s shiny hunts. To decide which tweets to extract, I pulled tweets containing the hashtags “#safariweek” or “#safariweek2023”. This approach is not perfect, as it excludes tweets about valid shinies that were posted without one of those hashtags. The tradeoff is probably worth it, though, because it also excludes shinies that were found during Safari Week but are not actual safari shinies. Another limitation is that the Twitter API only allows applications to extract tweets from the last seven days. Since I started this project on June 6th, I missed some safari shinies that people had found before May 31st. Furthermore, I only added the hashtag “#safariweek” to my query on June 11th (I had previously limited it to “#safariweek2023” tweets), which means I may be missing tweets from before June 4th that used “#safariweek” but not “#safariweek2023”.
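To make the hashtag filter and the seven-day window concrete, here is a minimal sketch of how such a pull might look. It assumes Tweepy's v2 `Client` and its recent-search endpoint, which may not be the interface the original script used. The function names, the bearer token parameter, and the optional retweet filter are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta


def build_hashtag_query(hashtags, exclude_retweets=True):
    """Build a search query matching tweets that carry any of the hashtags."""
    terms = " OR ".join(f"#{tag.lstrip('#')}" for tag in hashtags)
    query = f"({terms})"
    if exclude_retweets:
        # Assumption: retweets are dropped so each shiny is counted once.
        query += " -is:retweet"
    return query


def earliest_searchable_day(run_date, window_days=7):
    """Oldest day a recent-search request made on run_date can still reach."""
    return run_date - timedelta(days=window_days)


def fetch_tweets(bearer_token, query, limit=1000):
    """Page through recent-search results and return (id, text) pairs."""
    import tweepy  # imported lazily so the helpers above work without it

    client = tweepy.Client(bearer_token=bearer_token)
    pages = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at", "author_id"],
        max_results=100,
    )
    return [(tweet.id, tweet.text) for tweet in pages.flatten(limit=limit)]
```

For example, `earliest_searchable_day(date(2023, 6, 11))` returns June 4th, which is why widening the query on June 11th could not recover older “#safariweek” tweets.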
Once I had extracted these tweets, I looked through them and realized that people in the shiny hunting community have VERY different ways of tweeting about their shinies! Some people include the Pokemon name and the number of encounters, but a lot of people leave out one or both pieces of information. Therefore, instead of writing a hard-coded text extraction program, I decided to use OpenAI’s Davinci text model to extract the information I wanted from these tweets. After some initial extractions, I quickly realized the model was too general to do a good job with such a hyper-specific use case. So I fine-tuned the model on a dataset that I had manually classified, thereby “teaching” it the exact kind of output I wanted. You can see some examples of the model’s extraction efforts below:
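For readers curious what a manually classified dataset can look like, here is a rough sketch. OpenAI's legacy fine-tuning for base models such as Davinci took JSONL files of prompt/completion pairs. Everything else here is an assumption for illustration: the JSON label layout, the separator, the stop marker, and the helper names are hypothetical and not taken from the original script.

```python
import json

# Assumed conventions: a fixed separator ends every prompt and a fixed
# marker ends every completion, so the model learns where to stop.
PROMPT_SEPARATOR = "\n\n###\n\n"
COMPLETION_STOP = " END"


def to_training_record(tweet_text, pokemon, encounters):
    """Turn one hand-labeled tweet into a prompt/completion pair."""
    # Missing information is stored as None (JSON null), not guessed.
    label = {"pokemon": pokemon, "encounters": encounters}
    return {
        "prompt": tweet_text.strip() + PROMPT_SEPARATOR,
        "completion": " " + json.dumps(label) + COMPLETION_STOP,
    }


def to_jsonl(records):
    """Serialize records as JSONL, one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def parse_completion(completion):
    """Recover the label dict from a completion in the format above."""
    text = completion.strip()
    marker = COMPLETION_STOP.strip()
    if text.endswith(marker):
        text = text[: -len(marker)]
    return json.loads(text)
```

With this layout, a tweet that mentions a Pokemon but no encounter count would be labeled with `encounters` set to `None`, so the model can learn to leave a field empty when the tweet does not provide it.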
Shiny Rhyhorn in LeafGreen after 20,884 RE's! Caught after 1 ball! My 2nd shiny of safari week.#SafariWeek2023 #SafariWeek pic.twitter.com/1zg4pv58R2
— Dark Amps (@AmpsDark) June 8, 2023
Second shiny of #safariweek2023, managed to actually record this one fully this time pic.twitter.com/daFEK4VHob
— CannedWolfMeat (@Canned_Wolfmeat) June 8, 2023
Yesterday we were able to find this majestic Magneton! #safariweek2023 pic.twitter.com/wupgnwbFTQ
— Joesby (@Joesby07) June 8, 2023
SAFARI WEEK BB! 1hr in and I caught a Shiny Lapras and Fail a Shiny Abra! What is my luck tonight!!!#safariweek2023 & @pkmnpridehttps://t.co/YHUTv3vU2A pic.twitter.com/TF3r98MLUA
— Sentyal | PokéTuber (@Sentyal) June 3, 2023
#safariweek2023 hunters, time for a Safari Week check in!!
I’ll go first:
Total Encounters: 12,616
Wins:0
Fails:1
Most recent Shiny: Riolu, 6224
Targets: Riolu, exeggcute, Pinsir, chansey, anything in Emerald.#shinypokemon #shiny #pokemon
— gr3atScotty (@gr3atscotty) June 10, 2023
Spinda is chilling in the irl safari.
Shout out to those of us who hunted for safari week and came up short. It's still great to hunt alongside the community!#spinda #safariweek2023 pic.twitter.com/oNSajELZdI
— The Daily Spinda (@TheDailySpinda) June 7, 2023
My wife found her first 8192 shiny today! Welcome Puff Daddy♀ who will be very loved. She doesn't have twitter but with #safariweek2023 wrapping up she wanted me to share with everyone. #pokemon #ShinyPokemon pic.twitter.com/maFgkHlnca
— Beanhurst (@beanhurstpkmn) June 10, 2023
As you can see, the model does a surprisingly (and scarily?) good job of extracting the information I need! Of course, the model is not perfect, so I’ve had to manually go through some of the extractions and update them to align more accurately with the tweet text. However, if the tweet is formatted in a straightforward way, the model extracts the data accurately almost 100% of the time. A straightforward format for a tweet is something like this, where the tweet is in English, the Pokemon name is easily identifiable, and the encounter number is close to the Pokemon name and/or a phrase designating the number of encounters (e.g. “encounters”, “REs”, “RAs”, etc.). If you notice any incorrect information in the chart(s) above, or if you’d like a tweet included in the chart(s), please feel free to fill out the form below.
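One way to narrow down which extractions deserve a manual look is a simple consistency check against the tweet text. The sketch below is a hypothetical helper, not part of the original script: it flags an extraction when the extracted Pokemon name or encounter count does not literally appear in the tweet.

```python
import re

# Matches integers with optional thousands separators, such as "20,884".
NUMBER_PATTERN = re.compile(r"\d[\d,]*")


def numbers_in(text):
    """Return every integer mentioned in the text, ignoring commas."""
    return {int(match.replace(",", "")) for match in NUMBER_PATTERN.findall(text)}


def needs_review(tweet_text, pokemon, encounters):
    """Flag an extraction whose values cannot be found in the tweet itself."""
    if pokemon is not None and pokemon.lower() not in tweet_text.lower():
        return True
    if encounters is not None and encounters not in numbers_in(tweet_text):
        return True
    return False
```

A check like this only catches values that do not occur in the text at all. It would still flag legitimate cases, such as a tweet that uses a nickname instead of the species name, so it is a triage aid and does not replace manual review.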
After the data had been extracted and cleaned, I applied a couple of transformations to make it easier for Highcharts to consume. You can find the script where I do all of the extraction, cleaning, transformations, and more here.
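As an illustration of the kind of reshaping a charting library needs, here is a minimal sketch that counts shinies per Pokemon and emits a series as a list of `name`/`y` points, a shape Highcharts accepts for its data. The record layout, function names, and series name are assumptions, and the sample records in the usage note are made up.

```python
import json
from collections import Counter


def shinies_per_pokemon(records):
    """Count shinies per Pokemon, most frequent first, ties sorted by name."""
    counts = Counter(r["pokemon"] for r in records if r.get("pokemon"))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"name": name, "y": count} for name, count in ordered]


def to_highcharts_series(records, series_name="Shinies found"):
    """Wrap the per-Pokemon counts in a single chart series object."""
    return {"name": series_name, "data": shinies_per_pokemon(records)}


if __name__ == "__main__":
    # Hypothetical sample records in the assumed extraction layout.
    sample = [
        {"pokemon": "Rhyhorn", "encounters": 20884},
        {"pokemon": "Magneton", "encounters": None},
        {"pokemon": None, "encounters": None},
    ]
    print(json.dumps(to_highcharts_series(sample), indent=2))
```

Records without a Pokemon name are skipped here, which is one possible choice. Another would be to group them under an "Unknown" bucket so they still show up in the chart.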
My future goals for this project are to continue training the model so that it can extract other pieces of information, such as the game the Pokemon was hunted in or whether the shiny hunter successfully caught the Pokemon. I’d also like to create more ways to visualize shiny hunters’ efforts, so if you have any ideas or would like to see something specific, don’t hesitate to reach out (Twitter DMs usually work best)!