AFK Data

A website dedicated to providing accessible statistics and interactive visualizations.


Created and maintained by Alec Howard

Safari Week 2023

Shiny Pokemon Explorer

Features Drilldowns

Preview:


Encounter Histogram

Preview:


Rare Encounters

Preview:


Highlighted Hunts

Preview:


Exposition

As a long-time shiny Pokemon hunter and full-time Analytics Engineer, I’ve always thought it would be interesting to see aggregated statistics for shiny hunting community events. So, for Safari Week 2023, I challenged myself to create a visualization that displays easily digestible information about people’s shiny hunts, such as which Pokemon they hunted and how many encounters it took them to find the shiny. Since the shiny hunting community’s main channel of communication is Twitter, I used Tweepy, a Python wrapper for Twitter’s API, to extract everyone’s shiny hunts. To decide which tweets to extract, I pulled tweets that contained the hashtag “#safariweek” or “#safariweek2023”.

Of course, this is not perfect: it excludes tweets about valid shinies that were posted without either hashtag. The tradeoff is probably worth it, though, because it also excludes shinies that were found during Safari Week but are not actual safari shinies. Another limitation is that the Twitter API only allows applications to extract tweets from the last seven days. Since I started this project on June 6th, I missed some safari shinies that people had found before May 31st. Furthermore, I only added the hashtag “#safariweek” to my query on June 11th (I had previously limited it to “#safariweek2023”), which means I may be missing tweets from before June 4th that used “#safariweek” but not “#safariweek2023”.
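For readers curious what the hashtag pull described above might look like, here is a minimal sketch. It assumes Tweepy 4.x and Twitter’s v2 recent-search endpoint, which only covers roughly the last seven days. The function names, the `-is:retweet` filter, and the requested tweet fields are illustrative assumptions, not necessarily what the actual script does.

```python
from typing import Iterable, Iterator


def build_query(hashtags: Iterable[str], exclude_retweets: bool = True) -> str:
    """OR the hashtags together into one Twitter search query string."""
    query = "(" + " OR ".join(hashtags) + ")"
    if exclude_retweets:
        # Assumption: retweets are dropped so each shiny is only counted once.
        query += " -is:retweet"
    return query


def fetch_recent_tweets(bearer_token: str, query: str, limit: int = 1000) -> Iterator:
    """Yield matching tweets from the recent-search endpoint (last seven days only)."""
    import tweepy  # imported lazily so build_query works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    pages = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at", "author_id"],  # assumed fields of interest
        max_results=100,  # per-page maximum for this endpoint
    )
    yield from pages.flatten(limit=limit)


QUERY = build_query(["#safariweek", "#safariweek2023"])
```

Calling `fetch_recent_tweets(token, QUERY)` with a valid bearer token would then yield tweet objects whose `text` could be passed on to the extraction step. Because of the seven-day window, a pull like this has to be repeated during the event to avoid gaps.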

Once I had extracted these tweets, I looked through them and realized that people within the shiny hunting community have VERY different ways of tweeting about their shinies! Some people include the Pokemon name and number of encounters, but a lot of people leave out one (or even both) of these pieces of information. Therefore, instead of writing a hard-coded text extraction program, I decided to use OpenAI’s text model Davinci to extract the information I wanted from these tweets. After performing some initial extractions, I quickly realized the model was too generalized to do a good job with this hyper-specific use case. So I fine-tuned the model on a dataset that I had manually classified, thereby “teaching” it the exact kind of output I wanted. You can see some examples of the model’s extraction efforts below:

As you can see, the model does a surprisingly (and scarily?) good job at extracting the information I need! Of course, the model is not perfect, so I’ve had to manually go through some of the extractions and update them to align more accurately with the tweet text. However, if the tweet is formatted in a straightforward way, the model extracts the data accurately almost 100% of the time. Note that a straightforward format for a tweet is something like this, where the tweet is in English, the Pokemon name is easily identifiable, and the encounter number is close to the Pokemon name and/or a phrase designating the number of encounters (e.g. “encounters”, “REs”, “RAs”, etc.). If you notice any incorrect information in the chart(s) above, or if you’d like a tweet included in the chart(s), please feel free to fill out the form below.
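To make the fine-tuning step described above a bit more concrete, here is a rough sketch of how a manually classified dataset could be turned into training data. It assumes OpenAI’s legacy prompt/completion JSONL format for fine-tuning base models such as Davinci. The example tweets, the JSON label layout, the separator, and the stop marker are all made up for illustration and are not the ones used in this project.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of each prompt
STOP = " END"              # assumed marker for the end of each completion

# Made-up tweets with hand-written labels, for illustration only.
LABELED = [
    ("Shiny Chansey after 1,204 REs! #safariweek2023",
     {"pokemon": "Chansey", "encounters": 1204}),
    ("IT FINALLY SHONE!!! #safariweek",
     {"pokemon": None, "encounters": None}),
]


def to_record(tweet_text, label):
    """Build one prompt/completion training record for a labeled tweet."""
    return {
        "prompt": tweet_text + SEPARATOR,
        "completion": " " + json.dumps(label) + STOP,
    }


def to_jsonl(examples):
    """Serialize labeled examples as JSONL, one training record per line."""
    return "\n".join(json.dumps(to_record(text, label)) for text, label in examples)


def parse_completion(completion):
    """Turn a completion string back into a dict of extracted fields."""
    text = completion.strip()
    if text.endswith(STOP.strip()):
        text = text[: -len(STOP.strip())]
    return json.loads(text)
```

Using `null` for a missing field lets the model say “not mentioned” explicitly rather than guess, which fits tweets that leave out the Pokemon name or the encounter count.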

After this data had been extracted and cleaned, I applied a couple of transformations to reshape it into a form that Highcharts can consume. You can find the script where I do all of the data extraction, cleaning, transformations, and more here.
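The post does not spell out the transformations, so the following is only a guess at their general shape: cleaned rows are aggregated into the point formats that Highcharts series accept (objects with `name` and `y`, or `[name, y]` pairs). The row layout, function names, and the bin width of 500 are hypothetical.

```python
from collections import Counter

# Hypothetical cleaned rows; the real column names are not given in the post.
HUNTS = [
    {"hunter": "a", "pokemon": "Chansey", "encounters": 1204},
    {"hunter": "b", "pokemon": "Chansey", "encounters": 310},
    {"hunter": "c", "pokemon": "Scyther", "encounters": 87},
    {"hunter": "d", "pokemon": "Dratini", "encounters": None},
]


def pokemon_counts(hunts):
    """Count shinies per Pokemon as Highcharts points, most common first."""
    counts = Counter(h["pokemon"] for h in hunts if h["pokemon"])
    return [{"name": name, "y": n} for name, n in counts.most_common()]


def encounter_histogram(hunts, bin_width=500):
    """Bucket known encounter counts into fixed-width bins of [label, count]."""
    bins = Counter()
    for h in hunts:
        if h["encounters"] is not None:  # skip hunts with no reported count
            bins[h["encounters"] // bin_width * bin_width] += 1
    return [[f"{start}-{start + bin_width - 1}", bins[start]] for start in sorted(bins)]
```

Either list can be serialized with `json.dumps` and dropped into the `data` field of a Highcharts series. Hunts with an unknown encounter count still appear in the per-Pokemon totals but are left out of the histogram.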

My future goals for this project are to continue training the model so that it can extract other pieces of information (such as the game the Pokemon was hunted on, or whether the shiny hunter successfully caught the Pokemon). I’d also like to create more ways to visualize shiny hunters’ efforts, so if you have any ideas or would like to see something specific, don’t hesitate to reach out (Twitter DMs usually work best)!

Request Form


Credits