How to Scrape the Google Play Store for Sentiment Analysis

Hi again!

Recently I've been tinkering with a library built specifically for scraping the Google Play Store. It is quite easy and simple to use, and we can use the scraped data to practice for a sentiment analysis project. According to HBR, online reviews are transforming the way consumers choose products and services. That is a big deal for competing developers: in the case of the Google Play Store, reviews can give consumers the confidence they need to try a particular app. Can't we all relate to this? :D

Image source: theverge.com

Another thing: I also had some fun customizing how the scraped data looks using Pygments, which I will also explain in this post. IDK if this is all that important, tho. But I think it helps if we want to spice up the boring JSON output with some flying colors.

So here we go :)

The library is called google-play-scraper. It is a Python scraper for getting data from Google Play, inspired by the Node.js google-play-scraper project, and it was created by JoMingyu. You can find it on his GitHub page (linked in the sources below).

This time I used Google Colab (colab.research.google.com) as the IDE, since this library doesn't seem to be installable in a conda environment; it needs to be installed with pip install (if I'm missing something here, please let me know). Also, I don't run Python directly on my OS, so this is the best option I have for now.

First, let's install the library. In a Colab cell the command needs a leading exclamation mark: !pip install google-play-scraper

For this example, I will scrape an Indonesian credit-solution app provided by a local bank. We need to supply the app's ID, which you can get from its Play Store link: it is the part after 'id=' and before '&hl'. In this case, the app ID is com.mtf.calculator.external
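Instead of copying the ID out of the link by hand, we can read the id query parameter from the URL. This is a small sketch, assuming the usual https://play.google.com/store/apps/details?id=...&hl=... link format; extract_app_id is a hypothetical helper, not part of google-play-scraper. Parsing the query string also works when hl comes before id.

```python
from urllib.parse import parse_qs, urlparse


def extract_app_id(store_url: str) -> str:
    """Return the value of the 'id' query parameter from a Play Store details URL."""
    # parse_qs maps each parameter name to a list of values, in any order
    query = parse_qs(urlparse(store_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"no 'id' parameter in {store_url!r}")
    return ids[0]
```

For example, extract_app_id("https://play.google.com/store/apps/details?id=com.mtf.calculator.external&hl=en") returns 'com.mtf.calculator.external'.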



I imported a few libraries and got the app info using only the script below. Isn't that quite handy?

If we print the info variable, this is what we get. The data is a Python dict with a JSON-like structure.
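Here is a minimal sketch of fetching and printing the app info with the library's app() function. It is not necessarily the exact script from the notebook: lang="en" and country="id" are my assumptions, and fetch_app_info / to_pretty_json are hypothetical helper names. The library is imported inside the function so the printing helper can be used on its own.

```python
import json
from typing import Any


def fetch_app_info(app_id: str, lang: str = "en", country: str = "id") -> dict[str, Any]:
    """Fetch app metadata via google-play-scraper (needs the package and network access)."""
    from google_play_scraper import app  # imported lazily; only this function needs it

    return app(app_id, lang=lang, country=country)


def to_pretty_json(data: Any) -> str:
    """Indent the dict as JSON; default=str covers values json can't encode (e.g. datetime)."""
    return json.dumps(data, indent=2, ensure_ascii=False, default=str)
```

Usage in Colab: info = fetch_app_info("com.mtf.calculator.external"), then print(to_pretty_json(info)). ensure_ascii=False keeps non-English text readable instead of turning it into \u escapes.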


We can save it to CSV right away.
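One way to do this with only the standard library is csv.DictWriter (pandas' DataFrame.to_csv works too; I don't know which one the original notebook used). This sketch collects the union of keys so that rows with different fields still line up, and nested values such as lists are written as their string form. save_rows_to_csv is a hypothetical helper name.

```python
import csv
from typing import Any, Iterable, Mapping


def save_rows_to_csv(rows: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write dict rows to a CSV file and return how many rows were written."""
    rows = list(rows)
    # Union of keys in first-seen order; rows missing a key get an empty cell.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # non-string values (numbers, lists) are written via str()
    return len(rows)
```

Since the app info is a single dict, wrap it in a list: save_rows_to_csv([info], "app_info.csv").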

Next, we will scrape the reviews. I'll be using just the default parameters here; there are a couple of other parameters we can customize, depending on what we need. Note that Google Play only returns up to 200 reviews per request, so fetching reviews takes one HTTP request per 200 reviews. Bear that in mind: WhatsApp Messenger, for example, has 127 million reviews, so fetching all of them would take about 635,000 requests (127,000,000 / 200). BAM.
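The paging described above can be sketched with the library's reviews() function, which returns a batch plus a continuation token for the next page (the library also offers reviews_all, which does the paging for you). This is an illustrative sketch, not the post's exact code: lang="id", country="id", Sort.NEWEST and the max_reviews cap are my assumptions, and fetch_reviews / requests_needed are hypothetical names. requests_needed just reproduces the one-request-per-200-reviews arithmetic.

```python
from typing import Any

PAGE_SIZE = 200  # per the post: at most 200 reviews per request


def requests_needed(total_reviews: int, page_size: int = PAGE_SIZE) -> int:
    """One HTTP request per page, so round total/page_size up (integer ceiling)."""
    return -(-total_reviews // page_size)


def fetch_reviews(app_id: str, max_reviews: int = 1000,
                  lang: str = "id", country: str = "id") -> list[dict[str, Any]]:
    """Page through reviews with continuation tokens until max_reviews or no more pages."""
    # Imported here so requests_needed() works without the package installed.
    from google_play_scraper import Sort, reviews

    collected: list[dict[str, Any]] = []
    token = None  # None means "start from the first page"
    while len(collected) < max_reviews:
        batch, token = reviews(
            app_id,
            lang=lang,
            country=country,
            sort=Sort.NEWEST,
            count=PAGE_SIZE,           # one page = one request
            continuation_token=token,
        )
        if not batch:  # an empty page means there is nothing left to fetch
            break
        collected.extend(batch)
    return collected[:max_reviews]  # the last page may overshoot the cap
```

For the WhatsApp figure above, requests_needed(127_000_000) gives 635000; for 1,128 reviews it gives 6.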

In this case, we got 1,128 reviews. Below is the scraped review data:
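For a sentiment analysis exercise, a common starting point is to keep only each review's text and star rating and turn the rating into a coarse label (1 to 2 stars negative, 3 neutral, 4 to 5 positive). This is a heuristic I'm adding as an illustration, not something the scraper does; it assumes each review dict has "content" and "score" keys, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def score_to_sentiment(score: int) -> str:
    """Map a 1-5 star rating to a coarse label (a heuristic, not ground truth)."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def label_reviews(raw_reviews: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only text and rating from each review and attach a sentiment label."""
    return [
        {
            "content": r.get("content") or "",  # some reviews may have no text
            "score": r["score"],
            "sentiment": score_to_sentiment(r["score"]),
        }
        for r in raw_reviews
    ]


def sentiment_counts(rows: Iterable[dict[str, Any]]) -> Counter:
    """Count labels, which helps spot class imbalance before training a model."""
    return Counter(row["sentiment"] for row in rows)
```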


Now I'm going to use Pygments to beautify our JSON output by running the script below. We can specify the colors, background colors, borders, and so on for each token type (such as keys, numbers, strings, comments, etc.). The library also has some preset styles that we can choose from.
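A sketch of the idea: define a custom Pygments Style that colors each JSON token type, then render with JsonLexer and HtmlFormatter. The palette and the names ReviewStyle / json_to_html are hypothetical, not the colors from my notebook. Pygments' JsonLexer tags object keys as Name.Tag, and a style entry for String or Number also covers their subtypes.

```python
import json
from typing import Any

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import JsonLexer
from pygments.style import Style
from pygments.token import Keyword, Name, Number, Punctuation, String


class ReviewStyle(Style):
    """Hypothetical palette; entries also accept bg:#rrggbb and border:#rrggbb."""
    background_color = "#fdf6e3"
    styles = {
        Name.Tag: "bold #005f87",            # JSON object keys
        String: "#008700",                   # string values
        Number: "#af5f00",                   # integers and floats
        Keyword.Constant: "italic #5f00af",  # true / false / null
        Punctuation: "#586e75",              # braces, brackets, commas, colons
    }


def json_to_html(data: Any, style: Any = ReviewStyle) -> str:
    """Serialize data as indented JSON and return syntax-highlighted HTML."""
    text = json.dumps(data, indent=2, ensure_ascii=False, default=str)
    # noclasses=True inlines the colors, so no separate stylesheet is needed
    return highlight(text, JsonLexer(), HtmlFormatter(style=style, noclasses=True))
```

In Colab you can render it with from IPython.display import HTML, display and then display(HTML(json_to_html(info))). To try a preset instead, pass its name, e.g. json_to_html(info, style="monokai"); pygments.styles.get_all_styles() lists the available names.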

The result is.....

App info:

App reviews:

I know it may not look that cool with those color combinations. But hey, why don't you try it yourself? ;D


Thanks for reading! <3

For the .ipynb file, feel free to take a look at my GitHub page.


Sources:

https://github.com/JoMingyu/google-play-scraper

https://hbr.org/2019/11/designing-better-online-review-systems

Credit to Venelin Valkov at curiousily.com for the tutorial.
