Analysis of E-Commerce Data: Ratings

This is the second post I’ve written using the dataset “Sales of summer clothes in E-commerce Wish” by Jeffrey Mvutu Mabilama, which is licensed under CC BY 4.0. The first post covers the use of merchant profile pictures and can be found here.

The 5-star rating system is ubiquitous, appearing on just about every platform from ride-sharing to online shopping. If you’re starting out as an e-commerce merchant, you can expect to have a rating for each of your products and one for yourself as a merchant overall. Let’s gain a better understanding of the 5-star system by going over the graphic I’ve created below.

A graphic showing the tendency of product ratings to concentrate around 4.0 as the number of ratings increases. On the right is a plot showing how often each individual rating appears in the data.

There are two concepts at work in the graphic above that we should discuss: the first is the law of large numbers, and the second is selection bias. If you were imagining a 5-star rating system, you’d expect 3 to be average, 4 to be above average, and 5 to be spectacular. But the chart on the left shows that the average sits closer to 4.0; checking the data, we find the actual average to be 3.82.
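If you want to verify an average like this yourself, you can compute a count-weighted mean directly from the star-count columns. Here is a minimal sketch using a toy stand-in DataFrame (the numbers are made up; only the column names match the real dataset):

```python
import pandas as pd

# Toy stand-in for the Wish data: per-product star-count columns
# (illustrative numbers, same column names as the real dataset)
df = pd.DataFrame({
    'rating_five_count': [120, 300],
    'rating_four_count': [40, 90],
    'rating_three_count': [20, 40],
    'rating_two_count': [10, 20],
    'rating_one_count': [10, 50],
})

stars = {'rating_five_count': 5, 'rating_four_count': 4,
         'rating_three_count': 3, 'rating_two_count': 2,
         'rating_one_count': 1}

# Total count of each star value across all products
totals = df[list(stars)].sum()

# Mean across every individual rating, weighted by how often it occurs
mean_rating = (totals * pd.Series(stars)).sum() / totals.sum()
print(round(mean_rating, 2))
```

On the real dataset, the same calculation is what produces a figure near the 3.82 quoted above.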

The law of large numbers lets us make this estimate of the mean fairly easily: it states that as the size of a sample increases, the mean of the sample more accurately represents the mean of the population. That’s why a merchant with relatively few ratings tends to have an unpredictable rating, while merchants with many ratings tend to end up around 4.0.
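You can see this convergence with a quick simulation. The sketch below draws star ratings from a hypothetical population whose true mean is about 3.82 (the probabilities are assumptions chosen to match that figure, not fitted to the dataset); small samples bounce around, large samples settle near the true mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of individual star ratings;
# these probabilities give a true mean of about 3.82
population = rng.choice([1, 2, 3, 4, 5], size=100_000,
                        p=[0.15, 0.08, 0.10, 0.14, 0.53])

# Sample means wander with few ratings, stabilize with many
for n in (10, 100, 10_000):
    sample = rng.choice(population, size=n)
    print(n, round(sample.mean(), 2))
```

The 10-rating sample can land anywhere from roughly 3 to 5, while the 10,000-rating sample barely moves off the population mean, which is exactly the pattern in the left-hand chart.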

Our previous observation seems to tell us something else about the nature of the ratings in our dataset: there doesn’t appear to be much variance between vendors. Look at the four vendors stacked around 17,000 – 18,000 ratings. If the one on top is very good and ends up at 4.5, and the one on the bottom is not very good and ends up at 3.5, is that enough of a spread to allow consumers to make a meaningful decision?

This question lets us look more closely at selection bias, which arises when the process for selecting participants is not truly random and therefore not likely indicative of reality. In this case, look at the graph on the right: how likely is it that the vast majority of consumers ordering from Wish truly had a 5-star experience? I’d venture to say not very, yet 5-star ratings still tower over every other rating. This is a known issue with surveys and ratings: the happiest and angriest customers are far more likely to leave a rating than the many customers who had an “average” experience.
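A small simulation makes this mechanism concrete. In the sketch below, both the “true” experience distribution and the per-star response rates are illustrative assumptions, not estimates from the dataset; the only point is that when extremes respond more often, the observed ratings stop resembling the underlying experiences:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" experiences, mostly middling
experience = rng.choice([1, 2, 3, 4, 5], size=10_000,
                        p=[0.05, 0.10, 0.35, 0.35, 0.15])

# Assumed response rates: the angriest and happiest customers
# are far more likely to leave a rating (illustrative values)
respond_p = {1: 0.6, 2: 0.2, 3: 0.05, 4: 0.1, 5: 0.5}
responded = np.array([rng.random() < respond_p[e] for e in experience])
observed = experience[responded]

print('true mean:', round(experience.mean(), 2))
print('observed mean:', round(observed.mean(), 2))
print('share of 5s, true vs observed:',
      round((experience == 5).mean(), 2),
      round((observed == 5).mean(), 2))
```

Even though 3s and 4s dominate the true experiences, 5s dominate the observed ratings, producing the same lopsided bar chart we see on the right of the graphic.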

The takeaway here? Always take ratings with a grain of salt, even though the 5-star system is the default on many platforms. If you’d like to build the graphic above, here’s some Python code for you to modify:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# df is assumed to be the Wish dataset loaded into a pandas DataFrame

# Create figure
fig = plt.figure(figsize=(13, 7))

# Create a 3x3 grid
gs = GridSpec(3, 3, figure=fig)

# Add axes to the grid: the left plot spans two columns, the right plot one
ax1 = fig.add_subplot(gs[:, :2])
ax2 = fig.add_subplot(gs[:, -1])

# Rating by ratings count
ax1.scatter(df['rating_count'], df['rating'])
ax1.set_title('Rating by Ratings Count')
ax1.set_xlabel('Number of Ratings Received')

# Count of ratings by rating
q = df.filter(items=['rating_five_count', 'rating_four_count',
                     'rating_three_count', 'rating_two_count',
                     'rating_one_count'])
q.columns = [5, 4, 3, 2, 1]
ax2.bar(q.columns, q.sum() / 1000)
ax2.set_title('Count of Ratings by Rating')
ax2.set_ylabel('Count of Ratings (thousands)')

# Save the figure
plt.savefig('Ratings_Count.png', dpi=300)
