In this short piece, I use public Wikipedia data, Python programming, and network analysis to extract and draw up a network of Oscar-winning actors and actresses.
All images were created by the author.
Wikipedia, as the largest free, crowdsourced online encyclopedia, serves as a tremendously rich data source on various public domains. Many of these domains, from film to politics, involve various layers of networks underneath, expressing different sorts of social phenomena such as collaboration. Due to the approaching Academy Awards Ceremony, here I show the example of Oscar-winning actors and actresses on how we can use simple Pythonic methods to turn Wiki sites into networks.
First, let’s take a look at how, for instance, the Wiki list of all Oscar-winning actors is structured:
This subpage nicely shows all the people who have ever received an Oscar and have been granted a Wiki profile (most likely, no actors and actresses were missed by the fans). In this article, I focus on acting, which can be found in the following four subpages — including main and supporting actors and actresses:
urls = { 'actor' :'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
'actress' : 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
'supporting_actor' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
'supporting_actress' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners'}
Now let’s write a simple block of code that checks each of these four listings, and using the packages urllib and beautifulsoup, extracts the name of all artists:
from urllib.request import urlopen
import bs4 as bs
import re# Iterate across the four categories
people_data = []
for category, url in urls.items():
# Query the name listing page and…
Be the first to comment