In this post, I am going through a lecture note I found on Quantecon page.

Source: Quantecon

import pandas as pd
pd.set_option('display.max_columns', 6)
# Reduce decimal points to 2

pd.options.display.float_format = '{:,.2f}'.format

realwage = pd.read_csv(
    'https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/realwage.csv'
)
realwage.head()  # Show first 5 rows
Unnamed: 0 Time Country Series Pay period value
0 0 2006-01-01 Ireland In 2015 constant prices at 2015 USD PPPs Annual 17,132.44
1 1 2007-01-01 Ireland In 2015 constant prices at 2015 USD PPPs Annual 18,100.92
2 2 2008-01-01 Ireland In 2015 constant prices at 2015 USD PPPs Annual 17,747.41
3 3 2009-01-01 Ireland In 2015 constant prices at 2015 USD PPPs Annual 18,580.14
4 4 2010-01-01 Ireland In 2015 constant prices at 2015 USD PPPs Annual 18,755.83

pivot_table

realwage = realwage.pivot_table(
    values='value', index='Time', columns=['Country', 'Series', 'Pay period'])
realwage.head()
Country Australia ... United States
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates ... In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly Annual ... Hourly Annual Hourly
Time
2006-01-01 20,410.65 10.33 23,826.64 ... 6.05 12,594.40 6.05
2007-01-01 21,087.57 10.67 24,616.84 ... 6.24 12,974.40 6.24
2008-01-01 20,718.24 10.48 24,185.70 ... 6.78 14,097.56 6.78
2009-01-01 20,984.77 10.62 24,496.84 ... 7.58 15,756.42 7.58
2010-01-01 20,879.33 10.57 24,373.76 ... 7.88 16,391.31 7.88

5 rows × 128 columns

to_datetime

realwage.index = pd.to_datetime(realwage.index)
realwage.index
DatetimeIndex(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',
               '2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
               '2014-01-01', '2015-01-01', '2016-01-01'],
              dtype='datetime64[ns]', name='Time', freq=None)

stack

realwage.stack(level='Country')
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly Annual Hourly
Time Country
2006-01-01 Australia 20,410.65 10.33 23,826.64 12.06
Belgium 21,042.28 10.09 20,228.74 9.70
Brazil 3,310.51 1.41 2,032.87 0.87
Canada 13,649.69 6.56 14,335.12 6.89
Chile 5,201.65 2.22 3,333.76 1.42
... ... ... ... ... ...
2016-01-01 Slovenia 14,520.80 6.96 10,533.06 5.05
Spain 12,317.41 5.06 10,191.91 4.19
Turkey 12,074.76 5.79 6,741.96 3.23
United Kingdom 17,568.33 8.44 21,352.73 10.26
United States 14,892.12 7.16 14,892.12 7.16

335 rows × 4 columns

realwage['2015'].stack(level='Country')
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly Annual Hourly
Time Country
2015-01-01 Australia 21,715.53 10.99 25,349.90 12.83
Belgium 21,588.12 10.35 20,753.48 9.95
Brazil 4,628.63 2.00 2,842.28 1.21
Canada 16,536.83 7.95 17,367.24 8.35
Chile 6,633.56 2.80 4,251.49 1.81
... ... ... ... ...
Slovenia 14,512.81 6.96 10,527.26 5.05
Spain 12,170.47 5.00 10,070.33 4.14
Turkey 10,062.42 4.82 5,618.36 2.69
United Kingdom 17,125.45 8.23 20,814.46 10.01
United States 15,080.00 7.25 15,080.00 7.25

32 rows × 4 columns

realwage_c = realwage['2015'].stack(level=[1, 2]).transpose()  # no time index
realwage_c
Time 2015-01-01
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly Annual Hourly
Country
Australia 21,715.53 10.99 25,349.90 12.83
Belgium 21,588.12 10.35 20,753.48 9.95
Brazil 4,628.63 2.00 2,842.28 1.21
Canada 16,536.83 7.95 17,367.24 8.35
Chile 6,633.56 2.80 4,251.49 1.81
... ... ... ... ...
Slovenia 14,512.81 6.96 10,527.26 5.05
Spain 12,170.47 5.00 10,070.33 4.14
Turkey 10,062.42 4.82 5,618.36 2.69
United Kingdom 17,125.45 8.23 20,814.46 10.01
United States 15,080.00 7.25 15,080.00 7.25

32 rows × 4 columns

.xs

realwage_f = realwage.xs(
    ('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
    level=('Pay period', 'Series'),
    axis=1)
realwage.head()
Country Australia ... United States
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates ... In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly Annual ... Hourly Annual Hourly
Time
2006-01-01 20,410.65 10.33 23,826.64 ... 6.05 12,594.40 6.05
2007-01-01 21,087.57 10.67 24,616.84 ... 6.24 12,974.40 6.24
2008-01-01 20,718.24 10.48 24,185.70 ... 6.78 14,097.56 6.78
2009-01-01 20,984.77 10.62 24,496.84 ... 7.58 15,756.42 7.58
2010-01-01 20,879.33 10.57 24,373.76 ... 7.88 16,391.31 7.88

5 rows × 128 columns

realwage_f.head()
Country Australia Belgium Brazil ... Turkey United Kingdom United States
Time
2006-01-01 12.06 9.70 0.87 ... 2.27 9.81 6.05
2007-01-01 12.46 9.82 0.92 ... 2.26 10.07 6.24
2008-01-01 12.24 9.87 0.96 ... 2.22 10.04 6.78
2009-01-01 12.40 10.21 1.03 ... 2.28 10.15 7.58
2010-01-01 12.34 10.05 1.08 ... 2.30 9.96 7.88

5 rows × 32 columns

realwage.xs(('Annual', 'Brazil'), level=[2, 0], axis=1)  # () not []
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Time
2006-01-01 3,310.51 2,032.87
2007-01-01 3,525.45 2,164.86
2008-01-01 3,664.39 2,250.18
2009-01-01 3,934.77 2,416.21
2010-01-01 4,145.69 2,545.73
... ... ...
2012-01-01 4,498.38 2,762.30
2013-01-01 4,616.93 2,835.10
2014-01-01 4,636.71 2,847.25
2015-01-01 4,628.63 2,842.28
2016-01-01 4,753.60 2,919.02

11 rows × 2 columns

realwage.xs(('Annual', 'Brazil'), level=[2, 0], axis=1)  # () not []
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Time
2006-01-01 3,310.51 2,032.87
2007-01-01 3,525.45 2,164.86
2008-01-01 3,664.39 2,250.18
2009-01-01 3,934.77 2,416.21
2010-01-01 4,145.69 2,545.73
... ... ...
2012-01-01 4,498.38 2,762.30
2013-01-01 4,616.93 2,835.10
2014-01-01 4,636.71 2,847.25
2015-01-01 4,628.63 2,842.28
2016-01-01 4,753.60 2,919.02

11 rows × 2 columns

realwage_c.xs('Annual', level='Pay period', axis=1)
Time 2015-01-01
Series In 2015 constant prices at 2015 USD PPPs In 2015 constant prices at 2015 USD exchange rates
Country
Australia 21,715.53 25,349.90
Belgium 21,588.12 20,753.48
Brazil 4,628.63 2,842.28
Canada 16,536.83 17,367.24
Chile 6,633.56 4,251.49
... ... ...
Slovenia 14,512.81 10,527.26
Spain 12,170.47 10,070.33
Turkey 10,062.42 5,618.36
United Kingdom 17,125.45 20,814.46
United States 15,080.00 15,080.00

32 rows × 2 columns

worlddata = pd.read_csv(
    'https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/countries.csv',
    sep=';')
worlddata.head()
Country (en) Country (de) Country (local) ... Deathrate Life expectancy Url
0 Afghanistan Afghanistan Afganistan/Afqanestan ... 13.70 51.30 https://www.laenderdaten.info/Asien/Afghanista...
1 Egypt Ägypten Misr ... 4.70 72.70 https://www.laenderdaten.info/Afrika/Aegypten/...
2 Åland Islands Ålandinseln Åland ... 0.00 0.00 https://www.laenderdaten.info/Europa/Aland/ind...
3 Albania Albanien Shqipëria ... 6.70 78.30 https://www.laenderdaten.info/Europa/Albanien/...
4 Algeria Algerien Al-Jaza’ir/Algérie ... 4.30 76.80 https://www.laenderdaten.info/Afrika/Algerien/...

5 rows × 17 columns

worlddata = worlddata[['Country (en)', 'Continent']]
worlddata.head(2)
Country (en) Continent
0 Afghanistan Asia
1 Egypt Africa
worlddata = worlddata.rename(
    columns={'Country (en)': 'Country',
             'Continent': 'Continent'})
worlddata
Country Continent
0 Afghanistan Asia
1 Egypt Africa
2 Åland Islands Europe
3 Albania Europe
4 Algeria Africa
... ... ...
243 Wallis and Futuna Oceania
244 Christmas Island Australia
245 Western Sahara Africa
246 Central African Republic Africa
247 Cyprus Asia

248 rows × 2 columns

realwage_f.head()
Country Australia Belgium Brazil ... Turkey United Kingdom United States
Time
2006-01-01 12.06 9.70 0.87 ... 2.27 9.81 6.05
2007-01-01 12.46 9.82 0.92 ... 2.26 10.07 6.24
2008-01-01 12.24 9.87 0.96 ... 2.22 10.04 6.78
2009-01-01 12.40 10.21 1.03 ... 2.28 10.15 7.58
2010-01-01 12.34 10.05 1.08 ... 2.30 9.96 7.88

5 rows × 32 columns

realwage_f.transpose().head()
Time 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00
Country
Australia 12.06 12.46 12.24 ... 12.67 12.83 12.98
Belgium 9.70 9.82 9.87 ... 10.01 9.95 9.76
Brazil 0.87 0.92 0.96 ... 1.21 1.21 1.24
Canada 6.89 6.96 7.24 ... 8.22 8.35 8.48
Chile 1.42 1.45 1.44 ... 1.76 1.81 1.91

5 rows × 11 columns

merged = pd.merge(
    realwage_f.transpose(),
    worlddata,
    how='left',
    left_index=True,
    right_on='Country')
merged.head()
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2016-01-01 00:00:00 Country Continent
17 12.06 12.46 12.24 ... 12.98 Australia Australia
23 9.70 9.82 9.87 ... 9.76 Belgium Europe
32 0.87 0.92 0.96 ... 1.24 Brazil South America
100 6.89 6.96 7.24 ... 8.48 Canada North America
38 1.42 1.45 1.44 ... 1.91 Chile South America

5 rows × 13 columns

merged[merged.Continent.isnull()]
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2016-01-01 00:00:00 Country Continent
247 3.42 3.74 3.87 ... 5.28 Korea NaN
247 0.23 0.45 0.39 ... 0.55 Russian Federation NaN
247 1.50 1.64 1.71 ... 2.08 Slovak Republic NaN

3 rows × 13 columns

fillna

missing_continents = {
    'Korea': 'Asia',
    'Russian Federation': 'Europe',
    'Slovak Republic': 'Europe'
}

merged.Continent = merged['Continent'].fillna(
    merged.Country.map(missing_continents))

merged
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2016-01-01 00:00:00 Country Continent
17 12.06 12.46 12.24 ... 12.98 Australia Australia
23 9.70 9.82 9.87 ... 9.76 Belgium Europe
32 0.87 0.92 0.96 ... 1.24 Brazil South America
100 6.89 6.96 7.24 ... 8.48 Canada North America
38 1.42 1.45 1.44 ... 1.91 Chile South America
... ... ... ... ... ... ... ...
198 3.92 3.88 3.96 ... 5.05 Slovenia Europe
200 3.99 4.10 4.14 ... 4.19 Spain Europe
227 2.27 2.26 2.22 ... 3.23 Turkey Asia
241 9.81 10.07 10.04 ... 10.26 United Kingdom Europe
240 6.05 6.24 6.78 ... 7.16 United States North America

32 rows × 13 columns

merged[merged.Country == "Korea"]
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2016-01-01 00:00:00 Country Continent
247 3.42 3.74 3.87 ... 5.28 Korea Asia

1 rows × 13 columns

merged.Country.map(missing_continents) creates an array of NaN for non missing continent and values for missing

fillna receives an array and replace the missings with the values from the input array

to_replace

list_country = ['Central America', 'North America', 'South America']
for country in list_country:
    merged.Continent.replace(
        to_replace=country, value='Americas', inplace=True)
merged.head()
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2016-01-01 00:00:00 Country Continent
17 12.06 12.46 12.24 ... 12.98 Australia Australia
23 9.70 9.82 9.87 ... 9.76 Belgium Europe
32 0.87 0.92 0.96 ... 1.24 Brazil Americas
100 6.89 6.96 7.24 ... 8.48 Canada Americas
38 1.42 1.45 1.44 ... 1.91 Chile Americas

5 rows × 13 columns

set_index

merged = merged.set_index(['Continent', 'Country']).sort_index()
merged.head()
2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00
Continent Country
Americas Brazil 0.87 0.92 0.96 ... 1.21 1.21 1.24
Canada 6.89 6.96 7.24 ... 8.22 8.35 8.48
Chile 1.42 1.45 1.44 ... 1.76 1.81 1.91
Colombia 1.01 1.02 1.01 ... 1.13 1.13 1.12
Costa Rica nan nan nan ... 2.41 2.56 2.63

5 rows × 11 columns

merged.columns = pd.to_datetime(merged.columns)
merged.columns = merged.columns.rename('Time')
merged
Time 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 ... 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00
Continent Country
Americas Brazil 0.87 0.92 0.96 ... 1.21 1.21 1.24
Canada 6.89 6.96 7.24 ... 8.22 8.35 8.48
Chile 1.42 1.45 1.44 ... 1.76 1.81 1.91
Colombia 1.01 1.02 1.01 ... 1.13 1.13 1.12
Costa Rica nan nan nan ... 2.41 2.56 2.63
... ... ... ... ... ... ... ... ...
Europe Russian Federation 0.23 0.45 0.39 ... 0.61 0.56 0.55
Slovak Republic 1.50 1.64 1.71 ... 2.08 2.08 2.08
Slovenia 3.92 3.88 3.96 ... 5.01 5.05 5.05
Spain 3.99 4.10 4.14 ... 4.10 4.14 4.19
United Kingdom 9.81 10.07 10.04 ... 9.72 10.01 10.26

32 rows × 11 columns

merged = merged.transpose()
merged.head()
Continent Americas ... Europe
Country Brazil Canada Chile ... Slovenia Spain United Kingdom
Time
2006-01-01 0.87 6.89 1.42 ... 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 ... 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 ... 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 ... 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 ... 4.81 4.30 9.96

5 rows × 32 columns

index.get_level_values

import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('seaborn')
%matplotlib inline

merged.mean().sort_values(ascending=False).plot(
    kind='bar', title="Average real minimum wage 2006 - 2016")

# Set country labels
country_labels = merged.mean().sort_values(
    ascending=False).index.get_level_values('Country').tolist()
plt.xticks(range(0, len(country_labels)), country_labels)
plt.xlabel('Country')

plt.show()

png

columns.get_level_values

merged.columns.get_level_values('Continent').tolist()
['Americas',
 'Americas',
 'Americas',
 'Americas',
 'Americas',
 'Americas',
 'Americas',
 'Asia',
 'Asia',
 'Asia',
 'Asia',
 'Australia',
 'Australia',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe']
merged.mean(axis=1).plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1184282e8>

png

merged.head()
Continent Americas ... Europe
Country Brazil Canada Chile ... Slovenia Spain United Kingdom
Time
2006-01-01 0.87 6.89 1.42 ... 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 ... 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 ... 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 ... 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 ... 4.81 4.30 9.96

5 rows × 32 columns

merged.mean(level='Country', axis=1)
Country Australia Belgium Brazil ... Turkey United Kingdom United States
Time
2006-01-01 12.06 9.70 0.87 ... 2.27 9.81 6.05
2007-01-01 12.46 9.82 0.92 ... 2.26 10.07 6.24
2008-01-01 12.24 9.87 0.96 ... 2.22 10.04 6.78
2009-01-01 12.40 10.21 1.03 ... 2.28 10.15 7.58
2010-01-01 12.34 10.05 1.08 ... 2.30 9.96 7.88
... ... ... ... ... ... ... ...
2012-01-01 12.60 9.95 1.18 ... 2.43 9.72 7.48
2013-01-01 12.64 10.04 1.21 ... 2.48 9.65 7.38
2014-01-01 12.67 10.01 1.21 ... 2.51 9.72 7.26
2015-01-01 12.83 9.95 1.21 ... 2.69 10.01 7.25
2016-01-01 12.98 9.76 1.24 ... 3.23 10.26 7.16

11 rows × 32 columns

merged.mean(level='Continent', axis=1).plot()
<matplotlib.axes._subplots.AxesSubplot at 0x11a24a198>

png

merged.drop('Americas', level=0, axis=1)
Continent Asia ... Europe
Country Israel Japan Korea ... Slovenia Spain United Kingdom
Time
2006-01-01 5.77 5.69 3.42 ... 3.92 3.99 9.81
2007-01-01 6.03 5.75 3.74 ... 3.88 4.10 10.07
2008-01-01 5.92 5.79 3.87 ... 3.96 4.14 10.04
2009-01-01 5.84 5.99 4.00 ... 4.08 4.32 10.15
2010-01-01 5.68 6.14 3.99 ... 4.81 4.30 9.96
... ... ... ... ... ... ... ...
2012-01-01 5.82 6.35 4.18 ... 4.94 4.12 9.72
2013-01-01 5.94 6.44 4.38 ... 4.98 4.09 9.65
2014-01-01 5.91 6.39 4.64 ... 5.01 4.10 9.72
2015-01-01 6.31 6.48 4.93 ... 5.05 4.14 10.01
2016-01-01 6.59 6.65 5.28 ... 5.05 4.19 10.26

11 rows × 25 columns

merged.stack().describe()
Continent Americas Asia Australia Europe
count 69.00 44.00 22.00 200.00
mean 3.19 4.70 11.03 5.15
std 3.02 1.56 1.58 3.82
min 0.52 2.22 8.44 0.23
25% 1.03 3.37 9.56 2.02
50% 1.44 5.48 11.27 3.54
75% 6.96 5.95 12.45 9.70
max 8.48 6.65 12.98 12.39
grouped = merged.groupby(level='Continent', axis=1)
grouped
<pandas.core.groupby.DataFrameGroupBy object at 0x119fccb70>
grouped.size()
Continent
Americas      7
Asia          4
Australia     2
Europe       19
dtype: int64

groups.keys

grouped.groups.keys()
dict_keys(['Australia', 'Asia', 'Europe', 'Americas'])
grouped.get_group('Asia')['2015'].unstack()
Continent  Country  Time      
Asia       Israel   2015-01-01   6.31
           Japan    2015-01-01   6.48
           Korea    2015-01-01   4.93
           Turkey   2015-01-01   2.69
dtype: float64

grouped.get_group

import seaborn as sns

continents = grouped.groups.keys()

for continent in continents:
    sns.kdeplot(grouped.get_group(continent)[
                '2015'].unstack(), label=continent, shade=True)

plt.title('Real minimum wages in 2015')
plt.xlabel('US dollars')
plt.show()

png

Example 1

employ = pd.read_csv("https://lectures.quantecon.org/_downloads/employ.csv")
employ.set_index(pd.to_datetime(employ.DATE), inplace=True)
employ.drop(['Unnamed: 0', 'DATE'], axis=1, inplace=True)
employ.dropna(how='all', inplace=True)
employ.head()
GEO AGE UNIT SEX INDIC_EM Value
DATE
2007-01-01 European Union (28 countries) From 15 to 24 years Thousand persons Total Active population 26,839.00
2007-01-01 European Union (28 countries) From 15 to 24 years Thousand persons Total Total employment (resident population concept ... 22,669.00
2007-01-01 European Union (28 countries) From 15 to 24 years Thousand persons Males Active population 14,665.00
2007-01-01 European Union (28 countries) From 15 to 24 years Thousand persons Males Total employment (resident population concept ... 12,430.00
2007-01-01 European Union (28 countries) From 15 to 24 years Thousand persons Females Active population 12,173.00
employ = employ.pivot_table(index='DATE', columns=[
                            'AGE', 'GEO', 'SEX', 'UNIT'], values='Value')
employ
AGE From 15 to 24 years ... From 55 to 64 years
GEO Austria ... United Kingdom
SEX Females Males ... Males Total
UNIT Percentage of total population Thousand persons Percentage of total population ... Thousand persons Percentage of total population Thousand persons
DATE
2007-01-01 53.30 264.50 59.95 ... 2,390.00 58.35 4,199.50
2008-01-01 53.75 267.00 60.25 ... 2,443.00 58.90 4,272.00
2009-01-01 53.35 265.00 59.35 ... 2,444.00 58.90 4,294.00
2010-01-01 51.45 254.50 59.60 ... 2,417.00 58.60 4,290.00
2011-01-01 52.30 258.00 60.80 ... 2,392.00 58.20 4,273.50
2012-01-01 52.85 260.00 60.10 ... 2,407.50 59.60 4,330.00
2013-01-01 52.55 257.50 59.35 ... 2,448.00 61.30 4,446.50
2014-01-01 52.65 257.00 57.50 ... 2,488.50 62.25 4,551.00
2015-01-01 51.40 249.50 57.35 ... 2,545.50 63.30 4,689.00
2016-01-01 51.80 250.00 56.55 ... 2,633.00 64.60 4,873.50

10 rows × 720 columns

employ.xs('Males', level=2, axis=1)
AGE From 15 to 24 years ... From 55 to 64 years
GEO Austria Belgium ... Turkey United Kingdom
UNIT Percentage of total population Thousand persons Percentage of total population ... Thousand persons Percentage of total population Thousand persons
DATE
2007-01-01 59.95 293.50 33.00 ... 944.50 67.55 2,390.00
2008-01-01 60.25 297.00 32.85 ... 1,005.00 68.50 2,443.00
2009-01-01 59.35 291.00 31.15 ... 1,066.50 68.20 2,444.00
2010-01-01 59.60 291.50 31.25 ... 1,155.00 67.15 2,417.00
2011-01-01 60.80 296.00 30.90 ... 1,283.00 66.25 2,392.00
2012-01-01 60.10 294.50 31.40 ... 1,370.50 67.45 2,407.50
2013-01-01 59.35 291.00 29.50 ... 1,402.00 68.70 2,448.00
2014-01-01 57.50 280.50 28.40 ... 1,497.50 69.35 2,488.50
2015-01-01 57.35 280.50 28.90 ... 1,580.00 70.05 2,545.50
2016-01-01 56.55 283.00 27.35 ... 1,738.50 71.10 2,633.00

10 rows × 240 columns

for col in employ.columns.names:
    print(col, employ.columns.get_level_values(col).unique())
AGE Index(['From 15 to 24 years', 'From 25 to 54 years', 'From 55 to 64 years'], dtype='object', name='AGE')
GEO Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
       'Denmark', 'Estonia', 'Euro area (17 countries)',
       'Euro area (18 countries)', 'Euro area (19 countries)',
       'European Union (15 countries)', 'European Union (27 countries)',
       'European Union (28 countries)', 'Finland',
       'Former Yugoslav Republic of Macedonia, the', 'France',
       'France (metropolitan)',
       'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
       'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
       'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
       'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
       'United Kingdom'],
      dtype='object', name='GEO')
SEX Index(['Females', 'Males', 'Total'], dtype='object', name='SEX')
UNIT Index(['Percentage of total population', 'Thousand persons'], dtype='object', name='UNIT')
employ.columns = employ.columns.swaplevel(0, 1)
employ = employ.sort_index(axis=1)
employ
GEO Austria ... United Kingdom
AGE From 15 to 24 years ... From 55 to 64 years
SEX Females Males ... Males Total
UNIT Percentage of total population Thousand persons Percentage of total population ... Thousand persons Percentage of total population Thousand persons
DATE
2007-01-01 53.30 264.50 59.95 ... 2,390.00 58.35 4,199.50
2008-01-01 53.75 267.00 60.25 ... 2,443.00 58.90 4,272.00
2009-01-01 53.35 265.00 59.35 ... 2,444.00 58.90 4,294.00
2010-01-01 51.45 254.50 59.60 ... 2,417.00 58.60 4,290.00
2011-01-01 52.30 258.00 60.80 ... 2,392.00 58.20 4,273.50
2012-01-01 52.85 260.00 60.10 ... 2,407.50 59.60 4,330.00
2013-01-01 52.55 257.50 59.35 ... 2,448.00 61.30 4,446.50
2014-01-01 52.65 257.00 57.50 ... 2,488.50 62.25 4,551.00
2015-01-01 51.40 249.50 57.35 ... 2,545.50 63.30 4,689.00
2016-01-01 51.80 250.00 56.55 ... 2,633.00 64.60 4,873.50

10 rows × 720 columns

geo_list = employ.columns.get_level_values('GEO').unique().tolist()
countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()
Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
       'Denmark', 'Estonia', 'Finland',
       'Former Yugoslav Republic of Macedonia, the', 'France',
       'France (metropolitan)',
       'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
       'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
       'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
       'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
       'United Kingdom'],
      dtype='object', name='GEO')
geog = employ.columns.get_level_values('GEO').unique().tolist()
geog2 = [i for i in geog if not i.startswith('Euro')]  # starts
employ = employ[geog2]
employ_f = employ.xs(('Percentage of total population'),
                     level=('UNIT'),
                     axis=1)
employ_f.head()
GEO Austria ... United Kingdom
AGE From 15 to 24 years ... From 55 to 64 years
SEX Females Males Total ... Females Males Total
DATE
2007-01-01 53.30 59.95 56.60 ... 49.35 67.55 58.35
2008-01-01 53.75 60.25 56.95 ... 49.60 68.50 58.90
2009-01-01 53.35 59.35 56.30 ... 49.90 68.20 58.90
2010-01-01 51.45 59.60 55.55 ... 50.30 67.15 58.60
2011-01-01 52.30 60.80 56.55 ... 50.40 66.25 58.20

5 rows × 306 columns

employ_f = employ_f.drop('Total', level='SEX', axis=1)
employ_f
GEO Austria ... United Kingdom
AGE From 15 to 24 years From 25 to 54 years ... From 25 to 54 years From 55 to 64 years
SEX Females Males Females ... Males Females Males
DATE
2007-01-01 53.30 59.95 78.60 ... 89.90 49.35 67.55
2008-01-01 53.75 60.25 79.35 ... 89.65 49.60 68.50
2009-01-01 53.35 59.35 80.25 ... 88.70 49.90 68.20
2010-01-01 51.45 59.60 80.65 ... 88.40 50.30 67.15
2011-01-01 52.30 60.80 81.50 ... 88.80 50.40 66.25
2012-01-01 52.85 60.10 82.20 ... 89.30 52.00 67.45
2013-01-01 52.55 59.35 82.50 ... 89.35 54.15 68.70
2014-01-01 52.65 57.50 82.40 ... 90.10 55.40 69.35
2015-01-01 51.40 57.35 82.35 ... 90.10 56.85 70.05
2016-01-01 51.80 56.55 82.75 ... 90.60 58.30 71.10

10 rows × 204 columns

box = employ_f['2015'].unstack().reset_index()
box
GEO AGE SEX DATE 0
0 Austria From 15 to 24 years Females 2015-01-01 51.40
1 Austria From 15 to 24 years Males 2015-01-01 57.35
2 Austria From 25 to 54 years Females 2015-01-01 82.35
3 Austria From 25 to 54 years Males 2015-01-01 89.10
4 Austria From 55 to 64 years Females 2015-01-01 39.50
... ... ... ... ... ...
199 United Kingdom From 15 to 24 years Males 2015-01-01 55.25
200 United Kingdom From 25 to 54 years Females 2015-01-01 78.25
201 United Kingdom From 25 to 54 years Males 2015-01-01 90.10
202 United Kingdom From 55 to 64 years Females 2015-01-01 56.85
203 United Kingdom From 55 to 64 years Males 2015-01-01 70.05

204 rows × 5 columns

sns.boxplot(
    x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
<matplotlib.axes._subplots.AxesSubplot at 0x11ae58b38>

png