Adult Income Dataset
Dataset Overview

Each row is labelled with an income of either ">50K" or "<=50K".
The dataset is split into two CSV files, adult-training.csv and adult-test.csv.
The goal is to build a binary classifier on the training dataset that predicts the column income_bracket, which takes the two values ">50K" and "<=50K", and to evaluate the classifier's accuracy on the test dataset.
categorical_columns = [workclass, education, marital_status, occupation, relationship, race, gender, native_country]
continuous_columns = [age, education_num, capital_gain, capital_loss, hours_per_week]
A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))
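The extraction conditions above read as a row filter. A minimal sketch in pandas, assuming a hypothetical raw census frame whose columns are named after the variables in the condition (AAGE, AGI, AFNLWGT, HRSWK); the values are invented for illustration and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical raw census records; values are made up for illustration only.
raw = pd.DataFrame({
    "AAGE":    [15, 34, 52, 41],
    "AGI":     [0, 42000, 90, 58000],
    "AFNLWGT": [120000, 98000, 150000, 1],
    "HRSWK":   [0, 40, 35, 45],
})

# Keep rows satisfying (AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0).
mask = (
    (raw["AAGE"] > 16)
    & (raw["AGI"] > 100)
    & (raw["AFNLWGT"] > 1)
    & (raw["HRSWK"] > 0)
)
clean = raw[mask]
print(clean)
```

Only the second toy row passes all four conditions; each of the others fails at least one of them.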

Prediction task is to determine whether a person makes over 50K a year.
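Before any model is trained, it can help to know what accuracy a trivial classifier reaches. The sketch below is an illustration, not part of the original notebook: it predicts the most frequent training label for every test row and scores it. The `accuracy` helper and the toy label lists are assumptions made for the example.

```python
import pandas as pd

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float((pd.Series(y_true).values == pd.Series(y_pred).values).mean())

# Toy labels standing in for the income column of the two CSV files.
y_train = pd.Series(["<=50K", "<=50K", ">50K", "<=50K"])
y_test = pd.Series(["<=50K", ">50K", "<=50K", "<=50K", ">50K"])

# Majority-class baseline: always predict the most frequent training label.
majority = y_train.mode()[0]
y_pred = [majority] * len(y_test)
baseline = accuracy(y_test, y_pred)
print(majority, baseline)
```

Any real classifier for this task should be compared against such a baseline rather than against 50%.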

Dataset Source: https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

train_file = "adult-training.csv"

columns = ['Age','Workclass','fnlgwt','Education','Education_num','Marital_Status',
           'Occupation','Relationship','Race','Sex','Capital_Gain','Capital_Loss',
           'Hours/Week','Native_country','Income']

#collapse-hide

train = pd.read_csv(train_file, names=columns)
train.head()

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Age             32561 non-null  int64 
 1   Workclass       32561 non-null  object
 2   fnlgwt          32561 non-null  int64 
 3   Education       32561 non-null  object
 4   Education_num   32561 non-null  int64 
 5   Marital_Status  32561 non-null  object
 6   Occupation      32561 non-null  object
 7   Relationship    32561 non-null  object
 8   Race            32561 non-null  object
 9   Sex             32561 non-null  object
 10  Capital_Gain    32561 non-null  int64 
 11  Capital_Loss    32561 non-null  int64 
 12  Hours/Week      32561 non-null  int64 
 13  Native_country  32561 non-null  object
 14  Income          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

train.shape

(32561, 15)

train.describe()

# Replacing '?' with nan 
train.replace(' ?', np.nan, inplace=True)

train.isnull().sum()

Age                  0
Workclass         1836
fnlgwt               0
Education            0
Education_num        0
Marital_Status       0
Occupation        1843
Relationship         0
Race                 0
Sex                  0
Capital_Gain         0
Capital_Loss         0
Hours/Week           0
Native_country     583
Income               0
dtype: int64

Missing Data:

Workclass(1836), Occupation(1843), Native_country(583)

Important: all of the missing values are in categorical columns.
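The raw file separates fields with a comma followed by a space, which is why the placeholder shows up as ' ?' with a leading blank. As an alternative to replacing it after loading, the sketch below handles both issues at read time. The three sample rows and the reduced column list are made up for the example.

```python
import io
import pandas as pd

# Three made-up rows in the same "comma + space" layout as the raw file.
sample = io.StringIO(
    "39, State-gov, Adm-clerical, United-States, <=50K\n"
    "54, ?, ?, United-States, >50K\n"
    "32, Private, Sales, ?, <=50K\n"
)
cols = ["Age", "Workclass", "Occupation", "Native_country", "Income"]

# skipinitialspace strips the blank after each comma; na_values maps '?' to NaN.
df = pd.read_csv(sample, names=cols, skipinitialspace=True, na_values="?")
print(df.isnull().sum())
```

A side effect of this approach is that category values lose their leading space, so later comparisons would use '>50K' rather than ' >50K'.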

train['Income'].value_counts()

 <=50K    24720
 >50K      7841
Name: Income, dtype: int64

sns.countplot(x=train['Income'])
plt.title("Count of Income Category")
plt.show()

Warning: the dataset is imbalanced, with <=50K as the majority class label.

- 75.92% of data points are labelled <=50K
- 24.08% of data points are labelled >50K
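The class shares follow directly from the two counts printed by train['Income'].value_counts(). A short check, with the counts copied from that output:

```python
import pandas as pd

# Class counts as printed by train['Income'].value_counts().
counts = pd.Series({"<=50K": 24720, ">50K": 7841})

# Share of each class in percent, rounded to two decimals.
share = (counts / counts.sum() * 100).round(2)
print(share)
```

On the loaded frame itself, train['Income'].value_counts(normalize=True) gives the same proportions without typing in the counts.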

# Gender distribution
sns.countplot(x=train['Sex'])
plt.title("Count of Sex Category")
plt.show()

sns.stripplot(x='Sex', y='Hours/Week', data=train,hue='Income',marker='X')


# Workclass
wclass_plot = sns.countplot(x=train['Workclass'])
wclass_plot.set_xticklabels( wclass_plot.get_xticklabels(),rotation=50, ha="right")
plt.title("Count Plot of Workclass")

Text(0.5, 1.0, 'Count Plot of Workclass')

The Private workclass has by far the highest count.

train['Education'].value_counts()

 HS-grad         10501
 Some-college     7291
 Bachelors        5355
 Masters          1723
 Assoc-voc        1382
 11th             1175
 Assoc-acdm       1067
 10th              933
 7th-8th           646
 Prof-school       576
 9th               514
 12th              433
 Doctorate         413
 5th-6th           333
 1st-4th           168
 Preschool          51
Name: Education, dtype: int64

# Occupation
occ_plot = sns.countplot(x=train['Occupation'])
occ_plot.set_xticklabels(occ_plot.get_xticklabels(), rotation=40, ha="right")
plt.title("Count Plot of Occupation")

Text(0.5, 1.0, 'Count Plot of Occupation')

fig, axs = plt.subplots(ncols=2, nrows=4, figsize=(20, 20))
plt.subplots_adjust(hspace=0.68)
fig.delaxes(axs[3][1])  # only seven plots are drawn, so drop the unused eighth axis
fig.suptitle('Subplot of Various Categorical Variables')

# Workclass
wc_plot = sns.countplot(x=train['Workclass'], ax=axs[0][0])
wc_plot.set_xticklabels(wc_plot.get_xticklabels(), rotation=40, ha="right")
axs[0][0].title.set_text('Count Plot of Workclass')

# Native country
nc_plot = sns.countplot(x=train['Native_country'], ax=axs[0][1])
nc_plot.set_xticklabels(nc_plot.get_xticklabels(), rotation=72, ha="right")
axs[0][1].title.set_text('Count Plot of Native_country')

# Education
ed_plot = sns.countplot(x=train['Education'], ax=axs[1][0])
ed_plot.set_xticklabels(ed_plot.get_xticklabels(), rotation=40, ha="right")
axs[1][0].title.set_text('Count Plot of Education')

# Marital status
ms_plot = sns.countplot(x=train['Marital_Status'], ax=axs[1][1])
ms_plot.set_xticklabels(ms_plot.get_xticklabels(), rotation=40, ha="right")
axs[1][1].title.set_text('Count Plot of Marital Status')

# Relationship
rel_plot = sns.countplot(x=train['Relationship'], ax=axs[2][0])
rel_plot.set_xticklabels(rel_plot.get_xticklabels(), rotation=40, ha="right")
axs[2][0].title.set_text('Count Plot of Relationship')

# Race
race_plot = sns.countplot(x=train['Race'], ax=axs[2][1])
race_plot.set_xticklabels(race_plot.get_xticklabels(), rotation=40, ha="right")
axs[2][1].title.set_text('Count Plot of Race')

# Occupation
occ_plot = sns.countplot(x=train['Occupation'], ax=axs[3][0])
occ_plot.set_xticklabels(occ_plot.get_xticklabels(), rotation=40, ha="right")
axs[3][0].title.set_text('Count Plot of Occupation')

Note: majority category (highest count) in each column:

Workclass:
- Private : 22696
Native_country:
- United-States : 29170
Education:
- HS-grad : 10501
Marital_Status:
- Married-civ-spouse : 14976
Relationship:
- Husband : 13193
Race:
- White : 27816
Occupation:
- Prof-specialty : 4140

Note: minority category (lowest count) in each column:

Workclass:
- Never-worked : 7
Native_country:
- Holand-Netherlands : 1
Education:
- Preschool : 51
Marital_Status:
- Married-AF-spouse : 23
Relationship:
- Other-relative : 981
Race:
- Other : 271
Occupation:
- Armed-Forces : 9
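Rather than reading these counts off the plots, the most and least frequent category of each column can be pulled from value_counts(), which sorts in descending order. A small sketch on a toy frame (the rows are invented; on the real data the loop would run over the categorical columns of train):

```python
import pandas as pd

# Toy frame with two categorical columns; values are illustrative only.
toy = pd.DataFrame({
    "Workclass": ["Private", "Private", "Private",
                  "State-gov", "State-gov", "Never-worked"],
    "Race": ["White", "White", "White", "Black", "Black", "Other"],
})

summary = {}
for col in toy.columns:
    vc = toy[col].value_counts()  # sorted from most to least frequent
    summary[col] = {
        "majority": (vc.index[0], int(vc.iloc[0])),
        "minority": (vc.index[-1], int(vc.iloc[-1])),
    }
print(summary)
```

When two categories tie for first or last place, the one reported depends on the ordering value_counts() returns, so ties are worth checking by hand.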

plt.figure(figsize=(20, 6))
sns.countplot(x=train['Marital_Status'], hue=train['Income'])
plt.title("Count Plot of Marital Status with Hue Income")

Text(0.5, 1.0, 'Count Plot of Marital Status with Hue Income')

Most never-married people fall in the <=50K income bracket.

plt.figure(figsize=(20, 6))
sns.countplot(x=train['Relationship'], hue=train['Income'])
plt.title("Count Plot of Relationship with Hue Income")

Text(0.5, 1.0, 'Count Plot of Relationship with Hue Income')

plt.figure(figsize=(20, 6))
sns.countplot(x=train['Age'], hue=train['Income'])
plt.title("Count Plot of Age with Hue Income")

Text(0.5, 1.0, 'Count Plot of Age with Hue Income')

sns.set_style("whitegrid")
sns.pairplot(train, hue="Income", height=3)  # 'size' was renamed 'height' in seaborn 0.9
plt.show()

#collapse-hide

# Age with Income ('size' was renamed 'height' in seaborn 0.9)
sns.FacetGrid(train, hue="Income", height=6) \
   .map(sns.distplot, "Age") \
   .add_legend()
plt.show()

# Education_num with Income
sns.FacetGrid(train, hue="Income", height=6) \
   .map(sns.distplot, "Education_num") \
   .add_legend()
plt.show()

# Capital_Gain with Income
sns.FacetGrid(train, hue="Income", height=7) \
   .map(sns.distplot, "Capital_Gain") \
   .add_legend()
plt.show()

# Capital_Loss with Income
sns.FacetGrid(train, hue="Income", height=7) \
   .map(sns.distplot, "Capital_Loss") \
   .add_legend()
plt.show()

# Hours/Week with Income
sns.FacetGrid(train, hue="Income", height=7) \
   .map(sns.distplot, "Hours/Week") \
   .add_legend()
plt.show()

[Report] Univariate Analysis

The dataset is imbalanced, with <=50K as the majority class label.

75.92% of data points are labelled <=50K
24.08% of data points are labelled >50K

Missing Data:

Workclass(1836), Occupation(1843), Native_country(583)
All missing values are in categorical columns.

Workclass
- Majority:
  - Private Class, 22696
- Minority:
  - Never-worked, 7
  - Without-pay, 14
  - Federal-gov, 960
Native Country
- Majority:
  - United-States, 29170
- Minority:
  - Holand-Netherlands, 1
  - Scotland, 12
- Missing Data:
  - ?, 583
Education
- Majority:
  - HS-grad, 10501
  - Some-college, 7291
  - Bachelors, 5355
- Minority:
  - Preschool, 51
  - 1st-4th, 168
  - 5th-6th, 333
Marital Status
- Majority:
  - Married-civ-spouse, 14976
  - Never-married, 10683
  - Divorced, 4443
- Minority:
  - Married-AF-spouse, 23
  - Married-spouse-absent, 418
Relationship
- Majority:
  - Husband, 13193
  - Not-in-family, 8305
- Minority:
  - Other-relative, 981
  - Wife, 1568
Race
- Majority:
  - White, 27816
  - Black, 3124
- Minority:
  - Other, 271
  - Amer-Indian-Eskimo, 311
Occupation
- Majority:
  - Prof-specialty, 4140
  - Craft-repair, 4099
  - Exec-managerial, 4066
- Minority:
  - Armed-Forces, 9
  - Priv-house-serv, 149
  - Protective-serv, 649
- Missing Data:
  - ?, 1843


train_df = pd.read_csv("adult-training.csv", names=columns)
# Replacing '?' with NaN is left disabled here, so train_df keeps the ' ?' placeholders
#train_df.replace(' ?', np.nan, inplace=True)

Bivariate Analysis

Questions:

Which workclass earns the most?
Which education level earns the most?
Which marital status category earns the most?
Which occupation category earns the most?
Which relationship category earns the most?
Which gender earns the most?
Which race earns the most?
Which native country earns the most?

Income

Encode Income as 1 for >50K and 0 for <=50K

train_df['Income'] = train_df['Income'].apply(lambda x: 1 if x == ' >50K' else 0)
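The comparison against ' >50K' depends on the leading space in the raw values. A slightly more defensive variant, sketched here on a toy Series, strips whitespace first and then casts the boolean result to integers:

```python
import pandas as pd

# Toy labels with the same leading space as the raw file.
income = pd.Series([" <=50K", " >50K", " >50K", " <=50K"])

# Strip whitespace, compare, and cast True/False to 1/0.
encoded = (income.str.strip() == ">50K").astype(int)
print(encoded.tolist())
```

With this encoding the mean of the column is the share of rows earning >50K, which is what the bar plots in this section display.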

Workclass

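Replacing NaNs with ' 0'

# Note: train_df still holds ' ?' placeholders rather than NaN (the replacement
# above is commented out), so this fillna leaves Workclass unchanged.
train_df['Workclass'] = train_df['Workclass'].fillna(' 0')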

# factorplot was renamed catplot, and 'size' became 'height', in seaborn 0.9
sns.catplot(x="Workclass", y="Income", data=train_df, kind="bar", height=6,
            palette="muted")
plt.xticks(rotation=45)
plt.title("Bar plot of Work Class VS Income")

Text(0.5, 1, 'Bar plot of Work Class VS Income')

The Self-emp-inc workclass has the highest share of people earning >50K.
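Because Income is now 0 or 1, each bar in these plots is the mean of Income within a category, that is, the fraction of that category earning >50K. It is not an average salary. The same numbers can be computed without plotting; the sketch below uses a toy frame with invented rows:

```python
import pandas as pd

# Toy data: Income is already encoded as 0 (<=50K) or 1 (>50K).
toy = pd.DataFrame({
    "Workclass": ["Private", "Private", "Private", "Self-emp-inc", "Self-emp-inc"],
    "Income": [0, 0, 1, 1, 1],
})

# Mean of a 0/1 column per group is the share of 1s in that group.
share = toy.groupby("Workclass")["Income"].mean().sort_values(ascending=False)
print(share)
```

Sorting the result in descending order puts the category with the highest share first, which is how the observations in this section can be read off.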

Education

sns.catplot(x="Education", y="Income", data=train_df, kind="bar", height=7,
            palette="muted")
plt.xticks(rotation=60)
plt.title("Bar plot of Education VS Income")

Text(0.5, 1, 'Bar plot of Education VS Income')

All the grade-school education levels can be combined into a single category, Primary.
 ref: https://www.kaggle.com/kost13/us-income-logistic-regression/comments

def primary(x):
    if x in [' 1st-4th', ' 5th-6th', ' 7th-8th', ' 9th', ' 10th', ' 11th', ' 12th']:
        return ' Primary'
    else:
        return x

train_df['Education'] = train_df['Education'].apply(primary)

sns.catplot(x="Education", y="Income", data=train_df, kind="bar", height=7,
            palette="muted")
plt.xticks(rotation=60)

Combined [' 1st-4th', ' 5th-6th', ' 7th-8th', ' 9th', ' 10th', ' 11th', ' 12th'] into a single category, Primary
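The same merge can be written without a row-by-row apply. The sketch below is one possible vectorised form, shown on a toy Series; it keeps every value that is not in the grade list and overwrites the rest:

```python
import pandas as pd

grades = [" 1st-4th", " 5th-6th", " 7th-8th", " 9th", " 10th", " 11th", " 12th"]

# Toy column with the same leading-space convention as the raw data.
education = pd.Series([" 9th", " Bachelors", " 11th", " HS-grad"])

# where() keeps values where the condition holds and substitutes elsewhere.
merged = education.where(~education.isin(grades), " Primary")
print(merged.tolist())
```

Note that ' Preschool' is not in the list, so it stays a separate category, as it does in the notebook.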

Doctorate and Prof-school holders have the highest share of >50K incomes.

Education num

sns.catplot(x="Education_num", y="Income", data=train_df, kind="bar", height=6,
            palette="muted")
plt.xticks(rotation=60)
plt.title("Factorplot of Education VS Income")

Text(0.5, 1, 'Factorplot of Education VS Income')

Relation: a higher Education_num goes with a higher share of >50K incomes.

Marital Status

sns.catplot(x="Marital_Status", y="Income", data=train_df, kind="bar", height=5,
            palette="muted")
plt.xticks(rotation=60)

print(train_df['Marital_Status'].value_counts())
plt.title("Factor plot of Marital Status VS Income")

 Married-civ-spouse       14976
 Never-married            10683
 Divorced                  4443
 Separated                 1025
 Widowed                    993
 Married-spouse-absent      418
 Married-AF-spouse           23
Name: Marital_Status, dtype: int64

Text(0.5, 1, 'Factor plot of Marital Status VS Income')

The Married-civ-spouse category has the highest share of people earning >50K.

Occupation

# Replace the ' ?' placeholder in Occupation with ' 0'
train_df['Occupation'] = train_df['Occupation'].replace(' ?', ' 0')

train_df['Occupation'].value_counts()

 Prof-specialty       4140
 Craft-repair         4099
 Exec-managerial      4066
 Adm-clerical         3770
 Sales                3650
 Other-service        3295
 Machine-op-inspct    2002
 0                    1843
 Transport-moving     1597
 Handlers-cleaners    1370
 Farming-fishing       994
 Tech-support          928
 Protective-serv       649
 Priv-house-serv       149
 Armed-Forces            9
Name: Occupation, dtype: int64

sns.catplot(x="Occupation", y="Income", data=train_df, kind="bar", height=8,
            palette="muted")
plt.xticks(rotation=60)
plt.title("Factor plot of Occupation VS Income")

Text(0.5, 1, 'Factor plot of Occupation VS Income')

The Exec-managerial occupation has the highest share of people earning >50K.

Relationship

sns.catplot(x="Relationship", y="Income", data=train_df, height=5, kind="bar",
            palette="muted")
plt.xticks(rotation=60)
plt.title("Factorplot of Relationship vs Income")
print(train_df['Relationship'].value_counts())

 Husband           13193
 Not-in-family      8305
 Own-child          5068
 Unmarried          3446
 Wife               1568
 Other-relative      981
Name: Relationship, dtype: int64

The Wife relationship category has the highest share of people earning >50K.

Race

sns.catplot(x="Race", y="Income", data=train_df, height=5, kind="bar",
            palette="muted")
plt.xticks(rotation=60)
plt.title("Factorplot of Race VS Income")
print(train_df['Race'].value_counts())

 White                 27816
 Black                  3124
 Asian-Pac-Islander     1039
 Amer-Indian-Eskimo      311
 Other                   271
Name: Race, dtype: int64

Among the race categories, Asian-Pac-Islander has the highest share of people earning >50K.

Sex

sns.catplot(x="Sex", y="Income", data=train_df, height=5, kind="bar",
            palette="muted")
plt.xticks(rotation=60)
plt.title("Factorplot of Sex VS Income")

print(train_df['Sex'].value_counts())

 Male      21790
 Female    10771
Name: Sex, dtype: int64

Males have a higher share of >50K incomes than females.

Native country

There are 583 unknown values; they are replaced with ' 0'.

train_df['Native_country'] = train_df['Native_country'].replace(' ?', ' 0')

#collapse-hide

sns.catplot(x="Native_country", y="Income", data=train_df, height=13, kind="bar",
            palette="muted")
plt.xticks(rotation=80)

print(train_df['Native_country'].value_counts())

 United-States                 29170
 Mexico                          643
 0                               583
 Philippines                     198
 Germany                         137
 Canada                          121
 Puerto-Rico                     114
 El-Salvador                     106
 India                           100
 Cuba                             95
 England                          90
 Jamaica                          81
 South                            80
 China                            75
 Italy                            73
 Dominican-Republic               70
 Vietnam                          67
 Guatemala                        64
 Japan                            62
 Poland                           60
 Columbia                         59
 Taiwan                           51
 Haiti                            44
 Iran                             43
 Portugal                         37
 Nicaragua                        34
 Peru                             31
 France                           29
 Greece                           29
 Ecuador                          28
 Ireland                          24
 Hong                             20
 Cambodia                         19
 Trinadad&Tobago                  19
 Thailand                         18
 Laos                             18
 Yugoslavia                       16
 Outlying-US(Guam-USVI-etc)       14
 Hungary                          13
 Honduras                         13
 Scotland                         12
 Holand-Netherlands                1
Name: Native_country, dtype: int64

train_df.columns

Index(['Age', 'Workclass', 'fnlgwt', 'Education', 'Education_num',
       'Marital_Status', 'Occupation', 'Relationship', 'Race', 'Sex',
       'Capital_Gain', 'Capital_Loss', 'Hours/Week', 'Native_country',
       'Income'],
      dtype='object')


People from Iran have the highest share of >50K incomes, although that group has only 43 records.
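With groups this small, a per-country share can swing a lot on a handful of rows, so it is worth reading the share next to the group size. A minimal sketch on toy data; the country labels "A" and "B" and the cut-off of 5 records are arbitrary choices for the example, not values from the notebook:

```python
import pandas as pd

# Toy data: country "A" has 2 records, country "B" has 10.
toy = pd.DataFrame({
    "Native_country": ["A"] * 2 + ["B"] * 10,
    "Income": [1, 1] + [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Group size and share of >50K earners side by side.
stats = toy.groupby("Native_country")["Income"].agg(n="count", share="mean")

# Keep only groups with enough records; the threshold is an arbitrary assumption.
reliable = stats[stats["n"] >= 5]
print(stats)
print(reliable)
```

Country "A" tops the raw ranking with a share of 1.0, but on two records only; after the size filter only "B" remains.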

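colormap = plt.cm.magma
plt.figure(figsize=(16,16))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
# Restrict to numeric columns: Pearson correlation is not defined for the string columns
sns.heatmap(train_df.select_dtypes(include='number').corr(), linewidths=0.1, vmax=1.0, square=True, cmap=colormap, linecolor='white', annot=True)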


[Bivariate][Report] Answers from the bivariate analysis

Which workclass earns the most?
- Self-emp-inc
Which education level earns the most?
- Doctorate and Prof-school
Which marital status category earns the most?
- Married-civ-spouse
Which occupation category earns the most?
- Exec-managerial
Which relationship category earns the most?
- Wife
Which gender earns the most?
- Male
Which race earns the most?
- Asian-Pac-Islander
Which native country earns the most?
- Iran

Multivariate Analysis, pivoting

Questions:

Counts of people in each workclass and education level, by income
Counts of people in each workclass and education level, by gender

train_mult_index = train_df.set_index(keys = ['Income','Education','Native_country']).sort_index()

train_mult_index.tail()

train_mult_index.loc[(1, " Primary", " United-States"),].count().iloc[0]

202

# People with Income >50K and Primary education in the United-States: 202
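The same kind of count can be obtained without building a multi-index, either from a boolean mask or from a grouped size table. Both are sketched below on a toy frame with invented rows; on the real data the string values would carry the leading space seen above:

```python
import pandas as pd

toy = pd.DataFrame({
    "Income": [1, 1, 0, 1],
    "Education": ["Primary", "Primary", "Primary", "Bachelors"],
    "Native_country": ["United-States", "United-States", "United-States", "Mexico"],
    "Age": [40, 51, 23, 35],
})

# Option 1: boolean mask, then count the True values.
mask = (
    (toy["Income"] == 1)
    & (toy["Education"] == "Primary")
    & (toy["Native_country"] == "United-States")
)
n_mask = int(mask.sum())

# Option 2: one size per (Income, Education, Native_country) combination.
sizes = toy.groupby(["Income", "Education", "Native_country"]).size()
n_group = int(sizes.loc[(1, "Primary", "United-States")])
print(n_mask, n_group)
```

The grouped table is convenient when several combinations are needed at once, since every count is computed in a single pass.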

train_mult_index.stack().to_frame()

#collapse-hide

train_df

iec_data = train_df.loc[:,("Income", "Education", "Workclass")]

iec_data

iec_data.pivot_table(values='Income', index='Education', aggfunc='count', margins_name='Income')

iec_data[iec_data.Income == 1].pivot_table(values='Income', index='Education', aggfunc='count', margins_name='Income')

#collapse-hide

iec_data_pivot = iec_data[iec_data.Income == 1].pivot_table(values='Income', index='Education', aggfunc='count', margins_name='Income')

plt.figure(figsize=(16, 8))
sns.heatmap(iec_data_pivot, annot=True, fmt='.1f', cbar_kws= {'label':'Income range in categories'}, cmap='coolwarm')
plt.title('Incomes of various educated categories in Income wise')

Text(0.5, 1, 'Incomes of various educated categories in Income wise')

#collapse-hide

train_df[train_df.Income == 1].pivot_table(values='Income', index=['Native_country', 'Education'], aggfunc='count')

gen_in_df = train_df.where(train_df.Income == 1).pivot_table(values=['Income'], 
                                                             index='Education',
                                                             columns='Workclass', 
                                                             aggfunc='count')

#collapse-hide
gen_in_df.sort_index()

plt.figure(figsize=(16, 8))
sns.heatmap(gen_in_df.sort_index(), annot=True, fmt='.1f', cbar_kws= {'label':'Income range in categories'}, cmap='coolwarm')
plt.title('Incomes of various educated categories in Income wise')

Text(0.5, 1, 'Incomes of various educated categories in Income wise')

Among people earning >50K, Bachelors holders in the Private workclass form the largest group (1495).
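A count table of this shape can also be built with pd.crosstab, which fills empty combinations with 0 where a counting pivot_table leaves NaN. A minimal sketch on toy rows (invented for the example), restricted to the high earners first:

```python
import pandas as pd

toy = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "HS-grad", "Bachelors", "HS-grad"],
    "Workclass": ["Private", "Private", "Private", "State-gov", "Private"],
    "Income": [1, 1, 1, 1, 0],
})

# Keep only rows with Income == 1, then count Education x Workclass pairs.
high = toy[toy["Income"] == 1]
table = pd.crosstab(high["Education"], high["Workclass"])
print(table)
```

Integer counts without NaN also mean the heatmap annotations could use fmt='d' instead of '.1f'.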

gen_sex_df = train_df.where(train_df.Income == 1).pivot_table(values=['Sex'], 
                                                             index='Education',
                                                             columns='Workclass', 
                                                             aggfunc='count')

gen_sex_df

#collapse-hide

plt.figure(figsize=(16, 8))
sns.heatmap(gen_sex_df, annot=True, fmt='.1f', cbar_kws= {'label':'Income range in categories'}, cmap='coolwarm')
plt.title('Incomes of various educated categories in Gender wise')

Text(0.5, 1, 'Incomes of various educated categories in Gender wise')

Bachelors holders in the Private workclass also form the largest group in the gender-based count.

train_df.Sex.value_counts()

 Male      21790
 Female    10771
Name: Sex, dtype: int64

gen_in_df.index.names

FrozenList(['Education'])

gen_sex_df.loc[:, 'Sex']  # the 'Sex' column level exists in gen_sex_df, not in gen_in_df

[Report] Multivariate Analysis

Counts of people in each workclass and education level, by income
- Among people earning >50K, Bachelors holders in the Private workclass form the largest group (1495)
Counts of people in each workclass and education level, by gender
- Bachelors holders in the Private workclass also form the largest group in the gender-based count

Output of train.describe():
	Age	fnlgwt	Education_num	Capital_Gain	Capital_Loss	Hours/Week
count	32561.000000	3.256100e+04	32561.000000	32561.000000	32561.000000	32561.000000
mean	38.581647	1.897784e+05	10.080679	1077.648844	87.303830	40.437456
std	13.640433	1.055500e+05	2.572720	7385.292085	402.960219	12.347429
min	17.000000	1.228500e+04	1.000000	0.000000	0.000000	1.000000
25%	28.000000	1.178270e+05	9.000000	0.000000	0.000000	40.000000
50%	37.000000	1.783560e+05	10.000000	0.000000	0.000000	40.000000
75%	48.000000	2.370510e+05	12.000000	0.000000	0.000000	45.000000
max	90.000000	1.484705e+06	16.000000	99999.000000	4356.000000	99.000000

Output of train.head():
	Age	Workclass	fnlgwt	Education	Education_num	Marital_Status	Occupation	Relationship	Race	Sex	Capital_Gain	Hours/Week	Native_country	Income
0	39	State-gov	77516	Bachelors	13	Never-married	Adm-clerical	Not-in-family	White	Male	2174	40	United-States	<=50K
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	13	United-States	<=50K
2	38	Private	215646	HS-grad	9	Divorced	Handlers-cleaners	Not-in-family	White	Male	0	40	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	40	United-States	<=50K
4	28	Private	338409	Bachelors	13	Married-civ-spouse	Prof-specialty	Wife	Black	Female	0	40	Cuba	<=50K

Output of train_mult_index.tail():
			Age	Workclass	fnlgwt	Education_num	Marital_Status	Occupation	Relationship	Race	Sex	Capital_Gain	Capital_Loss	Hours/Week
Income	Education	Native_country
1	Some-college	United-States	30	Self-emp-not-inc	176185	10	Married-spouse-absent	Craft-repair	Own-child	White	Male	0	0	60
		United-States	53	Private	304504	10	Married-civ-spouse	Transport-moving	Husband	White	Male	0	1887	45
		United-States	46	Private	42251	10	Married-civ-spouse	Sales	Husband	White	Male	0	0	45
		United-States	46	Private	364548	10	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	48
		Yugoslavia	36	Self-emp-inc	337778	10	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	60

Output of the iec_data pivot table (record count per Education level):
	Income
Education
Assoc-acdm	1067
Assoc-voc	1382
Bachelors	5355
Doctorate	413
HS-grad	10501
Masters	1723
Preschool	51
Primary	4202
Prof-school	576
Some-college	7291

Output of gen_in_df.sort_index() (count of >50K earners by Education and Workclass):
	Income
Workclass	?	Federal-gov	Local-gov	Private	Self-emp-inc	Self-emp-not-inc	State-gov
Education
Assoc-acdm	6	19	28	170	18	18	6
Assoc-voc	13	15	25	256	19	21	12
Bachelors	45	95	162	1495	171	163	90
Doctorate	11	15	17	132	29	31	71
HS-grad	46	73	90	1119	119	179	49
Masters	18	47	173	534	57	59	71
Primary	9	2	10	163	15	40	5
Prof-school	8	23	19	171	78	106	18
Some-college	35	82	93	923	116	107	31