Developer 101 | Data Science Uncovered

The following notebook is an analysis of an online webinar organised by Sathyabama Coding Club

  • The data used here is private
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
gotom_data = pd.read_excel("Developer 101_ Data Science Uncovered Attendees.xls")
reg_data = pd.read_excel("Developer 101 _ Data Science Uncovered (Responses).xlsx")
gotom_data.head(5)
Developer 101: Data Science Uncovered Attendees Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 GoToMeeting
0 Summary NaN NaN NaN NaN NaN NaN
1 Meeting Date Meeting Duration Number of Attendees Meeting ID NaN NaN NaN
2 May 9, 2020 7:06 AM PDT 90 minutes 54 905-245-013 NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN
4 Details NaN NaN NaN NaN NaN NaN
attn_data = gotom_data.iloc[6:,:5]
attn_data.columns = gotom_data.iloc[5:,:5].iloc[[0]].values.reshape(5,)
attn_data.reset_index(drop=True, inplace=True)
attn_data.head()
Name Email Address Join Time Leave Time Time in Session (minutes)
0 #ReligiousCorona lol@gmail.com 7:51 AM 8:07 AM 15
1 #ReligiousCorona lol@gmail.com 8:07 AM 8:33 AM 25
2 AJ NaN 7:30 AM 7:30 AM 0
3 AJ NaN 7:30 AM 8:33 AM 62
4 ANKITKUMAR SINGH ankitk.as51@gmail.com 7:30 AM 7:31 AM 1
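The slice above hard-codes the row where the attendee table starts (row 5 is the header, row 6 onward is data). As a minimal sketch, assuming the header row can be recognised by its first cell being "Name", the same extraction can be done without fixed offsets. The helper `extract_table` and the toy frame are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import pandas as pd

def extract_table(raw: pd.DataFrame, header_marker: str = "Name") -> pd.DataFrame:
    """Cut a report-style sheet down to the table that follows its header row.

    Assumption: the header row is the first row whose first cell equals
    `header_marker`, and the header cells are contiguous from the left.
    """
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    header_pos = int(first_col.eq(header_marker).to_numpy().argmax())
    if first_col.iloc[header_pos] != header_marker:
        raise ValueError(f"no header row starting with {header_marker!r}")
    header = raw.iloc[header_pos].dropna()
    table = raw.iloc[header_pos + 1:, :len(header)].copy()
    table.columns = header.to_list()
    return table.reset_index(drop=True)

# Toy stand-in for the exported sheet (the real data is private).
raw = pd.DataFrame([
    ["Summary", None, None],
    ["Details", None, None],
    ["Name", "Join Time", None],
    ["A", "7:30 AM", None],
    ["B", "7:45 AM", None],
])
table = extract_table(raw)
print(table)
```

If the export layout changes (for example, an extra summary row), this version keeps working as long as the header row still begins with "Name".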
SESSION_DURATION = int(attn_data['Time in Session (minutes)'].max())
print("Session Duration in Minutes: ", SESSION_DURATION)
Session Duration in Minutes:  90

Analysis on Registration Data

Univariate Analysis

no_of_regs = len(reg_data)
REG_COUNT = no_of_regs
print("No of Registrations : ", REG_COUNT)
No of Registrations :  186

Observation:

No of registrations: 186

reg_data.Batch.value_counts()
2021            76
Professional    69
2022            36
2023             5
Name: Batch, dtype: int64
sns.countplot(x="Batch", data=reg_data)
plt.title("Barplot on Academic year participation")

Observation

The 2021 batch has the most registrations, followed by Professionals, then 2022 and 2023

- 2021 > Professional >> 2022 >> 2023 
sns.countplot(x="Have you ever worked with Data Science before?", data=reg_data)
plt.title("Barplot over the background of participants in Data Science")

print(reg_data['Have you ever worked with Data Science before?'].value_counts())
No     115
Yes     71
Name: Have you ever worked with Data Science before?, dtype: int64
ds_counts = reg_data["Have you ever worked with Data Science before?"].value_counts()
# Look the counts up by answer label rather than by position
ds_yes = ds_counts["Yes"]
ds_no = ds_counts["No"]
print("Percentage of registrants who have worked with Data Science before :", 
      (ds_yes/no_of_regs)*100)
print("Percentage of registrants who have never worked with Data Science before :", 
      (ds_no/no_of_regs)*100)
Percentage of registrants who have worked with Data Science before : 38.17204301075269
Percentage of registrants who have never worked with Data Science before : 61.82795698924731
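The same shares can be obtained in one step with `value_counts(normalize=True)`, which returns fractions instead of counts. The snippet below is a small sketch on toy answers (not the private registration data); the Series name `worked_with_ds` is made up for the example.

```python
import pandas as pd

# Toy answers standing in for the "Have you ever worked with Data Science before?" column.
answers = pd.Series(["No", "Yes", "No", "No"], name="worked_with_ds")

# normalize=True gives fractions that sum to 1; multiply by 100 for percentages.
shares = answers.value_counts(normalize=True).mul(100).round(1)

# Index by answer label so the result does not depend on sort order.
print("Yes:", shares["Yes"], "No:", shares["No"])
```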
sns.countplot(x="Knowledge of Python Programming language ", 
              data=reg_data)
sns.countplot(x="Do you think Math is Required for Machine Learning?", 
              data=reg_data)
reg_data["Where do you wish to use Data Science skills?"].value_counts().plot(kind='barh', figsize=(10,10))

Observation:

  • Majority of the participants Wished for Learning Data Science for:
    • Business Analytics
    • Predictions/ Forecast
    • Natural Language Processing
    • Computer Vision
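The bar chart above counts each distinct combination of answers, so a single interest such as "Computer Vision" is spread over many bars. One possible way to count each option on its own is to split the multi-select answers and explode them into one row per option. This is a sketch on toy responses; note that it would also split free-text answers that happen to contain commas.

```python
import pandas as pd

# Toy multi-select responses (the real column is private).
responses = pd.Series([
    "Business Analytics, Computer Vision",
    "Computer Vision",
    "Business Analytics, Prediction / Forecast",
])

per_option = (
    responses.str.split(",")   # one list of options per respondent
    .explode()                 # one row per (respondent, option)
    .str.strip()               # drop the space left after each comma
    .value_counts()
)
print(per_option)
```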

Analysis on Attendees data

attn_data.head()
Name Email Address Join Time Leave Time Time in Session (minutes)
0 #ReligiousCorona lol@gmail.com 7:51 AM 8:07 AM 15
1 #ReligiousCorona lol@gmail.com 8:07 AM 8:33 AM 25
2 AJ NaN 7:30 AM 7:30 AM 0
3 AJ NaN 7:30 AM 8:33 AM 62
4 ANKITKUMAR SINGH ankitk.as51@gmail.com 7:30 AM 7:31 AM 1
ATTENDEES_COUNT = len(attn_data['Name'].value_counts())
ATTENDEES_COUNT
54

Observation

Number of attendees without duplicates: 54

len(attn_data.groupby(by=attn_data.Name, axis=1).sum())
65
attn_data.groupby(['Name', 'Time in Session (minutes)']).sum().iloc[:,:0].head(20)
Name Time in Session (minutes)
#ReligiousCorona 15
25
AJ 0
62
ANKITKUMAR SINGH 1
Abhiram 1
64
Abhishek's Mac Book Pro 15
Aditya 18
Aditya Gowrish Menti 68
Akash M 60
Alok Kumar 61
Amit 1
Anand 50
Anonymous 4
5
BVN PRANEETH 53
Bhavesh 89
Chetan 9
Deepansh 0
# Converting the 'Time in Session (minutes)' column values to int
attn_data['Time in Session (minutes)'] = pd.to_numeric(attn_data['Time in Session (minutes)'])
type(attn_data['Time in Session (minutes)'].iloc[0])
numpy.int64
(attn_data['Time in Session (minutes)'] == attn_data['Time in Session (minutes)'].iloc[0]).all()
False
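def time_agg(group_series):
    # Collapse the rows of one attendee into a single value.
    # If every value in the group is identical, keep it once; otherwise add
    # the values up (numbers are summed, strings are concatenated).
    if (group_series==group_series.iloc[0]).all():
        return group_series.iloc[0]
    else:
        return group_series.sum()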
attn_data.groupby('Name', as_index=False).agg(time_agg)[['Name', 'Join Time', 'Leave Time', 'Time in Session (minutes)']]
Name Join Time Leave Time Time in Session (minutes)
0 #ReligiousCorona 7:51 AM8:07 AM 8:07 AM8:33 AM 40
1 AJ 7:30 AM 7:30 AM8:33 AM 62
2 ANKITKUMAR SINGH 7:30 AM 7:31 AM 1
3 Abhiram 7:31 AM7:30 AM 8:36 AM7:31 AM 65
4 Abhishek's Mac Book Pro 8:01 AM 8:17 AM 15
5 Aditya 7:30 AM 7:49 AM 18
6 Aditya Gowrish Menti 7:29 AM 8:37 AM 68
7 Akash M 7:22 AM 8:23 AM 60
8 Alok Kumar 7:36 AM 8:37 AM 61
9 Amit 7:49 AM 7:51 AM 1
10 Anand 7:47 AM 8:37 AM 50
11 Anonymous 7:07 AM7:23 AM 7:13 AM7:28 AM 9
12 BVN PRANEETH 7:44 AM 8:37 AM 53
13 Bhavesh 7:07 AM 8:37 AM 89
14 Chetan 7:38 AM 7:47 AM 9
15 Deepansh 7:35 AM 8:37 AM7:35 AM 62
16 Devyash Bordia 7:27 AM 8:11 AM 43
17 Dikshita Basu 7:19 AM 8:37 AM 78
18 Dinesh L 7:30 AM 8:32 AM 61
19 Dr Zakir Naik 7:43 AM 7:51 AM 7
20 Fireflies.ai Notetaker 7:28 AM 8:30 AM 61
21 Gaurav 7:27 AM 8:29 AM 61
22 Gupta, Anuj 7:32 AM 7:42 AM 9
23 HK 7:45 AM 8:14 AM 29
24 Hardik Gupta 7:40 AM7:38 AM 8:37 AM7:39 AM 58
25 Himanshu Tamboli 7:10 AM7:28 AM 7:16 AM7:50 AM 26
26 Kajjal 7:19 AM 8:37 AM 78
27 Kamal Sharma 8:00 AM 8:37 AM 37
28 Kav 7:47 AM 8:03 AM 16
29 Mohammed Faraz 7:52 AM 8:24 AM 32
30 Mostlyinsane 7:27 AM 8:20 AM 53
31 Mugunthan 7:33 AM 8:33 AM 60
32 NIKHIL 7:39 AM7:45 AM 7:44 AM7:53 AM 12
33 Neeraj Jayaram 7:33 AM 8:34 AM 61
34 Pruthvi Shetty 8:04 AM 8:37 AM 32
35 Ranjith 7:26 AM 7:34 AM 8
36 Rehan Razak 7:15 AM7:18 AM7:31 AM 7:17 AM7:18 AM8:37 AM 68
37 Revanth 7:10 AM 7:15 AM 5
38 Roshan Pandey 7:21 AM 7:56 AM 34
39 Sagar Parida 7:26 AM 8:09 AM 42
40 Sanjana Birari 7:39 AM 7:41 AM 1
41 Santhosh 7:29 AM 7:46 AM 16
42 Santosh Kumar 7:36 AM 8:37 AM 61
43 Sourav Kumar 7:26 AM7:31 AM 7:31 AM8:37 AM 70
44 Sri Harish 7:06 AM 8:37 AM 90
45 Suryanshu Singh 7:16 AM 8:22 AM 65
46 Teja Kummarikuntla 7:06 AM 8:37 AM 90
47 keshav 7:40 AM 8:37 AM 57
48 reconnecting.... 7:30 AM 8:37 AM 67
49 sahib pratap singh 7:29 AM 8:37 AM 68
50 sateesh sabbineni 8:18 AM 8:32 AM 13
51 sneha gupta 7:23 AM 8:37 AM 74
52 sourabh kumar_KOLKATA_ID_3420 7:32 AM 7:43 AM 10
53 user 7:36 AM 8:18 AM 41
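In the table above, attendees who reconnected end up with concatenated strings such as "7:51 AM8:07 AM" in the time columns, because `time_agg` sums whatever it is given. It also keeps a single value when all of a person's values are equal, so two sessions of the same length would be counted once. A possible alternative, sketched here on toy rows, is pandas named aggregation with an explicit rule per column: earliest join, latest leave, and total minutes. The output column names `first_join`, `last_leave`, and `total_minutes` are made up for this example.

```python
import pandas as pd

# Toy sessions in the same shape as attn_data (the real data is private).
sessions = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Join Time": ["7:51 AM", "8:07 AM", "7:30 AM"],
    "Leave Time": ["8:07 AM", "8:33 AM", "7:31 AM"],
    "Time in Session (minutes)": [15, 25, 1],
})

# Parse the clock strings so that min/max compare chronologically.
for col in ["Join Time", "Leave Time"]:
    sessions[col] = pd.to_datetime(sessions[col], format="%I:%M %p")

per_person = sessions.groupby("Name", as_index=False).agg(
    first_join=("Join Time", "min"),
    last_leave=("Leave Time", "max"),
    total_minutes=("Time in Session (minutes)", "sum"),
)

# Show the times as HH:MM again (the parsed values carry a dummy date).
per_person["first_join"] = per_person["first_join"].dt.strftime("%H:%M")
per_person["last_leave"] = per_person["last_leave"].dt.strftime("%H:%M")
print(per_person)
```

Whether summing is the right rule for the minutes column depends on how the export reports reconnects, so treat this as one option rather than a drop-in replacement.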
atten_group_df = attn_data[['Name', 'Time in Session (minutes)', 'Email Address']].groupby('Name', as_index=False).agg(time_agg)
atten_group_df.sort_values(by=['Time in Session (minutes)'],ascending=False, inplace=True)
# factorplot was renamed to catplot, and its `size` argument to `height`
sns.catplot(x="Name", y="Time in Session (minutes)", 
            data=atten_group_df, kind="bar", 
            height = 15, aspect=2,
            palette = "muted")

# for value in plot:
#     height = value.get_height()
#     plt.text(value.get_x() + value.get_width()/2.,
#              1.002*height,'%d' % int(height), ha='center', va='bottom')

plt.xticks(rotation=45);

Individual time spent analysis of attendees

# Only attendees who stayed for at least half of the session
sns.catplot(x="Name", y="Time in Session (minutes)", 
            data=atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2], 
            kind="bar", 
            height = 8, aspect=2,
            palette = "muted")

plt.xticks(rotation=45);
atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2][['Name', 'Time in Session (minutes)']].set_index('Name')
Time in Session (minutes)
Name
Sri Harish 90
Teja Kummarikuntla 90
Bhavesh 89
Dikshita Basu 78
Kajjal 78
sneha gupta 74
Sourav Kumar 70
Rehan Razak 68
sahib pratap singh 68
Aditya Gowrish Menti 68
reconnecting.... 67
Suryanshu Singh 65
Abhiram 65
AJ 62
Deepansh 62
Neeraj Jayaram 61
Dinesh L 61
Alok Kumar 61
Fireflies.ai Notetaker 61
Gaurav 61
Santosh Kumar 61
Mugunthan 60
Akash M 60
Hardik Gupta 58
keshav 57
Mostlyinsane 53
BVN PRANEETH 53
Anand 50
atten_group_df
Name Time in Session (minutes)
44 Sri Harish 90
46 Teja Kummarikuntla 90
13 Bhavesh 89
17 Dikshita Basu 78
26 Kajjal 78
51 sneha gupta 74
43 Sourav Kumar 70
36 Rehan Razak 68
49 sahib pratap singh 68
6 Aditya Gowrish Menti 68
48 reconnecting.... 67
45 Suryanshu Singh 65
3 Abhiram 65
1 AJ 62
15 Deepansh 62
33 Neeraj Jayaram 61
18 Dinesh L 61
8 Alok Kumar 61
20 Fireflies.ai Notetaker 61
21 Gaurav 61
42 Santosh Kumar 61
31 Mugunthan 60
7 Akash M 60
24 Hardik Gupta 58
47 keshav 57
30 Mostlyinsane 53
12 BVN PRANEETH 53
10 Anand 50
16 Devyash Bordia 43
39 Sagar Parida 42
53 user 41
0 #ReligiousCorona 40
27 Kamal Sharma 37
38 Roshan Pandey 34
29 Mohammed Faraz 32
34 Pruthvi Shetty 32
23 HK 29
25 Himanshu Tamboli 26
5 Aditya 18
28 Kav 16
41 Santhosh 16
4 Abhishek's Mac Book Pro 15
50 sateesh sabbineni 13
32 NIKHIL 12
52 sourabh kumar_KOLKATA_ID_3420 10
14 Chetan 9
11 Anonymous 9
22 Gupta, Anuj 9
35 Ranjith 8
19 Dr Zakir Naik 7
37 Revanth 5
9 Amit 1
40 Sanjana Birari 1
2 ANKITKUMAR SINGH 1
len(atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2].set_index('Name')['Time in Session (minutes)'])
28
# Attendee count as a percentage of the registration count
attendance_percentage = (ATTENDEES_COUNT/REG_COUNT) * 100
print("Percentage of Students registered and attended the session {}".format(attendance_percentage))
Percentage of Students registered and attended the session 29.03225806451613
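The percentage above divides the attendee count by the registration count, so it does not check that each attendee actually appears in the registration sheet. If both sheets carried an email column, a rough cross-check could match the two on normalised email addresses. The sketch below uses toy addresses; that the registration form collected emails at all is an assumption, and `normalise` is a hypothetical helper.

```python
import pandas as pd

# Toy email columns; the real sheets are private and their column names may differ.
registered = pd.Series(["a@x.com", "B@x.com ", "c@x.com"])
attended = pd.Series(["b@x.com", None, "z@x.com"])

def normalise(emails: pd.Series) -> set:
    """Drop missing values, trim whitespace, and lower-case the addresses."""
    return set(emails.dropna().str.strip().str.lower())

matched = normalise(registered) & normalise(attended)
match_rate = 100 * len(matched) / len(normalise(registered))
print(sorted(matched), round(match_rate, 1))
```

Attendees who joined without an email (shown as NaN in the attendee table) cannot be matched this way, so such a figure would be a lower bound.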

Summary [Report]

Registration Data Analysis

  • Name of the Event: Developer 101 | Data Science Uncovered
  • No of registrations: 186
  • Registration Count with Batch filter

    • 2021 : 76
    • Professional : 69
    • 2022 : 36
    • 2023 : 5
  • No of registrations Without prior knowledge of Data Science : 115 [61.82795698924731%]

  • No of registrations With prior knowledge on Data Science : 71 [38.17204301075269%]
  • No of registrations who are Beginners in Python Programming language : 76
  • No of registrations who are Intermediate in Python Programming language : 98
  • No of registrations who are Advanced in Python Programming language: 12
  • Registrations wish to use Data Science in
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision 39
    • Business Analytics, Prediction / Forecast 24
    • Business Analytics, Prediction / Forecast, Natural Language Processing 16
    • Prediction / Forecast, Natural Language Processing, Computer Vision 15
    • Business Analytics 14
    • Prediction / Forecast, Natural Language Processing 11
    • Computer Vision 11
    • Natural Language Processing, Computer Vision 10
    • Prediction / Forecast 10
    • Business Analytics, Computer Vision 7
    • Business Analytics, Natural Language Processing 6
    • Natural Language Processing 4
    • Prediction / Forecast, Computer Vision 4
    • Business Analytics, Natural Language Processing, Computer Vision 3
    • Business Analytics, Prediction / Forecast, Computer Vision 2
    • Natural Language Processing, Computer Vision, Deep Learning 1
    • Business Analytics, Prediction / Forecast, Information Security 1
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision, AI, Implementation in Web Apps 1
    • Computer Vision, GAN 1
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision, Almost every area 1
    • Business Analytics, Prediction / Forecast, Use in data analysis for payments , health science and machine learning 1
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision, BFSI,Medical health care, image processing etc.. 1
    • AI Music Composer 1
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision, Specific Industry 1
    • Business Analytics, Prediction / Forecast, Natural Language Processing, Computer Vision, Recommendation 1

Webinar Attendees Data Analysis

  • No of Attendees: 54
  • No of attendees who spent at least half of the session: 28
  • Percentage of Students registered and attended the session 29.03225806451613%
  • Attendees who spent at least half of the session
      Sri Harish
      Teja Kummarikuntla
      Bhavesh
      Dikshita Basu
      Kajjal
      sneha gupta
      Sourav Kumar
      Rehan Razak
      sahib pratap singh
      Aditya Gowrish Menti
      reconnecting....
      Suryanshu Singh
      Abhiram
      AJ
      Deepansh
      Neeraj Jayaram
      Dinesh L
      Alok Kumar
      Fireflies.ai Notetaker
      Gaurav
      Santosh Kumar
      Mugunthan
      Akash M
      Hardik Gupta
      keshav
      Mostlyinsane
      BVN PRANEETH
      Anand