Download udemy, inc. complete introduction to business data analysis

Udemy Data Analytics Project

I carried out data analysis on Udemy courses from different subjects to understand where opportunities to increase revenue may lie, and track performance of courses.

Project overview & Business problem:

I am working as a data analyst for the education tech company Udemy. I have been asked by my manager, Head of Curriculum at Udemy, to present the data on course revenue, and I have been provided with data on courses from different topics to understand where opportunities to increase revenue may lie, and track the performance of courses.

My manager has suggested encouraging Web Development courses to charge more because she believes that these are the most popular courses. She needs to send a report to the CEO in the next three weeks with recommendations on how we can increase our next quarterly earnings.

You can get the datasets using this link

Tools used to carry out this project include:

  1. Microsoft Office Excel: Which I used primarily to clean the datasets. I carried out the data cleaning using the power query editor.
  2. Microsoft SQL Server: I used this tool to write SQL codes in other to generate key insights into the data.
  3. Microsoft Power BI: This used to carry out visualization on the insights from my analysis on MS SQL Server.

Data Cleaning Process:

Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.

Data cleaning plays an important role in the analytical process and making sure that the answers we uncover are reliable and of a high quality.

Below are the data cleaning processes I carried out using the power query editor in Microsoft Office Excel:

  1. I split the “published timestamp” column using the split column by delimiter tool.
Published timestamp column with date and timeSplit column using the delimiter “T”Result from split column by delimiter.

2. I changed the data type for each column to their correct data formats.

Made Course Title column a text (string) data type.

3. I added a conditional column which tells whether a course is “Free or Paid”. I used the data in the price column to populate this column.

If price = 0 output will be “Free”, else “Paid”Result of the conditional column.

4. I deleted irrelevant columns that would not be useful for my analysis.

Removal of URL column

5. I ensured that I had clear and concise names for headers and used dashes or underscores in between words to make it easier to parse later on.

6. I removed duplicate rows of data.

7. I removed blank rows.

Tasks carried out on Microsoft SQL Server.

After carrying out the above data cleaning processes on Microsoft Excel, I imported the cleaned datasets into Microsoft SQL Server Management Studio to write queries that would aid in generating key insights in the data.

Asides writing queries to generate insights into the data, I also wrote queries to append the datasets using the UNION operator to make it a whole dataset, and I wrote queries to create a new column called “Revenue”. Revenue calculated as Price * Number of Subscribers.

You can hit this link which will direct you to my GitHub where you can view my SQL Codes.

Analysis and Insights.

Note: In Microsoft Power BI, I wrote DAX formulas to create a calendar table, the calendar table contained the date, year, month, month number (to sort the month in order), and the quarters (Q1, Q2, Q3, and Q4). I also wrote DAX formulas to create Measures that I used to generate the visuals.

  • From a total of 3,676 courses, 3,365 (91.54%) are paid courses while 311 (8.46%) are free courses.
  • As regards subject ratings, the subject graphic design has the highest average rating with a value of 0.73, followed by business finance with an average rating of 0.69. Musical instrument has the lowest average rating of 0.31.
  • Web Development on an average has the highest content duration with an average value of 5.59 minutes, while Musical Instrument has the lowest average content duration with a value of 2.85 minutes.
  • Four out of the top 5 courses that generated the highest revenue are courses under the subject Web Development, the third highest revenue generating course is a course under the subject Musical Instruments.
  • Web Development generated the highest total revenue with a value of $631,082,465, while Musical Instruments generated the lowest total revenue with a value of $53,359,055.
  • Courses in the All Levels category have the highest number of subscribers, followed by Beginner level courses. Expert level courses have the smallest amount of subscribers.
  • Of the 3,676 available in our database, there are a total of 1,925 All Levels courses, 1,271 Beginner Level courses, 422 Intermediate Level courses, and 58 Expert level courses.
  • Revenue over the years grew steadily from 2011 ($11,643,429) to 2015 ($314,510,395) and dropped slightly in 2016 to a value of $277,694,340, but there was a huge drop in revenue generation between the years 2016 and 2017. In 2017 a total revenue of $92,970,175 was generated.

Power BI Visualization:

Recommendations:

To increase our next quarterly earnings we need to:

  1. Convert more free courses to paid courses.
  2. Review the course content for musical instruments courses, because from the results of the analysis musical instruments courses are doing poorly in terms revenue generation and ratings.
  3. Graphic Design courses should be reviewed also because the revenue generated from it is really low as compared to Business Finance and Web Development courses.
  4. Create more Expert Level courses and Intermediate Level courses. Right now we have a total of 422 intermediate level courses and 58 Expert level courses.