We hear about people using programming to analyse whether certain medications are effective, derive insights into customer profiles, predict how much a property will cost 10 years down the road, and the list goes on. As I was in the midst of chasing a girl, it suddenly struck me that it might be possible to use programming to do something special for my love confession, especially since the girl wasn’t an expert in this area (if she is, I guess you have to up your game haha). …
After leaving my previous job in June this year, I began job hunting, and before I knew it, it had turned into a 4-month-long journey. During these 4 months, every single day was the same mundane process of logging into LinkedIn, scrolling job vacancy websites (Jobstreet, MyCareersFuture, etc.), and waiting for my phone to ring, hoping it was a recruiter or company telling me I had been shortlisted for a job. …
Decision Trees are one of the most common Machine Learning algorithms in the world of data science because they are easy to implement and understand, even if you have limited knowledge of how Machine Learning works. An extension of the Decision Tree algorithm is Random Forests, which simply grows multiple trees at once and takes the most common value (or the average value) across the trees as the final result. Both can be used as classification algorithms that categorize the data into distinct classes. This article will introduce both algorithms in detail and implement them in PySpark.
A decision tree classifier, as the name suggests, makes…
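The aggregation step mentioned above (taking the most common or the average value across trees) can be sketched in plain Python. This is only an illustration of the voting idea, not the PySpark implementation the article goes on to cover; the function names `majority_vote` and `average_prediction` and the sample predictions are made up for the example.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the class predicted by the most trees (classification).

    Ties go to whichever class was seen first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


def average_prediction(predictions):
    """Return the mean of the trees' numeric predictions (regression)."""
    return mean(predictions)


# Hypothetical outputs from three trees for a single input row
tree_classes = ["spam", "ham", "spam"]
tree_values = [10.0, 20.0, 30.0]

print(majority_vote(tree_classes))      # class chosen by the forest
print(average_prediction(tree_values))  # value chosen by the forest
```

A real forest also trains each tree on a different random sample of rows and features, which this sketch leaves out.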
In one of my previous posts, I designed a simple Morse Code Decoder in Python that accepts user input and outputs it in its original alphanumeric form. One of the limitations of the decoder is that it does not allow the user to input sentences. Remember that we chose to represent each letter or number using a series of ‘0’s and ‘1’s, where ‘0’ represents a dot and ‘1’ represents a dash. Each letter or number is then separated by a ‘*’ in our decoder, as shown in the screenshot below.
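To make the encoding concrete, here is a minimal sketch of a decoder for the scheme described above: ‘0’ for a dot, ‘1’ for a dash, and ‘*’ between characters. It is not the code from the original post; the names `MORSE`, `DECODE` and `decode`, and the choice to show ‘?’ for unknown codes, are assumptions made for illustration.

```python
# Standard International Morse Code for letters and digits
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}

# Re-key the table using the 0/1 encoding: dot -> '0', dash -> '1'
TO_BITS = str.maketrans({".": "0", "-": "1"})
DECODE = {code.translate(TO_BITS): char for char, code in MORSE.items()}


def decode(message, sep="*"):
    """Decode a string such as '000*111*000' into letters and digits.

    Unknown codes are shown as '?' (an assumption for this sketch).
    """
    symbols = [s for s in message.split(sep) if s]
    return "".join(DECODE.get(s, "?") for s in symbols)


print(decode("0000*0*0100*0100*111"))
```

Because every character is split on ‘*’ alone, this sketch has the same limitation as the decoder described above: it has no way to mark the gap between words.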
Open data sources are one of the best gifts for data scientists or analysts as they allow them to draw valuable insights for free, without having to worry about data licenses. Twitter is one of the most popular social media applications in the world as it’s free and allows users to tweet on any topic that comes to mind. This article will focus on how we can use Twitter through R programming to extract valuable insights and communicate these findings to the relevant stakeholders using Tableau.
“How might we help the communication practitioners to get actionable insights…
Upon graduating from my master’s in Australia, I managed to land a job in a big tech firm as a data analyst. The remuneration package was awesome and the office perks were nothing short of amazing. I got to wake up later than the typical office starting hour of 9 am, and was able to commute to and from work outside peak hours. Best of all, I was working on something that was my passion: data analytics. Life’s good, isn’t it? Nope, things started to turn bad a few weeks into the new job.
I began to feel constantly…
In my previous post, we explored graphical approaches in Python to perform Exploratory Data Analysis (EDA). We discussed how line charts, regression lines and fancy motion charts could be used to gather insights from Population, Income and Gender Equality in Education data. The data, however, was relatively small at about 200 rows. In this post, we will explore bigger data of around 12 million records, and look into other ways to perform EDA in Python.
The dataset we will be using contains data on health and dental plans offered to individuals and small businesses through the…
Exploratory Data Analysis (EDA) is one of the most important aspects of every data science or data analysis problem. It gives us a greater understanding of our data and can uncover hidden insights that aren’t obvious to us. The first article I wrote on Medium was also about performing EDA, in R; you can check it out here. This post will focus more on graphical EDA in Python using matplotlib, regression lines and even motion charts!
The dataset we are using for this article can be obtained from Gapminder, and we will drill down into Population, Gender Equality in Education and…
Tableau and R are two common data visualisation tools, where the former is known for its simple and beginner-friendly functions, and the latter for its extensive user interaction possibilities. How do we decide which visualisation tool is easier to implement or more effective in conveying the key insights to the relevant stakeholders? This article will look into this, and hopefully arrive at a consensus.
The dataset we will be using contains coral bleaching percentages recorded in the Great Barrier Reef from 2010 to 2017. There are a total of 5 different coral types, namely Blue Corals, Hard Corals, Sea…
Having attended numerous data scientist job interviews, I was asked this particular question 75% of the time:
Can you tell me the key differences between Linear Regression and Logistic Regression?
To be honest, you can easily Google the answer to this question as it’s really common in the world of data science, but I thought I should try writing a post to discuss the differences and list them in order of importance so that you can just quote the few most important ones. We all know how stressful it is to prepare for a job interview, let alone remembering…
Data Analytics | Artificial Intelligence | Data Visualization | Perspective | https://www.linkedin.com/in/tankahwang/