Photo by Ben White on Unsplash

After leaving my previous job in June this year, I began job hunting, and before I knew it, the search had turned into a four-month journey. Every day of those four months followed the same mundane routine: logging into LinkedIn, scrolling job vacancy websites (Jobstreet, MyCareersFuture, etc.), and waiting for my phone to ring, hoping it was a recruiter or company telling me I had been shortlisted for a job. …


Photo by Chelsea Bock on Unsplash

Decision Trees are one of the most common Machine Learning algorithms in the world of data science because they are easy to implement and understand, even if you have limited knowledge of how Machine Learning works. An extension of the Decision Tree algorithm is Random Forests, which grows multiple trees on random subsets of the data and takes the most common value (for classification) or the average value (for regression) as the final result. This article focuses on their use as classification algorithms, which categorize the data into distinct classes. It will introduce both algorithms in detail and implement them in PySpark.
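The aggregation step described above can be sketched in plain Python. This is only an illustration of "most common or average value", not the PySpark implementation the article goes on to use; the function names `aggregate_classification` and `aggregate_regression` are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Majority vote: return the class label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average the numeric predictions of all trees."""
    return mean(tree_predictions)


# Hypothetical predictions from three trees for a single data point.
class_votes = ["spam", "ham", "spam"]
value_votes = [2.0, 4.0, 6.0]

final_class = aggregate_classification(class_votes)
final_value = aggregate_regression(value_votes)
```

Each tree votes independently, so a single badly fitted tree has limited influence on the final result.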

Decision Tree Classifier — Introduction

A decision tree classifier, as the name suggests, makes decisions using a tree-based model. The algorithm considers all data features, chooses the one that best separates the classes (typically measured by an impurity criterion such as Gini impurity or entropy), performs a binary split, and repeats recursively until it successfully splits the data in all leaves (or reaches the maximum depth). …
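A single split-selection step might look like the sketch below. It assumes Gini impurity as the criterion for comparing candidate splits, which may differ from the measure the full article uses; `gini` and `best_split` are hypothetical helper names, and the toy data is made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature and threshold; return (feature, threshold, impurity)
    for the binary split with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Toy data: two numeric features per row, one label per row.
rows = [[2.0, 7.0], [3.0, 1.0], [8.0, 6.0], [9.0, 2.0]]
labels = ["no", "no", "yes", "yes"]
split = best_split(rows, labels)
```

A full tree would apply `best_split` again to the left and right subsets until every leaf is pure or a maximum depth is reached.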


By Rhey T. Snodgrass & Victor F. Camp, 1922 — Image:Intcode.png and Image:International Morse Code.PNG, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3902977

In one of my previous posts, I designed a simple Morse Code Decoder in Python that accepts user input and outputs it in its original alphanumeric form. One limitation of the decoder is that it does not allow the user to input sentences. Remember that we chose to represent each letter or number using a series of ‘0’ and ‘1’, where ‘0’ represents a dot and ‘1’ represents a dash. Letters and numbers are then separated by a ‘*’ in our decoder, as shown in the screenshot below.

Image by author — Python output of Simple Morse Code Decoder
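The encoding described above (‘0’ for a dot, ‘1’ for a dash, ‘*’ between characters) could be decoded along these lines. This is a minimal sketch rather than the author's original decoder; the names `DECODE_TABLE` and `decode` are hypothetical, and returning ‘?’ for unknown symbols is an assumption.

```python
# Standard International Morse Code for A-Z followed by 0-9.
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. "
    "--.- .-. ... - ..- ...- .-- -..- -.-- --.. "
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----."
).split()

# Convert dots to '0' and dashes to '1', then index by the binary string.
TO_BINARY = str.maketrans(".-", "01")
DECODE_TABLE = {code.translate(TO_BINARY): ch for ch, code in zip(CHARACTERS, CODES)}


def decode(message):
    """Decode a '*'-separated string of 0/1 symbols into letters and numbers.

    Unknown symbols are returned as '?' (an assumption for this sketch).
    """
    return "".join(DECODE_TABLE.get(symbol, "?") for symbol in message.split("*"))


word = decode("0000*0*0100*0100*111")
```

Because ‘*’ only separates characters, this version can decode a single word but has no way to mark the boundary between words.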

In this post, we will improve our simple Morse Code Decoder so that it can decipher sentences as well. Furthermore, we can implement checks in the decoder that tell us how frequently each letter or number, word, or sentence type has been decoded. …
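The frequency checks could be sketched as a tally over text that has already been decoded. This assumes decoded sentences arrive as plain strings with words separated by spaces; the function name `tally_frequencies` is hypothetical, and sentence types are left out because the excerpt does not define them.

```python
from collections import Counter


def tally_frequencies(decoded_sentences):
    """Count how often each character and each word appears in decoded text."""
    char_counts = Counter()
    word_counts = Counter()
    for sentence in decoded_sentences:
        words = sentence.split()
        word_counts.update(words)
        char_counts.update(ch for word in words for ch in word)
    return char_counts, word_counts


# Hypothetical decoder output.
chars, words = tally_frequencies(["SOS SOS", "HELLO"])
```

Keeping the counters separate from the decoding step means the same tally works regardless of how sentences are delimited in the encoded input.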


Hands-on Tutorials

Helping communication practitioners get actionable insights through open data sources like Twitter

Photo by Antenna on Unsplash

Open data sources are one of the best gifts for data scientists and analysts, as they allow them to draw valuable insights for free without having to worry about data licenses. Twitter is one of the most popular social media applications in the world: it is free, and it allows users to tweet on any topic that comes to mind. This article focuses on how we can use Twitter data through R programming to extract valuable insights, and how to communicate these findings to the relevant stakeholders using Tableau.

Problem Statement

“How might we help the communication practitioners to get actionable insights from Twitter so that they can create more effective communication that caters to the needs & concerns of general…


Image by author

Upon graduating from my master’s in Australia, I landed a job at a big tech firm as a data analyst. The remuneration package was awesome and the office perks were nothing short of amazing. I got to wake up later than the typical office starting hour of 9 am, and I was able to commute to and from work outside peak hours. Best of all, I was working on something I was passionate about: data analytics. Life’s good, isn’t it? Nope, things started to turn bad a few weeks into the new job.

I began to feel constantly tired and stressed about work. Panic attacks, difficulty breathing, and sleepless nights soon followed. Work was the only thing on my mind after office hours and even on weekends. I started to beat myself up over every little mistake, and I couldn’t find the patience to talk to my loved ones or the joy to pursue my hobbies. And perhaps the most significant sign of all: thoughts of suicide. …


Making Sense of Big Data

Exploring and gathering insights from 12 million US Health Insurance Marketplace records

Photo by Luis Melendez on Unsplash

In my previous post, we explored graphical approaches to Exploratory Data Analysis (EDA) in Python. We discussed how line charts, regression lines, and fancy motion charts could be used to gather insights on Population, Income, and Gender Equality in Education data. That data, however, was relatively small at about 200 rows. In this post, we will explore a bigger dataset of around 12 million records and look into other ways to perform EDA in Python.

Dataset

The dataset we will be using contains data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace. It was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS) and was subsequently published on Kaggle. …


Investigating Population, Gender Equality in Education & Income for Singapore, United States and China

Photo by Jack Finnigan on Unsplash

Exploratory Data Analysis (EDA) is one of the most important aspects of every data science or data analysis problem. It gives us a greater understanding of our data and can reveal hidden insights that aren’t obvious at first glance. The first article I wrote on Medium was also on performing EDA, in R; you can check it out here. This post will focus on graphical EDA in Python using matplotlib, regression lines, and even a motion chart!

Dataset

The dataset we are using for this article can be obtained from Gapminder. We will drill down into Population, Gender Equality in Education, and Income.


Comparing the effectiveness of Tableau and R ggplot to deliver the key messages in Coral Bleaching Analysis

Photo by Bawah Reserve on Unsplash

Tableau and R are two common data visualisation tools: the former is known for its simple, beginner-friendly functions, and the latter for its extensive user interaction possibilities. How do we decide which visualisation tool is easier to implement or more effective in conveying the key insights to the relevant stakeholders? This article will look into this question and hopefully arrive at a consensus.

Dataset

The dataset we will be using contains coral bleaching percentages recorded in the Great Barrier Reef from 2010 to 2017. There are 5 different coral types in total, namely Blue Corals, Hard Corals, Sea Fans, Sea Pens and Soft Corals. …


Photo by Ruthson Zimmerman on Unsplash

Having attended numerous data scientist job interviews, I was asked this particular question 75% of the time:

Can you tell me the key differences between Linear Regression and Logistic Regression?

To be honest, you can easily google the answer to this question, as it’s really common in the world of data science. Still, I thought I should write a post that discusses the differences and lists them in order of importance, so that you can just quote the few most important ones. We all know how stressful it is to prepare for a job interview, let alone memorize the concepts and theories. …
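One widely cited difference can be shown in a few lines: linear regression outputs an unbounded continuous value, while logistic regression passes the same linear combination through a sigmoid to produce a probability between 0 and 1. The sketch below is an illustration under that assumption and does not reproduce the post's own ordered list; the function names and parameter values are hypothetical.

```python
import math


def linear_prediction(x, weight, bias):
    """Linear regression: the output is an unbounded continuous value."""
    return weight * x + bias


def logistic_prediction(x, weight, bias):
    """Logistic regression: the linear score is squashed into (0, 1)
    by the sigmoid function and read as a class probability."""
    z = weight * x + bias
    return 1.0 / (1.0 + math.exp(-z))


# Same hypothetical parameters, very different kinds of output.
continuous_value = linear_prediction(10, weight=2, bias=1)
probability = logistic_prediction(10, weight=2, bias=1)
```

The first result is used directly as a predicted quantity, while the second is usually compared against a threshold such as 0.5 to assign a class.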


By Rhey T. Snodgrass & Victor F. Camp, 1922 — Image:Intcode.png and Image:International Morse Code.PNG, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3902977

Morse code is a method used in telecommunication where each letter, number and punctuation mark is represented by a series of dots, dashes and spaces. It was invented by Samuel Morse in the 1830s and has been heavily used by navies. This article describes the process of building a simple Morse Code decoder in Python.

Morse Code Representation in Python

As seen in the image above, each letter and number is represented by a series of dots and dashes. …
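One straightforward way to hold this mapping in Python is a dictionary keyed by character. The sketch below is an assumption about the representation, since the excerpt is truncated before the author's own version; the names `MORSE_CODE` and `encode`, and the use of a space between encoded characters, are hypothetical choices.

```python
# International Morse Code for letters and digits, as dot/dash strings.
MORSE_CODE = {
    "A": ".-",    "B": "-...",  "C": "-.-.",  "D": "-..",   "E": ".",
    "F": "..-.",  "G": "--.",   "H": "....",  "I": "..",    "J": ".---",
    "K": "-.-",   "L": ".-..",  "M": "--",    "N": "-.",    "O": "---",
    "P": ".--.",  "Q": "--.-",  "R": ".-.",   "S": "...",   "T": "-",
    "U": "..-",   "V": "...-",  "W": ".--",   "X": "-..-",  "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def encode(text):
    """Encode letters and digits as Morse, one space between characters.

    Characters without an entry in MORSE_CODE are skipped in this sketch.
    """
    return " ".join(MORSE_CODE[ch] for ch in text.upper() if ch in MORSE_CODE)


signal = encode("SOS")
```

Inverting the dictionary (`{code: ch for ch, code in MORSE_CODE.items()}`) gives the lookup table a decoder would need.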

About

Kieran Tan Kah Wang

Data Analytics | Artificial Intelligence | Data Visualization | Perspective | https://www.linkedin.com/in/tankahwang/
