Understanding A/B Testing and the Statistics Behind It
1. A trivial example: flipping a coin
Question:
A coin is tossed 100 times: heads appear 65 times and tails 35 times. Construct a hypothesis test to check whether the coin is fair ($P_{head}=P_{tail}=\frac{1}{2}$).
Hypotheses: $$H_0:P_{h}=0.5, \quad H_a:P_{h}\neq 0.5$$
(1) Z test
In a binomial experiment each trial is either 0 or 1, so the sample mean is also the proportion of successes. For this experiment we already know that the number of heads in $n$ tosses follows a binomial distribution. Let $X$ be the count of successes; then $$\mu_X = np$$ $$\sigma^2_X=np(1-p)$$ and, for large $n$, the normal approximation gives $$X \overset{\text{approx.}}{\sim} N(np, np(1-p))$$
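As a sketch of how the numbers work out for this question: under $H_0$, $np = 100 \times 0.5 = 50$ and $\sqrt{np(1-p)} = \sqrt{25} = 5$, so $z = (65 - 50)/5 = 3$. The function below (`binomial_z_test` is a hypothetical helper name, not from the post) computes this z statistic and the two-sided p-value from the standard normal tail via `math.erfc`, without a continuity correction.

```python
import math

def binomial_z_test(successes: int, n: int, p0: float = 0.5):
    """Two-sided one-sample z test for a proportion, using the normal
    approximation X ~ N(n*p0, n*p0*(1-p0)) under H0 (no continuity correction)."""
    mean = n * p0                      # mu_X = n p
    sd = math.sqrt(n * p0 * (1 - p0))  # sigma_X = sqrt(n p (1 - p))
    z = (successes - mean) / sd
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = binomial_z_test(65, 100)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 3.00, p-value = 0.0027
```

With a two-sided p-value of about 0.0027, well below a conventional 0.05 significance level, this sketch would reject $H_0$: 65 heads in 100 tosses is unlikely for a fair coin.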
How to Use Data Science to Find A Data Science Job? Step 1: Find H1B Sponsors
As an international student who had just earned a master's degree in Statistics, I was looking for a data analyst / data scientist position in the US. For international students, one concern is visa status: it's important to find an employer who sponsors H1B visas. There are some websites like myvisajobs ...
Customer Segmentation: Customer Types Analysis for a Wholesale Distributor
Getting Started
In this project, you will analyze a dataset of various customers' annual spending amounts (reported in monetary units) across diverse product categories to uncover its internal structure. One goal of this project is to best describe the variation among the different types of customers that a wholesale distributor interacts with. Doing so would give the distributor insight into how best to structure its delivery service to meet the needs of each customer.
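The excerpt doesn't show the project's method. One common way to describe customer types is to cluster log-transformed spending (the log reduces the skew typical of spending amounts); below is a minimal NumPy k-means sketch under that assumption, not necessarily the approach the post takes. The two spending columns and all six rows are made-up values for illustration, not the wholesale dataset, and `kmeans` is a hypothetical helper rather than a library function.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: returns a cluster label per row and the cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct rows chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each customer to the nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as its cluster mean; keep the old one if a cluster empties
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical annual spending: columns are [fresh, grocery]
spending = np.array([
    [12000, 1500], [15000, 2000], [13000, 1800],  # fresh-heavy customers
    [2000, 14000], [2500, 16000], [1800, 15000],  # grocery-heavy customers
])
X = np.log(spending)  # log transform to reduce skew
labels, centers = kmeans(X, k=2)
print(labels)
```

On real data, the number of clusters would usually be chosen by comparing a score (for example, silhouette) across several values of `k` rather than fixed in advance.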
Summary of SQL Questions on Leetcode
I recently solved all the database questions on Leetcode.com. The questions cover most common SQL queries, including JOIN, ranking, and other SQL basics. I provide the answers along with explanations in this post as a way to consolidate my SQL knowledge.
The questions on Leetcode only support MySQL, so you can install MySQL on your laptop to test before submitting a solution, or use an online SQL platform such as rextester to test your query.
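For quick local checks, a lighter-weight alternative to a full MySQL install (an assumption, not the post's setup) is Python's built-in `sqlite3` module with an in-memory database. The `Scores` table and its rows below are made up in the style of a Leetcode ranking question. The correlated subquery counts the distinct scores greater than or equal to each row's score, which yields a dense rank using plain SQL that should also run on MySQL. SQLite and MySQL still differ elsewhere (types, some functions), so a query that passes here may need adjustments before submitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO Scores (id, score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# Dense rank: a score's rank is the number of distinct scores >= it
query = """
SELECT s.score,
       (SELECT COUNT(DISTINCT t.score)
          FROM Scores t
         WHERE t.score >= s.score) AS score_rank
  FROM Scores s
 ORDER BY s.score DESC
"""
rows = conn.execute(query).fetchall()
for score, score_rank in rows:
    print(score, score_rank)
conn.close()
```

Tied scores share a rank and the next distinct score gets the next integer, so the two 4.0 rows are both ranked 1 and 3.85 is ranked 2.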
Terrorist Attacks in China From 1970-2015
Terrorism is a problem for nations worldwide. Fortunately, according to this dataset, from 1970 to 2015 (in practice, 1989 to 2015) there were only 242 terrorist attacks in China, in which 2,917 people were killed or wounded. It is also striking that 128 of the 242 attacks happened in Xinjiang (Sinkiang, Uygur).
Within this ...
Which Asian Food is More Popular in Seattle?
I moved to Seattle three months ago and found there were many Asian restaurants here. Some of the Chinese food is really authentic. However, while enjoying the food in some Chinese restaurants, I noticed that many of the customers were Asian, which seemed to suggest that Americans don't like Chinese food ...
Intro to NumPy and Pandas for Data Analysis
This is an introduction to using NumPy and Pandas, based on the course Intro to Data Analysis on Udacity. It covers NumPy and Pandas data structures, basic operations, and functions, with code examples. I learned a lot from this course and am sharing these notes for your reference.
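The notes themselves aren't part of this excerpt. As a minimal sketch of the kind of basics described (not the course's own examples; all values are made up), the snippet shows vectorized NumPy arithmetic, index alignment when adding two pandas Series, and a column-wise DataFrame mean.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on an ndarray, no explicit Python loop
a = np.array([1, 2, 3, 4])
doubled = a * 2
print(doubled)   # [2 4 6 8]
print(a.mean())  # 2.5

# pandas Series: values plus an index; arithmetic aligns on index labels,
# and labels present in only one Series produce NaN
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2
print(total)     # a: NaN, b: 12.0, c: 23.0, d: NaN

# DataFrame: a table of named columns; mean() is computed per column
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
col_means = df.mean()
print(col_means)  # x: 2.0, y: 5.0
```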
Table of Contents
Data Wrangling: Quick Guide for dplyr, data.table and R built-in data.frame
The dplyr and data.table parts are based on the DataCamp courses Data Manipulation in R with dplyr and Data Manipulation in R, the data.table way. I hope the descriptions and code in this guide help you clearly understand basic data wrangling in R.
dplyr
Overview ...
Titanic: A Tutorial to Achieve 0.82297
For the original knitr HTML version, see here. It seems Pelican's plugin doesn't fully support Rmd.
1. Introduction
The sinking of the Titanic in 1912 was a sensational tragedy, in which 1502 of the 2224 passengers and crew members were killed. Kaggle provided this dataset ...