Theresa Welchy: Practice DS interview questions through email newsletter

Ace your next data science interview

Get better at data science interviews by solving a few questions per week.
Join 10,000+ other data scientists and analysts practicing for interviews!

We will never spam. One-click unsubscribe.

How it works

1 We write questions

Get relevant data science interview questions frequently asked at top companies.

2 You solve them

Solve the problem before receiving the solution the next morning.

3 We send you the solution (Premium)

Check your work and get better at interviewing!

The schedule

Sample questions

Sample question 1: Statistical knowledge

Suppose there are 15 different color crayons in a box. Each time one obtains a crayon, it is equally likely to be any of the 15 types. Compute the expected # of different colors that are obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation)

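We number the crayon colors from 1 to 15. Let \(X_i\) be an indicator variable equal to 1 if color i appears at least once among the 5 crayons obtained, and 0 otherwise.

Each crayon is drawn independently and misses color i with probability \(\frac{14}{15}\). So,
\(E(X_i) = \Pr\{\text{at least one crayon of color } i \text{ is in the set of 5}\}\)
\(E(X_i) = 1 - \Pr\{\text{no crayon of color } i \text{ is in the set of 5}\}\)
\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)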

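Therefore, by linearity of expectation, the expected # of different colors is:

\( E\left(\sum_{i=1}^{15} X_i\right) = \sum_{i=1}^{15} E(X_i)\)
\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)
\( \approx 4.38\)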
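One way to sanity-check the result above is to compare the closed-form value against a quick simulation. The sketch below is only an illustration: the names `NUM_COLORS`, `NUM_DRAWS`, and `simulate`, the trial count, and the seed are all assumptions made for this example.

```python
import random

NUM_COLORS = 15  # crayon colors in the box
NUM_DRAWS = 5    # crayons obtained per set

# Closed-form answer from linearity of expectation: 15 * (1 - (14/15)^5)
exact = NUM_COLORS * (1 - ((NUM_COLORS - 1) / NUM_COLORS) ** NUM_DRAWS)


def simulate(trials=20_000, seed=0):
    """Estimate the expected number of distinct colors by repeated sampling."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_distinct = 0
    for _ in range(trials):
        # Each crayon is equally likely to be any of the colors
        draws = [rng.randrange(NUM_COLORS) for _ in range(NUM_DRAWS)]
        total_distinct += len(set(draws))
    return total_distinct / trials


estimate = simulate()
print(f"closed form: {exact:.4f}")
print(f"simulated:   {estimate:.4f}")
```

The two printed values should agree to roughly two decimal places; increasing `trials` tightens the simulated estimate.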

Sample question 2: Coding/computation

Given a dataframe, df, return only those rows which have missing values.
For example:

Name	age	favorite_color	grade	name
Willard Morris	20	blue		Willard Morris
Al Jennings	19	red	92	Al Jennings
	22	yellow	95	Omar Mullins
Spencer McDaniel	21	green	70	Spencer McDaniel

Will return...

Name	age	favorite_color	grade	name
Willard Morris	20	blue		Willard Morris
	22	yellow	95	Omar Mullins


# Written in Python (pandas)

# First, build a boolean Series that flags rows containing nulls, using isnull() and any():
#   df.isnull().any(axis=1) returns True, False, True, False for the example above

# Then use that Series as a mask on the dataframe to keep only the rows with missing values
df[df.isnull().any(axis=1)]
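To try the one-liner end to end, the sample table can be rebuilt as a small dataframe. This is a sketch under assumptions: the blank cells in the table are taken to be missing values (`NaN`), and the variable names `mask` and `rows_with_missing` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample table; blank cells are assumed to be missing (NaN)
df = pd.DataFrame({
    "Name": ["Willard Morris", "Al Jennings", np.nan, "Spencer McDaniel"],
    "age": [20, 19, 22, 21],
    "favorite_color": ["blue", "red", "yellow", "green"],
    "grade": [np.nan, 92, 95, 70],
    "name": ["Willard Morris", "Al Jennings", "Omar Mullins", "Spencer McDaniel"],
})

# True for every row that has at least one missing value
mask = df.isnull().any(axis=1)

# Boolean indexing keeps only the flagged rows
rows_with_missing = df[mask]
print(rows_with_missing)
```

Passing `axis=1` makes `any` collapse across columns, so the mask has one entry per row; without it, `any` would report per column instead.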

Sample question 3: Coding/computation

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, n, write a function using Python to return whether or not the number is prime. Additionally, if the inputted number is prime, save it into an array, a.

We'll set up a function below to determine whether or not a given number is prime, using simple if/else statements. Additionally, when a number is defined as prime we'll append it to our array, a.


# First, define an empty list to serve as our array, a, for storing prime numbers
a = []

# Define a function to identify whether or not a given number, x, is prime
def is_prime(x):
    if x < 2:
        # Numbers below 2 are not prime, by definition
        # (a prime is a natural number greater than 1)
        return False
    # For all other numbers (x >= 2), try every candidate divisor from 2 to x - 1
    for n in range(2, x):
        # If any smaller number divides x evenly, x is not prime
        if x % n == 0:
            return False
    # No divisor was found, so x is prime: save it to our array, a
    a.append(x)
    return True
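The solution above checks every candidate divisor below x. A common refinement, shown here as an alternative sketch rather than as part of the original solution, stops at the integer square root of x, since any factor larger than that must pair with a smaller one. The name `is_prime_fast` and the duplicate check before appending are assumptions added for this example; `math.isqrt` requires Python 3.8 or later.

```python
import math

a = []  # primes found so far


def is_prime_fast(x):
    """Trial division, testing divisors only up to floor(sqrt(x))."""
    if x < 2:
        return False
    for n in range(2, math.isqrt(x) + 1):
        if x % n == 0:
            return False
    # Assumption for this sketch: store each prime once, even on repeat calls
    if x not in a:
        a.append(x)
    return True


results = [is_prime_fast(k) for k in (1, 2, 9, 13, 13, 25)]
print(results)  # [False, True, False, True, True, False]
print(a)        # [2, 13]
```

In an interview setting, mentioning the square-root bound is a natural follow-up once the simple version works.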

See what others are saying

Dylan

I've been on the mailing list since the initial beta a few months ago, and found the questions to be very helpful with my data science interview at Facebook!

Melissa

I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.

Richard

Data Interview Qs helped me land a quantitative analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.