Ace your next data science interview
Get better at data science interviews by solving a few questions per week.
Join 10,000+ other data scientists and analysts practicing for interviews!
We will never spam. One-click unsubscribe.
How it works
1 We write questions
Get relevant data science interview questions frequently asked at top companies.
2 You solve them
Solve the problem before receiving the solution the next morning.
3 We send you the solution (Premium)
Check your work and get better at interviewing!
The schedule

Sample questions
Sample question 1: Statistical knowledge
Suppose there are 15 different color crayons in a box. Each time one obtains a crayon, it is equally likely to be any of the 15 types. Compute the expected # of different colors that are obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation)
We number the crayon colors from 1 to 15. Let \(X_i\) be an indicator variable equal to 1 if at least one crayon of color i is among the 5 crayons obtained, and 0 otherwise.
So,
\(E(X_i) =\) Pr {at least one type i crayon is in the set of 5}
\(E(X_i) =\) 1 - Pr {no type i crayons in the set of 5}
\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)
Therefore, by linearity of expectation, the expected # of different colors is:
\( = \sum_{i=1}^{15} E(X_i)\)
\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)
\( \approx 4.38\)
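As a sanity check, the closed-form answer can be compared against a quick simulation. This is only a minimal sketch: the function names, the trial count, and the fixed seed are illustrative choices, not part of the original solution.

```python
import random

def exact_expected_colors(num_colors=15, draws=5):
    # Linearity of expectation: num_colors * P(a given color appears at least once)
    return num_colors * (1 - ((num_colors - 1) / num_colors) ** draws)

def simulated_expected_colors(num_colors=15, draws=5, trials=100_000, seed=0):
    # Draw `draws` crayons with replacement and count the distinct colors seen
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(num_colors) for _ in range(draws)})
    return total / trials

exact = exact_expected_colors()
estimate = simulated_expected_colors()
print(round(exact, 2))   # closed-form value, rounded to two decimals
print(estimate)          # Monte Carlo estimate; should land close to the exact value
```

The simulated average should agree with the formula to about two decimal places; a larger gap would suggest a mistake in either the derivation or the simulation.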
Sample question 2: Coding/computation
Given a dataframe, df, return only those rows which have missing values.
For example:
Name | age | favorite_color | grade | name |
---|---|---|---|---|
Willard Morris | 20 | blue | NaN | Willard Morris |
Al Jennings | 19 | red | 92 | Al Jennings |
NaN | 22 | yellow | 95 | Omar Mullins |
Spencer McDaniel | 21 | green | 70 | Spencer McDaniel |
Will return:
Name | age | favorite_color | grade | name |
---|---|---|---|---|
Willard Morris | 20 | blue | NaN | Willard Morris |
NaN | 22 | yellow | 95 | Omar Mullins |
#Written in Python (pandas)
#First, we build a boolean series that flags rows containing nulls, using 'isnull' and 'any'
#-->df.isnull().any(axis=1) returns the series True, False, True, False for the example above
#We can then use this series as a boolean mask on our dataframe to keep only the rows with nulls
df[df.isnull().any(axis=1)]
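To try this end to end, here is a minimal sketch that rebuilds the example table and applies the same filter. It assumes that both `Name` and `name` are ordinary columns and that the blank cells in the example are missing values; the variable names `mask` and `rows_with_missing` are illustrative.

```python
import pandas as pd

# Example data reconstructed from the table above; None marks a missing value
df = pd.DataFrame({
    "Name": ["Willard Morris", "Al Jennings", None, "Spencer McDaniel"],
    "age": [20, 19, 22, 21],
    "favorite_color": ["blue", "red", "yellow", "green"],
    "grade": [None, 92, 95, 70],
    "name": ["Willard Morris", "Al Jennings", "Omar Mullins", "Spencer McDaniel"],
})

# True for each row that has at least one null in any column
mask = df.isnull().any(axis=1)

# Boolean indexing keeps only the rows where the mask is True
rows_with_missing = df[mask]
print(rows_with_missing)
```

Note that `isnull()` checks column values only, so a missing value in the index would not be caught by this filter.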
Sample question 3: Coding/computation
A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, n, write a function using Python to return whether or not the number is prime. Additionally, if the inputted number is prime, save it into an array, a.
We'll set up a function below to determine whether or not a given number is prime, using simple if/else statements. Additionally, when a number is defined as prime we'll append it to our array, a.
#First, define an empty array to store prime numbers
a = []
#Define a function to identify whether or not a given number, x, is prime
def is_prime(x):
    if x < 2:
        #if the number is < 2, it's not prime, per the definition of a prime number
        #(i.e. a natural number greater than 1)
        return False
    else:
        #for all other numbers >= 2
        for n in range(2, x):
            #if x is evenly divisible by any smaller number n >= 2, it's not prime
            if x % n == 0:
                return False
        #numbers that don't meet the above conditions are prime! save them to our array, a
        a.append(x)
        return True
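The loop above tests every candidate divisor below x. A possible refinement, sketched here as an assumption rather than as part of the original solution, is to stop at the integer square root of x, since any factor larger than that must pair with a smaller one. The names `is_prime_fast` and `primes` are hypothetical, and skipping duplicates in the list is an extra design choice the original does not make.

```python
import math

# Hypothetical list playing the role of the array `a` in the solution above
primes = []

def is_prime_fast(x):
    """Return True if x is prime, recording each prime once in `primes`."""
    if x < 2:
        return False
    # Only divisors up to the integer square root of x need checking
    for n in range(2, math.isqrt(x) + 1):
        if x % n == 0:
            return False
    # Assumption: avoid storing the same prime twice on repeated calls
    if x not in primes:
        primes.append(x)
    return True

results = [is_prime_fast(k) for k in (1, 2, 9, 17, 17)]
print(results)  # [False, True, False, True, True]
print(primes)   # [2, 17]
```

`math.isqrt` requires Python 3.8 or later; on older versions, `int(x ** 0.5)` is a common substitute for inputs of moderate size.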
See what others are saying
Dylan +
I've been on the mailing list since the initial beta a few months ago, and found the questions to be very helpful with my data science interview at Facebook!
Melissa +
I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.
Richard +
Data Interview Qs helped me land a quantitative analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.
Used by thousands of students and industry workers
DataTau published first on DataTau