Notes and technical questions from interviewing as a Data Scientist in 2018

After almost three years at Jobr/Monster, I have decided to leave to pursue a different opportunity. This gave me the chance to interview and discover new companies while practicing my technical chops and learning the inner workings of different businesses and their utilization of data science.

Overall, a few trends stood out to me during the interview process that I figured would be helpful for anyone looking for a data scientist role. In addition, I’ve collected technical interview questions that are representative of what I encountered many times over.

Make sure the “Data Scientist” role is a fit

Ten years after the creation of the official Data Scientist position, you would think the industry would have formalized the job requirements and responsibilities. That is not true. There are large companies that want glorified data analysts. There are startups that want software engineers who have dabbled with TensorFlow. Then there are the companies that want a data scientist because their CEO felt like having a trophy on reserve.

What your actual skillset comprises and what you want in your next role should be well-defined by the time you take the recruiter phone screen, since they will ask about both. What are you looking for in your next role? What is your current background? Generally with those two questions, you can figure out if you’re a fit for an engineering-based data scientist role (machine learning engineer), a product data scientist role (business-driven), or a role covering everything data-related under the sun.

After interviewing the first couple of times, I figured out how to appease recruiters. Repeating the job requirements generally did the trick. But in hindsight, what was the point of moving on in the interview if I didn’t actually want to focus on that area of data science once I had learned what the role was? It wastes time on both ends. Interviewing takes a ton of time as well as mental and emotional energy. Which leads me to...

Never do the Take Home Assignments

This one is controversial. I will take a technical video or phone interview over a coding assignment any day of the week. I’m sticking to my guns on this issue for a couple of reasons that people may or may not agree with me on.

1. You are subjecting yourself to completely ambiguous requirements.

When doing a technical video/phone interview, you get incremental feedback as you work on problems. Each interviewer has an idea of what they want, and if they see you going down a different/wrong/useless path then they’ll correct it (or they should), with feedback ranging from telling you the actual answer to just a “that’s not right”. In a coding assignment, you get none of that. You are taking a non-standardized test every single time with a likely biased grader. You could spend the entirety of your effort analyzing data for something the grader would not give a shit about. I was given an assignment in which the requirements were as ambiguous as “analyze a dataset and turn it into a presentation”, and I got no further clarification when I asked the recruiter. Needless to say, I did not understand why I didn’t pass, nor what I was actually supposed to do in the end.

2. There isn’t a real timeframe.

They give estimates such as “this assignment should generally take 3 to 6 hours” and to “return it in around 2 to 7 days”. What that means to me is that every other candidate is now putting in 6 to 12+ hours on the take home assignment. And why wouldn’t they? Take home assignments are designed to filter out candidates, as you are generally graded against every other candidate who was given the assignment before. Dedication towards doing a good job on the challenge would mean figuring out each edge case of the problem plus the edge cases around the grader’s own biases. And why wouldn’t you then, to gain an edge over all of your competitors, spend 12+ hours of your time making it perfect and then tell the recruiter you “really did just finish it in like maybe three hours”?

3. You are generally given zero feedback.

What’s worse than spending 12+ hours on an assignment and then not getting the job? Being unable to figure out what you did wrong and apply it to the next position that also requires an assignment.

4. You are telling them that your time is worth less than the company’s.

Imagine interviewing for five different jobs and they all want coding assignments. Imagine then actually spending only the allotted amount of time they tell you to spend on each. At 3 to 6 hours apiece, that’s still 15 to 30 hours of take home assignment drudgery, which amounts to free unpaid work, over the course of a week or two on top of your full time job. Goodbye to the weekend. Goodbye to your resolve. Maybe just say goodbye to companies that want coding assignments.

Caveats. They are useful for figuring out the work you would be doing on the job. Many times startups will take a sample of their data and thoughtfully give out assignments and questions that mimic actual work.

They are also a great way for unproven candidates to become competitive. Aspiring data scientists or graduate students should take on the coding assignments and put all of their effort into making them perfect. With endless resources and time, an assignment generally levels the playing field, allowing a candidate to demonstrate hard work and effort. And so while a detriment to some, it may be a positive for others.

P.S. I have a large file of collected coding assignments that I’m willing to share with people if they PM me. I’m debating whether to just put it up on Github for everyone to see as it is at the very least, really good practice.

Real interview questions that I think generalize many facets of the DS interview

The data scientist technical interview now comprises around six topics.

  1. Coding
  2. Product
  3. SQL (or Python but basically analytics)
  4. AB testing
  5. Machine Learning
  6. Probability

These are all very real questions I was asked.

Coding

  1. Fizzbuzz
  2. Given a list of timestamps in sequential order, return a list of lists grouped by weekly aggregation.
  3. Given a list of characters, a list of prior probabilities for each character, and a matrix of probabilities for each character combination, return the optimal sequence for the highest probability.
  4. Given a log file with rows featuring a date, a number, and then a string of names, parse the log file and return the count of unique names aggregated by month.
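
For what it’s worth, here is a rough sketch of how I might approach question 2 in Python. The question doesn’t define a “week”, so I’m assuming calendar weeks that start on Monday. The helper names `week_start` and `group_by_week` are my own, and weeks with no timestamps are simply skipped. A good interviewer would probably push on both of those assumptions.

```python
from datetime import datetime, timedelta
from itertools import groupby


def week_start(ts: datetime) -> datetime:
    """Return midnight on the Monday of the week containing ts (assumed week boundary)."""
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def group_by_week(timestamps: list[datetime]) -> list[list[datetime]]:
    """Group already-sorted timestamps into one list per calendar week."""
    # groupby only merges adjacent items, which is enough here because
    # the question states the input is in sequential order.
    return [list(group) for _, group in groupby(timestamps, key=week_start)]


# Made-up sample data for illustration.
stamps = [
    datetime(2018, 1, 1, 9, 0),     # a Monday
    datetime(2018, 1, 3, 12, 30),   # Wednesday of the same week
    datetime(2018, 1, 8, 8, 15),    # Monday of the following week
    datetime(2018, 1, 22, 17, 45),  # two weeks later; the empty week is skipped
]
weekly = group_by_week(stamps)
print([len(week) for week in weekly])
```

If the input were not sorted, sorting first (or collecting into a dictionary keyed by `week_start`) would be the safer choice.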

Product

  1. Given there are no metrics being tracked for Google Docs, a product manager comes to you and asks what are the top five metrics you would implement?
  2. In addition, let’s say there’s a dip in the engagement metric of Google Docs. What would you investigate?
  3. Let’s say we want to implement a notification system for reminding nurses to discharge patients at a hospital. How would you implement it?
  4. Let’s say at LinkedIn we want to implement a green dot for an “active user” on the new messaging platform. How would you analyze the effectiveness of it for roll out?

SQL

  1. Given a payment transactions table and a customers table, return the customer’s name and the first transaction that the customer made.
  2. Given a payments transactions table, return a frequency distribution of the number of payments each customer made. (E.g. 1 transaction: 100 customers, 2 transactions: 50 customers, etc.)
  3. Given the same payments table, return the cumulative distribution. (At least one transaction, at least two transactions, etc.)
  4. Given a table with columns friend1|friend2, return the number of mutual friends between two friends.
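
As an illustration of questions 2 and 3, here is a minimal sketch that runs the queries against an in-memory SQLite database from Python. The `payments` table, its columns (`customer_id`, `amount`), and the sample rows are all made up for the example, since the interviews never pinned down a schema. Treat it as one possible shape for the answer, not the expected one.

```python
import sqlite3

# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO payments (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (2, 7.5), (3, 20.0), (4, 1.0), (4, 2.0), (4, 3.0)],
)

# Question 2: how many customers made exactly N payments?
FREQUENCY_SQL = """
SELECT num_payments, COUNT(*) AS num_customers
FROM (
    SELECT customer_id, COUNT(*) AS num_payments
    FROM payments
    GROUP BY customer_id
) AS per_customer
GROUP BY num_payments
ORDER BY num_payments
"""

# Question 3: how many customers made at least N payments?
# A self-join on the frequency table avoids relying on window functions.
CUMULATIVE_SQL = """
WITH per_customer AS (
    SELECT customer_id, COUNT(*) AS num_payments
    FROM payments
    GROUP BY customer_id
),
freq AS (
    SELECT num_payments, COUNT(*) AS num_customers
    FROM per_customer
    GROUP BY num_payments
)
SELECT a.num_payments, SUM(b.num_customers) AS customers_at_least
FROM freq AS a
JOIN freq AS b ON b.num_payments >= a.num_payments
GROUP BY a.num_payments
ORDER BY a.num_payments
"""

frequency = conn.execute(FREQUENCY_SQL).fetchall()
cumulative = conn.execute(CUMULATIVE_SQL).fetchall()
print(frequency)
print(cumulative)
```

On a database that supports window functions, a running `SUM(...) OVER (...)` would likely be the more idiomatic way to get the cumulative counts.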

AB Testing

  1. Given AB test funnel statistics such as the sample size, sign up rate, feature 1 usage rate, feature 2 usage rate, analyze which variant won and why.
  2. How would you design an experiment to change a button on a sign up page?
  3. How do you know if you have enough sample size?
  4. How do you run significance tests on more than one variant?
  5. How do you reduce variance and bias in an AB test?
  6. Explain a P-value and confidence interval to a product manager or non-technical person.
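
For the “which variant won” style of question, one common starting point is a two-proportion z-test on the conversion counts. Below is a minimal sketch using only the standard library. The function name and the example counts are my own inventions, and the test assumes independent users and samples large enough for the normal approximation to hold. It is a sketch of one reasonable approach, not the answer any particular interviewer was looking for.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up funnel numbers: 200 of 2000 sign-ups in A, 260 of 2000 in B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With more than one treatment variant, the same test gets run several times, which is where corrections for multiple comparisons (question 4) come in.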

Machine Learning

  1. What features would you use to predict the time spent for a restaurant preparing food from the moment an order comes in?
  2. Can you come up with a scenario in which you would rather under-predict versus over-predict?
  3. Analyzing the results of a model, how would you explain the tradeoff between bias and variance?
  4. Explain how a Random Forest model actually works under the hood.
  5. How do you know if you have enough data for your model?
  6. How do you evaluate a model? (F1 score, ROC curve, cross validation, etc…)

Probability

  1. Given uniformly distributed X and Y, each with mean 0 and standard deviation 1, what’s the probability that 2X > Y?
  2. There are four people in an elevator and four floors in a building. What’s the probability that each person gets off on a different floor?
  3. What’s the probability that two people get off on the same floor?
  4. Given a deck of cards labeled from 1 to 100, what’s the probability of getting Pick1 < Pick2 < Pick3?
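
When I’m unsure of a combinatorics answer, brute-force enumeration is a cheap sanity check. The sketch below does that for questions 2 to 4 under assumptions I’m adding myself: each person picks a floor independently and uniformly, question 3 means “at least two people share a floor”, and the cards are drawn without replacement. For the card question I enumerate a 10-card deck instead of 100, since by symmetry the answer doesn’t depend on the deck size.

```python
from fractions import Fraction
from itertools import permutations, product

# Elevator: 4 people, 4 floors, each person assumed to pick independently and uniformly.
floors = range(1, 5)
outcomes = list(product(floors, repeat=4))  # 4^4 = 256 equally likely outcomes
all_different = sum(1 for outcome in outcomes if len(set(outcome)) == 4)
p_all_different = Fraction(all_different, len(outcomes))

# One reading of question 3: at least two people share a floor (the complement).
p_some_shared = 1 - p_all_different

# Cards: three picks without replacement from a small stand-in deck.
deck = range(1, 11)
draws = list(permutations(deck, 3))  # ordered picks, all equally likely
increasing = sum(1 for a, b, c in draws if a < b < c)
p_increasing = Fraction(increasing, len(draws))

print(p_all_different, p_some_shared, p_increasing)
```

The enumeration agrees with the closed forms: 4!/4^4 = 3/32 for all different floors, and 1/3! = 1/6 for three picks in increasing order, because each of the six orderings of three distinct cards is equally likely.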

Yes. Data science interviews are hard.

Tip: Try to figure out the answer to each question you get, from either the interviewer or online. Because if you fail, you’re likely to encounter a variant of the question in another interview. Also, what’s the probability of this occurring? ;)

Lastly employers are hiring data scientists like crazy

The market for data scientists is still super tight as the general role becomes more and more necessary for companies that need to grow or monetize. Even once I started declining coding challenges, many companies were still willing to forgo them and switch the interview process to technical interviews. Data science in 2018 is unbelievably hot, in a market where data scientists have the potential to contribute to many different parts of the business.

What I have noticed, though, is how much a bit of experience on the resume matters: my search this time went very differently from a few years ago, when I was unproven. I encourage everyone who’s looking to jump into the field without experience to do a couple of projects first to demonstrate overall excitement about the field. Ultimately this job is still, in my opinion, one of the coolest because of how versatile a data scientist can be with their skillset as well as their contribution to different business goals.

And when does one ever get to be among the first in any kind of new job? Ten years really isn’t that much time at all.

Lastly, I’ve decided to join Nextdoor. If you’re interested in joining, check out the careers page and shoot me a PM. We are hiring data scientists! And check out my blog for more data posts.