Five Interview Questions to Predict a Good Data Scientist

  • What is the significance of the normal distribution to data science? This question is designed to demonstrate an understanding of one of the most basic elements of data science. It would be great if the response involved a discussion of the Central Limit Theorem, but maybe that’s too much to ask for. And maybe asking for the mathematical formula of the Gaussian probability density function is an overreach. But beyond a mention of the “bell curve,” it would be nice to hear something along the lines of: its mean, median, and mode are all the same; the entire distribution can be specified using just two parameters (mean and variance); or a description of its importance to linear regression (the workhorse of data science).
  • Tell me about your passion for data science. Do you attend local meetups, participate in data challenges like Kaggle, use data for the common good (for example, public data hacking), speak at conferences, or write books or articles? The point of this question is to determine whether the candidate feels that data science is their true calling. Do they think and dream about data? Do they see a problem and instantly look for a solution involving patterns in data? What books are in their library? A related question is how large a role a mathematical foundation plays in how they think about the subject. A data scientist who understands the math behind the algorithms will typically perform much better.
  • Describe the last time you experienced frustration in a data science project you were working on, and how you overcame it. Not all data science projects progress smoothly, as many potential roadblocks may occur. This question probes the depth of the candidate’s true experience and how they handled the inevitable problems. People with scant knowledge and experience will easily be exposed here.
  • Think back to a past data science project you worked on. If the powers that be asked you to change one of your data sources, and thus use different predictors, how would you alter your solution? This question relates to the role the candidate played in previous projects, and how well they adapted to changing requirements such as the introduction of new data sets. Many times, lower-level data scientists are simply given a data set with a list of predictors to use, without being asked for any input on their suitability. Heavier contributors, on the other hand, will be involved with dataset selection, feature engineering, and statistical analysis. You probably want a more well-rounded candidate for your team.
  • Research has stated that 2.3 billion people have been affected by floods in the last two decades. Describe how you’d approach a data science project to predict upcoming floods in the next 100–500 years. These predictions can be used to build dams at the correct locations to minimize loss. This kind of question, or one more closely aligned with your specific industry, calls for consideration of the “data science process,” including problem formulation, data acquisition, data wrangling, exploratory data analysis, feature engineering, modeling the data (building, fitting, and validating a model), and data storytelling with the results. The candidate needs to be intimately familiar with a data scientist’s workflow.
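
As an illustration of the kind of answer the first question hopes for, the Central Limit Theorem can be checked with a few lines of Python. This is only a minimal sketch, not part of the original article: the exponential source distribution, the sample sizes, and the helper names `sample_means` and `fraction_within_one_sd` are all assumptions chosen for the demonstration. It draws from a strongly skewed distribution and shows that the means of repeated samples behave much more like a normal distribution, where the mean and median nearly coincide and roughly 68% of values fall within one standard deviation of the mean.

```python
import random
import statistics


def sample_means(n_per_sample, n_samples, seed=0):
    """Return n_samples means, each over n_per_sample draws from Exponential(1).

    The exponential distribution is an arbitrary choice of a skewed source.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_sample))
        for _ in range(n_samples)
    ]


def fraction_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


# Single draws: this is just the skewed exponential distribution itself.
raw = sample_means(n_per_sample=1, n_samples=10_000)

# Means of 50 draws each: by the Central Limit Theorem these are close to normal.
means = sample_means(n_per_sample=50, n_samples=10_000)

for label, data in (("raw draws", raw), ("means of 50", means)):
    gap = abs(statistics.fmean(data) - statistics.median(data))
    print(label, "| mean-median gap:", round(gap, 3),
          "| share within 1 sd:", round(fraction_within_one_sd(data), 3))
```

For the raw draws, the mean and median differ noticeably and far more than 68% of values sit within one standard deviation. For the sample means, the gap shrinks and the share moves close to the value a normal distribution would give. A candidate who can explain why this happens is showing the understanding the question is looking for.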