Below are some Data Scientist E20-007 exam questions and answers.
What is a property of window functions in SQL commands?

A. They can be used to calculate moving averages over various intervals.

B. They group rows into a single output row.

C. They can be used between the keywords FROM and WHERE in a SELECT command.

D. They don’t require ordering of data within a window.

Answer: A
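
Option A can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions). The sales table and its values are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical daily sales figures, invented for this illustration.
rows = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# AVG(...) OVER (...) is a window function: every input row is kept,
# and the average covers the current row plus the two rows before it.
query = """
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM sales
    ORDER BY day
"""
moving = conn.execute(query).fetchall()
conn.close()

for day, amount, moving_avg in moving:
    print(day, amount, moving_avg)
```

Unlike GROUP BY, the query returns one output row per input row, which is why option B does not describe window functions. Changing the ROWS BETWEEN clause changes the interval of the moving average.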

A business colleague who is new to Hadoop approaches you with a question. The colleague wants to know the best approach to access their data. The colleague has previously worked extensively with SQL and databases.

Which query interface should be recommended?





Answer: A

You have plotted the distribution of savings account sizes for a bank.

Based on the distribution shown in the exhibit, how would you proceed?

A. Data is extremely skewed. Replot the data on a logarithmic scale to get a better understanding of it.

B. Data is extremely skewed but looks bimodal. Replot the data in the range 2,500 – 10,000 to be certain.

C. Accounts of sizes greater than 2,500 are rare and are most likely outliers. Eliminate them from future analysis.

D. Data is extremely skewed. Split the analysis into two cohorts: accounts less than 2,500 and accounts greater than 2,500.

Answer: A
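
As a rough sketch of why a logarithmic scale helps with skewed data, the snippet below bins values by order of magnitude. The balances are hypothetical, invented for this example, and are not taken from the exhibit.

```python
import math
from collections import Counter

# Hypothetical account balances, invented for illustration only.
balances = [120, 300, 450, 800, 950, 1200, 1800, 2400, 9000, 75000, 640000]

# floor(log10(x)) is 2 for 100-999, 3 for 1,000-9,999, and so on,
# so each bin covers one order of magnitude.
log_bins = Counter(math.floor(math.log10(b)) for b in balances)

# Text histogram on a log scale: the long right tail no longer
# squeezes the small accounts into a single bar.
for exponent in sorted(log_bins):
    print(f"10^{exponent}: {'#' * log_bins[exponent]}")
```

On a linear axis the largest balance would dominate the plot. On the log axis the small and large accounts are both visible, without discarding any data.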

Consider a database with 4 transactions:

Transaction 1: {cheese, bread, milk}

Transaction 2: {soda, bread, milk}

Transaction 3: {cheese, bread}

Transaction 4: {cheese, soda, juice}

You decide to run the association rules algorithm where minimum support is 50%. Which rule has a confidence of at least 50%?

A. {cheese} => {bread}

B. {juice} => {cheese}

C. {milk} => {soda}

D. {soda} => {milk}

Answer: A
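
The answer can be checked by computing support and confidence for each candidate rule over the four transactions. This is a minimal sketch; the helper names support and confidence are illustrative, not from any particular library.

```python
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(lhs | rhs) / support(lhs)

rules = {
    "A": ({"cheese"}, {"bread"}),
    "B": ({"juice"}, {"cheese"}),
    "C": ({"milk"}, {"soda"}),
    "D": ({"soda"}, {"milk"}),
}

MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.5

results = {}
for label, (lhs, rhs) in rules.items():
    s = support(lhs | rhs)
    c = confidence(lhs, rhs)
    # A rule qualifies only if it meets both thresholds.
    results[label] = (s, c, s >= MIN_SUPPORT and c >= MIN_CONFIDENCE)
    print(label, f"support={s:.2f}", f"confidence={c:.2f}", results[label][2])
```

Rules B, C, and D each reach 50% confidence or more, but their itemsets appear in only one of the four transactions (25% support), so they are pruned by the minimum support threshold. Only rule A, with 50% support and about 67% confidence, survives.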

A data scientist plans to classify the sentiment polarity of 10,000 product reviews collected from the Internet. What is the most appropriate model to use? Assume labeled training data is available.

A. Naïve Bayesian classifier

B. Linear regression

C. Logistic regression

D. K-means clustering

Answer: A
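
To make option A concrete, here is a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing, written from scratch. The training reviews, the labels "pos" and "neg", and the function names fit and predict are all hypothetical and chosen only for illustration. A real project would more likely use an established library.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical labeled reviews, invented for illustration.
train = [
    ("great product works well", "pos"),
    ("love it great value", "pos"),
    ("terrible quality broke quickly", "neg"),
    ("bad product waste of money", "neg"),
]

def fit(examples):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Return the class with the highest log posterior."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        # Log prior: share of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
label_pos = predict("great value", *model)
label_neg = predict("terrible waste", *model)
print(label_pos, label_neg)
```

The "naïve" part is the assumption that words are independent given the class, which is why the per-word log probabilities are simply summed.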

