Whether you're preparing for your first role in data analytics or aiming to crack high-paying roles at companies like Amazon, this curated set of real-world interview questions is designed just for you. Covering SQL queries, business insights, and data-driven decision-making, these questions are perfect for candidates with 0–3 years of experience targeting 22–25 LPA roles.

1. What is the difference between WHERE and HAVING?
Solution:

  • Application level: WHERE works on individual rows; HAVING works on aggregated (grouped) results.

  • Aggregate functions: cannot be used in WHERE; can be used in HAVING.

  • Filtering phase: WHERE filters before aggregation; HAVING filters after aggregation.

Example:

-- Using WHERE
SELECT *
FROM Employees
WHERE Department = 'Sales';

-- Using HAVING
SELECT Department, AVG(Salary) AS Avg_Salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 50000;

2. How would you forecast sales for the next quarter?
Solution:
Sales forecasting predicts future sales based on historical data.

Steps:

  1. Collect data (sales, campaigns, seasonal trends)

  2. Preprocess the data (cleaning, formatting time series)

  3. Analyze trends and seasonality

  4. Choose a model: ARIMA, SARIMA, Random Forest, LSTM

  5. Evaluate (RMSE, MAE, MAPE)

  6. Forecast

Example (Python with ARIMA):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(sales_data['Sales'], order=(1, 1, 1))  # sales_data: DataFrame with a 'Sales' time series
result = model.fit()
forecast = result.forecast(steps=3)
print(forecast)

3. How do you remove duplicates from a table in SQL?
Solution:
Use ROW_NUMBER() to flag duplicate rows, then delete every row after the first occurrence. Deleting from a CTE as shown works in SQL Server; other databases may need to delete via a unique key instead.

WITH DuplicateCTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID, Name, Department ORDER BY EmployeeID) AS Row_Num
FROM Employees
)
DELETE FROM DuplicateCTE
WHERE Row_Num > 1;

4. What metrics would you track for Amazon Prime subscription growth?
Solution:
Key metrics include total and new subscribers, churn rate, retention rate, trial-to-paid conversion rate, and average revenue per subscriber.

Example Query:

SELECT
COUNT(*) AS Total_Subscribers,
SUM(CASE WHEN SubscriptionStatus = 'Cancelled' THEN 1 ELSE 0 END) AS Cancelled_Subscribers,
(SUM(CASE WHEN SubscriptionStatus = 'Cancelled' THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) AS Churn_Rate
FROM PrimeSubscriptions;

5. Explain window functions and give an example.
Solution:
Window functions perform calculations over rows related to the current row, without collapsing them.

Example:

SELECT
EmployeeID,
Name,
Salary,
RANK() OVER (ORDER BY Salary DESC) AS Salary_Rank
FROM Employees;

6. How do you handle seasonality in sales data?
Solution:

  • Use seasonal decomposition

  • Apply SARIMA for modeling

  • Try moving averages or one-hot encoding months

Example (Python):

from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(sales_data['Sales'], model='multiplicative', period=12)
result.plot()
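
To model the seasonality directly (as mentioned in the SARIMA bullet above), here is a minimal sketch using statsmodels, assuming the same hypothetical sales_data['Sales'] monthly series:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(sales_data['Sales'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))  # 12-month seasonality
result = model.fit(disp=False)
print(result.forecast(steps=3))  # forecast the next 3 months
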
7. Write a query to find the second-highest salary from an employee table.
Solution:
SELECT MAX(Salary) AS Second_Highest_Salary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

8. How do you analyze customer churn using data?
Solution:
Steps:

  1. Collect data (demographics, transactions)

  2. Define churn (e.g., no activity in 6 months)

  3. Engineer features

  4. Do EDA

  5. Build predictive models (Logistic Regression, Random Forest)

  6. Segment users

  7. Track churn KPIs

Example Query:

SELECT
CustomerID,
COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) AS Transactions_Last_6_Months
FROM CustomerTransactions
GROUP BY CustomerID
HAVING COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) = 0;
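
Beyond flagging inactive customers in SQL, a predictive model can estimate churn probability. A minimal scikit-learn sketch, assuming a hypothetical feature DataFrame features and a 0/1 label churned:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# features: engineered customer attributes (recency, frequency, spend, etc.)
# churned: 1 if the customer churned, 0 otherwise
X_train, X_test, y_train, y_test = train_test_split(features, churned, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))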

9. Write a query to find top N customers by total sales.
Solution:

SELECT CustomerID,
SUM(SalesAmount) AS Total_Sales
FROM Sales
GROUP BY CustomerID
ORDER BY Total_Sales DESC
LIMIT 5;  -- top 5; replace 5 with the required N

10. Explain A/B testing and how to interpret the results.
Solution:
A/B testing compares two versions to find the more effective one.

Steps:

  1. Define goal

  2. Form hypotheses

  3. Split users into A (control) and B (test)

  4. Run test

  5. Perform statistical test (e.g., t-test)

  6. Analyze (check p-value < 0.05)

Example Query:

SELECT
Version,
COUNT(*) AS Users,
SUM(Conversion) AS Conversions,
(SUM(Conversion) * 1.0 / COUNT(*)) AS Conversion_Rate
FROM AB_Testing_Data
GROUP BY Version;
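
To decide whether the difference in conversion rates is statistically significant, here is a minimal sketch using statsmodels (the counts below are hypothetical and would come from a query like the one above):

from statsmodels.stats.proportion import proportions_ztest

conversions = [320, 370]  # hypothetical conversions for versions A and B
users = [5000, 5000]      # hypothetical users exposed to each version
stat, p_value = proportions_ztest(conversions, users)
print(f"z = {stat:.3f}, p-value = {p_value:.4f}")  # p-value < 0.05 suggests a significant difference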

Power BI/Tableau and Statistics & Probability

11. What is the difference between correlation and causation?

Solution:
Correlation means two variables move together; causation means a change in one variable directly produces a change in the other. Correlation alone does not prove causation, since a confounding variable or coincidence can drive the relationship; establishing causation usually requires controlled experiments such as A/B tests.
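
A quick way to see correlation (not causation) in Pandas, using illustrative made-up numbers:

import pandas as pd

df = pd.DataFrame({
    'ice_cream_sales': [200, 220, 250, 300, 320],
    'drowning_incidents': [3, 4, 5, 7, 8]
})
print(df.corr())  # strong positive correlation, yet neither causes the other (hot weather drives both)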


12. How do you choose the right visualization for different datasets?

Solution:
Choosing a visual depends on the purpose:

  • Trends over time: line chart

  • Comparisons across categories: bar or column chart

  • Distribution of a single variable: histogram or box plot

  • Relationship between two variables: scatter plot

  • Part-to-whole composition: stacked bar (or pie chart for a small number of categories)

13. Explain the Central Limit Theorem (CLT) in simple terms.

Solution:
The CLT states that, as the sample size grows, the distribution of sample means approaches a normal distribution, regardless of the shape of the underlying population distribution.
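
A small simulation sketch illustrating the idea: even for a clearly non-normal (exponential) population, the means of repeated samples cluster in a bell shape around the population mean.

import numpy as np

population = np.random.exponential(scale=2.0, size=100_000)  # skewed, non-normal population
sample_means = [np.random.choice(population, size=50).mean() for _ in range(1_000)]
print(np.mean(sample_means), np.std(sample_means))  # centred near 2.0, roughly normal in shape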

14. How do you create a year-over-year growth visualization?

Solution:
In Power BI:

  • Create a base measure, e.g., Total Sales = SUM(Sales[Amount]).

  • Create a prior-year measure with CALCULATE and SAMEPERIODLASTYEAR (or DATEADD shifted by one year) over a proper date table.

  • Add a YoY growth % measure that divides the difference between current and prior-year sales by prior-year sales (DIVIDE avoids division-by-zero errors).

  • Plot the measures on a line or clustered column chart with Year/Month on the axis.

15. Explain p-value in hypothesis testing.

Solution:
The p-value is the probability of observing results at least as extreme as those measured, assuming the null hypothesis is true. A small p-value (commonly below 0.05) means the observed effect would be unlikely under the null hypothesis, so the null hypothesis is rejected.
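
A minimal sketch computing a p-value with SciPy, using hypothetical samples from two groups:

from scipy import stats

group_a = [12.1, 11.8, 13.0, 12.5, 12.9]
group_b = [11.2, 11.5, 10.9, 11.8, 11.1]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(p_value)  # a value below 0.05 would lead us to reject the null hypothesis of equal means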


16. Explain drill-through & drill-down in Power BI.

Solution:

  • Drill-down moves through a hierarchy within the same visual, e.g., Year → Quarter → Month → Day.

  • Drill-through navigates from a selected data point to a separate, detailed report page that is automatically filtered to that selection (e.g., from a sales overview to a single customer's detail page).


17. How do you determine if a dataset is skewed?

Solution:
Compare the mean and median, inspect a histogram, or compute the skewness coefficient in Pandas:

data.skew()

A positive value indicates right skew (long tail on the right), a negative value indicates left skew, and values near zero suggest a roughly symmetric distribution.

Example: House prices are usually right-skewed.
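
A minimal sketch, assuming a hypothetical price series:

import pandas as pd

prices = pd.Series([120, 130, 125, 140, 135, 150, 600])  # one extreme value pulls the tail right
print(prices.skew())  # positive result indicates right skew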


18. What are parameters in Tableau, and how do you use them?

Solution:
Parameters are dynamic values used for filtering or calculations.
Use cases:

  • Letting users choose a Top N value for filtering

  • Switching the measure or dimension displayed in a view

  • What-if analysis, such as adjusting a growth or discount rate

  • Driving dynamic reference lines or bins

19. How do you optimize a Power BI dashboard for performance?

Solution:
Best Practices:

  1. Use Star Schema for modeling

  2. Prefer calculated measures over columns

  3. Reduce data load with filters or DirectQuery

  4. Remove unused columns, limit visuals

  5. Disable auto date/time

Example:
Use REMOVEFILTERS() in DAX to improve performance:

Optimized = CALCULATE([Total Sales], REMOVEFILTERS(Calendar[Year]))

20. What is normal distribution, and why is it important?

Solution:
A bell-shaped, symmetric distribution where:

  • Mean = Median = Mode

  • 68% of values fall within 1σ, 95% within 2σ, and 99.7% within 3σ

Importance:

  • Many natural variables follow it (e.g., height, IQ)

  • Basis for parametric tests (t-tests, confidence intervals)

Example:
If mean IQ = 100 and σ = 15, then about 68% of people have an IQ between 85 and 115.
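
The 68% figure can be checked with SciPy (illustrative, using the IQ example above):

from scipy.stats import norm

p_within_1_sigma = norm.cdf(115, loc=100, scale=15) - norm.cdf(85, loc=100, scale=15)
print(round(p_within_1_sigma, 4))  # approximately 0.6827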

21. How do you handle missing values in Pandas?

Solution:
Handling missing values is essential for data cleaning and analysis. Pandas offers multiple techniques.

Identifying Missing Values:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]})
print(df.isnull()) # Shows True for missing values
print(df.isnull().sum()) # Count of missing values per column

Handling Missing Data:

  • Drop Rows with Missing Values: df.dropna(inplace=True)
  • Fill Missing Values with Static or Calculated Values:

df.fillna(0, inplace=True) # Replace with 0
df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill using column mean

  • Forward/Backward Fill (useful for time series):

df.ffill(inplace=True)  # forward fill (propagate the last valid value)
df.bfill(inplace=True)  # backward fill (use the next valid value)

Best Practices:

  • Drop only if missing data is minimal.

  • Fill with statistical values for numerical data.

  • Use ffill/bfill for time-series or sequential data.


22. Write a Python function to calculate the moving average of a time series.

Solution:
A moving average smooths data by averaging points over a sliding window.

import pandas as pd

def moving_average(data, window):
    return data.rolling(window=window).mean()

# Example usage
data = pd.Series([10, 20, 30, 40, 50, 60])
print(moving_average(data, window=3))

Output:

0 NaN
1 NaN
2 20.0
3 30.0
4 40.0
5 50.0
dtype: float64

Use Cases:

  • Smoothing noisy sensor or financial data.

  • Analyzing stock trends and patterns.

23. How do you merge two datasets in Pandas?

Solution:
Merging in Pandas combines data from different DataFrames based on common columns or indexes, similar to SQL joins.

Merge Methods:

  • Inner Join (default): Returns matching rows only.

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Output:

ID Name Age
0 2 Bob 25
1 3 Charlie 30

  • Left Join: Retains all rows from the left DataFrame.

pd.merge(df1, df2, on='ID', how='left')

  • Right Join: Retains all rows from the right DataFrame.

pd.merge(df1, df2, on='ID', how='right')

  • Outer Join: Includes all rows from both DataFrames.

pd.merge(df1, df2, on='ID', how='outer')

When to Use What:

  • Use inner when you only care about matching records.

  • Use left when all records from the first dataset are required.

  • Use outer when you want everything from both datasets, even if unmatched.


24. Explain groupby() and agg() in Pandas with an example.

Solution:

  • groupby(): Groups the DataFrame based on one or more columns.

  • agg(): Applies aggregation functions like sum, mean, max, etc.

Example:

import pandas as pd

data = {'Department': ['HR', 'HR', 'IT', 'IT', 'Finance'],
        'Salary': [50000, 55000, 60000, 65000, 70000]}
df = pd.DataFrame(data)

grouped = df.groupby('Department').agg({'Salary': ['mean', 'sum']})
print(grouped)

Output:

Salary
mean sum
Department
Finance 70000 70000
HR 52500 105000
IT 62500 125000

Use Cases:

  • Aggregating sales data by region or category.

  • Calculating performance metrics department-wise.

25. What is the difference between apply() and map() in Pandas?

Solution:

  • Purpose: apply() works on a DataFrame or a Series; map() works only on a Series.

  • Scope: apply() runs a function along rows or columns of a DataFrame (or element-wise on a Series); map() substitutes each value using a dictionary, Series, or function.

  • Flexibility: apply() handles complex custom functions; map() is mostly for simple mappings and replacements.

  • Return type: apply() returns a Series or DataFrame; map() returns a Series.

Examples:

  • apply() on DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)

Output:

A B Sum
0 1 4 5
1 2 5 7
2 3 6 9

  • map() on Series:

mapping = {1: 'One', 2: 'Two', 3: 'Three'}
df['Mapped_A'] = df['A'].map(mapping)
print(df)

Output:

A B Sum Mapped_A
0 1 4 5 One
1 2 5 7 Two
2 3 6 9 Three

When to Use:

  • Use apply() for complex row or column operations.

  • Use map() for simple element-wise replacements in Series.

Conclusion

These 25 data analytics interview questions cover essential topics, from SQL and statistics to Power BI and advanced Pandas operations. With practical examples and clear explanations, this set helps you prepare confidently for data analyst interviews and apply these skills in real-world scenarios.

Good luck with your interview prep! 
