Whether you’re preparing for your first role in data analytics or aiming to crack high-paying roles at companies like Amazon, this curated set of real-world interview questions is designed just for you. Covering SQL queries, business insights, and data-driven decision-making, these questions are perfect for candidates with 0–3 years of experience targeting 22–25 LPA roles
1. What is the difference between WHERE and HAVING?
Solution:
Criteria | WHERE | HAVING |
---|---|---|
Application Level | Works on individual rows | Works on aggregated results |
Aggregate Functions | Cannot be used | Can be used |
Filtering Phase | Filters before aggregation | Filters after aggregation |
Example:
— Using WHERE
SELECT *
FROM Employees
WHERE Department = ‘Sales’;
— Using HAVING
SELECT Department, AVG(Salary) AS Avg_Salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 50000;
2. How would you forecast sales for the next quarter?
Solution:
Sales forecasting predicts future sales based on historical data.
Steps:
-
Collect data (sales, campaigns, seasonal trends)
-
Preprocess the data (cleaning, formatting time series)
-
Analyze trends and seasonality
-
Choose a model: ARIMA, SARIMA, Random Forest, LSTM
-
Evaluate (RMSE, MAE, MAPE)
-
Forecast
Example (Python with ARIMA):
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(sales_data[‘Sales’], order=(1, 1, 1))
result = model.fit()
forecast = result.forecast(steps=3)
print(forecast)
3. How do you remove duplicates from a table in SQL?
Solution:
Use ROW_NUMBER()
to identify duplicates and then delete.
WITH DuplicateCTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID, Name, Department ORDER BY EmployeeID) AS Row_Num
FROM Employees
)
DELETE FROM DuplicateCTE
WHERE Row_Num > 1;
4. What metrics would you track for Amazon Prime subscription growth?
Solution:
-
Subscriber Growth Rate
-
Churn Rate
-
Retention Rate
-
Customer Lifetime Value (CLV)
-
Average Revenue Per User (ARPU)
-
Engagement metrics (viewing hours, purchases)
-
Conversion Rate
Example Query:
5. Explain window functions and give an example.
Solution:
Window functions perform calculations over rows related to the current row, without collapsing them.
Example:
SELECT
EmployeeID,
Name,
Salary,
RANK() OVER (ORDER BY Salary DESC) AS Salary_Rank
FROM Employees;
6. How do you handle seasonality in sales data?
Solution:
-
Use seasonal decomposition
-
Apply SARIMA for modeling
-
Try moving averages or one-hot encoding months
Example (Python):
Solution:
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);
8. How do you analyze customer churn using data?
Solution:
Steps:
-
Collect data (demographics, transactions)
-
Define churn (e.g., no activity in 6 months)
-
Engineer features
-
Do EDA
-
Build predictive models (Logistic Regression, Random Forest)
-
Segment users
-
Track churn KPIs
Example Query:
SELECT
CustomerID,
COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) AS Transactions_Last_6_Months
FROM CustomerTransactions
GROUP BY CustomerID
HAVING COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) = 0;
9. Write a query to find top N customers by total sales.
Solution:
SELECT CustomerID,
SUM(SalesAmount) AS Total_Sales
FROM Sales
GROUP BY CustomerID
ORDER BY Total_Sales DESC
LIMIT 5;
10. Explain A/B testing and how to interpret the results.
Solution:
A/B testing compares two versions to find the more effective one.
Steps:
-
Define goal
-
Form hypotheses
-
Split users into A (control) and B (test)
-
Run test
-
Perform statistical test (e.g., t-test)
-
Analyze (check p-value < 0.05)
Example Query:
SELECT
Version,
COUNT(*) AS Users,
SUM(Conversion) AS Conversions,
(SUM(Conversion) * 1.0 / COUNT(*)) AS Conversion_Rate
FROM AB_Testing_Data
GROUP BY Version;
Power BI/Tableau and Statistics & Probability
11. What is the difference between correlation and causation?
Solution:
-
Correlation measures the strength and direction of a relationship between two variables (range: -1 to +1). It does not imply causation.
-
Causation indicates that one variable directly influences another.
Example: -
Correlation: Ice cream sales and drowning rates rise in summer – correlated but not causal.
-
Causation: More study hours → higher exam scores.
12. How do you choose the right visualization for different datasets?
Solution:
Choosing a visual depends on the purpose:
-
Comparison: Bar, Column, Stacked Bar
-
Distribution: Histogram, Box Plot
-
Trend: Line, Area Chart
-
Part-to-Whole: Pie, Donut, Stacked Area
-
Correlation: Scatter, Bubble Chart
-
Geographic: Map visuals
Example:
Use a line chart for monthly sales trends, bar chart for region-wise sales.
13. Explain the Central Limit Theorem (CLT) in simple terms.
Solution:
CLT says that sample means form a normal distribution as sample size increases — regardless of population distribution.
-
Sample size ≥ 30 is usually sufficient
-
Mean of sample means = population mean (µ)
-
SE = σ / √n
Example:
Take repeated samples of 30 students’ heights – their average heights form a normal distribution.
14. How do you create a year-over-year growth visualization?
Solution:
In Power BI:
-
Use DAX measures:
-
Total Sales = SUM(Sales[Amount])
-
YoY Growth = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(...))
-
YoY % = DIVIDE([Total Sales] - [YoY Growth], [YoY Growth])
-
-
Use a line and column chart with Year on X-axis.
In Tableau: -
Use Quick Table Calculation → “Year over Year Growth”.
15. Explain p-value in hypothesis testing.
Solution:
P-value tells how likely observed results are, assuming the null hypothesis is true.
-
p < 0.05: Significant → Reject H₀
-
p ≥ 0.05: Not significant → Fail to reject H₀
Example:
In an A/B test, p = 0.03 → Reject null hypothesis, results are statistically significant.
16. Explain drill-through & drill-down in Power BI.
Solution:
-
Drill-Down: View details within the same chart (e.g., click “Year” → see “Month”).
-
Drill-Through: Navigate to a new report page based on selected data (e.g., right-click “Region” → see detailed sales).
Implementation: -
Use hierarchies and drill-through filters, and enable navigation.
17. How do you determine if a dataset is skewed?
Solution:
-
Right-skewed: Tail on the right, Mean > Median
-
Left-skewed: Tail on the left, Mean < Median
Methods: -
Visuals: Histogram, Box Plot
-
Skewness formula:
Skewness = 3(Mean − Median) / Std Dev
-
Python:
data.skew()
Example: House prices → usually right-skewed.
18. What are parameters in Tableau, and how do you use them?
Solution:
Parameters are dynamic values used for filtering or calculations.
Use cases:
-
Switch between metrics (e.g., Sales vs. Profit)
-
What-if analysis
-
Filter Top N items
Example:
Create a parameter “Measure Selector” with values ‘Sales’ and ‘Profit’ → use it in a calculated field to toggle between views.
19. How do you optimize a Power BI dashboard for performance?
Solution:
Best Practices:
-
Use Star Schema for modeling
-
Prefer calculated measures over columns
-
Reduce data load with filters or DirectQuery
-
Remove unused columns, limit visuals
-
Disable auto date/time
Example:
UseREMOVEFILTERS()
in DAX to improve performance:
20. What is normal distribution, and why is it important?
Solution:
A bell-shaped, symmetric distribution where:
-
Mean = Median = Mode
-
68% within 1σ, 95% within 2σ, 99.7% within 3σ
Importance: -
Many natural variables follow it (e.g., height, IQ)
-
Basis for parametric tests (t-tests, CI)
Example:
If mean IQ = 100 and σ = 15 → 68% have IQ between 85–115.
21. How do you handle missing values in Pandas?
Solution:
Handling missing values is essential for data cleaning and analysis. Pandas offers multiple techniques.
Identifying Missing Values:
import pandas as pd
df = pd.DataFrame({‘Name’: [‘Alice’, ‘Bob’, None], ‘Age’: [25, None, 30]})
print(df.isnull()) # Shows True for missing values
print(df.isnull().sum()) # Count of missing values per column
Handling Missing Data:
- Drop Rows with Missing Values: df.dropna(inplace=True)
- Fill Missing Values with Static or Calculated Values:
df.fillna(0, inplace=True) # Replace with 0
df[‘Age’].fillna(df[‘Age’].mean(), inplace=True) # Fill using column mean
- Forward/Backward Fill (useful for time series):
df.fillna(method=’ffill’, inplace=True)
df.fillna(method=’bfill’, inplace=True)
Best Practices:
-
Drop only if missing data is minimal.
-
Fill with statistical values for numerical data.
-
Use
ffill
/bfill
for time-series or sequential data.
22. Write a Python function to calculate the moving average of a time series.
Solution:
A moving average smooths data by averaging points over a sliding window.
import pandas as pd
def moving_average(data, window):
return data.rolling(window=window).mean()
# Example usage
data = pd.Series([10, 20, 30, 40, 50, 60])
print(moving_average(data, window=3))
Output:
0 NaN
1 NaN
2 20.0
3 30.0
4 40.0
5 50.0
dtype: float64
0 NaN
1 NaN
2 20.0
3 30.0
4 40.0
5 50.0
dtype: float64
Use Cases:
-
Smoothing noisy sensor or financial data.
-
Analyzing stock trends and patterns.
23. How do you merge two datasets in Pandas?
Solution:
Merging in Pandas combines data from different DataFrames based on common columns or indexes, similar to SQL joins.
Merge Methods:
-
Inner Join (default): Returns matching rows only.
import pandas as pd
df1 = pd.DataFrame({‘ID’: [1, 2, 3], ‘Name’: [‘Alice’, ‘Bob’, ‘Charlie’]})
df2 = pd.DataFrame({‘ID’: [2, 3, 4], ‘Age’: [25, 30, 35]})
merged_df = pd.merge(df1, df2, on=’ID’, how=’inner’)
print(merged_df)
Output:
ID Name Age
0 2 Bob 25
1 3 Charlie 30
Left Join: Retains all rows from the left DataFrame.
pd.merge(df1, df2, on=’ID’, how=’left’)
Right Join: Retains all rows from the right DataFrame.
pd.merge(df1, df2, on=’ID’, how=’right’)
Outer Join: Includes all rows from both DataFrames.
pd.merge(df1, df2, on=’ID’, how=’outer’)
When to Use What:
-
Use inner when you only care about matching records.
-
Use left when all records from the first dataset are required.
-
Use outer when you want everything from both datasets, even if unmatched.
24. Explain groupby() and agg() in Pandas with an example.
Solution:
-
groupby()
: Groups the DataFrame based on one or more columns. -
agg()
: Applies aggregation functions likesum
,mean
,max
, etc.
Example:
import pandas as pd
data = {‘Department’: [‘HR’, ‘HR’, ‘IT’, ‘IT’, ‘Finance’],
‘Salary’: [50000, 55000, 60000, 65000, 70000]}
df = pd.DataFrame(data)
grouped = df.groupby(‘Department’).agg({‘Salary’: [‘mean’, ‘sum’]})
print(grouped)
Output:
Salary
mean sum
Department
Finance 70000 70000
HR 52500 105000
IT 62500 125000
Use Cases:
-
Aggregating sales data by region or category.
-
Calculating performance metrics department-wise.
25. What is the difference between apply() and map() in Pandas?
Feature | apply() |
map() |
---|---|---|
Purpose | Works on DataFrame or Series | Works only on Series |
Scope | Applies functions element-wise | Maps values using a dictionary or function |
Flexibility | Can apply complex functions | Mostly for simple mappings/replacements |
Return Type | Series or DataFrame | Series |
Examples:
-
apply()
on DataFrame:
import pandas as pd
df = pd.DataFrame({‘A’: [1, 2, 3], ‘B’: [4, 5, 6]})
df[‘Sum’] = df.apply(lambda row: row[‘A’] + row[‘B’], axis=1)
print(df)
Output:
A B Sum
0 1 4 5
1 2 5 7
2 3 6 9
map()
on Series:
mapping = {1: ‘One’, 2: ‘Two’, 3: ‘Three’}
df[‘Mapped_A’] = df[‘A’].map(mapping)
print(df)
Output:
A B Sum Mapped_A
0 1 4 5 One
1 2 5 7 Two
2 3 6 9 Three
When to Use:
-
Use
apply()
for complex row or column operations. -
Use
map()
for simple element-wise replacements in Series.
Conclusion
These 25 Python for Data Analytics questions cover essential topics—from basics to advanced Pandas operations. With practical examples and clear explanations, this set helps you confidently prepare for data analyst interviews and apply Python effectively in real-world scenarios.
Good luck with your interview prep!
Join Our Telegram Group (1.9 Lakhs + members):- Click Here To Join
For Experience Job Updates Follow – FLM Pro Network – Instagram Page
For All types of Job Updates (B.Tech, Degree, Walk in, Internships, Govt Jobs & Core Jobs) Follow – Frontlinesmedia JobUpdates – Instagram Page
For Healthcare Domain Related Jobs Follow – Frontlines Healthcare – Instagram Page
For Major Job Updates & Other Info Follow – Frontlinesmedia – Instagram Page