Whether you're preparing for your first role in data analytics or aiming to crack high-paying roles at companies like Amazon, this curated set of real-world interview questions is designed just for you. Covering SQL queries, business insights, and data-driven decision-making, these questions are perfect for candidates with 0–3 years of experience targeting 22–25 LPA roles.

1. What is the difference between WHERE and HAVING?
Solution:

  • Application level: WHERE works on individual rows; HAVING works on aggregated (grouped) results.

  • Aggregate functions: cannot be used in WHERE; can be used in HAVING.

  • Filtering phase: WHERE filters before aggregation; HAVING filters after aggregation.

Example:

-- Using WHERE
SELECT *
FROM Employees
WHERE Department = 'Sales';

-- Using HAVING
SELECT Department, AVG(Salary) AS Avg_Salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 50000;

2. How would you forecast sales for the next quarter?
Solution:
Sales forecasting predicts future sales based on historical data.

Steps:

  1. Collect data (sales, campaigns, seasonal trends)

  2. Preprocess the data (cleaning, formatting time series)

  3. Analyze trends and seasonality

  4. Choose a model: ARIMA, SARIMA, Random Forest, LSTM

  5. Evaluate (RMSE, MAE, MAPE)

  6. Forecast

Example (Python with ARIMA):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(sales_data['Sales'], order=(1, 1, 1))  # sales_data: DataFrame with a 'Sales' time series
result = model.fit()
forecast = result.forecast(steps=3)
print(forecast)

3. How do you remove duplicates from a table in SQL?
Solution:
Use ROW_NUMBER() to flag duplicate rows, then delete every row after the first occurrence. Deleting from a CTE as shown works in SQL Server; other databases may need to delete via a unique key instead.

WITH DuplicateCTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID, Name, Department ORDER BY EmployeeID) AS Row_Num
FROM Employees
)
DELETE FROM DuplicateCTE
WHERE Row_Num > 1;

4. What metrics would you track for Amazon Prime subscription growth?
Solution:
Key metrics include total and new subscribers, churn rate, retention rate, trial-to-paid conversion rate, and average revenue per subscriber.

Example Query:

SELECT
COUNT(*) AS Total_Subscribers,
SUM(CASE WHEN SubscriptionStatus = 'Cancelled' THEN 1 ELSE 0 END) AS Cancelled_Subscribers,
(SUM(CASE WHEN SubscriptionStatus = 'Cancelled' THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) AS Churn_Rate
FROM PrimeSubscriptions;

5. Explain window functions and give an example.
Solution:
Window functions perform calculations over rows related to the current row, without collapsing them.

Example:

SELECT
EmployeeID,
Name,
Salary,
RANK() OVER (ORDER BY Salary DESC) AS Salary_Rank
FROM Employees;

6. How do you handle seasonality in sales data?
Solution:

  • Use seasonal decomposition

  • Apply SARIMA for modeling

  • Try moving averages or one-hot encoding months

Example (Python):

from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(sales_data['Sales'], model='multiplicative', period=12)
result.plot()
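
To model the seasonality directly (as mentioned in the SARIMA bullet above), here is a minimal sketch using statsmodels, assuming the same hypothetical sales_data['Sales'] monthly series:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(sales_data['Sales'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))  # 12-month seasonality
result = model.fit(disp=False)
print(result.forecast(steps=3))  # forecast the next 3 months
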
7. Write a query to find the second-highest salary from an employee table.
Solution:
SELECT MAX(Salary) AS Second_Highest_Salary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

8. How do you analyze customer churn using data?
Solution:
Steps:

  1. Collect data (demographics, transactions)

  2. Define churn (e.g., no activity in 6 months)

  3. Engineer features

  4. Do EDA

  5. Build predictive models (Logistic Regression, Random Forest)

  6. Segment users

  7. Track churn KPIs

Example Query:

SELECT
CustomerID,
COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) AS Transactions_Last_6_Months
FROM CustomerTransactions
GROUP BY CustomerID
HAVING COUNT(CASE WHEN TransactionDate >= DATEADD(MONTH, -6, GETDATE()) THEN 1 END) = 0;
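
Beyond flagging inactive customers in SQL, a predictive model can estimate churn probability. A minimal scikit-learn sketch, assuming a hypothetical feature DataFrame features and a 0/1 label churned:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# features: engineered customer attributes (recency, frequency, spend, etc.)
# churned: 1 if the customer churned, 0 otherwise
X_train, X_test, y_train, y_test = train_test_split(features, churned, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))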

9. Write a query to find top N customers by total sales.
Solution:

SELECT CustomerID,
SUM(SalesAmount) AS Total_Sales
FROM Sales
GROUP BY CustomerID
ORDER BY Total_Sales DESC
LIMIT 5;  -- top 5; replace 5 with the required N

10. Explain A/B testing and how to interpret the results.
Solution:
A/B testing compares two versions to find the more effective one.

Steps:

  1. Define goal

  2. Form hypotheses

  3. Split users into A (control) and B (test)

  4. Run test

  5. Perform statistical test (e.g., t-test)

  6. Analyze (check p-value < 0.05)

Example Query:

SELECT
Version,
COUNT(*) AS Users,
SUM(Conversion) AS Conversions,
(SUM(Conversion) * 1.0 / COUNT(*)) AS Conversion_Rate
FROM AB_Testing_Data
GROUP BY Version;
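
To decide whether the difference in conversion rates is statistically significant, here is a minimal sketch using statsmodels (the counts below are hypothetical and would come from a query like the one above):

from statsmodels.stats.proportion import proportions_ztest

conversions = [320, 370]  # hypothetical conversions for versions A and B
users = [5000, 5000]      # hypothetical users exposed to each version
stat, p_value = proportions_ztest(conversions, users)
print(f"z = {stat:.3f}, p-value = {p_value:.4f}")  # p-value < 0.05 suggests a significant difference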

Power BI/Tableau and Statistics & Probability

11. What is the difference between correlation and causation?

Solution:
Correlation means two variables move together; causation means a change in one variable directly produces a change in the other. Correlation alone does not prove causation, since a confounding variable or coincidence can drive the relationship; establishing causation usually requires controlled experiments such as A/B tests.
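
A quick way to see correlation (not causation) in Pandas, using illustrative made-up numbers:

import pandas as pd

df = pd.DataFrame({
    'ice_cream_sales': [200, 220, 250, 300, 320],
    'drowning_incidents': [3, 4, 5, 7, 8]
})
print(df.corr())  # strong positive correlation, yet neither causes the other (hot weather drives both)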


12. How do you choose the right visualization for different datasets?

Solution:
Choosing a visual depends on the purpose:

  • Trends over time: line chart

  • Comparisons across categories: bar or column chart

  • Distribution of a single variable: histogram or box plot

  • Relationship between two variables: scatter plot

  • Part-to-whole composition: stacked bar (or pie chart for a small number of categories)

13. Explain the Central Limit Theorem (CLT) in simple terms.

Solution:
The CLT states that, as the sample size grows, the distribution of sample means approaches a normal distribution, regardless of the shape of the underlying population distribution.
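
A small simulation sketch illustrating the idea: even for a clearly non-normal (exponential) population, the means of repeated samples cluster in a bell shape around the population mean.

import numpy as np

population = np.random.exponential(scale=2.0, size=100_000)  # skewed, non-normal population
sample_means = [np.random.choice(population, size=50).mean() for _ in range(1_000)]
print(np.mean(sample_means), np.std(sample_means))  # centred near 2.0, roughly normal in shape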

14. How do you create a year-over-year growth visualization?

Solution:
In Power BI:

  • Create a base measure, e.g., Total Sales = SUM(Sales[Amount]).

  • Create a prior-year measure with CALCULATE and SAMEPERIODLASTYEAR (or DATEADD shifted by one year) over a proper date table.

  • Add a YoY growth % measure that divides the difference between current and prior-year sales by prior-year sales (DIVIDE avoids division-by-zero errors).

  • Plot the measures on a line or clustered column chart with Year/Month on the axis.

15. Explain p-value in hypothesis testing.

Solution:
The p-value is the probability of observing results at least as extreme as those measured, assuming the null hypothesis is true. A small p-value (commonly below 0.05) means the observed effect would be unlikely under the null hypothesis, so the null hypothesis is rejected.
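
A minimal sketch computing a p-value with SciPy, using hypothetical samples from two groups:

from scipy import stats

group_a = [12.1, 11.8, 13.0, 12.5, 12.9]
group_b = [11.2, 11.5, 10.9, 11.8, 11.1]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(p_value)  # a value below 0.05 would lead us to reject the null hypothesis of equal means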


16. Explain drill-through & drill-down in Power BI.

Solution:

  • Drill-down moves through a hierarchy within the same visual, e.g., Year → Quarter → Month → Day.

  • Drill-through navigates from a selected data point to a separate, detailed report page that is automatically filtered to that selection (e.g., from a sales overview to a single customer's detail page).


17. How do you determine if a dataset is skewed?

Solution:
Compare the mean and median, inspect a histogram, or compute the skewness coefficient in Pandas:

data.skew()

A positive value indicates right skew (long tail on the right), a negative value indicates left skew, and values near zero suggest a roughly symmetric distribution.

Example: House prices are usually right-skewed.
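
A minimal sketch, assuming a hypothetical price series:

import pandas as pd

prices = pd.Series([120, 130, 125, 140, 135, 150, 600])  # one extreme value pulls the tail right
print(prices.skew())  # positive result indicates right skew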


18. What are parameters in Tableau, and how do you use them?

Solution:
Parameters are dynamic values used for filtering or calculations.
Use cases:

  • Letting users choose a Top N value for filtering

  • Switching the measure or dimension displayed in a view

  • What-if analysis, such as adjusting a growth or discount rate

  • Driving dynamic reference lines or bins

19. How do you optimize a Power BI dashboard for performance?

Solution:
Best Practices:

  1. Use Star Schema for modeling

  2. Prefer calculated measures over columns

  3. Reduce data load with filters or DirectQuery

  4. Remove unused columns, limit visuals

  5. Disable auto date/time

Example:
Use REMOVEFILTERS() in DAX to improve performance:

Optimized = CALCULATE([Total Sales], REMOVEFILTERS(Calendar[Year]))

20. What is normal distribution, and why is it important?

Solution:
A bell-shaped, symmetric distribution where:

  • Mean = Median = Mode

  • 68% of values fall within 1σ, 95% within 2σ, and 99.7% within 3σ

Importance:

  • Many natural variables follow it (e.g., height, IQ)

  • Basis for parametric tests (t-tests, confidence intervals)

Example:
If mean IQ = 100 and σ = 15, then about 68% of people have an IQ between 85 and 115.
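
The 68% figure can be checked with SciPy (illustrative, using the IQ example above):

from scipy.stats import norm

p_within_1_sigma = norm.cdf(115, loc=100, scale=15) - norm.cdf(85, loc=100, scale=15)
print(round(p_within_1_sigma, 4))  # approximately 0.6827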

21. How do you handle missing values in Pandas?

Solution:
Handling missing values is essential for data cleaning and analysis. Pandas offers multiple techniques.

Identifying Missing Values:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]})
print(df.isnull()) # Shows True for missing values
print(df.isnull().sum()) # Count of missing values per column

Handling Missing Data:

  • Drop Rows with Missing Values: df.dropna(inplace=True)
  • Fill Missing Values with Static or Calculated Values:

df.fillna(0, inplace=True) # Replace with 0
df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill using column mean

  • Forward/Backward Fill (useful for time series):

df.ffill(inplace=True)  # forward fill (propagate the last valid value)
df.bfill(inplace=True)  # backward fill (use the next valid value)

Best Practices:

  • Drop only if missing data is minimal.

  • Fill with statistical values for numerical data.

  • Use ffill/bfill for time-series or sequential data.


22. Write a Python function to calculate the moving average of a time series.

Solution:
A moving average smooths data by averaging points over a sliding window.

import pandas as pd

def moving_average(data, window):
    return data.rolling(window=window).mean()

# Example usage
data = pd.Series([10, 20, 30, 40, 50, 60])
print(moving_average(data, window=3))

Output:

0 NaN
1 NaN
2 20.0
3 30.0
4 40.0
5 50.0
dtype: float64

Use Cases:

  • Smoothing noisy sensor or financial data.

  • Analyzing stock trends and patterns.

23. How do you merge two datasets in Pandas?

Solution:
Merging in Pandas combines data from different DataFrames based on common columns or indexes, similar to SQL joins.

Merge Methods:

  • Inner Join (default): Returns matching rows only.

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Output:

ID Name Age
0 2 Bob 25
1 3 Charlie 30

  • Left Join: Retains all rows from the left DataFrame.

pd.merge(df1, df2, on='ID', how='left')

  • Right Join: Retains all rows from the right DataFrame.

pd.merge(df1, df2, on='ID', how='right')

  • Outer Join: Includes all rows from both DataFrames.

pd.merge(df1, df2, on='ID', how='outer')

When to Use What:

  • Use inner when you only care about matching records.

  • Use left when all records from the first dataset are required.

  • Use outer when you want everything from both datasets, even if unmatched.


24. Explain groupby() and agg() in Pandas with an example.

Solution:

  • groupby(): Groups the DataFrame based on one or more columns.

  • agg(): Applies aggregation functions like sum, mean, max, etc.

Example:

import pandas as pd

data = {'Department': ['HR', 'HR', 'IT', 'IT', 'Finance'],
        'Salary': [50000, 55000, 60000, 65000, 70000]}
df = pd.DataFrame(data)

grouped = df.groupby('Department').agg({'Salary': ['mean', 'sum']})
print(grouped)

Output:

Salary
mean sum
Department
Finance 70000 70000
HR 52500 105000
IT 62500 125000

Use Cases:

  • Aggregating sales data by region or category.

  • Calculating performance metrics department-wise.

25. What is the difference between apply() and map() in Pandas?

Solution:

  • Purpose: apply() works on a DataFrame or a Series; map() works only on a Series.

  • Scope: apply() runs a function along rows or columns of a DataFrame (or element-wise on a Series); map() substitutes each value using a dictionary, Series, or function.

  • Flexibility: apply() handles complex custom functions; map() is mostly for simple mappings and replacements.

  • Return type: apply() returns a Series or DataFrame; map() returns a Series.

Examples:

  • apply() on DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)

Output:

A B Sum
0 1 4 5
1 2 5 7
2 3 6 9

  • map() on Series:

mapping = {1: 'One', 2: 'Two', 3: 'Three'}
df['Mapped_A'] = df['A'].map(mapping)
print(df)

Output:

A B Sum Mapped_A
0 1 4 5 One
1 2 5 7 Two
2 3 6 9 Three

When to Use:

  • Use apply() for complex row or column operations.

  • Use map() for simple element-wise replacements in Series.

Conclusion

These 25 data analytics interview questions cover essential topics, from SQL and statistics to Power BI and advanced Pandas operations. With practical examples and clear explanations, this set helps you prepare confidently for data analyst interviews and apply these skills in real-world scenarios.

Good luck with your interview prep! 
