Top 20 Python Interview Questions for Data Science Roles
What are Python's built-in data types?
Python includes several built-in types, such as:
- int, float, complex (numeric types)
- str (string)
- list, tuple (sequence types), set (set type), dict (mapping type)
- bool (boolean)
- NoneType (the type of None, Python's null value)
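A quick way to check these types is to ask Python itself. This minimal sketch calls the built-in type() on one literal of each kind; the sample values are arbitrary. Note that bool is a subclass of int.

```python
values = [42, 3.14, 2 + 3j, "text", [1, 2], (1, 2), {1, 2}, {"a": 1}, True, None]

for value in values:
    # type(...).__name__ gives the name of the built-in type, e.g. "int"
    print(f"{value!r:<10} {type(value).__name__}")

# bool is a subclass of int, so True is also an int instance
print(isinstance(True, int))  # True
```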
What is the difference between list, tuple, and set?
list: Mutable, ordered, allows duplicates.
tuple: Immutable, ordered, allows duplicates.
set: Unordered, mutable, does not allow duplicates.
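The three properties can be seen side by side by building each container from the same input. The sample data is arbitrary; since set iteration order is not guaranteed, the set is sorted before printing.

```python
items = [3, 1, 3, 2]

as_list = list(items)    # keeps order and duplicates
as_tuple = tuple(items)  # keeps order and duplicates, but cannot be changed
as_set = set(items)      # drops duplicates; iteration order is not guaranteed

as_list.append(4)        # lists are mutable
as_set.add(5)            # sets are mutable too

try:
    as_tuple[0] = 99     # tuples are immutable, so this raises TypeError
except TypeError as exc:
    print("tuple is immutable:", exc)

print(as_list)           # [3, 1, 3, 2, 4]
print(as_tuple)          # (3, 1, 3, 2)
print(sorted(as_set))    # [1, 2, 3, 5]
```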
Explain the difference between shallow copy and deep copy in Python.
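A shallow copy creates a new outer object but does not recursively copy the objects it contains, so the copy and the original share the same nested objects. A deep copy recursively copies all nested objects, creating an entirely independent structure. The standard library's copy module provides both, as copy.copy() and copy.deepcopy().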
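The difference only shows up with nested, mutable objects. In this small sketch (the nested list is just sample data), mutating an inner list of the original leaks into the shallow copy but not into the deep copy.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)             # mutate a nested object

print(shallow[0])  # [1, 2, 99]  (shares the inner list with original)
print(deep[0])     # [1, 2]      (fully independent)
print(shallow is original, shallow[0] is original[0])  # False True
```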
What is list comprehension in Python?
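List comprehension is a concise way to create lists in Python. It can often replace a for loop that builds a list, combining the loop, an optional filter condition, and the expression for each element into a single readable line.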
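The general shape is [expression for item in iterable if condition]. A minimal example with arbitrary numbers:

```python
numbers = range(10)

# Square each number, keeping only the even ones
even_squares = [n * n for n in numbers if n % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
```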
What are Python decorators?
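Decorators are functions that take another function (or method) and return a new one that extends or modifies its behavior without changing its source code. They are applied with the @decorator syntax and are commonly used for logging, enforcing access control, memoization, and more.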
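As an illustration of the logging use case, here is a sketch of a decorator that prints each call; the names log_calls and add are hypothetical. functools.wraps copies the wrapped function's name and docstring onto the wrapper.

```python
import functools

def log_calls(func):
    """Decorator that prints each call's positional arguments and result."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__}{args} -> {result}")
        return result
    return wrapper

@log_calls          # equivalent to: add = log_calls(add)
def add(a, b):
    return a + b

add(2, 3)           # prints "add(2, 3) -> 5"
print(add.__name__) # "add", preserved by functools.wraps
```

For memoization, the standard library already ships a ready-made decorator, functools.lru_cache.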
What is a lambda function in Python?
A lambda function is a small anonymous function defined with the lambda keyword. It can take any number of arguments, but its body is limited to a single expression, whose value is returned automatically.
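Lambdas are most useful as short, throwaway arguments to other functions, such as a sort key. The first line binds a lambda to a name only for illustration; PEP 8 recommends a def statement for named functions. The sample data is made up.

```python
square = lambda x: x ** 2          # illustration only; same as def square(x): return x ** 2
print(square(4))                   # 16

# Typical use: a one-off key function for sorting
people = [("Ana", 31), ("Bo", 25), ("Cy", 40)]
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Bo', 25), ('Ana', 31), ('Cy', 40)]
```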
How does Python handle memory management?
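CPython, the standard Python implementation, manages memory mainly through reference counting: each object tracks how many references point to it and is deallocated as soon as that count drops to zero. A supplementary cyclic garbage collector (the gc module) periodically detects and frees groups of objects that reference each other in a cycle, which reference counting alone cannot reclaim.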
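The split between the two mechanisms can be observed with a reference cycle. This sketch assumes CPython; the Node class is hypothetical, and a weak reference is used to check whether the object is still alive without keeping it alive.

```python
import gc
import sys
import weakref

class Node:
    """Tiny object used to build a reference cycle."""
    def __init__(self):
        self.partner = None

a = Node()
# getrefcount includes a temporary reference for its own argument
print(sys.getrefcount(a))

# Build a cycle: a -> b -> a, so neither reference count can reach zero
b = Node()
a.partner, b.partner = b, a
probe = weakref.ref(a)   # a weak reference does not keep the object alive

del a, b                 # no names refer to the objects any more
print(probe() is None)   # typically False: the cycle keeps both objects alive

gc.collect()             # the cyclic collector finds and frees the unreachable cycle
print(probe() is None)   # True
```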
Explain *args and **kwargs in Python.
*args: Collects a variable number of extra positional (non-keyword) arguments into a tuple.
**kwargs: Collects a variable number of extra keyword arguments into a dictionary.
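A small sketch (the function name describe is hypothetical) showing what each parameter receives, and how the same * and ** syntax unpacks a sequence and a mapping at the call site:

```python
def describe(*args, **kwargs):
    # args is a tuple of the extra positional arguments
    # kwargs is a dict of the extra keyword arguments
    return args, kwargs

print(describe(1, 2, 3, unit="cm", precise=True))
# ((1, 2, 3), {'unit': 'cm', 'precise': True})

# Unpacking at the call site: *values -> positional, **options -> keyword
values = [10, 20]
options = {"sep": " | "}
print(*values, **options)  # 10 | 20
```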
What is a Pandas DataFrame?
A DataFrame is the Pandas library's 2-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). It is heterogeneous in the sense that different columns can hold different data types.
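A minimal sketch with made-up data: the columns hold strings, integers, and floats, the rows carry custom labels, and a new column is added after creation.

```python
import pandas as pd

# Columns can hold different dtypes, and both axes are labeled
df = pd.DataFrame(
    {"name": ["Ana", "Bo", "Cy"], "age": [31, 25, 40], "score": [88.5, 92.0, 79.5]},
    index=["a", "b", "c"],
)

df["passed"] = df["score"] > 80   # size-mutable: add a column
print(df.shape)                   # (3, 4)
print(df.dtypes)
```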
How can you handle missing values in a dataset using Pandas?
df.dropna(): Removes rows (or, with axis=1, columns) that contain missing values.
df.fillna(value): Fills missing values with a specified value.
df.isnull() (alias df.isna()): Returns a Boolean mask that is True wherever a value is missing.
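The three methods on a tiny, made-up DataFrame. Filling the numeric column with its mean and the text column with a placeholder is just one illustrative choice; fillna also accepts a dict of per-column fill values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["Oslo", "Lima", None]})

print(df.isnull())             # True where a value is missing
print(df.isnull().sum())       # missing count per column

dropped = df.dropna()          # keep only rows with no missing values
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})  # per-column fill values
print(dropped)
print(filled)
```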
What is the difference between iloc and loc in Pandas?
iloc: Accesses rows and columns by integer position (0, 1, 2, ...); slices exclude the end position.
loc: Accesses rows and columns by label or by a Boolean condition; label slices include the end label.
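Using integer labels that differ from the positions makes the distinction visible. The data below is made up.

```python
import pandas as pd

df = pd.DataFrame({"score": [88, 92, 79, 95]}, index=[10, 20, 30, 40])

print(df.iloc[0:2])              # positions 0 and 1 (end excluded): labels 10, 20
print(df.loc[10:30])             # labels 10 through 30 (end included)
print(df.loc[df["score"] > 90])  # Boolean condition: labels 20 and 40
print(df.iloc[0]["score"], df.loc[10, "score"])  # the same row, reached two ways
```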
How do you merge two DataFrames in Pandas?
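You can merge two DataFrames with the merge() function (pd.merge() or the DataFrame.merge() method), which works similarly to SQL joins: the on parameter names the key column(s), and the how parameter selects the join type, such as "inner", "left", "right", or "outer".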
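A small sketch with two hypothetical tables, comparing an inner join with a left join on a shared key column:

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Bo", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [50, 20, 75, 10]})

# how= mirrors SQL join types
inner = pd.merge(customers, orders, on="customer_id", how="inner")
left = customers.merge(orders, on="customer_id", how="left")

print(inner)  # only customers 1 and 3; order for customer 4 has no match
print(left)   # every customer; Bo (id 2) has no orders, so amount is NaN
```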
What is the difference between apply() and map() in Pandas?
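apply(): Applies a function along an axis of a DataFrame (to each column by default, or to each row with axis=1); on a Series, apply() calls the function on each value.
map(): Works element-wise on a Series, transforming each value according to an input mapping (such as a dict) or a function.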
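The contrast in practice, on made-up data: DataFrame.apply hands the function a whole column or row, while Series.map looks at one value at a time.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (axis=0, the default)...
col_ranges = df.apply(lambda col: col.max() - col.min())
# ...or a whole row (axis=1)
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Series.map: element-wise, driven by a dict (unmatched values become NaN) or a function
grades = pd.Series(["A", "B", "A", "C"])
points = grades.map({"A": 4, "B": 3, "C": 2})
lengths = grades.map(len)

print(col_ranges.tolist())  # [2, 20]
print(row_totals.tolist())  # [11, 22, 33]
print(points.tolist())      # [4, 3, 4, 2]
```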
What is a NumPy array and how is it different from a Python list?
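A NumPy array (ndarray) is an n-dimensional array whose elements all share one data type and are typically stored in a contiguous block of memory. This layout lets NumPy run vectorized mathematical operations in compiled code, so arrays are generally more memory-efficient and much faster for numerical work than Python lists, which store references to individual Python objects and need explicit loops for element-wise math.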
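The difference is easy to see with arithmetic operators, which mean different things for the two types. The prices are sample values.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]
arr = np.array(prices)              # homogeneous float64 array

print(arr * 1.1)                    # vectorized: multiplies every element
print(prices * 2)                   # list "*" repeats the list instead
print([p * 1.1 for p in prices])    # a list needs an explicit loop for element-wise math
print(arr.dtype, arr.shape)         # float64 (3,)
```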
Explain broadcasting in NumPy.
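Broadcasting is the set of rules NumPy uses to perform element-wise operations on arrays of different shapes. Shapes are compared from the trailing dimension backward: two dimensions are compatible when they are equal or when one of them is 1, and a size-1 (or missing) dimension is virtually stretched to match the other without copying data.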
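A sketch of the rules on small arrays: a row vector is stretched across rows, a column vector across columns, and a shape whose trailing dimension matches neither rule raises an error.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0 1 2] [3 4 5]]
row = np.array([10, 20, 30])          # shape (3,), treated as (1, 3) and stretched over 2 rows
col = np.array([[100], [200]])        # shape (2, 1), stretched over 3 columns

print(matrix + row)       # rows [10 21 32] and [13 24 35]
print(matrix + col)       # rows [100 101 102] and [203 204 205]
print((row + col).shape)  # (2, 3): both inputs are stretched

try:
    matrix + np.array([1, 2])  # trailing dims 3 vs 2: incompatible
except ValueError as exc:
    print("cannot broadcast:", exc)
```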
What is the difference between a for loop and list comprehension in Python?
List comprehension provides a more concise and usually somewhat faster alternative to a for loop that builds a list through repeated append() calls.
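A rough way to check the speed claim on your own machine is the standard timeit module. This sketch builds the same list both ways; the function names are hypothetical, and the measured times depend on hardware and Python version, so none are quoted here.

```python
import timeit

def squares_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)   # method lookup and call on every iteration
    return result

def squares_comprehension(n):
    return [i * i for i in range(n)]

assert squares_loop(1000) == squares_comprehension(1000)  # same result

# Timings vary; the comprehension is usually somewhat faster
print("loop:         ", timeit.timeit(lambda: squares_loop(10_000), number=100))
print("comprehension:", timeit.timeit(lambda: squares_comprehension(10_000), number=100))
```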
What is the significance of __init__ in Python?
__init__ is the initializer method of a Python class, commonly called its constructor. Python calls it automatically right after a new object is created (by __new__), and it is used to initialize the object's attributes.
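A minimal class, with a hypothetical Account example, showing __init__ setting attributes that other methods then use:

```python
class Account:
    def __init__(self, owner, balance=0.0):
        # Runs automatically right after the object is created
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

acct = Account("Ana", 100.0)     # triggers Account.__init__ with owner="Ana", balance=100.0
acct.deposit(50.0)
print(acct.owner, acct.balance)  # Ana 150.0
```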
What is regularization in machine learning and how is it implemented in Python?
Regularization is a technique used to prevent overfitting by adding a penalty on the size of the model's coefficients to the loss function. In Python, it can be implemented with libraries like scikit-learn (sklearn), for example Ridge regression (L2 penalty) or Lasso regression (L1 penalty).
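A sketch of both penalties in scikit-learn on synthetic data (generated here, not real results), where only the first two of ten features influence the target. Ridge shrinks the overall coefficient norm relative to plain least squares, while Lasso can drive irrelevant coefficients to exactly zero. The alpha values are arbitrary illustrative choices; in practice they would be tuned, for example with cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Synthetic target: only features 0 and 1 matter; the other 8 are noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)   # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero out coefficients

print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
```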
Explain how to handle imbalanced datasets in Python.
Imbalanced datasets can be handled using techniques like:
- Resampling (over-sampling the minority class or under-sampling the majority class)
- Synthetic over-sampling with SMOTE (Synthetic Minority Over-sampling Technique)
- Using performance metrics such as AUC-ROC instead of accuracy
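The sketch below uses hypothetical toy labels (95 negatives, 5 positives) to show why accuracy misleads and how plain random over-sampling balances the classes. It uses scikit-learn's resample utility rather than SMOTE; SMOTE itself is available in the separate imbalanced-learn package as imblearn.over_sampling.SMOTE. Resampling should be applied to the training split only, never to the test data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.utils import resample

# Hypothetical data: 95 negatives, 5 positives, one dummy feature
y = np.array([0] * 95 + [1] * 5)
X = np.arange(100).reshape(-1, 1)

# A "model" that always predicts the majority class looks good on accuracy...
always_zero = np.zeros_like(y)
print("accuracy:", accuracy_score(y, always_zero))                # 0.95
# ...but a constant score cannot rank positives above negatives
print("ROC AUC: ", roc_auc_score(y, always_zero.astype(float)))  # 0.5

# Random over-sampling: draw minority rows with replacement until classes match
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=95, random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))  # [95 95]
```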
What are some common Python libraries used in data science?
Pandas: For data manipulation and analysis.
NumPy: For numerical operations.
Matplotlib/Seaborn: For data visualization.
Scikit-learn: For machine learning algorithms.
TensorFlow/Keras: For deep learning.