Hi! I completed the timed exam from the certification and was just wondering if anyone knows the format of the practical exam?
I skimmed the subreddit and saw that a lot of practical exams include a presentation and a video recording of it. Is that the case with this practical too? I looked at the sample exam and it doesn't seem like it, but I just want to be sure.
If anyone can provide a description of what the practical exam entails, that would be greatly appreciated. Thank you!
Videos
I tried to start my first timed exam (DS101) and the first thing it told me was that I need to enable a webcam and screen sharing. Is the screen sharing done to make sure you're not referencing any material at all? Or is it only to catch blatant cheating (like feeding all your questions into ChatGPT)? I'm wondering if I'm allowed to look at official documentation for Python modules if I forget whether a parameter name is "column" or "columns" or "col" or "usecols", etc. After all, Python users in the professional world have to look things up all the time. Nobody has every module, function, attribute, etc. memorized.
Hi all! I finished the timed exam and am now on to the practical. When attempting the practice practical exam, it will allow me to pull up everything with a query like SELECT * FROM coffee, but the second I try to get specific with a column shown in that result, like SELECT Region FROM coffee, it says "column Region not found"...even though it's clearly in the previous results for what's in the table. Does anyone have any advice? I'm hesitant to start the practical if I can't even get this to work.
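One likely cause, assuming the exam database is PostgreSQL (other posts here mention the Postgres variant of the exam): unquoted identifiers are folded to lower case, so a column stored with a capital letter is only reachable with double quotes. For example:

```sql
-- Unquoted identifiers fold to lower case in PostgreSQL, so this
-- looks for a column literally named region and fails if the column
-- was created as "Region":
SELECT Region FROM coffee;

-- Double quotes preserve the exact case the column was created with:
SELECT "Region" FROM coffee;
```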
I've just tried my first attempt and can't see what is wrong even with the hints; I made some changes but I think something might still be off.
I don't mind failing the exam and taking it again, but I just want to learn from my mistakes here, since I spent quite a while on this.
UPDATE: the guy in the comments helped me out and I passed; do take a look if you're struggling to complete it.
PS: I couldn't update it on Git, so just take note of my error in TASK 3, where the description column should not be renamed to service_description.
https://github.com/christyleeyx/sql-associate-cert/blob/main/notebook%20(3).ipynb
So I'm going through the study guide for DS101, and it gives quizzes to test my knowledge, but I'm curious which courses cover which bullet point. I've filled in everything I know, but I was wondering if anyone else had info. (I've done all the courses in the DS track; I'm just making sure I got enough practice in each of these to ensure I pass.) Please correct me if I'm wrong about any of these.
1.1 Calculate metrics to effectively report characteristics of data and relationships between features
● Calculate measures of center (e.g. mean, median, mode) for variables using R or Python. Introduction to Statistics in R
● Calculate measures of spread (e.g. range, standard deviation, variance) for variables using R or Python. Introduction to Statistics in R
● Calculate skewness for variables using R or Python. INTRO TO STATS/UNSURE??
● Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in R or Python. INTRO TO STATS/UNSURE
● Calculate the correlation between variables using R or Python.
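A quick pandas sketch of the 1.1 bullets, using a made-up frame with hypothetical price and sales columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing price; sales is exactly 10x price
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 30.0],
    "sales": [100, 120, 50, 110, 300],
})

center = {"mean": df["price"].mean(), "median": df["price"].median()}
spread = {"range": df["price"].max() - df["price"].min(),
          "std": df["price"].std(), "var": df["price"].var()}
skewness = df["price"].skew()              # NaN values are skipped
missing_share = df["price"].isna().mean()  # fraction of missing values
corr = df["price"].corr(df["sales"])       # pairwise-complete Pearson correlation
```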
1.2 Create data visualizations in coding language to demonstrate the characteristics of data
● Create and customize bar charts using R or Python. INTRO DATA VIZ GGPLOT2
● Create and customize box plots using R or Python. INTRO TIDYVERSE
● Create and customize line graphs using R or Python. INTRO DATA VIZ GGPLOT2
● Create and customize histograms using R or Python. INTRO DATA VIZ GGPLOT2
1.3 Create data visualizations in coding language to represent the relationships between features
● Create and customize scatterplots using R or Python. INTRO DATA VIZ WITH GGPLOT2
● Create and customize heatmaps using R or Python. INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2
● Create and customize pivot tables using R or Python. UNSURE
1.4 Identify and reduce the impact of characteristics of data
● Identify when imputation methods should be used and implement them to reduce the impact of missing data on analysis or modeling using R or Python. DATA MANIPULATION WITH R
● Describe when a transformation to a variable is required and implement corresponding transformations using R or Python. DATA MANIPULATION WITH R
● Describe the differences between types of missingness and identify relevant approaches to handling types of missingness. DATA MANIPULATION WITH R / UNSURE
● Identify and handle outliers using R or Python. DATA MANIPULATION WITH R / UNSURE
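For the 1.4 bullets, a minimal pandas sketch of median imputation and IQR-based outlier flagging, on made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value and one extreme value
s = pd.Series([1.0, 2.0, np.nan, 3.0, 100.0])

# Impute the missing value with the median
s_imputed = s.fillna(s.median())

# Flag values outside 1.5 * IQR as outliers
q1, q3 = s_imputed.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = s_imputed[(s_imputed < q1 - 1.5 * iqr) | (s_imputed > q3 + 1.5 * iqr)]
```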
2.1 Describe statistical concepts that underpin hypothesis testing and experimentation
● Define different statistical distributions (e.g. binomial, normal, Poisson, t-distribution, chi-square, F-distribution, etc.). Introduction to Statistics in R
● Explain the statistical concepts in hypothesis testing (e.g. null hypothesis, alternative hypothesis, one-tailed and two-tailed hypothesis tests, etc.). HYPOTHESIS TESTING IN R
● Explain the statistical concepts in experimental design (e.g. control group, randomization, confounding variables, etc.). Introduction to Statistics in R
● Explain parameter estimation and confidence intervals. SAMPLING IN R / HYPOTHESIS TESTING IN R
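Parameter estimation with a confidence interval can be sketched with scipy (simulated data, not from any course):

```python
import numpy as np
from scipy import stats

# Simulate a sample and estimate the population mean with a 95% CI
rng = np.random.default_rng(42)
sample = rng.normal(loc=10, scale=2, size=100)

mean = sample.mean()     # point estimate of the mean
sem = stats.sem(sample)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1,
                                   loc=mean, scale=sem)
```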
2.2 Apply sampling methods to data
● Distinguish between different types of random sampling techniques and apply the methods using R or Python SAMPLING IN R
● Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python SAMPLING IN R
● Calculate a probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python SAMPLING IN R
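The 2.2 bullets boil down to a couple of numpy/scipy calls; a sketch with arbitrary parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sample 1,000 draws from a standard normal distribution
draws = rng.normal(loc=0, scale=1, size=1000)

# Probability that a standard normal variable is below 1.96
p = stats.norm.cdf(1.96)

# Probability of exactly 3 successes in 10 binomial trials with p=0.5
pb = stats.binom.pmf(3, n=10, p=0.5)
```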
2.3 Implement methods for performing statistical tests HYPOTHESIS TESTING IN R
● Run statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python HYPOTHESIS TESTING IN R
● Analyze the results of statistical tests from R or Python HYPOTHESIS TESTING IN R
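Running and reading a statistical test (2.3) is a single scipy call; a sketch on simulated groups:

```python
import numpy as np
from scipy import stats

# Two simulated groups with slightly different means
rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.0, scale=1.0, size=50)
group_b = rng.normal(loc=5.5, scale=1.0, size=50)

# Independent two-sample t-test; a small p-value suggests the
# group means differ
t_stat, p_value = stats.ttest_ind(group_a, group_b)
```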
I'm starting to think there is something wrong with this data set. Task 2 seems to be problematic; I'd appreciate any insight.
The failing criterion was "Clean categorical and text data by manipulating strings".
Edit: It was a funny experience for me. First of all, I should say that my initial problem was with my use of Markdown. Each cell's database connection can be set to DataFrame or Query; I didn't know that by selecting Query I could access a frame I had created in a different cell on the page. On my first attempt I tried to create a temporary table, which led me into unnecessary mistakes. Thanks to your comments I found a different approach to Task 2. You can find the details in the successful source code.
Has anyone done the Data Associate practical exam lately? I was expecting one similar to what I encountered when I tried the Professional version (indeed that's even what they have in the practical hub: https://app.datacamp.com/workspace/w/396b8323-75d7-4715-b390-fa43e386fb3c), but instead I got a workbook with instructions for 4 tasks and all the answers had to be in a single code cell. Upon submitting it is all run and auto checked.
Well, this sucks so much because, even though I'm sure my code cell was right, I got one of them wrong in both of the two attempts provided. So there it goes: all of your effort so far, and you can't pass because your output doesn't match theirs exactly. There's no feedback at all, so I can't know what I got wrong. Hell, how do I even know I'm the one who's wrong? As far as I know they haven't even updated the resources section to show this new kind of exam, and there are plenty of big mistakes in the courses' exercises too, so they could be getting something wrong here.
I'm so frustrated with this lack of help. Last time I failed the professional version of the exam because I had a "technical issue" with the grading... and that's all I ever learned; I couldn't even get a result. Can anyone who works at DataCamp help?? What do you guys think? Does this seem like a fair process to you?
How many questions are on the timed exam for DataCamp's Associate Data Analyst certification, and are the questions the same on both attempts? Currently freaking out and I would like an answer please 🙏. Also, I mean the exam tailored for PostgreSQL. I mean this in the politest way possible, and I appreciate any help provided.
Hello! I'm a college student trying to find a career in Data Science / Machine Learning. I've submitted my work on the Data Scientist Professional Practical Exam here:
https://www.datacamp.com/datalab/w/16f1599a-2f3d-4ffc-9dbb-02046b471ada
And I really want people to evaluate it and point out my strengths and weaknesses. It's a good thing that I can learn from other learners what I'm good at and which field or concept I should review. My presentation can be found in my GitHub repo:
https://github.com/miniloda/DataCamp-DataScience-Exam
Thank you so much
Hi, I just passed my exam and wanted to give you some feedback based on my experience. I think your project is great; good job on scaling and on your explanation of considering and using different types of models. A few things I did differently: 1. I focused on precision as the main metric, as it is more closely related to the company's objective. 2. I think your model is very influenced by the random state you selected. Did you experiment with different values? I don't think it's necessary in this case. 3. Great job using GridSearchCV. I'm not an expert on this and actually thought the explanation in the course wasn't enough, so I had to dig deeper into other material to complete my project.
Okay, I'm about to take the exam in the coming week, so I don't think I have the experience to evaluate your work, but I must say it looks quite neat, although I feel some, if not all, of your visualizations would have done better on a lighter background. I wish you all the best and hope you pass!
Meanwhile, would you happen to have any tips for me before my attempt, and what tools did you use to put up your presentation?
Any help at all will be appreciated.
I am waiting for your ideas on my failed Practical Exam DA601P attempt, which I tried to complete using Python. You can access the failed source code and the project instructions by clicking on them. As you can see from the photo, I am stuck on the data validation part.
Screen shot showing the error in the data validation phase.
I am open to any comments that will make me realise what I have missed.
Thank you for your attention.
Here is my attempt:
https://www.datacamp.com/datalab/w/63ba6f0d-09fc-4774-b2b6-e0545b4d969b/edit?emitCellOutputs=false&reducedMenuBar=true&showExploreMore=false&showLeftNavigation=false&showNavBar=false&showPublicationButton=false&showOnlyRelevantSampleIntegrationIds[]=89e17161-a224-4a8a-846b-0adc0fe7a4b1&showOnlyRelevantSampleIntegrationIds[]=e0c96696-ae0a-46fb-b6f9-1a43eb428ecb&showOnlyRelevantSampleIntegrationIds[]=b1fcb109-b4fe-4543-bc98-681df8c4dc6e&showOnlyRelevantSampleIntegrationIds[]=fcf37a0e-f8bd-4c85-95a5-201d3eebea48&showOnlyRelevantSampleIntegrationIds[]=db697c09-0402-4a02-b327-26018dc2ecce&showOnlyRelevantSampleIntegrationIds[]=7569175e-98be-4c89-9873-c20f699a9cc7&fetchUnlistedSampleIntegrationIds[]=7569175e-98be-4c89-9873-c20f699a9cc7#538ffb3d-4008-49b6-9876-7831e025f5a4
and these are the tasks I failed:
I just have one attempt left
Hi there,
I searched a lot to see if this question was already answered somewhere, but I didn't find anything.
Right now I am preparing for the DSA Practical Exam and somehow I'm having a really hard time with the sample exam.
Practical Exam: Supermarket Loyalty
International Essentials is an international supermarket chain.
Shoppers at their supermarkets can sign up for a loyalty program that provides rewards each year to customers based on their spending. The more you spend the bigger the rewards.
The supermarket would like to be able to predict the likely amount customers in the program will spend, so they can estimate the cost of the rewards.
This will help them to predict the likely profit at the end of the year.
## Data
The dataset contains records of customers for their last full year of the loyalty program.
So I think my main problem is understanding the tasks correctly. For Task 2:
Task 2
The team at International Essentials have told you that they have always believed that the number of years in the loyalty scheme is the biggest driver of spend.
Produce a table showing the difference in the average spend by number of years in the loyalty programme, along with the variance, to investigate this question for the team.
- You should start with the data in the file 'loyalty.csv'.
- Your output should be a data frame named spend_by_years.
- It should include the three columns loyalty_years, avg_spend, var_spend.
- Your answers should be rounded to 2 decimal places.
This is my code:

spend_by_years = clean_data.groupby("loyalty_years", as_index=False).agg(
    avg_spend=("spend", lambda x: round(x.mean(), 2)),
    var_spend=("spend", lambda x: round(x.var(), 2))
)
print(spend_by_years)
This is my result:

  loyalty_years  avg_spend  var_spend
0           0-1     110.56       9.30
1           1-3     129.31       9.65
2           3-5     124.55      11.09
3          5-10     135.15      14.10
4           10+     117.41      16.72
But the auto evaluation says that Task 2 ("Aggregate numeric, categorical variables and dates by groups") is failing, and I don't understand why.
I am also a bit confused that they provide a train.csv and a test.csv separately; do all the conversions and data cleaning steps have to be done again?
As you can see, I am confused and need help :D
EDIT: So apparently converting loyalty_years to an ordered category was not necessary; not doing that passes the evaluation.
Now I am stuck at tasks 3 and 4.
Task 3
Fit a baseline model to predict the spend over the year for each customer.
- Fit your model using the data contained in "train.csv".
- Use "test.csv" to predict new values based on your model. You must return a dataframe named base_result that includes customer_id and spend. The spend column must be your predicted values.
Task 4
Fit a comparison model to predict the spend over the year for each customer.
- Fit your model using the data contained in "train.csv".
- Use "test.csv" to predict new values based on your model. You must return a dataframe named compare_result that includes customer_id and spend. The spend column must be your predicted values.
I already set up two pipelines with model fitting, one with linear regression, the other with random forest. I am under the demanded RMSE threshold.
Maybe someone else has done this already, ran into the same problem, and solved it?
Thank you for your answer,
Yes, I dropped those.
I think I've got the structure now, but the script still doesn't pass and I have no ideas left. I tried several types of regression, but without the data to test against I don't know what else to do.
I also did grid searches to find optimal parameters; those are the ones I used for the modeling.
Here is my code so far:
import pandas as pd
from pandas.api.types import CategoricalDtype
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
# Load training & test data
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv("test.csv")
customer_ids_test = df_test['customer_id']
# Cleaning and dropping for train/test
df_train.drop(columns='customer_id', inplace=True)
df_train_encoded = pd.get_dummies(df_train, columns=['region', 'joining_month', 'promotion'], drop_first=True)
df_test_encoded = pd.get_dummies(df_test, columns=['region', 'joining_month', 'promotion'], drop_first=True)
# Ordinal for loyalty
loyalty_order = CategoricalDtype(categories=['0-1', '1-3', '3-5', '5-10', '10+'], ordered=True)
df_train_encoded['loyalty_years'] = df_train_encoded['loyalty_years'].astype(loyalty_order).cat.codes
df_test_encoded['loyalty_years'] = df_test_encoded['loyalty_years'].astype(loyalty_order).cat.codes
# Preparation
y_train = df_train_encoded['spend']
X_train = df_train_encoded.drop(columns=['spend'])
# Align test columns with train (one-hot columns can differ between files)
X_test = df_test_encoded.drop(columns=['customer_id']).reindex(columns=X_train.columns, fill_value=0)
# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Prediction
model = Ridge(alpha=0.4)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
# Result
base_result = pd.DataFrame({
'customer_id': customer_ids_test,
'spend': y_pred
})
base_result
Task4:
# Model
lasso = Lasso(alpha=1.5)
lasso.fit(X_train_scaled, y_train)
# Prediction
y_pred_lasso = lasso.predict(X_test_scaled)
# Result
compare_result = pd.DataFrame({
'customer_id': customer_ids_test,
'spend': y_pred_lasso
})
compare_result
I was able to pass the practical exam on my second attempt. I had the FoodYum dataset. If anyone has any questions I will do my best to help out.
Can someone who has completed tasks 1 and 2 of the SQL practical exam please provide the answers in full? I got tasks 3 and 4 on the first try, but after 4 attempts at the first 2, nothing worked. I'm going to re-register in 14 days. I am almost certain what I did was correct, yet apparently I am wrong, so I'd like someone to provide the correct answers. I no longer have access to the exam, so I can't provide more info. I'm just so confused about what I did wrong.
Task 1 Before you can start any analysis, you need to confirm that the data is accurate and reflects what you expect to see. It is known that there are some issues with the branch table, and the data team have provided the following data description. Write a query to return data matching this description, including identifying and cleaning all invalid values. You must match all column names and description criteria. Your output should be a DataFrame named 'clean_branch_data'.
Task 2 The Head of Operations wants to know whether there is a difference in the time taken to respond to a customer request in each hotel. They already know that different services take different lengths of time. Calculate the average and maximum duration for each branch and service. Your output should be a DataFrame named 'average_time_service'. It should include the columns service_id, branch_id, avg_time_taken and max_time_taken. Values should be rounded to two decimal places where appropriate.
Hi guys,
I just passed this Exam today. Here are my suggestions that may help you with the Coffee dataset:
- Missing values in the Brand column are '-', not Null;
- The Weight column needs all rows converted to a real (float) type with 2 decimal places and without 'grams';
- The Stock location column should be converted to UPPER case.
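A pandas sketch of those three cleanups on a made-up frame; the real column names, and what the exam expects '-' to become, may differ:

```python
import pandas as pd

# Hypothetical toy frame mirroring the issues listed above
df = pd.DataFrame({
    "brand": ["Nescafe", "-", "Lavazza"],
    "weight": ["250.5 grams", "500 grams", "1000.25 grams"],
    "stock_location": ["a", "B", "c"],
})

# '-' marks a missing brand, not NULL (replacement value is a guess)
df["brand"] = df["brand"].replace("-", "Unknown")

# Strip 'grams' and cast to float, rounded to 2 decimals
df["weight"] = (df["weight"].str.replace("grams", "", regex=False)
                            .str.strip().astype(float).round(2))

# Upper-case the stock location
df["stock_location"] = df["stock_location"].str.upper()
```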
You can add more questions and I will try to answer. Good luck, guys.