🌐
DataCamp
support.datacamp.com › hc › en-us › articles › 19267574171927-SQL-Associate
SQL Associate – Support | DataCamp
September 2, 2024 - DataCamp's SQL Associate Certification is ideal for anyone who needs to use SQL in their role, regardless of whether they are in a data team or not. The Certification will be awarded to individuals who successfully complete one timed exam (SQ101) and one practical exam (SQ501P).
🌐
Reddit
reddit.com › r/datacamp › data analyst associate practical exam tips
r/DataCamp on Reddit: Data Analyst Associate Practical Exam Tips
November 2, 2023 -

Hi! I completed the timed exam from the certification and was just wondering if anyone knows the format of the practical exam?

I skimmed the subreddit and saw that a lot of practical exams include a presentation and a video recording of it. Is that the case with this practical too? I looked at the sample exam and it doesn't seem like it, but I just want to be sure.

If anyone can provide a description of what the practical exam entails, that would be greatly appreciated. Thank you!

🌐
Reddit
reddit.com › r/datacamp › timed exams
r/DataCamp on Reddit: Timed exams
November 9, 2023 -

I tried to start my first timed exam (DS101) and the first thing it told me was that I need to enable a webcam and screen sharing. Is the screen sharing done to make sure you’re not referencing any material? Or is it to make sure you’re not referencing any material that’s blatantly cheating (like feeding all your questions into ChatGPT)? I’m wondering if I’m allowed to look at official documentation for Python modules if I forget if a parameter name is “column” or “columns” or “col” or “usecols” etc. After all, Python users in the professional world have to look things up all the time. Nobody has every module, function, attribute, etc memorized.

🌐
Reddit
reddit.com › r/datacamp › data analyst associate certification practical exam study example issues
r/DataCamp on Reddit: Data Analyst Associate Certification Practical Exam Study Example Issues
January 18, 2024 -

Hi all! I finished the timed exam and am now on to the practical. When attempting the practice practical exam, it will allow me to pull up everything with a query like SELECT * FROM coffee, but the second I try to get specific with a column shown in that result, like SELECT Region FROM coffee, it says "column Region not found"... even though it's clearly in the previous results for what's in the table. Does anyone have any advice? I'm hesitant to start the practical if I can't even get this to work.
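A likely cause, assuming the exam workspace runs PostgreSQL: unquoted identifiers are folded to lowercase there, so a column stored with a capital letter (e.g. "Region") can only be referenced by double-quoting it. A small sketch of the quoting (SQLite is more lenient about identifier case than PostgreSQL, but the double-quoted form works in both engines):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Create a column whose name keeps its capital R by double-quoting it
con.execute('CREATE TABLE coffee ("Region" TEXT)')
con.execute("INSERT INTO coffee VALUES ('Asia')")

# Double quotes preserve the exact column case; in PostgreSQL an unquoted
# Region would be folded to region and reported as not found
rows = con.execute('SELECT "Region" FROM coffee').fetchall()
print(rows)  # [('Asia',)]
```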

🌐
Reddit
reddit.com › r/datacamp › sql associate practical exam help
r/DataCamp on Reddit: SQL Associate Practical Exam help
February 21, 2024 -

I've just tried my first attempt and just can't see what is wrong even with the hints, made some changes but I think something might be still off.

I don't mind failing the exam to take it again but I just want to learn from my mistakes here since I spent quite a while doing this.

UPDATE: the guy in the comments helped me out and I passed; do take a look if you're struggling to complete it.

P.S. I couldn't update it on Git, so just take note of my error in TASK 3, where the column description should not be renamed to service_description.

https://github.com/christyleeyx/sql-associate-cert/blob/main/notebook%20(3).ipynb

🌐
Reddit
reddit.com › r/datacamp › study guide
r/DataCamp on Reddit: Study Guide
February 8, 2023 -

So I'm going through the study guide for DS101, and it gives quizzes to test my knowledge, but I'm curious which courses cover which bullet point. I've filled in everything I know, but I was wondering if anyone else had info. (I've done all the courses in the DS track; I'm just making sure I got enough practice in each of these to ensure I pass.) Please correct me if I'm wrong about any of these.

Calculate metrics to effectively report characteristics of data and relationships between features

 ● Calculate measures of center (e.g. mean, median, mode) for variables using R or Python. Introduction to Statistics in R

 ● Calculate measures of spread (e.g. range, standard deviation, variance) for variables using R or Python. Introduction to Statistics in R

● Calculate skewness for variables using R or Python. INTRO TO STATS/UNSURE??

● Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in R or Python. INTRO TO STATS/UNSURE

● Calculate the correlation between variables using R or Python.

 1.2 Create data visualizations in coding language to demonstrate the characteristics of data

 ● Create and customize bar charts using R or Python. INTRO DATA VIZ GGPLOT2

● Create and customize box plots using R or Python. INTRO TIDYVERSE

● Create and customize line graphs using R or Python. INTRO DATA VIZ GGPLOT2

● Create and customize histograms using R or Python. INTRO DATA VIZ GGPLOT2

 1.3 Create data visualizations in coding language to represent the relationships between features

 ● Create and customize scatterplots using R or Python. INTRO DATA VIZ WITH GGPLOT2

● Create and customize heatmaps using R or Python. INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

● Create and customize pivot tables using R or Python. UNSURE

1.4 Identify and reduce the impact of characteristics of data

● Identify when imputation methods should be used and implement them to reduce the impact of missing data on analysis or modeling using R or Python. DATA MANIPULATION WITH R

 ● Describe when a transformation to a variable is required and implement corresponding transformations using R or Python. DATA MANIPULATION WITH R

 ● Describe the differences between types of missingness and identify relevant approaches to handling types of missingness. DATA MANIPULATION WITH R / UNSURE

● Identify and handle outliers using R or Python. DATA MANIPULATION WITH R / UNSURE

2.1 Describe statistical concepts that underpin hypothesis testing and experimentation

● Define different statistical distributions (e.g. binomial, normal, Poisson, t-distribution, chi-square, and F-distribution, etc. ). Introduction to Statistics in R

● Explain the statistical concepts in hypothesis testing (e.g. null hypothesis, alternative hypothesis, one-tailed and two-tailed hypothesis tests, etc. ). HYPOTHESIS TESTING IN R

 ● Explain the statistical concepts in the experimental design (e.g. control group, randomization, confounding variables, etc. ). Introduction to Statistics in R

● Explain parameter estimation and confidence intervals. SAMPLING IN R / HYPOTHESIS TESTING IN R

2.2 Apply sampling methods to data

● Distinguish between different types of random sampling techniques and apply the methods using R or Python SAMPLING IN R

● Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python SAMPLING IN R

● Calculate a probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python SAMPLING IN R

2.3 Implement methods for performing statistical tests HYPOTHESIS TESTING IN R

● Run statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python HYPOTHESIS TESTING IN R

● Analyze the results of statistical tests from R or Python HYPOTHESIS TESTING IN R
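Several of the bullets above (measures of center and spread in 1.1, sampling from a statistical distribution and calculating a probability from one in 2.2) can be sketched with nothing but the Python standard library. The exam itself expects R or Python with the usual course libraries, so treat this only as a quick illustration:

```python
import random
import statistics
from statistics import NormalDist

data = [2, 4, 4, 4, 5, 5, 7, 9]

# 1.1 Measures of center and spread
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5
print(statistics.pstdev(data))  # 2.0 (population standard deviation)

# 2.2 Sample data from a normal distribution
random.seed(0)
sample = [random.gauss(mu=0, sigma=1) for _ in range(5)]

# 2.2 Calculate a probability from a statistical distribution:
# P(X <= 0) for a standard normal
print(NormalDist(mu=0, sigma=1).cdf(0))  # 0.5
```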

🌐
Reddit
reddit.com › r/datacamp › data analyst associate practical exam da501p
r/DataCamp on Reddit: Data Analyst Associate Practical Exam DA501P
December 5, 2023 -

I'm starting to think there is something wrong with this data set. TASK 2 seems to be problematic. I'd appreciate your enlightenment.

The requirement "Clean categorical and text data by manipulating strings" was not passed.


Edit: It was a funny experience for me. First of all, I need to state that my first problem was with my use of Markdown. We had the chance to set the database connection of the cells to DataFrame or Query. I didn't know that I could access the frame I created in a different cell on the page by selecting Query. In my first attempt, I tried to create a temporary table, and this made me make unnecessary mistakes. As a result, I had the opportunity to realise a different approach to task 2 thanks to your comments. You can find the details in the successful source code.

Top answer (1 of 4):
Hey man, we share the same pain, but with a little difference in my case: I managed to produce the results but I could not understand why only "Convert values between data types" was not met. I guess it is something with the final data types of weight and price; I tried casting weight as both REAL and NUMERIC, yet it did not work. I put my failed code here if you want to have a look.

2. Feedback on your code (at the time I looked at it): based on you failing the 3rd requirement, I think you failed to convert stock_location to uppercase. A simple UPPER function would help, rather than CASE WHEN, which is complicated and may miss or mess up some conditions.

3. Questions: does the "median" part really matter here? I don't think so. There are no null values in weight and price, so it seems we don't even need to think about it. I think your code proves it, as you don't write anything to change the null values to the median (PERCENTILE_CONT). The final type of each column? As I failed this requirement, I don't know what is expected here and why I failed. I guess by casting types back and forth, I made the value types in the columns complicated and inconsistent somehow.

4. Based on all the above, I come to this conclusion: we don't need to write redundant code even though we are required to. So we just do nothing for columns where null values do not exist or where values need no change; just select them. This aligns with the guideline: only the final output will be graded.

5. To sum up what should be done:

 ● Missing values of the brand column are '-', not null, and should be changed to 'Unknown'.

 ● weight values need to be a number rounded to 2 decimal places and without 'grams'.

 ● stock_location should be converted to uppercase ('a' -> 'A', etc.).

I hope that my assumptions above are correct and help all of us 14 days later.
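The three cleaning rules this answer arrives at can be sketched in SQL. This is a hedged sketch, not the exam solution: the table and column names (products, brand, weight, stock_location) come from this thread, the sample rows are invented, and stdlib SQLite stands in here for the exam's PostgreSQL (so PERCENTILE_CONT is not shown):

```python
import sqlite3

# Hypothetical miniature of the exam's products table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (brand TEXT, weight TEXT, stock_location TEXT)")
con.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Acme", "510.567 grams", "a"), ("-", "20.1 grams", "b")],
)

rows = con.execute("""
    SELECT
        -- '-' placeholders in brand become 'Unknown'
        CASE WHEN brand = '-' THEN 'Unknown' ELSE brand END AS brand,
        -- strip ' grams', cast to a number, round to 2 decimal places
        ROUND(CAST(REPLACE(weight, ' grams', '') AS REAL), 2) AS weight,
        -- uppercase the location code
        UPPER(stock_location) AS stock_location
    FROM products
""").fetchall()
print(rows)  # [('Acme', 510.57, 'A'), ('Unknown', 20.1, 'B')]
```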
Answer 2 of 4:
For those facing trouble with task 3 on this exam: it is meant to be done on the products table, not the corrected query. As someone who teaches coding, I would explain that this project is in fact not something that would be done in a production environment (as you would use clean_data to get the accurate numbers and/or update the table / create a new one after correcting the data errors), but rather a test of your skills.

I just passed this with help from some threads on here. I had previously failed it, went back and practiced CASE and COALESCE, and looked at ways to address the data issues, for which there are reasonable solutions in the answer provided in this thread. The one problem in the 'solution' by the OP is that they utilize clean_data in task 3. If you look closely, though, the output doesn't match what you'd get running the same query on the products table, so use the products table (even though that wouldn't make sense in a production environment / the real world). This is testing your skills on the products dataset; do not see the tasks as tied together at all, which is largely contrary to how coding in a production environment works.

I hope that helps some folks break through the wall! Also, there are multiple different ways to do task 2; the only thing that matters is that your outputs are correct! Look at the sample outputs in the solution provided by the OP and you'll get an idea of what it should look like.
🌐
DataCamp
datacamp.com › blog › introducing-new-sql-associate-certification
Introducing a brand new SQL associate certification for data-driven professionals. | DataCamp
December 6, 2023 - SQ101 is a 60-minute exam that assesses your proficiency in data management theory, data management in SQL, and exploratory analysis in SQL.
🌐
Reddit
reddit.com › r/datacamp › bullshit practical exam
r/DataCamp on Reddit: Bullshit practical exam
October 25, 2023 -

Has anyone done the Data Associate practical exam lately? I was expecting one similar to what I encountered when I tried the Professional version (indeed that's even what they have in the practical hub: https://app.datacamp.com/workspace/w/396b8323-75d7-4715-b390-fa43e386fb3c), but instead I got a workbook with instructions for 4 tasks and all the answers had to be in a single code cell. Upon submitting it is all run and auto checked.

Well this sucks so much because, even though I'm sure my code cell was right, I got one of them wrong in the two attempts provided. So there it goes: all of your effort so far, and you can't pass because your output doesn't match theirs exactly. There's no feedback at all, so I can't know what I got wrong. Hell, how do I even know I'm the one who's wrong? As far as I know, they haven't even updated the resources section to show this new kind of exam, and they make many big mistakes in the courses' exercises too, so they could be getting something wrong here.

I'm so frustrated with this lack of help. Last time, I failed the professional version of the exam because I had a 'technical issue' with the grading... and that's all I ever found out; I couldn't even get a result. Can anyone who works at DataCamp help?? What do you guys think? Does this seem like a fair process to you?

🌐
Reddit
reddit.com › r/datacamp › answer’s please
r/DataCamp on Reddit: Answer’s please
March 22, 2024 -

How many questions are on the timed exam for DataCamp's Associate Data Analyst certification, and are the questions the same on both attempts? Currently freaking out and I would like an answer please 🙏. Also, I mean the exam tailored for PostgreSQL. I do mean this in the politest way possible, and I appreciate any help provided.

🌐
Reddit
reddit.com › r/datacamp › evaluate my data science practical examination attempt
r/DataCamp on Reddit: Evaluate my Data Science Practical Examination Attempt
August 8, 2024 -

Hello! I'm a college student trying to find a career in Data Science / Machine Learning. I've submitted my work on the Data Scientist Professional Practical Exam here:

https://www.datacamp.com/datalab/w/16f1599a-2f3d-4ffc-9dbb-02046b471ada

And I really want people to evaluate it and point out my strengths and weaknesses. It's a good thing that I can learn from other learners what I'm good at and which field or concept I should review. My presentation can be found in my GitHub repo:
https://github.com/miniloda/DataCamp-DataScience-Exam

Thank you so much

🌐
DEV Community
dev.to › itsjjpowell › passing-the-datacamp-sql-associate-certificate-jcc
Passing The DataCamp SQL Associate Certificate - DEV Community
April 10, 2024 - If you're familiar with the terms, or have used them at least once, you'll be able to answer most questions. Joins: LEFT JOIN, FULL JOIN, INNER JOIN, UNION, UNION ALL · String Manipulation: UPPER, LOWER, LIKE, ILIKE, ~, REGEX, TRIM, LEFT, RIGHT ... The more general database questions may require taking the DataCamp courses because they seem to expect you to use the same language that instructors use in those courses. The timed practical exam is a mutli-part question where you write queries for some provided tables.
🌐
Reddit
reddit.com › r/datacamp › practical exam da601p
r/DataCamp on Reddit: Practical Exam DA601P
December 10, 2023 -

I am waiting for your ideas on my failed Practical Exam DA601P attempt, which I tried to create using Python. You can access the failed source code and the project instructions by clicking on them. As you can verify from the photo, I am stuck on the data validation part.

Screen shot showing the error in the data validation phase.

I am open to any comments that will make me realise what I have missed.
Thank you for your attention.

🌐
DataCamp
support.datacamp.com › hc › en-us › articles › 7926454692631-Data-Scientist-Associate
Data Scientist Associate – Support | DataCamp
July 25, 2025 - The DS101 is a 2-hour exam where you'll choose either R or Python to work through the questions. Most candidates typically complete it in 45 minutes. This exam assesses your abilities to use R or Python for exploratory analysis and statistical experimentation.
🌐
Reddit
reddit.com › r/datacamp › i started attempting the data scientist associate exam and failed at many of the tasks, although it seems to me that i have correctly attempted all the questions.
r/DataCamp on Reddit: I started attempting the Data scientist associate exam and failed at many of the tasks, although it seems to me that I have correctly attempted all the questions.
August 10, 2024 -

Here is my attempt:

https://www.datacamp.com/datalab/w/63ba6f0d-09fc-4774-b2b6-e0545b4d969b/edit?emitCellOutputs=false&reducedMenuBar=true&showExploreMore=false&showLeftNavigation=false&showNavBar=false&showPublicationButton=false&showOnlyRelevantSampleIntegrationIds[]=89e17161-a224-4a8a-846b-0adc0fe7a4b1&showOnlyRelevantSampleIntegrationIds[]=e0c96696-ae0a-46fb-b6f9-1a43eb428ecb&showOnlyRelevantSampleIntegrationIds[]=b1fcb109-b4fe-4543-bc98-681df8c4dc6e&showOnlyRelevantSampleIntegrationIds[]=fcf37a0e-f8bd-4c85-95a5-201d3eebea48&showOnlyRelevantSampleIntegrationIds[]=db697c09-0402-4a02-b327-26018dc2ecce&showOnlyRelevantSampleIntegrationIds[]=7569175e-98be-4c89-9873-c20f699a9cc7&fetchUnlistedSampleIntegrationIds[]=7569175e-98be-4c89-9873-c20f699a9cc7#538ffb3d-4008-49b6-9876-7831e025f5a4

and these are the tasks I failed at:

I just have one attempt left

🌐
Reddit
reddit.com › r/datacamp › sample exam data scientist associate practical
r/DataCamp on Reddit: SAMPLE EXAM Data Scientist Associate Practical
February 20, 2025 -

Hi there,

I looked around a lot to see if this question was already answered somewhere, but I didn't find anything.

Right now I am preparing for the DSA Practical Exam and, somehow, I am having a really hard time with the sample exam.

Practical Exam: Supermarket Loyalty

International Essentials is an international supermarket chain.

Shoppers at their supermarkets can sign up for a loyalty program that provides rewards each year to customers based on their spending. The more you spend the bigger the rewards.

The supermarket would like to be able to predict the likely amount customers in the program will spend, so they can estimate the cost of the rewards.

This will help them to predict the likely profit at the end of the year.

## Data

The dataset contains records of customers for their last full year of the loyalty program.

So my main problem is I think in understanding the tasks correctly. For Task 2:

Task 2

The team at International Essentials have told you that they have always believed that the number of years in the loyalty scheme is the biggest driver of spend.

Produce a table showing the difference in the average spend by number of years in the loyalty programme, along with the variance, to investigate this question for the team.

  • You should start with the data in the file 'loyalty.csv'.

  • Your output should be a data frame named spend_by_years.

  • It should include the three columns loyalty_years, avg_spend, var_spend.

  • Your answers should be rounded to 2 decimal places.

This is my code:
spend_by_years = clean_data.groupby("loyalty_years", as_index=False).agg(
    avg_spend=("spend", lambda x: round(x.mean(), 2)),
    var_spend=("spend", lambda x: round(x.var(), 2)),
)
print(spend_by_years)

This is my result:
loyalty_years avg_spend var_spend
0 0-1 110.56 9.30
1 1-3 129.31 9.65
2 3-5 124.55 11.09
3 5-10 135.15 14.10
4 10+ 117.41 16.72

But the auto evaluation says that Task 2 ("Aggregate numeric, categorical variables and dates by groups") is failing, and I don't understand why.

I am also a bit confused that they provide a train.csv and a test.csv separately; do all the conversions and data cleaning steps have to be done again?

As you can see, I am confused and need help :D

EDIT: So apparently, converting loyalty_years and creating an order for it was not necessary; not doing that passes the evaluation.

Now I am stuck on tasks 3 and 4.

Task 3

Fit a baseline model to predict the spend over the year for each customer.

  1. Fit your model using the data contained in “train.csv”

  2. Use “test.csv” to predict new values based on your model. You must return a dataframe named base_result, that includes customer_id and spend. The spend column must be your predicted values.

Task 4

Fit a comparison model to predict the spend over the year for each customer.

  1. Fit your model using the data contained in “train.csv”

  2. Use “test.csv” to predict new values based on your model. You must return a dataframe named compare_result, that includes customer_id and spend. The spend column must be your predicted values.

I have already set up two pipelines with model fitting, one with linear regression, the other with random forest. I am under the required RMSE threshold.
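Since the grader only reports pass/fail, checking your own RMSE locally on a held-out split of train.csv is the only sanity check available before submitting. A minimal sketch of the metric itself (the helper name and the sample values below are made up for illustration):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of predictions against true values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Perfect predictions give 0; otherwise the typical error magnitude
print(rmse([100, 120, 130], [100, 120, 130]))  # 0.0
print(rmse([0, 0], [3, 4]))                    # sqrt(12.5) ~ 3.54
```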

Maybe someone else did this already and ran into the same problem and solved it already?

Thank you for your answer,

Yes, I dropped those.
I think I've got the structure now, but the script still doesn't pass and I have no idea left what to do. I tried several types of regression, but without the data to test against I don't know what to do anymore.

I also did grid searches to find optimal parameters; those are the ones I used for the modelling.

Here is my code so far:

import pandas as pd
from pandas.api.types import CategoricalDtype  # needed for loyalty_order below
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

# Load training & test data
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
customer_ids_test = df_test['customer_id']

# Cleaning and dropping for train/test
df_train.drop(columns='customer_id', inplace=True)
df_train_encoded = pd.get_dummies(df_train, columns=['region', 'joining_month', 'promotion'], drop_first=True)
df_test_encoded = pd.get_dummies(df_test, columns=['region', 'joining_month', 'promotion'], drop_first=True)

# Ordinal for loyalty
loyalty_order = CategoricalDtype(categories=['0-1', '1-3', '3-5', '5-10', '10+'], ordered=True)
df_train_encoded['loyalty_years'] = df_train_encoded['loyalty_years'].astype(loyalty_order).cat.codes
df_test_encoded['loyalty_years'] = df_test_encoded['loyalty_years'].astype(loyalty_order).cat.codes

# Preparation
y_train = df_train_encoded['spend']
X_train = df_train_encoded.drop(columns=['spend'])
X_test = df_test_encoded.drop(columns=['customer_id'])
# Align test columns with train: running get_dummies on each frame separately
# can produce different columns if a category is missing from one file
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Prediction
model = Ridge(alpha=0.4)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Result
base_result = pd.DataFrame({
    'customer_id': customer_ids_test,
    'spend': y_pred
})
base_result

Task 4:

# Model
lasso = Lasso(alpha=1.5)
lasso.fit(X_train_scaled, y_train)

# Prediction
y_pred_lasso = lasso.predict(X_test_scaled)

# Result
compare_result = pd.DataFrame({
    'customer_id': customer_ids_test,
    'spend': y_pred_lasso
})
compare_result

🌐
DataCamp
datacamp.com › certification › sql-associate
SQL Certification Course, SQL Associate Certificate | DataCamp
DataCamp Certification is included with your premium subscription, so there are no additional costs. When you subscribe you will have access to learning materials to prepare for your certification, practice assessments and exams and the certification itself.
🌐
Reddit
reddit.com › r/datacamp › sql practical exam answers
r/DataCamp on Reddit: SQL Practical Exam Answers
July 8, 2025 -

Can someone, anyone, who has completed task 1 and task 2 of the SQL practical exam please provide the answers in full? I got tasks 3 and 4 on the first try, but after 4 attempts at the first 2, nothing worked. I'm going to re-register in 14 days, but I am almost confident that what I did was correct, and yet I am wrong somewhere, so I'd like someone to provide the correct answers. What are the answers, please? I don't have access to the exam anymore, so I cannot provide more info. I'm just so confused about what I did wrong.

Task 1: Before you can start any analysis, you need to confirm that the data is accurate and reflects what you expect to see. It is known that there are some issues with the branch table, and the data team have provided the following data description. Write a query to return data matching this description, including identifying and cleaning all invalid values. You must match all column names and description criteria. Your output should be a DataFrame named 'clean_branch_data'.

Task 2: The Head of Operations wants to know whether there is a difference in time taken to respond to a customer request in each hotel. They already know that different services take different lengths of time. Calculate the average and maximum duration for each branch and service.

  • Your output should be a DataFrame named 'average_time_service'.

  • It should include the columns service_id, branch_id, avg_time_taken and max_time_taken.

  • Values should be rounded to two decimal places where appropriate.
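A hedged sketch of what Task 2 seems to ask for. The table name, columns, and rows below are invented (the post does not show the real schema), and stdlib SQLite stands in for the exam's PostgreSQL:

```python
import sqlite3

# Hypothetical stand-in for the exam's service-request data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE request (service_id INT, branch_id INT, time_taken REAL)")
con.executemany(
    "INSERT INTO request VALUES (?, ?, ?)",
    [(1, 1, 2.0), (1, 1, 4.5), (2, 1, 10.0)],
)

# Average and maximum duration per service and branch, rounded to 2 dp
rows = con.execute("""
    SELECT service_id,
           branch_id,
           ROUND(AVG(time_taken), 2) AS avg_time_taken,
           ROUND(MAX(time_taken), 2) AS max_time_taken
    FROM request
    GROUP BY service_id, branch_id
    ORDER BY service_id, branch_id
""").fetchall()
print(rows)  # [(1, 1, 3.25, 4.5), (2, 1, 10.0, 10.0)]
```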