ANL303 Data Analysis of Diabetes: Exploring K-Means and Apriori Models for Patient Profiling, GBA

Question 1

In this question, you will use the dataset “diabetes.csv” to learn more about diabetes. Table 1 describes the fields in the dataset, which contains 768 records. Each record is a medical record of a patient who has had several tests to determine whether they have diabetes.

The dataset was extracted from a healthcare organisation's internal database. The technical staff who assisted with data collection explained that a value of 0 was recorded whenever a measurement could not be captured. In short, for some fields, a value of 0 indicates that the value was not captured, not that the true reading was zero.

Table 1. Description of the dataset

Field                      Description
Pregnancies                Number of times pregnant
Glucose                    Plasma glucose concentration after 2 hours in an oral glucose tolerance test
BloodPressure              Diastolic blood pressure (mm Hg)
SkinThickness              Triceps skin fold thickness (mm)
Insulin                    2-Hour serum insulin (mu U/ml)
BMI                        Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction   Diabetes pedigree function
Age                        Age (years)
Outcome                    0 (non-diabetic) or 1 (diabetic)
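
For groups working in Python, the fields in Table 1 could be read into typed records along the following lines. This is a minimal sketch using only the standard library. It assumes that the CSV header matches the field names in Table 1 and that every field other than BMI and DiabetesPedigreeFunction holds whole numbers. The helper name `read_records` is hypothetical, and the two demo rows are invented stand-ins for the real file.

```python
import csv
import io

# Assumption: these two fields hold decimals; all other fields hold integers.
FLOAT_FIELDS = {"BMI", "DiabetesPedigreeFunction"}


def read_records(handle):
    """Parse CSV rows into dicts of numbers (float for BMI and pedigree, int otherwise)."""
    records = []
    for row in csv.DictReader(handle):
        records.append(
            {k: float(v) if k in FLOAT_FIELDS else int(v) for k, v in row.items()}
        )
    return records


# Stand-in for open("diabetes.csv", newline=""); the two rows are invented.
demo = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,0,28.5,0.35,41,1\n"
    "0,95,64,0,80,22.1,0.20,25,0\n"
)
records = read_records(demo)
```

With the real file, the same function could be called as `read_records(open("diabetes.csv", newline=""))`.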

You can attempt this question using IBM SPSS Modeler OR coding in Python.

  • If you use IBM SPSS Modeler, provide the necessary screenshot(s) when explaining your solutions.
  • If you use Python, provide the necessary code snippet(s) when explaining your solutions.

An additional 5 marks will be awarded to groups using Python. No additional marks will be awarded if both IBM SPSS Modeler and Python are used interchangeably.

(a) Assess the quality of the dataset. If needed, perform data cleaning. In fewer than 200 words, discuss how you identified the data quality issues and cleaned the data. Also, justify your choice of data cleaning method.

You are expected to use the cleaned dataset obtained from part (a) when attempting the subsequent parts of the question.
(15 marks)
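
One possible way to approach part (a) in Python is sketched below: count the zeros per field, then replace them with the median of the observed values. Both choices are assumptions, not requirements of the question. The list of fields in which 0 is read as "not captured" is a guess based on which measurements cannot plausibly be zero, and median imputation is only one of several defensible cleaning methods. The function names and the sample records are hypothetical.

```python
from statistics import median

# Assumption: a 0 in these fields is physiologically implausible, so it is
# read as "not captured". Zeros in Pregnancies and Outcome are kept as-is.
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def count_zeros(rows, fields):
    """Count how many records hold 0 in each field."""
    return {f: sum(1 for r in rows if r[f] == 0) for f in fields}


def impute_zeros_with_median(rows, fields):
    """Return new rows in which each 0 is replaced by the median of the non-zero values."""
    cleaned = [dict(r) for r in rows]  # copy so the input is left untouched
    for f in fields:
        observed = [r[f] for r in rows if r[f] != 0]
        if not observed:
            continue  # nothing to impute from
        fill = median(observed)
        for r in cleaned:
            if r[f] == 0:
                r[f] = fill
    return cleaned


# Invented records for illustration only; not taken from diabetes.csv.
sample = [
    {"Glucose": 148, "BMI": 33.6, "Insulin": 0, "Outcome": 1},
    {"Glucose": 85, "BMI": 26.6, "Insulin": 94, "Outcome": 0},
    {"Glucose": 0, "BMI": 23.3, "Insulin": 168, "Outcome": 1},
    {"Glucose": 89, "BMI": 0, "Insulin": 88, "Outcome": 0},
]
fields = [f for f in ZERO_MEANS_MISSING if f in sample[0]]
zero_counts = count_zeros(sample, fields)
cleaned = impute_zeros_with_median(sample, fields)
```

The zero counts give a quick measure of how much of each field is missing, which can inform whether imputing or dropping records is the more reasonable choice.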

(b) Determine the obesity level for each patient according to the following categories:

  • “Underweight” if the BMI is below 18.5
  • “Normal” if the BMI is 18.5 and above but below 25
  • “Overweight” if the BMI is 25 and above but below 30
  • “Obese” if the BMI is 30 and above

Then, present one (1) graphical display that can answer the following:
Which obesity level has the highest number of diabetic patients?
(10 marks)
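
The category boundaries in part (b) can be expressed as a small function, sketched here under the assumption that BMI has already been cleaned. The function name `obesity_level` and the sample pairs are hypothetical. The resulting counts are the kind of summary a bar chart could display.

```python
from collections import Counter


def obesity_level(bmi):
    """Map a BMI value to the category defined in part (b)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


# Invented (BMI, Outcome) pairs for illustration; not taken from diabetes.csv.
patients = [(17.9, 0), (22.4, 0), (24.9, 1), (27.3, 1), (31.0, 1), (36.8, 1), (29.9, 0)]

# Count diabetic patients (Outcome == 1) in each obesity level.
diabetic_counts = Counter(
    obesity_level(bmi) for bmi, outcome in patients if outcome == 1
)
top_level, top_count = diabetic_counts.most_common(1)[0]
```

Because each boundary is checked in ascending order, a BMI of exactly 18.5, 25 or 30 falls into the higher category, as the definitions require.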

(c) Construct a K-Means model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:

  • How you decided on the input(s) and parameter(s) used in the model
  • How you determined that your model is the final, best model
  • The profiles of the clusters
  • How you identified the target cluster
  • Data preparation steps and post-model analysis, if any

(25 marks)
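
To illustrate the mechanics behind part (c), the sketch below implements plain K-Means (Lloyd's algorithm) with the standard library only. It is not a recommended final model. It assumes that inputs are standardised first, that initial centroids are drawn at random with a fixed seed, and that the target cluster is the one with the highest share of diabetic patients. All function names, the two chosen inputs (Glucose and BMI) and the sample values are hypothetical. In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import random
from statistics import mean, pstdev


def standardise(points):
    """Z-score each column so that no field dominates the distance."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random distinct records as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(mean(c) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, sse


def diabetic_share(labels, outcomes, k):
    """Proportion of diabetic patients (Outcome == 1) in each cluster."""
    shares = []
    for j in range(k):
        member_outcomes = [o for l, o in zip(labels, outcomes) if l == j]
        shares.append(sum(member_outcomes) / len(member_outcomes) if member_outcomes else 0.0)
    return shares


# Invented (Glucose, BMI) values and outcomes; not taken from diabetes.csv.
raw = [(85, 22.0), (90, 24.5), (95, 23.0), (160, 35.0), (170, 38.5), (165, 33.0)]
outcomes = [0, 0, 0, 1, 1, 0]

points = standardise(raw)
labels, centroids, sse = k_means(points, k=2)
shares = diabetic_share(labels, outcomes, k=2)
target_cluster = max(range(2), key=lambda j: shares[j])
```

Running `k_means` for several values of `k` and comparing the within-cluster SSE is one common way to choose the number of clusters. Outcome is left out of the inputs here so that it can be used afterwards to profile the clusters.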

(d) Construct an Apriori model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:

  • How you decided on the input(s) and parameter(s) used in your model
  • How you determined that your model is the final, best model
  • Report the total number of association rules obtained
  • Pick one interesting association rule and explain it in terms of support, rule support and confidence
  • Data preparation steps and post-model analysis, if any

(25 marks)
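
The three measures named in part (d) can be computed for a single candidate rule as sketched below. This is not the Apriori search itself, only the metrics for one rule. It assumes the IBM SPSS Modeler convention, in which support is the proportion of records where the antecedent holds, rule support is the proportion where both antecedent and consequent hold, and confidence is rule support divided by support. The function name, item labels and sample records are hypothetical. Numeric fields would first need to be binned into categories such as these.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, rule support and confidence for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Records in which every antecedent item is present.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    # Records in which antecedent and consequent items are all present.
    n_rule = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return {
        "support": n_ante / n,
        "rule_support": n_rule / n,
        "confidence": n_rule / n_ante if n_ante else 0.0,
    }


# Invented patient records as item sets; not taken from diabetes.csv.
transactions = [
    {"Obese", "HighGlucose", "Diabetic"},
    {"Obese", "HighGlucose", "Diabetic"},
    {"Obese", "NormalGlucose", "NonDiabetic"},
    {"NormalWeight", "HighGlucose", "Diabetic"},
    {"NormalWeight", "NormalGlucose", "NonDiabetic"},
]
metrics = rule_metrics(transactions, {"Obese"}, {"Diabetic"})
```

In this toy example, three of the five records are obese (support 0.6), two of the five are both obese and diabetic (rule support 0.4), and so two of the three obese records are diabetic (confidence of about 0.67).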

(e) Compare the models obtained from parts (c) and (d). In your opinion, which model is better for deployment? Explain briefly.

Then, based on the chosen model, propose a plan on how the model or result can be deployed to assess the likelihood of a person being diagnosed with diabetes.
(15 marks)

Another 10 marks are allocated for your writing.

(Up to 25 marks of penalties will be imposed for inappropriate or poor paraphrasing. Serious cases will be investigated by the examination department. More information on effective paraphrasing strategies can be found at https://academicguides.waldenu.edu/writingcenter/evidence/paraphrase/effective)

Your writing should be succinct but not at the expense of excluding relevant details. Highlight only the points that are relevant to your discussion. Use plain and simple language. Some questions may not come with absolutely right or wrong answers. For such questions, you have the liberty to express your views about the problem. However, your points have to be supported by evidence and good reasoning. It’s the quality and not the length that counts. Make sure you follow the report guidelines and style specified in this assignment.

The topics in the main report should be presented in the order according to the sequence of the tasks/questions listed in the assignment; that is, in the order of (a), (b), …, etc. You can have several sub-sections within a section if you deem it appropriate. To avoid a high Turnitin score, do not copy the assignment questions into the report.

The report must be self-contained. It is important to include all relevant tables and figures in the report as evidence to support the answers given.

The following are some details of the report format:

  • Length: should not exceed 10 pages, including the relevant graphs, tables, references, screenshots and appendices (if any) but excluding the cover page
  • Font Style: Times New Roman
  • Font size: 12
  • Line spacing: 1.5
  • Margins: 1” for the top, bottom, right and left
  • Include the page number on each page

Some Further Suggestions:

  • Ensure minimal grammatical and typographical errors
  • Write clearly in plain English
  • Write appropriately to the context
  • Cite appropriate sources
  • Provide a reference or bibliography at the end of the main report
  • Include less relevant details in the Appendix
  • Good overall presentation of the report
