In this question, you will use the dataset “diabetes.csv” to learn more about diabetes. Table 1 describes the fields in the dataset, which contains 768 records. Each record is a medical record of a patient who has had several tests to determine whether they have diabetes.
The dataset was extracted from an internal healthcare organisation database. The technical staff who assisted with data collection shared that a value of 0 is recorded whenever a measurement cannot be captured successfully. In short, for some fields, a value of 0 indicates a missing value rather than a true measurement.
Table 1. Description of the dataset
Field | Description |
---|---|
Pregnancies | Number of times pregnant |
Glucose | Plasma glucose concentration after 2 hours in an oral glucose tolerance test |
BloodPressure | Diastolic blood pressure (mm Hg) |
SkinThickness | Triceps skin fold thickness (mm) |
Insulin | 2-hour serum insulin (μU/mL) |
BMI | Body mass index (weight in kg/(height in m)^2) |
DiabetesPedigreeFunction | Diabetes pedigree function |
Age | Age (years) |
Outcome | 0 (non-diabetic) or 1 (diabetic) |
You can attempt this question using IBM SPSS Modeler OR coding in Python.
An additional 5 marks will be awarded to groups using Python. No additional marks will be awarded if both IBM SPSS Modeler and Python are used interchangeably.
(a) Assess the quality of the dataset. If needed, perform data cleaning. In fewer than 200 words, discuss how you identified the data quality issues and cleaned the data. Also, justify your choice of data cleaning method.
You are expected to use the cleaned dataset obtained from part (a) when attempting the subsequent parts of the question.
(15 marks)
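For groups using Python, one possible starting point for part (a) is sketched below. It is an illustration only, not a model answer. It assumes that a 0 means "not captured" in Glucose, BloodPressure, SkinThickness, Insulin and BMI (Pregnancies and Outcome can legitimately be 0), and it uses median imputation as one of several defensible cleaning methods. The `sample` values are made up; the real file would be loaded with `pd.read_csv("diabetes.csv")`.

```python
import numpy as np
import pandas as pd

# Made-up rows with the column names from Table 1 (not real patient records).
# For the real data: sample = pd.read_csv("diabetes.csv")
sample = pd.DataFrame({
    "Pregnancies": [6, 1, 0, 3],
    "Glucose": [148, 0, 137, 89],
    "BloodPressure": [72, 66, 0, 66],
    "SkinThickness": [35, 29, 0, 23],
    "Insulin": [0, 0, 168, 94],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "DiabetesPedigreeFunction": [0.627, 0.351, 2.288, 0.167],
    "Age": [50, 31, 33, 21],
    "Outcome": [1, 0, 1, 0],
})

# Assumption: in these fields a 0 is a placeholder for "not captured".
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def clean(df):
    """Return a cleaned copy and the number of placeholder zeros per field."""
    out = df.copy()
    # Step 1: turn placeholder zeros into explicit missing values.
    out[ZERO_MEANS_MISSING] = out[ZERO_MEANS_MISSING].replace(0, np.nan)
    missing_counts = out[ZERO_MEANS_MISSING].isna().sum()
    # Step 2: impute each field with its own median (an assumed choice;
    # dropping rows or group-wise imputation are alternatives to justify).
    medians = out[ZERO_MEANS_MISSING].median()
    out[ZERO_MEANS_MISSING] = out[ZERO_MEANS_MISSING].fillna(medians)
    return out, missing_counts


cleaned, missing_counts = clean(sample)
print(missing_counts)
```

Counting the placeholder zeros before imputing gives evidence for the data quality discussion, and the median is less sensitive to extreme readings than the mean.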
(b) Determine the obesity level for each patient according to the following categories:
Then, present one (1) graphical display that can answer the following:
Which obesity level has the highest number of diabetic patients?
(10 marks)
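A rough Python sketch of the binning and counting behind part (b) follows. The obesity categories are not reproduced in this text, so the cut-offs in `BMI_EDGES` and the names in `BMI_LABELS` are placeholders chosen purely for illustration and must be replaced with the categories specified in the assignment. The `sample` values are made up.

```python
import pandas as pd

# Made-up BMI and Outcome values for illustration only.
sample = pd.DataFrame({
    "BMI": [17.9, 22.4, 27.3, 31.0, 36.5, 33.2, 24.9, 29.9],
    "Outcome": [0, 0, 1, 1, 1, 0, 1, 0],
})

# PLACEHOLDER cut-offs and labels (assumed, not taken from the assignment).
BMI_EDGES = [0, 18.5, 25.0, 30.0, float("inf")]
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]

# right=False makes each bin include its lower edge: [18.5, 25.0), and so on.
sample["ObesityLevel"] = pd.cut(
    sample["BMI"], bins=BMI_EDGES, labels=BMI_LABELS, right=False
)

# Rows are obesity levels, columns are Outcome values (0 or 1).
counts = pd.crosstab(sample["ObesityLevel"], sample["Outcome"])
diabetic_counts = counts[1]
top_level = diabetic_counts.idxmax()
print(counts)
print("Most diabetic patients:", top_level)
```

With matplotlib installed, `counts.plot(kind="bar")` would turn the same table into a grouped bar chart, which is one way to present the single graphical display.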
(c) Construct a K-Means model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:
(25 marks)
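As a minimal sketch of part (c) in Python, the example below standardises a few numeric fields, fits scikit-learn's `KMeans`, and profiles each cluster by its feature means and its share of diabetic patients. The data are synthetic, and the choice of two clusters and of these three input fields is an assumption for illustration; the number of clusters for the real dataset would need its own justification.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic, well-separated groups (not real patient data).
rng = np.random.default_rng(0)
n = 50
low = pd.DataFrame({
    "Glucose": rng.normal(100, 8, n),
    "BMI": rng.normal(25, 2, n),
    "Age": rng.normal(30, 4, n),
    "Outcome": 0,
})
high = pd.DataFrame({
    "Glucose": rng.normal(165, 8, n),
    "BMI": rng.normal(36, 2, n),
    "Age": rng.normal(52, 4, n),
    "Outcome": 1,
})
data = pd.concat([low, high], ignore_index=True)

# Outcome is kept out of the inputs and used only to describe the clusters.
features = ["Glucose", "BMI", "Age"]

# K-Means is distance-based, so fields are put on a common scale first.
X = StandardScaler().fit_transform(data[features])

# n_clusters=2 is an assumed value for this toy example.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
data["Cluster"] = model.fit_predict(X)

# The mean of Outcome per cluster is the share of diabetic patients in it.
profile = data.groupby("Cluster")[features + ["Outcome"]].mean()
print(profile.round(2))
```

A cluster whose Outcome mean is close to 1 describes the typical profile of diabetic patients in terms of the feature means shown beside it.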
(d) Construct an Apriori model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:
(25 marks)
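The idea behind part (d) can be sketched in plain Python as below. This is a small hand-written level-wise (Apriori-style) search, not the IBM SPSS Modeler node or any particular library's implementation. The item names such as `Glucose=High`, the thresholds, and the transactions are hypothetical; numeric fields would first have to be discretised into categories like these.

```python
from itertools import combinations

# Made-up transactions: one set of categorical items per patient.
transactions = [
    {"Glucose=High", "BMI=Obese", "Outcome=Diabetic"},
    {"Glucose=High", "BMI=Obese", "Outcome=Diabetic"},
    {"Glucose=High", "BMI=Normal", "Outcome=Diabetic"},
    {"Glucose=Normal", "BMI=Obese", "Outcome=NonDiabetic"},
    {"Glucose=Normal", "BMI=Normal", "Outcome=NonDiabetic"},
    {"Glucose=Normal", "BMI=Normal", "Outcome=NonDiabetic"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def frequent_itemsets(rows, min_support):
    """Level-wise search: build size k+1 candidates from frequent size k sets."""
    items = sorted({item for row in rows for item in row})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), rows) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, rows)
        size += 1
        # Join step: unions of two frequent sets that have exactly `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every subset one item smaller must already be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, size - 1))
                   and support(c, rows) >= min_support]
    return frequent


def rules_for(target, frequent, min_confidence):
    """Rules of the form antecedent -> target, as (antecedent, support, confidence)."""
    rules = []
    for itemset, sup in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = sup / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), round(sup, 3), round(confidence, 3)))
    return sorted(rules)


frequent = frequent_itemsets(transactions, min_support=0.3)
rules = rules_for("Outcome=Diabetic", frequent, min_confidence=0.8)
for antecedent, sup, conf in rules:
    print(antecedent, "-> Outcome=Diabetic", sup, conf)
```

Restricting the consequent to the diabetic outcome keeps only the rules that describe diabetic patients, and the support and confidence thresholds control how many rules survive.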
(e) Compare the models obtained from parts (c) and (d). In your opinion, which model is better for deployment? Explain briefly.
Then, based on the chosen model, propose a plan on how the model or result can be deployed to assess the likelihood of a person being diagnosed with diabetes.
(15 marks)
Another 10 marks are allocated for your writing.
(Up to 25 marks of penalties will be imposed for inappropriate or poor paraphrasing. For serious cases, they will be investigated by the examination department. More information on effective paraphrasing strategies can be found at https://academicguides.waldenu.edu/writingcenter/evidence/paraphrase/effective)
Your writing should be succinct but not at the expense of excluding relevant details. Highlight only the points that are relevant to your discussion. Use plain and simple language. Some questions may not come with absolutely right or wrong answers. For such questions, you have the liberty to express your views about the problem. However, your points have to be supported by evidence and good reasoning. It’s the quality and not the length that counts. Make sure you follow the report guidelines and style specified in this assignment.
The topics in the main report should be presented in the order according to the sequence of the tasks/questions listed in the assignment; that is, in the order of (a), (b), …, etc. You can have several sub-sections within a section if you deem it appropriate. To avoid a high Turnitin score, do not copy the assignment questions into the report.
The report must be self-contained. It is important to include all relevant tables and figures in the report as evidence to support the answers given.