About
Data management is the structure and process by which you organize and manage your data and datasets. Often overlooked, data management is a key process to be aware of and implement for projects, small and large. There are at least seven features to be aware of related to data management: storing, sourcing, folders, files, version control, base dataset, and variable data.
Estimated Time
An estimated 90-120 minutes is needed to complete this activity.
Big Picture
Data management sounds complicated because it consists of several parts. However, it helps to see the forest for the trees. The forest seems vast and mighty, but when you look at individual trees, you can begin to appreciate both the simplicity and the complexity of the forest and its trees.
The basic radial diagram helps us visualize the big picture. At the center is Data Management and radiating out from this are the seven features mentioned above: storing, sourcing, folders, files, version control, base dataset, and variable data. Keep this visualization in mind as we proceed in learning about each feature.
Figure 4‑1: Seven features of data management
Storing Data
Storing data is about where you are going to save the data you will be working with. You can store data in three places: 1) on your personal computer, 2) on an external thumb drive or hard drive, or 3) in the “cloud” via the Internet.
Figure 4‑2: Three options for storing data
Your personal computer is the logical place to store your data because it is your computer, and you use it regularly. One drawback to storing the data only on your computer is that if the computer fails, then all your data are likely lost, or very costly to retrieve.
The second place to store your data is on an external thumb drive or hard drive. Thumb drives are common these days and inexpensive, but they are easy to lose because they are small. External hard drives are also available, but they cost a bit more.
The third place, and the one I recommend, is the cloud. The growth of cloud storage services, such as Google Drive or Dropbox, over the last 10 years is changing how we interact with our computers and data.
A drawback of the cloud is that you need an internet connection to retrieve the data. Thus, if your internet is spotty or you have lost power, you will not be able to access your data. However, the advantage the cloud has over the other storage options is that there is a backup of your data.
Sourcing Data
Sourcing data is about where you find the data you want to analyze. There is a growing mountain of data sources. Recall that in Chapter 2 I mentioned three specific data sources:
Inter-university Consortium for Political and Social Research (ICPSR)
The Dataverse Project
PPIC Statewide Survey Data – 2020 – Public Policy Institute of California
And there are other data sources as well. For example, consider the following:
Data.gov
U.S. Census Data
California Open Data
GSS General Social Survey | NORC
ANES | American National Election Studies
Cooperative Congressional Election Study
San Diego County Data Portal
Up until this point, I have used data and datasets interchangeably. Now that I have introduced sourcing data, it is important to distinguish between these terms. Data is a general term used to describe text (alpha, numeric, alphanumeric), images, audio, and video. All of these can be considered data. A dataset, however, is a meaningful collection of data organized by an individual or team. A dataset can be one you create, one created by someone else, or a combination of the two.
Figure 4‑3: Making a distinction between data and datasets
Non-academic example of using an existing dataset
For example, I attended UC Merced from 2005 to 2007 and again from 2012 to 2018. During this time, I met a fellow Bobcat named Michael Urner. Michael co-founded Tergis Technologies, “a company developing new medical devices to reduce the number of hospital-acquired infections.” [1] During a UC Merced Venture Lab presentation, Michael shared how he used datasets from the Centers for Disease Control and Prevention’s National Vital Statistics System to quantify the demand for his medical device. I thought this was a novel way for an entrepreneur to use an existing dataset.
Academic example of creating a new dataset and using an existing dataset
Another example comes from my Ph.D. dissertation, titled Judicial Pork: The Congressional Allocation of Districts, Seats, Meeting Places, and Courthouses to the U.S. District Courts. I collected data from the Federal Judicial Center (fjc.gov) to create a new dataset of federal court districts, seats, meeting places, and courthouses. I combined this dataset with existing datasets, such as Charles Stewart’s congressional committee data, to form a “super dataset” that I then analyzed for my research.
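To make the idea of combining a dataset you created with an existing one more concrete, here is a minimal sketch in Python. It is not the method used in the dissertation; the column names (district, courthouses, committee_seats), the values, and the merge_on_key helper are all hypothetical and invented for illustration. It assumes both datasets share a key column that identifies the same unit.

```python
# Hypothetical rows standing in for a dataset you built yourself.
# Column names and values are invented for illustration only.
my_dataset = [
    {"district": "A", "courthouses": 3},
    {"district": "B", "courthouses": 1},
]

# Hypothetical rows standing in for an existing dataset from another source.
existing_dataset = [
    {"district": "A", "committee_seats": 2},
    {"district": "C", "committee_seats": 5},
]


def merge_on_key(left, right, key):
    """Keep every row of `left` and add the columns of the matching row in `right`."""
    # Index the right-hand dataset by the shared key for quick lookup.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        # Rows without a match keep only their own columns.
        match = right_by_key.get(row[key], {})
        merged.append({**match, **row})
    return merged


combined = merge_on_key(my_dataset, existing_dataset, "district")
for row in combined:
    print(row)
```

In this sketch, every row of the dataset you created is kept, and rows with no match in the existing dataset simply lack the extra columns. Whether to keep or drop unmatched rows is a judgment call that depends on your research question.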
