Activity 1: Analysis of the antisocx Variable

This analysis uses the Crime Survey for England and Wales, 2013-2014 dataset to investigate levels of perceived antisocial behavior in respondents' neighborhoods. I loaded the dataset into RStudio, where it was named csew1314teachingopen. My first step was to explore the antisocx variable manually in the RStudio console to understand its structure and identify any data issues, such as missing values.

Initial Exploration in RStudio Console

  1. Inspecting the Data:

    I began by viewing a few entries from the antisocx variable in the RStudio console to check its structure and content. This allowed me to confirm that the variable contained numerical data, along with several missing (NA) values.

    # View the first few entries of antisocx
    head(csew1314teachingopen$antisocx)
  2. Checking for Missing Values:

    To determine how many values were missing in antisocx, I combined is.na() with sum() in the R console. This confirmed that a substantial share of the values were missing, which would need to be handled carefully in the analysis.

    # Count missing values in antisocx
    sum(is.na(csew1314teachingopen$antisocx))
  3. Calculating Mean, Standard Deviation, and Summary Manually:

    Next, I calculated the mean and standard deviation in the R console to get a preliminary understanding of the data's distribution. I also used the summary() function for a broader overview of the spread (minimum, quartiles, maximum, and the count of NA values) and to spot potential outliers. Because mean() and sd() return NA when any value is missing, I passed the argument na.rm = TRUE, which drops the missing observations for this variable before the calculation.

    # Calculate mean, standard deviation, and summary
    mean(csew1314teachingopen$antisocx, na.rm = TRUE)
    sd(csew1314teachingopen$antisocx, na.rm = TRUE)
    summary(csew1314teachingopen$antisocx)
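
To make the effect of na.rm = TRUE concrete, here is a minimal sketch on a small invented vector. The name toy_scores and its values are assumptions for illustration only and are not taken from the CSEW data.

```r
# Invented values standing in for a survey variable (not the CSEW data)
toy_scores <- c(-1, -0.5, NA, 0.5, NA, 2)

# Count and proportion of missing values
n_missing <- sum(is.na(toy_scores))
prop_missing <- mean(is.na(toy_scores))

# Without na.rm = TRUE, a single NA makes the result NA
mean_with_na <- mean(toy_scores)

# With na.rm = TRUE, NAs are dropped before the calculation
mean_without_na <- mean(toy_scores, na.rm = TRUE)

print(n_missing)
print(prop_missing)
print(mean_with_na)
print(mean_without_na)
```

Reporting the proportion of missing values alongside the raw count makes it easier to judge how much of the sample the statistics actually describe.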

Automated Script for Analysis

After completing the initial exploration, I created a script to automate the calculation of the mean, standard deviation, and summary for the antisocx variable. Here's the R script I wrote:

# Assuming the dataset is already loaded in RStudio as csew1314teachingopen

# Calculate mean and standard deviation for antisocx, ignoring missing values
antisocx_mean <- mean(csew1314teachingopen$antisocx, na.rm = TRUE)
antisocx_sd <- sd(csew1314teachingopen$antisocx, na.rm = TRUE)

# Generate a summary of the antisocx variable
antisocx_summary <- summary(csew1314teachingopen$antisocx)

# Print results
print(paste("Mean of antisocx:", antisocx_mean))
print(paste("Standard deviation of antisocx:", antisocx_sd))
print("Summary of antisocx:")
print(antisocx_summary)
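
If the same statistics are needed for other variables, the script could be wrapped in a small function. The sketch below is one possible way to do that, not part of the original script; describe_variable and example_scores are hypothetical names, and the values are invented.

```r
# Hypothetical helper: summarise one numeric vector, ignoring missing values
describe_variable <- function(x) {
  stopifnot(is.numeric(x))
  c(
    n_valid   = sum(!is.na(x)),
    n_missing = sum(is.na(x)),
    mean      = mean(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE)
  )
}

# Invented example values, not the CSEW data
example_scores <- c(-1, 0, 1, NA, 2)
result <- describe_variable(example_scores)
print(round(result, 3))
```

With the dataset loaded, the same call would presumably be describe_variable(csew1314teachingopen$antisocx), provided the column is stored as a plain numeric vector.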

Results

Upon running the above script, I obtained the following results in the RStudio console:

  • Mean of antisocx: -0.0075
  • Standard Deviation of antisocx: 0.991
  • Summary of antisocx:
    • Min: -1.215
    • 1st Qu.: -0.788
    • Median: -0.185
    • Mean: -0.0075
    • 3rd Qu.: 0.528
    • Max: 4.015
    • Missing Values (NA's): 6,694

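These results show a mean very close to zero (-0.0075) and a standard deviation close to one (0.991), which is what a standardized score would look like, so the mean on its own says little about the absolute level of perceived antisocial behavior. The shape of the distribution is more informative: the median (-0.185) lies below the mean, and the maximum (4.015) is far above the third quartile (0.528), which points to a right-skewed distribution. Assuming higher scores indicate more perceived antisocial behavior, most respondents report relatively low levels in their neighborhoods, while a small group reports much higher levels. The large number of missing values (6,694) cannot be explained from the data alone: respondents may have been uncomfortable answering or found the questions irrelevant, or the questions may not have been put to everyone, so the survey documentation should be checked.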
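
A quick way to check the shape of a distribution like this is to compare the mean with the median and to look at the share of values below zero. The sketch below does this on invented, right-skewed values; toy_scores is an assumption for illustration and does not reproduce the CSEW figures.

```r
# Invented right-skewed values standing in for a standardized score
toy_scores <- c(-1.2, -0.9, -0.8, -0.4, -0.2, 0.1, 0.5, 0.9, 4.0, NA)

# Keep only the observed values
valid <- toy_scores[!is.na(toy_scores)]

# A mean above the median points to a longer right tail
mean_minus_median <- mean(valid) - median(valid)

# Share of valid responses that fall below zero
share_below_zero <- mean(valid < 0)

# Quartiles of the observed values
quartiles <- quantile(valid, probs = c(0.25, 0.5, 0.75))

print(mean_minus_median)
print(share_below_zero)
print(quartiles)
```

A histogram, for example hist(valid), would show the same asymmetry visually.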

The screenshots below show the preliminary analysis in R and the script execution with its results:

[Screenshot: Preliminary analysis in RStudio]
[Screenshot: Results of the script execution in RStudio]

Reflection

In this analysis, I explored the antisocx variable from the 2013-2014 Crime Survey for England and Wales to assess perceived levels of antisocial behavior in neighborhoods. Exploring the data in RStudio revealed 6,694 missing entries for antisocx, a potential source of bias if missingness is related to specific demographics or neighborhood traits.
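
Whether missingness is related to a respondent characteristic can be checked by comparing the proportion of missing values across groups. The following is a minimal sketch on an invented data frame; toy_survey and its region and score columns are hypothetical and are not variables confirmed to exist in the CSEW dataset.

```r
# Invented data frame; 'region' is a hypothetical grouping column
toy_survey <- data.frame(
  region = c("North", "North", "North", "South", "South", "South"),
  score  = c(0.2, NA, NA, -0.5, 1.1, NA)
)

# Proportion of missing scores within each group
missing_by_region <- tapply(is.na(toy_survey$score), toy_survey$region, mean)
print(round(missing_by_region, 2))
```

Large differences between groups would suggest that the missing values are not spread evenly across respondents, which is the situation in which dropping them can bias the results.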

The key statistics were a mean of -0.0075, a median of -0.185, and a standard deviation of 0.991. With the median below the mean and three quarters of valid responses at or below 0.528, most respondents scored toward the low end of the scale, while the maximum of 4.015 shows that a small number reported far higher levels of antisocial behavior.

This exercise reinforced the importance of data exploration, especially in handling missing data to avoid skewed insights. Going forward, I aim to apply imputation or filtering techniques to manage missing values effectively, recognizing that detailed data inspection and bias management are essential steps in accurate statistical analysis.
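
As a rough illustration of the two options mentioned above, the sketch below contrasts filtering with simple mean imputation on invented values. It is not the method used in this analysis; toy_scores is hypothetical, and mean imputation is chosen only because it is the simplest case to show.

```r
# Invented values, not the CSEW data
toy_scores <- c(-1, -0.5, NA, 0.5, NA, 2)

# Filtering: keep only the observed values
filtered <- toy_scores[!is.na(toy_scores)]

# Mean imputation: replace each NA with the mean of the observed values
imputed <- ifelse(is.na(toy_scores), mean(toy_scores, na.rm = TRUE), toy_scores)

# The mean is unchanged, but imputation shrinks the standard deviation
sd_filtered <- sd(filtered)
sd_imputed <- sd(imputed)

print(c(mean_filtered = mean(filtered), mean_imputed = mean(imputed)))
print(c(sd_filtered = sd_filtered, sd_imputed = sd_imputed))
```

The smaller standard deviation after imputation shows why mean imputation can understate variability, so the choice between filtering and imputing should depend on why the values are missing.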
