Why Statistics is Not Data Science
Chris Malone | Tisha Hooks

What elements of Task #1 are

Statistical in nature | Data Science in nature

0:> Matching race levels in such a way to minimize impact on analysis, i.e. reduction of bias, etc.
> Obtain summaries for SQF and Census data to compute a discrepancy measure
> Retrieve data
> Incorporate precinct information (boundaries via shape files) into census data
> Create necessary variables so that a diversity measure can be computed


2:Understanding the question
Define & decide what to calculate
Understanding the question
Merge, manipulate, and prepare data
Calculations, advanced visualizations

3:Create graphs of potential bias variables, segmented (colored?) by categories.

Find measures of center, spread for said potential bias variables.

Create two-way tables of potential bias variables
Examining and addressing missing data.

Dealing with scale. How many individuals are unique?


5:Definitions of bias, generalization to other cities/not (time), conditional cross-tabs, plots, consideration of age and reasons for stops as confounders
Aggregation by zip code, file merging (matching algorithm for race)


7:comparing the percent of a race in those stopped to the percent of that race in that precinct, taking uncertainty into account
mapping blocks to precincts
mapping different race definitions to each other
merging datasets
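
The comparison in response 7 (a group's share of stops versus its share of the precinct population, with uncertainty) might be sketched as below. This is only one possible reading: the function name `share_gap`, the normal approximation, and the choice to treat the census share as fixed are all assumptions, and the numbers are illustrative rather than drawn from the SQF or Census data.

```python
import math

def share_gap(stopped_in_group, stopped_total, census_share, z=1.96):
    """Gap between a group's share of stops and its census share, with a rough CI.

    Assumes the census share is known without error and uses a normal
    approximation for the stop proportion; both are simplifications.
    """
    p_hat = stopped_in_group / stopped_total
    se = math.sqrt(p_hat * (1 - p_hat) / stopped_total)
    gap = p_hat - census_share
    return gap, (gap - z * se, gap + z * se)

# Illustrative numbers only, not real data.
gap, (low, high) = share_gap(stopped_in_group=300, stopped_total=1000, census_share=0.25)
print(f"gap = {gap:.3f}, approx. 95% interval = ({low:.3f}, {high:.3f})")
```

An interval that excludes zero would suggest the stop share differs from the population share, though it says nothing about confounders such as age or reason for stop.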

8:Descriptive stats, and maybe inference.
web scraping? (if needed)
wrangling, merging categories
Map to see the data

9:understand question of interest
understand variables
making linkages across data
summary stats and walk through
analysis to answer question (multiple ways/models preferable)
cleaning data
understand data structures
executing data set linkage


11:Agree on measure of bias.
Align racial defns.
Determine confounders for which to adjust.
Model development/analysis strategy.
Data Viz.
Talking with police/census for data clarity.
link precincts and neighborhoods.
Align racial defns.
Merge data.
Obtain data on confounders.
Fitting model chosen.
Data Viz.
Talking with police/census for data clarity.

12:Merging of categories (census)
Merge data sets
Goodness of fit test by precinct
Pseudo-measure bias looking at max test stat
Merging of categories (census)
Merge data sets
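
One way to read response 12 is to compute a Pearson goodness-of-fit statistic per precinct, comparing stop counts with census shares, and then look at the precinct with the largest statistic. The sketch below follows that reading; the precinct names, group labels, and counts are hypothetical, and the helper `chi_square_gof` is written out by hand rather than taken from any library.

```python
def chi_square_gof(observed, expected_shares):
    """Pearson chi-square statistic for observed counts vs. expected shares."""
    total = sum(observed.values())
    stat = 0.0
    for category, count in observed.items():
        expected = total * expected_shares[category]
        stat += (count - expected) ** 2 / expected
    return stat

# Hypothetical precincts and counts, for illustration only.
stops = {
    "precinct_A": {"group_1": 50, "group_2": 50},
    "precinct_B": {"group_1": 80, "group_2": 20},
}
census = {
    "precinct_A": {"group_1": 0.5, "group_2": 0.5},
    "precinct_B": {"group_1": 0.5, "group_2": 0.5},
}

# One statistic per precinct; the largest serves as a rough "pseudo-measure".
stats = {precinct: chi_square_gof(stops[precinct], census[precinct]) for precinct in stops}
worst = max(stats, key=stats.get)
print(stats, worst)
```

Because the statistic grows with the number of stops, the maximum tends to favor large precincts, so it is better treated as a screening device than as a calibrated measure of bias.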

13:chi-square test and computing percentages
data wrangling (consolidating race, merge census and frisk data at neighborhood level)
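
The wrangling step that several responses mention (consolidating race categories, then merging stop and census counts at the neighborhood level) could look roughly like the sketch below. The crosswalk, category labels, and row layout are hypothetical; the actual SQF and Census codings differ and the mapping would have to be agreed on by the analysts.

```python
from collections import defaultdict

# Hypothetical crosswalk from source-specific labels to a common label.
CROSSWALK = {
    "Black or African American": "Black",
    "BLACK": "Black",
    "White": "White",
    "WHITE": "White",
}

def consolidate(rows, crosswalk, other="Other"):
    """Sum counts by (neighborhood, common race label); unmapped labels go to `other`."""
    totals = defaultdict(int)
    for neighborhood, race, count in rows:
        totals[(neighborhood, crosswalk.get(race, other))] += count
    return dict(totals)

def merge_counts(stops, census):
    """Outer-join two consolidated tables on (neighborhood, race), filling gaps with 0."""
    keys = sorted(set(stops) | set(census))
    return {k: {"stops": stops.get(k, 0), "population": census.get(k, 0)} for k in keys}

# Illustrative rows only: (neighborhood, race label, count).
stop_rows = [("N1", "BLACK", 30), ("N1", "WHITE", 10), ("N1", "UNKNOWN", 2)]
census_rows = [("N1", "Black or African American", 400), ("N1", "White", 600)]

merged = merge_counts(consolidate(stop_rows, CROSSWALK), consolidate(census_rows, CROSSWALK))
print(merged)
```

Keeping unmatched labels in an explicit "Other" bucket, rather than dropping them, makes it visible how much data the crosswalk fails to align.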