Why Statistics is Not Data Science
Chris Malone, Tisha Hooks

What elements of Task #1 are statistical in nature, and what elements are data science in nature?

0:
Statistical in nature:
> Match race levels in a way that minimizes the impact on the analysis (e.g., reduction of bias)
> Obtain summaries of the SQF and Census data to compute a discrepancy measure
Data Science in nature:
> Retrieve data
> Incorporate precinct information (boundaries via shapefiles) into the census data
> Create the variables needed to compute a diversity measure




2:
Statistical in nature:
Understanding the question
Define & decide what to calculate
Data Science in nature:
Understanding the question
Merge, manipulate, and prepare data
Calculations, advanced visualizations


3:
Statistical in nature:
Create graphs of potential bias variables, segmented (colored?) by categories.
Find measures of center and spread for those potential bias variables.
Create two-way tables of potential bias variables.
Data Science in nature:
Examining and addressing missing data.
Dealing with scale. How many individuals are unique?
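As a rough illustration of the two-way table, missing-data check, and unique-individual count named in entry 3, here is a minimal Python sketch. The records, the `person_id` field, and the category labels are all made up for illustration; the source does not say whether the stop data identify individuals at all.

```python
from collections import Counter

# Hypothetical stop records: (person_id, race, frisked).
# None marks a missing value. All values are invented for illustration.
stops = [
    ("p1", "A", True),
    ("p2", "B", False),
    ("p1", "A", False),
    ("p3", None, True),
    ("p4", "B", True),
]

# Two-way table of race by frisked, keeping missing race as its own category
# so that missing records are not silently dropped.
two_way = Counter(
    (race if race is not None else "missing", frisked)
    for _, race, frisked in stops
)

# Number of records with a missing race value.
n_missing_race = sum(1 for _, race, _ in stops if race is None)

# Number of unique individuals versus number of stop records.
n_unique = len({person_id for person_id, _, _ in stops})

for (race, frisked), count in sorted(two_way.items()):
    print(race, frisked, count)
print("records with missing race:", n_missing_race)
print("records:", len(stops), "unique individuals:", n_unique)
```

Keeping "missing" as an explicit row is one possible choice; dropping or imputing those records would be alternatives with different consequences for the bias measures.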




5:
Statistical in nature:
Definitions of bias, generalization to other cities/not (time), conditional crosstabs, plots, consideration of age and reasons for stops as confounders
Data Science in nature:
Aggregation by zip code, file merging (matching algorithm for race)



7:
Statistical in nature:
comparing the percent of a race in those stopped to the percent of that race in that precinct, taking uncertainty into account
Data Science in nature:
mapping blocks to precincts
mapping different race definitions to each other
merging datasets
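One way the comparison in entry 7 could be sketched in Python is shown below: the share of stops involving one group is given a Wilson score interval and set against that group's census share for the same precinct. The counts and the census share are invented, the Wilson interval is just one of several reasonable interval choices, and treating the census share as known without error is a simplifying assumption.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Made-up numbers for one precinct and one race category.
stops_of_group = 120
total_stops = 400
census_share = 0.22  # assumed fixed and error-free, which is itself an assumption

stop_share = stops_of_group / total_stops
low, high = wilson_interval(stops_of_group, total_stops)

print(f"stop share = {stop_share:.3f}, interval = ({low:.3f}, {high:.3f})")
print("census share inside interval:", low <= census_share <= high)
```

An interval that excludes the census share suggests a discrepancy larger than sampling noise alone would explain, though it says nothing about confounders such as age or reason for the stop.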


8:
Statistical in nature:
Descriptive stats, and maybe inference.
Inference/modeling
Data Science in nature:
web scraping? (if needed)
wrangling, merging categories
Map to see the data


9:
Statistical in nature:
understand question of interest
understand variables
making linkages across data
summary stats and walk through
analysis to answer question (multiple ways/models preferable)
Data Science in nature:
cleaning data
understand data structures
executing data set linkage




11:
Statistical in nature:
Agree on measure of bias.
Align racial definitions.
Determine confounders for which to adjust.
Model development/analysis strategy.
Data Viz.
Talking with police/census for data clarity.
Data Science in nature:
Link precincts and neighborhoods.
Align racial definitions.
Merge data.
Obtain data on confounders.
Fitting model chosen.
Data Viz.
Talking with police/census for data clarity.


12:
Statistical in nature:
Merging of categories (census)
Merge data sets
Goodness-of-fit test by precinct
Pseudo-measure of bias based on the maximum test statistic
Data Science in nature:
Merging of categories (census)
Merge data sets
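The goodness-of-fit idea in entry 12 could look roughly like the following Python sketch: a Pearson chi-square statistic is computed per precinct, comparing observed stop counts with the counts expected under that precinct's census shares, and the largest statistic is then taken as a crude indicator. The precinct names, categories, and numbers are invented, and only the statistic is computed here; a p-value would additionally need the chi-square distribution with (number of categories minus 1) degrees of freedom.

```python
def chi_square_gof(observed, expected_shares):
    """Pearson chi-square statistic: observed counts versus expected shares."""
    total = sum(observed.values())
    stat = 0.0
    for category, share in expected_shares.items():
        expected = total * share
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    return stat

# Made-up stop counts by race category for two hypothetical precincts.
stop_counts = {
    "precinct_1": {"A": 50, "B": 30, "C": 20},
    "precinct_2": {"A": 70, "B": 20, "C": 10},
}

# Made-up census shares for the same precincts (each set sums to 1).
census_shares = {
    "precinct_1": {"A": 0.5, "B": 0.3, "C": 0.2},
    "precinct_2": {"A": 0.4, "B": 0.4, "C": 0.2},
}

# One statistic per precinct, then the precinct with the largest statistic.
stats = {p: chi_square_gof(stop_counts[p], census_shares[p]) for p in stop_counts}
worst_precinct = max(stats, key=stats.get)

for precinct, stat in stats.items():
    print(precinct, round(stat, 2))
print("largest statistic:", worst_precinct)
```

Because the statistic grows with the number of stops, comparing raw maxima across precincts of very different sizes can be misleading; that caveat applies to any "max test stat" summary.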


13:
Statistical in nature:
chi-square test and computing percentages
Data Science in nature:
data wrangling (consolidating race, merge census and frisk data at neighborhood level)
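The wrangling step in entry 13 (consolidating race categories, then merging census and frisk data at the neighborhood level) might be sketched in Python as follows. The crosswalk, labels, neighborhood names, and counts are all hypothetical; how real categories should map to each other is a substantive decision the sketch does not settle.

```python
from collections import Counter

# Hypothetical crosswalk from each source's labels to a shared set of categories.
CROSSWALK = {"A1": "A", "A2": "A", "A": "A", "B": "B", "B-alone": "B"}

# Made-up frisk records: (neighborhood, race label as recorded in the stop data).
frisk_records = [("north", "A1"), ("north", "A2"), ("north", "B"), ("south", "B")]

# Made-up census rows: (neighborhood, race label as recorded by the census, count).
census_counts = [
    ("north", "A", 600),
    ("north", "B-alone", 400),
    ("south", "A", 100),
    ("south", "B-alone", 900),
]

# Consolidate labels and aggregate stops by (neighborhood, shared category).
stops = Counter()
for hood, label in frisk_records:
    stops[(hood, CROSSWALK[label])] += 1

# Consolidate labels and aggregate population the same way.
population = Counter()
for hood, label, count in census_counts:
    population[(hood, CROSSWALK[label])] += count

# Outer merge on the shared key, so cells present in only one source stay visible.
merged = {
    key: {"stops": stops.get(key, 0), "population": population.get(key, 0)}
    for key in sorted(set(stops) | set(population))
}

for key, row in merged.items():
    print(key, row)
```

The merge keeps every key from either source; a label missing from the crosswalk raises a `KeyError`, which surfaces unmapped categories instead of hiding them.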


