Discussion week 1 – data cleansing and de-duplication