Discussion – data cleansing and de-duplication