For this portion of the project, you will examine your dataset for incorrect data. Any incorrect data should be removed, corrected, or imputed. Follow these steps:
- Remove irrelevant data. If you are unsure whether something is irrelevant, keep it.
- Remove duplicate records.
- Make sure numbers are interpreted as numerical data types.
- Fix typos.
- Standardize inconsistent values (for example, capitalization, units, and date formats).
- Investigate outliers.
- Check and manage missing values.
- Format and normalize data if needed.
- Change categorical values into numbers if needed.
Once you have completed this, you will need to provide a Word document summarizing the pre-processing steps performed on your dataset.
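The steps above can be sketched in Python with pandas. This is only an illustration under stated assumptions, not a required method: the toy table, its column names (`id`, `city`, `age`, `income`, `notes`), the `preprocess` helper, the choice of median imputation, the 1.5 × IQR outlier rule, and min-max scaling are all hypothetical and should be adapted to your own dataset.

```python
import pandas as pd

# Hypothetical toy data; replace with your own dataset.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5],
    "city": ["Boston", "boston ", "boston ", "Bostn", "Chicago", "Chicago"],
    "age": ["34", "29", "29", "41", None, "38"],
    "income": [52000.0, 48000.0, 48000.0, 51000.0, 50000.0, 950000.0],
    "notes": ["a", "b", "b", "c", "d", "e"],
})


def preprocess(df):
    """Apply each pre-processing step in order (illustrative choices only)."""
    df = df.copy()

    # Remove irrelevant data (assumption: 'notes' is not needed for analysis).
    df = df.drop(columns=["notes"])

    # Remove duplicate records.
    df = df.drop_duplicates().reset_index(drop=True)

    # Make sure numbers are numeric; unparseable entries become NaN.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")

    # Standardize text (whitespace, capitalization), then fix known typos.
    df["city"] = df["city"].str.strip().str.title()
    df["city"] = df["city"].replace({"Bostn": "Boston"})

    # Investigate outliers: flag values outside 1.5 * IQR rather than drop them.
    q1, q3 = df["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["income_outlier"] = (df["income"] < q1 - 1.5 * iqr) | (
        df["income"] > q3 + 1.5 * iqr
    )

    # Check and manage missing values: impute age with the median.
    df["age"] = df["age"].fillna(df["age"].median())

    # Normalize: min-max scale income to the range [0, 1].
    income_range = df["income"].max() - df["income"].min()
    df["income_scaled"] = (df["income"] - df["income"].min()) / income_range

    # Change categorical values into numbers: one-hot encode city.
    df = pd.get_dummies(df, columns=["city"], dtype=int)
    return df


clean = preprocess(raw)
print(clean)
```

The sketch flags the outlier instead of deleting it, so the decision to remove, correct, or keep it can be made after investigation and recorded in the summary document. Note that min-max scaling is sensitive to outliers, so you may want to handle them before normalizing.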