Data treatment before training a Neural Network
Applies to: NeuralTools 7.x/8.x
Main steps to follow:
1. Data Quality
The first step in building any prediction or classification model is to evaluate the quality of the data. Problems in the data set should be identified and fixed before any further analysis.
2. Univariate Analysis
Once the data quality issues have been fixed, the next step is to run a Univariate Analysis, examining each variable on its own.
- If the variable only has one value, it should be removed from the analysis.
- If there are categories with a frequency lower than 5%, the variable should be regrouped so that every category has a frequency of at least 5%.
- If there are only two categories and one of them has a frequency lower than 5%, the variable should be removed from the analysis.
- If the variable is highly skewed, it is advisable to apply a log transformation to it for the Neural Network training.
- If there are outliers, it is important to determine whether they are measurement errors before deciding to exclude them.
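The checks above can be sketched in a few lines of Python for readers who want to screen a data set before loading it into NeuralTools. This is a minimal illustration, not a NeuralTools or StatTools feature: the function names, the sample data, and the skewness cutoff of 1.0 are assumptions made for this example. Only the 5% frequency threshold comes from the rules above.

```python
import math
from collections import Counter

def screen_categorical(values, min_share=0.05):
    """Apply the categorical rules; return (decision, list of rare categories)."""
    n = len(values)
    shares = {cat: count / n for cat, count in Counter(values).items()}
    rare = sorted(cat for cat, share in shares.items() if share < min_share)
    if len(shares) == 1:
        return "drop: single value", rare
    if len(shares) == 2 and rare:
        return "drop: binary with a rare category", rare
    if rare:
        return "regroup rare categories", rare
    return "keep", rare

def skewness(values):
    """Population skewness (third standardized moment); 0.0 for constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def log_if_skewed(values, max_skew=1.0):
    """Log-transform strictly positive, right-skewed data.

    max_skew=1.0 is an assumed rule-of-thumb cutoff, not a value from the article.
    """
    if min(values) > 0 and skewness(values) > max_skew:
        return [math.log(v) for v in values]
    return list(values)

# Hypothetical sample data for illustration only.
colors = ["red"] * 60 + ["blue"] * 37 + ["green"] * 3
incomes = [20, 22, 25, 27, 30, 31, 35, 40, 400, 900]

print(screen_categorical(colors))  # ('regroup rare categories', ['green'])
print(log_if_skewed(incomes))      # log values, because the data is right-skewed
```

Outliers are deliberately not handled here, since deciding whether a value is a measurement error requires knowledge of how the data was collected.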
3. Bivariate Analysis
If there is a large number of independent variables, it is useful to run a Bivariate Analysis, in which each independent variable is tested against the dependent variable. The following tests can be used:
- t test. The results of this test are based on the assumption that the variables are approximately normally distributed. If this is not the case, the results might not be valid, especially if the sample size is small. You can use the Mann-Whitney test in these cases.
If the p-value of the test is low, the independent variable is included in the neural net training; otherwise, it is omitted.
You can run this analysis through the menu Statistical Inference > Hypothesis Test > Mean/Std. Deviation… of StatTools. Be sure to select the Two-Sample Analysis type.
- Mann-Whitney test. If the p-value of the test is low, the independent variable is included in the neural net training; otherwise, it is omitted. You can run this analysis through the menu Nonparametric tests > Mann-Whitney test … of StatTools.
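To make the selection rule concrete, here is a rough Python sketch of screening independent variables with a Mann-Whitney test. It assumes a dependent variable with exactly two classes, uses the large-sample normal approximation without tie or continuity correction, and treats "low" as a p-value below 0.05. None of these choices comes from StatTools, whose output may differ slightly. The function and variable names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_p(group_a, group_b):
    """Two-sided p-value from the normal approximation of the U statistic."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))

def select_inputs(columns, target, alpha=0.05):
    """Keep the variables whose p-value is below alpha (assumed cutoff)."""
    first, second = sorted(set(target))  # assumes exactly two classes
    kept = []
    for name, values in columns.items():
        a = [v for v, t in zip(values, target) if t == first]
        b = [v for v, t in zip(values, target) if t == second]
        if mann_whitney_p(a, b) < alpha:
            kept.append(name)
    return kept

# Hypothetical data: "age" separates the two classes, "noise" does not.
target = [0] * 10 + [1] * 10
columns = {
    "age": list(range(1, 21)),
    "noise": [1, 4, 5, 8, 9, 12, 13, 16, 17, 20,
              2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
}
print(select_inputs(columns, target))  # ['age']
```

For small samples, an exact test is preferable to the normal approximation used in this sketch.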
Last Update: 2020-06-04