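3.1.6.3. Analysis of Iris petal and sepal sizes

Illustrates an analysis on a real dataset:

  • Visualizing the data to form intuitions

  • Fitting a linear model

  • Hypothesis test of the effect of a categorical variable in the presence of a continuous confound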

import matplotlib.pyplot as plt
import pandas
from pandas import plotting
from statsmodels.formula.api import ols
# Load the data
data = pandas.read_csv("iris.csv")

Plot a scatter matrix

# Express the names as categories
categories = pandas.Categorical(data["name"])
# The parameter 'c' is passed to plt.scatter and will control the color
plotting.scatter_matrix(data, c=categories.codes, marker="o")
fig = plt.gcf()
fig.suptitle("blue: setosa, green: versicolor, red: virginica", size=13)
[Figure: scatter matrix of the iris measurements, titled "blue: setosa, green: versicolor, red: virginica"]
Text(0.5, 0.98, 'blue: setosa, green: versicolor, red: virginica')
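The coloring relies on `pandas.Categorical` turning each species name into an integer code. The sketch below shows that mapping on a few hand-written labels (illustrative values, not rows from iris.csv). Which color each integer code receives depends on matplotlib's active colormap, so the sketch also builds an explicit list of color names; the `palette` dictionary is an assumption introduced here, chosen to match the figure title.

```python
import pandas

# Hand-written labels for illustration only (not read from iris.csv)
names = pandas.Series(["setosa", "versicolor", "virginica", "setosa"])

# Categories are sorted alphabetically and numbered from 0
categories = pandas.Categorical(names)
print(list(categories.categories))
print(categories.codes.tolist())

# Hypothetical palette: one explicit color per species
palette = {"setosa": "blue", "versicolor": "green", "virginica": "red"}
colors = [palette[name] for name in names]
print(colors)
```

A list such as `colors`, built from the full `data["name"]` column, could be passed as `c=colors` in place of `categories.codes`, so that the colors on screen do not depend on the default colormap.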

Statistical analysis

# Let us try to explain the sepal width as a function of the petal
# length and the category of iris
model = ols("sepal_width ~ name + petal_length", data).fit()
print(model.summary())
# Now formulate a "contrast", to test whether the offsets for versicolor
# and virginica are identical
print("Testing the difference between effect of versicolor and virginica")
print(model.f_test([0, 1, -1, 0]))
plt.show()
                            OLS Regression Results
==============================================================================
Dep. Variable:            sepal_width   R-squared:                       0.478
Model:                            OLS   Adj. R-squared:                  0.468
Method:                 Least Squares   F-statistic:                     44.63
Date:                Mon, 07 Oct 2024   Prob (F-statistic):           1.58e-20
Time:                        04:56:51   Log-Likelihood:                -38.185
No. Observations:                 150   AIC:                             84.37
Df Residuals:                     146   BIC:                             96.41
Df Model:                           3
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept              2.9813      0.099     29.989      0.000       2.785       3.178
name[T.versicolor]    -1.4821      0.181     -8.190      0.000      -1.840      -1.124
name[T.virginica]     -1.6635      0.256     -6.502      0.000      -2.169      -1.158
petal_length           0.2983      0.061      4.920      0.000       0.178       0.418
==============================================================================
Omnibus:                        2.868   Durbin-Watson:                   1.753
Prob(Omnibus):                  0.238   Jarque-Bera (JB):                2.885
Skew:                          -0.082   Prob(JB):                        0.236
Kurtosis:                       3.659   Cond. No.                         54.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Testing the difference between effect of versicolor and virginica
<F test: F=3.245335346574177, p=0.07369058781701142, df_denom=146, df_num=1>
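The contrast `[0, 1, -1, 0]` picks out the difference between the versicolor and virginica coefficients, so the F-test asks whether that difference is zero. The sketch below illustrates the idea with NumPy only, on synthetic data generated to resemble the setup (the numbers are made up and are not the iris measurements). It computes the F statistic from the contrast and checks it against the equivalent comparison with a restricted model in which versicolor and virginica share a single offset. The helper `fit_ols` and all variable names are assumptions introduced for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the real iris measurements):
# 3 groups of 50 samples; 0: setosa, 1: versicolor, 2: virginica
group = np.repeat([0, 1, 2], 50)
petal_length = rng.normal(np.array([1.5, 4.3, 5.5])[group], 0.4)
offsets = np.array([0.0, -1.5, -1.7])
sepal_width = 3.0 + offsets[group] + 0.3 * petal_length + rng.normal(0, 0.3, 150)


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid


# Full model: intercept, two dummy variables (setosa is the reference), covariate
ones = np.ones(len(group))
X = np.column_stack([ones, group == 1, group == 2, petal_length]).astype(float)
beta, rss = fit_ols(X, sepal_width)
df_resid = len(sepal_width) - X.shape[1]

# F statistic for the single contrast c @ beta = 0
c = np.array([0.0, 1.0, -1.0, 0.0])
cov = (rss / df_resid) * np.linalg.inv(X.T @ X)
f_contrast = (c @ beta) ** 2 / (c @ cov @ c)

# Restricted model: versicolor and virginica share one dummy variable
X_r = np.column_stack([ones, group != 0, petal_length]).astype(float)
_, rss_r = fit_ols(X_r, sepal_width)
f_nested = (rss_r - rss) / (rss / df_resid)

print(f"F from contrast:     {f_contrast:.4f}")
print(f"F from nested models: {f_nested:.4f}")
```

Both routes give the same statistic because a single linear restriction on the coefficients is equivalent to fitting the restricted model. With 150 observations and 4 parameters the denominator degrees of freedom are 146, as in the output above.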

Total running time of the script: (0 minutes 0.421 seconds)
