Logistic Regression
Discriminative model: we only need to learn P(y|x).
Odds:
Suppose a sample belongs to class 1 with probability 0.9, i.e. P(y=1) = 0.9. Then the odds are P(y=1)/P(y=0) = 0.9/0.1 = 9. Odds range from 0 to +∞. Taking the logarithm maps all probabilities in (0, 1) to (−∞, +∞), with higher probabilities corresponding to higher log-odds.
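As a quick numerical check, a minimal Python sketch (the `log_odds` helper here is ours, not from any library):

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to its log-odds in (-inf, +inf)."""
    return math.log(p / (1 - p))

# P(y=1) = 0.9 gives odds of 0.9/0.1 = 9
print(0.9 / 0.1)      # 9.0
print(log_odds(0.9))  # log(9), positive
print(log_odds(0.5))  # 0.0 -- even odds
```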
The linear equation:
$$y_i = w_0 + w_1 x_i$$

Substituting (replacing $y$ with $p$):
$$\log\left(\frac{p(y=1|x)}{p(y=0|x)}\right) = \log\left(\frac{p_i}{1 - p_i}\right) = w_0 + w_1 x_i$$

Solving for $p_i$:
$$p_i = \frac{1}{1 + e^{-(w_0 + w_1 x_i)}}$$

Binary case:

$$p(y=1|x) = \frac{e^{w_0 + \sum_i w_i x_i}}{1 + e^{w_0 + \sum_i w_i x_i}}$$

$$p(y=0|x) = \frac{1}{1 + e^{w_0 + \sum_i w_i x_i}}$$

This is the sigmoid function.
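The sigmoid can be sketched in a few lines (NumPy assumed; the parameter values are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The linear score w0 + w1 * x passed through the sigmoid
w0, w1 = -1.0, 2.0
x = 1.5
p = sigmoid(w0 + w1 * x)  # P(y=1 | x)
print(p)                  # a value in (0, 1)
```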
The log-likelihood (the loss is its negative):
$$L(w) = \sum_{i=1}^{N}\left[y_i \log p(y=1|x_i) + (1 - y_i)\log p(y=0|x_i)\right]$$

Optimization algorithms
Gradient descent:
$$w := w - \alpha \nabla_w f(w)$$

$$b := b - \alpha \nabla_b f(b)$$
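These updates can be sketched for binary logistic regression on toy 1-D data (NumPy assumed; the data, learning rate, and iteration count are all illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: class 1 for larger x (constructed for illustration)
X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0
alpha = 0.1  # learning rate

for _ in range(2000):
    p = sigmoid(w * X + b)         # P(y=1 | x) under current parameters
    grad_w = np.mean((p - y) * X)  # gradient of the negative log-likelihood
    grad_b = np.mean(p - y)
    w -= alpha * grad_w            # w := w - alpha * grad_w
    b -= alpha * grad_b            # b := b - alpha * grad_b

print(w, b)
print(sigmoid(w * X + b).round(2))  # predicted probabilities per sample
```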
Features:
For logistic regression, the learned coefficients (clf.coef_) directly tell us each feature's influence. The larger a feature's coefficient, the greater its role in the model's predictions; a negative coefficient indicates a contribution toward the negative class.
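A small sketch of reading the coefficients off a fitted scikit-learn model (the data here is synthetic, constructed so that feature 0 pushes toward the positive class and feature 1 toward the negative class):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Label is 1 exactly when feature 0 exceeds feature 1
y = (X[:, 0] - X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)

print(clf.coef_)       # one weight per feature: positive for 0, negative for 1
print(clf.intercept_)  # the bias term w0
```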
Multinomial Logistic Regression
Generalizing LR to multiclass classification.
Multinomial case:

$$p(y=k|x) = \frac{e^{w_0 + w_k x}}{1 + \sum_{j=1}^{K-1} e^{w_0 + w_j x}}, \quad k = 1, 2, \ldots, K-1$$

$$p(y=K|x) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{w_0 + w_j x}}$$
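This multinomial form is a softmax with class K's score fixed at 0; a minimal NumPy sketch of turning class scores into probabilities (the score values are illustrative):

```python
import numpy as np

def softmax(scores):
    """Turn K class scores into a probability distribution."""
    e = np.exp(scores - np.max(scores))  # shift scores for numerical stability
    return e / e.sum()

# Scores w_k . x for three classes (illustrative numbers)
scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p)        # one probability per class, highest for the highest score
print(p.sum())  # 1.0
```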
Loss function (with L2 and L1 regularization respectively, as in scikit-learn):

$$\min_{w,b} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + b)\right) + 1\right)$$

$$\min_{w,b} \|w\|_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + b)\right) + 1\right)$$

Application
```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X, y)
```
Important parameters:
- C:Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
- penalty : str, ‘l1’ or ‘l2’, default: ‘l2’
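For instance (illustrative values; 'liblinear' is one of the solvers that supports the L1 penalty), a smaller C combined with penalty='l1' can drive uninformative coefficients toward zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)  # only feature 0 is informative

# Smaller C = stronger regularization
clf = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
clf.fit(X, y)
print(clf.coef_)  # L1 tends to shrink the noise features' weights toward 0
```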
Summary
Given this model formulation, we want to learn parameters {c_i} that maximise the conditional likelihood of the data according to the model.
Thanks to the softmax function, we not only construct a classifier but also learn probability distributions over the classes.
There are many ways to choose the weights {c_i}:
- Perceptron: find misclassified examples and move the weights in the direction of their correct class.
- Margin-based: methods such as Support Vector Machines can be used for learning the weights.
- Logistic Regression: Directly maximise the conditional log-likelihood via gradient descent.
References:
- 《Building Machine Learning Systems with Python》
- scikit-learn.org/stable/modules/linear_model.html#logistic-regression
- 《统计学习方法》(Statistical Learning Methods)
Reposted from: http://kioji.baihongyu.com/