1. Probability:

     1.1 Definition   Probability (P): a measure of how likely an event is to occur

     1.2 Range   0 <= P <= 1

     1.3 Ways to compute it:

          1.3.1 From personal belief (subjective judgement)

          1.3.2 From historical data

          1.3.3 From simulated data (see the sketch after this list)
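For items 1.3.2 and 1.3.3, a minimal sketch in Python (the record and the die-rolling event are hypothetical, chosen only for illustration):

import random

# 1.3.2 From historical data: the relative frequency in a past record
history = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]   # 1 = the event occurred, 0 = it did not
print(sum(history) / len(history))          # 0.5, an estimate of P, within [0, 1]

# 1.3.3 From simulated data: Monte Carlo estimate of P(rolling a 6), true value 1/6
trials = 100000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
print(hits / trials)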

     1.4 Conditional probability:

                         P(A|B) = P(AB) / P(B)
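A quick numerical check of this definition on simulated die rolls (a hypothetical example: A = the roll is even, B = the roll is greater than 3, so the exact value of P(A|B) is 2/3):

import random

trials = 100000
count_B = 0    # roll greater than 3           -> estimates P(B)
count_AB = 0   # roll greater than 3 and even  -> estimates P(AB)
for _ in range(trials):
    roll = random.randint(1, 6)
    if roll > 3:
        count_B += 1
        if roll % 2 == 0:
            count_AB += 1

p_B = count_B / trials
p_AB = count_AB / trials
print(p_AB / p_B)   # P(A|B) = P(AB) / P(B), should be close to 2/3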

2. Logistic Regression

     2.1 Example

[Example figures omitted: a binary classification example where the prediction is made by comparing h(x) against a threshold, e.g. h(x) > 0.2]

   2.2 Basic model
         The input (test) data is X(x0, x1, x2, ..., xn)
         The parameters to learn are: Θ(θ0, θ1, θ2, ..., θn)

          z = θ0·x0 + θ1·x1 + ··· + θn·xn

In vector form:

          z = Θ^T X

To handle binary data, the Sigmoid function is introduced to smooth the output into a curve:

          Sigmoid function: g(z) = 1 / (1 + e^(-z))

[Figure omitted: the S-shaped sigmoid curve, rising from 0 to 1 and passing through 0.5 at z = 0]

 

  Prediction function:

          hθ(x) = g(Θ^T X) = 1 / (1 + e^(-Θ^T X))
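A minimal NumPy sketch of this prediction function (the function names, θ values, and example input are my own, chosen only for illustration):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # hθ(x) = g(Θ^T X), a value in (0, 1)
    return sigmoid(np.dot(theta, x))

# example: x0 = 1 is the bias feature
theta = np.array([-1.0, 0.5])
x = np.array([1.0, 4.0])
print(hypothesis(theta, x))   # ≈ 0.73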

Expressed as probabilities:

          Positive class (y = 1):

          P(y = 1 | x; Θ) = hθ(x)

      Negative class (y = 0):

          P(y = 0 | x; Θ) = 1 - hθ(x)
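These two cases, together with the usual 0.5 decision threshold (a common convention, not stated explicitly in the original notes), can be written as a small sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    # returns (P(y=0 | x; Θ), P(y=1 | x; Θ)) = (1 - hθ(x), hθ(x))
    p1 = sigmoid(np.dot(theta, x))
    return 1.0 - p1, p1

def predict(theta, x, threshold=0.5):
    # classify as y = 1 when hθ(x) >= threshold, else y = 0
    return 1 if sigmoid(np.dot(theta, x)) >= threshold else 0

theta = np.array([-1.0, 0.5])
x = np.array([1.0, 4.0])
print(predict_proba(theta, x))   # ≈ (0.27, 0.73)
print(predict(theta, x))         # 1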

 2.3  Cost function

              Linear regression:

          hθ(x) = Θ^T X

          squared error for one example: ( hθ(x^(i)) - y^(i) )^2

          J(Θ) = (1 / 2m) · Σ_{i=1..m} ( hθ(x^(i)) - y^(i) )^2

Find suitable θ0, θ1, ... that minimize the expression above
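A small sketch of this squared-error cost in NumPy (names are mine; the same quantity is computed inside the gradientDescent function further below):

import numpy as np

def linear_cost(theta, x, y):
    # J(Θ) = 1/(2m) * Σ (hθ(x_i) - y_i)^2, with hθ(x) = Θ^T X
    m = len(y)
    loss = np.dot(x, theta) - y
    return np.sum(loss ** 2) / (2 * m)

# tiny example: points on the line y = 2x, fit exactly by theta = (0, 2)
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(linear_cost(np.array([0.0, 2.0]), x, y))   # 0.0
print(linear_cost(np.array([0.0, 1.0]), x, y))   # > 0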

  Logistic regression:

 Cost function:

          J(Θ) = -(1/m) · Σ_{i=1..m} [ y^(i) · log(hθ(x^(i))) + (1 - y^(i)) · log(1 - hθ(x^(i))) ]

 

Goal: find suitable θ0, θ1, ... that minimize the expression above
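A hedged NumPy sketch of this cross-entropy cost (function names and the toy data are my own):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, x, y):
    # J(Θ) = -(1/m) * Σ [ y*log(h) + (1-y)*log(1-h) ], with h = sigmoid(XΘ)
    m = len(y)
    h = sigmoid(np.dot(x, theta))
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# tiny example: a bias column plus one feature, labels in {0, 1}
x = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 1.0])
print(logistic_cost(np.array([-2.0, 1.5]), x, y))   # the better theta fits the data, the smaller this value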

2.4 Solution: gradient descent

[Gradient descent illustrations omitted]

              Update rule:

          θj := θj - α · ∂J(Θ)/∂θj      (for j = 0, 1, ..., n)

  α is the learning rate

                      Update all θj simultaneously

                      Repeat the updates until convergence
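Putting the pieces together, a minimal sketch of these updates applied to logistic regression (my own hypothetical example; the code block after it is the original notes' separate linear-regression demo):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(x, y, alpha=0.1, num_iterations=5000):
    # x: (m, n) design matrix with a bias column, y: (m,) labels in {0, 1}
    m, n = x.shape
    theta = np.zeros(n)
    for _ in range(num_iterations):
        h = sigmoid(np.dot(x, theta))
        gradient = np.dot(x.T, h - y) / m   # gradient of the cross-entropy cost J(Θ)
        theta = theta - alpha * gradient    # update all θj simultaneously
    return theta

# toy data: y flips from 0 to 1 as the feature passes roughly 2
x = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.5], [1.0, 3.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = logistic_gradient_descent(x, y)
print(theta)
print(sigmoid(np.dot(x, theta)))   # predicted probabilities: low for the y=0 rows, high for the y=1 rows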

Application: gradient descent implemented in Python (the demo below fits a simple linear model to simulated data)

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

# generate simulated data
def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

As the code runs, the iteration counter keeps growing while the cost keeps shrinking, and theta is eventually obtained (with this data, where on average y ≈ x + 30, theta should come out near [30, 1]).

Logistic regression sample code download