57 Challenges, No. 57 (part 6): client, web front end, and server implementation

This post continues the problem worked on in the earlier parts:

https://www.toutiao.com/article/7131944951776150056/

https://www.toutiao.com/article/7129048315920335399/

https://www.toutiao.com/article/7128604966595920414/

https://www.toutiao.com/article/7128196832081183236/

https://www.toutiao.com/article/7133396498141954595/

With the logic designed in those parts, what remains is mostly the implementation.

Client code: 57-57clientsidev5.py

import requests
import random
import time
from flask import Flask,request,render_template,url_for,redirect
import redis
from redis import StrictRedis,ConnectionPool
import json
app=Flask(__name__)



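def get_data():
    """
    Fetch questions from the server-side question bank.
    Input: none.
    Returns: the parsed JSON response; its 'data' key holds 10 questions
             that the server picked at random.
    """
    print("start process get_data")
    url="http://127.0.0.1:8008/get"
    ret=requests.post(url)
    dic1=ret.json()
    print("end process get_data")
    return dic1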


def orderlize(questionlist):
    """
    Sort the question list by each dict's 'R' (difficulty) value, ascending.
    Input: the list of question dicts.
    Returns: a new list sorted by 'R'.
    """
    print("start process orderlize")
    print("sorting the list by difficulty")
    sortedlist=sorted(questionlist,key=lambda i : i['R'])
    print(sortedlist)
    print("end process orderlize")
    return sortedlist


def randomlize_input(dic1):
    """
    Shuffle the correct answer together with the three distractors of one question.
    Input: one question (a dict) from the question list.
    Returns: a tuple (question text, correct answer, shuffled list of the four options).
    """
    list1=[dic1['A'], dic1['D1'], dic1['D2'], dic1['D3']]
    listrandom=random.sample(list1,4)
    return dic1['Q'],dic1['A'],listrandom


@app.route('/',methods=['GET','POST'])
def start_html():
    """
    1. GET: show the home page with the rules and the start button.
       Ten questions are asked from easiest to hardest; the quiz ends at the first wrong answer.
    2. POST: the user has started the quiz.
       Redirect to the question view.
    """
    if request.method=="POST": # the web page submitted "start quiz"
        name=request.form["name"]
        print("candidate name: {0}".format(name))
        timetitle=time.strftime("%Y-%m-%d-%H:%M:%S", time.localtime())
        print("time: {0}".format(timetitle))
        token=hash(name+timetitle) # derive the candidate's token from name and start time
        print(token)
        count=0  # first question, so count starts at 0
        return redirect(url_for('start_question',token=token,count=0))


    if request.method=="GET":
        # Show the home page; its form posts back to this view to start the quiz.
        return render_template('index57.html')


@app.route('/question',methods=['GET','POST'])
def start_question():
    if request.method=="POST": # sent by the page after a wrong answer
        totalcorrectquestion=request.json["count"]
        info="total answered correct question is {0}".format(totalcorrectquestion)
        print(info)
        return render_template('finish.html',info=info)

    if request.method=="GET":
        # Read count and token from the query string; token is the key of the cached question list in redis.
        # count == 0: fetch the questions from the server and cache them in redis.
        # count > 0: load the cached questions from redis.
        valueargs=request.args
        print(valueargs) # debug
        count=int(valueargs.get('count'))
        token=valueargs.get('token')
        print("we have received 'count', value is {0}".format(count))
        pool=ConnectionPool(host='localhost', port=6379, db=0, decode_responses=True) # connect to the local redis
        r=StrictRedis(connection_pool=pool) # client backed by the connection pool
        if count==0:
            print("initialize")
            dic1=get_data() # fetch 10 questions from the server
            print(dic1) # debug
            questionlist=dic1['data']
            sortedlist=orderlize(questionlist) # sort by difficulty
            print(sortedlist) # debug
            r.hset("Qlib",token,json.dumps(sortedlist)) # cache the sorted list in redis under this token
        else:
            sorted_list_json=r.hget("Qlib", token) # load the cached list from redis
            sortedlist=json.loads(sorted_list_json)
            print("come to next question")
            print(sortedlist)
            print(len(sortedlist))
        if count>=len(sortedlist): # every question has been answered correctly
            info="Congratulations, user {0} has finished all the questions".format(token)
            print(info)
            return render_template('finish.html',info=info)
        Q, A, listrandom=randomlize_input(sortedlist[count]) # take question number count and shuffle its options
        print("it's {0} round,the question is {1} and the answer is {2}".format(count, Q, A))
        return render_template('start_question7.html', Q=Q, A=A, listrandom=listrandom, count=count, token=token) # send the question to the page



@app.route('/statistic',methods=['GET','POST'])
def take_account():
    # Reaching this view means the quiz stopped early; report how many questions were answered correctly.
    valueargs=request.args
    print(valueargs)  # debug
    count=int(valueargs.get('count'))
    token=valueargs.get('token')
    info="token {0} : total answered correct question is {1}".format(token,count)
    return render_template('finish.html',info=info)



if __name__=='__main__':
    app.run(host='0.0.0.0',port=80,debug=True)
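
One detail in start_html() is worth a note: Python's built-in hash() is salted per process for strings, so the same name and time produce a different token after the application restarts, and the value can be negative. That does not break the flow above, because the token only travels in the URL and serves as the redis key. If reproducible tokens are wanted, a digest from hashlib is one option. The sketch below is an alternative, not the code used in this post; make_token is a hypothetical helper name.

```python
import hashlib
import time


def make_token(name: str, started_at: str) -> str:
    """Return a short hex token derived from the candidate name and start time."""
    raw = f"{name}|{started_at}".encode("utf-8")
    # sha256 yields the same digest in every process, unlike the built-in hash()
    return hashlib.sha256(raw).hexdigest()[:16]


if __name__ == "__main__":
    timetitle = time.strftime("%Y-%m-%d-%H:%M:%S", time.localtime())
    print(make_token("alice", timetitle))
```

The 16-character truncation is an arbitrary choice for readability; the full digest works just as well as a redis hash field.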

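A few lines that are harder to follow deserve a closer look:

sortedlist=sorted(questionlist,key=lambda i : i['R'])

sorted() returns a new list of the elements of questionlist (each one a dict), ordered by each dict's 'R' (difficulty) value in ascending order.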

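listrandom=random.sample(list1,4)

list1 holds exactly four options, so sampling four items without replacement returns all of them in a random order; list1 itself is left unchanged.

return redirect(url_for('start_question',token=token,count=0))

This redirects to the URL of the start_question view. url_for() builds that URL and appends the two arguments, token and count, as query-string parameters.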
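
The sorting and shuffling calls can be tried on their own. The snippet below is a small sketch with made-up questions; only the keys Q, A, D1, D2, D3, and R come from the code above, and the values are placeholders.

```python
import random

# Made-up question dicts using the same keys as the quiz code.
questions = [
    {"Q": "hard one", "A": "a3", "D1": "x", "D2": "y", "D3": "z", "R": 3},
    {"Q": "easy one", "A": "a1", "D1": "x", "D2": "y", "D3": "z", "R": 1},
    {"Q": "medium one", "A": "a2", "D1": "x", "D2": "y", "D3": "z", "R": 2},
]

# sorted() builds a new list ordered by the 'R' value; the input list is untouched.
ordered = sorted(questions, key=lambda item: item["R"])
print([q["R"] for q in ordered])  # [1, 2, 3]

# random.sample() with k equal to the list length returns every option once, in random order.
first = ordered[0]
options = [first["A"], first["D1"], first["D2"], first["D3"]]
shuffled = random.sample(options, 4)
print(shuffled)
```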

Web page: start_question7.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>start_question7.html</title>
    <script src="https://cdn.staticfile.org/jquery/1.10.2/jquery.min.js"></script>
</head>
<body>

Question: <div id="q"> </div>
<form action="">
   <input type="radio" name="sex" value="0"   id="a1"> <label for="a1" id="c1" onclick="doPost(0)">Huey</label><br>
   <input type="radio" name="sex" value="1"   id="a2"> <label for="a2" id="c2" onclick="doPost(1)">Huey</label><br>
   <input type="radio" name="sex" value="2"   id="a3"> <label for="a3" id="c3" onclick="doPost(2)">Huey</label><br>
   <input type="radio" name="sex" value="3"   id="a4"> <label for="a4" id="c4" onclick="doPost(3)">Huey</label><br>
</form>

<script>
    function myFunction(){
    if("undefined"==typeof list1) // if not defined yet, take the option list rendered by the template
   { var list1={{listrandom|tojson}};}
   if("undefined"==typeof question) // if not defined yet, take the question rendered by the template
   { var question={{Q|tojson}};}
   console.log("come to myFunction()");
   console.log("The question is "+question)
   document.getElementById("q").innerHTML=question+"()";
   document.getElementById("c1").innerHTML=list1[0];
   document.getElementById("c2").innerHTML=list1[1];
   document.getElementById("c3").innerHTML=list1[2];
   document.getElementById("c4").innerHTML=list1[3];
    }
    window.onload=myFunction;

    function doPost(idnumber){
    console.log("come to collect info first");
    var count={{count|tojson}};
    var list1={{listrandom|tojson}};
    var correct={{A|tojson}}
    var tokenid={{token|tojson}}
    var answer=list1[idnumber];
    console.log("come to doPost() function");
    console.log("The choice is "+answer);
    console.log("The correct answer is "+ correct);
    console.log("Your token id is " + tokenid);
    console.log("The count now is "+ count);

    if (answer==correct)// correct answer: go to url2, passing token and count+1
       {
         var count=count+1;
         url="http://127.0.0.1/question";
         url2=url+"?token="+tokenid+"&count="+count;
         $.ajax({
          type:"GET",
        url:url,
        data:{"count":count,"token":tokenid},
        async:false,
        contentType:"application/json",
        success:function(data)
        {
         window.location.href=url2;
        }
         })
       }
     if(answer !=correct) // wrong answer: go to the statistics page, passing token and count
            {
              var jsondoc={"count":count};
              url="http://127.0.0.1/question";
              url2="http://127.0.0.1/statistic"+"?token="+tokenid+"&count="+count;
             $.ajax({
             type:"POST",
             url:url,
             async:false,
             contentType:"application/json",
             data:JSON.stringify(jsondoc),
             success:function()
              {
               window.location.href=url2;
              }
             });
             }


    }



</script>
</body>
</html>

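The first part of the script works as follows:

1. list1={{listrandom|tojson}} receives the data that the Flask view passed to the template. Jinja renders it into the page as a JavaScript literal before the page reaches the browser.

if("undefined"==typeof list1) // if not defined yet, take the option list

{ var list1={{listrandom|tojson}};}

if("undefined"==typeof question) // if not defined yet, take the question

{ var question={{Q|tojson}};}

2. document.getElementById("c1").innerHTML=list1[0], and the matching lines for c2 to c4, then write each option into its label on the page.

3. window.onload=myFunction registers the function to run once the page has loaded. Note the missing parentheses: window.onload=myFunction() would call the function immediately and assign its return value (undefined) to window.onload, which only appears to work when the script sits after the elements it touches.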

function myFunction(){
if("undefined"==typeof list1) // if not defined yet, take the option list rendered by the template
{ var list1={{listrandom|tojson}};}
if("undefined"==typeof question) // if not defined yet, take the question rendered by the template
{ var question={{Q|tojson}};}
console.log("come to myFunction()");
console.log("The question is "+question)
document.getElementById("q").innerHTML=question+"()";
document.getElementById("c1").innerHTML=list1[0];
document.getElementById("c2").innerHTML=list1[1];
document.getElementById("c3").innerHTML=list1[2];
document.getElementById("c4").innerHTML=list1[3];
}
window.onload=myFunction;

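The next piece of code:

onclick="doPost(0)" on a label means that clicking the label calls doPost() with the argument 0;
doPost(1), doPost(2), and doPost(3) work the same way with the arguments 1, 2, and 3.
Inside doPost(idnumber):

var answer=list1[idnumber] picks the option the user chose.
correct is the right answer rendered by the server: correct={{A|tojson}}
Finally the user's choice, answer, is compared with correct.

If they match, the user chose the right answer and the page moves on to the next question;

otherwise it goes straight to the statistics page.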
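
The branching in doPost() comes down to choosing the next URL. As a minimal sketch, the same decision can be written in Python; next_url is a hypothetical helper that is not part of the original code, and the base address is the one hard-coded in the page. urlencode() is used so that a token containing special characters would still produce a valid query string.

```python
from urllib.parse import urlencode


def next_url(choice, correct, token, count, base="http://127.0.0.1"):
    """Return the URL to visit after the user picks an option."""
    if choice == correct:
        # Right answer: move on to the next question with count + 1.
        return f"{base}/question?" + urlencode({"token": token, "count": count + 1})
    # Wrong answer: go to the statistics page with the current count.
    return f"{base}/statistic?" + urlencode({"token": token, "count": count})


if __name__ == "__main__":
    print(next_url("73", "73", "-42", 0))
    print(next_url("72", "73", "-42", 0))
```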

<form action="">
<input type="radio" name="sex" value="0" id="a1"> <label for="a1" id="c1" onclick="doPost(0)">Huey</label><br>
<input type="radio" name="sex" value="1" id="a2"> <label for="a2" id="c2" onclick="doPost(1)">Huey</label><br>
<input type="radio" name="sex" value="2" id="a3"> <label for="a3" id="c3" onclick="doPost(2)">Huey</label><br>
<input type="radio" name="sex" value="3" id="a4"> <label for="a4" id="c4" onclick="doPost(3)">Huey</label><br>
</form>
function doPost(idnumber){
console.log("come to collect info first");
var count={{count|tojson}};
var list1={{listrandom|tojson}};
var correct={{A|tojson}}
var tokenid={{token|tojson}}
var answer=list1[idnumber];
console.log("come to doPost() function");
console.log("The choice is "+answer);
console.log("The correct answer is "+ correct);
console.log("Your token id is " + tokenid);
console.log("The count now is "+ count);
if (answer==correct)// correct answer: go to url2, passing token and count+1
{
var count=count+1;
url="http://127.0.0.1/question";
url2=url+"?token="+tokenid+"&count="+count;
$.ajax({
type:"GET",
url:url,
data:{"count":count,"token":tokenid},
async:false,
contentType:"application/json",
success:function(data)
{
window.location.href=url2;
}
})
}
if(answer !=correct) // wrong answer: go to the statistics page, passing token and count
{
var jsondoc={"count":count};
url="http://127.0.0.1/question";
url2="http://127.0.0.1/statistic"+"?token="+tokenid+"&count="+count;
$.ajax({
type:"POST",
url:url,
async:false,
contentType:"application/json",
data:JSON.stringify(jsondoc),
success:function()
{
window.location.href=url2;
}
});
}
}

index57.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>57-index</title>
</head>
<body>

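<p>There are 10 questions, answered one at a time. A correct answer takes you to the next question, until the quiz ends.</p>
<p>A wrong answer ends the quiz immediately, and the number of correctly answered questions is reported.</p>

<form method="post">
    <input type="text" name="name" placeholder="Enter your name">
    <input type="submit" name="submit" value="Start quiz">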
</form>


</body>
</html>

finish.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>finish.html</title>
</head>
<body>

{{info}}
</body>
</html>

57-serversidev3.py

import copy
import flask,pymongo
from flask import Flask,jsonify,request

app=Flask(__name__)


def insert_mongo(list1):
    myclient=pymongo.MongoClient('mongodb://127.0.0.1:27017/')
    mydb=myclient["lukedb5"] # database "lukedb5"
    mycol=mydb["site"] # collection "site"
    x=mycol.insert_many(list1)
    print(x.inserted_ids)
    return "success"


def get_mongo():
    myclient=pymongo.MongoClient('mongodb://127.0.0.1:27017/')
    mydb=myclient["lukedb5"]
    mycol=mydb["site"]
    list2=[]
    for x in mycol.aggregate([{'$sample':{'size':10}}]): # 10 random documents
        del x['_id'] # drop the _id field (an ObjectId is not JSON-serializable)
        list2.append(x)
    print(list2)
    return list2,"success"

def getall_mongo():
    myclient=pymongo.MongoClient('mongodb://127.0.0.1:27017/')
    mydb=myclient["lukedb5"]
    mycol=mydb["site"]
    list3=[]
    for x in mycol.find({},{"_id":0}): # exclude _id through the projection
        list3.append(x)
    print(list3)
    return list3,"success"


def update_mongo(listold,listnew):
    myclient=pymongo.MongoClient('mongodb://127.0.0.1:27017/')
    mydb=myclient["lukedb5"]
    mycol=mydb["site"]
    for x in range(0,len(listold)): # update each old document with its new values
        myquery=listold[x]
        newvalues={"$set":listnew[x]}
        mycol.update_one(myquery,newvalues)
    return "success"

@app.route('/')
def hello_world():
    return 'Hello World!'


@app.route('/get',methods=['POST'])
def get():
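    # Query logic: fetch 10 random questions.
    # The planned split of 3 easy, 5 medium, and 2 hard questions is not implemented here;
    # get_mongo() samples 10 questions regardless of difficulty.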
    list2,info=get_mongo()
    return jsonify({"info":info,"data":list2})

@app.route('/append',methods=['GET','POST'])
def append():
    # Insert logic: add the posted questions to the question bank
    list1=request.json["list"]
    #dic1={"Q":"How old is Republic of China1","A":"73","D1":"72","D2":"74","D3":"111"}
    #list1=[dic1]
    list2=copy.deepcopy(list1)
    info=insert_mongo(list1)
    #print(info)
    #print(dic1)
    #print(list2)
    return jsonify({"info":info,"data":list2})

@app.route('/getall', methods=['POST'])
def getall():
    # Read out the whole question bank
    list3,info=getall_mongo()
    return jsonify({"whole question lib is":list3,"info":info})

@app.route('/update', methods=['POST'])
def update():
    # Update logic: modify existing questions
    listold=request.json["listold"]
    listnew=request.json["listnew"]
    info=update_mongo(listold,listnew)
    return jsonify({"info": info,"oldinfo":listold,"newinfo":listnew})


if __name__=='__main__':
    app.run(host='0.0.0.0',port=8008,debug=True)
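
The comment in get() mentions a split of 3 easy, 5 medium, and 2 hard questions, while get_mongo() samples 10 questions at random. One way to get that split, sketched here on an in-memory list, is to sample each difficulty level separately. The mapping of 'R' values to levels (1 easy, 2 medium, 3 hard) and the name pick_by_difficulty are assumptions for illustration; the post does not say how 'R' is encoded.

```python
import random

# Assumed mapping (not from the original post): R == 1 easy, 2 medium, 3 hard.
QUOTA = {1: 3, 2: 5, 3: 2}


def pick_by_difficulty(questions, quota=QUOTA):
    """Pick a fixed number of random questions for each difficulty level."""
    picked = []
    for level, wanted in quota.items():
        pool = [q for q in questions if q["R"] == level]
        if len(pool) < wanted:
            raise ValueError(f"need {wanted} questions with R={level}, found {len(pool)}")
        picked.extend(random.sample(pool, wanted))
    return picked


if __name__ == "__main__":
    bank = [{"Q": f"q{i}", "R": 1 + i % 3} for i in range(30)]
    chosen = pick_by_difficulty(bank)
    print(sorted(q["R"] for q in chosen))
```

The result could be returned from /get in place of the plain random sample; the client's orderlize() would still sort it by 'R'.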

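Result (screenshot):

Front-end log (screenshot):

The logic for editing the question bank should be much the same, so I won't repeat it here.

The general approach is also the same: write out the logic first, then write the code from it.

Something like this, starting with the logic:

Once the logic is worked out, the implementation follows directly.

I was a bit stuck on this problem for two days; glad to have closed it out today.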

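I. Preface

Many websites deploy anti-scraping measures, and a crawler that fetches large amounts of data or hits a site too frequently may even get its IP banned. The usual workaround is to send requests through proxy IPs and to present different browser identities. This article briefly introduces User-Agent pools and free proxy IP pools.

II. User-Agent pool

User-Agent is an HTTP request header: a string that identifies the client software, a bit like the browser's ID card. When Python's requests library sends a request, the default value has the form python-requests/<version> (for example python-requests/2.22.0), which is easy for a site to recognize. Rotating the User-Agent while crawling helps avoid triggering that kind of anti-scraping check.

Two ways to build a User-Agent pool are covered here: 1. a hand-written random function; 2. the third-party library fake-useragent.

Method 1: a hand-written random function

Write the User-Agent pool by hand and pick one entry at random.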

def get_ua():
    import random
    user_agents=[
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60',
		'Opera/8.0 (Windows NT 5.1; U; en)',
		'Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50',
		'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50',
		'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
		'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10',
		'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2 ',
		'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36',
		'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
		'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16',
		'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36',
		'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
		'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11',
		'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
		'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)',
		'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0',
		'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0) ',
    ]
    user_agent=random.choice(user_agents) # pick one entry at random
    return user_agent

# usage
get_ua()

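As for where to find these browser strings: plenty of lists are available online and can simply be copied in.

Using the random User-Agent pool in a real request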

import requests

def get_page(url):
    ua=get_ua()
    headers={'User-Agent': ua}
    response=requests.get(url=url, headers=headers)
    print(response.text)

if __name__=='__main__':
    get_page('https://www.baidu.com')

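Method 2: generating them with the fake-useragent library

Note: when this was written, the latest release was 0.1.11 from 2018, so the generated browser versions were fairly old. A site that checks the browser version number (or its range) may detect them. Newer releases have been published since, so check the current version before relying on this note.

Install:

pip install fake-useragent

Use the library to generate a User-Agent for a specific browser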

from fake_useragent import UserAgent
ua=UserAgent()

ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)'
ua['Internet Explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25

# and the best one, random via real world browser usage statistic
ua.random

Official documentation: https://fake-useragent.readthedocs.io/en/latest/

A practical example:

from fake_useragent import UserAgent
import requests

ua=UserAgent()
# URL to request
url="http://www.baidu.com"
# request headers
headers={"User-Agent":ua.random}
# send the request
response=requests.get(url=url,headers=headers)
# response body
print(response.text)
# response status code
print(response.status_code)
# response headers
print(response.headers)
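
The two methods can be combined so that the crawler keeps working when fake-useragent is missing or cannot load its data. This is a sketch of that idea, not something taken from the library's documentation: pick_user_agent is a hypothetical helper, and the fallback list reuses two strings from Method 1.

```python
import random

# Hand-written fallback pool (two entries copied from Method 1).
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) "
    "Chrome/23.0.1271.64 Safari/537.11",
]


def pick_user_agent():
    """Return a random User-Agent, preferring fake-useragent when it is usable."""
    try:
        from fake_useragent import UserAgent  # optional third-party dependency
        return UserAgent().random
    except Exception:
        # Library not installed or its data could not be loaded: use the local pool.
        return random.choice(FALLBACK_USER_AGENTS)


headers = {"User-Agent": pick_user_agent()}
print(headers)
```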

III. IP proxy pool

Two open-source IP proxy pools are recommended here:

https://github.com/Python3WebSpider/ProxyPool

https://github.com/jhao104/proxy_pool

The second one is used for the test below; it has more users and is actively maintained.

1: Download and start

Download on Linux

git clone git@github.com:jhao104/proxy_pool.git
# or
git clone https://github.com/jhao104/proxy_pool.git

Start it with docker compose

# enter the directory
cd proxy_pool/
# start the proxy pool
docker compose up -d

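Once the web service is running, the default configuration exposes an API at http://127.0.0.1:5010:

API:

/get : GET, returns one random proxy; optional parameter ?type=https keeps only proxies that support https
/pop : GET, returns one proxy and removes it from the pool; optional parameter ?type=https keeps only proxies that support https
/all : GET, returns all proxies; optional parameter ?type=https keeps only proxies that support https
/count : GET, returns the number of proxies
/delete : GET, deletes a proxy; parameter ?proxy=host:port

Test it from a browser; the host IP in my setup is 192.168.152.100.

2: Using it in a crawler

To use the pool from crawler code, wrap the API in functions and call them directly, for example: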

import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def getHtml():
    # ....
    retry_count=5
    proxy=get_proxy().get("proxy")
    while retry_count > 0:
        try:
            html=requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            # request through the proxy
            return html
        except Exception:
            retry_count -=1
    # all retries failed: remove the proxy from the pool
    delete_proxy(proxy)
    return None
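
The getHtml() loop above retries the same proxy up to five times and only deletes it at the end. A variation, sketched below, takes a fresh proxy for every attempt and deletes each one that fails. The three callables are passed in so the logic can be exercised without a network; fetch_with_rotation and the fake helpers in the demo are hypothetical names, and the demo addresses are made up.

```python
def fetch_with_rotation(url, get_proxy, delete_proxy, fetch, max_tries=5):
    """Try up to max_tries proxies; drop each proxy that fails."""
    for _ in range(max_tries):
        proxy = get_proxy()
        if not proxy:
            break  # the pool is empty
        try:
            return fetch(url, proxy)
        except Exception:
            delete_proxy(proxy)  # remove the bad proxy, then try another one
    return None


if __name__ == "__main__":
    pool = ["10.0.0.1:8080", "10.0.0.2:8080"]  # made-up addresses for the demo

    def fake_get_proxy():
        return pool[0] if pool else None

    def fake_delete_proxy(proxy):
        pool.remove(proxy)

    def fake_fetch(url, proxy):
        if proxy.startswith("10.0.0.1"):
            raise ConnectionError("proxy down")
        return f"fetched {url} via {proxy}"

    print(fetch_with_rotation("http://www.example.com",
                              fake_get_proxy, fake_delete_proxy, fake_fetch))
```

With the wrappers defined earlier, get_proxy could be lambda: get_proxy().get("proxy") and fetch could call requests.get(url, proxies={"http": "http://{}".format(proxy)}, timeout=10); the timeout value is an assumption.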

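For more usage details, see the official documentation: https://proxy-pool.readthedocs.io/zh/latest/

That covers basic usage. Everything above is free, and the quality of free proxies is limited; paid proxies are an option when that is not enough.

Paid proxy recommendation: luminati-china. BrightData (formerly luminati) is widely regarded as a leader in the proxy market; it reportedly covers 72 million IPs worldwide, most of them real residential IPs, with a high success rate.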

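"VBA信息获取与处理" (VBA Information Retrieval and Processing) is the sixth tutorial series I have released, now in its first revision. It is positioned at the most advanced level, for readers who have finished the beginner and intermediate series. Topics include: retrieving information across applications, using random information, sending e-mail, scraping internet data with VBA, delays in VBA, clipboard use, extensions of the Split function, exchanging worksheet data with other applications, using the FSO object, retrieving worksheet and folder information, retrieving shape information, and writing custom worksheet information functions. The program files have been tested on both 32-bit and 64-bit Office. The material is fairly abstract and rewards closer study.

The tutorial has two volumes and eighty-four lectures. Today's content is the first half of Section 2 of Topic 8, "VBA and HTML documents": HTML document elements.

Section 2: Analysis of HTML document elements (1)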

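Hello everyone. This lecture begins the discussion of HTML. To get to know this special language, we first look at what its various elements mean. As mentioned in the previous section, tags are used to describe a web page. The browser reads the HTML document, recognizes the tags, and renders the text as a page according to them. Most tags come in pairs.

Everything from a start tag to its matching end tag, including the text in between, is called an element. The format is:

<start tag: the element name>element content<end tag: / + element name>

Tags can have attributes, which give additional information about the element. A start tag with an attribute looks like this:

<element-name attribute-name="attribute-value">

An element can also contain other elements; that is, the content of an element is sometimes a child element.

Elements are a fairly abstract idea, so it helps to think of one as an object. Take a workbook object: each workbook has its own name (an attribute), each workbook contains worksheets (child elements), each sheet has a name (an attribute), and the data entered in each sheet differs (the element content).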
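
The terms start tag, attribute, content, and end tag can be made concrete by letting a parser report them. The tutorial itself works in VBA; the sketch below uses Python's standard html.parser module purely as an illustration, and the sample markup and the class name ElementLister are made up.

```python
from html.parser import HTMLParser


class ElementLister(HTMLParser):
    """Record start tags (with attributes), text content, and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("text", text, {}))


sample = '<p class="note">Hello <b>VBA</b></p>'
lister = ElementLister()
lister.feed(sample)
lister.close()
for event in lister.events:
    print(event)
```

The p element here has one attribute (class) and contains both text and a child element, b.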

1 The overall skeleton of an HTML document

In the previous lecture we used a minimal example to illustrate an HTML document:

<html>

<body>

<h1>Learning VBA</h1>

<p>To get a better grasp of the individual topics in VBA, you can refer to my first tutorial series: VBA代码解决方案</p>

</body>

</html>

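We now revise that example as follows:

<html>

<head>

<title>

VBA应用提高篇

</title>

</head>

<body>

<h1>Learning VBA</h1>

<p>To get a better grasp of the individual topics in VBA, you can first refer to my first tutorial series: VBA代码解决方案</p>

</body>

</html>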

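Type the content above into Notepad and save it as an .html file: HTML基础学习-1.html

Then double-click the file to see how the browser renders it:

Next, look at the page source:

The skeleton above shows that a typical page has an html element that normally contains two elements: head and body. The head element only describes the document (metadata such as the title) and is not displayed as page content. The body element is what actually presents the document, so every element that should appear on the page must be written inside body.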

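2 Commonly used HTML elements

1) HTML headings <h1> - <h6>

Headings are defined with the tags <h1> to <h6>. <h1> defines the largest heading and <h6> the smallest. The browser automatically adds blank space before and after a heading. Headings matter: use HTML heading tags only for headings, not merely to make text bold or large. Readers skim a page by its headings, so headings should reflect the document structure. Use h1 for the main (most important) heading, then h2 for the next level, then h3, and so on. The file HTML基础学习-1.html above already uses a heading element.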
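
Because headings carry the document structure, a program can rebuild an outline of the page from them. The sketch below does this with Python's standard html.parser module; it is an illustration added here rather than part of the VBA tutorial, and the sample page and the class name OutlineParser are made up.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


page = ("<html><body><h1>Learning VBA</h1><p>intro</p>"
        "<h2>Objects</h2><hr /><h2>Loops</h2></body></html>")
parser = OutlineParser()
parser.feed(page)
parser.close()
for level, title in parser.outline:
    print("  " * (level - 1) + title)
```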

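2) HTML horizontal rule <hr />

The <hr /> tag creates a horizontal line in an HTML page. It can be used to separate content.

[To be continued]

Reference files for this section: HTML基础学习-1.html; HTML基础学习-2.html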
