Preface

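I wrote this as a homework assignment for a senior classmate and am open-sourcing it while I'm at it. Ctrip (携程网) turned out to be quite interesting: it has a rather amusing anti-scraping mechanism.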

The anti-scraping mechanism

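On Ctrip, the reviews for some attractions can be viewed directly on the page.

For other attractions, however, the reviews are not displayed.

Yet the search summary page shows that these attractions do have reviews.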

So we capture the network traffic and call Ctrip's review-query API directly to try to fetch the reviews.
Here is the code.

Code

import requests
import json
import pandas as pd


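# Each entry: [attraction name, attraction ID (poiId), number of review pages (10 reviews per page)]
# Named `spots` rather than `list` so the built-in list type is not shadowed.
spots = [['千佛山', '76530', 300]]

for n in spots: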
    name_list, content_list, time_list = [], [], []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62',
        'content-type': 'application/json',
        'cookie': '_RSG=PRqnUq6fHoEm9p.VCWpwOA; _RDG=2883d111b8ea022cfc36462eb4061cfc57; _RGUID=ec91a46d-1ec7-46d9-a63a-ce031ffb3270; MKT_CKID=1630228618330.jb8pq.e7d9; _ga=GA1.2.1011814713.1630228618; GUID=09031119317039003671; ibulanguage=CN; ibulocale=zh_cn; cookiePricesDisplayed=CNY; _RF1=223.99.163.165; Session=smartlinkcode=U130026&smartlinklanguage=zh&SmartLinkKeyWord=&SmartLinkQuary=&SmartLinkHost=; Union=AllianceID=4897&SID=130026&OUID=&createtime=1640689003&Expires=1641293802506; MKT_CKID_LMT=1640689002563; MKT_Pagesource=PC; _bfaStatusPVSend=1; nfes_isSupportWebP=1; _bfa=1.1630228615045.2k93h1.1.1630228615045.1640688999635.2.8; _bfs=1.5; _bfi=p1%3D290510%26p2%3D290510%26v1%3D8%26v2%3D7; _bfaStatus=send; _jzqco=%7C%7C%7C%7C1640689002878%7C1.776089812.1630228705099.1640689096277.1640689138839.1640689096277.1640689138839.undefined.0.0.4.4; __zpspc=9.2.1640689002.1640689138.3%232%7Cwww.baidu.com%7C%7C%7C%7C%23',
    }
    params = (
        ('_fxpcqlniredt', '09031119317039003671'),
        ('x-traceID', '09031119317039003671-1640689150743-4414977'),
    )
    for page in range(1, n[2] + 1):  # pages start at 1; include the last page
        print(page)
        payload = {
            "arg": {"channelType": 2, "collapseType": 0, "commentTagId": 0,
                    "pageIndex": page,     # page number
                    "pageSize": 10,
                    "poiId": int(n[1]),    # attraction ID
                    "sourceType": 1, "sortType": 3, "starType": 0},
            "head": {"cid": "09031119317039003671", "ctok": "", "cver": "1.0", "lang": "01",
                     "sid": "8888", "syscode": "09", "auth": "", "xsid": "", "extension": []},
        }
        response = requests.post('https://m.ctrip.com/restapi/soa2/13444/json/getCommentCollapseList',
                                 headers=headers, params=params, data=json.dumps(payload))
        items = json.loads(response.text)['result']['items']
        for item in items:
            # userInfo is null for some reviews, so skip those entries
            if item['userInfo'] is not None:
                name_list.append(item['userInfo']['userNick'])
                content_list.append(item['content'])
                print(item['content'])
                time_list.append(item['publishTypeTag'])
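    # Columns: poster nickname, post content, post time
    news_df = pd.DataFrame({"发帖昵称": name_list, "发帖内容": content_list, "发帖时间": time_list})
    # The output directory 携程网/ must already exist, otherwise to_csv raises an error
    news_df.to_csv('携程网/' + n[0] + '.csv', encoding='utf-8')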
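
The request body used in the script can also be assembled by a small helper, which makes it easier to reuse for other attractions. The following is only a minimal sketch and not part of the original script: the names `build_payload` and `page_count` are hypothetical, and the field values simply mirror the captured request used in the script. `page_count` assumes 10 reviews per page, as in the script.

```python
import json
import math


def build_payload(poi_id, page_index, page_size=10):
    """Return the JSON request body for one page of reviews (hypothetical helper)."""
    body = {
        "arg": {
            "channelType": 2, "collapseType": 0, "commentTagId": 0,
            "pageIndex": page_index,   # page number
            "pageSize": page_size,
            "poiId": int(poi_id),      # attraction ID
            "sourceType": 1, "sortType": 3, "starType": 0,
        },
        "head": {
            "cid": "09031119317039003671", "ctok": "", "cver": "1.0", "lang": "01",
            "sid": "8888", "syscode": "09", "auth": "", "xsid": "", "extension": [],
        },
    }
    return json.dumps(body)


def page_count(total_reviews, page_size=10):
    """Number of pages needed to cover total_reviews reviews."""
    return math.ceil(total_reviews / page_size)
```

For example, `build_payload('76530', 1)` gives the body for the first page of 千佛山, which can be passed as the `data` argument of `requests.post`. Building the body from a dictionary avoids quoting mistakes that are easy to make when concatenating JSON by hand.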

P.S.

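By calling Ctrip's review API directly, all of the reviews can be retrieved without any trouble.
So this anti-scraping measure is just for show.
The list at the top of the code takes entries of the form [attraction name, attraction ID, number of review pages (10 reviews per page)].
To find the attraction ID, open the attraction's page, view the page source, press Ctrl+F, and search for poiid.
Most absurd of all, some reviews come back without user information (the JSON value is null), so the JSON data has to be filtered.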
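
The null filtering mentioned above can be isolated into a small function. This is a sketch under the assumption that each item carries the same `userInfo`, `content`, and `publishTypeTag` fields the script reads; `extract_rows` is a hypothetical name, and the sample items are made up for illustration rather than taken from a real response.

```python
def extract_rows(items):
    """Keep only reviews whose userInfo is present.

    Returns a list of (nickname, content, time) tuples.
    """
    rows = []
    for item in items or []:  # tolerate a null item list
        user = item.get('userInfo')
        if user is None:
            continue  # userInfo is null for some reviews, so skip them
        rows.append((user.get('userNick'), item.get('content'), item.get('publishTypeTag')))
    return rows


# Made-up sample data, shaped like the fields the script reads
sample = [
    {'userInfo': {'userNick': 'alice'}, 'content': 'Nice view', 'publishTypeTag': '2021-12-01'},
    {'userInfo': None, 'content': 'Crowded', 'publishTypeTag': '2021-12-02'},
]
print(extract_rows(sample))  # [('alice', 'Nice view', '2021-12-01')]
```

Keeping the filter in one place means a review with missing user information is dropped as a whole, so the nickname, content, and time columns always stay the same length.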

Last modified: April 19, 2022