Java自学者论坛

Fixing garbled text when requesting web pages with Python's requests package

    Posted on 2021-4-12 10:22:50

    When a page is fetched with requests, the returned content is sometimes garbled, as with the following code:

    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
    }

    def get_all(url, key):
        # Query parameters: the search keyword and the requested encoding
        params = {
            'keyword': key,
            'enc': 'utf-8'
        }
        response = requests.get(url=url, params=params, headers=headers)

        # response.text is decoded with response.encoding, which may be wrong
        with open('jd.html', 'w', encoding='utf-8') as f:
            f.write(response.text)


    if __name__ == '__main__':
        key = input('输入搜索内容:')  # prompt text: "Enter search term:"
        url = 'https://search.jd.com/Search?'
        get_all(url, key)

    Part of the returned content:

    <div class="p-name p-name-type-2">
                <a target="_blank" title="极地传说短袖T恤男夏季韩版潮流短袖男士半袖t恤圆领休闲修身大码五分袖潮牌青年学生t恤衣服男装 430黄色 M" href="//item.jd.com/51029271063.html" onclick="searchlog(1,51029271063,8,1,'','flagsClk=1077936264')">
                    <em>极地传说短袖T恤<font class="skcolor_ljg">男</font>夏季韩版潮流短袖男士半袖t恤圆领休闲修身大码五分袖潮牌青年学生t恤衣服<font class="skcolor_ljg">男装</font> 430黄色 M</em>
                    <i class="promo-words" id="J_AD_51029271063"></i>
                </a>
            </div>

    Solution and reasoning:

    Code:

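    import requests

    # headers is the same dict defined in the first listing

    def get_all(url, key):
        params = {
            'keyword': key,
            'enc': 'utf-8'
        }
        response = requests.get(url=url, params=params, headers=headers)
        # Print the encoding that requests derived from the response headers
        print(response.encoding)
        # response.apparent_encoding is the encoding inferred from the content; here it is utf-8
        print(response.apparent_encoding)
        # Transcode: recover the raw bytes, then decode them with the inferred encoding
        content = response.text.encode(response.encoding).decode(response.apparent_encoding)
        print(content)
        with open('jd.html', 'w', encoding='utf-8') as f:
            f.write(content)


    if __name__ == '__main__':
        key = input('输入搜索内容:')  # prompt text: "Enter search term:"
        url = 'https://search.jd.com/Search?'
        get_all(url, key)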
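
The transcoding line works because ISO-8859-1 maps every byte value to exactly one character, so encoding the garbled string back with it recovers the original bytes, which can then be decoded as UTF-8. A minimal offline sketch of that round trip follows; the helper name `repair_mojibake` and the sample string are illustrative assumptions, not part of the original script.

```python
def repair_mojibake(garbled: str, assumed: str = "ISO-8859-1", actual: str = "utf-8") -> str:
    """Undo a wrong decode: turn the text back into bytes, then decode correctly."""
    return garbled.encode(assumed).decode(actual)


original = "男装 - 商品搜索 - 京东"
raw_bytes = original.encode("utf-8")       # what the server actually sends
garbled = raw_bytes.decode("ISO-8859-1")   # what a wrong decode produces
repaired = repair_mojibake(garbled)

print(repaired)
```

The round trip is only lossless when the wrongly assumed encoding can represent every byte, which holds for ISO-8859-1 but not for every codec.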

    Console output of jd.py (partial):

    E:\anaconda\python.exe E:/练习/最后阶段/0808/jd.py
    输入搜索内容:男装
    ISO-8859-1
    utf-8
    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="renderer" content="webkit">
    <meta http-equiv="Cache-Control" content="max-age=300" />
    <link rel="dns-prefetch" href="//search.jd.com" /><link rel="dns-prefetch" href="//item.jd.com" /><link rel="dns-prefetch" href="//list.jd.com" /><link rel="dns-prefetch" href="//p.3.cn" /><link rel="dns-prefetch" href="//misc.360buyimg.com" /><link rel="dns-prefetch" href="//nfa.jd.com" /><link rel="dns-prefetch" href="//d.jd.com" /><link rel="dns-prefetch" href="//img12.360buyimg.com" /><link rel="dns-prefetch" href="//img13.360buyimg.com" /><link rel="dns-prefetch" href="//static.360buyimg.com" /><link rel="dns-prefetch" href="//csc.jd.com" /><link rel="dns-prefetch" href="//mercury.jd.com" /><link rel="dns-prefetch" href="//x.jd.com" /><link rel="dns-prefetch" href="//wl.jd.com" /><title>男装 - 商品搜索 - 京东</title><meta name="Keywords" content="男装,京东男装" /><meta name="description" content="在京东找到了260867件男装的类似商品,其中包含了“男装”等类型的男装的商品。" /><script>
    window.loadFa_toJson_data={query:'%E7%94%B7%E8%A3%85'};
    window.jdpts={};jdpts._st=new Date().getTime();window.pageConfig={
        closeJpg : 1,
        compatible: false,
        searchType: 0,
        jdfVersion: '2.0.0',
        floatnav: 1,
        price_pdos_off: 0,
        actName: '',
        pSource: 'search_pc',
        queryParam: {
            c1: 0,
            c2: 1342,
            c3: 0,
            brand: '',
            price: '',
            keyword: '男装',
            page: '1'
        }
    };
    window.searchUnit={
        resizeOnebox: function(g,f,j){var g=parseInt(g),i=typeof f,h=typeof j;if(!isNaN(g)){if("string"==i&&f!=""&&g>0){$("#J_oneBoxFrame_"+f).css("height",g+10);h=="function"&&j()}else{if(i=="undefined"||i=="function"){$("#virtualWareIFrame").css("height",g>0?g+10:0);i=="function"&&f()}}}},
        resizeShopbox: function(e,d){var f=0;switch(e){case 1:case 2:f=145;break;case 3:f=75;break;case 4:f=80;break;default:break}f&&$("#shopboxIFrame").css("height",f).show();typeof(d)=="string"&&(new Image().src=d)},
        coupon: {}};
    window.QUERY_KEYWORD='男装';
    window.REAL_KEYWORD='男装';
    </script>
    <link type="text/css" rel="stylesheet" href="//misc.360buyimg.com/??jdf/1.0.0/unit/ui-base/5.0.0/ui-base.css,jdf/1.0.0/unit/shortcut/5.0.0/shortcut.css,jdf/1.0.0/unit/global-header/5.0.0/global-header.css,jdf/1.0.0/unit/myjd/5.0.0/myjd.css,jdf/1.0.0/unit/nav/5.0.0/nav.css,jdf/1.0.0/unit/shoppingcart/5.0.0/shoppingcart.css,jdf/1.0.0/unit/global-footer/5.0.0/global-footer.css,jdf/1.0.0/unit/service/5.0.0/service.css,jdf/1.0.0/unit/global-header-photo/5.0.0/global-header-photo.css,jdf/1.0.0/ui/area/1.0.0/area.css" />
    <link type="text/css" rel="stylesheet" href="//misc.360buyimg.com/product/search/1.0.7/css/search.css" />
    <script type="text/javascript" src="//misc.360buyimg.com/??jdf/1.0.0/unit/base/5.0.0/base.js,jdf/lib/jquery-1.6.4.js,product/module/es5-shim.js"></script>
    <script>
    window.SEARCH = {
        cid: 1349,
        ui_ver: '1.0.7',
        c_category: 1342,
        p_category: 0,
        enable_adv: 1,
        enable_prom_adwords: 1,
        enable_prom_flag: 1,
        enable_price: 1,
        enable_stock: 2,
        enable_yyk: 0,
        lottery_code: '',
        is_correct_hash: function(e){var a=["keyword","brand_id","activity_id","coupon_batch","ecard_id"];for(var c=0,b=a.length;c<b;c++){var d=new RegExp("(^|\\?|&)"+a[c]+"=([^&]*)(\\s|&|$)");if(d.test(e)){return true}}return false},
        get_real_hash: function(){var a=window.location.hash.substr(1);if(a&&$.browser.mozilla){return location.href.substr(location.href.indexOf("#")+1)}else{return a}}
    };
    (function(a,b){var c=b.get_real_hash();if(b.is_correct_hash(c)){a.location.href=a.location.pathname+"?"+c;return false}else{if(a.self!=a.top||$.browser.msie&&$.browser.version<=9){var f=null,e=function(){var d=$(a).width();return 1210>d?$("html").removeClass():$("html").removeClass().addClass(d>=1210&&1390>d?"resp01":"resp02"),true};e();$(a).resize(function(){clearTimeout(f),f=setTimeout(e,20)})}}})(window,SEARCH);
    </script>
    </head>
    <body>
    <!--shortcut start-->
    <div id="shortcut-2014">
        <div class="w">
            <ul class="fl">
                <li id="ttbar-home"><i class="iconfont">&#xe608;</i><a href="//www.jd.com/" target="_blank">京东首页</a></li>
                <li class="dorpdown" id="ttbar-mycity"></li>
            </ul>
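
The output above shows requests reporting ISO-8859-1, the default it falls back to when the Content-Type header is a text type with no charset, while the content-based guess is utf-8. One possible alternative to re-encoding the text is to assign the detected encoding to `response.encoding` before reading `response.text`. A sketch under that assumption; the helper name `decoded_text` is hypothetical, and it accepts any object with `encoding`, `apparent_encoding`, and `text` attributes, such as a `requests.Response`:

```python
def decoded_text(response):
    """Return response.text, replacing the header fallback with the detected encoding.

    Assumption: an empty or ISO-8859-1 encoding means the server did not
    declare a charset, so the content-based guess is more trustworthy.
    """
    declared = (response.encoding or "").lower()
    if declared in ("", "iso-8859-1"):
        # apparent_encoding is a guess made from the body bytes
        response.encoding = response.apparent_encoding
    return response.text


# Hypothetical usage inside get_all():
#     response = requests.get(url=url, params=params, headers=headers)
#     content = decoded_text(response)
```

A server that genuinely declares ISO-8859-1 would also be overridden here, so this heuristic is not suitable for every site.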

    Addendum, method two: decode the raw bytes (response.content) with an explicit encoding:

    import requests
    from lxml import etree
    response = requests.get(url=url, headers=headers)
    etrees = etree.HTML(response.content.decode("gb18030"))
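
Method two hard-codes gb18030, which suits pages encoded in GBK or GB2312 but would garble a UTF-8 page. If the encoding is not known in advance, one option is to try a short list of candidates on the raw bytes. A rough sketch; the function name `decode_bytes` and the candidate order are assumptions for illustration:

```python
def decode_bytes(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Decode raw bytes with the first candidate encoding that succeeds."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD
    return raw.decode(candidates[0], errors="replace")


gbk_page = "京东男装".encode("gb18030")   # stands in for response.content
utf8_page = "京东男装".encode("utf-8")

print(decode_bytes(gbk_page) == decode_bytes(utf8_page))
```

utf-8 comes first because its decoder is strict and rejects most GBK byte sequences, whereas gb18030 accepts many byte sequences that were never GB18030 text.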

     

     

    Done.
