700字范文,内容丰富有趣,生活中的好帮手!
700字范文 > python 模拟浏览器登录获取cookie_使用cookielib模拟浏览器在python中获取url

python 模拟浏览器登录获取cookie_使用cookielib模拟浏览器在python中获取url

时间:2024-03-29 05:25:58

相关推荐

python 模拟浏览器登录获取cookie_使用cookielib模拟浏览器在python中获取url

我正在使用cookielib,有时在浏览器中打开一个url,通过浏览器发出许多其他请求来下载许多其他文件。我可以使用cookielib或任何其他python库复制相同的行为吗?在

我必须从python脚本发出超过1个GET请求。当我打开网页时,通过分析网络请求,我得到了浏览器发出的所有请求的url。在

我在看是否有什么方法我可以只做一个请求,它像浏览器一样自己获取所有相关的请求。在

我对js或css不是很感兴趣,而是主要的html。在

我尝试了下面的代码,但它无法下载整个页面cj = cookielib.CookieJar()

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

response = opener.open('/psp/hrsappl/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1')

html = response.read()

但是当我按顺序获取另外3个GET url时,它能够在第三个GET响应中给我所需的html。我通过检查浏览器的network标签得到了这些url

^{pr2}$

下面是我正在进行的其他抓取的完整代码response = opener.open('/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PT_NAV.ISCRIPT1.FieldFormula.IScript_UniHeader_Frame?c=NNTCgkqGs001AcPaisqGbYpTu%2fbGx4jx&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes')

response.read()

response = opener.open('/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PTPPB.ISCRIPT2.FieldFormula.IScript_TemplatePageletBuilder?PTPPB_PAGELET_ID=KC_LNAV_APPLICANT&target=KCNV_KC_LNAV_APPLICANT_TMPL&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes&PortalTargetFrame=TargetContent')

response.read()

response = opener.open('/psc/hrshrm/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalCRefLabel=Careers&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&NoCrumbs=yes&PortalKeyStruct=yes')

required_html = response.read()

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。