m `Dc@sM dZdZdZdZdddddgZd Zd eZd Zd gZd Z d dgZ dk Z dk Z dk Z dkZdkZdkZdkZdkZdkZdkZdkZydklZWndklZnXy dkZWn eZnXy dkZWn eZnXy0dkZeiiedk l!Z"dZ#Wnd Z#hdZ"nXydk$Z$dk%Z%WneZ$Z%nXy dk&Z'WnnXy dk(Z(WnnXy-dk)Z)eodk*Z)de)i+_nWn eZ)nXde,fdYZ-de-fdYZ.de-fdYZ/de-fdYZ0de,fdYZ1e i2de _3e i2de _4e i2d e _5hd!d"<d#d$<d%d&<d'd(<d)d*<d+d,<d-d.<d/d0<d1d2<d3d4<d5d6<d7d8<d9d:<d;d<<d=d><d?d@<dAdBdIe>dJ<e>dKe>dL<e>dMe>dN<e>dOe>dP<e>dQe>dR<e>dSe>dT<e>dUe>dV<e>dWe>dX<e>dYe>dZ<e>d[e>d\<e>d]e>d^<e>d_e>d`<e>dae>db<e>dce>dd<e>dee>df<e>dge>dh<e>die>dj<e>dke>dl<e>dme>dn<e>doe>dp<e>dqe>dr<e>dse>dt<e>due>dv<e>dwe>dx<e>dye>dz<e>d{e>d|<e>d}e>d~ s4.1sCopyright (c) 2002-2006, Mark Pilgrim, All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.s'Mark Pilgrim s%Jason Diamond s'John Beimler s1Fazal Majid s"Aaron Swartz s(Kevin Marks is.UniversalFeedParser/%s +http://feedparser.org/sapplication/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1t drv_libxml2tuTidytmxTidyN(sStringIO(sescapeicCsc|idd}|idd}|idd}x&|D]\}}|i||}q=W|S(Nt&s&t>s>td?d@dAdBdCdDdEdFdGdHdIdJdKdLdMdNdOdPdQdRdSdTdUdVdWdXdYdZd[d\d]d^d_d`dadbdcdddedfdgdhdidjdkdldmdndodpdqdrdsdtdudvdwdxdydzd{d|d}d~ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddf}dk}|iditttdditt|an|i tSdS(Niiiiii iiiiii i i iiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiii[i.i<i(i+i!i&iiiiiiiiii]i$i*i)i;i^i-i/iiiiiiiii|i,i%i_i>i?iiiiiiiiii`i:i#i@i'i=i"iiaibicidieifigihiiiiiiiiiijikiliminioipiqiriiiiiiii~isitiuiviwixiyiziiiiiiiiiiiiiiiiiiiiiii{iAiBiCiDiEiFiGiHiIiiiiiii}iJiKiLiMiNiOiPiQiRiiiiiii\iiSiTiUiViWiXiYiZiiiiiii0i1i2i3i4i5i6i7i8i9iiiiiiRi( t_ebcdic_to_ascii_maptemaptstringt maketranstjointmaptchrtrangetst translate(RwRpRq((R t_ebcdic_to_asciis  @ii ii iiii ii& ii ii! iiii0 ii`ii9 iiRii}ii ii ii ii ii" ii ii iiii"!iiaii: iiSii~iixs&^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)cCs"tid|}ti||S(Ns\1\3(t _urifixertsubturiturlparseturljointbase(RR|((R t_urljoin;st_FeedParserMixinc BsNtZhdd<dd<dd<dd<dd<dd<dd<dd<d d<d d<d d<d d<d d<dd<dd<dd<dd<dd<dd<dd<dd<dd<d d!<d"d#<d$d%<d&d'<d(d)<d*d+<d,d-<d.d/<d0d1<d2d3<d4d5<d6d7<d8d7<d9d:<d;d<<d=d><d?d@<dAdB<dCdD<dEdF<dGdH<dIdJ<dKdL<dMdN<dOdP<dQdR<dSdT<dUdV<dWdX<dYdZ<d[d\<d]d^<d_d`<dadb<dcdd<dedf<dgdhdZ?e?Z@dZAeAZBdZCdZDdZEdZFdZGeGZHeGZIdZJeJZKeJZLdZMeMZNdZOeOZPdZQddZRdZSddZTdZUeUZVeUZWdZXeXZYeXZZdZ[e[Z\e[Z]dZ^e^Z_e^Z`dZaeaZbeaZcdZdedZedZfefZgdZhehZidZjejZkdZlelZmdZnenZoenZpdZqeqZreqZsdZtetZuetZvetZwetZxdZyeyZzeyZ{eyZ|eyZ}dZ~e~ZdZeZdZdZdZdZdZdZdZeZeZdZdZdZeZeZeZdZdZeZdZeZdZdZdZeZeZdZeZeZdZdZdZeZdZeZdZeZdZdZdZdZdZeZdZeZdZdZdZdZdZdZeZdZeZdZeZeZeZeZeZdZeZdZdZRS(NRshttp://backend.userland.com/rsss%http://blogs.law.harvard.edu/tech/rssshttp://purl.org/rss/1.0/s&http://my.netscape.com/rdf/simple/0.9/shttp://example.com/newformat#shttp://example.com/nechoshttp://purl.org/echo/suri/of/echo/namespace#shttp://purl.org/pie/shttp://purl.org/atom/ns#shttp://www.w3.org/2005/Atoms'http://purl.org/rss/1.0/modules/rss091#shttp://webns.net/mvcb/tadmins,http://purl.org/rss/1.0/modules/aggregation/tags)http://purl.org/rss/1.0/modules/annotate/tannotates!http://media.tangent.org/rss/1.0/taudios-http://backend.userland.com/blogChannelModulet blogChannelshttp://web.resource.org/cc/tccs4http://backend.userland.com/creativeCommonsRssModuletcreativeCommonss'http://purl.org/rss/1.0/modules/companytcos(http://purl.org/rss/1.0/modules/content/tcontents&http://my.theinfo.org/changed/1.0/rss/tcps http://purl.org/dc/elements/1.1/tdcshttp://purl.org/dc/terms/tdctermss&http://purl.org/rss/1.0/modules/email/temails&http://purl.org/rss/1.0/modules/event/tevs*http://rssnamespace.org/feedburner/ext/1.0t feedburnershttp://freshmeat.net/rss/fm/tfmshttp://xmlns.com/foaf/0.1/tfoafs(http://www.w3.org/2003/01/geo/wgs84_pos#tgeoshttp://postneo.com/icbm/ticbms&http://purl.org/rss/1.0/modules/image/timages*http://www.itunes.com/DTDs/PodCast-1.0.dtdtituness'http://example.com/DTDs/PodCast-1.0.dtds%http://purl.org/rss/1.0/modules/link/tlshttp://search.yahoo.com/mrsstmedias4http://madskills.com/public/xml/rss/module/pingback/tpingbacks.http://prismstandard.org/namespaces/1.2/basic/tprisms+http://www.w3.org/1999/02/22-rdf-syntax-ns#trdfs%http://www.w3.org/2000/01/rdf-schema#trdfss*http://purl.org/rss/1.0/modules/reference/trefs*http://purl.org/rss/1.0/modules/richequiv/treqvs'http://purl.org/rss/1.0/modules/search/tsearchs&http://purl.org/rss/1.0/modules/slash/tslashs)http://schemas.xmlsoap.org/soap/envelope/tsoaps.http://purl.org/rss/1.0/modules/servicestatus/tsss-http://hacks.benhammersley.com/rss/streaming/tstrs-http://purl.org/rss/1.0/modules/subscription/R{s,http://purl.org/rss/1.0/modules/syndication/tsys)http://purl.org/rss/1.0/modules/taxonomy/ttaxos*http://purl.org/rss/1.0/modules/threading/tthrs*http://purl.org/rss/1.0/modules/textinput/ttis5http://madskills.com/public/xml/rss/module/trackback/t trackbacks$http://wellformedweb.org/commentAPI/twfws%http://purl.org/rss/1.0/modules/wiki/twikishttp://www.w3.org/1999/xhtmltxhtmls$http://www.w3.org/XML/1998/namespacetxmls/http://schemas.pocketsoap.com/rss/myDescModule/tszftlinkR2t wfw_commenttwfw_commentrsstdocsR:R;tcommentstlicenseticontlogottitleR9tinfoRFR8RBRCR7s text/htmlsapplication/xhtml+xmlsutf-8cCs^totiidn|ip7x4|iiD]\}}||i|i                        cCstotiid||fng}|D]\} } || i | fq0~}g}|D]3\} } || | d"jo | i p| fqc~}t |} | i d| i dp|i}t|i||_| i d| i d}|djo d}n|djo |i}n|o"|d#jo||id tescapeiiRcRRR7tnameR:R;twidththeightt_start_(Rstype(sfeedsrsssrdf:RDF(stitleslinks descriptionsname(stitleslinks descriptionsurlshrefswidthsheight(+RRRRRStattrsRRR)R*RR+tattrsDRURQRRRRkRRtappendRtprefixR|RfttrackNamespaceRRRZtendswithtsplitt handle_datatstrattrstfindtsuffixRRRt methodnametgetattrtmethodRatpush(RQRSRRRRRRRRR|RRR*R)((R tunknown_starttagsZ3G %        =# *   cCstotiid|n|iddjo|idd\}}nd|}}|i i ||}|o|d}nd||}yt ||}|Wn$tj o|i||nX|ioD|iido1|ii dd id  od |idRi(RRRRRSRRRRRQRRURRRRatpopRRRZRRRRRR(RQRSRRRR((R tunknown_endtags6  =#     c Cs|ipdSn|i}|djod |}nJ|d d jot|dd}n t|}t|id}|iddi |dS(Nt34t38t39t60t62tx22tx26tx27tx3ctx3es&#%s;itxiisutf-8ii( RRRRRRRRRR( RQRRRttexttinttctunichrtencodeR(RQRRR((R thandle_charrefs    cCs|ipdSntotiid|n|d jod|}nSd}y||Wnt j od|}nXt ||i d }|id d i |dS( Ns"entering handle_entityref with %s tlttgttquottamptaposs&%s;cCstdk}t|do|i|Sn|i|}|ido%|idot|dd!Snt|S(Ntname2codepoints&#t;ii( thtmlentitydefsR`R R)t entitydefsRfRRtord(R)R ((R tname2cp's   sutf-8ii(sltRRRR( RQRRRRRRRRReRRR(RQRRR((R thandle_entityrefs   icCs^|ipdSn|o)|iiddjot|}n|iddi|dS(NRWsapplication/xhtml+xmlii(RQRRRRUR RR(RQRR((R R4s   cCsdS(N((RQR((R thandle_comment<scCsdS(N((RQR((R t handle_pi@scCsdS(N((RQR((R t handle_declDscCstotiidn|i||d!djoe|iid|}|djot |i}n|i t |i|d|!d|dSn|iid|}|d SdS( Nsentering parse_declaration i s iiiRi( RRRRRQtrawdatatiRR)tlenRR (RQRR)((R tparse_declarationGs $ cCsU|i}|djo d}n/|djo d}n|djo d}n|S(NRs text/plainthtmls text/htmlRsapplication/xhtml+xml(t contentTypeR(RQR((R tmapContentTypeSs       cCs|i}||fd jo|i o d|_n|djo|i o d|_n|djo|i o d|_n|iddjod }|}n|ii |o,|i||i |<||i |i|s
sRtbase64tmodes text/htmlusutf-8s iso-8859-1RIRR]RtlinksR;R7R9t_detailR8(5RQRRRR tpiecesRRRURftdepthtpieceRRstoutputtstripWhitespacetstripR!t decodestringtbinasciitErrort Incompletetcan_be_relative_uriRRReRt html_typestcan_contain_relative_urist_resolveRelativeURIsRRtcan_contain_dangerous_markupt _sanitizeHTMLRWtunicodeRRRRtcp1252RRR0R_tcopytdeepcopyRRRRRt _getContexttcontext( RQRR)RRRRR'R:R R%R&R(((R R{s ,=   ! ( (#)F    !    *       cCs|id7_thd|i|id|<d|i<d|i<|_ |i ||i |i d<|i ||dS(NiRWRRR!(RQRR,RRRUtdefaultContentTypeRRRt _isBase64RRSR (RQRSRR;R ((R t pushContentsEcCs/|i|}|id8_|ii|S(Ni(RQRRSR]RRtclear(RQRSR]((R t popContents cCs_|id}|djo?|| }||d}|ii||}|d|}n|S(NRii(RRtcolonposRRRQRRU(RQRR@RR((R t_mapToStandardPrefixs  cCs|i|i|S(N(RRURQRAR(RQRR((R t _getAttribute scCs|idddjodSn|ididodSn|ididodSn|idid odSndS( NR"RR!iRWstext/is+xmls/xml(RRURQRRfR(RQRR((R R< scCs|id|id|idd}|oPy |d=Wntj onXy |d=Wntj onX||d|i|}|ido|i|d|d(RQ((R t _end_sourceLs cCsQ|id|dd|id}|o||ids>s't's"Ru(Rstcompilet IGNORECASER{RRQR5RRRWRR0R1R.tclose(RQR((R R.s!#cCs~g}|D]\}}||i|fq ~}g}|D]3\}}|||djo |ip|fq>~}|S(NRRW(srelstype(RRRR)R*R(RQRRRR)R*((R tnormalize_attrss3GcCstotiid|ng}xb|D]Z\}}t |t djot ||i }n|i t ||i |fq,Wdig}|D]\}}|d||fq~i|i }||ijo|ii dtn|ii dtdS(Ns-_BaseHTMLProcessor, unknown_starttag, tag=%s uu %s="%s"s<%(tag)s%(strattrs)s />s<%(tag)s%(strattrs)s>(RRRRRStuattrsRRNR]RWR5RQRRRsRRRRR4R%tlocals(RQRSRRRRR;R]RN((R Rs #FcCs/||ijo|iidtndS(Ns (RSRQR4R%RR<(RQRS((R RscCs|iidtdS(Ns &#%(ref)s;(RQR%RR<(RQR((R RscCscdk}t|d p|ii|o|iidtn|iidtdS(NR s &%(ref)s;s &%(ref)s( R R`R RZRRQR%RR<(RQRR ((R Rs $cCs3totiid|n|ii|dS(Ns)_BaseHTMLProcessor, handle_text, text=%s (RRRRRRQR%R(RQR((R RscCs|iidtdS(Ns(RQR%RR<(RQR((R RscCs|iidtdS(Ns (RQR%RR<(RQR((R RscCs|iidtdS(Ns (RQR%RR<(RQR((R Rss-zA-Z][-_.a-zA-Z0-9:]*\s*cCs|i}t|}||jodSn|i||}|oQ|i}|i }|t||jodSn|i |i fSn|i|dSdS(Ni(Ni(Ni(Ni(RQRRtnRRkt_new_declname_matchtmRuRwR*RRtendR(RQRt declstartposRR?R=RwR((R t _scan_name's      cCs1dig}|iD]}|t|q~S(s(Return processed HTML as a single stringRN(RsRRRQR%tpR(RQRRRC((R R(8s(RRR4RR2R5R.R:RRRRRRRRRsR7R3R>RBR((((R R$s"-            t_LooseFeedParsercBs#tZdZdZdZRS(NcCs*tii|ti||||dS(N(R0R1RRQRRRR(RQRRR((R R=scCsm|idd}|idd}|idd}|idd}|idd}|idd}|id d }|id d }|id d }|idd }|idd}|idd}|iido~|iiddid o^|idd}|idd}|id d}|id d}|idd}n|S(Ns<s<s<s<s>s>s>s>s&s&s&s"s"s"s's's'RWRRRRRR6(RRRQRRZRUR(RQRR((R RAs&3cCs,dig}|D]}|d|q~S(NRs %s="%s"(RsRRRR(RQRRRR((R RVs(RRRRR(((R RD<s  t_RelativeURIResolvercBsttZd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8gZdZdZdZRS(9NtaR;tapplettcodebaseR%t blockquotetcitetbodyt backgroundtdeltformtactionR)tlongdescRtiframetheadtprofileR+tusemapR,tinsRtobjecttclassidRtqtscriptcCsti||||_dS(N(R$RRQRR(RQRR((R RtscCst|i|S(N(RRQRR|(RQR|((R RxscCsy|i|}g}|D]?\}}||||f|ijo|i|p|fq~}t i |||dS(N( RQR:RRRRNR]RSt relative_urisRR$R(RQRSRR]RRRN((R R{sS(RFshref(RGRH(sareashref(RIRJ(RKRL(RMRJ(RNsaction(sframeRP(sframessrc(RQRP(RQssrc(sheadRS(simgRP(simgssrc(simgRT(sinputssrc(sinputRT(RURJ(slinkshref(sobjectRW(sobjectRH(sobjectsdata(sobjectRT(RXRJ(RYssrc(RRRZRRR(((R REYsQ  cCsAtotiidnt||}|i||i S(Nsentering _resolveRelativeURIs ( RRRRREtbaseURIRRCR.t htmlSourceR((R\R[RRC((R R2s  t_HTMLSanitizercHBstZddddddddd d d d d ddddddddddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDdEdFdGgGZddHdIdJdKdLdMdNdOdPdQdRdSdTdUd dVdWdXdYdZd[d\d]dd^d_d`dadbdcdddedfdgdhd(didjdkdldmdndodpdqdrdsdtdudvdwdxdydzd{d|d}d7d~ddddddddddddgHZddgZdZdZdZdZdZ dZ RS(NRFtabbrtacronymtaddressR%tbtbigRIR'tbuttontcaptiontcenterRJtcodeR(tcolgrouptddRMtdfntdirtdivtdltdttemtfieldsettfontRNth1th2th3th4th5th6R*RR+R,RUtkbdRtlegendtliRttmenutoltoptgrouptoptionRCtpreRXRwtsamptselecttsmalltspantstriketstrongR{tsupttablettbodyttdttextareattfoottthttheadttrttttutultvartacceptsaccept-charsett accesskeyROtaligntalttaxistbordert cellpaddingt cellspacingR tcharofftcharsettcheckedtclassR>tcolstcolspantcolortcompacttcoordstdatetimetdisabledtenctypetforR)theadersRR;threflangthspaceR2tismapRRPt maxlengthRRtmultipleRtnohreftnoshadetnowraptprompttreadonlyRtrevtrowstrowspantrulestscopetselectedtshapetsizeRtstartR9ttabindexttargetRRWRTtvalignR]tvspaceRsxml:langRYRGcCsti|d|_dS(Ni(R$R2RQtunacceptablestack(RQ((R R2s cCs||ijo+||ijo|id7_ndSn|i|}g}|D]-\}}||i jo|||fqUqU~}t i |||dS(Ni( RSRQtacceptable_elementst"unacceptable_elements_with_end_tagRR:RRRRNR]tacceptable_attributesR$R(RQRSRR]RRRN((R RsAcCsO||ijo+||ijo|id8_ndSnti||dS(Ni(RSRQRRRR$R(RQRS((R Rs cCsdS(N((RQR((R RscCsdS(N((RQR((R RscCs"|ipti||ndS(N(RQRR$RR(RQR((R Rs ( RRRRRR2RRRRR(((R R]s     c st|}|i||i}toxd}xrt D]j}yZ|djodk l d}Pn,|djodkld}PnWq9q9Xq9W|ot|tdj}|o|id}n||d d d d d d dd}|ot|d}n|idoD|idd d }|ido|idd d }qrn|ido|idd d }qqn|iidd}|S(NR(s parseStringc st||S(N(Rt_utidyRtkwargs(RR(R(R t_tidysR(sTidyc s"i||\}}}}|S(N(t_mxtidyttidyRRtnerrorst nwarningst errordata(RRRRR(R(R Rsusutf-8t output_xhtmlitnumeric_entitiestwrapit char_encodingtutf8stZdZdZdZeZeZeZdZRS(NcCsc|ddjo*|djo|i|||||Snti|||i }||_ |S(Nidii0( RfRQthttp_error_302treqtfptmsgRturllibt addinfourlt get_full_urltinfourltstatus(RQRRRfRRR((R thttp_error_defaults  cCst|iido%tii||||||}nt i |||i}t|dp ||_n|S(NtlocationR(RR+RZturllib2tHTTPRedirectHandlerRRQRRRfRRRRRR`R(RQRRRfRRR((R Rs % cCst|iido%tii||||||}nt i |||i}t|dp ||_n|S(NRR(RR+RZRRthttp_error_301RQRRRfRRRRRR`R(RQRRRfRRR((R Rs % c Csti|id}ytiiddjpttdjptti |i diddid\} } tid|dd}|i||| | |id |||}|i|SWn |i|||||SnXdS( Niis2.3.3t Authorizationt Rsrealm="([^"]*)"sWWW-Authenticateswww-authenticate(R}RRthostRRRRgR!RkR+RtusertpasswRstfindalltrealmRQt add_passwordthttp_error_auth_reqedtretrytreset_retry_countRRRfR( RQRRRfRRRRRRR((R thttp_error_401s !2 ( RRRRRthttp_error_300thttp_error_303thttp_error_307R(((R Rs  c Cst|do|Sn|djo tiSnti|dd1jo|p t}nd0}t ot i |\}} t i| \} } | oLt i| \}} |o,d|| | f}t i|i}qqnti|} | id||o| id |n|od d d d dddg}ddddddddddddg } | idd||d|d | |d!d!|d|d"|d#|d$fn|o| id%|ntoto| id&d'nGto| id&d(n,to| id&d)n| id&d*|o| id+d,|nto| id-tn| id.d/t ti!t"t#g|} g| _&z| i'| SWd0| i(Xnyt'|SWnnXt)t*|S(2s8URL, filename, or string --> stream This function lets you define parsers that take any input source (URL, pathname to local or network file, or actual data as a string) and deal with it in a uniform manner. Returned object is guaranteed to have all the basic stdio read methods (read, readline, readlines). Just .close() the object when you're done with it. If the etag argument is supplied, it will be used as the value of an If-None-Match request header. If the modified argument is supplied, it must be a tuple of 9 integers as returned by gmtime() in the standard Python time module. This MUST be in GMT (Greenwich Mean Time). The formatted date/time will be used as the value of an If-Modified-Since request header. If the agent argument is supplied, it will be used as the value of a User-Agent request header. If the referrer argument is supplied, it will be used as the value of a Referer[sic] request header. If handlers is supplied, it is a list of handlers used to build a urllib2 opener. treadt-ithttpthttpstftps %s://%s%ss User-Agents If-None-MatchtMontTuetWedtThutFritSattSuntJantFebtMartAprtMaytJuntJultAugtSeptOcttNovtDecsIf-Modified-Sinces#%s, %02d %s %04d %02d:%02d:%02d GMTiiiiiitReferersAccept-encodings gzip, deflatetgziptdeflateRRsBasic %stAcceptsA-IMR.N(shttpshttpssftp(+R`turl_file_stream_or_stringRtstdinR}tagentt USER_AGENTRktauthR!Rt splittypeturltypetrestt splithosttrealhostt splitusert user_passwdt encodestringR*RtRequesttrequestt add_headertetagR<tshort_weekdaystmonthstreferrerRtzlibt ACCEPT_HEADERtapplyt build_openerttupleRthandlerstopenert addheaderstopenR9t _StringIOR(RR!R<RR$R*RR"RRR+RR#RR((R t_open_resourcesd   !*U" cCstid|dS(sLRegister a date handler function (takes string, returns 9-tuple date in GMT)iN(t_date_handlerstinserttfunc(R2((R tregisterDateHandlersss YYYY-?MM-?DDsYYYY-MMs YYYY-?OOOs YY-?MM-?DDsYY-?OOOtYYYYs-YY-?MMs-OOOs-YYs--MM-?DDs--MMs---DDtCCs(?P\d{4})tYYs(?P\d\d)tMMs(?P[01]\d)tDDs(?P[0123]\d)tOOOs(?P[0123]\d\d)s(?P\d\d$)s$(T?(?P\d{2}):(?P\d{2})s(:(?P\d{2}))?s6(?P[+-](?P\d{2})(:(?P\d{2}))?|Z)?)?c Csd}x&tD]}||}|oPq q W|pdSn|idjodSn|i} | idd}|ot |}nd}| idd} | p | djot i d} nLt| djo,dt t i ddt | } n t | } | idd } | p | d jo%|o d } qft i d } nt | } | id d}|ph|o |}q| id dp&| iddp| iddo d }qt i d}n t |}d | ijo t | d d dd } nx>d ddddgD]'}| i|dpd| |(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{,2}):(\d{,2}):(\d{,2})cCsti|}|pdSndhd|id<d|id<d|id<d |id <d |id <d |id<dd<}totii d|nt |S(s8Parse a string according to the OnBlog 8-bit date formatNsE%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)sR;iR<iR=iR?iR@iRAitzonediffs+09:00sOnBlog date parsed as: %s ( t_korean_onblog_date_reR3RIR?Rut w3dtfdateRRRRt_parse_date_w3dtf(RIRWR?((R t_parse_date_onblogscCs)ti|}|pdSnt|id}|id}|tjo|d7}nt |}t |djod|}ndhd|id<d |id <d |id <d |<d|id<d|id<dd<}t ot iid|nt|S(s6Parse a string according to the Nate 8-bit date formatNiii it0sE%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)sR;R<iR=iR?R@iRAiRUs+09:00sNate date parsed as: %s (t_korean_nate_date_reR3RIR?RRuR?tampmt _korean_pmRRRWRRRRRX(RIRWR?R\R?((R t_parse_date_nates  vs9(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(\.\d+)?cCsti|}|pdSndhd|id<d|id<d|id<d |id <d |id <d |id<dd<}totii d|nt |S(s2Parse a string according to the MS SQL date formatNsE%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)sR;iR<iR=iR?iR@iRAiRUs+09:00sMS SQL date parsed as: %s ( t_mssql_date_reR3RIR?RuRWRRRRRX(RIRWR?((R t_parse_date_mssqlsuΙανuJanuΦεβuFebuΜάώuMaruΜαώuΑπρuApruΜάιuMayuΜαϊuΜαιuΙούνuJunuΙονuΙούλuJuluΙολuΑύγuAuguΑυγuΣεπuSepuΟκτuOctuΝοέuNovuΝοεuΔεκuDecuΚυρuSunuΔευuMonuΤριuTueuΤετuWeduΠεμuThuuΠαρuFriuΣαβuSatuL([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)cCsti|}|pdSny*t|id}t|id}Wn dSnXdhd|<d|id<d|<d |id <d |id <d |id<d|id<d|id<}t ot i i d|nt|S(s6Parse a string according to a Greek 8-bit date format.NiisP%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)stwdayR=iR<R;iR?iR@iRAiRUisGreek date parsed as: %s (t_greek_date_format_reR3RIR?t _greek_wdaysRuRat _greek_monthsR<t rfc822dateRRRRt_parse_date_rfc822(RIR?R<ReRa((R t_parse_date_greekEsujanuáru01u februáriu02umárciusu03uáprilisu04umáujusu05ujúniusu06ujúliusu07u augusztusu08u szeptemberu09uoktóberu10unovemberu11udecemberu12u?(\d{4})-([^-]+)-(\d{,2})T(\d{,2}):(\d{2})((\+|-)(\d{,2}:\d{2}))cCs'ti|}|pdSnywt|id}|id}t|djod|}n|id}t|djod|}nWn dSnXdhd|id<d |<d |<d |<d |id <d|id<}t ot i id|nt|S(s:Parse a string according to a Hungarian 8-bit date format.NiiiRZis:%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)sR;R<R=R?R@iRUisHungarian date parsed as: %s (t_hungarian_date_format_reR3RIR?t_hungarian_monthsRuR<R=RR?RWRRRRRX(RIR?RWR?R<R=((R t_parse_date_hungarianjs$[c Csd}d}d} d}d}ti|} d|}d||f}ti|}|i |}|djp|i|jodSn||||d } | ddjodSntiti| | |tiS( Nc Cst|id}|djo,dttiddt|}n|djod Sn|id}|ot|}|dd}|dd}d}x||joti |||ddddddf }ti|d}t ||}||jo/||jo||}qz|d}d }q||jo-||d jo||}qz|d}qqW|||fSn|id }d}|djo d}n9t|}|id }|ot|}nd}|||fS(NR;idiitjulianiiiiiR<R=(iii(RR?RuR;RLRMRkR<R=RktjdayRSRtabstdiff(R?RkR;R<RlRRnR=((R t__extract_datesH ,   *        cCs|pdSn|id}|pdSnt|}t|id}|id}|ot|}nd}|||fS(Nithourstminutestseconds(iii(iii(R?RuRpRRqRr(R?RpRqRr((R t__extract_times cCs|pdSn|id}|pdSn|djodSnt|id}|id}|ot|}nd}|d|d}|ddjo | Sn|S( sAReturn the Time Zone Designator as an offset in seconds from UTC.ittzdREttzdhourst tzdminutesi<RFN(R?RuRtRRpRqtoffset(R?RpRtRwRq((R t __extract_tzds"  sd(?P\d\d\d\d)(?:(?P-|)(?:(?P\d\d\d)|(?P\d\d)(?:(?P=dsep)(?P\d\d))?))?s;(?P[-+](?P\d\d)(?::?(?P\d\d))|Z)sW(?P\d\d)(?P:|)(?P\d\d)(?:(?P=tsep)(?P\d\d(?:[.,]\d+)?))?s %s(?:T%s)?i(iii(RoRsRxt __date_ret__tzd_reRsR7t__tzd_rxt __time_ret __datetime_ret __datetime_rxR3RIR?RkRutgmtRLRMRSttimezone( RIRzRsR}R?RyRoR~R|RR{Rx((R RXs  )    cCs|i}|dddjp|ditijo |d=nt|djof|d}|id}|djo || ||dg|d)n|i d d i |}nt|d jo|d 7}nti |}|otiti|Snd S(s8Parse an RFC822, RFC1123, RFC2822, or asctime-style dateiit,t.iiRFiRRis 00:00:00 GMTN(RR(RIRRRtrfc822t _daynamesRRwRRRRst parsedate_tzRQRLRMt mktime_tz(RIRRwRQR((R Rfs  /     tATiptETi tCTitMTiDtPTicCsxtD]}yg||}|pwnt|djo%totiidnt nt t ||SWqt j o7}to'tiid|it|fqqXqWdS(s6Parses a variety of date formats into a 9-tuple in GMTi s*date handler function must return 9-tuple s %s raised %s N(R0RRIt date9tupleRRRRRt ValueErrorRtRt ExceptionteRRRk(RIRRR((R Rs$   ' c Csd}d}d}d}||id\} }yj|d djot |}n.|d djo"d}t |di d}nt |djoK|d d jo:|d d!d jo&d}t |d di d}n|d d jo"d }t |d i d}njt |djoK|d djo:|d d!d jo&d }t |d d i d}n |d djo"d}t |di d}n|d djo"d}t |di d}n|d djo&d}t |ddi d}no|d djo&d}t |ddi d}n8|d djo&d}t |ddi d}nt idi|} Wn d0} nX| o8| idi}|o|d1jo |}qnd} d2}d3}| |jp | id*o.| id+od,} |p |pd}n| |jp | id-o'| id+od,} |pd.}nX| id-o|pd.}n7|o"|id o|pd/}n|pd}||||| fS(4s Get the character encoding of the XML document http_headers is a dictionary xml_data is a raw string (not Unicode) This is so much trickier than it sounds, it's not even funny. According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type is application/xml, application/*+xml, application/xml-external-parsed-entity, or application/xml-dtd, the encoding given in the charset parameter of the HTTP Content-Type takes precedence over the encoding given in the XML prefix within the document, and defaults to 'utf-8' if neither are specified. But, if the HTTP Content-Type is text/xml, text/*+xml, or text/xml-external-parsed-entity, the encoding given in the XML prefix within the document is ALWAYS IGNORED and only the encoding given in the charset parameter of the HTTP Content-Type header should be respected, and it defaults to 'us-ascii' if not specified. Furthermore, discussion on the atom-syntax mailing list with the author of RFC 3023 leads me to the conclusion that any document served with a Content-Type of text/* and no charset parameter must be treated as us-ascii. (We now do this.) And also that it must always be flagged as non-well-formed. (We now do this too.) If Content-Type is unspecified (input was local file or non-HTTP source) or unrecognized (server just got it totally wrong), then go by the encoding given in the XML prefix of the document and default to 'iso-8859-1' as per the HTTP specification (RFC 2616). Then, assuming we didn't find a character encoding in the HTTP headers (and the HTTP Content-type allowed us to look in the body), we need to sniff the first few bytes of the XML data and try to determine whether the encoding is ASCII-compatible. Section F of the XML specification shows the way here: http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info If the sniffed encoding is not ASCII-compatible, we need to make it ASCII compatible so that we can sniff further into the XML declaration to find the encoding attribute, which will tell us the true encoding. Of course, none of this guarantees that we will be able to parse the feed in the declared character encoding (assuming it was declared correctly, which many are not). CJKCodecs and iconv_codec help a lot; you should definitely install them if you can. http://cjkpython.i18n.org/ cCsD|pd}ti|\}}||iddiddfS(s takes HTTP Content-Type header and returns (content type, charset) If no charset is specified, returns (content type, '') If no content type is specified, returns ('', '') Both return parameters are guaranteed to be lowercase strings RRR6N(t content_typetcgit parse_headerRKRUR(RRK((R t_parseHTTPContentType; s Rs content-typeisLot<?sutf-16besutf-8ists<?sutf-16lestisiso-10646-ucs-2sucs-2t csunicodesiso-10646-ucs-4sucs-4tcsucs4sutf-16sutf-32tutf_16tutf_32tutf16tu16sapplication/xmlsapplication/xml-dtds&application/xml-external-parsed-entitystext/xmlstext/xml-external-parsed-entitys application/s+xmlistext/sus-asciis iso-8859-1N( siso-10646-ucs-2sucs-2Rsiso-10646-ucs-4sucs-4Rsutf-16sutf-32sutf_16Rsutf16su16(sapplication/xmlsapplication/xml-dtds&application/xml-external-parsed-entity(stext/xmlstext/xml-external-parsed-entity(Rtsniffed_xml_encodingt xml_encodingt true_encodingt http_headersRUthttp_content_typet http_encodingtxml_dataRyR5RRRsR7R3txml_encoding_matchRktgroupsRtacceptable_content_typetapplication_content_typesttext_content_typesRfRRZ( RRRRRRRRRRRR((R t_getCharacterEncoding sv. 8 8     -- cCstotiid|nt|djou|d djod|dd!djoPto5tiid|djotiidqnd}|d}nt|djou|d d jod|dd!djoPto5tiid|d jotiid qnd }|d}n$|d d joPto5tiid|djotiidq|nd}|d }n|d djoPto5tiid|djotiidqnd}|d}nb|d djoPto5tiid|djotiidq>nd}|d}nt||}totiid|nt i d}d}|i |o|i||}n|d|}|idS(sChanges an XML data stream on the fly to specify a new encoding data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already encoding is a string recognized by encodings.aliases s%entering _toUTF8, trying encoding %s iisRsstripping BOM sutf-16bestrying utf-16be instead ssutf-16lestrying utf-16le instead issutf-8strying utf-8 instead Rsutf-32bestrying utf-32be instead ssutf-32lestrying utf-32le instead s*successfully converted %s data to unicode s^<\?xml[^>]*?>s&u N(RRRRRRRR5tnewdataRsR7t declmatchtnewdeclRR{R(RRRRR((R t_toUTF8 s^8 8    cCstidti}|id|}tidti}|i|}|o |dpd}|i i do d}nd}|id|}||fS(sStrips DOCTYPE from XML document, returns (rss_version, stripped_data) rss_version may be 'rss091n' or None stripped_data is the same XML document, minus the DOCTYPE s]*?)>Rs]*?)>itnetscapeRN(RsR7t MULTILINEtentity_patternR{Rtdoctype_patternRtdoctype_resultstdoctypeRRRRk(RRRRRR((R t _stripDoctype s cCst} t| dy0d"} | i4| t5|| }d}}WqqXn| oKd#| jo>y0d#} | i4| t5|| }d}}WqiqiXn|p:d| dt?}|iAt<i=iBiCd|iD||iE|t<i=iFiG} | iIt|t|d'o|iJi4hd(d)<ny|iK| Wq;tj oe}tLo1d*kM}|iN|iOtPiQiRd+nd| d<|iSp|| d