m y MDc@sdZddgZdgZdZdZdZdZddgZd kZd k Z y6d k Z eod k Z d e i _nd Z Wnd Z d Z nXd eifdYZdefdYZddZhdd<dd<dd<dd<dds&Aaron Swartz s#Sam Ruby tBSDs0.25ituTidytmxTidyNicCsti|dS(Ntencoding(tchardettdetecttdata(R((tF/home/users/jdub/public_html/bzr/planet/devel/trunk/planet/sanitize.pyt scCsdS(N(tNone(R((RR#st_BaseHTMLProcessorc BstZddddddddd d d d d g ZeideiZeidZeidZdZ dZ dZ dZ dZ dZdZdZdZdZdZdZdZeidiZdZd ZRS(!NtareatbasetbasefonttbrtcoltframethrtimgtinputtisindextlinktmetatparamscCs?||_totiid|intii|dS(Ns(entering BaseHTMLProcessor, encoding=%s ( Rtselft_debugtsyststderrtwritetsgmllibt SGMLParsert__init__(RR((RR-s cCsg|_tii|dS(N(RtpiecesRRtreset(R((RR!2s cCsG|id}||ijod|dSnd|d|dSdS(Nits>(tmatchtgroupttagRtelements_no_end_tag(RR$R&((Rt_shorttag_replace6scCs|iid|}|iid|}|ii|i|}|io/t|tdjo|i |i}nt i i ||dS(Ns<!\1s&u( Rt _r_barebangtsubRt _r_bareampt _r_shorttagR(RttypetencodeRRtfeed(RR((RR/=s #cCs~g}|D]\}}||i|fq ~}g}|D]3\}}|||djo |ip|fq>~}|S(NtrelR-(srelstype(t_[1]tattrstktvtlower(RR2R1R3R4((Rtnormalize_attrsEs3GcCstotiid|ng}xb|D]Z\}}t |t djot ||i }n|i t ||i |fq,Wdig}|D]\}}|d||fq~i|i }||ijo|ii dtn|ii dtdS(Ns-_BaseHTMLProcessor, unknown_starttag, tag=%s uu %s="%s"s<%(tag)s%(strattrs)s />s<%(tag)s%(strattrs)s>(RRRRR&tuattrsR2tkeytvalueR-tunicodeRRtappendtjoinR1R.tstrattrsR'R tlocals(RR&R2R1R=R7R9R8((Rtunknown_starttagKs #FcCs/||ijo|iidtndS(Ns (R&RR'R R;R>(RR&((Rtunknown_endtag\scCs|iidtdS(Ns &#%(ref)s;(RR R;R>(Rtref((Rthandle_charrefbscCs|iidtdS(Ns &%(ref)s;(RR R;R>(RRA((Rthandle_entityrefgscCs3totiid|n|ii|dS(Ns)_BaseHTMLProcessor, handle_text, text=%s (RRRRttextRR R;(RRD((Rt handle_datalscCs|iidtdS(Ns(RR R;R>(RRD((Rthandle_commentsscCs|iidtdS(Ns (RR R;R>(RRD((Rt handle_pixscCs|iidtdS(Ns (RR R;R>(RRD((Rt handle_decl}ss-zA-Z][-_.a-zA-Z0-9:]*\s*cCs|i}t|}||jodSn|i||}|oQ|i}|i }|t||jodSn|i |i fSn|i|dSdS(Ni(Ni(Ni(Ni(RtrawdatatlentntiR t_new_declname_matchtmR%tststriptnameR5tendRE(RRLt declstartposRQRNRKRORI((Rt _scan_names      cCs1dig}|iD]}|t|q~S(s(Return processed HTML as a single stringtN(R<R1RR tptstr(RR1RV((Rtoutputs(t__name__t __module__R'tretcompilet IGNORECASER)R+R,RR!R(R/R6R?R@RBRCRERFRGRHR$RMRTRX(((RR %s(-              t_HTMLSanitizercGBs tZddddddddd d d d d ddddddddddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDdEdFdGgGZddHdIdJdKdLdMdNdOdPdQdRdSdTdUd dVdWdXdYdZd[d\d]dd^d_d`dadbdcdddedfdgdhd(didjdkdldmdndodpdqdrdsdtdudvdwdxdydzd{d|d}d7d~dddddddddddgGZdddgZdZdZdZdZdZ dZ dZ RS(NtatabbrtacronymtaddressR tbtbigt blockquoteRtbuttontcaptiontcentertcitetcodeRtcolgrouptddtdeltdfntdirtdivtdltdttemtfieldsettfonttformth1th2th3th4th5th6RRLRRtinstkbdtlabeltlegendtlitmaptmenutoltoptgrouptoptionRVtpretqROtsamptselecttsmalltspantstriketstrongR*tsupttablettextareattbodyttdttfoottthttheadttrttttutultvartacceptsaccept-charsett accesskeytactiontaligntalttaxistbordert cellpaddingt cellspacingtchartcharofftcharsettcheckedtclasstcleartcolstcolspantcolortcompacttcoordstdatetimetdisabledtenctypetforRtheaderstheightthrefthreflangthspacetidtismaptlangtlongdesct maxlengthtmediatmethodtmultipleRQtnohreftnoshadetnowraptprompttreadonlyR0trevtrowstrowspantrulestscopetselectedtshapetsizetsrctstarttsummaryttabindexttargetttitleR-tusemaptvalignR9tvspacetwidthtscripttapplettstylecCs#ti|g|_d|_dS(Ni(R R!Rt tag_stackt ignore_level(R((RR!s  cCs?ti||x(|ioti||iiqWdS(N(R R/RRRR@tpop(RR((RR/s cCs||ijo|id7_dSn|iodSn||ijo|i|}g}|D]-\}}||i jo|||fqcqc~}||i jo|i i |nti|||ndS(Ni(R&Rtignorable_elementsRtacceptable_elementsR6R2R1R8R9tacceptable_attributesR'RR;R R?(RR&R2R9R1R8((RR?s AcCs||ijo|id8_dSn|iodSn||ijo~||ijont}xF|io;|ii }||jo t }Pnt i ||qbW|ot i ||qndS(Ni(R&RRRRR'tFalseR$RRttoptTrueR R@(RR&RR$((RR@s     cCsdS(N((RRD((RRGscCsdS(N((RRD((RRHscCs4|ip&|idd}ti||ndS(NR"RU(RRRDtreplaceR RE(RRD((RREs ( RYRZRRRR!R/R?R@RGRHRE(((RR^s       tutf8c st|}|i||i}toxd}xrt D]j}yZ|djodk l d}Pn,|djodkld}PnWq9q9Xq9W|ot|tdj}|o|id}n||d d d d d d dd}|ot|d}n|idoD|idd d }|ido|idd d }qrn|ido|idd d }qqn|iidd}|S(NR(s parseStringc st||S(N(RWt_utidyRtkwargs(RR(R(Rt_tidysR(sTidyc s"i||\}}}}|S(N(t_mxtidyttidyRRtnerrorst nwarningst errordata(RRRRR(R(RRsusutf-8t output_xhtmlitnumeric_entitiestwrapit char_encodingRsd?d@dAdBdCdDdEdFdGdHdIdJdKdLdMdNdOdPdQdRdSdTdUdVdWdXdYdZd[d\d]d^d_d`dadbdcdddedfdgdhdidjdkdldmdndodpdqdrdsdtdudvdwdxdydzd{d|d}d~ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddf}dk}|iditttdditt|an|i tSdS(Niiiiii iiiiii i i iiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiii[i.i<i(i+i!i&iiiiiiiiii]i$i*i)i;i^i-i/iiiiiiiii|i,i%i_i>i?iiiiiiiiii`i:i#i@i'i=i"iiaibicidieifigihiiiiiiiiiijikiliminioipiqiriiiiiiii~isitiuiviwixiyiziiiiiiiiiiiiiiiiiiiiiii{iAiBiCiDiEiFiGiHiIiiiiiii}iJiKiLiMiNiOiPiQiRiiiiiii\iiSiTiUiViWiXiYiZiiiiiii0i1i2i3i4i5i6i7i8i9iiiiiiRUi( t_ebcdic_to_ascii_maptemaptstringt maketransR<RtchrtrangeROt translate(RORR((Rt_ebcdic_to_ascii!s  @cCsdx]t|D]O\}}|djo||djotSq\q |||jotSq q WtS(Nt#t(t enumeratetbomRLtcRDRR(RDR R RL((Rt_startswithbom;s    cCs9x2|iD]$\}}t||o|Sq q WdS(N(tbom_mapt iteritemsR RR RDR (RDRRR ((Rt _detectbomEs   csgd}||pg|tpT|o|ttp7|tp$|dp|dp |dS(sd Takes a string text of unknown encoding and tries to provide a Unicode string for it. csk|o`|joS|djotSnyt|SWntj onXi|ndS(NR(Rt_triedEncodingsRRDR:tUnicodeDecodeErrorR;(R(RDR(Rt tryEncodingQs Rs windows-1252s iso-8859-1N(RRtguessRRDtisXMLt xml_bom_mapt_chardet(RDRRRR((RDRRt charactersKs (t__doc__t __author__t__contributors__t __license__t __version__RRRRR[Rtchardet.constantst constantsRR RR R^Rtunicode_bom_mapRRRR RRR(RRRR R^RRR[RRRRRRRR RRR R((Rt?s6       uO '3<