m uEc@sdZdZddgZdZdkZdkZdkZdkZdkZy dk Z Wndk Z nXdddd d d d fZ dk Z dk Z dkZdkZdkZyd klZWndZnXdeZedeiZdZdZdZdZe idZy eiWneie_nXdZdZdZdZ dZ!dZ"dZ#dei$fdYZ%dZ&d fdYZ'd ei(fdYZ)d ei(fd YZ*dS(!sPlanet aggregator library. This package is a library for developing web sites or software that aggregate RSS, CDF and Atom feeds taken from elsewhere into a single, combined feed. s2.0s(Scott James Remnant s Jeff Waugh tPythonNtcachet feedparserthtmltmpltloggingtPlanettChanneltNewsItem(sescapecCs(|iddiddiddS(Nt&s&t>s>t|iid|o#|iid|d|ddSn|SdS(s<Get a template value from the configuration, with a default.trawtvarsRN( RR1t has_optionttemplatetoptiontgetR<R9tdefault(RR?R@RBR<R=((R ttmpl_config_gets ##Rc Cs|i|dtdd}t|i|dt}|o!ti tid|}nd}h}g}xv|i ddD]b}t||||<|i|||oS|idd} t| djp| di|jod |||d failed(Rt getLoggertlogR#RR1R>RAR6RJR8t planet_namet planet_linkR4R:tsectionstfeed_urlttemplate_filesRRPt subscribetofflineRUtupdatetKeyboardInterruptt exception(RRsRtRwRyRrRvRP((R truns, " cCs{tid} xe|D]]} ti} | id| y| i | }Wn1ti j o"| i t i i| }nX|i| dt}|i| dtdd}|i| dt} t i it i i| d}t i i||}t i i||}|i| \} }|i | | }ti"d d}|i$d ||i$d ||i$d t%|i$d ||i$d||i$d||i$d||i$d||oA|i$d||i$d|i*ddjodpdnt+i,}|i$dt+i.|||i$dt+i.t/||i$dt+i.t0|y| id|t1|d}| i3d$jo|i4|i5|nz| i3d%jo5|i5|i6d}|i4|i8d d!n2|i5|i6d}|i4|i8| d"|i9Wqt:j o q| i;d#|qXqWdS(&Ns planet.runnersProcessing template %st output_dirR-R<itencodingit html_escapetItemstChannelst generatortnametlinkt owner_namet owner_emailRnRtfeedtypetrsstatomR*tdate_isotdate_822s Writing %stwsutf-8tutf8txmlthtmltsgmltasciitxmlcharrefreplaceR sWrite of %s failed(sutf-8sutf8(RRR(<RRqRrRwRHRtTemplateManagertmanagerR#tprepareR?t TemplateErrortostpathtbasenameRRCt OUTPUT_DIRR~RIR-tENCODINGRtsplitexttbasetjoinRtRnt output_fileRWRNRORoR`tTemplateProcessorttptsettVERSIONRsRRt planet_feedtfindR+RLR*R,R.R/topent output_fdtlowertwritetprocesstdecodeRtencodeRR{R|(RRwRsRtRRRRR`RHRRNRRrRRR~R?RR*R-RnROR((R tgenerate_all_filessd     "1 icCsg}xC|iD]8}|p|id o|i|i|fqqW|o|i ng}|D]}||dql~S(sReturn the list of channels.REiN( RNRR2RPRERmRQRRFtsortt_[1]tc(RRERFRRRNRP((R RNEs cCs3x,|iD]!}||ijo|Sq q WdS(N(RR2RPRtcache_basename(RRRP((R tfind_by_basenameQs cCs|ii|dS(s$Subscribe the planet to the channel.N(RR2RQRP(RRP((R RxUscCs|ii|dS(s(Unsubscribe the planet from the channel.N(RR2tremoveRP(RRP((R t unsubscribeYscCs\d }|ioti|iti}nd } |ioti|iti} ng}h} |p|i d|dd}nx|D]} x| iiD]}|p|id od } | ioti| iti} nd }| ioti| iti}n|p| p| p|o6d}|ido |i}n|id} n|o+|i|p |i| pqqn| o+| i|p| i| oqqn| o+| i|p | i| pqqn|o+|i|p|i| oqq4n| i|ip6d| |i<|iti|i|i|fqqqWqW|o|i|i nt!|o|o|| }nt!|ob|o[d}|dd|d}x<|D]0}|d|jo|d7}q|| }PqWng}|D]}||d qD~S( sReturn an optionally filtered list of items in the channel. The filters are applied in the following order: If hidden is true then items in hidden channels and hidden items will be returned. If sorted is true then the item list will be sorted with the newest first. If max_items is non-zero then this number of items, at most, will be returned. If max_days is non-zero then any items older than the newest by this number of days won't be returned. Requires sorted=1 to work. The sharp-eyed will note that this looks a little strange code-wise, it turns out that Python gets *really* slow if we try to sort the actual items themselves. Also we use mktime here, but it's ok because we discard the numbers and just need them to be relatively consistent between each other. RERFiRR!tcontentiixJiN((R9tplanet_filter_reRR:tretcompiletItplanet_exclude_reR;RRt seen_guidsRNRERPt_itemstvaluesR$Rmtchannel_filter_retchannel_exclude_reR!t get_contentRtsearchtidRQR+tmktimeR*torderRFRtreverseRTR[R\t max_counttmax_timeRti(RRERFR[R\RNRRR!RRRPRRRRRRRR$R((R RR]sv                5   (RRRRR9RCRWRotFalseR}RRNRRxRRR(((R Rhs   -& I   c BstZdZdZd Zd ZdZeZdddZdZ dZ dZ ddZ dZ dZdZdZdZRS(sk A list of news items. This class represents a list of news items taken from the feed of a website or other source. Properties: url URL of the feed. url_etag E-Tag of the feed URL. url_modified Last modified time of the feed URL. url_status Last HTTP status of the feed URL. hidden Channel should be hidden (True if exists). name Name of the feed owner, or feed title. next_order Next order number to be assigned to NewsItem updated Correct UTC-Normalised update time of the feed. last_updated Correct UTC-Normalised time the feed was last updated. id An identifier the feed claims is unique (*). title One-line title (*). link Link to the original format feed (*). tagline Short description of the feed (*). info Longer description of the feed (*). modified Date the feed claims to have been modified (*). author Name of the author (*). publisher Name of the publisher (*). generator Name of the feed generator (*). category Category name (*). copyright Copyright information for humans to read (*). license Link to the licence for the content (*). docs Link to the specification of the feed format (*). language Primary language (*). errorreportsto E-Mail address to send error reports to (*). image_url URL of an associated image (*). image_link Link to go with the associated image (*). image_title Alternative text of the associated image (*). image_width Width of the associated image (*). image_height Height of the associated image (*). filter A regular expression that articles must match. exclude A regular expression that articles must not match. Properties marked (*) will only be present if the original feed contained them. Note that the optional 'modified' date field is simply a claim made by the item and parsed from the information given, 'updated' (and 'last_updated') are far more reliable sources of information. Some feeds may define additional properties to those above. tlinkst contributorst textinputtcloudt categoriesRnthrefturl_etagt url_modifiedttagstitunes_explicitcCsetii|ipti|inti|i|}t i |dd}ti i|||ddh|_||_g|_||_||_d|_d|_d|_d|_d|_d|_d|_d|_d|_|i|i|i i!|oLxI|i i"|D]1}|i i$||}|i&||ddq(WndS(NRitrootit0tcachedi('RRtisdirRR6tmakedirsRtfilenameRntcache_filenametdbhashRt cache_filet CachedInfoRRRt_planett_expiredtconfigured_urlR9RRURRtupdatedt last_updatedR:R;t next_ordert cache_readtcache_read_entriesR1t has_sectiontoptionsR@RAtvaluet set_as_string(RRRnRR@RR((R Rs4                cCs|ii|S(s-Check whether the item exists in the channel.N(RRRmtid_(RR((R thas_itemscCs |i|S(s!Return the item from the channel.N(RRR(RR((R tget_itemsicCsg}xX|iiD]G}|p|id o)|iti |i |i |fqqW|o|i |ing}|D]}||dq~S(sReturn the item list.REiN(RRRRRR$RERmRQR+RR*RRFRRRR(RRERFRRRRR$((R RR!s- cCst|iddS(sIterate the sorted item list.RFiN(titerRRR(R((R t__iter__.scCst|ii}x^|D]V}|iddjoqn|i|oqnt||}||i|, but in the case where the current self.url has changed from the original self.configured_url the string will contain both pieces of information. This is so that the URL in question is easier to find in logging output: getting an error about a URL that doesn't appear in your config file is annoying. s<%s>s<%s> (formerly <%s>)N(RRnR(R((R tfeed_informationIs cCs*ti|id|id|id|ii}|i dot |i |_ n||i do)t |idjot d|_ nC|io)|iiidjot d |_ nt d |_ |i d jo|i dot |idjoqtid |i|iy>titi|ii|iti|ii|iWnnX|i|_n|i d jotid|idSn|i djo(tid|i|idSn|i djotid|idSnQt|i djo$tid|i |idSntid|i|i do |ipd|_|i do |i pd|_|idj oti!d|in|idj o#ti!dt"i#t$|in|i%|i&|i'|i|idS(sDownload the feed to refresh the information. This does the actual work of pulling down the feed and if it changes updates the cached information about the feed and entries within it. tetagtmodifiedtagentRVtentriesiitTimeoutiit301s Feed has moved from <%s> to <%s>t304sFeed %s unchangedNRps Feed %s gonet408sFeed %s timed outisError %s while updating feed %ssUpdating feed %ss E-Tag: %ssLast Modified: %s((RtparseRRnRRRR4R#RmtstrRVRURTRtbozotbozo_exceptiont __class__RRrtwarningRRRRR6RRRJterrorRR9RtdebugR+R,R.t update_infoRtupdate_entries(RR#((R RzYsX & 6   ## cCsnxg|iD]Y}||ijp|d|ijoq |i|doq |ido||ido5||io'|i|i dd||in||ido5||i o'|i|i dd||i qfq |djoq |ido8||dj o#|i |t d ||qfq |djo||id o|i|d ||in||id o|i|d ||in||id o|i|d||in||ido%|i|dt||in||ido%|i|dt||iqfq t||ttfoy|d}|i|os||ido_||idjoti||||, unknown formatN(RR%R&Rt IGNORE_KEYSRmtendswithRRR R R9t set_as_dateRTRnRR!RRRt isinstancetunicodetdetailRtsanitizetHTMLRR{RrR|(RRR&R((R RsX $"" ' %) $ cCst|pdSn|i|_ti|_g}g}x|D]}|i dot i |i }n|i dot i |i}n|i do0|idtit i |ii}nQ|i do0|idtit i |ii}ntidqC|i|o|i|}n*t||}||i|<|i||i||i||idjoC|ii o6t||ii jod|_!ti"d |qCqCW|i#x1|D])}t$t%|i&d |_'|_&qWt|}ti"d |x|i)d d D]}}|d joPqM|i |jo|d 8}qM|i*i+d jo4|i|i =|i,i|ti"d|i qMqMWdS(sUpdate entries from the feed. This reads the entries supplied by feedparser and updates the cached information about them. It's at this point we update the 'updated' timestamp and keep the old one in 'last_updated', these provide boundaries for acceptable entry times. If this is the first time a feed has been updated then most of the items will be marked as hidden, according to Planet.new_feed_items. If the feed does not contain items which, according to the sort order, should be there; those items are assumed to have been expired from the feed or replaced and are removed from the cache. NRRR!t/tsummarys,Unable to find or generate id, entry ignoredtyess Marked <%s> as hidden (new feed)isItems in Feed: %dRFt226s%Removed expired or replaced item <%s>(-RTRRRRR+RLt new_itemst feed_itemstentryRmRRRtentry_idRRntmd5tnewR!t hexdigestRRrRRRR$RRQRzR9RR8RERRRRJRRt feed_countRRRiRUR(RRR$R%R$R)R#R"((R RsZ  00     6  '   cCsOxHdD]@}|i|o*|i||ijo|i|SqqWdS(s#Return the key containing the name.RR!RN(snamestitle(R&RRmR'tNULLt get_as_string(RR&((R tget_names )( RRRRs categoriessurlRsurl_etags url_modifiedRR(RRRRRRRt __contains__RRRRRRRRzRRR,(((R Rs 3       8 9 IcBs8tZdZd ZdZd Zd Zd ZRS( sAn item of news. This class represents a single item of news on a channel. They're created by members of the Channel class and accessible through it. Properties: id Channel-unique identifier for this item. id_hash Relatively short, printable cryptographic hash of id date Corrected UTC-Normalised update time, for sorting. order Order in which items on the same date can be sorted. hidden Item should be hidden (True if exists). title One-line title (*). link Link to the original format text (*). summary Short first-page summary (*). content Full HTML content. modified Date the item claims to have been modified (*). issued Date the item claims to have been issued (*). created Date the item claims to have been created (*). expired Date the item claims to expire (*). author Name of the author (*). publisher Name of the publisher (*). category Category name (*). comments Link to a page to enter comments (*). license Link to the licence for the content (*). source_name Name of the original source of this item (*). source_link Link to the original source of this item (*). Properties marked (*) will only be present if the original feed contained them. Note that the various optional date fields are simply claims made by the item and parsed from the information given, 'date' is a far more reliable source of information. Some feeds may define additional properties to those above. RRt enclosuresRt guidislinkR*RcCsltii||i|||_||_t i |i |_ d|_d|_d|_|idS(N(RRRRRPRRRiRR&R'R(tid_hashR9R*RRR(RRPR((R RDs     cCs@x,|iD]}||ijp|d|ijoq |i|doq |ido ||ido5||io'|i|i dd||in||ido5||i o'|i|i dd||i n||idoc||i oU|i id p||i |i i jo'|i|i dd||i q+q |ido8||dj o#|i|td ||q+q |d joj||id o|i|d||in||id o|i|d ||iq+q |d jod}x||D]}|idjoti|i|_n'|idjot|i|_n|idoO|i oE|i id p|i |i i jo|i|d|i n|ti|i7}qMW|i||q t||ttfoy|d}|i|ow||ido_||idjoti||||, unknown formatR*N("R$R%R&RRRmRRRR R R1RiR9RRTRRnR$RRRRRRRRRRR{RrR|Rtget_date(RR$RR$RR&((R RzOsj $""P' #  D   cCsx8dD]*}|i|o|i|}PqqWd}|dj o'||iijo|ii}qnG|i|o*|i ||i jo|i|Sn |ii}|i |||S(sGet (or update) the date key. We check whether the date the entry claims to have been changed is since we last updated this feed and when we pulled the feed off the site. If it is then it's probably not bogus, and we'll sort accordingly. If it isn't then we bound it appropriately, this ensures that entries appear in posting sequence but don't overlap entries added in previous updates and don't creep into the next one. RRt publishedtissuedtcreatedN(supdatedsmodifiedR5R6R7( t other_keyRRmR)R*R9RiRR&R'R*R(RR&R*R8((R R4s   ) cCsOxHdD]@}|i|o*|i||ijo|i|SqqWdS(s&Return the key containing the content.RttaglineRRN(scontentR9ssummary(R&RRmR'R*R+(RR&((R Rs )(s categoriess contributorsR.slinksR/sdatestags(RRRRRRzR4R(((R Rs % B "(+Rt __version__t __authors__t __license__RRRRRRtcompat_loggingt__all__RR&R+RRtxml.sax.saxutilsRRR3R5R7R.R/RqRrRtwarnRRcRdRRIReRKRRR0RRRR(#RRR;R7R3ReRRRRRrR>RdRRRRcRR:R<R/RRR0RRKR5R&RIRRR.RR+R((R t?s^                 Y\