NAME
XML::RSS::Feed - Persistent XML RSS Encapsulation
VERSION
2.32
SYNOPSIS
A quick and dirty non-POE example that uses a blocking sleep. The magic
is in the late_breaking_news method that returns only headlines it
hasn't seen.
    use strict;
    use warnings;
    use XML::RSS::Feed;
    use LWP::Simple qw(get);

    my $feed = XML::RSS::Feed->new(
        url    => "http://www.jbisbee.com/rdf/",
        name   => "jbisbee",
        delay  => 10,
        debug  => 1,
        tmpdir => "/tmp",    # optional caching
    );

    while (1) {
        # get() returns undef on failure; only parse when we got content
        if ( defined( my $xml = get( $feed->url ) ) ) {
            $feed->parse($xml);
            print $_->headline . "\n" for $feed->late_breaking_news;
        }
        sleep( $feed->delay );
    }
ATTENTION! If you want a non-blocking way to watch multiple RSS sources
with one process, use POE::Component::RSSAggregator.
To fetch a feed, mark all of its current headlines as seen, and then get
events only for headlines that appear afterward, pass
'init_headlines_seen => 1' to the constructor.
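Here is a small, self-contained sketch of what init_headlines_seen does. It does not call XML::RSS::Feed. make_tracker is a hypothetical helper, and headlines are reduced to plain id strings. When the option is on, the first batch only seeds the set of seen ids. Every later batch reports just the ids it has not seen before.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::RSS::Feed: returns a closure that
# takes one batch of headline ids and returns the ids it has not seen before.
sub make_tracker {
    my (%opt) = @_;
    my %seen;
    my $first = 1;
    return sub {
        my @ids = @_;
        # Mark every id as seen; keep only those absent from earlier batches.
        my @new = grep { !$seen{$_}++ } @ids;
        if ($first) {
            $first = 0;
            # With init_headlines_seen, the first batch only seeds %seen.
            return () if $opt{init_headlines_seen};
        }
        return @new;
    };
}

my $poll   = make_tracker( init_headlines_seen => 1 );
my @first  = $poll->(qw(a b c));      # empty: first fetch only marks ids as seen
my @second = $poll->(qw(a b c d));    # ('d')
print "new on second poll: @second\n";
```

Without the option, the first batch is reported in full, which is the flood of events the constructor parameter avoids.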
CONSTRUCTOR
XML::RSS::Feed->new( url => $url, name => $name )
Required Params
* name
Identifier and hash lookup key for the RSS feed.
* url
The URL of the RSS feed.
Optional Params
* delay
Number of seconds between updates (defaults to 600).
* tmpdir
Directory in which to keep a cached copy of the feed (using Storable)
so that it persists between instances.
* init_headlines_seen
Mark all headlines from the initial fetch as seen, and only report new
headlines that appear from that point forward.
* debug
Turn debugging on.
* headline_as_id
Boolean; use the headline as the id when the URL isn't unique within
a feed.
* hlobj
The name of a class subclassed from XML::RSS::Headline.
* max_headlines
The maximum number of headlines to keep (default is unlimited).
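The constructor contract above has three rules: name and url are required, delay defaults to 600 seconds, and max_headlines defaults to unlimited. This sketch applies those rules to a plain hash through a hypothetical feed_args helper. The helper is not part of XML::RSS::Feed, and the URL is a placeholder. Storing "unlimited" as 0 follows the max_headlines accessor, where 0 means no limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RSS::Feed) that applies the
# documented constructor rules to a plain hash of arguments.
sub feed_args {
    my (%args) = @_;
    for my $required (qw(name url)) {
        die "missing required param '$required'\n"
            unless defined $args{$required};
    }
    return {
        delay         => 600,    # documented default, in seconds
        max_headlines => 0,      # 0 = no limit
        %args,                   # caller-supplied values override the defaults
    };
}

my $cfg = feed_args( name => 'example', url => 'http://example.com/rss' );
print "$cfg->{name} updates every $cfg->{delay} seconds\n";
```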
METHODS
$feed->parse( $xml_string )
Pass in an XML string to parse with XML::RSS; process is then called on
the results. Returns a boolean value indicating whether parsing succeeded.
$feed->process( $items, $title, $link )
$feed->process( $items, $title )
$feed->process( $items )
Calls pre_process, process_items, post_process, title, and link methods
to process the parsed results of an RSS XML feed.
* $items
A reference to an array of hash refs, each of which becomes an
XML::RSS::Headline object. See XML::RSS::Headline->new() for the
acceptable arguments.
* $title
The title of the RSS feed.
* $link
The RSS channel link (normally a URL back to the homepage) of the
RSS feed.
$feed->pre_process
Marks all headlines from the previous run as seen.
$feed->process_items( $items )
Turns an array ref of hash refs into XML::RSS::Headline objects and adds
them to the internal list of headlines.
$feed->post_process
Post-processing: cleanup, caching of headlines (if tmpdir is set), and
debug messages.
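The order of the process phases can be modelled in a few lines. The sketch below is a hypothetical MiniFeed class, not the module's source. Headlines here are plain hashes keyed by url instead of XML::RSS::Headline objects. post_process is left empty; that is where the real module caches and logs. The model shows why a headline counts as late-breaking news only on the run in which it first appears.

```perl
use strict;
use warnings;

# Hypothetical model of the process() phases; NOT the module's code.
# Headlines are plain hashes keyed by url instead of XML::RSS::Headline objects.
package MiniFeed;

sub new { return bless { headlines => [], by_id => {} }, shift }

# pre_process: everything kept from the previous run now counts as seen.
sub pre_process {
    my ($self) = @_;
    $_->{seen} = 1 for @{ $self->{headlines} };
}

# process_items: wrap each new hash ref as a headline, skipping known ids.
sub process_items {
    my ( $self, $items ) = @_;
    for my $item (@$items) {
        next if $self->{by_id}{ $item->{url} };
        my $hl = { %$item, seen => 0 };
        $self->{by_id}{ $item->{url} } = $hl;
        push @{ $self->{headlines} }, $hl;
    }
}

# post_process: the real module caches (if tmpdir) and logs here.
sub post_process { }

sub process {
    my ( $self, $items, $title, $link ) = @_;
    $self->pre_process;
    $self->process_items($items);
    $self->post_process;
    $self->{title} = $title if defined $title;
    $self->{link}  = $link  if defined $link;
}

# Unseen headlines: a list in list context, a count in scalar context.
sub late_breaking_news {
    my ($self) = @_;
    return grep { !$_->{seen} } @{ $self->{headlines} };
}

package main;

my $feed = MiniFeed->new;
$feed->process( [ { url => 'u1', headline => 'One' } ], 'Demo' );
$feed->process( [ { url => 'u1', headline => 'One' },
                  { url => 'u2', headline => 'Two' } ] );
print "$_->{headline}\n" for $feed->late_breaking_news;    # Two
```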
$feed->create_headline( %args )
Creates a new XML::RSS::Headline object and adds it to the internal list.
See XML::RSS::Headline->new() for the acceptable keys in %args.
$feed->init_all_headlines_seen()
After fetching a feed for the first time, marks all headlines as seen so
that no flood of events is generated: no event is issued for any
existing headline, only for headlines that appear from that point on.
$feed->num_headlines
Returns the number of headlines for the feed.
$feed->seen_headline( $id )
Returns a boolean indicating whether the headline with the given id has
been seen.
$feed->headlines
Returns a list (in list context) or an array reference (in scalar
context) of XML::RSS::Headline objects.
$feed->late_breaking_news
Returns the latest (not yet seen) XML::RSS::Headline objects as a list
in list context, or their count in scalar context.
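The context-dependent return values of headlines and late_breaking_news follow Perl's usual wantarray idiom. The sketch below uses hypothetical stand-in subs over plain strings to show what each call returns in list and in scalar context. It follows the documented behaviour and does not reproduce the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins over plain strings, following the documented
# return conventions rather than the module's actual code.
my @all   = ( 'Old', 'New' );    # every headline kept
my @fresh = ('New');             # headlines not seen before

# headlines: a list in list context, an array reference in scalar context.
sub headlines { return wantarray ? @all : \@all }

# late_breaking_news: a list in list context, a count in scalar context.
sub late_breaking_news { return wantarray ? @fresh : scalar @fresh }

my @list  = headlines();              # ('Old', 'New')
my $ref   = headlines();              # array reference
my $count = late_breaking_news();     # 1
print "got $count new of ", scalar @$ref, " total\n";    # got 1 new of 2 total
```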
$feed->cache
Caches the RSS information if tmpdir is defined.
$feed->set_last_updated
$feed->set_last_updated( Time::HiRes::time )
Sets the time when the feed was last processed. If you pass in a value,
it is used; otherwise Time::HiRes::time is called.
$feed->last_updated
The time (in epoch seconds) when the feed was last processed.
$feed->last_updated_hires
The time (in epoch seconds, with sub-second precision) when the feed was
last processed.
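One way to keep the two timestamps consistent is to store a single Time::HiRes value and derive the whole-second one from it. This is a hypothetical sketch. The %state hash and the use of int() for last_updated are assumptions, not taken from the module.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical sketch: keep one high-resolution timestamp and derive the
# whole-second value from it. Storage and int() are assumptions.
my %state;

# Use the caller's value if given, otherwise the current Time::HiRes::time.
sub set_last_updated {
    my ($time) = @_;
    $state{last_updated_hires} = defined $time ? $time : Time::HiRes::time();
}

sub last_updated_hires { return $state{last_updated_hires} }
sub last_updated       { return int $state{last_updated_hires} }

set_last_updated();
printf "%d (seconds) vs %.6f (hires)\n", last_updated(), last_updated_hires();
```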
SET/GET ACCESSOR METHODS
$feed->title
$feed->title( $title )
The title of the RSS feed.
$feed->debug
$feed->debug( $bool )
Turns debugging messages on or off.
$feed->init
$feed->init( $bool )
init is used to load the current headlines without returning all of
them; in other words, it initializes them. Takes a boolean argument.
$feed->name
$feed->name( $name )
The identifier of an RSS feed.
$feed->delay
$feed->delay( $seconds )
Number of seconds between updates.
$feed->link
$feed->link( $rss_channel_url )
The URL in the RSS feed that links back to the site the RSS feed came
from.
$feed->url
$feed->url( $url )
The URL of the RSS feed itself, i.e. the address the feed is fetched
from.
$feed->headline_as_id
$feed->headline_as_id( $bool )
In some RSS feeds the URL is not always unique; in these cases you can
use the headline as the unique id. The id is used to check whether a
headline is new or has already been seen.
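To see why headline_as_id matters, consider two different stories published under the same URL. In this sketch, headline_id is a hypothetical helper. Its precedence is an assumption, not a documented rule: the headline when headline_as_id is set, otherwise the guid if present, otherwise the url. The guid step comes from the acknowledgement about guid support.

```perl
use strict;
use warnings;

# Hypothetical helper: picks the id used to decide whether a headline is new.
# Assumed precedence: headline (if headline_as_id), else guid, else url.
sub headline_id {
    my ( $item, $headline_as_id ) = @_;
    return $item->{headline} if $headline_as_id;
    return defined $item->{guid} ? $item->{guid} : $item->{url};
}

# Two distinct stories that happen to share one URL.
my @items = (
    { url => 'http://example.com/news', headline => 'Story A' },
    { url => 'http://example.com/news', headline => 'Story B' },
);

my ( %by_url, %by_headline );
for my $item (@items) {
    $by_url{ headline_id( $item, 0 ) }++;
    $by_headline{ headline_id( $item, 1 ) }++;
}
printf "%d id(s) by url, %d by headline\n",
    scalar( keys %by_url ), scalar( keys %by_headline );    # 1 id(s) by url, 2 by headline
```

With URL-based ids the second story looks already seen, so it would never be reported.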
$feed->hlobj
$feed->hlobj( $class )
Lets you use a subclass of XML::RSS::Headline (see the Perl Jobs example
in XML::RSS::Headline::PerlJobs). The value should be just the name of
the subclass.
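The hlobj mechanism relies on ordinary Perl inheritance: the feed only needs a class name on which it can call new. The sketch uses hypothetical stand-in classes so that it runs without the real modules: My::Headline replaces XML::RSS::Headline, and My::Headline::Shouty is the subclass. It assumes that a subclass changes behaviour by overriding methods such as headline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for XML::RSS::Headline (minimal, for illustration).
package My::Headline;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub headline { return $_[0]{headline} }

# Hypothetical subclass in the spirit of XML::RSS::Headline::PerlJobs:
# it only changes how the headline text is presented.
package My::Headline::Shouty;
use parent -norequire, 'My::Headline';

sub headline {
    my ($self) = @_;
    return uc $self->SUPER::headline();
}

package main;

# A feed configured with hlobj only needs the class *name*.
my $hlobj = 'My::Headline::Shouty';
my $hl    = $hlobj->new( headline => 'perl job' );
print $hl->headline, "\n";    # PERL JOB
```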
$feed->tmpdir
$feed->tmpdir( $tmpdir )
Temporary directory in which to store cached RSS XML between instances
for persistence.
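Persistence between instances via tmpdir rests on Storable. The following sketch shows the general store/retrieve round trip using core modules only. The cache file name (the feed name plus a .sto suffix) and the stored data layout are assumptions for illustration, not the module's actual cache format.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Hypothetical sketch of tmpdir-style persistence with Storable; the file
# naming ("<name>.sto") and the data layout are illustrative assumptions.
sub cache_file {
    my ( $tmpdir, $name ) = @_;
    return File::Spec->catfile( $tmpdir, "$name.sto" );
}

sub save_cache {
    my ( $tmpdir, $name, $data ) = @_;
    nstore( $data, cache_file( $tmpdir, $name ) ) or die "cannot cache $name\n";
}

# Returns the cached data, or undef when nothing has been cached yet.
sub load_cache {
    my ( $tmpdir, $name ) = @_;
    my $file = cache_file( $tmpdir, $name );
    return -e $file ? retrieve($file) : undef;
}

my $dir = tempdir( CLEANUP => 1 );
save_cache( $dir, 'jbisbee', { seen => { 'http://example.com/a' => 1 } } );
my $restored = load_cache( $dir, 'jbisbee' );    # e.g. in a later run
print join( ',', keys %{ $restored->{seen} } ), "\n";
```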
$feed->init_headlines_seen
$feed->init_headlines_seen( $bool )
Boolean value to mark all headlines from the initial fetch as seen, and
only report new headlines that appear from that point forward.
$feed->max_headlines
$feed->max_headlines( $integer )
The maximum number of headlines you'd like to keep track of (0 means
unlimited).
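A cap like this usually amounts to keeping the first N entries of a list ordered newest first. The hypothetical trim_headlines helper below assumes that ordering and treats 0 as no limit, matching the description above. The module's own trimming logic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max headlines from a newest-first list.
# A $max of 0 (or undef) means no limit.
sub trim_headlines {
    my ( $max, @headlines ) = @_;
    return @headlines if !$max || @headlines <= $max;
    return @headlines[ 0 .. $max - 1 ];
}

my @kept = trim_headlines( 2, qw(newest newer old) );
print "@kept\n";    # newest newer
```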
DEPRECATED METHODS
$feed->failed_to_fetch
This method was deprecated because the object shouldn't really know
anything about fetching; it just processes the results. This method
currently always returns false.
$feed->failed_to_parse
This method was deprecated because $feed->parse now returns a boolean
value. This method always returns false.
AUTHOR
Jeff Bisbee, "<jbisbee at cpan.org>"
BUGS
Please report any bugs or feature requests to "bug-xml-rss-feed at
rt.cpan.org", or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-RSS-Feed>. I will be
notified, and then you'll automatically be notified of progress on your
bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc XML::RSS::Feed
You can also look for information at:
* AnnoCPAN: Annotated CPAN documentation
<http://annocpan.org/dist/XML-RSS-Feed>
* CPAN Ratings
<http://cpanratings.perl.org/d/XML-RSS-Feed>
* RT: CPAN's request tracker
<http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-RSS-Feed>
* Search CPAN
<http://search.cpan.org/dist/XML-RSS-Feed>
ACKNOWLEDGEMENTS
Special thanks to Rocco Caputo, Martijn van Beers, Sean Burke, Prakash
Kailasa and Randal Schwartz for their help, guidance, patience, and bug
reports. Guys, thanks for actually taking the time to use the code and
give good, honest feedback.
Thanks to Carl Fürstenberg for providing feedback on the new constructor
param 'init_headlines_seen', so you won't get flooded with headlines on
the first fetch of the feed.
Thanks to Slaven Rezić for pointing out that t/008_store_retrieve.t
pointed to broken RSS tests on jbisbee.com (a domain I don't own
anymore).
Thanks to Aaron Krowne for a patch to XML::RSS::Headline to use the guid
as the unique id instead of the url if it's available.
COPYRIGHT & LICENSE
Copyright 2006 Jeff Bisbee, all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
SEE ALSO
XML::RSS::Headline, XML::RSS::Headline::PerlJobs,
XML::RSS::Headline::Fark, XML::RSS::Headline::UsePerlJournals,
POE::Component::RSSAggregator