How to strip HTML tags from an NSString in Objective-C

// Method 1: walk the string with NSScanner and replace each tag with a space
- (NSString *)removeHTML:(NSString *)html {
    NSScanner *theScanner = [NSScanner scannerWithString:html];
    while ([theScanner isAtEnd] == NO) {
        NSString *text = nil;
        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL];
        // find end of tag; stop if there is no closing ">"
        if (![theScanner scanUpToString:@">" intoString:&text] || [theScanner isAtEnd]) {
            break;
        }
        // replace the found tag with a space
        // (you can filter multi-spaces out later if you wish)
        html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@" "];
    }
    return html;
}
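
The comment in removeHTML: notes that runs of spaces can be filtered out afterwards. A minimal sketch of that clean-up step follows. The helper name CollapseWhitespace is hypothetical and not part of the original post; it only uses Foundation calls.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): collapses every run of
// whitespace or newlines into a single space and drops leading/trailing runs.
static NSString *CollapseWhitespace(NSString *input) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // Splitting on whitespace yields empty strings wherever separators are adjacent.
    NSArray *parts = [input componentsSeparatedByCharactersInSet:whitespace];
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *part in parts) {
        if ([part length] > 0) {
            [words addObject:part];
        }
    }
    return [words componentsJoinedByString:@" "];
}
```

It could be applied to the result of removeHTML:, for example CollapseWhitespace([self removeHTML:html]).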

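// Method 2: split the string with NSString's componentsSeparatedByCharactersInSet: method
- (NSString *)removeHTML2:(NSString *)html {
    NSArray *components = [html componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"<>"]];
    NSMutableArray *componentsToKeep = [NSMutableArray array];
    // Even-indexed components are the text outside tags; odd-indexed ones are tag contents.
    // This assumes every "<" is matched by a ">" and that neither appears in the text itself.
    for (NSUInteger i = 0; i < [components count]; i += 2) {
        [componentsToKeep addObject:[components objectAtIndex:i]];
    }
    NSString *plainText = [componentsToKeep componentsJoinedByString:@""];
    return plainText;
}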

// Method 3: the scanner approach again, removing tags outright and optionally trimming whitespace
- (NSString *)flattenHTML:(NSString *)html trimWhiteSpace:(BOOL)trim
{
    NSScanner *theScanner = [NSScanner scannerWithString:html];
    while ([theScanner isAtEnd] == NO) {
        NSString *text = nil;
        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL];
        // find end of tag; stop if there is no closing ">"
        if (![theScanner scanUpToString:@">" intoString:&text] || [theScanner isAtEnd]) {
            break;
        }
        // remove the found tag
        html = [html stringByReplacingOccurrencesOfString:
                [NSString stringWithFormat:@"%@>", text]
                                               withString:@""];
    }
    return trim ? [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] : html;
}
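
The scanner-based methods above call stringByReplacingOccurrencesOfString: once per tag, which searches the whole string again on every pass. As an alternative, here is a rough sketch that copies the text between tags into an NSMutableString in a single pass. The function name StripTags is hypothetical, and the sketch assumes that dropping everything after an unclosed "<" is acceptable.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the text that
// lies outside "<...>" pairs, visiting each character only once.
static NSString *StripTags(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // By default the scanner skips whitespace; keep it so words stay separated.
    scanner.charactersToBeSkipped = nil;
    NSMutableString *result = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Copy everything up to the next "<" (returns NO if we are already on one).
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

Like the methods above, this does not decode entities such as &amp;amp; and does not treat comments or script blocks specially.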


MWFeedParser — An RSS and Atom web feed parser for iOS

MWFeedParser is an Objective-C framework for downloading and parsing RSS (1.* and 2.*) and Atom web feeds. It is a very simple and clean implementation that reads the following information from a web feed:

Feed Information

  • Title
  • Link
  • Summary

Feed Items

  • Title
  • Link
  • Author name
  • Date (the date the item was published)
  • Updated date (the date the item was updated, if available)
  • Summary (brief description of item)
  • Content (detailed item content, if available)
  • Enclosures (i.e. podcasts, mp3, pdf, etc)
  • Identifier (an item's guid/id)

If you use MWFeedParser on your iPhone/iPad app then please do let me know, I'd love to check it out :)

Important: This free software is provided under the MIT licence (X11 license) with the addition of the following condition:

This Software cannot be used to archive or collect data such as (but not limited to) that of events, news, experiences and activities, for the purpose of any concept relating to diary/journal keeping.

The full licence can be found at the end of this document.

Demo / Example App

There is an example iPhone application within the project which demonstrates how to use the parser to display the title of a feed, list all of the feed items, and display an item in more detail when tapped.

Setting up the parser

Create parser:

// Create feed parser and pass the URL of the feed
NSURL *feedURL = [NSURL URLWithString:@"http://images.apple.com/main/rss/hotnews/hotnews.rss"];
feedParser = [[MWFeedParser alloc] initWithFeedURL:feedURL];

Set delegate:

// Delegate must conform to `MWFeedParserDelegate`
feedParser.delegate = self;

Set the parsing type. Options are ParseTypeFull, ParseTypeInfoOnly and ParseTypeItemsOnly. Info refers to the information about the feed, such as its title and description. Items are the individual items or stories.

// Parse the feeds info (title, link) and all feed items
feedParser.feedParseType = ParseTypeFull;

Set whether the parser should connect and download the feed data synchronously or asynchronously. Note, this only affects the download of the feed data, not the parsing operation itself.

// Connection type
feedParser.connectionType = ConnectionTypeSynchronously;

Initiate parsing:

// Begin parsing
[feedParser parse];

The parser will then download and parse the feed. If at any time you wish to stop the parsing, you can call:

// Stop feed download / parsing
[feedParser stopParsing];

The stopParsing method will stop the downloading and parsing of the feed immediately.

Reading the feed data

Once parsing has been initiated, the delegate will receive the feed data as it is parsed.

- (void)feedParserDidStart:(MWFeedParser *)parser; // Called when data has downloaded and parsing has begun
- (void)feedParser:(MWFeedParser *)parser didParseFeedInfo:(MWFeedInfo *)info; // Provides info about the feed
- (void)feedParser:(MWFeedParser *)parser didParseFeedItem:(MWFeedItem *)item; // Provides info about a feed item
- (void)feedParserDidFinish:(MWFeedParser *)parser; // Parsing complete or stopped at any time by `stopParsing`
- (void)feedParser:(MWFeedParser *)parser didFailWithError:(NSError *)error; // Parsing failed

MWFeedInfo and MWFeedItem contain properties (title, link, summary, etc.) that will hold the parsed data. View MWFeedInfo.h and MWFeedItem.h for more information.

Important: There are some occasions where feeds do not contain some information, such as titles, links or summaries. Before using any data, you should check to see if that data exists:

NSString *title = item.title ? item.title : @"[No Title]";
NSString *link = item.link ? item.link : @"[No Link]";
NSString *summary = item.summary ? item.summary : @"[No Summary]";

The method feedParserDidFinish: will only be called when the feed has successfully parsed, or has been stopped by a call to stopParsing. To determine whether the parsing completed successfully, or was stopped, you can call isStopped.

For a usage example, please see RootViewController.m in the demo project.

Available data

Here is a list of the available properties for feed info and item objects:

MWFeedInfo

  • info.title (NSString)
  • info.link (NSString)
  • info.summary (NSString)

MWFeedItem

  • item.title (NSString)
  • item.link (NSString)
  • item.author (NSString)
  • item.date (NSDate)
  • item.updated (NSDate)
  • item.summary (NSString)
  • item.content (NSString)
  • item.enclosures (NSArray of NSDictionary with keys url, type and length)
  • item.identifier (NSString)
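
Since item.enclosures is an array of dictionaries, reading it takes one more step than the other properties. The sketch below builds a description line for each enclosure using the url, type and length keys listed above. The helper name DescribeEnclosures is hypothetical, and because the value types are not stated here, the values are formatted generically with %@.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of MWFeedParser): returns one description
// string per enclosure dictionary.
static NSArray *DescribeEnclosures(NSArray *enclosures) {
    NSMutableArray *lines = [NSMutableArray array];
    for (NSDictionary *enclosure in enclosures) {
        // A feed may omit any of these keys, so fall back to a placeholder.
        id url = [enclosure objectForKey:@"url"];
        id type = [enclosure objectForKey:@"type"];
        id length = [enclosure objectForKey:@"length"];
        [lines addObject:[NSString stringWithFormat:@"%@ (type: %@, length: %@)",
                          url ? url : @"[No URL]",
                          type ? type : @"[No Type]",
                          length ? length : @"[No Length]"]];
    }
    return lines;
}
```

With a parsed item this could be called as DescribeEnclosures(item.enclosures); a nil array simply yields an empty result.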

Using the data

All properties of MWFeedInfo and MWFeedItem return the raw data as provided by the feed. This content may or may not include HTML and encoded entities. If the content does include HTML, you could display the data within a UIWebView, or you could use the provided NSString category (NSString+HTML) which will allow you to manipulate this HTML content. The methods available for your convenience are:

// Convert HTML to Plain Text
//  - Strips HTML tags & comments, removes extra whitespace and decodes HTML character entities.
- (NSString *)stringByConvertingHTMLToPlainText;

// Decode all HTML entities using GTM.
- (NSString *)stringByDecodingHTMLEntities;

// Encode all HTML entities using GTM.
- (NSString *)stringByEncodingHTMLEntities;

// Minimal unicode encoding will only cover characters from table
// A.2.2 of http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters
// which is what you want for a unicode encoded webpage.
- (NSString *)stringByEncodingHTMLEntities:(BOOL)isUnicode;

// Replace newlines with <br /> tags.
- (NSString *)stringWithNewLinesAsBRs;

// Remove newlines and white space from string.
- (NSString *)stringByRemovingNewLinesAndWhitespace;

// Wrap plain URLs in <a href="..." class="linkified">...</a>
//  - Ignores URLs inside tags (any URL beginning with =")
//  - HTTP & HTTPS schemes only
//  - Only works in iOS 4+ as we use NSRegularExpression (returns self if not supported so be careful with NSMutableStrings)
//  - Expression: (?<!=")\b((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?)
//  - Adapted from http://regexlib.com/REDetails.aspx?regexp_id=96
- (NSString *)stringByLinkifyingURLs;

An example of this would be:

// Display item summary which contains HTML as plain text
NSString *plainSummary = [item.summary stringByConvertingHTMLToPlainText];

Debugging problems

If for some reason the parser doesn't seem to be working, try enabling Debug Logging in MWFeedParser.h. This will log error messages to the console and help you diagnose the problem. Error codes and their descriptions can be found at the top of MWFeedParser.h.

Other information

MWFeedParser is not currently thread-safe.

Adding to your project

Method 1: Use CocoaPods

CocoaPods is great. If you are using CocoaPods (and here's how to get started), simply add pod 'MWFeedParser' to your podfile and run pod install. You're good to go! Here's an example podfile:

platform :ios, '7'
pod 'MWFeedParser'

If you are just interested in using the HTML and/or InternetDateTime categories in your app, you can just specify those in your podfile with pod 'MWFeedParser/NSString+HTML' or pod 'MWFeedParser/NSDate+InternetDateTime'.

Method 2: Including Source Directly Into Your Project

  1. Open MWFeedParser.xcodeproj.
  2. Drag the MWFeedParser & Categories groups into your project, ensuring you check Copy items into destination group's folder.
  3. Import MWFeedParser.h into your source as required.

Outstanding and suggested features

  • Demonstrate the previewing of formatted item summary/content (HTML with images, paragraphs, etc) within a UIWebView in demo app.
  • Provide functionality to list available feeds when given the URL to a webpage with one or more web feeds associated with it.
  • Support for the Media RSS extension (from Flickr, etc.)
  • Support for the GeoRSS extension.
  • Look into web feed icons.
  • Look into supporting/detecting images in feed items.

Feel free to get in touch and suggest/vote for other features.