
Re: [ruby-81] Parsing bad characters out of a user submitted chunk of text

From: user 3.
Sent on: Tuesday, October 2, 2007 11:51 PM
It actually is a Rails app. The issue shows up in IE when people view the RSS feeds directly in the browser. An example would be: http://www.yardbarker.com/rss/user/MichaelConley

Thinking about this a little more, and a few Google searches later, it looks more like an IE issue: IE is not handling Unicode very well. A solution might be converting to a different encoding, or doing search-and-replace on the offending characters as we find them, like the other reply does in PHP.

At the very least this reveals I need to do some homework on my encodings!  
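
A minimal sketch of the search-and-replace option, assuming Ruby 1.9 or later (for "\u" escapes and gsub with a hash) and UTF-8 input. The WORD_CHARS table and the asciify name are made up for illustration; they are not from the app, and the table only covers a few characters Word is known to substitute:

```ruby
# Hypothetical table: typographic characters Word tends to insert,
# mapped to plain-ASCII stand-ins. Extend it as new offenders turn up.
WORD_CHARS = {
  "\u2018" => "'",   # left single quote
  "\u2019" => "'",   # right single quote / apostrophe
  "\u201C" => '"',   # left double quote
  "\u201D" => '"',   # right double quote
  "\u2013" => "-",   # en dash
  "\u2014" => "--",  # em dash
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# One regex that matches any key of the table.
WORD_CHARS_PATTERN = Regexp.union(WORD_CHARS.keys)

# Returns a copy of text with every mapped character replaced.
def asciify(text)
  text.gsub(WORD_CHARS_PATTERN, WORD_CHARS)
end

puts asciify("\u201CRiddle me this\u2026\u201D it\u2019s fixed")
```

Characters that are not in the table pass through untouched, so this would sit in front of proper encoding handling rather than replace it.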

On 10/2/07, Dean <[address removed] > wrote:
Actually, I'm pretty sure what Sumit is talking about is the quotes
that aren't part of the standard ASCII table. These are the left and
right ("curly") quotes commonly seen. Standard ASCII has only the one
double-quote character to represent both left and right quotes. If you
use these non-ASCII characters in strings in apps that aren't
Unicode-aware, each one shows up as two or more funny-looking
characters. Unicode text is usually stored in a multibyte encoding such
as UTF-8, so one character can appear as several if the app doesn't
understand the encoding.
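
To make the "multiple funny characters" effect concrete, here is a small sketch, assuming Ruby 1.9 or later and assuming the viewer wrongly guesses Windows-1252 (the variable names are just for illustration). It takes the UTF-8 bytes of a left curly quote and reads them back as if each byte were its own character:

```ruby
left_quote = "\u201C"           # LEFT DOUBLE QUOTATION MARK
utf8_bytes = left_quote.bytes   # the bytes of its UTF-8 encoding

# Reinterpret those same bytes as Windows-1252, the way a viewer that
# guesses the wrong encoding would, then re-encode as UTF-8 for printing.
misread = utf8_bytes.pack("C*")
                    .force_encoding("Windows-1252")
                    .encode("UTF-8")

puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")  # bytes in hex
puts misread          # what the confused viewer displays
puts misread.length   # number of characters it displays
```

The single quote character is three bytes in UTF-8 (E2 80 9C), so the misreading viewer shows three separate characters in its place.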

If you try to use gsub to replace those characters, you'd have to
represent the characters somehow in code first, and you'd probably
have to use hexadecimal notation, because I'm pretty sure you won't
be able to type them in vi/emacs/TextMate. If you want to use gsub, you
might need to do something like blah.gsub(/[^[:print:]]/, ''), where
[^[:print:]] matches anything not printable.
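
A rough sketch of both ideas, assuming Ruby 1.9 or later and UTF-8 strings; the sample text and variable names are invented. One caveat worth knowing: on those Ruby versions the POSIX class [[:print:]] is Unicode-aware, so curly quotes count as printable and a "not printable" pattern leaves them alone. An explicit ASCII range is the blunt alternative:

```ruby
text = "He said \u201Chello\u201D\u2026 and left."

# Option 1: name the offending characters by code point and replace them.
curly_double = /[\u201C\u201D]/
straightened = text.gsub(curly_double, '"')

# Option 2: drop everything outside printable ASCII (space through tilde).
ascii_only = text.gsub(/[^\x20-\x7E]/, "")

# For comparison: curly quotes and the ellipsis are "printable" to a
# Unicode-aware regex engine, so this pattern removes nothing here.
print_filtered = text.gsub(/[^[:print:]]/, "")

puts straightened
puts ascii_only
puts print_filtered == text
```

Option 2 is lossy: it also throws away accented letters and anything else outside ASCII, so it only makes sense if the content is expected to be plain English.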

Rails handles Unicode decently, so I'm guessing you're not using a
Rails app, or maybe you're using some kind of storage that can't
handle Unicode properly.


On 10/2/07, Billy <[address removed]> wrote:
> Bad quotes? Like quotes that began but didn't end?
> It's hard to make your script context sensitive but at the very least you
> could count the number of double quotes in the text and make sure that
> number is not odd
>
> And what is wrong with ellipses? They are perfectly legal :)
>
> I don't really see a pattern of "bad characters" here, bro, but you could do
> a quick and dirty "strip this certain set of chars out of my string" by
> doing:
>
> bad_chars = %w{\.\.\. cheese spam budweiser}
> string = "I love budweiser beer, cheese fries and ... umm ... spam burgers!"
> string = string.gsub(/#{bad_chars.collect{|c| "(#{c})"}.join("|")}/i, '')
>
> (note that you need to write periods as \. in a regex; an unescaped "." is a
> special char that matches any character)
>
> bk
>
> On 10/2/07, Sumit < [address removed]> wrote:
> >
> > Ahoy hoy,
> >
> > I was just about to sit down and write a huge method that looks for and
> > replaces bad characters that are getting submitted in a form we have. I
> > thought it might be good to ping you guys to see if anyone has already seen a
> > solution or tried to make one themselves before I get knee deep in it.
> > Characters like bad quotes, ellipses, etc. Some of our biz guys have taken
> > to calling it the riddler issue... I swear, I did not create or propagate
> > this name! A lot of our athlete bloggers write their stuff in Microsoft
> > Word and then copy and paste it in, so we're seeing it more and more. Any
> > ideas, points in the right direction, or hitting me over the head and
> > telling me to just write it are greatly appreciated.
> >
> > Thanks in advance!
> > Sumit
> >
> > btw, I'm sorry for those of you seeing this twice, as I'm sending it to both
> > the SF and East Bay lists. =]
> >
> >
> >
> > --
> > Sumit N Gupta
> > [address removed]
> >
> >
> >
> > --
> > Please Note: If you hit "REPLY", your message will be sent to everyone on
> this mailing list ( [address removed])
> > This message was sent by Sumit ( [address removed]) from The East Bay
> Ruby Meetup Group.
> > To learn more about Sumit, visit his/her member profile
> > To unsubscribe or to update your mailing list settings, click here
> >
> > Meetup.com Customer Service: [address removed]
> > 632 Broadway New York NY 10012 USA







--
Sumit N Gupta
[address removed]
