Re: [ruby-81] Parsing bad characters out of a user submitted chunk of text

From: user 3.
Sent on: Wednesday, October 3, 2007 12:27 AM
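Yeah, if you want to quickly strip everything except plain printable
ASCII out of your Unicode text, you can do the following:

"unicode text".gsub(/[^\x20-\x7e]/, '')

Printable ASCII runs from 0x20 (space) to 0x7E (~); starting the range
at 0x21 would strip out the spaces too. Note that this also removes tabs
and line breaks. It might be useful to consult a Unicode-to-ASCII table
here. The left double quote (U+201C) is encoded in UTF-8 as the three
bytes 0xE2, 0x80, 0x9C, which an app that isn't Unicode-aware shows as
3 separate characters... no wonder the output looks weird!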
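To see where the weird output comes from, here is a minimal sketch assuming Ruby 1.9 or later, where strings carry an encoding. It prints the three UTF-8 bytes behind the left double quote, shows how those bytes read when misinterpreted as Windows-1252 (assumed here as the culprit encoding), and strips non-ASCII while keeping spaces (the range starts at 0x20), tabs and line breaks. `strip_non_ascii` is a hypothetical name.

```ruby
# Assumes Ruby 1.9+ with UTF-8 source and strings.
LEFT_DOUBLE_QUOTE = "\u201C"

# Hypothetical helper: keep printable ASCII (space through "~") plus
# tabs and line breaks; drop everything else.
def strip_non_ascii(text)
  text.gsub(/[^\t\r\n\x20-\x7e]/, '')
end

# The UTF-8 bytes behind the left double quote
p LEFT_DOUBLE_QUOTE.bytes.map { |b| format('0x%02X', b) }
# => ["0xE2", "0x80", "0x9C"]

# The same three bytes misread as Windows-1252 become three characters
puts LEFT_DOUBLE_QUOTE.dup.force_encoding('Windows-1252').encode('UTF-8')
# prints â€œ

p strip_non_ascii("\u201CHello,\u201D she said\u2026")
# => "Hello, she said"
```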

Dean



On 10/2/07, Sumit <[address removed]> wrote:
> It actually is a Rails app. The issue shows up in IE when people view
> the RSS feeds directly in the browser. An example would be:
> http://www.yardba...
>
> Actually, thinking about this a little more and a few more Google searches
> later, it's more of an issue with IE not handling Unicode very well. A
> solution might be converting to a different encoding, or doing
> search-and-replace on offending characters as we find them, like the other
> reply does in PHP.
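The encoding-conversion option can be sketched with `String#encode`, assuming Ruby 1.9 or later. Converting to US-ASCII with `xml: :text` escapes `&`, `<` and `>` and turns every character ASCII can't represent into a hexadecimal character reference such as `&#x201C;`, which an RSS reader should decode back into the curly quote; `undef: :replace` with an empty replacement simply drops those characters instead. Both method names are hypothetical.

```ruby
# Assumes Ruby 1.9+ (String#encode with an options hash).

# Hypothetical helper: keep curly quotes etc. as numeric character
# references (&#x201C;), and escape &, <, > for XML text.
def to_ascii_xml(text)
  text.encode('US-ASCII', xml: :text)
end

# Hypothetical helper: silently drop anything US-ASCII can't represent.
def to_ascii_dropping(text)
  text.encode('US-ASCII', undef: :replace, replace: '')
end

sample = "\u201CRiddler\u201D & friends"
puts to_ascii_xml(sample)      # curly quotes become references, & becomes &amp;
puts to_ascii_dropping(sample) # prints: Riddler & friends
```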
>
> At the very least this reveals I need to do some homework on my encodings!
>
>
> On 10/2/07, Dean <[address removed] > wrote:
> > Actually, I'm pretty sure what Sumit is talking about is the curly
> > quotes that aren't part of the standard ASCII table: the left and
> > right quotes commonly seen. Standard ASCII has only the one straight
> > double quote character to represent both left and right quotes. If you
> > use these non-ASCII characters in strings in apps that aren't Unicode
> > aware, you'll see each one show up as 2 or more funny-looking characters.
> > UTF-8 is a multibyte encoding, so a single character can appear as
> > several characters if the app doesn't understand Unicode.
> >
> > If you try to use gsub to replace those characters, you'd have to
> > represent the characters somehow in code first, and you'd probably
> > have to use hexadecimal notation, because I'm pretty sure you won't
> > be able to type them in vi/emacs/TextMate. If you want to use gsub, you
> > might need to do something like blah.gsub(/[^[:print:]]/, '') where
> > [^[:print:]] matches anything that isn't printable.
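A sketch of the hex-notation approach, assuming Ruby 1.9 or later, where `\u` escapes work in string literals. The `WORD_PUNCTUATION` table and `asciify` are hypothetical names, and the table covers only a few common Word substitutions rather than every case. One caveat: the POSIX class needs its own brackets (`/[^[:print:]]/`), and on UTF-8 strings in Ruby 1.9+ it is Unicode-aware, so curly quotes count as printable and that filter alone won't remove them; the last line of the sketch checks this.

```ruby
# Assumes Ruby 1.9+; the table below is illustrative, not exhaustive.
WORD_PUNCTUATION = {
  "\u2018" => "'",   # left single quote
  "\u2019" => "'",   # right single quote / apostrophe
  "\u201C" => '"',   # left double quote
  "\u201D" => '"',   # right double quote
  "\u2013" => '-',   # en dash
  "\u2014" => '--',  # em dash
  "\u2026" => '...'  # horizontal ellipsis
}.freeze

# Regexp.union builds one alternation that matches any of the keys.
WORD_PUNCTUATION_RE = Regexp.union(WORD_PUNCTUATION.keys)

# Hypothetical helper: gsub with a Hash looks each match up in the table.
def asciify(text)
  text.gsub(WORD_PUNCTUATION_RE, WORD_PUNCTUATION)
end

puts asciify("\u201CIt\u2019s late\u2026\u201D")  # prints: "It's late..."

# The POSIX class keeps curly quotes on UTF-8 strings:
p "\u201Chi\u201D".gsub(/[^[:print:]]/, '') == "\u201Chi\u201D"  # => true
```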
> >
> > Rails handles Unicode decently, so I'm guessing you're not using a
> > Rails app, or maybe you're using some kind of storage that can't
> > handle Unicode properly.
> >
> >
> > On 10/2/07, Billy <[address removed]> wrote:
> > > Bad quotes? Like quotes that began but didn't end?
> > > It's hard to make your script context-sensitive, but at the very least
> > > you could count the number of double quotes in the text and make sure
> > > that number is not odd.
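Billy's odd-count check is a one-liner with `String#count`. Since the text comes from Word, this sketch (the method name `balanced_quotes?` is hypothetical) also compares opening and closing curly quotes. Both checks are rough heuristics, not a parser.

```ruby
# Hypothetical heuristic: flag text whose double quotes don't pair up.
def balanced_quotes?(text)
  straight_ok = text.count('"').even?
  # Word's curly quotes come in distinct opening/closing forms.
  curly_ok = text.count("\u201C") == text.count("\u201D")
  straight_ok && curly_ok
end

p balanced_quotes?('He said "hi" twice')          # => true
p balanced_quotes?('He said "hi twice')           # => false
p balanced_quotes?("\u201CUnclosed curly quote")  # => false
```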
> > >
> > > And what is wrong with ellipses? They are perfectly legal :)
> > >
> > > I don't really see a pattern of "bad characters" here, bro, but you
> > > could do a quick and dirty "strip this certain set of chars out of my
> > > string" by doing:
> > >
> > > bad_chars = %w{\.\.\. cheese spam budweiser}
> > > string = "I love budweiser beer, cheese fries and ... umm ... spam burgers!"
> > > string = string.gsub(/#{bad_chars.collect{|c| "(#{c})"}.join("|")}/i, '')
> > >
> > > (Note that you need to write periods as \. in a regex; "." is a special
> > > character that matches any character except a newline.)
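An alternative sketch of the same filter: `Regexp.union` escapes each string for you, so the periods in `...` no longer need backslashes. Rebuilding the regex from its source adds case-insensitive matching like the `/i` flag above. The word list is Billy's; the names `BAD_WORDS`, `BAD_WORDS_RE` and `strip_bad_words` are hypothetical, and `squeeze(' ')` to tidy the leftover double spaces is an optional extra.

```ruby
BAD_WORDS = ['...', 'cheese', 'spam', 'budweiser']

# Regexp.union escapes special characters such as "."; rebuilding from
# its source lets us add case-insensitive matching.
BAD_WORDS_RE = Regexp.new(Regexp.union(BAD_WORDS).source, Regexp::IGNORECASE)

# Hypothetical helper: remove the listed words, then collapse runs of spaces.
def strip_bad_words(text)
  text.gsub(BAD_WORDS_RE, '').squeeze(' ')
end

puts strip_bad_words('I love budweiser beer, cheese fries and ... umm ... spam burgers!')
# prints: I love beer, fries and umm burgers!
```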
> > >
> > > bk
> > >
> > > On 10/2/07, Sumit < [address removed]> wrote:
> > > >
> > > > Ahoy hoy,
> > > >
> > > > I was just about to sit down and write a huge method that looks for
> > > > and replaces bad characters that are getting submitted in a form we
> > > > have. I thought it might be good to ping you guys to see if anyone has
> > > > already seen a solution or tried to make one themselves before I get
> > > > knee-deep in it. Characters like bad quotes, ellipses, etc. Some of our
> > > > biz guys have taken to calling it the riddler issue... I swear, I did
> > > > not create or propagate this name! A lot of our athlete bloggers write
> > > > their stuff in Microsoft Word and then copy and paste it in, so we're
> > > > seeing it more and more. Any ideas, pointers in the right direction, or
> > > > hitting me over the head and telling me to just write it are greatly
> > > > appreciated.
> > > >
> > > > Thanks in advance!
> > > > Sumit
> > > >
> > > > btw, I'm sorry for those of you seeing this twice, as I'm sending it
> > > > to both the SF and East Bay lists. =]
> > > >
> > > >
> > > >
> > > > --
> > > > Sumit N Gupta
> > > > [address removed]
> > > >
> > > >
> > > >
> > > > --
> > > > Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list ( [address removed])
> > > > This message was sent by Sumit ( [address removed]) from The East Bay Ruby Meetup Group.
> > > > To learn more about Sumit, visit his/her member profile
> > > > To unsubscribe or to update your mailing list settings, click here
> > > >
> > > > Meetup.com Customer Service: [address removed]
> > > > 632 Broadway New York NY 10012 USA
> >
>
>
>
> --
> Sumit N Gupta
> [address removed]
