Re: [orlandophp] Health Insurance Web Services and/or Scraping

From: Tony T.
Sent on: Friday, September 21, 2012 3:47 PM
Jorge,

I may not have the most intelligent answer for you, because most of the time I've used this I've been scraping sites with rate-limiting protections. The last site I did required a 200 ms delay between requests; I had to do it that way or my IP would be blocked. I haven't had the pleasure of benchmarking against a site without such protections, with a pool of requests large enough to give accurate numbers. I do know that Scrapy uses an asynchronous approach built on Twisted, rather than a threaded one, and that it is faster than using curl.
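To make the "200 ms between requests, but asynchronous" idea concrete, here is a minimal sketch using Python's asyncio in place of Twisted (a swapped-in event loop, not Scrapy's actual internals). `MinIntervalThrottle`, `fake_fetch` and `crawl` are hypothetical names, and `fake_fetch` only simulates network latency. Request starts are spaced at least 200 ms apart, while everything runs on a single thread with no worker threads.

```python
import asyncio
import time


class MinIntervalThrottle:
    """Space request *starts* at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()
        self._last_start = float("-inf")

    async def wait(self) -> None:
        # The lock makes concurrent tasks take turns reserving a start slot.
        async with self._lock:
            delay = self._last_start + self.interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_start = time.monotonic()


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request: simulated latency only.
    await asyncio.sleep(0.05)
    return f"<html>{url}</html>"


async def crawl(urls, interval=0.2):
    throttle = MinIntervalThrottle(interval)
    starts = []

    async def worker(url):
        await throttle.wait()
        starts.append(time.monotonic())  # record when this request began
        return await fake_fetch(url)

    # Tasks run concurrently on one thread; a slow response does not delay
    # the next request beyond its scheduled start.
    pages = await asyncio.gather(*(worker(u) for u in urls))
    return pages, starts


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(3)]
    pages, starts = asyncio.run(crawl(urls))
    print([round(b - a, 2) for a, b in zip(starts, starts[1:])])
```

The delay is enforced between request starts rather than after each response, so slow responses overlap with later requests instead of stretching the whole crawl.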

-Tony

On Fri, Sep 21, 2012 at 3:04 PM, Jorge Colon <[address removed]> wrote:
Tony,

Had a look at the docs. How's this in terms of speed? Anything that involves a lot of parsing, especially of HTML and XML, is slow.

Regards,

Jorge Colon
Director of Web Development

Zend Certified Engineer

Sent from my mobile device

On Sep 21, 2012, at 12:07 AM, Tony Turner <[address removed]> wrote:

I've been using Scrapy for my web scraping projects and it's very tunable. You can change user agents, set delays between requests, set or even disable cookies, and tune a multitude of other options. You can export to CSV, JSON and other formats, and there are database connectors as well, though I've not gotten that fancy yet. It includes an interactive shell, so you can pretty easily work out the XPaths without having to recode, and the spider usually works the first time you run it. It's also Python, so it's super easy to set up.
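As a rough illustration of those knobs, here is a sketch of a Scrapy project `settings.py`. The setting names are taken from current Scrapy documentation. The values, including the User-Agent string and output file names, are illustrative assumptions, and the 200 ms delay mirrors the figure mentioned earlier in the thread.

```python
# settings.py - a sketch; every value here is an illustrative assumption.

# Send a custom User-Agent header (hypothetical string).
USER_AGENT = "Mozilla/5.0 (compatible; rate-sketch/0.1)"

# Wait 0.2 s (200 ms) between consecutive requests to the same site.
DOWNLOAD_DELAY = 0.2
# By default Scrapy varies the delay between 0.5x and 1.5x DOWNLOAD_DELAY;
# turn that off for a fixed 200 ms gap.
RANDOMIZE_DOWNLOAD_DELAY = False

# Disable the cookies middleware entirely (set True to keep a session).
COOKIES_ENABLED = False

# Export scraped items to JSON and CSV. The FEEDS setting needs Scrapy 2.1+;
# older releases used FEED_URI / FEED_FORMAT instead.
FEEDS = {
    "rates.json": {"format": "json"},
    "rates.csv": {"format": "csv"},
}
```

With this file in a project, `scrapy crawl <spider>` picks the settings up. `scrapy shell "<url>"` opens the interactive shell, where selectors such as `response.xpath(...)` can be tried before they go into the spider.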

On Sep 20,[masked]:30 PM, "Joseph Persie" <[address removed]> wrote:
Off Topic
Don't abuse the mailing list; it's a great resource and should be used only when truly needed.

Intention

I'm building a health insurance rate request application that requires the following parameters:
zip, gender, age

With these arguments, I want to retrieve near-accurate rates from various insurance companies.
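One way to pin down those three inputs before calling any service or scraper is a small validated value object. This is a hypothetical sketch. `RateQuery` is an invented name, and so are the "M"/"F" gender encoding and the 5-digit ZIP rule. The query-string keys simply reuse the parameter names from the post, and a real rate API may expect different ones.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class RateQuery:
    """Hypothetical container for the three inputs the rate lookup needs."""

    zip_code: str
    gender: str  # "M" or "F" (assumed encoding)
    age: int

    def __post_init__(self) -> None:
        # Basic sanity checks before hitting any remote service.
        if not (len(self.zip_code) == 5 and self.zip_code.isdigit()):
            raise ValueError(f"invalid 5-digit ZIP code: {self.zip_code!r}")
        if self.gender not in ("M", "F"):
            raise ValueError(f"unsupported gender code: {self.gender!r}")
        if not 0 <= self.age <= 120:
            raise ValueError(f"age out of range: {self.age}")

    def to_query_string(self) -> str:
        # Keys mirror the post (zip, gender, age); a real API may differ.
        return urlencode({"zip": self.zip_code, "gender": self.gender, "age": self.age})


if __name__ == "__main__":
    print(RateQuery("32801", "F", 34).to_query_string())
```

Validating up front means a typo in a ZIP code fails locally instead of producing a confusing empty result from a remote site.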

Web Services?
Unfortunately, even in the 21st century, people are still using file cabinets!
Does anyone tap into health insurance web services?
I've tried contacting:

eHealth
It seems they are preoccupied with their corporate customers.

eligibleAPI
They require too many parameters and seem geared toward hospitals and insurance agents.

So if you know of a service that I can send the following parameters...

Scrape time!

Additionally, let's say I wanted the rates from the following:

After having a gander at the source, I see there is CSRF protection in place.
Using curl with an HTML DOM parser should get me to the rates I'm looking for.
What is the best way to mimic a legitimate browser user via curl so that the CSRF check accepts my requests?
I understand curl has a cookie jar, but I was curious whether someone had a gist of an implementation they could point me to.
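The usual pattern is a two-step flow. First, GET the form page with a cookie-aware client so the session cookie the token is tied to gets stored. Then copy the form's hidden fields, token included, into the POST and send it with the same cookies. In PHP's curl extension the cookie part is handled by `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE`. The sketch below shows the same flow with Python's standard library. The URLs, the form fields and the function names (`hidden_fields`, `build_opener_with_cookies`, `submit_quote_form`) are hypothetical, and a real site may add checks this does not cover (JavaScript-set tokens, Referer checks).

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> fields."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html: str) -> dict:
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_opener_with_cookies():
    # The CookieJar plays the role of curl's cookie jar: cookies set by the
    # GET (e.g. the session the CSRF token belongs to) are replayed later.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; rate-sketch)")]
    return opener


def submit_quote_form(form_url: str, post_url: str, params: dict) -> str:
    """Hypothetical flow: GET the form, copy hidden fields (token), POST."""
    opener = build_opener_with_cookies()
    with opener.open(form_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Hidden fields first, then the user's own values (zip, gender, age...).
    data = {**hidden_fields(html), **params}
    body = urllib.parse.urlencode(data).encode()
    with opener.open(post_url, data=body) as resp:  # data= makes it a POST
        return resp.read().decode("utf-8", errors="replace")
```

The token parsing is kept separate from the network code so it can be checked against a saved copy of the page before any live requests are made.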

Thanks!





--
Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list ([address removed])
This message was sent by Joseph Persie ([address removed]) from The Orlando PHP User Group.
To learn more about Joseph Persie, visit his/her member profile
Set my mailing list to email me As they are sent | In one daily email | Don't send me mailing list messages

Meetup, PO Box 4668 #37895 New York, New York[masked] | [address removed]




--
This message was sent by Tony Turner ([address removed]) from The Orlando PHP User Group.




--
This message was sent by Jorge Colon ([address removed]) from The Orlando PHP User Group.



--
Tony Turner
OWASP Orlando Chapter Founder/Co-Leader
[address removed]

