
Harry W.


Hometown: London

Member since: January 14, 2013

What pipeline systems are you using or have used?

The dreaded custom framework.

What search technologies are you interested in?

Lucene, ElasticSearch

What problems are you trying to solve with pipelining?

Using pipelining for document normalization, article extraction, publication date extraction, link extraction, language detection, and deduplication. The problem we're trying to solve is doing all of the above in a high-throughput, scalable fashion without it falling apart when I'm asleep.


I work at Arachnys, gathering business intelligence data from every pathologically made site you could think of. This includes on-demand, user-simulated searches across business registries, publication date extraction from new articles, dirty data, etc.
