
Re: [php-49] analytics logging - where to store data

From: Will W.
Sent on: Thursday, November 15, 2012 12:58 PM
This is part of the big data problem that a lot of major companies are dealing with. Chances are good that you won't ever generate terabytes of data on a daily basis, but you can still generate quite a bit and have concerns. For example, I regularly work with 100M-user data sets that are in the 20 GB file size range.

And you are definitely right to think about the future; there is a lot to consider. Here is a cool article I read just the other day, it has some comparisons of data warehouse/query speeds:

MySQL seems to be the norm right now, coupled with tools like Hadoop/Hive/AWS. I don't work with the data warehousing side at an e-commerce level, so I can't comment on the DB architecture. If I were to approach it, I would definitely try to keep a series of static tables (customer, job, page, etc.), not just an annual one. By segmenting the data and using GUIDs with proper keys, you should be able to keep things moving along fast enough. Also, remember that for a lot of reporting/analysis you shouldn't be working off a production server. I keep a small, slightly outdated data set on a local MSSQL server so I can bog it down without pissing off the bosses when I need new queries.
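
As a rough illustration of the "static tables with GUIDs and proper keys" idea, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL so it can run anywhere; the table names, columns, the `uuid4` keys, and the helper functions are all assumptions made up for this example, not a schema anyone in the thread described.

```python
import sqlite3
import uuid


def new_id():
    """Return a GUID-style text key (uuid4 is an assumption for illustration)."""
    return str(uuid.uuid4())


def create_schema(conn):
    """Create a few small static tables instead of one wide log table."""
    conn.executescript(
        """
        CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE page (id TEXT PRIMARY KEY, url TEXT NOT NULL UNIQUE);
        CREATE TABLE page_view (
            id TEXT PRIMARY KEY,
            customer_id TEXT NOT NULL REFERENCES customer (id),
            page_id TEXT NOT NULL REFERENCES page (id),
            viewed_at TEXT NOT NULL
        );
        -- Index the columns the reports filter and group on.
        CREATE INDEX idx_page_view_customer
            ON page_view (customer_id, viewed_at);
        """
    )


def views_per_customer(conn):
    """Return {customer name: number of page views}."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(v.id)
        FROM customer AS c
        LEFT JOIN page_view AS v ON v.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()
    return dict(rows)


# Hypothetical sample data, only to exercise the schema.
conn = sqlite3.connect(":memory:")
create_schema(conn)
alice, bob, home = new_id(), new_id(), new_id()
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [(alice, "alice"), (bob, "bob")]
)
conn.execute("INSERT INTO page VALUES (?, ?)", (home, "/home"))
conn.executemany(
    "INSERT INTO page_view VALUES (?, ?, ?, ?)",
    [
        (new_id(), alice, home, "2012-11-15T11:50:00"),
        (new_id(), alice, home, "2012-11-15T12:58:00"),
        (new_id(), bob, home, "2012-11-15T12:00:00"),
    ],
)
print(views_per_customer(conn))
```

The same layout should carry over to MySQL with the usual type changes; the point is only that the narrow fact table holds keys and timestamps while the descriptive data lives in the small lookup tables.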

On Thu, Nov 15, 2012 at 11:50 AM, Mark Steudel <[address removed]> wrote:

So I'm working on a project that is all about tracking user activity on a site:

1. How much time spent on site
2. How many activities a user has started or finished
3. How many times a user has come to the site
4. etc.

And of course they want lots of reporting on all these aspects.

The site gets a fair amount of traffic and I can see all this logging
creating quite a bit of traffic especially over years.


1. Not knowing much of anything about NoSQL implementations, would
this project be worth exploring NoSQL for?

2. If I stick with MySQL, are there some ways I can architect it from the
get-go so that it won't be a bear to work with in, say, 3 years (i.e. some
ginormous table)? I was thinking of either trying to get some sort of
business process retention policy (i.e. we only need to worry about a
year's worth of data) or database retention (i.e. dynamic tables that
store a year's worth of data: user_time_2012, user_time_2013).
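
The per-year table idea above could look roughly like the following. This is only a sketch under stated assumptions: it uses Python's sqlite3 in place of MySQL, and the column layout and the helpers `table_for`, `log_time`, and `drop_years_before` are hypothetical names invented for this example. Only the `user_time_<year>` naming comes from the question itself.

```python
import sqlite3
from datetime import datetime


def table_for(ts):
    """Map a timestamp to its yearly table name, e.g. user_time_2012."""
    return f"user_time_{ts.year}"


def log_time(conn, user_id, seconds, ts):
    """Insert one row into the table for the year of ts, creating it if needed."""
    # The table name is built from an integer year, never from user input.
    table = table_for(ts)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "user_id INTEGER NOT NULL, "
        "seconds INTEGER NOT NULL, "
        "logged_at TEXT NOT NULL)"
    )
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        (user_id, seconds, ts.isoformat()),
    )


def yearly_tables(conn):
    """List the existing user_time_<year> tables, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'user_time_%'"
    ).fetchall()
    return sorted(name for (name,) in rows)


def drop_years_before(conn, keep_from_year):
    """Retention policy: drop yearly tables older than keep_from_year."""
    dropped = []
    for name in yearly_tables(conn):
        suffix = name.rsplit("_", 1)[1]
        if suffix.isdigit() and int(suffix) < keep_from_year:
            conn.execute(f"DROP TABLE {name}")
            dropped.append(name)
    return dropped


# Hypothetical sample data spanning three years.
conn = sqlite3.connect(":memory:")
log_time(conn, 1, 300, datetime(2011, 6, 1, 9, 0))
log_time(conn, 1, 120, datetime(2012, 11, 15, 11, 50))
log_time(conn, 2, 45, datetime(2013, 1, 2, 8, 30))
print(drop_years_before(conn, 2012))
print(yearly_tables(conn))
```

Dropping a whole table is much cheaper than deleting a year of rows from one huge table, which is the main appeal of this layout. MySQL's native range partitioning is another way to get a similar effect from a single logical table, though reports that span years then need to be checked against how the partitions are defined.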

Anyway, I would love to get some feedback on this.

Thanks, Mark

Mark Steudel
P: [masked]
F: [masked]
[address removed]

This message was sent by Mark Steudel ([address removed]) from The Seattle PHP Meetup Group.
