Re: [php-49] analytics logging - where to store data

From: Mark S.
Sent on: Thursday, November 15, 2012 3:04 PM
I like the idea of "roll up" tables. If I understand you correctly, I would log raw data, say:

activities_logging_table

user_id
activity_id
session
created_on

And then my roll-up table would just hold up-to-date statistics, which would save the system from having to run through all the raw data:

activities_roll_up_table

user_id
activity_id
total_opens
created_on
updated_on

Is that sorta what you mean by a roll-up table? I think you were also referring to archiving tables, with the summary information from those archived tables kept available in the roll-up tables ....
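
Here is a minimal sketch of how the two tables above could work together. It is only an illustration, not a tested design: SQLite (via Python's sqlite3 module) stands in for MySQL so the example runs anywhere, the column types and the (user_id, activity_id) primary key are assumptions, and log_open and total_opens are hypothetical helper names.

```python
import sqlite3

# SQLite stands in for MySQL; column types and the primary key are assumptions.
SCHEMA = """
CREATE TABLE activities_logging_table (
    user_id     INTEGER NOT NULL,
    activity_id INTEGER NOT NULL,
    session     TEXT    NOT NULL,
    created_on  TEXT    NOT NULL
);
CREATE TABLE activities_roll_up_table (
    user_id     INTEGER NOT NULL,
    activity_id INTEGER NOT NULL,
    total_opens INTEGER NOT NULL,
    created_on  TEXT    NOT NULL,
    updated_on  TEXT    NOT NULL,
    PRIMARY KEY (user_id, activity_id)
);
"""


def log_open(conn, user_id, activity_id, session, now):
    """Append one raw row and bump the matching roll-up counter."""
    with conn:  # single transaction, so both tables stay in step
        conn.execute(
            "INSERT INTO activities_logging_table VALUES (?, ?, ?, ?)",
            (user_id, activity_id, session, now),
        )
        cur = conn.execute(
            "UPDATE activities_roll_up_table "
            "SET total_opens = total_opens + 1, updated_on = ? "
            "WHERE user_id = ? AND activity_id = ?",
            (now, user_id, activity_id),
        )
        if cur.rowcount == 0:  # first open for this user/activity pair
            conn.execute(
                "INSERT INTO activities_roll_up_table VALUES (?, ?, 1, ?, ?)",
                (user_id, activity_id, now, now),
            )


def total_opens(conn, user_id, activity_id):
    """Read the statistic from the roll-up table, never from the raw log."""
    row = conn.execute(
        "SELECT total_opens FROM activities_roll_up_table "
        "WHERE user_id = ? AND activity_id = ?",
        (user_id, activity_id),
    ).fetchone()
    return row[0] if row else 0


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_open(conn, 1, 10, "sess-a", "2012-11-15 15:04:00")
log_open(conn, 1, 10, "sess-b", "2012-11-15 15:10:00")
log_open(conn, 2, 10, "sess-c", "2012-11-15 15:12:00")
print(total_opens(conn, 1, 10))
```

Reports would then read only from the roll-up table, so their cost depends on the number of user/activity pairs rather than on the number of raw log rows.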


On Thu, Nov 15, 2012 at 1:08 PM, Dan Munro <[address removed]> wrote:
I've known people who have built similar systems and a couple of the main sticking points I learned from them:

* You'll be inserting a lot of information, whatever db you use should have good raw insert speed.

* Have archive tables/sections that you can cycle information to as it becomes less useful. Don't delete anything.

* Include roll up tables/sections that just hold statistics, so that when you archive data you keep statistics about it and can query those roll up tables instead, which will keep the queries fast.
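
The archive and roll-up points above could look roughly like the sketch below. It is an illustration under assumptions rather than a recommended implementation: SQLite (Python's sqlite3 module) stands in for whatever database is used, and the table names activity_log, activity_log_archive, and activity_rollup, as well as the helper archive_older_than, are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a live log, an archive with the same shape, and a roll-up.
conn.executescript("""
CREATE TABLE activity_log         (user_id INTEGER, activity_id INTEGER, created_on TEXT);
CREATE TABLE activity_log_archive (user_id INTEGER, activity_id INTEGER, created_on TEXT);
CREATE TABLE activity_rollup (
    user_id INTEGER, activity_id INTEGER, total_opens INTEGER,
    PRIMARY KEY (user_id, activity_id)
);
""")


def archive_older_than(conn, cutoff):
    """Move rows older than cutoff into the archive table; nothing is discarded."""
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO activity_log_archive "
            "SELECT user_id, activity_id, created_on "
            "FROM activity_log WHERE created_on < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM activity_log WHERE created_on < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows moved out of the live table


# Demo data: ISO-formatted dates compare correctly as plain strings.
with conn:
    conn.executemany(
        "INSERT INTO activity_log VALUES (?, ?, ?)",
        [(1, 10, "2011-06-01"), (1, 10, "2012-03-01"), (2, 10, "2012-11-15")],
    )
    # Build the statistics once, before anything is archived.
    conn.execute(
        "INSERT INTO activity_rollup "
        "SELECT user_id, activity_id, COUNT(*) FROM activity_log "
        "GROUP BY user_id, activity_id"
    )

moved = archive_older_than(conn, "2012-01-01")
live = conn.execute("SELECT COUNT(*) FROM activity_log").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM activity_log_archive").fetchone()[0]
rollup_total = conn.execute(
    "SELECT total_opens FROM activity_rollup WHERE user_id = 1 AND activity_id = 10"
).fetchone()[0]
print(moved, live, archived, rollup_total)
```

The roll-up row still counts the archived open, so queries against it stay correct and fast while the live table shrinks.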


On Thu, Nov 15, 2012 at 1:00 PM, Will Wheeler <[address removed]> wrote:
as an addendum, you could consider using external sources like mixpanel or google analytics instead of building out a custom system.


On Thu, Nov 15, 2012 at 12:58 PM, Will Wheeler <[address removed]> wrote:
This is part of the big data problem that a lot of major companies are dealing with. Chances are good that you won't ever generate terabytes of data on a daily basis, but you can still generate quite a bit and have concerns. For example, I regularly work with data sets covering 100m users that run to about 20gb in file size.

You are definitely right to think about the future, and there is a lot to consider. Here is a cool article I read just the other day; it has some comparisons of data warehouse/query speeds: http://37signals.com/svn/posts/3315-how-i-came-to-love-big-data-or-at-least-acknowledge-its-existence

MySQL seems to be the norm right now, coupled with tools like Hadoop/Hive/AWS. I don't work with the data warehousing side at an e-commerce level, so I can't comment on the db architecture. If I were to approach it, though, I would definitely try to keep a series of static tables (customer, job, page, etc.), not just an annual one. By segmenting the data and using GUIDs with proper keys you should be able to keep things moving along fast enough. Also, remember that for a lot of reporting/analysis you shouldn't be working off a production server. I keep a small, slightly outdated data set running on a local MSSQL server so I can bog it down without pissing off the bosses when I need new queries.




On Thu, Nov 15, 2012 at 11:50 AM, Mark Steudel <[address removed]> wrote:
BACKGROUND

So I'm working on a project that is all about tracking user activity on a site:

1. How much time spent on site
2. How many activities a user has started and finished
3. How many times a user has come to the site
4. etc.

And of course they want lots of reporting on all these aspects.

The site gets a fair amount of traffic, and I can see all this logging
creating quite a bit of data, especially over years.

QUESTIONS

1. Not knowing much of anything about NoSQL implementations, would
NoSQL be worth exploring for this project?

2. If I stick with MySQL, are there ways I can architect it from the
get-go so that it won't be a bear to work with in, say, 3 years (i.e. one
ginormous table)? I was thinking of either a business-process
retention policy (i.e. we only need to worry about a year's worth of
data) or database-level retention (i.e. per-year tables that each store
a year's worth of data: user_time_2012, user_time_2013).
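
For what it's worth, the per-year table idea could be sketched as below. This is only an illustration under assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the columns (user_id, seconds, created_on) and the helpers table_for and log_time are hypothetical, and only the user_time_YYYY naming comes from the question above.

```python
import sqlite3
from datetime import datetime


def table_for(ts):
    """Return the per-year table name for a timestamp, e.g. user_time_2012."""
    # Built from an integer year, so it is safe to interpolate into SQL.
    return "user_time_%d" % ts.year


def log_time(conn, user_id, seconds, ts):
    """Insert a row into the table for ts's year, creating it on first use."""
    table = table_for(ts)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS %s "
            "(user_id INTEGER, seconds INTEGER, created_on TEXT)" % table
        )
        conn.execute(
            "INSERT INTO %s VALUES (?, ?, ?)" % table,
            (user_id, seconds, ts.isoformat(" ")),
        )
    return table


conn = sqlite3.connect(":memory:")
log_time(conn, 1, 120, datetime(2012, 11, 15, 11, 50))
log_time(conn, 1, 300, datetime(2013, 1, 2, 9, 0))

# List the tables that now exist, one per year seen so far.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

One trade-off to keep in mind: any report that spans years has to combine the yearly tables (for example with UNION ALL). MySQL's built-in RANGE partitioning is an alternative that keeps a single logical table while still splitting the storage by date.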

Anyway would love to get some feedback on this.

Thanks, Mark


--
-----------------------------------------
Mark Steudel
P: [masked]
F: [masked]
[address removed]

. : Work : .
http://www.mindfulinteractive.com

. : Play : .
http://www.steudel.org/blog

. : LinkedIn : .
http://www.linkedin.com/in/steudel



--
Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list ([address removed])
http://www.meetup.com/php-49/
This message was sent by Mark Steudel ([address removed]) from The Seattle PHP Meetup Group.
To learn more about Mark Steudel, visit his/her member profile: http://www.meetup.com/php-49/members/2764664/










--
From the desk of Dan Munro



