I would be suspicious of most tech talks on this. If someone is giving a tech talk on their analytics system, they are either working at enormous scale (Facebook, Google), selling something (Splunk), or over-engineering their system (many startups).
I second the advice elsewhere in this thread: log it into PostgreSQL. If you start overloading that, look into sampling your data before you look into a fancier system. Make sure each row has identifiers for every entity it is part of: user, session, web request. If you're not building replicated PostgreSQL (which you probably won't need for this), log to files first and build another little process that tails the files and loads the rows into PostgreSQL. That log-then-load advice is hard-learned experience from working at Splunk.
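For example, a minimal events table with those identifiers might look like this (names are illustrative, not a prescription):

    -- One row per event, with an identifier for each entity the event
    -- belongs to: user, session, web request.
    CREATE TABLE events (
        event_id    bigserial PRIMARY KEY,
        user_id     bigint,
        session_id  uuid,
        request_id  uuid,
        event_type  text NOT NULL,
        occurred_at timestamptz NOT NULL DEFAULT now(),
        properties  jsonb
    );

    -- Indexes on the identifiers keep per-user / per-session questions cheap.
    CREATE INDEX ON events (user_id, occurred_at);
    CREATE INDEX ON events (session_id);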
That's actually what we do at Rakam. PostgreSQL fits many analytics workloads with partitioned tables, parallel queries, and BRIN indexes. The only limitation is that it's not horizontally scalable, so your data must fit on one server. `it just works` up to ~10M events per month.
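A minimal sketch of that setup, with monthly range partitions and a BRIN index on the timestamp (names are illustrative):

    -- Parent table partitioned by month; BRIN indexes stay tiny even on
    -- very large, append-only event data.
    CREATE TABLE events (
        occurred_at timestamptz NOT NULL,
        event_type  text NOT NULL,
        properties  jsonb
    ) PARTITION BY RANGE (occurred_at);

    CREATE TABLE events_2019_01 PARTITION OF events
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

    CREATE INDEX ON events_2019_01 USING brin (occurred_at);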
The SDKs provide ways to send event data as a JSON blob: an event type plus a set of attributes.
The event types and attributes are all dynamic. The API server automatically infers the attribute types from the JSON blob, creates a table for each event type with a column for each attribute, and inserts the data into that table. It also enriches the events with visitor information such as user agent, location, and referrer. Users then just run SQL queries like:
SELECT url, count(*) FROM pageview WHERE _city = 'New York' GROUP BY 1
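That query runs against an auto-created pageview table that looks roughly like this (simplified, with illustrative names for the enrichment columns):

    -- One column per event attribute, plus enriched visitor fields.
    CREATE TABLE pageview (
        url          text,          -- attribute inferred from the JSON blob
        _time        timestamptz,   -- event time
        _user_agent  text,          -- enrichment: user agent
        _city        text,          -- enrichment: location
        _referrer    text           -- enrichment: referrer
    );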
To scale a PG db horizontally, you may want to look at https://www.citusdata.com/ (they were recently acquired by Microsoft, but I don't expect any change to the open-source part).
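The basics look something like this (user_id as the distribution key is just an example):

    -- With the Citus extension installed, shard an existing table
    -- across worker nodes by a distribution column.
    CREATE EXTENSION citus;
    SELECT create_distributed_table('events', 'user_id');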
This also makes a lot of things really easy. Want to join against your products table to see what product categories are most popular for certain customer segments? It’s a single query or Tableau drag-and-drop away. You don’t know what you’ll need to access fast to answer business questions, so use a system designed for flexibility until you can’t.
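Something like this, with made-up table and column names:

    -- Most popular product categories for one customer segment.
    SELECT p.category, count(*) AS views
    FROM events e
    JOIN products  p ON p.id = e.product_id
    JOIN customers c ON c.id = e.user_id
    WHERE c.segment = 'enterprise'
      AND e.event_type = 'product_view'
    GROUP BY p.category
    ORDER BY views DESC;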
Dumping out to a text file quickly and having some asynchronous queue insert it into your database is one solution, but you have to watch unique identifiers, reconciliation, and handling of failed inserts so you can fix and re-inject any failed records.
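One way to keep re-injection safe is to give every log record a unique identifier and make the load idempotent, e.g. (sketch):

    -- A unique event id makes replaying a failed or partial batch harmless.
    CREATE TABLE raw_events (
        event_uuid uuid PRIMARY KEY,   -- generated by the app when the line is logged
        payload    jsonb NOT NULL,
        logged_at  timestamptz NOT NULL
    );

    INSERT INTO raw_events (event_uuid, payload, logged_at)
    VALUES ('9b2f6c1a-4d3e-4f5a-8b7c-1d2e3f4a5b6c', '{"type": "pageview"}', now())
    ON CONFLICT (event_uuid) DO NOTHING;   -- duplicates from a replay are ignored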