As an aside, I experimented a bit with the build process for this slide deck. It's a GitLab project, and I set up a GitLab CI pipeline that runs LaTeX to build the beamer presentation and serves it from GitLab Pages. So you can always see the most recent version of the slide deck at https://_stark.gitlab.io/monitoring-postgres-pgconf.eu-2018/monitoring.pdf, and it's automatically rebuilt each time I do a git push.
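For anyone curious what such a pipeline looks like, here is a minimal sketch of a `.gitlab-ci.yml` that builds a beamer PDF and publishes it via GitLab Pages. The Docker image and file names are assumptions for illustration, not the actual project's configuration.

```yaml
# Hypothetical .gitlab-ci.yml sketch: compile the beamer slides with LaTeX
# and publish the resulting PDF via GitLab Pages.
pages:
  image: texlive/texlive:latest   # assumed image providing pdflatex
  script:
    - pdflatex monitoring.tex
    - pdflatex monitoring.tex     # second pass to resolve cross-references
    - mkdir -p public
    - mv monitoring.pdf public/
  artifacts:
    paths:
      - public                    # GitLab Pages serves the public/ directory
  only:
    - master
```

The job must be named `pages` and expose a `public/` artifact for GitLab Pages to pick it up; everything else is ordinary CI configuration.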
You can also see the source code at https://gitlab.com/_stark/monitoring-postgres-pgconf.eu-2018. Feel free to submit issues if you spot any errors in the slides, or even just suggestions about anything that was unclear. **
But now the real question. I want to improve the monitoring situation in Postgres. I have all kinds of grand plans, but I would be interested to hear people's feelings about what the top priorities are and which changes to Postgres would be most practical.
Personally, I think the most important first step is to implement native Prometheus support in Postgres -- probably a background worker that would start up and expose all of pg_stats directly from shared memory to Prometheus, without having to start an SQL session with all its transaction overhead. That would make things more efficient, but also more reliable during outages. It would also make it possible to export data for all databases, instead of requiring the agent to reconnect to each database!
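To make the idea concrete, here is a small sketch of the Prometheus text exposition format that such a `/metrics` endpoint would serve. The metric and label names below are hypothetical stand-ins for the pg_stats counters, not an existing exporter's schema; the point is only what the wire format looks like.

```python
# Sketch: render stats counters in the Prometheus text exposition format.
# Metric names and values are hypothetical examples, not real pg_stats output.

def render_metrics(stats):
    """Render (name, help_text, labels, value) tuples as Prometheus text format."""
    lines = []
    seen = set()
    for name, help_text, labels, value in stats:
        # Emit the HELP/TYPE header once per metric family.
        if name not in seen:
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} counter")
            seen.add(name)
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

sample = [
    ("pg_stat_database_xact_commit", "Transactions committed",
     {"datname": "postgres"}, 4128),
    ("pg_stat_database_xact_commit", "Transactions committed",
     {"datname": "template1"}, 12),
]
print(render_metrics(sample))
```

Note that one scrape can carry series for every database (one time series per `datname` label), which is exactly why a single in-server endpoint avoids the per-database reconnect dance that external agents do today.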
I have further thoughts about distributed tracing, structured logging, and pg_stats changes to support application profiling, but those are subjects for future blog posts. I have started organizing my ideas as issues at https://gitlab.com/_stark/postgresql/issues; feel free to comment on them, or create new issues if you have your own ideas in these areas!
** (These URLs may have to change, as the underscore is actually not legal; cf. https://gitlab.com/gitlab-org/gitlab-ce/issues/40241)