Installing and using MADlib with PostgreSQL on OSX
We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an Open Source database with enterprise functionalities which often lack in MySQL. MADlib is an Open Source library which enhances a PostgreSQL or Greenplum database with functionalities for scalable in-database analytics.
First, we need to have a PostgreSQL installation. On OSX, we will use Homebrew instead of compiling the source by ourselves. On Ubuntu, apt-get will do it. Note, Homebrew is expected to be present on your machine. At the time of this writing, the current proposed version is 9.0.1. The following commands will install and configure a new PostgreSQL installation. Additionally, it will be registered as a startup service.
brew update brew install postgres initdb /usr/local/var/postgres mkdir -p ~/Library/LaunchAgents cp /usr/local/Cellar/postgresql/9.1.3/homebrew.mxcl.postgresql.plist ~/Library/LaunchAgents/ launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
In case your having trouble on OSX lion with the following message:
FATAL: could not create shared memory segment: Cannot allocate memory DETAIL: Failed system call was shmget(key=1, size=1646592, 03600).
You should edit “/etc/sysctl.conf” or create it if it doesn’t exist and write:
apt-get update apt-get install postgres
Make sure your
postgres command map to your new PostgreSQL installation, not to the default one. Executing
which postgres should print “/usr/local/bin/postgres”. If you see “/usr/bin/postgres” instead, you need to modify your path and place “/usr/local/bin” before “/usr/bin”. To do do, you may edit your ”
/.bashrc”, ”/.profile” or “/etc/paths” file. Here’s an example:
echo "export PATH=/usr/local/bin:\$PATH" >> ~/.bashr . ~/.profile which postgres
Your PostgreSQL server should already be up and running. In the future, if you want to start/stop the server:
# Start pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start # Stop pg_ctl -D /usr/local/var/postgres stop -s -m fast
Note, the Homebrew formula configures the system with local “trust” authentication. Authentication is done against the current logged-in Unix user. Homebrew does not create a “postgres” user and the server is launch using the same user account under which you ran Homebrew.
You can test to connect to your new PostgreSQL server by issuing with
psql. For example, the command
psql -d postgres -c "\l" with open a connection to the “postgres” database (using your current username) and run the command
\l to list the databases.
Now, let’s please pgadmin by installing the Instrumentation functions not present by default.
psql -d postgres -c "CREATE EXTENSION "adminpack";" # List installed extensions psql -d postgres -c "select * from pg_extension" # List available extensions psql -d postgres -c "select * from pg_available_extensions"
That’s pretty much with PostgreSQL, now let’s move on with installing the MADlib library.
We start by downloading the MADlib package and run the installation program.
Once installed, MADlib needs to be registered against a PostgreSQL database. We start by creating a database “mad” and then we register it:
psql -d postgres -c "CREATE DATABASE mad" /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install
MADlib comes with a complete set of test which you can also use as a usage reference or simply as a source of inspiration. The script are present in “/usr/local/madlib/ports/postgres/modules//test/.sql”. To run the test suite:
/usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install-check
We will test quantile function. For a simple definition, a quantile is the division of a total into equal subgroups.
As an example, the “.5” quantile of the serie ‘.25,.45,.5,.5,.55,.75’ is “.5”. If we substract “.1” to each element and use the same quantile, then the “.5” quantile of the serie ‘.15,.35,.4,.4,.45,.65’ is “.4”. We may interpret the result by saying that the serie is divided in two at “.4”. Finally, using the last serie, the “.75” quantile is “.425” meaning that one fourth of the values are after “.415”.
We will use the MADlib quantile test as a source of inspiration. It is located at “/usr/local/madlib/ports/postgres/modules/quantile/test/quantile.sql_in”.
Our simplified version will:
- Create a new table
- Insert some random values
- Compute the “.5” quantile
- Drop the table
CREATE TABLE TestQuantile ( val FLOAT ); INSERT INTO TestQuantile SELECT random()*100 FROM generate_series(1,1000); SELECT MADLIB.quantile('TestQuantile', 'val', 0.5); DROP TABLE TestQuantile;
random()*100 function will generate a uniform representation between 0 and 100, we expect a result relatively close to 50.