How many times have you tried to install Ceph? How many attempts failed for no apparent reason?
Most Ceph operators will agree with me when I say that the Ceph installer hasn't really worked as expected so far.
Yes, I'm talking about ceph-deploy, and that is the main reason I'm posting this guide about deploying Ceph with Ansible.
In this post, I will show how to install a Ceph cluster with Ansible on bare-metal servers.
My configuration is as follows:
- 3 x Ceph monitors, 8 GB of RAM each
- 3 x OSD nodes, 16 GB of RAM and 3 x 100 GB disks each
- 1 x RadosGateway node, 8 GB of RAM
First, download the ceph-ansible playbooks:
git clone https://github.com/ceph/ceph-ansible/
Cloning into 'ceph-ansible'...
remote: Counting objects: 5764, done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 5764 (delta 7), reused 0 (delta 0), pack-reused 5726
Receiving objects: 100% (5764/5764), 1.12 MiB | 1.06 MiB/s, done.
Resolving deltas: 100% (3465/3465), done.
Checking connectivity... done.
Move into the newly created ceph-ansible folder:
cd ceph-ansible/
Copy the sample variable files; we will configure our environment in them.
cp site.yml.sample site.yml
cp group_vars/all.sample group_vars/all
cp group_vars/mons.sample group_vars/mons
cp group_vars/osds.sample group_vars/osds
The next step is to configure the inventory with our servers. I don't really like using the /etc/ansible/hosts file, so I prefer to create a new inventory file per environment inside the playbook's folder.
Create a file with the following content, using your own IPs to match your servers to the desired roles inside the cluster:
[root@ansible ~]# vi inventory_hosts

[mons]
192.168.1.48
192.168.1.49
192.168.1.52

[osds]
192.168.1.50
192.168.1.53
192.168.1.54

[rgws]
192.168.1.55
Test connectivity to your servers by pinging them through the Ansible ping module:
[root@ansible ~]# ansible -m ping -i inventory_hosts all
192.168.1.48 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.50 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.55 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.53 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.49 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.54 | success >> {
    "changed": false,
    "ping": "pong"
}
192.168.1.52 | success >> {
    "changed": false,
    "ping": "pong"
}
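If any host fails here instead of answering "pong", the usual cause is SSH access: Ansible connects over plain SSH, so the machine running the playbooks needs passwordless access to every server. A minimal sketch of how to set that up, assuming you connect as root and don't have a key pair yet:

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in 192.168.1.48 192.168.1.49 192.168.1.52 \
            192.168.1.50 192.168.1.53 192.168.1.54 192.168.1.55; do
    # Push the public key to each node (prompts for the root password once per host)
    ssh-copy-id root@$host
done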
Edit the site.yml file. I will comment out the mds hosts since I'm not going to use them.
[root@ansible ~]# vi site.yml

- hosts: mons
  become: True
  roles:
  - ceph-mon

- hosts: agents
  become: True
  roles:
  - ceph-agent

- hosts: osds
  become: True
  roles:
  - ceph-osd

#- hosts: mdss
#  become: True
#  roles:
#  - ceph-mds

- hosts: rgws
  become: True
  roles:
  - ceph-rgw

- hosts: restapis
  become: True
  roles:
  - ceph-restapi
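Before touching the servers, you can let Ansible validate the playbook itself. This is only a syntax check, so it won't catch wrong variable values, but it is a cheap way to spot indentation mistakes after editing:

[root@ansible ~]# ansible-playbook site.yml -i inventory_hosts --syntax-check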
Edit the main variable file, where we are going to configure our environment:
[root@ansible ~]# vi group_vars/all
Here we configure where the Ceph packages will be installed from. For now, we use upstream code from the stable release Infernalis.
## Configure package origin
ceph_origin: upstream
ceph_stable: true
ceph_stable_release: infernalis
Configure the interface the monitors will listen on:
## Monitor options
monitor_interface: eth2
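Note that eth2 is just the name in my environment; an interface with that name must exist on every monitor node. If you are not sure, you can check before deploying with an ad-hoc Ansible command:

[root@ansible ~]# ansible mons -i inventory_hosts -m shell -a "ip -4 addr show eth2"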
Here we configure some OSD options, such as the journal size and which networks will be used for public traffic and cluster data replication:
## OSD options
journal_size: 1024
public_network: 192.168.1.0/24
cluster_network: 192.168.200.0/24
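The public network carries client and monitor traffic, while the cluster network is used for replication between OSDs. Both subnets must actually be configured on the OSD nodes, which you can verify with the same ad-hoc pattern:

[root@ansible ~]# ansible osds -i inventory_hosts -m shell -a "ip -4 addr"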
Edit the osds variable file:
[root@ansible ~]# vi group_vars/osds
I will use the auto discovery option, which lets ceph-ansible select empty or unused devices on my servers to create the OSDs.
# Declare devices
osd_auto_discovery: True
journal_collocation: True
Of course you can use other options; I highly suggest you read the variable comments, as they provide valuable information about their usage.
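For example, if you would rather control exactly which disks become OSDs, you can turn auto discovery off and declare the devices explicitly in group_vars/osds. A sketch with made-up device names, so adapt them to your servers:

# Declare devices
osd_auto_discovery: False
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
journal_collocation: True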
We're now ready to deploy Ceph with Ansible using our custom inventory_hosts file:
[root@ansible ~]# ansible-playbook site.yml -i inventory_hosts
After a while, you will have a fully functional Ceph cluster.
You may find some issues or bugs when running the playbooks.
There is a lot of effort going into fixing issues in the upstream repository, so if you encounter a new bug, please open an issue here:
https://github.com/ceph/ceph-ansible/issues
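The playbooks are meant to be idempotent, so if a run dies halfway you can usually just fix the cause and launch it again. While debugging, it can also help to rerun only one group of hosts or to raise verbosity:

[root@ansible ~]# ansible-playbook site.yml -i inventory_hosts --limit osds
[root@ansible ~]# ansible-playbook site.yml -i inventory_hosts -vvv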
You can check your cluster status with ceph -s. We can see that all OSDs are up and all placement groups are active+clean:
[root@ceph-mon1 ~]# ceph -s
    cluster 5ff692ab-2150-41a4-8b6d-001a4da21c9c
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon1=192.168.200.141:6789/0,ceph-mon2=192.168.200.180:6789/0,ceph-mon3=192.168.200.232:6789/0}
            election epoch 6, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
     osdmap e10: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v32: 64 pgs, 1 pools, 0 bytes data, 0 objects
            102256 kB used, 896 GB / 896 GB avail
                  64 active+clean
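If the cluster reports something other than HEALTH_OK, ceph health detail explains which checks are failing, and ceph osd tree shows how the OSDs map to hosts, which makes it easy to spot a node whose disks were not picked up:

[root@ceph-mon1 ~]# ceph health detail
[root@ceph-mon1 ~]# ceph osd tree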
Now we are going to run some tests.
Create a pool:
[root@ceph-mon1 ~]# ceph osd pool create test 128 128
pool 'test' created
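A quick note on the two 128s: they are the pool's pg_num and pgp_num. The rule of thumb in the Ceph documentation is roughly (number of OSDs x 100) / number of replicas, rounded up to the next power of two; with 9 OSDs and the 2 copies this cluster keeps per object, that gives 450, which rounds up to 512 across all pools, so a smaller value like 128 is fine for a single throwaway test pool.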
Create a big file:
[root@ceph-mon1 ~]# dd if=/dev/zero of=/tmp/sample.txt bs=2M count=1000
1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 16.7386 s, 125 MB/s
Upload the file to RADOS:
[root@ceph-mon1 ~]# rados -p test put sample /tmp/sample.txt
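You can confirm that the object made it into the pool by listing the pool's contents and checking the object's size and modification time:

[root@ceph-mon1 ~]# rados -p test ls
[root@ceph-mon1 ~]# rados -p test stat sample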
Check which placement group your file was saved in:
[root@ceph-mon1 ~]# ceph osd map test sample
osdmap e13 pool 'test' (1) object 'sample' -> pg 1.bddbf0b9 (1.39) -> up ([1,0], p1) acting ([1,0], p1)
Query the placement group where your file was uploaded; you will see output similar to this:
[root@ceph-mon1 ~]# ceph pg 1.39 query
{
  "state": "active+clean",
  "snap_trimq": "[]",
  "epoch": 13,
  "up": [1, 0],
  "acting": [1, 0],
  "actingbackfill": ["0", "1"],
  "info": {
    "pgid": "1.39",
    "last_update": "13'500",
    "last_complete": "13'500",
    "log_tail": "0'0",
    "last_user_version": 500,
    "last_backfill": "MAX",
    "last_backfill_bitwise": 0,
    "purged_snaps": "[]",
    "history": {
      "epoch_created": 11,
      "last_epoch_started": 12,
      "last_epoch_clean": 13,
      "last_epoch_split": 0,
      "last_epoch_marked_full": 0,
      "same_up_since": 11,
      "same_interval_since": 11,
      "same_primary_since": 11,
      "last_scrub": "0'0",
      "last_scrub_stamp": "2016-03-16 21:13:08.883121",
      "last_deep_scrub": "0'0",
      "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
      "last_clean_scrub_stamp": "0.000000"
    },
    "stats": {
      "version": "13'500",
      "reported_seq": "505",
      "reported_epoch": "13",
      "state": "active+clean",
      "last_fresh": "2016-03-16 21:24:40.930724",
      "last_change": "2016-03-16 21:14:09.874086",
      "last_active": "2016-03-16 21:24:40.930724",
      "last_peered": "2016-03-16 21:24:40.930724",
      "last_clean": "2016-03-16 21:24:40.930724",
      "last_became_active": "0.000000",
      "last_became_peered": "0.000000",
      "last_unstale": "2016-03-16 21:24:40.930724",
      "last_undegraded": "2016-03-16 21:24:40.930724",
      "last_fullsized": "2016-03-16 21:24:40.930724",
      "mapping_epoch": 11,
      "log_start": "0'0",
      "ondisk_log_start": "0'0",
      "created": 11,
      "last_epoch_clean": 13,
      "parent": "0.0",
      "parent_split_bits": 0,
      "last_scrub": "0'0",
      "last_scrub_stamp": "2016-03-16 21:13:08.883121",
      "last_deep_scrub": "0'0",
      "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
      "last_clean_scrub_stamp": "0.000000",
      "log_size": 500,
      "ondisk_log_size": 500,
      "stats_invalid": "0",
      "stat_sum": {
        "num_bytes": 2097152000,
        "num_objects": 1,
        "num_object_clones": 0,
        "num_object_copies": 2,
        "num_objects_missing_on_primary": 0,
        "num_objects_degraded": 0,
        "num_objects_misplaced": 0,
        "num_objects_unfound": 0,
        "num_objects_dirty": 1,
        "num_whiteouts": 0,
        "num_read": 0,
        "num_read_kb": 0,
        "num_write": 500,
        "num_write_kb": 2048000,
        "num_scrub_errors": 0,
        "num_shallow_scrub_errors": 0,
        "num_deep_scrub_errors": 0,
        "num_objects_recovered": 0,
        "num_bytes_recovered": 0,
        "num_keys_recovered": 0,
        "num_objects_omap": 0,
        "num_objects_hit_set_archive": 0,
        "num_bytes_hit_set_archive": 0,
        "num_flush": 0,
        "num_flush_kb": 0,
        "num_evict": 0,
        "num_evict_kb": 0,
        "num_promote": 0,
        "num_flush_mode_high": 0,
        "num_flush_mode_low": 0,
        "num_evict_mode_some": 0,
        "num_evict_mode_full": 0
      },
      "up": [1, 0],
      "acting": [1, 0],
      "blocked_by": [],
      "up_primary": 1,
      "acting_primary": 1
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 12,
    "hit_set_history": {
      "current_last_update": "0'0",
      "history": []
    }
  },
  "peer_info": [
    {
      "peer": "0",
      "pgid": "1.39",
      "last_update": "13'500",
      "last_complete": "13'500",
      "log_tail": "0'0",
      "last_user_version": 0,
      "last_backfill": "MAX",
      "last_backfill_bitwise": 0,
      "purged_snaps": "[]",
      "history": {
        "epoch_created": 11,
        "last_epoch_started": 12,
        "last_epoch_clean": 13,
        "last_epoch_split": 0,
        "last_epoch_marked_full": 0,
        "same_up_since": 0,
        "same_interval_since": 0,
        "same_primary_since": 0,
        "last_scrub": "0'0",
        "last_scrub_stamp": "2016-03-16 21:13:08.883121",
        "last_deep_scrub": "0'0",
        "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
        "last_clean_scrub_stamp": "0.000000"
      },
      "stats": {
        "version": "0'0",
        "reported_seq": "0",
        "reported_epoch": "0",
        "state": "inactive",
        "last_fresh": "0.000000",
        "last_change": "0.000000",
        "last_active": "0.000000",
        "last_peered": "0.000000",
        "last_clean": "0.000000",
        "last_became_active": "0.000000",
        "last_became_peered": "0.000000",
        "last_unstale": "0.000000",
        "last_undegraded": "0.000000",
        "last_fullsized": "0.000000",
        "mapping_epoch": 0,
        "log_start": "0'0",
        "ondisk_log_start": "0'0",
        "created": 0,
        "last_epoch_clean": 0,
        "parent": "0.0",
        "parent_split_bits": 0,
        "last_scrub": "0'0",
        "last_scrub_stamp": "0.000000",
        "last_deep_scrub": "0'0",
        "last_deep_scrub_stamp": "0.000000",
        "last_clean_scrub_stamp": "0.000000",
        "log_size": 0,
        "ondisk_log_size": 0,
        "stats_invalid": "0",
        "stat_sum": {
          "num_bytes": 0,
          "num_objects": 0,
          "num_object_clones": 0,
          "num_object_copies": 0,
          "num_objects_missing_on_primary": 0,
          "num_objects_degraded": 0,
          "num_objects_misplaced": 0,
          "num_objects_unfound": 0,
          "num_objects_dirty": 0,
          "num_whiteouts": 0,
          "num_read": 0,
          "num_read_kb": 0,
          "num_write": 0,
          "num_write_kb": 0,
          "num_scrub_errors": 0,
          "num_shallow_scrub_errors": 0,
          "num_deep_scrub_errors": 0,
          "num_objects_recovered": 0,
          "num_bytes_recovered": 0,
          "num_keys_recovered": 0,
          "num_objects_omap": 0,
          "num_objects_hit_set_archive": 0,
          "num_bytes_hit_set_archive": 0,
          "num_flush": 0,
          "num_flush_kb": 0,
          "num_evict": 0,
          "num_evict_kb": 0,
          "num_promote": 0,
          "num_flush_mode_high": 0,
          "num_flush_mode_low": 0,
          "num_evict_mode_some": 0,
          "num_evict_mode_full": 0
        },
        "up": [],
        "acting": [],
        "blocked_by": [],
        "up_primary": -1,
        "acting_primary": -1
      },
      "empty": 0,
      "dne": 0,
      "incomplete": 0,
      "last_epoch_started": 12,
      "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
      }
    }
  ],
  "recovery_state": [
    {
      "name": "Started\/Primary\/Active",
      "enter_time": "2016-03-16 21:13:36.769083",
      "might_have_unfound": [],
      "recovery_progress": {
        "backfill_targets": [],
        "waiting_on_backfill": [],
        "last_backfill_started": "MIN",
        "backfill_info": {
          "begin": "MIN",
          "end": "MIN",
          "objects": []
        },
        "peer_backfill_info": [],
        "backfills_in_flight": [],
        "recovering": [],
        "pg_backend": {
          "pull_from_peer": [],
          "pushing": []
        }
      },
      "scrub": {
        "scrubber.epoch_start": "0",
        "scrubber.active": 0,
        "scrubber.waiting_on": 0,
        "scrubber.waiting_on_whom": []
      }
    },
    {
      "name": "Started",
      "enter_time": "2016-03-16 21:13:09.216260"
    }
  ],
  "agent_state": {}
}
That’s all for now.
Regards, Eduardo Gonzalez