Today, while using Spring with JedisCluster to operate a Redis cluster, an exception suddenly appeared out of nowhere — quite baffling...
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
I searched Baidu for a long time without finding an answer, and finally traced the problem to the cluster itself. Sure enough... one of the cluster's nodes had gone cold.
OK, let's solve it (this only covers the case where the cluster is down or its state doesn't match your expected configuration).
How do we confirm the cluster has a problem?
1. Connect to any one of your cluster nodes with the client tool: ./redis-cli -h <ip> -p <port> -c
Then type a few test writes in that client and see whether you hit the error below (the CRC16 algorithm matches keys to slots automatically, so a simple test is just set a a, set b b, ...):

127.0.0.1:8001> set nima nia
-> Redirected to slot [16259] located at :0
Could not connect to Redis at :0: Name or service not known
Could not connect to Redis at :0: Name or service not known
not connected>

That empty :0 redirect target is the giveaway: the cluster is trying to redirect you to a node whose address it no longer knows.
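The redirection above comes from Redis Cluster's key-to-slot mapping: the key is hashed with CRC16 (the XMODEM variant: polynomial 0x1021, initial value 0) and the result is taken modulo 16384. The sketch below is my own illustration of that mapping, not Jedis's internal code:

```java
import java.nio.charset.StandardCharsets;

public class KeySlot {
    // CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Redis Cluster hashes only the "{...}" hash-tag part of the key, if one exists.
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) key = key.substring(open + 1, close);
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Standard CRC16/XMODEM check value for "123456789" is 0x31C3.
        System.out.println(Integer.toHexString(crc16("123456789".getBytes(StandardCharsets.UTF_8))));
        // Keys sharing a hash tag always land in the same slot.
        System.out.println(slot("{user1000}.following") == slot("{user1000}.followers")); // true
    }
}
```

This is why any key you type gets routed somewhere deterministic — and why a single dead master makes a whole slice of the keyspace unreachable.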
2. Verify with redis-trib.rb (by default this file lives in the src directory of your unpacked Redis source).
./redis-3.0.0/src/redis-trib.rb check <IP>:8001 | more — 8001 here means any one port of your cluster; | more can be understood as simply paging the output while the tool automatically scans every node.
Here's mine (my redis-trib.rb sits in the cluster directory):

[root@rebirth redis-cluster]# ./redis-trib.rb check 169.254.18.18:8001
Connecting to node 169.254.18.18:8001: OK
Connecting to node 169.254.18.18:8005: OK
Connecting to node 169.254.18.18:8006: OK
Connecting to node 169.254.18.18:8003: OK
Connecting to node 169.254.18.18:8002: OK
Connecting to node 169.254.18.18:8004: OK
>>> Performing Cluster Check (using node 169.254.18.18:8001)
M: d7d92360722c41bd710c9bfa8ebaff46df8230e2 169.254.18.18:8001
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 6fa989430c84e882becc18af9064a89bf4a4d7de 169.254.18.18:8005
   slots: (0 slots) slave
   replicates cdba77220c27f07a24e7d93e61441a3219fac88d
S: 4804979442ac59b91558b2da228270b4218f1e90 169.254.18.18:8006
   slots: (0 slots) slave
   replicates 842bc2d2fe084dbae685953806c2b5f30f016e0b
M: 842bc2d2fe084dbae685953806c2b5f30f016e0b 169.254.18.18:8003
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: cdba77220c27f07a24e7d93e61441a3219fac88d 169.254.18.18:8002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: f2ff252f8820291c9640474943ea7ebdccb08ddb 169.254.18.18:8004
   slots: (0 slots) slave
   replicates d7d92360722c41bd710c9bfa8ebaff46df8230e2
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
Check whether this output matches the cluster you configured: did every node connect successfully, and is any node missing? If something is missing, that proves your cluster configuration has a problem...
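The last line of that output, "Check slots coverage...", is worth understanding: the masters' slot ranges must cover all 16384 slots exactly once, or part of the keyspace is unwritable. A small sketch of that check, using the three ranges from the output above (my own illustration, not redis-trib.rb's code):

```java
public class SlotCoverageCheck {
    // Master slot ranges reported by redis-trib.rb check above (inclusive).
    static final int[][] RANGES = {{0, 5460}, {5461, 10922}, {10923, 16383}};

    // True only if the ranges cover every one of the 16384 slots with no overlap.
    static boolean fullyCovered(int[][] ranges) {
        boolean[] covered = new boolean[16384];
        for (int[] r : ranges) {
            for (int s = r[0]; s <= r[1]; s++) {
                if (covered[s]) return false; // slot assigned twice
                covered[s] = true;
            }
        }
        for (boolean c : covered) {
            if (!c) return false; // uncovered slot: writes to it will fail
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fullyCovered(RANGES)); // true
        // If a master's range disappears, a gap opens up:
        System.out.println(fullyCovered(new int[][]{{0, 5460}, {10923, 16383}})); // false
    }
}
```

When a node goes cold the way mine did, this is exactly the invariant that breaks.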
If the cluster configuration is indeed the problem, read on.
Solution:
1. Delete the nodes.conf file from every node's Redis directory — wherever you see one inside your cluster files, just rm -rf it and you're done.
2. Re-create the cluster. For example, with our six Redis instances (I won't go into detail here; search online for a full walkthrough):
./redis-trib.rb create --replicas 1 169.254.18.18:8001 169.254.18.18:8002 169.254.18.18:8003 169.254.18.18:8004 169.254.18.18:8005 169.254.18.18:8006
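For orientation: --replicas 1 means one replica per master, so six nodes become three masters plus three slaves, and the 16384 slots are split roughly evenly across the masters — which is why the check output above shows 5461/5462/5461. A back-of-the-envelope sketch of that arithmetic (my own simplification; the exact boundaries redis-trib.rb picks may differ):

```java
public class ClusterLayout {
    // With N nodes and R replicas per master, N / (R + 1) of them become masters.
    static int masters(int nodes, int replicas) {
        return nodes / (replicas + 1);
    }

    // Split 16384 slots as evenly as possible; here the remainder goes to the
    // first masters (which master gets the extra slot may differ in redis-trib.rb).
    static int[] slotCounts(int masters) {
        int[] counts = new int[masters];
        int base = 16384 / masters, remainder = 16384 % masters;
        for (int i = 0; i < masters; i++) {
            counts[i] = base + (i < remainder ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(masters(6, 1)); // 3
        int sum = 0;
        for (int c : slotCounts(3)) sum += c;
        System.out.println(sum); // 16384
    }
}
```

The point is just that every slot must be owned by exactly one master after the create step, matching what the check command verified earlier.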
PS:
That's the general approach. If things still aren't OK after this, you'll have to investigate further on your own — may you escape this bug soon!