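I have been using Solr on projects for more than a year, but only as a user: relying on Solr's existing mechanisms, adjusting parameters, then monitoring and tuning. I had never studied Solr at the source-code level. A project I am currently working on, however, has convinced me that I need to master Solr's internals and extension mechanisms to do it well. Today's topic is how Solr starts up and initializes itself. As you know, a Solr deployment has two parts: first, Solr's configuration files; second, Solr's own code and plugins, the Lucene jars it depends on, and the logging jars. Studying Solr can follow the same outline: load the configuration files, initialize each core, initialize each core's request handlers...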
To study how Solr starts, we begin with the web.xml of the Solr WAR. Here is a fragment of Solr's web.xml:
```xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5"
         metadata-complete="true">

  <!-- Uncomment if you are trying to use a Resin version before 3.0.19.
    Their XML implementation isn't entirely compatible with Xerces.
    Below are the implementations to use with Sun's JVM.
  <system-property javax.xml.xpath.XPathFactory=
             "com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl"/>
  <system-property javax.xml.parsers.DocumentBuilderFactory=
             "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"/>
  <system-property javax.xml.parsers.SAXParserFactory=
             "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"/>
   -->

  <!-- People who want to hardcode their "Solr Home" directly into the
       WAR File can set the JNDI property here...
   -->
  <!-- Location of Solr's configuration (solr home), used when Solr initializes -->
  <env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>R:/solrhome1/solr</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
  </env-entry>

  <!-- org.apache.solr.servlet.SolrDispatchFilter is the most important piece of Solr startup,
       so source analysis of Solr starts from this Filter. Its main jobs: load the Solr
       configuration files, initialize each core, and initialize each requestHandler
       and component. -->
  <filter>
    <filter-name>SolrRequestFilter</filter-name>
    <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
    <!-- If you are wiring Solr into a larger web application which controls
         the web context root, you will probably want to mount Solr under
         a path prefix (app.war with /app/solr mounted into it, for example).
         You will need to put this prefix in front of the SolrDispatchFilter
         url-pattern mapping too (/solr/*), and also on any paths for
         legacy Solr servlet mappings you may be using.
         For the Admin UI to work properly in a path-prefixed configuration,
         the admin folder containing the resources needs to be under the app context root
         named to match the path-prefix.  For example:

            .war
               xxx
                 js
                   main.js
    -->
    <!--
    <init-param>
      <param-name>path-prefix</param-name>
      <param-value>/xxx</param-value>
    </init-param>
    -->
  </filter>
```
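The solr/home env-entry above is only one of the places Solr looks for its home directory. As far as I know, in Solr 4.x `SolrResourceLoader.locateSolrHome()` checks three sources in order. It first tries JNDI (`java:comp/env/solr/home`, which is what this env-entry defines), then the `solr.solr.home` system property, and finally falls back to a `solr/` directory relative to the current working directory. The following is a minimal sketch of that lookup order, not Solr's code. The class name is made up, a `Map` stands in for the JNDI context, and path normalization is reduced to appending a trailing `/`.

```java
import java.util.Map;

// Hypothetical sketch of the solr home lookup order; NOT Solr's actual implementation.
// jndiEnv stands in for the java:comp/env JNDI context, sysProps for System properties.
final class SolrHomeLocatorSketch {

    static String locate(Map<String, String> jndiEnv, Map<String, String> sysProps) {
        // 1. JNDI: the <env-entry> named solr/home in web.xml
        String home = jndiEnv.get("solr/home");
        // 2. System property, e.g. -Dsolr.solr.home=/opt/solr
        if (home == null) {
            home = sysProps.get("solr.solr.home");
        }
        // 3. Fallback: "solr/" relative to the current working directory
        if (home == null) {
            home = "solr/";
        }
        return normalizeDir(home);
    }

    // Simplified: make sure the directory ends with '/', so file names can be appended
    static String normalizeDir(String path) {
        return path.endsWith("/") ? path : path + "/";
    }
}
```

With the env-entry above, `locate(Map.of("solr/home", "R:/solrhome1/solr"), Map.of())` yields `R:/solrhome1/solr/`. The JNDI value wins even if `-Dsolr.solr.home` is also set.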
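SolrDispatchFilter is a Filter that extends BaseSolrFilter. (You should all know what a Filter does; source analysis of web-framework-level products generally starts from a filter or a servlet.) Before looking at SolrDispatchFilter, let me introduce BaseSolrFilter, since programmers like to get to the bottom of things. BaseSolrFilter is an abstract class that implements the Filter interface. Its job is simple: make sure the logging jars are available before any Solr code runs. Here is the code: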
```java
/**
 * All Solr filters available to the user's webapp should
 * extend this class and not just implement {@link Filter}.
 * This class ensures that the logging configuration is correct
 * before any Solr specific code is executed.
 */
abstract class BaseSolrFilter implements Filter {

  static {
    CheckLoggingConfiguration.check();
  }

}
```
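Why put the check in a static block of an abstract base class? The JVM initializes a superclass before its subclass and runs a static initializer only once per class. The check is therefore guaranteed to run exactly once, before any code in SolrDispatchFilter (or any other subclass) executes. The sketch below illustrates the pattern with hypothetical class names. The real check targets the SLF4J jars; the sketch probes `java.util.logging.Logger` instead, so that it runs on a plain JDK.

```java
// Hypothetical sketch of the "check in a static initializer of an abstract base class" pattern.
abstract class BaseFilterSketch {
    // Runs once, when the first subclass is initialized, before any subclass code executes
    static {
        LoggingCheckSketch.check("java.util.logging.Logger");
    }
}

final class LoggingCheckSketch {
    static int checks = 0; // counts how often the check actually ran

    static void check(String requiredClass) {
        checks++;
        try {
            Class.forName(requiredClass);
        } catch (ClassNotFoundException e) {
            // Fail fast with a clear message instead of an obscure error later on
            throw new NoClassDefFoundError("Required logging class not found: " + requiredClass);
        }
    }
}

final class DispatchFilterSketch extends BaseFilterSketch {
    String handle(String path) {
        return "handled " + path;
    }
}
```

Creating any number of `DispatchFilterSketch` instances runs the check only once.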
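For reasons of space I won't go into what CheckLoggingConfiguration.check() does. OK, back to SolrDispatchFilter. Because BaseSolrFilter is abstract and does not implement the Filter methods itself, SolrDispatchFilter, as a concrete class, must implement them. The Filter interface looks like this: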
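```java
public interface Filter {
  // Initialization: called once when the filter is put into service
  public void init(FilterConfig filterConfig) throws ServletException;

  // Intercepts every request that matches the filter's url-pattern
  public void doFilter(ServletRequest request, ServletResponse response,
                       FilterChain chain) throws IOException, ServletException;

  // Cleanup: called once when the filter is taken out of service
  public void destroy();
}
```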
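Here is the lifecycle in concrete terms:

- The container calls `init` once, when the filter is put into service.
- It calls `doFilter` for every request that matches the filter's url-pattern. Inside `doFilter`, a filter may either answer the request itself or pass it down the chain.
- It calls `destroy` once, on shutdown.

The sketch below simulates this without the servlet API. `MiniFilter`, `MiniChain` and `MiniContainer` are hypothetical stand-ins, not `javax.servlet` types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the servlet API, so the sketch runs on a plain JDK.
interface MiniFilter {
    void init(String pathPrefix);                  // called once by the container
    String doFilter(String path, MiniChain chain); // called for every matching request
    void destroy();                                // called once on shutdown
}

interface MiniChain {
    String next(String path); // the rest of the web application
}

final class LoggingMiniFilter implements MiniFilter {
    final List<String> events = new ArrayList<>();

    public void init(String pathPrefix) { events.add("init"); }

    public String doFilter(String path, MiniChain chain) {
        events.add("doFilter " + path);
        // Either answer the request here or hand it to the next element of the chain
        return path.startsWith("/solr/") ? "solr:" + path : chain.next(path);
    }

    public void destroy() { events.add("destroy"); }
}

// Simulates what a servlet container does with one filter over its lifetime
final class MiniContainer {
    static List<String> run(MiniFilter filter, List<String> paths) {
        List<String> responses = new ArrayList<>();
        filter.init(null);
        for (String p : paths) {
            responses.add(filter.doFilter(p, path -> "webapp:" + path));
        }
        filter.destroy();
        return responses;
    }
}
```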
As the comments above indicate, initialization happens in init. So to understand how SolrDispatchFilter initializes, this is the method to study. Here is SolrDispatchFilter's init method:
```java
@Override
public void init(FilterConfig config) throws ServletException
{
  log.info("SolrDispatchFilter.init()");

  try {
    // web.xml configuration
    this.pathPrefix = config.getInitParameter( "path-prefix" );

    // Dear readers: this is where all the magic happens
    this.cores = createCoreContainer();
    log.info("user.dir=" + System.getProperty("user.dir"));
  }
  catch( Throwable t ) {
    // catch this so our filter still works
    log.error( "Could not start Solr. Check solr/home property and the logs");
    SolrCore.log( t );
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  log.info("SolrDispatchFilter.init() done");
}
```
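Note that init catches Throwable and rethrows only Errors. If createCoreContainer fails with an exception, the filter is still installed, but `this.cores` stays null. Every later request is then answered by the 503 branch at the top of doFilter. Below is a minimal sketch of that pattern. The names are made up, and a String stands in for the CoreContainer.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "catch everything in init, answer 503 later".
final class GuardedFilterSketch {
    private String cores; // stays null if initialization fails

    void init(Supplier<String> createCoreContainer) {
        try {
            cores = createCoreContainer.get();
        } catch (Throwable t) {
            // Swallow ordinary exceptions so the filter stays installed and can respond
            System.err.println("Could not start Solr: " + t.getMessage());
            if (t instanceof Error) {
                throw (Error) t; // JVM-level errors still propagate
            }
        }
    }

    // Mirrors the "Server is shutting down or failed to initialize" check in doFilter
    int status() {
        return cores == null ? 503 : 200;
    }
}
```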
Let's follow the trail and see what createCoreContainer actually does.
```java
protected CoreContainer createCoreContainer() {
  // Note: SolrResourceLoader loads the configuration files under solr home
  SolrResourceLoader loader = new SolrResourceLoader(SolrResourceLoader.locateSolrHome());
  // Load the configuration (solr.xml)
  ConfigSolr config = loadConfigSolr(loader);
  CoreContainer cores = new CoreContainer(loader, config);
  // Initialize the cores
  cores.load();
  return cores;
}
```
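createCoreContainer is the key to understanding how Solr initializes and starts. Let's briefly go over the classes and methods it uses:
SolrResourceLoader: as the name suggests, Solr's resource loader.
ConfigSolr: holds the settings from Solr's configuration file (solr.xml), read through the SolrResourceLoader.
loadConfigSolr: the method that loads this configuration: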
```java
private ConfigSolr loadConfigSolr(SolrResourceLoader loader) {

  // The solr.solrxml.location system property takes precedence; in SolrCloud it is typically
  // set so that the configuration is read from ZooKeeper. If it is not set, it defaults to
  // "solrhome" and solr.xml is read from the solr home directory
  // (remember the first entry in web.xml, solr/home? That's the one).
  String solrxmlLocation = System.getProperty("solr.solrxml.location", "solrhome");

  if (solrxmlLocation == null || "solrhome".equalsIgnoreCase(solrxmlLocation))
    return ConfigSolr.fromSolrHome(loader, loader.getInstanceDir());

  // Read the configuration from ZooKeeper; this is how Solr initializes in a SolrCloud cluster
  if ("zookeeper".equalsIgnoreCase(solrxmlLocation)) {
    String zkHost = System.getProperty("zkHost");
    log.info("Trying to read solr.xml from " + zkHost);
    if (StringUtils.isEmpty(zkHost))
      throw new SolrException(ErrorCode.SERVER_ERROR,
          "Could not load solr.xml from zookeeper: zkHost system property not set");
    SolrZkClient zkClient = new SolrZkClient(zkHost, 30000);
    try {
      // solr.xml must be stored in ZooKeeper under the /solr.xml node
      if (!zkClient.exists("/solr.xml", true))
        throw new SolrException(ErrorCode.SERVER_ERROR, "Could not load solr.xml from zookeeper: node not found");
      byte[] data = zkClient.getData("/solr.xml", null, null, true);
      // Parse the configuration from the bytes fetched from ZooKeeper
      return ConfigSolr.fromInputStream(loader, new ByteArrayInputStream(data));
    } catch (Exception e) {
      throw new SolrException(ErrorCode.SERVER_ERROR, "Could not load solr.xml from zookeeper", e);
    } finally {
      zkClient.close(); // close the ZooKeeper connection
    }
  }

  throw new SolrException(ErrorCode.SERVER_ERROR,
      "Bad solr.solrxml.location set: " + solrxmlLocation + " - should be 'solrhome' or 'zookeeper'");
}
```
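The decision logic of loadConfigSolr comes down to a three-way branch on one system property. The sketch below keeps only that branch and returns a descriptive string instead of a ConfigSolr. The class name and the returned `file:`/`zk:` strings are invented for illustration, and no ZooKeeper connection is made.

```java
import java.util.Properties;

// Hypothetical sketch of the branch in loadConfigSolr(); it only reports where solr.xml
// would be read from. The "file:" / "zk:" strings are made up for illustration.
final class SolrXmlLocationSketch {

    static String resolve(Properties sys, String instanceDir) {
        String location = sys.getProperty("solr.solrxml.location", "solrhome");
        if ("solrhome".equalsIgnoreCase(location)) {
            return "file:" + instanceDir + "solr.xml";     // solr.xml under solr home
        }
        if ("zookeeper".equalsIgnoreCase(location)) {
            String zkHost = sys.getProperty("zkHost");
            if (zkHost == null || zkHost.isEmpty()) {
                throw new IllegalStateException(
                    "Could not load solr.xml from zookeeper: zkHost system property not set");
            }
            return "zk:" + zkHost + "/solr.xml";           // the /solr.xml znode
        }
        throw new IllegalArgumentException("Bad solr.solrxml.location set: " + location
            + " - should be 'solrhome' or 'zookeeper'");
    }
}
```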
CoreContainer does the work of initializing the cores. Let's focus on its load method. It is a bit long:
```java
public void load() {

  log.info("Loading cores into CoreContainer [instanceDir={}]", loader.getInstanceDir());

  // Load Solr's shared jar library
  // add the sharedLib to the shared resource loader before initializing cfg based plugins
  String libDir = cfg.getSharedLibDirectory();
  if (libDir != null) {
    File f = FileUtils.resolvePath(new File(solrHome), libDir);
    log.info("loading shared library: " + f.getAbsolutePath());
    // If you are not familiar with class loaders, this call is worth stepping into
    loader.addToClassLoader(libDir, null, false);
    loader.reloadLuceneSPI();
  }

  // Create and initialize the shard-related handlers
  shardHandlerFactory = ShardHandlerFactory.newInstance(cfg.getShardHandlerFactoryPluginInfo(), loader);

  updateShardHandler = new UpdateShardHandler(cfg);

  solrCores.allocateLazyCores(cfg.getTransientCacheSize(), loader);

  logging = LogWatcher.newRegisteredLogWatcher(cfg.getLogWatcherConfig(), loader);

  hostName = cfg.getHost();
  log.info("Host Name: " + hostName);

  zkSys.initZooKeeper(this, solrHome, cfg);

  collectionsHandler = createHandler(cfg.getCollectionsHandlerClass(), CollectionsHandler.class);
  infoHandler        = createHandler(cfg.getInfoHandlerClass(), InfoHandler.class);
  coreAdminHandler   = createHandler(cfg.getCoreAdminHandlerClass(), CoreAdminHandler.class);

  // Service that loads each core's configuration (from ZooKeeper when running SolrCloud)
  coreConfigService = cfg.createCoreConfigService(loader, zkSys.getZkController());

  containerProperties = cfg.getSolrProperties("solr");

  // setup executor to load cores in parallel
  // do not limit the size of the executor in zk mode since cores may try and wait for each other.
  // Cores are initialized by multiple threads; if concurrency is new to you, pause here for a while
  ExecutorService coreLoadExecutor = Executors.newFixedThreadPool(
      ( zkSys.getZkController() == null ? cfg.getCoreLoadThreadCount() : Integer.MAX_VALUE ),
      new DefaultSolrThreadFactory("coreLoadExecutor") );

  try {
    CompletionService<SolrCore> completionService = new ExecutorCompletionService<>(
        coreLoadExecutor);

    Set<Future<SolrCore>> pending = new HashSet<>();

    List<CoreDescriptor> cds = coresLocator.discover(this);
    checkForDuplicateCoreNames(cds);

    for (final CoreDescriptor cd : cds) {

      final String name = cd.getName();
      try {

        if (cd.isTransient() || ! cd.isLoadOnStartup()) {
          // Store it away for later use. includes non-transient but not
          // loaded at startup cores.
          solrCores.putDynamicDescriptor(name, cd);
        }
        if (cd.isLoadOnStartup()) { // The normal case

          Callable<SolrCore> task = new Callable<SolrCore>() {
            @Override
            public SolrCore call() {
              SolrCore c = null;
              try {
                if (zkSys.getZkController() != null) {
                  // ZooKeeper (SolrCloud) mode
                  preRegisterInZk(cd);
                }
                c = create(cd); // create the core (in both standalone and ZooKeeper mode)
                registerCore(cd.isTransient(), name, c, false, false);
              } catch (Exception e) {
                SolrException.log(log, null, e);
                try {
      /*          if (isZooKeeperAware()) {
                    try {
                      zkSys.zkController.unregister(name, cd);
                    } catch (InterruptedException e2) {
                      Thread.currentThread().interrupt();
                      SolrException.log(log, null, e2);
                    } catch (KeeperException e3) {
                      SolrException.log(log, null, e3);
                    }
                  }*/
                } finally {
                  if (c != null) {
                    c.close();
                  }
                }
              }
              return c;
            }
          };

          pending.add(completionService.submit(task));

        }
      } catch (Exception e) {
        SolrException.log(log, null, e);
      }
    }

    while (pending != null && pending.size() > 0) {
      try {

        // Take the next core whose creation has finished
        Future<SolrCore> future = completionService.take();
        if (future == null) return;
        pending.remove(future);

        try {
          SolrCore c = future.get();
          // track original names
          if (c != null) {
            solrCores.putCoreToOrigName(c, c.getName());
          }
        } catch (ExecutionException e) {
          SolrException.log(SolrCore.log, "Error loading core", e);
        }

      } catch (InterruptedException e) {
        throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,
            "interrupted while loading core", e);
      }
    }

    // Background thread that closes Solr cores and releases their resources
    // Start the background thread
    backgroundCloser = new CloserThread(this, solrCores, cfg);
    backgroundCloser.start();

  } finally {
    if (coreLoadExecutor != null) {
      // Initialization is done; shut down the thread pool
      ExecutorUtil.shutdownNowAndAwaitTermination(coreLoadExecutor);
    }
  }
  if (isZooKeeperAware()) { // ZooKeeper is available, i.e. SolrCloud mode
    // register in zk in background threads
    Collection<SolrCore> cores = getCores();
    if (cores != null) {
      for (SolrCore core : cores) {
        try {
          // Register the core's state in ZooKeeper
          zkSys.registerInZk(core, true);
        } catch (Throwable t) {
          SolrException.log(log, "Error registering SolrCore", t);
        }
      }
    }
    // zkSys.getZkController().checkOverseerDesignate();
  }
}
```
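The core-loading loop is a standard ExecutorCompletionService pattern. It submits one task per core, then calls `take()` to collect futures in the order they finish, so a slow core does not hold up bookkeeping for the fast ones. In standalone mode the pool is bounded by `cfg.getCoreLoadThreadCount()`. In ZooKeeper mode it is effectively unbounded, because (per the source comment) cores there may wait for each other. Below is a stripped-down sketch under these assumptions:

- Cores are Strings.
- `Descriptor` is a hypothetical stand-in for CoreDescriptor.
- A failing task surfaces as an ExecutionException. Solr's own callable instead catches the exception, logs it and returns null.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.*;

// Hypothetical, simplified sketch of the parallel core-loading loop in CoreContainer.load().
final class ParallelCoreLoaderSketch {

    static final class Descriptor {
        final String name;
        final boolean loadOnStartup;
        Descriptor(String name, boolean loadOnStartup) {
            this.name = name;
            this.loadOnStartup = loadOnStartup;
        }
    }

    static List<String> loadAll(List<Descriptor> cds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> loaded = new ArrayList<>();
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            Set<Future<String>> pending = new HashSet<>();
            for (Descriptor cd : cds) {
                if (!cd.loadOnStartup) {
                    continue; // Solr keeps such descriptors for lazy loading instead
                }
                pending.add(cs.submit(() -> createCore(cd)));
            }
            while (!pending.isEmpty()) {
                Future<String> done = cs.take(); // next finished core, in completion order
                pending.remove(done);
                try {
                    loaded.add(done.get());
                } catch (ExecutionException e) {
                    // One broken core is logged and skipped; the others still load
                    System.err.println("Error loading core: " + e.getCause().getMessage());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading cores", e);
        } finally {
            pool.shutdownNow();
        }
        return loaded;
    }

    // Simulated core creation: names starting with "broken" fail
    static String createCore(Descriptor cd) {
        if (cd.name.startsWith("broken")) {
            throw new IllegalStateException("cannot open " + cd.name);
        }
        return cd.name;
    }
}
```

The returned list is in completion order, not descriptor order. That is why Solr tracks cores by name rather than by position.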
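I have commented the key parts of this code. When you need to speed up Solr's startup, this is the code you will come back to. Next we turn to how Solr filters and handles requests, which means looking at the doFilter method (again, I have commented the key parts and won't walk through them in detail):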
```java
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
  doFilter(request, response, chain, false);
}

public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain, boolean retry) throws IOException, ServletException {
  if( abortErrorMessage != null ) {
    // Startup was aborted: respond with HTTP 500
    ((HttpServletResponse)response).sendError( 500, abortErrorMessage );
    return;
  }

  if (this.cores == null) {
    // Core initialization failed, or the container has already been shut down
    ((HttpServletResponse)response).sendError( 503, "Server is shutting down or failed to initialize" );
    return;
  }
  CoreContainer cores = this.cores;
  SolrCore core = null;
  SolrQueryRequest solrReq = null;
  Aliases aliases = null;

  if( request instanceof HttpServletRequest) {
    // Only HTTP requests are handled here
    HttpServletRequest req = (HttpServletRequest)request;
    HttpServletResponse resp = (HttpServletResponse)response;
    SolrRequestHandler handler = null;
    String corename = "";
    String origCorename = null;
    try {
      // put the core container in request attribute
      req.setAttribute("org.apache.solr.CoreContainer", cores);
      String path = req.getServletPath();
      if( req.getPathInfo() != null ) {
        // this lets you handle /update/commit when /update is a servlet
        path += req.getPathInfo();
      }
      if( pathPrefix != null && path.startsWith( pathPrefix ) ) {
        path = path.substring( pathPrefix.length() );
      }
      // check for management path
      String alternate = cores.getManagementPath();
      if (alternate != null && path.startsWith(alternate)) {
        path = path.substring(0, alternate.length());
      }
      // unused feature ?
      int idx = path.indexOf( ':' );
      if( idx > 0 ) {
        // save the portion after the ':' for a 'handler' path parameter
        path = path.substring( 0, idx );
      }

      // Check for the core admin page
      if( path.equals( cores.getAdminPath() ) ) {
        // Core admin request
        handler = cores.getMultiCoreHandler();
        solrReq =  SolrRequestParsers.DEFAULT.parse(null,path, req);
        handleAdminRequest(req, response, handler, solrReq);
        return;
      }
      boolean usingAliases = false;
      List<String> collectionsList = null;
      // Check for the core admin collections url
      if( path.equals( "/admin/collections" ) ) {
        // Collections management
        handler = cores.getCollectionsHandler();
        solrReq =  SolrRequestParsers.DEFAULT.parse(null,path, req);
        handleAdminRequest(req, response, handler, solrReq);
        return;
      }
      // Check for the core admin info url
      if( path.startsWith( "/admin/info" ) ) {
        // Admin info
        handler = cores.getInfoHandler();
        solrReq =  SolrRequestParsers.DEFAULT.parse(null,path, req);
        handleAdminRequest(req, response, handler, solrReq);
        return;
      }
      else {
        //otherwise, we should find a core from the path
        idx = path.indexOf( "/", 1 );
        if( idx > 1 ) {
          // try to get the corename as a request parameter first
          corename = path.substring( 1, idx );

          // look at aliases
          if (cores.isZooKeeperAware()) {
            // SolrCloud mode
            origCorename = corename;
            ZkStateReader reader = cores.getZkController().getZkStateReader();
            aliases = reader.getAliases();
            if (aliases != null && aliases.collectionAliasSize() > 0) {
              usingAliases = true;
              String alias = aliases.getCollectionAlias(corename);
              if (alias != null) {
                collectionsList = StrUtils.splitSmart(alias, ",", true);
                corename = collectionsList.get(0);
              }
            }
          }

          core = cores.getCore(corename);

          if (core != null) {
            path = path.substring( idx );
          }
        }
        if (core == null) {
          if (!cores.isZooKeeperAware() ) {
            core = cores.getCore("");
          }
        }
      }

      if (core == null && cores.isZooKeeperAware()) {
        // we couldn't find the core - lets make sure a collection was not specified instead
        core = getCoreByCollection(cores, corename, path);

        if (core != null) {
          // we found a core, update the path
          path = path.substring( idx );
        }

        // if we couldn't find it locally, look on other nodes
        if (core == null && idx > 0) {
          String coreUrl = getRemotCoreUrl(cores, corename, origCorename);
          // don't proxy for internal update requests
          SolrParams queryParams = SolrRequestParsers.parseQueryString(req.getQueryString());
          if (coreUrl != null
              && queryParams
                  .get(DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM) == null) {
            path = path.substring(idx);
            remoteQuery(coreUrl + path, req, solrReq, resp);
            return;
          } else {
            if (!retry) {
              // we couldn't find a core to work with, try reloading aliases
              // TODO: it would be nice if admin ui elements skipped this...
              ZkStateReader reader = cores.getZkController()
                  .getZkStateReader();
              reader.updateAliases();
              doFilter(request, response, chain, true);
              return;
            }
          }
        }

        // try the default core
        if (core == null) {
          core = cores.getCore("");
        }
      }

      // With a valid core...
      if( core != null ) {
        final SolrConfig config = core.getSolrConfig();
        // get or create/cache the parser for the core
        SolrRequestParsers parser = config.getRequestParsers();

        // Handle /schema/* and /config/* paths via Restlet
        if( path.equals("/schema") || path.startsWith("/schema/")
            || path.equals("/config") || path.startsWith("/config/")) {
          // Entry point of Solr's REST API
          solrReq = parser.parse(core, path, req);
          SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, new SolrQueryResponse()));
          if( path.equals(req.getServletPath()) ) {
            // avoid endless loop - pass through to Restlet via webapp
            chain.doFilter(request, response);
          } else {
            // forward rewritten URI (without path prefix and core/collection name) to Restlet
            req.getRequestDispatcher(path).forward(request, response);
          }
          return;
        }

        // Determine the handler from the url path if not set
        // (we might already have selected the cores handler)
        if( handler == null && path.length() > 1 ) { // don't match "" or "/" as valid path
          handler = core.getRequestHandler( path );
          // no handler yet but allowed to handle select; let's check
          if( handler == null && parser.isHandleSelect() ) {
            if( "/select".equals( path ) || "/select/".equals( path ) ) {
              // Entry point for Solr's various queries: the handler is chosen by qt
              solrReq = parser.parse( core, path, req );
              String qt = solrReq.getParams().get( CommonParams.QT );
              handler = core.getRequestHandler( qt );
              if( handler == null ) {
                throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "unknown handler: "+qt);
              }
              if( qt != null && qt.startsWith("/") && (handler instanceof ContentStreamHandlerBase)) {
                //For security reasons it's a bad idea to allow a leading '/', ex: /select?qt=/update see SOLR-3161
                //There was no restriction from Solr 1.4 thru 3.5 and it's not supported for update handlers.
                throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "Invalid Request Handler ('qt').  Do not use /select to access: "+qt);
              }
            }
          }
        }

        // With a valid handler and a valid core...
        if( handler != null ) {
          // if not a /select, create the request
          if( solrReq == null ) {
            solrReq = parser.parse( core, path, req );
          }

          if (usingAliases) {
            processAliases(solrReq, aliases, collectionsList);
          }

          final Method reqMethod = Method.getMethod(req.getMethod());
          HttpCacheHeaderUtil.setCacheControlHeader(config, resp, reqMethod);
          // unless we have been explicitly told not to, do cache validation
          // if we fail cache validation, execute the query
          if (config.getHttpCachingConfig().isNever304() ||
              !HttpCacheHeaderUtil.doCacheHeaderValidation(solrReq, req, reqMethod, resp)) {
            // Solr HTTP caching: validity is controlled through HTTP headers
            SolrQueryResponse solrRsp = new SolrQueryResponse();
            /* even for HEAD requests, we need to execute the handler to
             * ensure we don't get an error (and to make sure the correct
             * QueryResponseWriter is selected and we get the correct
             * Content-Type)
             */
            SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, solrRsp));
            this.execute( req, handler, solrReq, solrRsp );
            HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
            // add info to http headers
            //TODO: See SOLR-232 and SOLR-267.
            /*try {
              NamedList solrRspHeader = solrRsp.getResponseHeader();
              for (int i=0; i<solrRspHeader.size(); i++) {
                ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
              }
            } catch (ClassCastException cce) {
              log.log(Level.WARNING, "exception adding response header log information", cce);
            }*/
            QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
            writeResponse(solrRsp, response, responseWriter, solrReq, reqMethod);
          }
          return; // we are done with a valid handler
        }
      }
      log.debug("no handler or core retrieved for " + path + ", follow through...");
    }
    catch (Throwable ex) {
      sendError( core, solrReq, request, (HttpServletResponse)response, ex );
      if (ex instanceof Error) {
        throw (Error) ex;
      }
      return;
    } finally {
      try {
        if (solrReq != null) {
          log.debug("Closing out SolrRequest: {}",solrReq);
          solrReq.close();
        }
      } finally {
        try {
          if (core != null) {
            core.close();
          }
        } finally {
          SolrRequestInfo.clearRequestInfo();
        }
      }
    }
  }

  // Otherwise let the webapp handle the request
  chain.doFilter(request, response);
}
```
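Stripped of admin URLs, aliases and remote forwarding, the routing in doFilter is mostly string handling:

1. Append pathInfo.
2. Strip the optional path-prefix.
3. Cut the path at `:`.
4. Treat the first path segment as the core name and the rest as the request-handler path.

The sketch below models only these steps, plus the SOLR-3161 check on `qt`. The class and method names are hypothetical, and it assumes the named core exists. In Solr, if no core matches, the path is left unchanged and the default core (or, in SolrCloud, a collection lookup) is tried instead.

```java
// Hypothetical sketch of how doFilter() splits a request path into (core name, handler path).
final class SolrPathRouterSketch {

    // Returns { coreName, handlerPath }; "" as core name means "use the default core"
    static String[] route(String servletPath, String pathInfo, String pathPrefix) {
        String path = servletPath;
        if (pathInfo != null) {
            path += pathInfo; // e.g. /update + /commit
        }
        if (pathPrefix != null && path.startsWith(pathPrefix)) {
            path = path.substring(pathPrefix.length());
        }
        int colon = path.indexOf(':');
        if (colon > 0) {
            path = path.substring(0, colon); // drop the ":..." suffix
        }
        // The first segment names the core: /collection1/select -> "collection1"
        int idx = path.indexOf('/', 1);
        if (idx > 1) {
            return new String[] { path.substring(1, idx), path.substring(idx) };
        }
        // No core segment: fall back to the default core
        return new String[] { "", path };
    }

    // SOLR-3161: /select?qt=/update must not reach a content-stream (e.g. update) handler
    static boolean qtAllowed(String qt, boolean isContentStreamHandler) {
        return !(qt != null && qt.startsWith("/") && isContentStreamHandler);
    }
}
```

For example, `/xxx/collection1/update` with pathInfo `/commit` and prefix `/xxx` routes to core `collection1` with handler path `/update/commit`.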
When reposting this article, please credit the source: http://www.cnblogs.com/likehua/p/4353608.html