garvincasimir / elasticsearch-azure-paas
Visual Studio Project which creates an Elasticsearch cluster on Microsoft Azure using worker roles
License: MIT License
This is not really an issue with the project, but something I learned that might save others time. Elasticsearch 2.4.1 (don't know about other versions) will bind to the loopback address by default if you don't specify an address in the configuration file. This is OK when running in the Emulator, but can be problematic when you deploy to Azure because even if you define the correct endpoint in the role settings, you won't be able to connect to Elasticsearch. Binding to the address '0' will allow the Elasticsearch instance to accept requests on the endpoint you have defined.
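As a minimal sketch of the setting described above, the fragment below shows what the relevant line in elasticsearch.yml might look like. It assumes Elasticsearch 2.x, where `network.host` controls the bind address; `0.0.0.0` is used here as the conventional spelling of the wildcard address that the text refers to as '0'.

```yaml
# elasticsearch.yml (sketch, assuming Elasticsearch 2.x)
# Bind to all interfaces instead of the loopback default so the
# endpoint defined in the role settings can reach the node.
network.host: 0.0.0.0
```

Binding to every interface is only reasonable when the endpoint itself is restricted, for example to an internal endpoint of the role.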
I have been testing the performance of Elasticsearch with the indexes stored on the Azure Files share (the persistent storage option), and the performance is not good.
I ran two tests, one using local storage and one using the file share. Both tests indexed 13 million documents. With local storage, indexing took about 45 minutes; with the share, it took over 2 hours.
There should be a config value specifying whether to use local storage or Azure Files.
For systems using an external data store to populate the index, there may not be any need for a persistent disk for the index.
With file-based discovery, I think it will be easier to upgrade to the latest ES, because we can get rid of the azure-runtime Java plugin.
https://www.elastic.co/guide/en/elasticsearch/plugins/current/discovery-file.html
It would be nice if there were a simple process for adding bulk data to the cluster when it is first created. A good sample dataset can be found at courtlistener.com. The bulk processing PowerShell script I created for processing the data is a good place to start.
In softwareManager.cs, line 40, Download is called using:
_artifact.Download(_binaryArchive);
This causes the Download method to use the Azure worker role temp folder. This folder has a size limit of 100 MB, which is too small to download both Elasticsearch and Java.
Changing the call to the following fixes the problem:
_artifact.Download(_binaryArchive, false);
I added some more logging and have found the reason why Elasticsearch does not start on new instances during scaling.
When Elasticsearch is started, it complains about a missing JAVA_HOME variable. This variable has been set, but is somehow not picked up.
After restarting the worker role, everything works as expected.
By starting the Elasticsearch process like this:
    _process = new Process();
    _process.StartInfo = new ProcessStartInfo
    {
        FileName = startupScript,
        UseShellExecute = false,
        RedirectStandardOutput = true
    };
    _process.Start();
    _process.BeginOutputReadLine();
and then capturing the standard output like this:
    _process.OutputDataReceived +=
        delegate(object sender, DataReceivedEventArgs args)
        {
            var output = args.Data;
            if (output == "JAVA_HOME environment variable must be set!")
            {
                Trace.TraceInformation("JAVA_HOME not set restarting");
                throw new Exception("JAVA_HOME not set restarting");
            }
        };
it is possible to listen for the text "JAVA_HOME environment variable must be set!", which the elasticsearch.bat file outputs, and restart the worker role by throwing an exception. Not very elegant, but it works as a temporary solution.
When deployed to Azure, the RoleRoot points to a drive, not a folder as in the emulator.
This leads to an invalid path when constructing PackagePluginPath. The path will look like this:
E:approot\plugins. The code should append a valid separator character to the drive, like:
roleRoot = roleRoot + @"\";
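The fix above could be sketched as a small helper that only appends the separator when it is missing, so the same code works for a bare drive (Azure) and a folder path (emulator). `RolePaths`, `EnsureTrailingSeparator` and `PluginPath` are hypothetical names for illustration, not part of the project, and the `approot\plugins` suffix is taken from the example path above.

```csharp
using System;

// Hypothetical helper, not part of the project: normalizes the role root
// so that appending "approot\plugins" always yields a valid Windows path.
public static class RolePaths
{
    public static string EnsureTrailingSeparator(string root)
    {
        if (string.IsNullOrEmpty(root))
        {
            throw new ArgumentException("Role root must not be empty.", "root");
        }

        // Leave the value alone if it already ends with a separator.
        bool hasSeparator = root.EndsWith("\\", StringComparison.Ordinal)
                         || root.EndsWith("/", StringComparison.Ordinal);

        // Worker roles run on Windows, so a backslash is appended explicitly.
        return hasSeparator ? root : root + "\\";
    }

    public static string PluginPath(string roleRoot)
    {
        return EnsureTrailingSeparator(roleRoot) + @"approot\plugins";
    }
}
```

With this, `RolePaths.PluginPath("E:")` and `RolePaths.PluginPath(@"E:\")` both produce `E:\approot\plugins`.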
Configure worker roles (Elasticsearch Nodes) to have a single internally load balanced endpoint on the default Elasticsearch port.
It would be nice if we moved Marvel and any additional plugins to a configurable storage container. The only plugin required by this solution is the discovery plugin, so we can make an exception and keep it in the solution. We don't want to dictate which plugins should be used. We also don't want to make it difficult for people to add their own plugins for their own purposes.
This project currently supports installing plugins by placing the zip files in a special storage container. However, using a local zip file is only one of the ways an Elasticsearch plugin can be installed. It would be nice if the project supported the following pattern:
plugin --install <org>/<user/component>/<version>
I am not sure how this would work but we would need to store a list of these org/user/component/version combinations somewhere then feed it into the plugin installer for processing. It would be nice if the service configuration supported arbitrary lists of strings in a setting.
<ListSettings>
  <ListSetting name="ElasticsearchPlugins">
    <Setting>elasticsearch/marvel</Setting>
    <Setting>elasticsearch/shield</Setting>
  </ListSetting>
</ListSettings>
We need a way to set ES_HEAP_SIZE. I think this value should be set automatically based on the available memory on the worker role. The recommended setting is 50% of the available memory. It should also be possible to override this setting through a config value.
Elasticsearch recommends setting this value using the environment variable ES_HEAP_SIZE, but I think a better approach would be to set the -Xms and -Xmx parameters when starting Elasticsearch, to minimize dependencies.
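A minimal sketch of the sizing rule proposed above: take 50% of the role's memory unless a config value overrides it, then render the result as -Xms/-Xmx arguments. `HeapSizing` and its methods are hypothetical names, and how the total memory is read and how the arguments are handed to the Elasticsearch startup script are both left open here as assumptions.

```csharp
using System;

// Hypothetical helper, not part of the project.
public static class HeapSizing
{
    private const long BytesPerMegabyte = 1024L * 1024L;

    // Returns the heap size in megabytes: the override if one is configured,
    // otherwise half of the total memory passed in by the caller.
    public static long ResolveHeapMegabytes(long totalMemoryBytes, long? overrideMegabytes = null)
    {
        if (overrideMegabytes.HasValue && overrideMegabytes.Value > 0)
        {
            return overrideMegabytes.Value;
        }

        if (totalMemoryBytes <= 0)
        {
            throw new ArgumentOutOfRangeException("totalMemoryBytes");
        }

        return totalMemoryBytes / 2 / BytesPerMegabyte;
    }

    // Uses the same value for -Xms and -Xmx so the heap is not resized at runtime.
    public static string ToJvmArguments(long heapMegabytes)
    {
        return string.Format("-Xms{0}m -Xmx{0}m", heapMegabytes);
    }
}
```

For a role with 4 GB of memory, `HeapSizing.ToJvmArguments(HeapSizing.ResolveHeapMegabytes(4L * 1024 * 1024 * 1024))` yields `-Xms2048m -Xmx2048m`; passing an override such as `512` yields `-Xms512m -Xmx512m` instead.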
I'm pretty unfamiliar with how this actually works. I have a really low traffic site but I need ES and was hoping I could just add it via a web job.
I have been trying to debug the part of the code that shuts down Elasticsearch:
    public virtual void Stop()
    {
        if (_process != null)
        {
            return;
        }
        if (_process.HasExited)
        {
            return;
        }
        _process.CloseMainWindow();
    }
To me it seems like _process is never null here, so the first check always returns and _process.CloseMainWindow(); is never called.
Should the correct implementation be this?
    public virtual void Stop()
    {
        if (_process == null)
        {
            return;
        }
        if (_process.HasExited)
        {
            return;
        }
        _process.CloseMainWindow();
    }