Saturday, December 13, 2014

Randomizing with Elasticsearch: a practical example

This post explains how to shuffle the results returned by Elasticsearch. The use case is a situation where we want to avoid users receiving results of only one (or a few) types. For instance, you have an e-commerce site and want to return products of different brands, even though some brands dominate your dataset (i.e., one brand accounts for a large share of your data).

1. To test this, I created a small dataset with 15 products across 3 brands.

PUT /test

PUT /test/products/1
{
    "name" : "product 1",
    "brand" : "brand1"
}

PUT /test/products/2
{
    "name" : "product 2",
    "brand" : "brand1"
}

PUT /test/products/3
{
    "name" : "product 3",
    "brand" : "brand1"
}

PUT /test/products/4
{
    "name" : "product 4",
    "brand" : "brand1"
}

PUT /test/products/5
{
    "name" : "product 5",
    "brand" : "brand1"
}

PUT /test/products/6
{
    "name" : "product 6",
    "brand" : "brand2"
}

PUT /test/products/7
{
    "name" : "product 7",
    "brand" : "brand2"
}

PUT /test/products/8
{
    "name" : "product 8",
    "brand" : "brand2"
}

PUT /test/products/9
{
    "name" : "product 9",
    "brand" : "brand2"
}

PUT /test/products/10
{
    "name" : "product 10",
    "brand" : "brand2"
}


PUT /test/products/11
{
    "name" : "product 11",
    "brand" : "brand3"
}

PUT /test/products/12
{
    "name" : "product 12",
    "brand" : "brand3"
}

PUT /test/products/13
{
    "name" : "product 13",
    "brand" : "brand3"
}

PUT /test/products/14
{
    "name" : "product 14",
    "brand" : "brand3"
}

PUT /test/products/15
{
    "name" : "product 15",
    "brand" : "brand3"
}

2. I ran three queries: (A) one without any sorting, (B) one with a sort script that uses the Java hashCode function, and (C) one using Elasticsearch's random_score function.

POST /test/products/_search
{
   "from": 0,
   "size": 3,
   "query": {
      "match": {
         "name": "product"
      }
   }
}

POST /test/products/_search
{
   "from": 0,
   "size": 3,
   "query": {
      "match": {
         "name": "product"
      }
   },
   "sort": {
      "_script": {
         "script": "(doc['_id'].value + seed).hashCode()",
         "type": "number",
         "params": {
            "seed": "1234"
         },
         "order": "asc"
      }
   }
}

POST /test/products/_search
{
   "from": 0,
   "size": 3,
   "query": {
      "function_score": {
         "query": {
            "match": {
               "name": "product"
            }
         },
         "functions": [
            {
               "random_score": {
                  "seed": "1"
               }
            }
         ],
         "score_mode": "sum"
      }
   }
}
}

3. The results were:

A. brand 2, brand 2, brand 1
B. brand 1, brand 3, brand 2
C. brand 3, brand 3, brand 1

4. My conclusion is that using the Java hashCode function is the best approach. The random_score function is interesting if you want to keep the results consistent for the same user (you can use the user id as the seed of this function).
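To see why the hashCode approach behaves like a seeded shuffle, here is a plain-Java sketch of what the sort script computes (the ids and salt strings below are made up for illustration): each document id is concatenated with a per-request salt and the documents are ordered by the resulting hash. The same salt always yields the same order, which keeps paging consistent; a new salt yields a new order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SaltedShuffle {

    // Same idea as the sort script "(doc['_id'].value + seed).hashCode()":
    // concatenate the document id with a per-request salt and order by the hash.
    static List<String> shuffle(List<String> ids, String salt) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.comparingInt(id -> (id + salt).hashCode()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "4", "5");
        // The same salt always produces the same order (stable paging),
        // while a different salt produces a different order.
        System.out.println(shuffle(ids, "saltA"));
        System.out.println(shuffle(ids, "saltB"));
    }
}
```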

Best regards,

Luiz

Friday, November 7, 2014

Thursday, September 11, 2014

Popularity + most searched terms

In this post I will combine the popularity of a document with the most searched terms in order to boost results based on past searches issued against a particular index.

PUT blogposts

PUT statistics

PUT /blogposts/post/1
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   6
}

PUT /blogposts/post/2
{
  "title":   "About elasticsearch",
  "content": "In this post we will talk about...",
  "votes":   3
}

PUT /blogposts/post/3
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   7
}

PUT /statistics/queries/1
{
  "user_query":   "popularity"
}

PUT /statistics/queries/2
{
  "user_query":   "popularity in elasticsearch"
}

PUT /statistics/queries/3
{
  "user_query":   "boost"
}

PUT /statistics/queries/4
{
  "user_query":   "boost in elasticsearch"
}


PUT /statistics/queries/5
{
  "user_query":   "elasticsearch is the best search engine"
}

GET blogposts/post/_mapping

GET statistics/queries/_mapping

POST statistics/queries/_search
{
    "query" : {
        "match_all" : {}
    },
    "facets": {
        "keywords": {
            "terms": {
                "field": "user_query"
            }
        }
    }
}

POST blogposts/post/_search
{
    "sort" : [
        { "votes" : {"order" : "desc"}},
        "_score"
    ],
    "query" : {
        "match" : {
            "title":{
                "query":"elasticsearch popularity"
            }
        }
    }
}
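The sort above ranks posts by votes first and uses the relevance score only as a tie-breaker. A plain-Java sketch of that ordering, using data mirroring the three posts indexed above (the match scores are made-up numbers for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PopularitySort {

    record Post(String title, int votes, double score) {}

    // Orders posts the same way as the query: votes descending,
    // then relevance score descending as the tie-breaker.
    static List<Post> rank(List<Post> posts) {
        List<Post> copy = new ArrayList<>(posts);
        copy.sort(Comparator.comparingInt(Post::votes).reversed()
                .thenComparing(Comparator.comparingDouble(Post::score).reversed()));
        return copy;
    }

    public static void main(String[] args) {
        List<Post> ranked = rank(List.of(
                new Post("About popularity", 6, 0.9),
                new Post("About elasticsearch", 3, 1.2),
                new Post("About popularity", 7, 0.9)));
        // The 7-vote post comes first regardless of its text score.
        System.out.println(ranked.get(0).votes()); // 7
    }
}
```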




Wednesday, April 23, 2014

[ElasticSearch] An example of using SetFetchSource

setFetchSource is a new feature of ElasticSearch 1.1. This is an example of how to use it:

String [] excludes = {"field0"};
String [] includes = {"field1","field2"};

SearchResponse searcher = client.getClient().prepareSearch(INDEX).setFetchSource(includes, excludes).setQuery(qb).execute().actionGet();

If you do not call addFields(), you will not be able to iterate over the fields of the hits with getFields(). However, you can iterate using sourceAsMap().


for (SearchHit hit : searcher.getHits().getHits()) {
    Map<String, Object> source = hit.sourceAsMap();

    for (String key : source.keySet()) {
        // do something with the field name
    }

    for (Object fieldValue : source.values()) {
        // do something with the field value
    }
}

As you will notice, the fields listed in "excludes" will not show up. Also, if you call hit.getFields(), it will be empty.



References:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html

https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java


Tuesday, April 22, 2014

[ElasticSearch] QueryBuild vs. String: ElasticsearchParseException[Failed to derive xcontent from...

I had a problem while writing tests for the new setFetchSource feature of ES 1.1. After some testing, research, and thinking (trial and error), I finally realised that the problem arises when you use a plain string query instead of a QueryBuilder.


org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query_fetch], all shards failed; shardFailures {[KMpFYBpxRECgm3gJC_1-uw][content][0]: SearchParseException[[content][0]: from[-1],size[10]: Parse Failure [Failed to parse source [{"size":10,"query_binary":"VGVzdHRleHQ=","_source":{"includes":["CONTENTID","URL"],"excludes":["CONTENT"]},"fields":["CONTENTID","URL"]}]]]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=0, length=8): [84, 101, 115, 116, 116, 101, 120, 116]]; }
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:272)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:307)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


It makes sense, since the error says "Failed to derive xcontent from (offset...". However, for a non-native English speaker, it took some time. Just change:

String query="";
        
SearchResponse response = client.prepareSearch(CONTENT_INDEX).setFetchSource(includes, excludes).setQuery(query).addFields(includes)
                    .setSize(10).execute().actionGet();


To:

String query="";

QueryStringQueryBuilder qb = QueryBuilders.queryString(query);

SearchResponse response = client.prepareSearch(CONTENT_INDEX).setFetchSource(includes, excludes).setQuery(qb).addFields(includes)
                    .setSize(10).execute().actionGet();

Tuesday, April 15, 2014

SecurityException: WifiService: Neither user 10082 nor current process has android.permission.ACCESS_WIFI_STATE

If you are trying to scan for networks on a mobile device, first of all you have to add the permissions to your AndroidManifest.xml:

<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.CHANGE_WIFI_STATE" />

public class MainActivity ... {

    private WifiManager mainWifi;
    private final WifiReceiver receiverWifi = new WifiReceiver();
    private List<ScanResult> wifiList;
    private TextView mainText;

    public void scanWiFi(View view) {
        mainWifi = (WifiManager) getSystemService(Context.WIFI_SERVICE);

        registerReceiver(receiverWifi, new IntentFilter(WifiManager.SCAN_RESULTS_AVAILABLE_ACTION));
        mainWifi.startScan();
    }

    class WifiReceiver extends BroadcastReceiver {
        public void onReceive(Context c, Intent intent) {
            StringBuilder sb = new StringBuilder();
            wifiList = mainWifi.getScanResults();
            for (int i = 0; i < wifiList.size(); i++) {
                sb.append(i + 1).append(". ");
                sb.append(wifiList.get(i).SSID);
                sb.append(" ").append(wifiList.get(i).frequency);
                sb.append(" ").append(wifiList.get(i).level);
                sb.append("\n");
            }
            mainText.append(sb);
        }
    }

}

Tuesday, April 8, 2014

How to use the Path Hierarchy Tokenizer in ElasticSearch

How to use the Path Hierarchy Tokenizer:

0. If you just want to check the output of this tokenizer, you can run:

curl -XGET 'localhost:9200/_analyze?tokenizer=path_hierarchy&filters=lowercase' -d '/something/something/else'
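For reference, the default (non-reversed) path_hierarchy tokenizer emits one token per path level, each token being the path prefix up to that level. Here is a plain-Java sketch of that behavior (not the Lucene implementation, just an illustration of the token stream):

```java
import java.util.ArrayList;
import java.util.List;

public class PathHierarchySketch {

    // Emits the same tokens as the default path_hierarchy tokenizer:
    // one token per path prefix, e.g. "/a/b" -> ["/a", "/a/b"].
    static List<String> tokenize(String path) {
        List<String> tokens = new ArrayList<>();
        int from = path.indexOf('/', 1); // skip the leading delimiter
        while (from != -1) {
            tokens.add(path.substring(0, from));
            from = path.indexOf('/', from + 1);
        }
        tokens.add(path); // the full path is the last token
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("/something/something/else"));
        // [/something, /something/something, /something/something/else]
    }
}
```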


1. Create the index with the analysis settings and the mapping:

$ curl -XPUT localhost:9200/index_files3/ -d '
{
  "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_path_hierarchy":{
                 "tokenizer":"my_path_hierarchy_tokenizer",
                 "filter":["lowercase"]
              }
           },
           "tokenizer":{
              "my_path_hierarchy_tokenizer":{
                 "type":"path_hierarchy",
                 "delimiter":"/",
                 "replacement":"*",
                 "buffer_size":"1024",
                 "reverse":"true",
                 "skip":"0"
              }
           }
        }
     }
  },
  "mappings":{
     "file":{
        "properties":{
           "path":{
              "analyzer":"analyzer_path_hierarchy",
              "type":"string"
           }
        }
     }
  }
}'


2. Get the mapping:

curl -XGET 'http://localhost:9200/index_files3/file/_mapping'


3. Add a new document:

$ curl -XPUT 'http://localhost:9200/index_files3/file/1' -d '{
    "name" : "c1",
    "text" : "t1",
    "path" : "/c1/c2/c3"
}'


4. Check if it is there:

$ curl -XGET 'http://localhost:9200/index_files3/file/1'


5. Search:

Fail: curl -XGET 'http://localhost:9200/index_files3/_search?q=path:c1'

Success: curl -XGET 'http://localhost:9200/index_files3/_search?q=path://c1'

Monday, March 24, 2014

Error: Could not find or load main class org.codehaus.classworlds.Launcher


$ which mvn

$ sudo apt-get install maven

$ mvn -version

apt-get installs Maven in /usr/share/maven, so:

export M2_HOME=/usr/share/maven
export M2=$M2_HOME/bin

PATH=$PATH:$JAVA_HOME/bin
PATH=$PATH:$M2
export PATH


http://www.cubeia.com/2009/02/10-sure-signs-you-are-doing-maven-wrong/

Tuesday, March 18, 2014

Regular expression (regex) for JDBC connection string/URL


String regex= "(^jdbc:"+databaseName.toLowerCase()+":){1,1}\\S*(:?)(//+)\\S*(/+)\\S+";
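A quick sanity check of the pattern ("mysql" stands in for the databaseName variable here, just for illustration):

```java
import java.util.regex.Pattern;

public class JdbcUrlCheck {

    // Builds the regex from the post and tests a candidate URL against it.
    static boolean isJdbcUrl(String databaseName, String url) {
        String regex = "(^jdbc:" + databaseName.toLowerCase()
                + ":){1,1}\\S*(:?)(//+)\\S*(/+)\\S+";
        return Pattern.matches(regex, url);
    }

    public static void main(String[] args) {
        System.out.println(isJdbcUrl("mysql", "jdbc:mysql://localhost:3306/test")); // true
        System.out.println(isJdbcUrl("mysql", "not a jdbc url"));                   // false
    }
}
```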

Good place to test:

http://java-regex-tester.appspot.com/

Thursday, March 13, 2014

[DRAFT] Cheatsheet for MySQL installation problems on Ubuntu


Purge
sudo apt-get purge mysql-server mysql-client mysql-common mysql-client-5.5 mysql-server-5.5
sudo apt-get purge mariadb-server-5.5
sudo apt-get purge mariadb

Remove

sudo rm -rf mysql.dpkg-bak
sudo apt-get remove mysql-server mysql-client mysql-common
sudo apt-get autoremove
sudo rm -rf /var/lib/mysql
sudo deluser mysql

Install

sudo apt-get install mysql-server-core-5.5
sudo apt-get install -f mysql-server    # -f attempts to fix broken dependencies
sudo apt-get install -f libaio1
 

Install from deb pack

sudo dpkg -i mysql-5.6.13-debian6.0-i686.deb

Logs/status

service mysql status
ps ax | grep mysql
grep mysql /var/log/syslog
grep mysql /var/log/daemon.log
sudo netstat -tap | grep mysql
sudo service mysql restart
mysql --version
which mysql

DPKG

sudo dpkg --configure -a
sudo rm /var/lib/dpkg/lock
sudo rm /var/cache/apt/archives/lock
sudo dpkg -S etc/mysql

Check


sudo apt-get remove --purge mysql-server mysql-client mysql-common
sudo apt-get autoremove
sudo apt-get autoclean
sudo deluser mysql
sudo rm -rf /var/lib/mysql


Connection URL



jdbc:mysql://localhost:3306/test
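Since a JDBC URL is just "jdbc:" followed by an ordinary URI, the host, port, and database can be pulled out with the standard URI parser. A small sketch (the describe helper is hypothetical, for illustration only):

```java
import java.net.URI;

public class MysqlUrlParts {

    // Parses the host, port, and database out of a MySQL JDBC URL.
    // The part after the "jdbc:" prefix is a regular URI, so the
    // standard java.net.URI parser can handle it.
    static String describe(String jdbcUrl) {
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        return uri.getHost() + " " + uri.getPort() + " " + uri.getPath().substring(1);
    }

    public static void main(String[] args) {
        System.out.println(describe("jdbc:mysql://localhost:3306/test")); // localhost 3306 test
    }
}
```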







Reference

https://rtcamp.com/tutorials/mysql/mysql-5-6-ubuntu-12-04/

Tuesday, March 11, 2014


[DRAFT] A collection of articles on how to modify the ElasticSearch similarity function

It is necessary to extend the following classes:
org.apache.lucene.search.similarities.Similarity;
org.elasticsearch.index.similarity.AbstractSimilarityProvider;

Since ES 0.90 it has been possible to change the similarity function per field. This document explains how:
http://elasticsearchserverbook.com/elasticsearch-0-90-similarities/

General information:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html

Friday, February 21, 2014

Installing Jenkins and the JMeter plugin (Performance plugin)

Jenkins
1. The installation procedure can be found at:

https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins+on+Red+Hat+distributions

2. Check if it is working:

[root@spike ~]# sudo service jenkins start
Starting Jenkins                                           [  OK  ]

The Jenkins logs will be written to:
/var/log/jenkins/jenkins.log
Find something like:
INFO: Jenkins is fully up and running

3. Accessing Jenkins:

The general information can be found at:
https://wiki.jenkins-ci.org/display/JENKINS/Starting+and+Accessing+Jenkins
However, it is necessary to create a tunnel on port 8080:
ssh -f root@spike -L 8080:127.0.0.1:8080 -N
Jenkins should then be available at:
http://127.0.0.1:8080/

JMeter plugins

The Jenkins pages recommend using the Performance plugin instead of the JMeter plugin.

https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin
  1. The easiest way to install the plugin is through the Plugin Manager interface: http://127.0.0.1:8080/pluginManager/
  2. Go to the "Available" tab and find "Performance plugin"
  3. Click "Download now and install after restart"
  4. After the download finishes, restart the Jenkins service: "sudo service jenkins restart"
  5. Install JMeter
    1. Download it:
      wget http://apache.mirror.iphh.net//jmeter/binaries/apache-jmeter-2.11.tgz
    2. Extract the files to /var/lib/jenkins
    3. Rename (mv) apache-jmeter-2.11 as jmeter
  6. To check if it is installed it is necessary to create a new job.
    1. New Job
    2. Free-style software project
    3. Under Build, select Invoke Ant
    4. Select Advanced
    5. Target : all
      Build file : /var/lib/jenkins/jobs/MY_PROJECT_NAME/build.xml
      Properties : jmeter.dir=/var/lib/jenkins/jmeter (see step 5.3)
    6. Down in Post-build Actions check the Publish Performance test result report checkbox
    7. Add a new report box and choose JMeter
    8. Set the branch to build to JMeter
      For Report files specify **/*.jtl
  7. Add the build.xml at the root of the job folder (/var/lib/jenkins/jobs/MY_PROJECT_NAME/). At the end I added a build.xml file.
  8. Under the job folder (/var/lib/jenkins/jobs/MY_PROJECT_NAME/), create the following directory structure (the workspace folder and all its subfolders have to be chmod 777: "chmod -R 0777 ./*").
    +scripts
    +workspace
    ++results
    ++++jtl
    ++++html
  9. Upload the JMeter script files (*.jmx) into the scripts folder





Useful references

http://www.theserverlabs.com/blog/2009/04/23/performance-tests-with-jmeter-maven-and-hudson/
http://jlorenzen.blogspot.de/2008/03/automated-performance-tests-using.html
http://neyto.blogspot.de/2013/02/install-jenkins-with-jmeter-performance_2208.html


Build xml:

<project name="jmeter-tests" default="all" basedir=".">

    <property name="jmeter-home" value="/var/lib/jenkins/jmeter"/>

    <path id="ant.jmeter.classpath">
        <pathelement location="${jmeter-home}/extras/ant-jmeter-1.1.1.jar"/>
    </path>

    <path id="xslt.classpath">
        <fileset dir="${jmeter-home}/lib" includes="xalan*.jar serializer*.jar"/>
    </path>

    <taskdef
        name="jmeter"
        classname="org.programmerplanet.ant.taskdefs.jmeter.JMeterTask"
        classpathref="ant.jmeter.classpath"/>

    <target name="run">
        <jmeter
            jmeterhome="${jmeter-home}"
            resultlogdir="workspace/results/jtl">
            <testplans dir="scripts" includes="*.jmx"/>
        </jmeter>
    </target>

    <target name="report">
        <xslt
            classpathref="xslt.classpath"
            basedir="workspace/results/jtl"
            destdir="workspace/results/html"
            includes="*.jtl"
            style="${jmeter-home}/extras/jmeter-results-detail-report_21.xsl"/>
    </target>

    <target name="all" depends="run, report"/>

</project>


Tuesday, February 11, 2014

Great webinar about Elasticsearch 0.90:

http://info.elasticsearch.com/Recorded_0.90_Webinar.html
How to become an expert in garbage collection:

http://www.cubrid.org/blog/dev-platform/understanding-java-garbage-collection/
http://www.cubrid.org/blog/dev-platform/how-to-monitor-java-garbage-collection/
http://www.cubrid.org/blog/textyle/428187


These three articles are very interesting and comprehensive.