Como NÃO fazer pesquisas usando LIKE Fabio Akita @akitaonrails CODE MINER Search está em todos os lugares SELECT * FROM PRODUCTS WHERE NAME LIKE '%Camisetas%' AND DESCRIPTION LIKE '%Camisetas%' AND NAME NOT LIKE '%Calças%' AND DESCRIPTION NOT LIKE '%Calças%' Camisetas INDEX SEEK Rápido Camisetas INDEX SEEK Rápido Camisetas% INDEX SCAN Quase Rápido Camisetas INDEX SEEK Rápido Camisetas% INDEX SCAN Quase Rápido %Camisetas% TABLE SCAN Indo pra trás Índices não vão te ajudar WordPress wp-includes/taxonomy.php (1256 até 1545) <?php function get_terms($taxonomies, $args = '') { ... if ( !empty($name__like) ) { $name__like = like_escape( $name__like ); $where .= $wpdb->prepare( " AND t.name LIKE %s", '%' . $name__like . '%' ); } if ( ! empty( $description__like ) ) { $description__like = like_escape( $description__like ); $where .= $wpdb->prepare( " AND tt.description LIKE %s", '%' . $description__like . '%' ); } ... if ( ! empty( $search ) ) { $search = like_escape( $search ); $where .= $wpdb->prepare( ' AND ((t.name LIKE %s) OR (t.slug LIKE %s))', '%' . $search . '%', '%' . $search . '%' ); } ... } ?> Magento AbstractHelper.php <?php public function getCILike($field, $value, $options = array()) { $quotedField = $this->_getReadAdapter()->quoteIdentifier($field); return new \Zend_Db_Expr($quotedField . ' LIKE ' . $this->addLikeEscape($value, $options)); } ?> Rankeamento, Relevância Rankeamento, Relevância Frases, Proximidade, Intervalos Rankeamento, Relevância Frases, Proximidade, Intervalos Sinônimos, "Stemmer" Rankeamento, Relevância Frases, Proximidade, Intervalos Sinônimos, "Stemmer" “More Like This" Rankeamento, Relevância Frases, Proximidade, Intervalos Sinônimos, "Stemmer" “More Like This" “Did you mean …?" Rankeamento, Relevância Frases, Proximidade, Intervalos Sinônimos, "Stemmer" “More Like This" “Did you mean …?" Faceting (Terms, Geolocation, etc) Pesquisa Não-Estruturada Pesquisa Não-Estruturada Sugestões Pesquisa Não-Estruturada Sugestões Ordenação Pesquisa Não-Estruturada Sugestões Ordenação Terms Facet Pesquisa Não-Estruturada Sugestões Ordenação Terms Facet Agregação Pesquisa Não-Estruturada Sugestões Ordenação Terms Facet Agregação Paginação SELECT * FROM PRODUCTS WHERE MATCH (NAME, DESCRIPTION) AGAINST ('+Camisetas -Calças' IN BOOLEAN MODE) Magento CatalogSearch/Model/Resource/Helper.php <?php public function chooseFulltext($table, $alias, $select) { $field = new \Zend_Db_Expr( 'MATCH (' . $alias . '.data_index) AGAINST (:query IN BOOLEAN MODE)'); $select->columns(array('relevance' => $field)); return $field; } ?> SELECT * FROM PRODUCTS WHERE CONTAINS( (NAME, DESCRIPTION), 'Camisetas AND NOT Calças') SELECT * FROM PRODUCTS WHERE TO_TSVECTOR(NAME || '' || DESCRIPTION) @@ TO_TSQUERY('Camisetas &! Calças') Cadeias de Markov Cadeias de Markov Índices Invertidos Cadeias de Markov Índices Invertidos Vector Space Model Cadeias de Markov Índices Invertidos Vector Space Model Okapi BM25 Vector Space Model http://u.akita.ws/vsm_example (Exemplo Simplificado) d1 “new york times" d2 “new york post" d3 “los angeles times" angeles log2(3/1)=1.584 los log2(3/1)=1.584 new log2(3/2)=0.584 post log2(3/1)=1.584 times log2(3/2)=0.584 york log2(3/2)=0.584 angeles los new post times york d1 0 0 1 0 1 1 d2 0 0 1 1 0 1 d3 1 1 0 0 1 0 angeles los new post times york d1 0 0 0.584 0 0.584 0.584 d2 0 0 0.584 1.584 0 0.584 d3 1.584 1.584 0 0 0.584 0 angeles los q 0 0 new (2/2)*0.584= 0.584 post times york 0 (1/2)*0.584= 0.292 0 q = “new new times" Distância d1 sqrt(0.584^2+0.584^2+0.584^2) 1.011 Distância d2 sqrt(0.584^2+1.584^2+0.584^2) 1.786 Distância d3 sqrt(1.584^2+1.584^2+0.584^2) 2.316 Distância q sqrt(0.584^2+0.292^2) 0.652 (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / cosSim(d1,q) (1.011*0.652) 0.776 (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / cosSim(d2,q) (1.786*0.652) 0.292 (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) 0.112 cosSim(d3,q) Douglass Cutting Lucene Nutch Hadoop ! Tika Solr ElasticSearch 150GB/hora 20%-30% tamanho do índice Apache Lucene HTML, XHTML, OOXML, ODF, XML, RSS, OLE2, iWorks (Pages, Numbers, Keynote), PDF, EPUB, RTF, Commons Compress (ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj e lzma), Audio (javax.sound, MIDI, Mp3), Image (javax.imageio, Tiff, Jpeg), Video (FLV, Flash), Mail (Mbox, RFC822), DWG, Font (TrueType), HDF, e plugins. InputStream is = new BufferedInputStream( new FileInputStream( new File("sample.pdf"))); ! Parser parser = new AutoDetectParser(); ContentHandler handler = new BodyContentHandler( System.out); ! Metadata metadata = new Metadata(); ! parser.parse(is, handler, metadata, new ParseContext()); ! for (String name : metadata.names()) { String value = metadata.get(name); ! if (value != null) { System.out.println("Metadata Name: " + name); System.out.println("Metadata Value: " + value); } } http://localhost:8983/solr/query?q=title:black http://localhost:8983/solr/query? q=*:* &fl=id,title,series_s,pubyear_i &sort=pubyear_i desc &group=true &group.main=true &group.field=series_s &facet=true &facet.field=cat curl "http://localhost:8983/solr/update/extract? literal.id=doc5&defaultField=text” --data-binary @tutorial.html -H 'Content-type:text/html' Solr ElasticSearch Coordination Solr ElasticSearch ZooKeeper Zen Discovery Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Schema +/- Sim Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Schema +/- Sim Nested Typing Não Sim Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Schema +/- Sim Nested Typing Não Sim Queries Key / Value JSON Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Schema +/- Sim Nested Typing Não Sim Queries Key / Value JSON Distributed Group By Sim Não Solr ElasticSearch Coordination ZooKeeper Zen Discovery Shard Splitting Sim Não Automatic Shard Rebalancing Não Sim Schema +/- Sim Nested Typing Não Sim Queries Key / Value JSON Distributed Group By Sim Não Percolation Queries Não Sim Setup cd ~ sudo apt-get update sudo apt-get install openjdk-7-jre-headless -y ### http://www.elasticsearch.org/download/ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/ elasticsearch-0.90.7.deb sudo dpkg -i elasticsearch-0.90.7.deb sudo service elasticsearch start Setup # Bonsai heroku addons:add bonsai heroku config:add ELASTICSEARCH_URL=`heroku config:get BONSAI_URL` ! # Found heroku addons:add foundelasticsearch heroku config:add ELASTICSEARCH_URL=`heroku config:get FOUNDELASTICSEARCH_URL` ! # SearchBox heroku addons:add searchbox:starter heroku config:add ELASTICSEARCH_URL=`heroku config:get SEARCHBOX_URL` ! # reindex heroku run rake searchkick:reindex CLASS=Product Setup # Gemfile - bundle install gem "searchkick" ! # app/models/product.rb class Product < ActiveRecord::Base searchkick end ! # config/initializers/elasticsearch.rb ENV["ELASTICSEARCH_URL"] = "http://username:[email protected]" ! # no shell rails r "Product.reindex" # Search simples products = Product.search "Camisetas" products.each do |product| puts product.name end # Search simples products = Product.search "Camisetas" products.each do |product| puts product.name end # Search com campos Product.search "Camisetas", fields: [:name, :description] where: { in_stock: true, expires_at: {gt: 1.week.from_now}, or: [ [{in_stock: true}, {backordered: true}] ] }, order: {_score: :desc}, # relevant first limit: 10, offset: 50 # , page: params[:page], per_page: 20 # Sinonimos class Product < ActiveRecord::Base searchkick synonyms: [ ["pc", "computador pessoal"], ["word", "microsoft office"] ] end # Sinonimos class Product < ActiveRecord::Base searchkick synonyms: [ ["pc", "computador pessoal"], ["word", "microsoft office"] ] end # Sugestões class Product < ActiveRecord::Base searchkick suggest: ["name"] end ! products = Product.search "coldminer ", suggest: true products.suggestions # ["codeminer"] class City < ActiveRecord::Base searchkick autocomplete: ["name"] end ! City.search "Sao P", autocomplete: true # app/controllers/cities_controller.rb class CitiesController < ApplicationController def autocomplete render json: City.search(params[:query], autocomplete: true, limit: 10).map(&:name) end end # app/controllers/cities_controller.rb class CitiesController < ApplicationController def autocomplete render json: City.search(params[:query], autocomplete: true, limit: 10).map(&:name) end end # partial <input type="text" id="query" name="query" /> ! <script src="jquery.js"></script> <script src="typeahead.js"></script> <script> $("#query").typeahead({ name: "city", remote: "/cities/autocomplete?query=%QUERY" }); </script> products = Product.search "GPS", facets: [:type, :brand, :screen_size] puts products.facets class City < ActiveRecord::Base searchkick locations: ["location"] ! def search_data attributes.merge location: [latitude, longitude] end end ! City.search "Codemi", where: { location: {near: [-23, -46], within: "10mi" } } # ou 16km Próximos Capítulos SELECT … LIKE ‘%' SELECT … LIKE ‘%' OBRIGADO! slideshare.net/akitaonrails codeminer42.com @akitaonrails