How NOT to do
searches using LIKE
Fabio Akita @akitaonrails
CODE MINER
Search is everywhere
SELECT * FROM PRODUCTS
WHERE NAME LIKE '%Camisetas%'
AND DESCRIPTION LIKE '%Camisetas%'
AND NAME NOT LIKE '%Calças%'
AND DESCRIPTION NOT LIKE '%Calças%'
Pattern        Access        Speed
Camisetas      INDEX SEEK    Fast
Camisetas%     INDEX SCAN    Almost fast
%Camisetas%    TABLE SCAN    Going backwards
Indexes won't help you
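A minimal sketch of why the leading wildcard hurts. It assumes a sorted Ruby array as a stand-in for an index on NAME; this is an analogy, not how any particular database engine is implemented, and the product names are made up. A prefix pattern can jump straight to the first candidate, while a pattern with a leading % has to inspect every entry.

```ruby
# A sorted array stands in for an index on PRODUCTS.NAME (illustrative only).
NAME_INDEX = [
  "Bermudas", "Calças Jeans", "Camisetas Polo", "Kit 3 Camisetas", "Meias"
].sort.freeze

# Like LIKE 'Camisetas%': binary-search to the first candidate, then read
# forward only while the prefix still matches.
def prefix_search(index, prefix)
  start = index.bsearch_index { |name| name >= prefix } || index.size
  index[start..].take_while { |name| name.start_with?(prefix) }
end

# Like LIKE '%Camisetas%': the sort order says nothing about what appears in
# the middle of a value, so every entry has to be checked.
def substring_search(index, term)
  index.select { |name| name.include?(term) }
end

p prefix_search(NAME_INDEX, "Camisetas")
p substring_search(NAME_INDEX, "Camisetas")
```

The prefix search misses "Kit 3 Camisetas", which is exactly why people reach for the leading wildcard and give up the index.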
WordPress
wp-includes/taxonomy.php (lines 1256 to 1545)
<?php
function get_terms($taxonomies, $args = '') {
    ...
    if ( !empty($name__like) ) {
        $name__like = like_escape( $name__like );
        $where .= $wpdb->prepare( " AND t.name LIKE %s",
            '%' . $name__like . '%' );
    }
    if ( ! empty( $description__like ) ) {
        $description__like = like_escape( $description__like );
        $where .= $wpdb->prepare( " AND tt.description LIKE %s",
            '%' . $description__like . '%' );
    }
    ...
    if ( ! empty( $search ) ) {
        $search = like_escape( $search );
        $where .= $wpdb->prepare( ' AND ((t.name LIKE %s) OR (t.slug LIKE %s))',
            '%' . $search . '%', '%' . $search . '%' );
    }
    ...
}
?>
Magento
AbstractHelper.php
<?php
public function getCILike($field, $value, $options = array())
{
    $quotedField = $this->_getReadAdapter()->quoteIdentifier($field);
    return new \Zend_Db_Expr($quotedField . ' LIKE ' .
        $this->addLikeEscape($value, $options));
}
?>
Ranking, Relevance
Phrases, Proximity, Ranges
Synonyms, Stemming
"More Like This"
"Did you mean …?"
Faceting (Terms, Geolocation, etc.)
Unstructured Search
Suggestions
Sorting
Terms Facet
Aggregation
Pagination
SELECT * FROM PRODUCTS
WHERE MATCH (NAME, DESCRIPTION)
AGAINST ('+Camisetas -Calças'
IN BOOLEAN MODE)
Magento
CatalogSearch/Model/Resource/Helper.php
<?php
public function chooseFulltext($table, $alias, $select)
{
    $field = new \Zend_Db_Expr(
        'MATCH (' . $alias . '.data_index) AGAINST (:query IN BOOLEAN MODE)');
    $select->columns(array('relevance' => $field));
    return $field;
}
?>
SELECT * FROM PRODUCTS
WHERE CONTAINS( (NAME, DESCRIPTION),
'Camisetas AND NOT Calças')
SELECT * FROM PRODUCTS
WHERE
TO_TSVECTOR(NAME || ' ' || DESCRIPTION)
@@ TO_TSQUERY('Camisetas & !Calças')
Markov Chains
Inverted Indexes
Vector Space Model
Okapi BM25
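A minimal sketch of the inverted index idea, assuming three made-up product descriptions and a naive tokenizer (real analyzers also apply stemming, stop words and synonyms). Each term maps to the set of document ids that contain it, so a boolean query in the spirit of '+Camisetas -Calças' becomes set intersection and subtraction instead of a scan over every row.

```ruby
require "set"

# Hypothetical product descriptions, keyed by document id.
PRODUCT_TEXTS = {
  1 => "Camisetas de algodão",
  2 => "Calças e Camisetas em promoção",
  3 => "Calças jeans"
}.freeze

# Split text into lowercase terms made of letters only.
def tokenize(text)
  text.downcase.scan(/\p{L}+/)
end

# Map each term to the set of document ids that contain it.
def build_inverted_index(docs)
  docs.each_with_object(Hash.new { |hash, term| hash[term] = Set.new }) do |(id, text), index|
    tokenize(text).each { |term| index[term] << id }
  end
end

# Ids of documents containing every required term and none of the excluded ones.
def boolean_search(index, must:, must_not: [])
  hits = must.map { |term| index.fetch(term.downcase, Set.new) }.reduce(:&) || Set.new
  must_not.each { |term| hits -= index.fetch(term.downcase, Set.new) }
  hits.to_a.sort
end

index = build_inverted_index(PRODUCT_TEXTS)
p boolean_search(index, must: ["Camisetas"], must_not: ["Calças"])
```

Only document 1 survives: document 2 mentions both terms and document 3 never mentions "Camisetas".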
Vector Space Model
http://u.akita.ws/vsm_example (Simplified Example)
d1    "new york times"
d2    "new york post"
d3    "los angeles times"
term       idf
angeles    log2(3/1) = 1.584
los        log2(3/1) = 1.584
new        log2(3/2) = 0.584
post       log2(3/1) = 1.584
times      log2(3/2) = 0.584
york       log2(3/2) = 0.584
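The idf values above can be reproduced with a short sketch. It assumes whitespace tokenization and idf(term) = log2(N / df), where N is the number of documents and df is how many of them contain the term; the method and constant names are illustrative. The slide truncates to three decimals, so rounded output may differ in the last digit.

```ruby
# The three documents from the example.
CORPUS = {
  "d1" => "new york times",
  "d2" => "new york post",
  "d3" => "los angeles times"
}.freeze

# idf(term) = log2(N / df)
def inverse_document_frequencies(corpus)
  document_count = corpus.size.to_f
  df = Hash.new(0)
  # Count each term once per document, no matter how often it repeats.
  corpus.each_value { |text| text.split.uniq.each { |term| df[term] += 1 } }
  df.transform_values { |count| Math.log2(document_count / count) }
end

inverse_document_frequencies(CORPUS).sort.each do |term, idf|
  puts format("%-8s %.3f", term, idf)
end
```

Rare terms ("angeles", "los", "post") get the higher weight; terms shared by two of the three documents get the lower one.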
Term frequency (tf)
      angeles  los  new  post  times  york
d1    0        0    1    0     1      1
d2    0        0    1    1     0      1
d3    1        1    0    0     1      0
tf * idf
      angeles  los    new    post   times  york
d1    0        0      0.584  0      0.584  0.584
d2    0        0      0.584  1.584  0      0.584
d3    1.584    1.584  0      0      0.584  0
q = "new new times"
      angeles  los  new                    post  times                  york
q     0        0    (2/2)*0.584 = 0.584    0     (1/2)*0.584 = 0.292    0
Vector lengths
Length of d1 = sqrt(0.584^2+0.584^2+0.584^2) = 1.011
Length of d2 = sqrt(0.584^2+1.584^2+0.584^2) = 1.786
Length of d3 = sqrt(1.584^2+1.584^2+0.584^2) = 2.316
Length of q  = sqrt(0.584^2+0.292^2) = 0.652
cosSim(d1,q) = (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / (1.011*0.652) = 0.776
cosSim(d2,q) = (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / (1.786*0.652) = 0.292
cosSim(d3,q) = (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) = 0.112
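The whole calculation can be sketched end to end. This assumes whitespace tokenization and the same weighting as the example: weight = (tf / max tf in the text) * idf, with idf = log2(N / df). The method names are illustrative and not taken from any library. Because the slides truncate intermediate values, the scores computed here agree with them only to about two decimals, but the ranking is the same.

```ruby
DOCUMENTS = {
  "d1" => "new york times",
  "d2" => "new york post",
  "d3" => "los angeles times"
}.freeze

# idf per term: log2(N / df).
def idf_table(documents)
  df = Hash.new(0)
  documents.each_value { |text| text.split.uniq.each { |term| df[term] += 1 } }
  df.transform_values { |count| Math.log2(documents.size.to_f / count) }
end

# Sparse vector of term => (tf / max tf) * idf.
# Terms the corpus has never seen get weight 0.
def tfidf_vector(text, idf)
  counts = text.split.tally
  max_tf = counts.values.max.to_f
  counts.to_h { |term, tf| [term, (tf / max_tf) * idf.fetch(term, 0.0)] }
end

# Dot product divided by the product of the vector lengths.
def cosine_similarity(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0.0) }
  norm = ->(vector) { Math.sqrt(vector.values.sum { |weight| weight**2 }) }
  denominator = norm.(a) * norm.(b)
  denominator.zero? ? 0.0 : dot / denominator
end

# Score every document against the query, best match first.
def rank(documents, query)
  idf = idf_table(documents)
  query_vector = tfidf_vector(query, idf)
  documents
    .map { |id, text| [id, cosine_similarity(tfidf_vector(text, idf), query_vector)] }
    .sort_by { |_, score| -score }
end

rank(DOCUMENTS, "new new times").each do |id, score|
  puts format("%s %.3f", id, score)
end
```

d1 comes first because it shares both query terms, d2 shares only "new", and d3 shares only "times" while carrying two heavy terms the query does not have.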
Douglass Cutting
Lucene, Nutch, Hadoop

Tika, Solr, ElasticSearch
150 GB/hour
Index size: 20%-30%
Apache Lucene
HTML, XHTML, OOXML, ODF, XML, RSS, OLE2,
iWorks (Pages, Numbers, Keynote), PDF, EPUB,
RTF, Commons Compress (ar, cpio, Unix dump,
tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj and lzma),
Audio (javax.sound, MIDI, Mp3), Image
(javax.imageio, Tiff, Jpeg), Video (FLV, Flash),
Mail (Mbox, RFC822), DWG, Font (TrueType),
HDF, and plugins.
// Open the file to analyze
InputStream is = new BufferedInputStream(
    new FileInputStream(
        new File("sample.pdf")));

// AutoDetectParser picks a parser based on the detected file type
Parser parser = new AutoDetectParser();
// The extracted body text is written to standard output
ContentHandler handler = new BodyContentHandler(
    System.out);

Metadata metadata = new Metadata();

parser.parse(is, handler, metadata,
    new ParseContext());

// Print every metadata field the parser filled in
for (String name : metadata.names()) {
    String value = metadata.get(name);

    if (value != null) {
        System.out.println("Metadata Name: " + name);
        System.out.println("Metadata Value: " + value);
    }
}
http://localhost:8983/solr/query?q=title:black
http://localhost:8983/solr/query?
q=*:*
&fl=id,title,series_s,pubyear_i
&sort=pubyear_i desc
&group=true
&group.main=true
&group.field=series_s
&facet=true
&facet.field=cat
curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
  --data-binary @tutorial.html \
  -H 'Content-type:text/html'
                               Solr           ElasticSearch
Coordination                   ZooKeeper      Zen Discovery
Shard Splitting                Yes            No
Automatic Shard Rebalancing    No             Yes
Schema                         +/-            Yes
Nested Typing                  No             Yes
Queries                        Key / Value    JSON
Distributed Group By           Yes            No
Percolation Queries            No             Yes
Setup
cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y
### http://www.elasticsearch.org/download/
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.deb
sudo dpkg -i elasticsearch-0.90.7.deb
sudo service elasticsearch start
Setup
# Bonsai
heroku addons:add bonsai
heroku config:add ELASTICSEARCH_URL=`heroku config:get BONSAI_URL`

# Found
heroku addons:add foundelasticsearch
heroku config:add ELASTICSEARCH_URL=`heroku config:get FOUNDELASTICSEARCH_URL`

# SearchBox
heroku addons:add searchbox:starter
heroku config:add ELASTICSEARCH_URL=`heroku config:get SEARCHBOX_URL`

# reindex
heroku run rake searchkick:reindex CLASS=Product
Setup
# Gemfile - bundle install
gem "searchkick"

# app/models/product.rb
class Product < ActiveRecord::Base
  searchkick
end

# config/initializers/elasticsearch.rb
ENV["ELASTICSEARCH_URL"] = "http://username:[email protected]"

# in the shell
rails r "Product.reindex"
# Simple search
products = Product.search "Camisetas"
products.each do |product|
  puts product.name
end
# Search with fields
Product.search "Camisetas",
  fields: [:name, :description],
  where: {
    in_stock: true,
    expires_at: {gt: 1.week.from_now},
    or: [
      [{in_stock: true}, {backordered: true}]
    ]
  },
  order: {_score: :desc}, # most relevant first
  limit: 10, offset: 50
  # , page: params[:page], per_page: 20
# Synonyms
class Product < ActiveRecord::Base
  searchkick synonyms: [
    ["pc", "computador pessoal"],
    ["word", "microsoft office"]
  ]
end
# Suggestions
class Product < ActiveRecord::Base
  searchkick suggest: ["name"]
end

products = Product.search "coldminer", suggest: true
products.suggestions # ["codeminer"]
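To give an intuition for "Did you mean …?" suggestions, here is a minimal sketch based on edit distance over a small, made-up vocabulary. This is an assumption for illustration only: it is not how Searchkick or ElasticSearch implement suggestions, and the method names are hypothetical.

```ruby
# Classic dynamic-programming edit distance
# (insertions, deletions and substitutions each cost 1).
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Closest vocabulary term, or nil when nothing is within max_distance edits.
def did_you_mean(input, vocabulary, max_distance: 2)
  best = vocabulary.min_by { |term| levenshtein(input, term) }
  best if best && levenshtein(input, best) <= max_distance
end

# Hypothetical vocabulary; a real engine would draw terms from its index.
VOCABULARY = %w[codeminer camisetas calças].freeze

p did_you_mean("coldminer", VOCABULARY)
```

"coldminer" is two substitutions away from "codeminer", so it falls within the default threshold; comparing against every term does not scale, which is one reason to leave this to the search engine.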
class City < ActiveRecord::Base
  searchkick autocomplete: ["name"]
end

City.search "Sao P", autocomplete: true
# app/controllers/cities_controller.rb
class CitiesController < ApplicationController
  def autocomplete
    render json: City.search(params[:query],
                             autocomplete: true,
                             limit: 10).map(&:name)
  end
end
# partial
<input type="text" id="query" name="query" />

<script src="jquery.js"></script>
<script src="typeahead.js"></script>
<script>
  $("#query").typeahead({
    name: "city",
    remote: "/cities/autocomplete?query=%QUERY"
  });
</script>
products = Product.search "GPS",
  facets: [:type, :brand, :screen_size]
puts products.facets
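Conceptually, a terms facet is a count of hits per distinct field value. The sketch below does that counting in plain Ruby over made-up hits; in practice the search engine computes it, and the brands, types and method name here are hypothetical.

```ruby
# Hypothetical hits for a "GPS" search; brands and types are made up.
GPS_HITS = [
  { name: "GPS One",   brand: "Acme",   type: "automotive" },
  { name: "GPS Two",   brand: "Acme",   type: "handheld" },
  { name: "GPS Three", brand: "Globex", type: "automotive" }
].freeze

# Count how many hits carry each distinct value of a field,
# most frequent first (ties broken alphabetically).
def terms_facet(hits, field)
  hits.map { |hit| hit[field] }
      .tally
      .sort_by { |value, count| [-count, value] }
      .to_h
end

p terms_facet(GPS_HITS, :brand)
p terms_facet(GPS_HITS, :type)
```

These counts are what a storefront shows next to each filter option, such as the number of matching products per brand.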
class City < ActiveRecord::Base
  searchkick locations: ["location"]

  def search_data
    attributes.merge location: [latitude, longitude]
  end
end

City.search "Codemi",
  where: {
    location: {near: [-23, -46],
               within: "10mi" }
  } # or 16km
Next Chapters
SELECT … LIKE '%'
THANK YOU!
slideshare.net/akitaonrails
codeminer42.com
@akitaonrails