ElasticSearch

  • use cases
    • log analysis
    • search documents
    • .
  • Cluster configuration

    • number of instances and types
    • possible to use dedicated master nodes (yes/no), more stable.
      • recommended 3 per cluster. don't hold data. they are used as backups
    • can be distributed across AZs.
  • Primary shard and Replica shards can be split across different AZs for reliability

  • Loading data using lambda:

    • post data in cloudwatch logs
    • use lambda to load from cloudwatch to firehose
    • lambda decorator processes firehose with additional data (e.g. when processing VPC flow logs)
      • e.g. adding country by querying service for geoip,...
    • then firehose sends data to elasticsearch.

RStudio

  • open source IDE for R
  • RStudio Server integrates with Athena, RDS, Redshift, ...
    • needs to install R packages first
  • run RStudio in EC2

AWS Elasticsearch

  • Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud.
  • Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analytics
  • Elasticsearch provides
    • real-time, distributed search and analytics engine
    • ability to provision all the resources for Elasticsearch cluster and launches the cluster
    • easy to use cluster scaling options
    • provides self-healing clusters, which automatically detects and replaces failed Elasticsearch nodes, reducing the overhead associated with self-managed infrastructures
    • domain snapshots to back up and restore ES domains and replicate domains across AZs
    • data durability
    • enhanced security with IAM access control
    • node monitoring
    • multiple configurations of CPU, memory, and storage capacity, known as instance types
    • storage volumes for the data using EBS volumes
    • Multiple geographical locations for your resources, known as regions and Availability Zones
    • ability to span cluster nodes across two AZs in the same region, known as zone awareness, for high availability and redundancy
    • dedicated master nodes to improve cluster stability
    • data visualization using the Kibana tool
    • integration with CloudWatch for monitoring ES domain metrics
    • integration with CloudTrail for auditing configuration API calls to ES domains
    • integration with S3, Kinesis, and DynamoDB for loading streaming data
    • ability to handle structured and Unstructured data
    • HTTP Rest APIs

results matching ""

    No results matching ""